Preface
Background
Snowball has seen rapid growth in users and product lines in recent years. To better support the company's business development and users' personalized needs, the Snowball unified push platform was built with the following objectives:
- Deliver information to users in a timely manner
- Increase coverage across users' device models
- Improve user satisfaction and product experience
Push is a key channel in app operations, and used sensibly it does much to advance these goals. It currently serves users in business scenarios such as posts from followed users, comment replies, stock price alerts, individual stock announcements, and portfolio rebalancing, and it helps operators deliver selected content, such as 7x24-hour news and market flashes, to target users as soon as it is available.
Product design
Snowball's early in-house push service relied on a self-built long connection and a third-party dependency. Because of accumulated business redundancy and poor code maintainability, it had the following problems:
Problem | Cause |
---|---|
Lack of an ACK mechanism | Push is asynchronous, so there is no way to know whether a message was delivered. The third-party channel may be affected by other push senders, causing delay and loss |
Lack of message persistence | Messages are pushed one by one, and their history and status cannot be traced |
Lack of an idempotent retransmission mechanism | The push path has many links. A failure in any one of them loses the message, which can then never be received |
Complex client access logic | Every new app integration repeats the same work, and the code cannot be reused |
Strong coupling between the client and the push service SDK | The interfaces exposed by push providers are not uniform. Replacing the client SDK means rewriting the integration |
Lack of data monitoring and statistics | No visibility into how many messages were pushed, how many succeeded, and how many failed |
Based on these problems, and after surveying and comparing implementations across the industry against Snowball's own situation, we designed and built an in-house push service on top of the major vendors' channels, which fundamentally resolves the practical difficulties of the push scenario.
Channel capability building
Android channel
There are many Android phone manufacturers, so a push service inevitably faces fragmentation. Snowball push currently integrates the native vendor channels of Huawei, Xiaomi, OPPO, vivo, and Meizu, and reaches other devices through the third-party Umeng channel.
The major vendors each have their own platform rules for push content review, quota limits, and flow control. To address these common problems, the platform has followed the progress of the Unified Push Alliance, promoted by the China Academy of Information and Communications Technology, ever since the platform was established. Given Snowball's current situation, however, now is not yet the right time to adopt it.
The optimizations of the Snowball push platform are described below. Where a vendor is not mentioned, its limit has not been reached or no similar problem has been found at this stage.
Daily total push limit
The vendors' daily total push limits are shown in the following table:
Solution:
- Optimize the push logic for business content, define a separate distribution strategy for each vendor channel, and make sure key content is delivered
- Apply to the vendor for a higher push limit according to the app category and the vendor's rules
Messages are classified by business type at distribution time and prioritized so that key content is delivered:
    message Message {
        int64 messageId = 1;                     // Push message batch number
        string title = 2;                        // Push message title
        string payload = 3;                      // Push content body
        string description = 4;                  // Description shown in the notification bar (summary)
        string callback = 5;                     // Callback address, e.g. http://example.com
        string summary_callback = 6;             // Notification image address, e.g. http://img.com
        Type type = 7;                           // Business type; distribution priority is determined by business type
        Application app = 8;                     // Target client app; multiple apps are served by the same platform
        repeated int64 target = 9;               // Target users (explicit list of user IDs)
        int64 created = 10;                      // Message creation time
        int32 ttl = 11;                          // Message time to live (unit: ms)
        map<string, string> ext = 12;            // Other custom fields
        map<string, string> version_filter = 13; // Version filtering, funnel mode
        Application targetType = 14;             // ID type of the push target users
    }
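One way to read "prioritize by business type" is as a reserve on each vendor's daily quota, so that low-priority traffic is cut off first as the quota runs down. The sketch below is only an illustration under that assumption: the `BizType` values, the `QuotaGate` class, and the reserve size are hypothetical and do not come from the platform's code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical business types; the real Type enum is not shown in the article. */
enum BizType { TRADE_ALERT, STOCK_PRICE_REMINDER, COMMENT_REPLY, OPERATION_CONTENT }

/** Minimal sketch of a per-channel daily quota gate with a reserve for key content. */
final class QuotaGate {
    private final long dailyLimit;      // vendor's daily cap for this channel
    private final long reservedForKey;  // slice of the cap that only key content may use
    private final AtomicLong used = new AtomicLong();

    QuotaGate(long dailyLimit, long reservedForKey) {
        if (reservedForKey < 0 || reservedForKey > dailyLimit) {
            throw new IllegalArgumentException("reserve must be within the daily limit");
        }
        this.dailyLimit = dailyLimit;
        this.reservedForKey = reservedForKey;
    }

    /** Assumption: trading and price alerts count as key content. */
    static boolean isKeyContent(BizType type) {
        return type == BizType.TRADE_ALERT || type == BizType.STOCK_PRICE_REMINDER;
    }

    /** Consumes one unit of quota if this business type is still allowed to send. */
    boolean tryAcquire(BizType type) {
        long ceiling = isKeyContent(type) ? dailyLimit : dailyLimit - reservedForKey;
        while (true) {
            long current = used.get();
            if (current >= ceiling) {
                return false;
            }
            if (used.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    long used() {
        return used.get();
    }
}
```

With a limit of 3 and a reserve of 1, ordinary content is refused after two sends while a trade alert can still use the last slot. A production version would also need to reset the counter daily and share it across servers.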
Real-time push rate limit
Snowball is a wealth management application in which trading, market data, and content have always been what users care about most. Push therefore has strict real-time requirements, and the push service has to cope with both large data volumes and high QPS. The vendors' flow control limits are as follows:
Solution:
- As with the quota solution, optimize the distribution logic so that key content is delivered
- Make full use of the batch distribution interfaces provided by the major vendors
- Xiaomi and Huawei limit QPS as interface call frequency. Before data reaches the vendor channel, it is therefore aggregated by each user's delivery channel and sent in batches wherever possible. (For example, according to Xiaomi's official documentation, one request can carry up to 1,000 target devices, so at 3,000 QPS up to 3 million devices can be reached per second.)
- OPPO and vivo apply different call frequency limits to the batch interface and the single-message interface. During distribution the interface is chosen according to the message content, and when the batch interface hits its upper limit, distribution switches to the single-message interface.
- Set a validity period for each message. When a vendor's QPS limit is triggered, the channel layer puts the message back into the push queue
    // The Xiaomi push channel has triggered flow control; decide whether to retry based on the status code
    ...
    String responseBody = URLDecoder.decode(response.body().string(), "UTF-8");
    JsonNode obj = MAPPER.readTree(responseBody);
    ...
    if ("200002".equals(obj.get("code").asText())) {
        // 200002: rate limited, retry later
        limitCounter.increment();
        LOGGER.warn("Xiaomi API call hit the rate limit, retrying uid list: {} | response body: {} | app: {} | messageId: {}",
                uidList, responseBody, message.getApp().name(), message.getMessageId());
        pushStatusProducer.sendMessageRetry(message.toBuilder().clearTarget().addAllTarget(uidList).build());
        return;
    }
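The aggregation step described in the list above, grouping targets by delivery channel and cutting each group into batches no larger than the vendor's per-request cap, can be sketched roughly as follows. `ChannelBatcher` and its nested `Device` type are hypothetical names for illustration; only the 1,000-target cap and the 3,000 QPS figure come from the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of channel-level aggregation before calling a vendor batch API. */
final class ChannelBatcher {
    /** Per-request cap cited in the text for Xiaomi; other vendors would differ. */
    static final int XIAOMI_MAX_TARGETS = 1000;

    /** Hypothetical device record: which channel it uses and its push token. */
    static final class Device {
        final String channel;
        final String token;

        Device(String channel, String token) {
            this.channel = channel;
            this.token = token;
        }
    }

    /** Groups tokens by channel, then splits each group into batches of at most batchSize. */
    static Map<String, List<List<String>>> batchByChannel(List<Device> devices, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, List<String>> byChannel = new LinkedHashMap<>();
        for (Device d : devices) {
            byChannel.computeIfAbsent(d.channel, k -> new ArrayList<>()).add(d.token);
        }
        Map<String, List<List<String>>> batches = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : byChannel.entrySet()) {
            List<String> tokens = e.getValue();
            List<List<String>> chunks = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i += batchSize) {
                int end = Math.min(i + batchSize, tokens.size());
                chunks.add(new ArrayList<>(tokens.subList(i, end)));
            }
            batches.put(e.getKey(), chunks);
        }
        return batches;
    }

    /** Upper bound on devices reached per second: requests per second times targets per request. */
    static long maxDevicesPerSecond(long requestsPerSecond, int targetsPerRequest) {
        return requestsPerSecond * targetsPerRequest;
    }
}
```

For 2,500 Xiaomi devices this yields three requests (1,000, 1,000, and 500 targets), and `maxDevicesPerSecond(3000, 1000)` reproduces the 3 million devices per second quoted above.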
Points to note with this scheme:
- Every message needs a deadline. Otherwise a message may eventually be delivered successfully but too late, and stale content hurts the user experience
- Retransmission must be idempotent. In edge cases such as a weak network, retransmission can cause duplicate pushes and hurt the user experience. The idempotency solutions offered by the major vendors are as follows:
Beyond the solutions in the table, the vendors also offer similar mechanisms. For example, Xiaomi's extra.jobkey field or Huawei's group field can fold messages together and improve the user experience.
    // Build the Xiaomi API request body; the notify_id parameter keeps message distribution idempotent
    RequestBody requestBody = new FormBody.Builder()
            .add("payload", MAPPER.valueToTree(messageTemplate).toString())
            .add("restricted_package_name", packageName)
            .add("description", (messageTemplate.getDescription().length() > 120
                    ? messageTemplate.getDescription().substring(0, 120) + CutString.SUB_TAIL
                    : messageTemplate.getDescription()))
            .add("extra.notification_large_icon_uri", StringUtils.trimToEmpty(message.getSummaryCallback()))
            .add("title", messageTemplate.getTitle().length() > 50
                    ? messageTemplate.getTitle().substring(0, 50)
                    : messageTemplate.getTitle())
            .add("pass_through", "0")
            .add("notify_type", "-1")
            // Developers can set a group ID (JobKey) when sending a message.
            // Messages with the same group ID are aggregated into one message group.
            .add("extra.jobkey", String.valueOf(messageTemplate.getMessageId() & Integer.MAX_VALUE))
            .add("registration_id", StringUtils.join(deviceTokens, ","))
            // By default the notification bar shows only one push message.
            // To show several, give different messages different notify_id values.
            .add("notify_id", String.valueOf(messageTemplate.getMessageId() & Integer.MAX_VALUE))
            .build();
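The two cautions above, a deadline for every message and idempotent retransmission, can be combined into a small retry check. This is a sketch rather than the platform's implementation: `RetryPolicy` and the `maxAttempts` parameter are hypothetical, while the `created` and `ttl` fields and the `messageId & Integer.MAX_VALUE` derivation follow the message definition and request body shown earlier.

```java
/** Minimal sketch of a TTL-aware retry decision with a stable notify_id. */
final class RetryPolicy {
    private RetryPolicy() {
    }

    /** A message is alive while less than ttlMillis has elapsed since its creation. */
    static boolean isAlive(long createdMillis, long ttlMillis, long nowMillis) {
        return nowMillis - createdMillis < ttlMillis;
    }

    /** Retry only if the message has not expired and the attempt budget is not used up. */
    static boolean shouldRetry(long createdMillis, long ttlMillis, long nowMillis,
                               int attemptsSoFar, int maxAttempts) {
        return isAlive(createdMillis, ttlMillis, nowMillis) && attemptsSoFar < maxAttempts;
    }

    /**
     * Same derivation as the request body above: masking with Integer.MAX_VALUE
     * keeps the low 31 bits, so the result is a non-negative int that is identical
     * on every retransmission of the same message.
     */
    static int notifyId(long messageId) {
        return (int) (messageId & Integer.MAX_VALUE);
    }
}
```

Because every retry of a message carries the same `notify_id`, a duplicate delivery replaces the existing notification instead of stacking a second one. One consequence of the mask is that message IDs differing only above the low 31 bits map to the same `notify_id`.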
iOS and other channels
The Apple channel is implemented on the officially provided APNs. The early implementation was built directly on the JDK, and because of its poor performance the service now uses the open-source third-party SDK: push
Problems occur occasionally in use, but most are caused by the network link. Our research produced one option: deploy the service nodes that run iOS push tasks close to the APNs servers. Given actual usage and current iOS business requirements, this remains a discussion only.
The Meizu channel is integrated according to the official API documentation and meets current QPS and total volume needs, so it is not discussed further here.
Other third-party channels, such as Umeng or Jiguang (Aurora), complement the vendor channels above in two ways:
- They reach users on other phone brands, improving push coverage
- They act as a fallback in the system design, making the system more robust
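The coverage and fallback roles listed above amount to a routing rule: use the native vendor channel when one exists for the device, otherwise use the third-party channel. A minimal sketch, assuming the device reports a manufacturer string and whether a vendor token was registered (`PushRoute` and `ChannelRouter` are hypothetical names):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical set of Android routes: five vendor channels plus a third-party fallback. */
enum PushRoute { HUAWEI, XIAOMI, OPPO, VIVO, MEIZU, THIRD_PARTY }

/** Minimal sketch of vendor-first routing with a third-party fallback. */
final class ChannelRouter {
    private static final Map<String, PushRoute> BY_MANUFACTURER = new HashMap<>();

    static {
        BY_MANUFACTURER.put("huawei", PushRoute.HUAWEI);
        BY_MANUFACTURER.put("xiaomi", PushRoute.XIAOMI);
        BY_MANUFACTURER.put("oppo", PushRoute.OPPO);
        BY_MANUFACTURER.put("vivo", PushRoute.VIVO);
        BY_MANUFACTURER.put("meizu", PushRoute.MEIZU);
    }

    private ChannelRouter() {
    }

    /**
     * Returns the vendor channel when the manufacturer is integrated and a vendor
     * token is available; every other device goes to the third-party channel.
     */
    static PushRoute route(String manufacturer, boolean hasVendorToken) {
        if (manufacturer == null || !hasVendorToken) {
            return PushRoute.THIRD_PARTY;
        }
        PushRoute route = BY_MANUFACTURER.get(manufacturer.trim().toLowerCase(Locale.ROOT));
        return route != null ? route : PushRoute.THIRD_PARTY;
    }
}
```

Here `route("Xiaomi", true)` selects the Xiaomi channel, while an unlisted brand or a device without a vendor token falls back to the third-party channel.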
Platform capability building
Beyond providing channel capabilities, the push platform is now being enriched with system, data, and business capabilities.
System capability
The push platform currently runs on 8 servers with 4 vCPUs and 8 GiB of memory each. Together they distribute more than 800,000 messages per second and meet the business target of more than 1 billion messages per day (the current performance bottleneck is the vendors' limits). Keeping the system highly available and stable requires not only a sound initial architecture but also continuous optimization, iteration, and tracking. A metrics system makes early warning and problem analysis easier.
Many problems came up while optimizing push channel distribution. Two representative ones are described here:
Vendor channel call selection
Channel distribution initially integrated the SDKs provided by each vendor. Most of these packages conflicted with the company's infrastructure and caused many problems for performance tuning and business compatibility, for example logging component conflicts, difficulty adjusting SDK thread pools and upgrading versions, and SDKs that return only part of the HTTP interface's response data. We therefore chose to wrap the vendors' HTTP APIs directly, parse each channel's message protocol ourselves, and unify the standard for connecting push channels.
For these reasons, the data format, code model, and performance targets are unified around the message bus and asynchronous OkHttp requests.
    // call_before: build the request body in a unified format before sending with OkHttp
    public static RequestBody requestBodyFormat(MessageProto.Message message, String packageName,
            List<String> deviceTokens, boolean channelSwitch) throws UnsupportedEncodingException {
        MessageTemplate messageTemplate = MessageTemplate.messageConvert(message, MessageProto.Platform.XIAOMI);
        messageTemplate.setTitle(StringUtils.isEmpty(messageTemplate.getTitle())
                ? PushTitleUtils.getTitleFromAPP(message.getApp())
                : messageTemplate.getTitle());
        RequestBody requestBody = new FormBody.Builder()
                .add("payload", MAPPER.valueToTree(messageTemplate).toString())
                .add("restricted_package_name", packageName)
                .add("description", (messageTemplate.getDescription().length() > 120
                        ? messageTemplate.getDescription().substring(0, 120) + CutString.SUB_TAIL
                        : messageTemplate.getDescription()))
                .add("extra.notification_large_icon_uri", StringUtils.trimToEmpty(message.getSummaryCallback()))
                .add("title", messageTemplate.getTitle().length() > 50
                        ? messageTemplate.getTitle().substring(0, 50)
                        : messageTemplate.getTitle())
                .add("pass_through", "0")
                .add("notify_type", "-1")
                // Developers can set a group ID (JobKey) when sending a message.
                // Messages with the same group ID are aggregated into one message group.
                .add("extra.jobkey", String.valueOf(messageTemplate.getMessageId() & Integer.MAX_VALUE))
                // The batch interface accepts up to 1000 device tokens per call;
                // using it fully improves system throughput.
                .add("registration_id", StringUtils.join(deviceTokens, ","))
                // By default the notification bar shows only one push message.
                // To show several, give different messages different notify_id values.
                .add("notify_id", String.valueOf(messageTemplate.getMessageId() & Integer.MAX_VALUE))
                .build();
        return requestBody;
    }

    // call: send the channel message with OkHttp
    public void send(List<UserStateProto.Device> deviceList, MessageProto.Message message, RequestBody requestBody) {
        List<Long> uidList_GE = deviceList.stream().map(m -> m.getUid()).collect(Collectors.toList());
        try {
            LOGGER.info("Xiaomi API call pending, target uid list: {} | request body: {} | app: {} | messageId: {}",
                    uidList_GE, OkHttp3ConvertUtils.requestBodyURLToString(requestBody),
                    message.getApp().name(), message.getMessageId());
            Request request = new Request.Builder()
                    .url(xiaomiSendUrl)
                    .addHeader("Authorization", String.format("key=%s", accessToken))
                    .post(requestBody)
                    .build();
            Call call = okHttpClient.newCall(request);
            call.enqueue(new XiaomiResponseCall(deviceList, message, pushStatusProducer));
        } catch (Exception e) {
            exceptionCounter.increment(deviceList.size());
            LOGGER.error("Xiaomi API call threw an exception, failed uid list: {} | reason: {} | app: {} | messageId: {}",
                    uidList_GE, e.getMessage(), message.getApp().name(), message.getMessageId(), e);
            pushStatusProducer.sendByDeviceList(PushResultEnum.FAIL, PushFailedTypeEnum.SYSTEM_ERROR,
                    e.getMessage(), deviceList, message);
        }
    }

    // call_back: asynchronous OkHttp result callback
    public void onResponse(Call call, Response response) throws IOException {
        String responseBody = URLDecoder.decode(response.body().string(), "UTF-8");
        if (response.isSuccessful()) {
            JsonNode obj = MAPPER.readTree(responseBody);
            if ("0".equals(obj.get("code").asText())) {
                JsonNode jsonNode = obj.findPath("data").findPath("bad_regids");
                if (jsonNode.isMissingNode()) {
                    // No bad_regids in the response: every device in the batch succeeded
                    successCounter.increment(deviceList.size());
                    LOGGER.info("Xiaomi API call succeeded for all users, uid list: {} | response body: {} | app: {} | messageId: {}",
                            uidList, responseBody, message.getApp().name(), message.getMessageId());
                    pushStatusProducer.sendByDeviceList(PushResultEnum.SUCCESS, PushFailedTypeEnum.NULL,
                            "SUCCESS", deviceList, message);
                } else {
                    // bad_regids lists the tokens that failed; split the batch into failed and successful devices
                    List<String> failedTokenList = new ArrayList<>();
                    for (String objNode : jsonNode.textValue().split(",")) {
                        failedTokenList.add(objNode);
                    }
                    List<UserStateProto.Device> failedList = deviceList.stream()
                            .filter(f -> failedTokenList.contains(f.getDeviceToken()))
                            .collect(Collectors.toList());
                    failedCounter.increment(failedList.size());
                    LOGGER.info("Xiaomi API call failed for some users, uid list: {} | response body: {} | app: {} | messageId: {}",
                            failedList.stream().map(m -> m.getUid()).collect(Collectors.toList()),
                            responseBody, message.getApp().name(), message.getMessageId());
                    pushStatusProducer.sendByDeviceList(PushResultEnum.IGNORE, PushFailedTypeEnum.CHANNEL_ERROR,
                            responseBody, failedList, message);
                    List<UserStateProto.Device> successedList = deviceList.stream()
                            .filter(f -> !failedTokenList.contains(f.getDeviceToken()))
                            .collect(Collectors.toList());
                    successCounter.increment(successedList.size());
                    LOGGER.info("Xiaomi API call succeeded for some users, uid list: {} | response body: {} | app: {} | messageId: {}",
                            successedList.stream().map(m -> m.getUid()).collect(Collectors.toList()),
                            responseBody, message.getApp().name(), message.getMessageId());
                    pushStatusProducer.sendByDeviceList(PushResultEnum.SUCCESS, PushFailedTypeEnum.NULL,
                            "SUCCESS", successedList, message);
                }
            } else if ("200002".equals(obj.get("code").asText())) {
                // 200002: rate limited, retry later
                limitCounter.increment();
                LOGGER.warn("Xiaomi API call hit the rate limit, retrying uid list: {} | response body: {} | app: {} | messageId: {}",
                        uidList, responseBody, message.getApp().name(), message.getMessageId());
                pushStatusProducer.sendMessageRetry(message.toBuilder().clearTarget().addAllTarget(uidList).build());
                return;
            } else {
                failedCounter.increment(deviceList.size());
                LOGGER.warn("Xiaomi API call failed for all users, uid list: {} | response body: {} | app: {} | messageId: {}",
                        uidList, responseBody, message.getApp().name(), message.getMessageId());
                pushStatusProducer.sendByDeviceList(PushResultEnum.IGNORE, PushFailedTypeEnum.CHANNEL_ERROR,
                        responseBody, deviceList, message);
            }
        } else {
            failedCounter.increment(deviceList.size());
            LOGGER.error("Xiaomi API call returned an error response, failed uid list: {} | response body: {} | app: {} | messageId: {}",
                    uidList, responseBody, message.getApp().name(), message.getMessageId());
            pushStatusProducer.sendByDeviceList(PushResultEnum.IGNORE, PushFailedTypeEnum.CHANNEL_ERROR,
                    responseBody, deviceList, message);
        }
    }
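The Xiaomi callback above suggests what a unified channel connection standard might look like once each vendor's response is normalized. The sketch below is an assumption about that shape, not the platform's actual interface: `SendOutcome` and `VendorChannel` are hypothetical names, and the only behavior taken from the code above is splitting a batch by a comma-separated list of rejected tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical channel-neutral result of one vendor API call. */
final class SendOutcome {
    final List<String> succeeded;
    final List<String> failed;

    private SendOutcome(List<String> succeeded, List<String> failed) {
        this.succeeded = Collections.unmodifiableList(succeeded);
        this.failed = Collections.unmodifiableList(failed);
    }

    /**
     * Splits the tokens of one batch using the comma-separated list of
     * rejected tokens returned by a vendor (null or empty means none).
     */
    static SendOutcome fromRejectedTokens(List<String> sentTokens, String rejectedCsv) {
        Set<String> rejected = new HashSet<>();
        if (rejectedCsv != null && !rejectedCsv.isEmpty()) {
            for (String token : rejectedCsv.split(",")) {
                rejected.add(token.trim());
            }
        }
        List<String> ok = new ArrayList<>();
        List<String> bad = new ArrayList<>();
        for (String token : sentTokens) {
            if (rejected.contains(token)) {
                bad.add(token);
            } else {
                ok.add(token);
            }
        }
        return new SendOutcome(ok, bad);
    }
}

/** Hypothetical contract that each vendor adapter would implement. */
interface VendorChannel {
    String name();

    SendOutcome send(List<String> deviceTokens, String title, String body);
}
```

Collecting the rejected tokens in a `HashSet` keeps the split linear in the batch size, which matters when a batch carries up to 1,000 tokens.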
End-to-end tracking of push messages
Offline push does not go through the self-built long connection channel, so locating the current state of each push message for each user is a problem that cannot be ignored. Each vendor's push console includes a debugging tool, so when the push platform instruments the data returned by the API, it must record the vendor's trace_id for troubleshooting and data analysis.
For example, Xiaomi requires the IMEI and the batch ID returned by the interface. With these, the delivery status on the vendor's side can be queried in the Xiaomi console.
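Recording the vendor's trace identifier only helps if it can later be found from a message and a user. A minimal in-memory sketch of such a lookup is shown below; `TraceIndex` is a hypothetical class, and a real platform would presumably persist these records rather than hold them in a map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal sketch of an index from (msgId, uid) to the vendor's trace identifier. */
final class TraceIndex {
    /** What was recorded for one push attempt. */
    static final class Entry {
        final String channel;
        final String vendorTraceId;

        Entry(String channel, String vendorTraceId) {
            this.channel = channel;
            this.vendorTraceId = vendorTraceId;
        }
    }

    private final Map<String, Entry> byKey = new ConcurrentHashMap<>();

    /** The message ID plus the user ID identifies one push to one user. */
    static String key(long msgId, long uid) {
        return msgId + ":" + uid;
    }

    /** Stores the trace ID returned by the vendor API for this push. */
    void record(long msgId, long uid, String channel, String vendorTraceId) {
        byKey.put(key(msgId, uid), new Entry(channel, vendorTraceId));
    }

    /** Finds the trace ID to paste into the vendor's debugging console. */
    Optional<Entry> lookup(long msgId, long uid) {
        return Optional.ofNullable(byKey.get(key(msgId, uid)));
    }
}
```

With this in place, a support query such as "did user 7 receive message 10" starts from `lookup(10, 7)` and continues in the vendor console with the returned identifier.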
Data capability
Once message push itself is complete, the next step is closed-loop management and effect tracking for each business and scenario, with push effectiveness quantified through data dashboards. The dashboards currently cover dozens of business scenarios across three apps and provide both real-time and offline data analysis.
For data capability building, the architecture passes data from every layer of the system link through the message bus. The format of each message is refined: msg_id + uid serves as the unique identifier, and the application side uniformly uses event_tracking as the push platform's tracking field, which establishes the specification and access standard for the metrics system.
    // Data format specification for real-time push data on the message bus
    public void sendByDevice(PushResultEnum pushResultEnum, PushFailedTypeEnum pushFailedTypeEnum,
            String reason, UserStateProto.Device device, MessageProto.Message message) {
        MessageAck messageAck = new MessageAck();
        messageAck.setUploadTime(System.currentTimeMillis());
        messageAck.setMsgId(message.getMessageId());
        messageAck.setUid(device.getUid());
        messageAck.setChannel(device.getDeviceChannel());
        messageAck.setResult(pushResultEnum.getTypeName());
        messageAck.setFailedType(pushFailedTypeEnum.getTypeName());
        messageAck.setFailedReason(reason);
        messageAck.setAppVersion(device.getAppVersion());
        messageAck.setToken(device.getDeviceToken());
        messageAck.setDescription(message.getDescription());
        messageAck.setApp(message.getApp().name());
        messageAck.setBizType(message.getExtMap().get(TrackingExtKey.BIZ_TYPE));
        // Extensible K/V field to accommodate ad hoc requirement changes
        messageAck.setExt(message.getExtMap());
        messageAck.setCallback(message.getCallback());
        sendMessageACK(messageAck);
    }
Building on the push data capability, we can analyze the app uninstall rate (this depends on the vendor's push token, so the data is for reference only), and optimize popularity tags for push content, vendor channel delivery rates, and the user experience of push-driven features.
Business capability
A powerful push operations console provides not only basic push distribution but also push effect analysis for operators. For each push message, detailed data is recorded at every push stage to form a funnel analysis. Through the console, operators can follow a message's life cycle, quantify its effect, and optimize subsequent topics and audience groups.
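A funnel of this kind can be computed by counting distinct message and user pairs per stage and dividing adjacent stages. The sketch below assumes four stage names that are not taken from the platform (`Stage` and `PushFunnel` are hypothetical); it only illustrates the counting.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical push stages, in the order a message passes through them. */
enum Stage { SUBMITTED, ACCEPTED_BY_VENDOR, DELIVERED, CLICKED }

/** Minimal sketch of a per-stage funnel keyed by message ID and user ID. */
final class PushFunnel {
    private final Map<Stage, Set<String>> seen = new EnumMap<>(Stage.class);

    /** Records that a push reached a stage; repeated events for the same pair count once. */
    void record(Stage stage, long msgId, long uid) {
        seen.computeIfAbsent(stage, s -> new HashSet<>()).add(msgId + ":" + uid);
    }

    /** Number of distinct (msgId, uid) pairs that reached the stage. */
    int count(Stage stage) {
        Set<String> keys = seen.get(stage);
        return keys == null ? 0 : keys.size();
    }

    /** Share of pairs at the earlier stage that also reached the later one (0.0 if none). */
    double conversion(Stage from, Stage to) {
        int base = count(from);
        return base == 0 ? 0.0 : (double) count(to) / base;
    }
}
```

Deduplicating on the message and user pair means that retransmitted acknowledgements do not inflate the numbers an operator sees.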
Operations side
Operational decisions change constantly. Besides basic functions such as scheduled task distribution, the platform's architecture isolates the functional layer from the data layer, which makes it easier to combine big data and algorithms for dynamic target selection and personalization.
Audit side
Vendors have their own strict standards for push content, and the domestic regulatory environment imposes strict management of user data. The push platform therefore modularizes data flow processing so that audit rules can be adjusted dynamically.
Summary
This article has shared some of the problems encountered and solved while building and optimizing the push platform, focusing on architecture and technology selection and on vendor channel optimization. Two points stand out:
- In architecture, decouple business functions from the data system as far as possible, using the message bus to separate business logic from data analysis
- In channel distribution, interact through the vendors' HTTP APIs, which simplifies later maintenance, performance optimization, and support for custom business needs
With these solutions and techniques, the problems listed at the beginning of the article are addressed as follows:
Problem | Solution |
---|---|
Lack of an ACK mechanism | The callback result of each HTTP interface call feeds back the vendor channel's ACK status in real time |
Lack of message persistence | Each message is identified by msg_id + uid, and the data capability is used to build message tracking and interception |
Lack of an idempotent retransmission mechanism | The idempotency parameters provided by the vendors are used directly to retransmit messages that failed |
Complex client access logic | Work with the front-end infrastructure team to consolidate basic capabilities and components for reuse and fast integration |
Strong coupling between the client and the push service SDK | Standardize the tracking fields across all vendors' interfaces, which slims down the front-end code and standardizes the data flow |
Lack of data monitoring and statistics | Enrich system monitoring and link tracing, and separate data code from functional code so that metrics are easier to quantify |
Future outlook
Intelligent frequency control and do-not-disturb design
Improve the overall resource utilization of the platform, reduce unnecessary interruptions, and devote resources to what users care about most.
Synchronizing in-app and out-of-app push
Work with the in-app feed reminders to combine the vendors' offline push with online push over the long connection, reducing the load on the push platform.
Complementary distribution of SMS and push
Work with SMS reminders to raise the delivery rate of key information and improve the product experience.
Reference links
APNs / MiPush / HMS / Opush / Vpush / Meizu Push
About the authors
He Kuang Province, Wang Wenwen, from Snowball's community platform / basic components team.