Four ways to improve file upload performance: do you know them?

Business requirements

Product Manager: Xiao Ming, we need an attachment upload feature. Attachments may be pictures, PDFs or videos.

Xiao Ming: That's doable, but we should limit the file size, ideally to no more than 30MB. Larger files upload slowly and put heavy load on the server.

Product Manager: Videos are essential for communication. Let's cap it at 50MB.

Xiao Ming: OK.

Tester: Uploads are too slow. I tried a 50MB file and it took a whole minute.

Xiao Ming: What? That slow?

Product Manager: No, this is too slow. Find a way to optimize it.

Optimization approach

Locating the problem

The overall upload call chain is: front end → back-end service → file service.

Xiao Ming found that after the front end started uploading, it took nearly 30 seconds for the request to reach the back end. He suspected this was caused by the browser parsing the file slowly.

The back-end service's call to the file service was also slow.

Solution

Xiao Ming: Does the file service have an asynchronous interface?

File service: Not yet.

Xiao Ming: Uploads are really slow. Do you have any suggestions for optimizing them?

File service: No. It's just that slow.


In the end, Xiao Ming decided to switch the back end from a synchronous response to an asynchronous one, to reduce how long users have to wait.

The back-end implementation was adapted to the business: after calling the upload interface, the front end immediately receives an ID, while the back end keeps calling the file service synchronously in the background and records the result against that ID, so it can be looked up later.

The drawback is obvious: the user is not told when an asynchronous upload fails.

However, given the time constraints, we weighed the pros and cons and shipped it as a temporary solution.

Recently Xiao Ming had some free time, so he decided to implement a file service himself.

File service

Because the existing file service is very primitive, Xiao Ming wants to build his own and optimize it in the following aspects:

(1) Compress

(2) Asynchronous

(3) Second transmission (instant upload)

(4) Concurrent

(5) Direct connection

Compress

In daily development, communicate with the product manager as clearly as possible so that users upload and download compressed archive files.

The reason is that network transmission is very time-consuming, and a smaller payload transfers faster.

Another advantage of compressed files is that they save storage space, although this cost is usually not a major concern.

Advantages: simple to implement, with a significant effect.

Disadvantages: it has to fit the business, and you need to convince the product manager. If the product requires image previews or video playback, compression is not applicable.
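If the client and server agree on gzip, the JDK's own java.util.zip classes are enough to shrink a payload before sending it. The sketch below is a minimal illustration; the class name GzipCodec is hypothetical, and a real upload would compress a stream rather than a whole byte array held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a payload before upload and restore it after download. */
final class GzipCodec {
    private GzipCodec() {}

    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer, so read `out` only afterwards
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // InputStream.readAllBytes is Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Text-like content shrinks a lot this way; formats that are already compressed, such as JPEG or MP4, gain very little, which is another reason compression does not suit the preview and playback case.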

Asynchronous

For time-consuming operations, asynchronous execution is the natural choice for reducing how long the user waits synchronously.

After receiving the file content, the server returns a request ID and executes the processing logic asynchronously.
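A minimal sketch of that flow, assuming an in-memory status map (a real service would persist the status): submit returns a request ID at once and the slow storage step runs on a worker thread. The class, enum and method names are hypothetical, and store is only a placeholder for the real work.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical async upload service: accept the file, return an ID, process in the background. */
final class AsyncUploadService {
    enum Status { PROCESSING, SUCCESS, FAILED }

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Map<String, Status> statusById = new ConcurrentHashMap<>();

    /** Returns immediately with a request ID; the actual storing runs on a worker thread. */
    String submit(String fileName, byte[] content) {
        String requestId = UUID.randomUUID().toString();
        // Record PROCESSING before scheduling, so the worker's final status always wins
        statusById.put(requestId, Status.PROCESSING);
        workers.execute(() -> {
            try {
                store(fileName, content);
                statusById.put(requestId, Status.SUCCESS);
            } catch (RuntimeException e) {
                statusById.put(requestId, Status.FAILED);
            }
        });
        return requestId;
    }

    /** Lets the caller look up the outcome later using the ID it received. */
    Status query(String requestId) {
        return statusById.get(requestId);
    }

    void shutdown() {
        workers.shutdown();
    }

    // Placeholder for the slow part (writing to disk or object storage)
    private void store(String fileName, byte[] content) {
        if (content.length == 0) {
            throw new IllegalArgumentException("empty file: " + fileName);
        }
    }
}
```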

How does the caller get the execution result?

There are two common approaches:

(1) Provide a result query interface

This is relatively simple, but it can waste queries: the caller may poll many times before the result is ready.

(2) Provide an asynchronous result callback

This is more troublesome to implement, but the caller gets the result as soon as it is available.
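For approach (2), the JDK's CompletableFuture can stand in for the callback mechanism inside one process. In a real file service the callback would more likely be an HTTP request to a URL the caller registered, so treat this only as a sketch; the class name and the store placeholder are hypothetical.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/** Hypothetical callback-style upload service. */
final class CallbackUploadService {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    /**
     * Returns immediately. When the background work finishes, onComplete receives
     * (fileId, null) on success or (null, error) on failure.
     */
    CompletableFuture<String> upload(String fileName, byte[] content,
                                     BiConsumer<String, Throwable> onComplete) {
        return CompletableFuture
                .supplyAsync(() -> store(fileName, content), workers)
                // Fires as soon as store() succeeds or fails; no polling involved
                .whenComplete(onComplete);
    }

    void shutdown() {
        workers.shutdown();
    }

    // Placeholder for the slow storage step; returns a generated file ID
    private String store(String fileName, byte[] content) {
        if (content.length == 0) {
            throw new IllegalArgumentException("empty file: " + fileName);
        }
        return UUID.randomUUID().toString();
    }
}
```

On failure the JDK hands the callback a CompletionException that wraps the original exception, so a callback that inspects the error should look at getCause().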

Second transmission

Most of you have probably used a cloud drive. Sometimes even a very large file uploads to it instantly.

How is this achieved?

Each file's content maps to a hash value that is, in practice, unique.

Before uploading, the client can check whether that hash already exists. If it does, the server simply adds a reference to the existing file, and the transfer step is skipped entirely.

Of course, this only pays off when you have a large volume of user files with a certain duplication rate.

The pseudo code is as follows:

public FileUploadResponse uploadByHash(final String fileName,
                                       final String fileBase64,
                                       final boolean asyncFlag) {
    FileUploadResponse response = new FileUploadResponse();

    // Compute the content hash and ask the file service whether it already has this file
    String fileHash = Md5Util.md5(fileBase64);
    FileInfoExistsResponse fileInfoExistsResponse = fileInfoExists(fileHash);
    if (!RespCodeConst.SUCCESS.equals(fileInfoExistsResponse.getRespCode())) {
        // The existence check itself failed: pass the error back to the caller
        response.setRespCode(fileInfoExistsResponse.getRespCode());
        response.setRespMessage(fileInfoExistsResponse.getRespMessage());
        return response;
    }

    Boolean exists = fileInfoExistsResponse.getExists();
    FileUploadByHashRequest request = new FileUploadByHashRequest();
    request.setFileName(fileName);
    request.setFileHash(fileHash);
    request.setAsyncFlag(asyncFlag);
    // Only send the content when the server does not have it yet;
    // otherwise the server just adds a reference to the existing file
    if (!Boolean.TRUE.equals(exists)) {
        request.setFileBase64(fileBase64);
    }

    // Call the file service
    return fillAndCallServer(request, "api/file/uploadByHash", FileUploadResponse.class);
}
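To see the whole round trip in one place, here is a self-contained sketch with a hypothetical in-memory store standing in for the file service. Unlike the pseudo code, it hashes the raw bytes instead of the Base64 string; MD5 is kept to match the original, though SHA-256 is the safer choice if users could craft deliberate collisions. DedupFileStore and InstantUploader are illustrative names only.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory file service that deduplicates content by hash. */
final class DedupFileStore {
    private final Map<String, byte[]> contentByHash = new ConcurrentHashMap<>();
    private final Map<String, String> hashByFileName = new ConcurrentHashMap<>();

    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            // %032x left-pads with zeros, which BigInteger alone would drop
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be supported by every Java platform", e);
        }
    }

    boolean exists(String hash) {
        return contentByHash.containsKey(hash);
    }

    /** Instant upload: a new file name simply points at content that is already stored. */
    void addReference(String fileName, String hash) {
        if (!exists(hash)) {
            throw new IllegalStateException("unknown hash " + hash);
        }
        hashByFileName.put(fileName, hash);
    }

    /** Normal upload: store the bytes (once per hash) and record the file name. */
    void uploadContent(String fileName, String hash, byte[] content) {
        contentByHash.putIfAbsent(hash, content);
        hashByFileName.put(fileName, hash);
    }

    byte[] read(String fileName) {
        return contentByHash.get(hashByFileName.get(fileName));
    }
}

/** Client side: hash first, transfer the bytes only when the server lacks them. */
final class InstantUploader {
    /** Returns true if the content had to be transferred, false for an instant upload. */
    static boolean upload(DedupFileStore server, String fileName, byte[] content) {
        String hash = DedupFileStore.md5Hex(content);
        if (server.exists(hash)) {
            server.addReference(fileName, hash);
            return false;
        }
        server.uploadContent(fileName, hash, content);
        return true;
    }
}
```

The second upload of identical bytes only sends a hash and a name, which is why a huge file can appear to finish instantly.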

Concurrent

Another approach is to split a relatively large file into segments.

For example, a 100MB file is cut into 10 parts that are uploaded concurrently, and all parts of one file share a unique batch number.

When downloading, the parts are fetched concurrently by batch number and concatenated back into the complete file.

The pseudo code is as follows:

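public FileUploadResponse concurrentUpload(final String fileName,
                                           final String fileBase64) throws InterruptedException {
    // Segment first: at most 10 segments, rounding up so the size is never 0
    int limitSize = Math.max(1, (fileBase64.length() + 9) / 10);
    final List<String> segments = StringUtil.splitByLength(fileBase64, limitSize);

    // All segments of one file share a batch number
    final String batchNo = UUID.randomUUID().toString();
    int size = segments.size();
    // Upload result of each segment, keyed by segment index
    final ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
    final CountDownLatch lock = new CountDownLatch(size);
    final ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, size));

    for (int i = 0; i < size; i++) {
        final int index = i;
        pool.execute(() -> {
            try {
                // Concurrent upload (uploadSegment is a hypothetical helper)
                map.put(index, uploadSegment(batchNo, index, segments.get(index)));
            } finally {
                // Count down even if the upload throws, so await() cannot hang
                lock.countDown();
            }
        });
    }

    // Wait for completion
    lock.await();
    pool.shutdown();

    // After uploading: check that `map` holds all `size` results, then build the response
    // (buildResponse is a hypothetical helper)
    return buildResponse(fileName, batchNo, size, map);
}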
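Here is a runnable sketch of both directions, split, concurrent upload, concurrent download and splice, under the assumption that an in-memory map keyed by "batchNo/index" stands in for the file service. In a real system each put and get would be a network request; ChunkedTransfer and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical in-memory chunk store; each put/get stands in for an HTTP call. */
final class ChunkedTransfer {
    private final Map<String, byte[]> chunks = new ConcurrentHashMap<>();
    private final Map<String, Integer> segmentCount = new ConcurrentHashMap<>();

    /** Splits data into at most `parts` segments, uploads them concurrently, returns the batch number. */
    String upload(byte[] data, int parts, ExecutorService pool) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        String batchNo = UUID.randomUUID().toString();
        // Round up so the remainder lands in the last segment; never use a size of 0
        int chunkSize = Math.max(1, (data.length + parts - 1) / parts);
        List<Callable<Void>> tasks = new ArrayList<>();
        int index = 0;
        for (int from = 0; from < data.length; from += chunkSize, index++) {
            final String key = batchNo + "/" + index;
            final byte[] segment = Arrays.copyOfRange(data, from, Math.min(from + chunkSize, data.length));
            tasks.add(() -> {
                chunks.put(key, segment); // "upload" one segment
                return null;
            });
        }
        runAll(pool, tasks);
        segmentCount.put(batchNo, index);
        return batchNo;
    }

    /** Downloads all segments of a batch concurrently and splices them back together in order. */
    byte[] download(String batchNo, ExecutorService pool) {
        int count = segmentCount.get(batchNo);
        List<Callable<byte[]>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final String key = batchNo + "/" + i;
            tasks.add(() -> chunks.get(key));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] segment : runAll(pool, tasks)) {
            out.writeBytes(segment); // Java 11+
        }
        return out.toByteArray();
    }

    /** Runs tasks in parallel and returns their results in task order. */
    private static <T> List<T> runAll(ExecutorService pool, List<Callable<T>> tasks) {
        try {
            List<T> results = new ArrayList<>();
            // invokeAll waits for every task; its futures are in the same order as `tasks`
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while transferring segments", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment transfer failed", e.getCause());
        }
    }
}
```

Using invokeAll instead of a hand-rolled CountDownLatch keeps the results in segment order for free and surfaces any failed segment through Future.get.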

Direct connection

Yet another strategy is for the client to call the file service directly, bypassing the back-end service.

This requires the file service to expose an HTTP file upload interface.

Security must also be considered. Ideally the front end first calls the back end to obtain an authorization token, then includes that token when uploading the file.
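One way to realize such a token, sketched here under the assumption that the back end and the file service share a secret key: the back end signs "userId:expiresAt" with HMAC-SHA256, and the file service recomputes the signature and checks the expiry before accepting the upload. The token layout and the UploadToken class are hypothetical; a standard format such as JWT would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical upload token shared by the back end (issuer) and the file service (verifier). */
final class UploadToken {
    private final byte[] secret; // known only to the back end and the file service

    UploadToken(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Back end: after authenticating the user, issue "userId:expiresAtMillis:signature". */
    String issue(String userId, long expiresAtMillis) {
        String payload = userId + ":" + expiresAtMillis;
        return payload + ":" + sign(payload);
    }

    /** File service: accept the upload only if the signature matches and the token is unexpired. */
    boolean verify(String token, long nowMillis) {
        int sigColon = token.lastIndexOf(':');
        if (sigColon < 0) return false;
        String payload = token.substring(0, sigColon);
        String signature = token.substring(sigColon + 1);
        int expiryColon = payload.lastIndexOf(':');
        if (expiryColon < 0) return false;
        long expiresAt;
        try {
            expiresAt = Long.parseLong(payload.substring(expiryColon + 1));
        } catch (NumberFormatException e) {
            return false;
        }
        // Constant-time comparison avoids leaking how many signature bytes matched
        boolean signatureOk = MessageDigest.isEqual(
                sign(payload).getBytes(StandardCharsets.UTF_8),
                signature.getBytes(StandardCharsets.UTF_8));
        return signatureOk && nowMillis < expiresAt;
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] tag = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A short expiry keeps a leaked token from being useful for long, and the file service never needs to call back to the back end to validate it.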


Summary

File upload is a very common business requirement, and its performance needs to be considered and optimized.

The methods above can be combined flexibly and applied in practice according to your own business needs.

I hope this article helps you. If you found it useful, feel free to like and share it.

I'm Old Ma, and I look forward to seeing you next time.


Added by morph07 on Sat, 26 Feb 2022 06:46:27 +0200