Large file upload: instant upload, chunked upload, and resumable upload

File upload is an old topic. When a file is small, you can simply read it into a byte stream and send it to the server in a single request. When the file is large, however, uploading it that way is a poor experience: few users will tolerate an upload that gets interrupted halfway and then has to start over from the beginning. Can we offer a better upload experience? Yes. The sections below introduce several approaches.

Instant upload

1. What is instant upload

In short: before the upload starts, the server performs an MD5 check on the file. If a file with the same MD5 already exists on the server, the server directly returns the address of the existing file, and no bytes are transferred at all; what you later download is the very same file stored on the server. If you do not want a file to be instant-uploaded, you must change its content so that its MD5 changes (renaming it is not enough). For a text file, for example, adding a few characters changes the MD5, and the file will no longer be instant-uploaded.
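To make this concrete, below is a minimal sketch of the server-side check. The class and method names are illustrative only, and the in-memory map stands in for whatever store (a database table, Redis, etc.) actually maps MD5 digests to stored file paths:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of the instant-upload check: look the file's MD5 up
 * before any bytes are transferred.
 */
public class InstantUploadCheck {

  // md5 -> path of a file already stored on the server (illustrative store)
  private final Map<String, String> md5ToPath = new ConcurrentHashMap<>();

  /**
   * Returns the existing file path if the content is already on the server
   * (instant upload applies), or null if the client has to upload the bytes.
   */
  public String checkByMd5(String md5) {
    return md5ToPath.get(md5);
  }

  /** Record a completed upload so later identical files transfer instantly. */
  public void recordUpload(String md5, String path) {
    md5ToPath.put(md5, path);
  }
}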

2. The core logic of the instant upload implemented in this article

a. Use Redis to store the upload status of each file: the key is the MD5 of the uploaded file, and the value is a flag indicating whether the upload has completed.

b. If the flag is true, the upload has already completed, and when the same file is uploaded again the instant-upload logic kicks in. If the flag is false, the upload is not yet complete; in that case we additionally store the path of the chunk-record file, where the key is the file's MD5 plus a fixed prefix and the value is the path of the chunk-record (.conf) file.
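A minimal sketch of this bookkeeping is shown below, using Spring's StringRedisTemplate instead of the RedisUtil wrapper that appears in the code later in this article; the key names mirror the FileConstant values used there:

import org.springframework.data.redis.core.StringRedisTemplate;

/**
 * Sketch of the two Redis structures described above: a hash holding the
 * per-file completion flag, and a plain key pointing at the chunk-record
 * (.conf) file of an unfinished upload.
 */
public class UploadStatusStore {

  private static final String FILE_UPLOAD_STATUS = "FILE_UPLOAD_STATUS";
  private static final String FILE_MD5_KEY = "FILE_MD5:";

  private final StringRedisTemplate redis;

  public UploadStatusStore(StringRedisTemplate redis) {
    this.redis = redis;
  }

  /** true = fully uploaded (instant upload applies); false = in progress; null = unknown file. */
  public Boolean uploadStatus(String md5) {
    Object flag = redis.opsForHash().get(FILE_UPLOAD_STATUS, md5);
    return flag == null ? null : Boolean.valueOf(flag.toString());
  }

  /** Mark an upload as in progress and remember where its chunk-record file lives. */
  public void markInProgress(String md5, String confFilePath) {
    redis.opsForHash().put(FILE_UPLOAD_STATUS, md5, "false");
    redis.opsForValue().set(FILE_MD5_KEY + md5, confFilePath);
  }

  /** Mark an upload as complete and drop the chunk-record pointer. */
  public void markComplete(String md5) {
    redis.opsForHash().put(FILE_UPLOAD_STATUS, md5, "true");
    redis.delete(FILE_MD5_KEY + md5);
  }
}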

Chunked upload

1. What is chunked upload

Chunked upload means splitting the file to be uploaded into multiple data blocks (called parts or chunks) of a certain size. After all chunks have been uploaded, the server assembles them back into the original file.
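Below is a small sketch of the chunk arithmetic; only the offset calculation really matters, since the server-side code later in this article reverses it with seek(offset). The 5 MB chunk size in the example is an illustrative value, not a requirement:

/**
 * Chunk i of a file covers the bytes [i * chunkSize, min((i + 1) * chunkSize, total)).
 */
public class ChunkMath {

  /** Number of chunks needed for a file of the given size (the last chunk may be shorter). */
  public static int chunkCount(long totalSize, long chunkSize) {
    return (int) ((totalSize + chunkSize - 1) / chunkSize);
  }

  /** Byte offset at which chunk number `chunk` (0-based) starts. */
  public static long chunkOffset(int chunk, long chunkSize) {
    return chunk * chunkSize;
  }

  public static void main(String[] args) {
    long total = 24L * 1024 * 1024 * 1024; // a 24 GB file, as in the summary below
    long chunk = 5L * 1024 * 1024;         // 5 MB chunks (illustrative value)
    System.out.println(chunkCount(total, chunk) + " chunks"); // prints "4916 chunks"
  }
}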

2. Scenarios for chunked upload

1. Large file upload

2. Poor network environments, where there is a risk of retransmission

Resumable upload

1. What is resumable upload

Resumable transfer means deliberately dividing a download or upload task (a file or an archive) into several parts, each transferred by its own thread. If a network failure occurs, the transfer can continue from the parts already completed instead of starting again from scratch. This article focuses on the resumable upload scenario.

2. Application scenarios

Resumable upload can be seen as a derivative of chunked upload, so it can be used in any scenario where chunked upload applies.

3. The core logic of resumable upload

During a chunked upload, if the upload is interrupted by abnormal conditions such as a system crash or a network failure, the client records the upload progress, so that the next upload of the same file can resume from where the previous one was interrupted.

To guard against the client's local progress data being deleted (which would force the upload to restart from scratch), the server can also expose an interface that lets the client query which chunks have already been uploaded, so the client can continue uploading from the next missing chunk.
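Such a query can be served straight from the .conf progress file used by the code later in this article: one byte per chunk, set to Byte.MAX_VALUE (127) once the chunk has been written, 0 otherwise. A minimal sketch:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a server-side "which chunks are already uploaded?" query,
 * reading the per-chunk flags from the .conf progress file.
 */
public class UploadProgressQuery {

  /** Returns the 0-based numbers of the chunks that are already uploaded. */
  public static List<Integer> uploadedChunks(File confFile) throws IOException {
    List<Integer> uploaded = new ArrayList<>();
    if (!confFile.exists()) {
      return uploaded; // no progress file yet, so nothing has been uploaded
    }
    byte[] flags = Files.readAllBytes(confFile.toPath());
    for (int i = 0; i < flags.length; i++) {
      if (flags[i] == Byte.MAX_VALUE) {
        uploaded.add(i);
      }
    }
    return uploaded;
  }
}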

4. Implementation steps

a. Scheme 1: the general steps

b. Scheme 2: the steps used in this article

5. Implementation of the chunked upload / resumable upload code

a. The front end uses Baidu's WebUploader plug-in to split the file into chunks. Since this article focuses on the server-side implementation, see the following link for the details of how WebUploader does the chunking:

http://fex.baidu.com/webuploader/getting-started.html

b. The back end writes the chunk data in one of two ways. The first uses RandomAccessFile; if you are not familiar with RandomAccessFile, see:

https://blog.csdn.net/dimudan2015/article/details/81910690

The second uses MappedByteBuffer; if you are not familiar with MappedByteBuffer, see:

https://www.jianshu.com/p/f90866dcbffc

The core code of the back-end write operations:

a. RandomAccessFile implementation

@UploadMode(mode = UploadModeEnum.RANDOM_ACCESS)
@Slf4j
public class RandomAccessUploadStrategy extends SliceUploadTemplate {

  @Autowired
  private FilePathUtil filePathUtil;

  @Value("${upload.chunkSize}")
  private long defaultChunkSize;

  @Override
  public boolean upload(FileUploadRequestDTO param) {
    RandomAccessFile accessTmpFile = null;
    try {
      String uploadDirPath = filePathUtil.getPath(param);
      File tmpFile = super.createTmpFile(param);
      accessTmpFile = new RandomAccessFile(tmpFile, "rw");
      // The chunk size must match the value configured at the front end
      long chunkSize = Objects.isNull(param.getChunkSize()) ? defaultChunkSize * 1024 * 1024
          : param.getChunkSize();
      // Seek to this chunk's offset within the target file
      long offset = chunkSize * param.getChunk();
      accessTmpFile.seek(offset);
      // Write the chunk data
      accessTmpFile.write(param.getFile().getBytes());
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);
      return isOk;
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.close(accessTmpFile);
    }
    return false;
  }

}

b. MappedByteBuffer implementation

@UploadMode(mode = UploadModeEnum.MAPPED_BYTEBUFFER)
@Slf4j
public class MappedByteBufferUploadStrategy extends SliceUploadTemplate {

  @Autowired
  private FilePathUtil filePathUtil;

  @Value("${upload.chunkSize}")
  private long defaultChunkSize;

  @Override
  public boolean upload(FileUploadRequestDTO param) {
    RandomAccessFile tempRaf = null;
    FileChannel fileChannel = null;
    MappedByteBuffer mappedByteBuffer = null;
    try {
      String uploadDirPath = filePathUtil.getPath(param);
      File tmpFile = super.createTmpFile(param);
      tempRaf = new RandomAccessFile(tmpFile, "rw");
      fileChannel = tempRaf.getChannel();

      long chunkSize = Objects.isNull(param.getChunkSize()) ? defaultChunkSize * 1024 * 1024
          : param.getChunkSize();
      // Map the region belonging to this chunk and write the chunk data into it
      long offset = chunkSize * param.getChunk();
      byte[] fileData = param.getFile().getBytes();
      mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, offset, fileData.length);
      mappedByteBuffer.put(fileData);
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);
      return isOk;
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      // Unmap the buffer before closing, otherwise the mapping keeps the file open
      FileUtil.freedMappedByteBuffer(mappedByteBuffer);
      FileUtil.close(fileChannel);
      FileUtil.close(tempRaf);
    }
    return false;
  }

}

c. The core template class for the file operations

@Slf4j
public abstract class SliceUploadTemplate implements SliceUploadStrategy {

  public abstract boolean upload(FileUploadRequestDTO param);

  protected File createTmpFile(FileUploadRequestDTO param) {
    FilePathUtil filePathUtil = SpringContextHolder.getBean(FilePathUtil.class);
    param.setPath(FileUtil.withoutHeadAndTailDiagonal(param.getPath()));
    String fileName = param.getFile().getOriginalFilename();
    String uploadDirPath = filePathUtil.getPath(param);
    String tempFileName = fileName + "_tmp";
    File tmpDir = new File(uploadDirPath);
    File tmpFile = new File(uploadDirPath, tempFileName);
    if (!tmpDir.exists()) {
      tmpDir.mkdirs();
    }
    return tmpFile;
  }

  @Override
  public FileUploadDTO sliceUpload(FileUploadRequestDTO param) {
    boolean isOk = this.upload(param);
    if (isOk) {
      File tmpFile = this.createTmpFile(param);
      FileUploadDTO fileUploadDTO = this.saveAndFileUploadDTO(param.getFile().getOriginalFilename(), tmpFile);
      return fileUploadDTO;
    }
    String md5 = FileMD5Util.getFileMD5(param.getFile());

    Map<Integer, String> map = new HashMap<>();
    map.put(param.getChunk(), md5);
    return FileUploadDTO.builder().chunkMd5Info(map).build();
  }

  /**
   * Check and update the file upload progress
   */
  public boolean checkAndSetUploadProgress(FileUploadRequestDTO param, String uploadDirPath) {
    String fileName = param.getFile().getOriginalFilename();
    File confFile = new File(uploadDirPath, fileName + ".conf");
    byte isComplete = 0;
    RandomAccessFile accessConfFile = null;
    try {
      accessConfFile = new RandomAccessFile(confFile, "rw");
      // Mark this chunk as complete
      System.out.println("set part " + param.getChunk() + " complete");
      // The .conf file is as long as the total number of chunks. Each time a chunk
      // is uploaded, Byte.MAX_VALUE (127) is written at that chunk's position;
      // positions whose chunks have not been uploaded yet stay at the default 0.
      accessConfFile.setLength(param.getChunks());
      accessConfFile.seek(param.getChunk());
      accessConfFile.write(Byte.MAX_VALUE);

      // completeList: check whether all chunks are finished, i.e. whether every
      // byte in the file is 127 (every chunk uploaded successfully)
      byte[] completeList = FileUtils.readFileToByteArray(confFile);
      isComplete = Byte.MAX_VALUE;
      for (int i = 0; i < completeList.length && isComplete == Byte.MAX_VALUE; i++) {
        // Bitwise AND: if any chunk is missing, isComplete is no longer Byte.MAX_VALUE
        isComplete = (byte) (isComplete & completeList[i]);
        System.out.println("check part " + i + " complete?:" + completeList[i]);
      }
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.close(accessConfFile);
    }
    boolean isOk = setUploadProgress2Redis(param, uploadDirPath, fileName, confFile, isComplete);
    return isOk;
  }

  /**
   * Save the upload progress information into Redis
   */
  private boolean setUploadProgress2Redis(FileUploadRequestDTO param, String uploadDirPath,
      String fileName, File confFile, byte isComplete) {
    RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
    if (isComplete == Byte.MAX_VALUE) {
      redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "true");
      redisUtil.del(FileConstant.FILE_MD5_KEY + param.getMd5());
      confFile.delete();
      return true;
    } else {
      if (!redisUtil.hHasKey(FileConstant.FILE_UPLOAD_STATUS, param.getMd5())) {
        redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "false");
        redisUtil.set(FileConstant.FILE_MD5_KEY + param.getMd5(),
            uploadDirPath + FileConstant.FILE_SEPARATORCHAR + fileName + ".conf");
      }
      return false;
    }
  }

  /**
   * Save the completed file
   */
  public FileUploadDTO saveAndFileUploadDTO(String fileName, File tmpFile) {
    FileUploadDTO fileUploadDTO = null;
    try {
      fileUploadDTO = renameFile(tmpFile, fileName);
      if (fileUploadDTO.isUploadComplete()) {
        System.out
            .println("upload complete !!" + fileUploadDTO.isUploadComplete() + " name=" + fileName);
        //TODO save file information to database
      }
    } catch (Exception e) {
      log.error(e.getMessage(), e);
    }
    return fileUploadDTO;
  }

  /**
   * Rename a file
   *
   * @param toBeRenamed   the file to rename
   * @param toFileNewName the new name
   */
  private FileUploadDTO renameFile(File toBeRenamed, String toFileNewName) {
    // Check that the file to be renamed exists and is a regular file
    FileUploadDTO fileUploadDTO = new FileUploadDTO();
    if (!toBeRenamed.exists() || toBeRenamed.isDirectory()) {
      log.info("File does not exist: {}", toBeRenamed.getName());
      fileUploadDTO.setUploadComplete(false);
      return fileUploadDTO;
    }
    String ext = FileUtil.getExtension(toFileNewName);
    String p = toBeRenamed.getParent();
    String filePath = p + FileConstant.FILE_SEPARATORCHAR + toFileNewName;
    File newFile = new File(filePath);
    // Rename the file
    boolean uploadFlag = toBeRenamed.renameTo(newFile);

    fileUploadDTO.setMtime(DateUtil.getCurrentTimeStamp());
    fileUploadDTO.setUploadComplete(uploadFlag);
    fileUploadDTO.setPath(filePath);
    fileUploadDTO.setSize(newFile.length());
    fileUploadDTO.setFileExt(ext);
    fileUploadDTO.setFileId(toFileNewName);

    return fileUploadDTO;
  }
}
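
Finally, a sketch of how a controller might drive these strategies. The article does not show how the @UploadMode annotation is resolved to a concrete strategy bean, so the direct injection below is an assumption made for illustration:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SliceUploadController {

  @Autowired
  private SliceUploadStrategy sliceUploadStrategy; // e.g. the RandomAccessFile strategy above

  /**
   * Receives one chunk per request. The request DTO carries the file part,
   * its MD5, the chunk number and the total number of chunks; the strategy
   * writes the chunk and reports the overall progress back to the client.
   */
  @PostMapping("/upload/chunk")
  public FileUploadDTO uploadChunk(FileUploadRequestDTO param) {
    return sliceUploadStrategy.sliceUpload(param);
  }
}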

Summary

During a chunked upload, the front end and the back end must cooperate: in particular, the chunk size used on both sides must be identical, otherwise the upload will break. Also, file-related operations normally need a dedicated file server, such as FastDFS or HDFS.

With the example code in this article, running on a machine with a 4-core CPU and 8 GB of RAM, uploading a 24 GB file takes a little over 30 minutes. Most of that time is spent computing the MD5 value on the front end; the back-end write speed is comparatively fast. If your team feels that building its own file server takes too much time, and the project only needs upload and download, Alibaba Cloud OSS is worth considering; see its official documentation:

https://help.aliyun.com/product/31815.html

Note that Alibaba Cloud OSS is essentially an object storage service rather than a file server, so if you need to delete or modify large numbers of files, OSS may not be a good choice.
