Decompressing deflate-compressed web page data with the zlib library

In general, an HTTP response carries a Content-Encoding field in its header to indicate that the page body was compressed to make transmission more efficient. The field value is usually gzip or deflate, and in both cases the data itself is compressed with the deflate algorithm. Occasionally at work you run into page content like this that has not been decompressed, and without code to decompress it the content is troublesome to handle.
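As a rough illustration of the first step, the sketch below classifies a Content-Encoding value so that later code can pick a decompressor. The enum and the function name classify_content_encoding() are hypothetical names made up for this example, and it assumes the field holds a single coding rather than a comma-separated list.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a Content-Encoding value. */
typedef enum { ENC_IDENTITY, ENC_GZIP, ENC_DEFLATE, ENC_OTHER } content_encoding;

/* Case-insensitive comparison of the first n bytes of s with a lowercase token. */
static int token_equals(const char *s, size_t n, const char *token)
{
    size_t i;
    if (strlen(token) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)s[i]) != token[i])
            return 0;
    }
    return 1;
}

/* Assumes value holds one coding name, possibly surrounded by whitespace. */
content_encoding classify_content_encoding(const char *value)
{
    size_t n;
    if (value == NULL)
        return ENC_IDENTITY;            /* no header: body is not encoded */
    while (isspace((unsigned char)*value))
        value++;                        /* skip leading whitespace */
    n = strlen(value);
    while (n > 0 && isspace((unsigned char)value[n - 1]))
        n--;                            /* drop trailing whitespace and CRLF */
    if (n == 0 || token_equals(value, n, "identity"))
        return ENC_IDENTITY;
    if (token_equals(value, n, "gzip") || token_equals(value, n, "x-gzip"))
        return ENC_GZIP;
    if (token_equals(value, n, "deflate"))
        return ENC_DEFLATE;
    return ENC_OTHER;                   /* e.g. br, zstd, compress */
}
```

A caller would pass the raw field value, for example classify_content_encoding(" Deflate\r\n"), and branch on the result.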
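Deflate is the underlying algorithm. gzip wraps the raw deflate data in a header of at least 10 bytes and an 8-byte trailer that holds a CRC-32 of the uncompressed data followed by the uncompressed length modulo 2^32. The zlib format is a different wrapper around the same raw data: a 2-byte header in front and a 4-byte Adler-32 checksum at the end.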
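To make the gzip framing concrete, here is a small sketch that checks the magic bytes at the start of a buffer and reads the two trailer fields. The function names are hypothetical, the code assumes the buffer holds exactly one complete gzip member, and it does not verify the CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest header (10 bytes) plus trailer (8 bytes), ignoring the deflate data. */
#define GZIP_MIN_FRAMING 18

/* Hypothetical helper: 1 if buf starts like a gzip member using deflate (CM = 8). */
int looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= GZIP_MIN_FRAMING &&
           buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 8;
}

/* gzip stores multi-byte integers in little-endian order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: reads the trailer of a buffer assumed to contain a
 * single gzip member. Returns 1 on success, 0 if the buffer does not look
 * like gzip. isize_out is the uncompressed length modulo 2^32, so it is
 * only a hint for sizing the output buffer.
 */
int read_gzip_trailer(const unsigned char *buf, size_t len,
                      uint32_t *crc32_out, uint32_t *isize_out)
{
    if (!looks_like_gzip(buf, len))
        return 0;
    *crc32_out = read_le32(buf + len - 8);
    *isize_out = read_le32(buf + len - 4);
    return 1;
}
```

Because the length field wraps at 4 GiB and a file may contain several concatenated members, the value is a hint for sizing the output buffer rather than a guarantee.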
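The zlib library is a widely used C library for compressing and decompressing deflate data, and it provides a variety of interfaces. inflateInit() expects data in zlib format, that is, raw deflate data with the 2-byte zlib header and the Adler-32 trailer. Other framings are initialized with inflateInit2(), which takes one more parameter than inflateInit(), windowBits. Passing -MAX_WBITS means the input is raw deflate data, MAX_WBITS + 16 means gzip, and MAX_WBITS + 32 lets zlib detect zlib or gzip automatically. The negative value is needed because raw deflate data, like the compressed blocks inside gzip data, does not have the two bytes of zlib header, so zlib has to be told not to look for one. Note that although the HTTP specification defines the deflate coding as the zlib format, some servers send raw deflate data under that name.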
zlib provides many interfaces. On top of the more complex streaming operations it wraps the two simplest interfaces, compress and uncompress, and these are what we usually call directly.
The following functions are briefly introduced:
1, deflateInit() + deflate() + deflateEnd()
Compression is accomplished by combining these three functions. See the test_deflate() function in example.c. In fact, the compress() function is itself implemented with these three functions (see compress.c in the zlib source).
2, inflateInit() + inflate() + inflateEnd()
Similar to the above: decompression is accomplished by combining these three functions, and the uncompress() function is implemented internally with them (see uncompr.c in the zlib source).
3, uLong compressBound(uLong sourceLen);
Returns an upper bound on the compressed size of sourceLen bytes of input. It does not compute how long the compressed data will actually be, but a buffer of the returned size is guaranteed to be large enough for compress(), so it makes allocating the destination buffer easy.
Next is our most commonly used pair of functions.
4, int compress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
Compresses data. Note that the destination buffer must be allocated in advance and *destLen must hold its size on entry, which is why we use the compressBound function. On return, *destLen holds the actual compressed size.
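5, int uncompress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
Decompresses data. In the same way, the destination buffer must be allocated in advance and *destLen must hold its size on entry. If the buffer is too small, decompression fails with Z_BUF_ERROR. The zlib format does not record the uncompressed size, so the caller has to know it or guess it.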
In general, compress and uncompress functions are sufficient to meet our requirements.
Finally, here is the code to decompress a file containing compressed data obtained from a web page. Because it calls uncompress(), the input must be in zlib format:

/*************************************************************************
    > File Name: un_zip.cpp
    > Author:zeus 
    > Mail:zuixuewosha@163.com 
    > Created Time: Friday, June 09, 2017, 16:32:28
    > usage: a.out src_file  dst_file
 ************************************************************************/

#include <stdlib.h>
#include <stdio.h>
#include <zlib.h>

int main(int argc, char* argv[])
{
    FILE* file;
    unsigned char* src_buf = NULL;
    unsigned char* ubuf = NULL;
    int failed_number = 0;

    /* Decompress the srcfile named on the command line and store the result in dstfile */
    if(argc < 3)
    {
        printf("Usage: a.out srcfile dstfile\n");
        return -1;
    }

    if((file = fopen(argv[1], "rb")) == NULL)
    {
        printf("Can\'t open %s!\n", argv[1]);
        return -1;
    }
    /* Loading source file data to buffer */
    fseek(file,0,SEEK_END);
    unsigned long src_length = ftell(file);
    unsigned long dst_length = 65536;
    rewind(file);
    printf("src_length is %lu\n",src_length);
    if((src_buf = (unsigned char*)malloc(sizeof(unsigned char) * src_length)) == NULL)
    {
        printf("Not enough memory!\n");
        fclose(file);
        return -1;
    }
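    if(fread(src_buf, sizeof(unsigned char), src_length, file) != src_length)
    {
        printf("Failed to read %s!\n", argv[1]);
        free(src_buf);
        fclose(file);
        return -1;
    }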
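    /* Decompress the data; on Z_BUF_ERROR retry with a larger buffer, at most 10 times */
    unsigned long dst_capacity = dst_length;
    while(failed_number < 10)
    {
        if((ubuf = (unsigned char*)malloc(sizeof(unsigned char) * dst_capacity)) == NULL)
        {
            printf("Not enough memory!\n");
            free(src_buf);
            fclose(file);
            return -1;
        }

        /* uncompress() overwrites dst_length, so reset it to the buffer size before each attempt */
        dst_length = dst_capacity;
        int ret = uncompress(ubuf, &dst_length, src_buf, src_length);
        if(ret == Z_OK)
        {
            break;
        }
        free(ubuf);
        ubuf = NULL;
        if(ret != Z_BUF_ERROR)
        {
            /* Corrupt or non-zlib input: a larger buffer will not help */
            printf("Uncompress %s failed with zlib error %d!\n", argv[1], ret);
            break;
        }
        printf("Uncompress %s failed, trying again with more space!\n", argv[1]);
        /* Allocate twice the space after failure */
        dst_capacity *= 2;
        failed_number++;
    }
    fclose(file);

    /* Stop here if every attempt failed, instead of writing from a NULL buffer */
    if(ubuf == NULL)
    {
        printf("Giving up on %s.\n", argv[1]);
        free(src_buf);
        return -1;
    }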

    if((file = fopen(argv[2], "wb")) == NULL)
    {
        printf("Can't create %s!\n", argv[2]);
        free(src_buf);
        free(ubuf);
        return -1;
    }
    /* Save the decompressed data to the target file */
    fwrite(ubuf, sizeof(unsigned char), dst_length, file);
    fclose(file);

    free(src_buf);
    free(ubuf);

    return 0;
}

Keywords: zlib encoding

Added by musclehead on Sun, 23 Jun 2019 01:20:29 +0300