0x01 introduction
MD5 message digest algorithm (English: MD5 message digest algorithm), a widely used cryptographic hash function, can generate a 128 bit (16 byte) hash value. It can be used to ensure the integrity and consistency of information transmission. Save the password in the database (md5(pass,salt) mode).
0x02 algorithm flow
1. Data filling
Fill the message with data so that the length of the message is 448 for 512. Set the message length as X, that is, X mod 512=448 is met. According to this formula, the length of data to be filled is obtained.
Filling method: fill after the message. The first digit is 1 and the rest is 0.
2. Add message length
After the result of the first step, fill in the length of the original message. The storage length that can be used for is 64 bits. If the message length is greater than 2 ^ 64, only its lower 64 bit value is used, that is (the message length is modulo 264).
After this step, the final message length is an integer multiple of 512.
3. Initialize variables
The message digest is calculated with four variables (A,B,C,D). A, B, C and D here are 32-bit registers.
A = 01234567h B = 89abcdefh C = fedcba98h D = 76543210h
4. Data processing
Messages are processed in 512 bit packets. Firstly, four auxiliary functions are defined, each of which takes three 32-bit doublewords as input and outputs a 32-bit doubleword. The main flow is shown in the following figure. ABCD is four 32-bit values used to assist in the calculation of MD5. The X matrix is a chunk of 512bite, and the T matrix is an addition constant.
F(X,Y,Z)=(X & Y) | ((~X) & Z); G(X,Y,Z)=(X & Z) | (Y & (~Z)); H(X,Y,Z)=X ^ Y ^ Z; I(X,Y,Z)=Y ^ (X | (~Z));
5. Output
When the 512 bit grouping store operation is completed, the cascade output of A, B, C and D is the result of MD5 hash
0x03 source code implementation 1
The first source code found online is( https://github.com/pod32g/MD5/blob/master/md5.c ), the implementation process is to install the above algorithm implementation, which is easier to understand. There are two constant matrices in the code, additive constant matrix and cyclic left shift matrix. The initialized ABCD is also one of the features.
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <stdint.h> //T[i]=4294967296*abs(sin(i)), addition constant matrix const uint32_t k[64] = { 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee , 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 , 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be , 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 , 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa , 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 , 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed , 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a , 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c , 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 , 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 , 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 , 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 , 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 , 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 , 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 }; //Offset per round const uint32_t r[] = { 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 }; //Cycle shift left #define LEFTROTATE(x, c) (((x) << (c)) | ((x) >> (32 - (c)))) //unsigned int to byte[4] void to_bytes(uint32_t val, uint8_t* bytes) { bytes[0] = (uint8_t)val; bytes[1] = (uint8_t)(val >> 8); bytes[2] = (uint8_t)(val >> 16); bytes[3] = (uint8_t)(val >> 24); } //[unsigned] turn 4 uint32_t to_int32(const uint8_t* bytes) { return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) | ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24); } void md5(const uint8_t* initial_msg, size_t initial_len, uint8_t* digest) { uint32_t h0, h1, h2, h3; uint8_t* msg = NULL; size_t new_len, offset; uint32_t w[16]; uint32_t a, b, c, d, i, f, g, temp; // Initialize magic number A, B, C, D h0 = 0x67452301; h1 = 0xefcdab89; h2 = 0x98badcfe; h3 = 0x10325476; //Data filling for (new_len = initial_len + 1; new_len % (512 / 8) != 448 / 8; new_len++) ; msg = (uint8_t*)malloc(new_len + 8); memcpy(msg, initial_msg, initial_len); msg[initial_len] = 0x80; //Add 1 to the first bit of binary 10 0000 of 0x80 for (offset = initial_len + 1; offset < new_len; offset++) msg[offset] = 0; //Add length to_bytes(initial_len * 8, msg + new_len); // initial_ len>>29 == initial_ Len * 8 > > 32 to avoid overflow to_bytes(initial_len >> 29, msg + new_len + 4); //Process each 512byte for (offset = 0; offset < new_len; offset += (512 / 8)) { for (i = 0; i < 16; i++) w[i] = to_int32(msg + offset + i * 4); a = h0; b = h1; c = h2; d = h3; // Main cycle: for (i = 0; i < 64; i++) { if (i < 16) { f = (b & c) | ((~b) & d); g = i; } else if (i < 32) { f = (d & b) | ((~d) & c); g = (5 * i + 1) % 16; } else if (i < 48) { f = b ^ c ^ d; g = (3 * i + 5) % 16; } else { f = c ^ (b | (~d)); g = (7 * i) % 16; } temp = d; d = c; c = b; b = b + LEFTROTATE((a + f + k[i] + w[g]), r[i]); a = temp; } h0 += a; h1 += b; h2 += c; h3 += d; } free(msg); to_bytes(h0, digest); to_bytes(h1, digest + 4); to_bytes(h2, digest + 8); to_bytes(h3, digest + 12); } int main(int argc, char** argv) { char* msg; size_t len; int i; uint8_t result[16]; if (argc < 2) { printf("usage: %s 'string'\n", argv[0]); return 1; } msg = argv[1]; len = strlen(msg); md5((uint8_t*)msg, len, result); // display result for (i = 0; i < 16; i++) printf("%2.2x", result[i]); puts(""); return 0; }
0x04 source code implementation 2
The second source code found is the official original document and detailed notes of the source code based on the algorithm( https://blog.csdn.net/ZCShouCSDN/article/details/83866095 ). The algorithm is a little difficult to understand.
- Call MD5Init to initialize MD5_CTX, initialize ABCD constant
- Call MD5Update, and the multiple part of 512byte is calculated circularly first
- Call MD5Final, fill in and add the length of the last part that is not 512byte, then call MD5Update for circular calculation, and finally return the MD5 value
In the implementation of this version, there is no matrix of the above two constants, but MD5 still exists_ ABCD feature during CTX initialization. Four basic operations, such as FF, GG, HH and II, can be easily identified
#include <string.h> #include <stdio.h> typedef struct { /* The number and length of bits storing the original information (excluding the filled bits), and the longest is 2^64 bits. If the message length is greater than 2 ^ 64, only its lower 64 bit value is used, that is (the message length is modulo of 2 ^ 64) */ unsigned int count[2]; /* Four 32bits are used to store the final calculated message summary. When the message length is greater than 512 bits, it is also used to store the intermediate results of each 512 bits */ unsigned int state[4]; /*Buffer for storing input information, 512bits */ unsigned char buffer[64]; } MD5_CTX; void MD5Init(MD5_CTX* context); void MD5Update(MD5_CTX* context, unsigned char* input, unsigned int inputlen); void MD5Final(MD5_CTX* context, unsigned char digest[16]); /* MD5 The constants used in the conversion are specified by the algorithm itself */ #define S11 7 #define S12 12 #define S13 17 #define S14 22 #define S21 5 #define S22 9 #define S23 14 #define S24 20 #define S31 4 #define S32 11 #define S33 16 #define S34 23 #define S41 6 #define S42 10 #define S43 15 #define S44 21 /* MD5 The basic function specified by the algorithm itself */ #define F(x, y, z) ((x & y) | (~x & z)) #define G(x, y, z) ((x & z) | (y & ~z)) #define H(x, y, z) (x ^ y ^ z) #define I(x, y, z) (y ^ (x | ~z)) /* Shift the x loop to the left by n bits */ #define ROTATE_LEFT(x, n) ((x << n) | (x >> (32 - n))) /** MD5 Four rounds of transformation specified by the algorithm itself * Rotation is separate from addition to prevent recomputation. **/ #define FF(a, b, c, d, x, s, ac) \ { \ a += F(b, c, d) + x + ac; \ a = ROTATE_LEFT(a, s); \ a += b; \ } #define GG(a, b, c, d, x, s, ac) \ { \ a += G(b, c, d) + x + ac; \ a = ROTATE_LEFT(a, s); \ a += b; \ } #define HH(a, b, c, d, x, s, ac) \ { \ a += H(b, c, d) + x + ac; \ a = ROTATE_LEFT(a, s); \ a += b; \ } #define II(a, b, c, d, x, s, ac) \ { \ a += I(b, c, d) + x + ac; \ a = ROTATE_LEFT(a, s); \ a += b; \ } /** * Why 64 bytes for the buffer used for bits filling? Because when the number of bits of the information to be encrypted is divided by 512 and the remainder is 448, * The maximum number of bits to be filled is 512 = 64 * 8. **/ unsigned char PADDING[] = { 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static void MD5Transform(unsigned int state[4], unsigned char block[64]); static void MD5Encode(unsigned char* output, unsigned int* input, unsigned int len); static void MD5Decode(unsigned int* output, unsigned char* input, unsigned int len); /* MD5 initialization. Begins an MD5 operation, writing a new context. */ /* Initialize md5 structure */ void MD5Init(MD5_CTX* context) { /* Set the length of the current valid information to 0 */ context->count[0] = 0; context->count[1] = 0; /* Load magic initialization constants.*/ /* Initialize the link variable. The algorithm requires this */ context->state[0] = 0x67452301; context->state[1] = 0xEFCDAB89; context->state[2] = 0x98BADCFE; context->state[3] = 0x10325476; } /* MD5 block update operation. Continues an MD5 message-digest operation, processing another message block, and updating the context. */ /* Pass the encrypted information to the md5 structure, which can be called multiple times context: Initialized md5 structure input: The information to be encrypted can be any length inputLen: Specifies the length of the input */ void MD5Update(MD5_CTX* context, unsigned char* input, unsigned int inputlen) { unsigned int i = 0, index = 0, partlen = 0; /* Modulo 64 (64bytes = 512bits) for calculating the number of bytes of the bits length of the existing information. It is used to judge whether the total length of the existing information plus the current transmitted information can reach 512 bits. If it can be reached, it will process the enough 512 bits once */ index = (context->count[0] >> 3) & 0x3F; /* (context->count[0] >> 3) = context->count[0]/8 ,Namely: count the number of bytes; The latter & 3F is Mod 64 */ partlen = 64 - index; /* Calculate the number of bytes that are short of the existing length and add up to 64Bytes (512bits) */ /* Save the number of bits of the input information */ context->count[0] += inputlen << 3; if (context->count[0] < (inputlen << 3)) { context->count[1]++; } context->count[1] += inputlen >> 29; /* If the number of bytes currently input is greater than the number of existing bytes, the length shall make up the number of bytes that are different by an integral multiple of 64 bytes */ if (inputlen >= partlen) { /* Use the currently entered content to make up 512bits for the content of context - > buffer */ memcpy(&context->buffer[index], input, partlen); /* Use the basic function to convert the filled 512bits (which has been saved to context - > buffer) once, and the conversion result is saved to context - > state */ MD5Transform(context->state, context->buffer); /* Convert the remaining bytes currently input (if the remaining bytes are greater than 512bits), and save the conversion results to context - > state */ for (i = partlen; i + 64 <= inputlen; i += 64) { MD5Transform(context->state, &input[i]); } index = 0; } else { i = 0; } /*Fill the remaining contents of 512bits in the input buffer into context - > buffer and wait for later processing*/ memcpy(&context->buffer[index], &input[i], inputlen - i); } /* MD5 finalization. Ends an MD5 message-digest operation, writing the the message digest and zeroizing the context. */ /*Get the final result of encryption digest: Save the final encryption string context: You initialized and filled in the md5 structure of the information */ void MD5Final(MD5_CTX* context, unsigned char digest[16]) { unsigned int index = 0, padlen = 0; unsigned char bits[8]; /* Modulo 64 (64bytes = 512bits) for calculating the number of bytes of the bits length of the existing information. */ index = (context->count[0] >> 3) & 0x3F; /* Calculate the number of bytes to be filled. The value range of padLen is 1-64 */ padlen = (index < 56) ? (56 - index) : (120 - index); /* Copy the length of the information (all) to be converted to bits */ MD5Encode(bits, context->count, 8); /* */ MD5Update(context, PADDING, padlen); /* Make up the bits length of the original information (the bits with fixed length are represented by 64 bits), which can happen to be enough for 512 bits this time, no more or less */ MD5Update(context, bits, 8); /* Save the final result to digest. ok, it's finally done */ MD5Encode(digest, context->state, 16); } /* Encodes input (UINT4) into output (unsigned char). Assumes len is a multiple of 4. */ /*copy a 4-byte integer into a buffer in the form of characters output: Character buffer for output input: An array of four byte integers to be converted len: output The length of the buffer is required to be an integer multiple of 4 */ static void MD5Encode(unsigned char* output, unsigned int* input, unsigned int len) { unsigned int i = 0, j = 0; while (j < len) { output[j] = input[i] & 0xFF; output[j + 1] = (input[i] >> 8) & 0xFF; output[j + 2] = (input[i] >> 16) & 0xFF; output[j + 3] = (input[i] >> 24) & 0xFF; i++; j += 4; } } /* Decodes input (unsigned char) into output (UINT4). Assumes len is a multiple of 4. */ /*Contrary to the above function, this one copies the data in the character buffer into a 4-byte integer (that is, it is saved as an integer) output: Save the converted integer input: Character buffer to convert len: The length of the input character buffer is required to be an integer multiple of 4 */ static void MD5Decode(unsigned int* output, unsigned char* input, unsigned int len) { unsigned int i = 0, j = 0; while (j < len) { output[i] = (input[j]) | (input[j + 1] << 8) | (input[j + 2] << 16) | (input[j + 3] << 24); i++; j += 4; } } /* MD5 basic transformation. Transforms state based on block. */ /* 512bits information (i.e. block buffer) is processed once, and each processing includes four rounds state[4]: md5 The state[4] in the structure is used to save the intermediate result or final result of encrypting 512bits information block[64]: 512bits information to be encrypted */ static void MD5Transform(unsigned int state[4], unsigned char block[64]) { unsigned int a = state[0]; unsigned int b = state[1]; unsigned int c = state[2]; unsigned int d = state[3]; unsigned int x[64]; MD5Decode(x, block, 64); /* Round 1 */ FF(a, b, c, d, x[0], S11, 0xd76aa478); /* 1 */ FF(d, a, b, c, x[1], S12, 0xe8c7b756); /* 2 */ FF(c, d, a, b, x[2], S13, 0x242070db); /* 3 */ FF(b, c, d, a, x[3], S14, 0xc1bdceee); /* 4 */ FF(a, b, c, d, x[4], S11, 0xf57c0faf); /* 5 */ FF(d, a, b, c, x[5], S12, 0x4787c62a); /* 6 */ FF(c, d, a, b, x[6], S13, 0xa8304613); /* 7 */ FF(b, c, d, a, x[7], S14, 0xfd469501); /* 8 */ FF(a, b, c, d, x[8], S11, 0x698098d8); /* 9 */ FF(d, a, b, c, x[9], S12, 0x8b44f7af); /* 10 */ FF(c, d, a, b, x[10], S13, 0xffff5bb1); /* 11 */ FF(b, c, d, a, x[11], S14, 0x895cd7be); /* 12 */ FF(a, b, c, d, x[12], S11, 0x6b901122); /* 13 */ FF(d, a, b, c, x[13], S12, 0xfd987193); /* 14 */ FF(c, d, a, b, x[14], S13, 0xa679438e); /* 15 */ FF(b, c, d, a, x[15], S14, 0x49b40821); /* 16 */ /* Round 2 */ GG(a, b, c, d, x[1], S21, 0xf61e2562); /* 17 */ GG(d, a, b, c, x[6], S22, 0xc040b340); /* 18 */ GG(c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */ GG(b, c, d, a, x[0], S24, 0xe9b6c7aa); /* 20 */ GG(a, b, c, d, x[5], S21, 0xd62f105d); /* 21 */ GG(d, a, b, c, x[10], S22, 0x2441453); /* 22 */ GG(c, d, a, b, x[15], S23, 0xd8a1e681); /* 23 */ GG(b, c, d, a, x[4], S24, 0xe7d3fbc8); /* 24 */ GG(a, b, c, d, x[9], S21, 0x21e1cde6); /* 25 */ GG(d, a, b, c, x[14], S22, 0xc33707d6); /* 26 */ GG(c, d, a, b, x[3], S23, 0xf4d50d87); /* 27 */ GG(b, c, d, a, x[8], S24, 0x455a14ed); /* 28 */ GG(a, b, c, d, x[13], S21, 0xa9e3e905); /* 29 */ GG(d, a, b, c, x[2], S22, 0xfcefa3f8); /* 30 */ GG(c, d, a, b, x[7], S23, 0x676f02d9); /* 31 */ GG(b, c, d, a, x[12], S24, 0x8d2a4c8a); /* 32 */ /* Round 3 */ HH(a, b, c, d, x[5], S31, 0xfffa3942); /* 33 */ HH(d, a, b, c, x[8], S32, 0x8771f681); /* 34 */ HH(c, d, a, b, x[11], S33, 0x6d9d6122); /* 35 */ HH(b, c, d, a, x[14], S34, 0xfde5380c); /* 36 */ HH(a, b, c, d, x[1], S31, 0xa4beea44); /* 37 */ HH(d, a, b, c, x[4], S32, 0x4bdecfa9); /* 38 */ HH(c, d, a, b, x[7], S33, 0xf6bb4b60); /* 39 */ HH(b, c, d, a, x[10], S34, 0xbebfbc70); /* 40 */ HH(a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */ HH(d, a, b, c, x[0], S32, 0xeaa127fa); /* 42 */ HH(c, d, a, b, x[3], S33, 0xd4ef3085); /* 43 */ HH(b, c, d, a, x[6], S34, 0x4881d05); /* 44 */ HH(a, b, c, d, x[9], S31, 0xd9d4d039); /* 45 */ HH(d, a, b, c, x[12], S32, 0xe6db99e5); /* 46 */ HH(c, d, a, b, x[15], S33, 0x1fa27cf8); /* 47 */ HH(b, c, d, a, x[2], S34, 0xc4ac5665); /* 48 */ /* Round 4 */ II(a, b, c, d, x[0], S41, 0xf4292244); /* 49 */ II(d, a, b, c, x[7], S42, 0x432aff97); /* 50 */ II(c, d, a, b, x[14], S43, 0xab9423a7); /* 51 */ II(b, c, d, a, x[5], S44, 0xfc93a039); /* 52 */ II(a, b, c, d, x[12], S41, 0x655b59c3); /* 53 */ II(d, a, b, c, x[3], S42, 0x8f0ccc92); /* 54 */ II(c, d, a, b, x[10], S43, 0xffeff47d); /* 55 */ II(b, c, d, a, x[1], S44, 0x85845dd1); /* 56 */ II(a, b, c, d, x[8], S41, 0x6fa87e4f); /* 57 */ II(d, a, b, c, x[15], S42, 0xfe2ce6e0); /* 58 */ II(c, d, a, b, x[6], S43, 0xa3014314); /* 59 */ II(b, c, d, a, x[13], S44, 0x4e0811a1); /* 60 */ II(a, b, c, d, x[4], S41, 0xf7537e82); /* 61 */ II(d, a, b, c, x[11], S42, 0xbd3af235); /* 62 */ II(c, d, a, b, x[2], S43, 0x2ad7d2bb); /* 63 */ II(b, c, d, a, x[9], S44, 0xeb86d391); /* 64 */ state[0] += a; state[1] += b; state[2] += c; state[3] += d; } int main() { MD5_CTX md5_ctx; unsigned char c[] = "123456"; //e10adc3949ba59abbe56e057f20f883e unsigned char md5[16]; /* Initialize before calculation */ MD5Init(&md5_ctx); /* Call MD5Update repeatedly to calculate multiple package data (if any) */ MD5Update(&md5_ctx, c, strlen((char*)c)); /* Finally, call MD5Final to get the final result. */ MD5Final(&md5_ctx, md5); printf("character string admin of MD5 The code is:"); for (int i = 0; i < 16; i++) printf("%02x", md5[i]); printf("\n"); return 0; }
0x05 reverse analysis and identification MD5 algorithm
Compile the above implementations to generate the release X64 version of the program. The key structures of the md5 algorithm are identified below
The first MD5 source code identification
Krypto ANALyzer with PEid band identifies the additive constant matrix used by MD5
Through IDA analysis program, first locate Krypto ANALyzer and identify 0x140003274. You can see that the red boxes 0x140003270 to 0x140003370 are the addition constant matrix, and the yellow boxes 0x140003370 to 0x140003460 are the circular left shift table.
By using ida's plug-in Findcrypt, you can identify the constant ABCD and additive constant matrix used by MD5
Algorithm recognition
1. Initialize the ABCD to 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
2. Fill 1 and 0 with data until mod 512 ===448
3. Fill the text length into the last 8 bytes
4. Because of ida recognition, the original 16 times w array assignment of the for loop is changed into 16 assignment statements
The main cycle of 5.64 times can better identify four basic operations.
The second MD5 source code identification
Krypto ANALyzer with PEid band is not recognized
The following is using IDA pro7 Using Findcrypt in 5, we can recognize the assignment of ABCD4 constants and the addition constants.
It may be because of vs optimization, MD5_init and MD5_update has been optimized into the main function. V7, V8 and V9 are MD5_CTX structure.
- Code MD5 first_ CTX performs initialization assignment 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
- How to execute MD5_update code, record the length first, and then enter the for loop to call MD5Transform
The following is the code of MD5Transform. You can obviously see four kinds of operation functions and addition constants.
0x06 summary
For the identification of MD5, the structures of four basic functions FF, GG, HH and II can be identified through constant ABCD, addition constant matrix, cyclic left shift table.
For magic modified MD5, constant ABCD and addition constant matrix are often modified. At this time, only four basic functions can be identified.
The common MD5 blasting can be queried through the online website. If there are not many encrypted characters in each section of magic modified MD5, you can calculate the magic modified MD5 of all characters by violence first, and then look up the table.
reference resources
Encryption and decryption Fourth Edition
MD5 implementation (official original document based on algorithm) and detailed notes of source code https://blog.csdn.net/ZCShouCSDN/article/details/83866095
Implementation of MD5
https://github.com/pod32g/MD5/blob/master/md5.c