Full format link: https://blog.imakiseki.cf/2022/03/07/techdev/python-cpp-string-find-perf-test/
background
Recently, in preparation for an algorithm competition, the language mistakenly chose python, so we had no choice but to start language migration for common scenes. The scene of string search appears from time to time in the algorithm competition. This paper compares the speed of this scene in Python and C + +, a common language in the competition, and provides relevant parameters and running results for others' reference.
parameter
Hardware and operating system
-` root@<hostname> .o+` ------------ `ooo/ OS: Arch Linux ARM aarch64 `+oooo: Host: Raspberry Pi 4 Model B `+oooooo: Kernel: 5.16.12-1-aarch64-ARCH -+oooooo+: Uptime: 3 hours, 32 mins `/:-:++oooo+: Packages: 378 (pacman) `/++++/+++++++: Shell: zsh 5.8.1 `/++++++++++++++: Terminal: /dev/pts/0 `/+++ooooooooooooo/` CPU: (4) @ 1.500GHz ./ooosssso++osssssso+` Memory: 102MiB / 7797MiB .oossssso-````/ossssss+` -osssssso. :ssssssso. :osssssss/ osssso+++. /ossssssss/ +ssssooo/- `/ossssso+/:- -:/+osssso+- `+sso+:-` `.-/+oso: `++:. `-/+/ .` `/
Compilation environment and interpretation environment
- Python
- Interpreter: Python 3.10.2 (main, Jan 23 2022, 21:20:14) [GCC 10.2.0] on linux
- Interactive environment: IPython 8.0.1
- C++
- Compiler: g++ (GCC) 11.2.0
- Compile command: G + + test cpp -Wall -O2 -g -std=c++11 -o test
scene
Two scenarios are set in this measurement: the character distribution of the source string in scenario 1 is generated by the pseudo-random number generator, which represents the average situation of string search; The source string of Scene 2 can be continuously divided into 20000 character segments with a length of 50, of which the 15001st is the pattern string, such as "ab... b" (1 "a", 49 "b"), and the rest of the character segments are such as "ab... c" (1 "a", 48 "b", 1 "c").
project | Scenario 1: Average | Scenario 2: worse case |
---|---|---|
character set | Lowercase letters | abc |
Character distribution | random.choice | Strong regularity |
Source string length | 1,000,000 | 1,000,000 |
Mode string length | 1,000 | 50 |
Mode string occurrence position | 250,000,500,000,750,000 | 750,000 |
Number of occurrences of mode string | 1 | 1 |
test method
In this measurement, Python language uses the built-in type str find() member function, C + + language uses string class respectively find() member function, strstr standard library function and KMP algorithm implemented by users.
Test object | Core code |
---|---|
Python | src.find(pat) |
C++ - test.cpp | src.find(pat) |
C++ - test_strstr.cpp | strstr(src, pat) |
C++ - test_kmp.cpp | KMP(src, pat) |
source code
Generate source string and mode string
import random # Scenario 1: # Source string s = "".join(chr(random.choice(range(ord("a"), ord("z") + 1))) for _ in range(1000000)) # List of pattern strings. Each of the three elements corresponds to a pattern string p = [s[250000:251000], s[500000:501000], s[750000:751000]] # Scenario 2: # Pattern string p = 'a' + 'b' * 49 # Other character segments _s = "a" + "b" * 48 + "c" # Source string s = _s * 15000 + p + _s * 4999 # Stored in a file for easy access by C + + programs with open('source.in', 'w') as f: f.write(s) with open('pattern.in', 'w') as f: f.write(p[0])
Test code
Python
In []: %timeit s.find(p[0])
C++ - test.cpp
#include <chrono> #include <iostream> #include <cstring> #include <fstream> #define LOOP_COUNT (1000) using namespace std; using std::chrono::high_resolution_clock; using std::chrono::duration_cast; using std::chrono::duration; using std::chrono::milliseconds; double test(string s, string p, size_t* pos_ptr) { auto t1 = high_resolution_clock::now(); *pos_ptr = s.find(p); auto t2 = high_resolution_clock::now(); duration<double, milli> ms_double = t2 - t1; return ms_double.count(); } int main() { string s, p; size_t pos; ifstream srcfile("source.in"); ifstream patfile("pattern.in"); srcfile >> s; patfile >> p; double tot_time = 0; for (int i = 0; i < LOOP_COUNT; ++i) { tot_time += test(s, p, &pos); } cout << "Loop count: " << LOOP_COUNT << endl; cout << "Source string length: " << s.length() << endl; cout << "Pattern string length: " << p.length() << endl; cout << "Search result: " << pos << endl; cout << "Time: " << tot_time / LOOP_COUNT << " ms" << endl; return 0; }
C++ - test_strstr.cpp
#include <chrono> #include <iostream> #include <cstring> #include <fstream> #define LOOP_COUNT (1000) using namespace std; using std::chrono::high_resolution_clock; using std::chrono::duration_cast; using std::chrono::duration; using std::chrono::milliseconds; char s[1000005], p[1005], *pos=NULL; double test(char* s, char* p, char** pos_ptr) { auto t1 = high_resolution_clock::now(); *pos_ptr = strstr(s, p); auto t2 = high_resolution_clock::now(); duration<double, milli> ms_double = t2 - t1; return ms_double.count(); } int main() { ifstream srcfile("source.in"); ifstream patfile("pattern.in"); srcfile >> s; patfile >> p; double tot_time = 0; for (int i = 0; i < LOOP_COUNT; ++i) { tot_time += test(s, p, &pos); } cout << "Loop count: " << LOOP_COUNT << endl; cout << "Source string length: " << strlen(s) << endl; cout << "Pattern string length: " << strlen(p) << endl; cout << "Search result: " << pos - s << endl; cout << "Time: " << tot_time / LOOP_COUNT << " ms" << endl; return 0; }
C++ - test_kmp.cpp
#include <chrono> #include <iostream> #include <cstring> #include <fstream> #include <cstdlib> #define LOOP_COUNT (1000) using namespace std; using std::chrono::high_resolution_clock; using std::chrono::duration_cast; using std::chrono::duration; using std::chrono::milliseconds; int dp[1005]; int KMP(string s, string p) { int m = s.length(), n = p.length(); if (n == 0) return 0; if (m < n) return -1; memset(dp, 0, sizeof(int) * (n+1)); for (int i = 1; i < n; ++i) { int j = dp[i+1]; while (j > 0 && p[j] != p[i]) j = dp[j]; if (j > 0 || p[j] == p[i]) dp[i+1] = j + 1; } for (int i = 0, j = 0; i < m; ++i) if (s[i] == p[j]) { if (++j == n) return i - j + 1; } else if (j > 0) { j = dp[j]; --i; } return -1; } double test(string s, string p, int* pos_ptr) { auto t1 = high_resolution_clock::now(); *pos_ptr = KMP(s, p); auto t2 = high_resolution_clock::now(); duration<double, milli> ms_double = t2 - t1; return ms_double.count(); } int main() { string s, p; int pos; ifstream srcfile("source.in"); ifstream patfile("pattern.in"); srcfile >> s; patfile >> p; double tot_time = 0; for (int i = 0; i < LOOP_COUNT; ++i) { tot_time += test(s, p, &pos); } cout << "Loop count: " << LOOP_COUNT << endl; cout << "Source string length: " << s.length() << endl; cout << "Pattern string length: " << p.length() << endl; cout << "Search result: " << pos << endl; cout << "Time: " << tot_time / LOOP_COUNT << " ms" << endl; return 0; }
result
The difference between the average execution time of the command% timeython and the average execution time of the command% timeython. C + + code takes the average time after running each mode string for 1000 times.
Unless otherwise specified, the following times are in microseconds and reserved to the whole digit.
scene | Mode string occurrence position | Python | C++ - test.cpp | C++ - test_strstr.cpp | C++ - test_kmp.cpp |
---|---|---|---|---|---|
Scenario 1 | 250,000 | 105 | 523 | 155 | 2564 |
Scenario 1 | 500,000 | 183 | 1053 | 274 | 3711 |
Scenario 1 | 750,000 | 291 | 1589 | 447 | 4900 |
Scenario 2 | 750,000 | 2630* | 618 | 353 | 3565 |
*The original output is "2.63 ms". The average value of IPython's% timeit output retains 3 significant digits. Since this time has exceeded 1 millisecond, the microsecond bit is discarded. The unit here is still microseconds, and the value is recorded as "2630".
limitations
The hardware of the equipment used in this measurement is inferior to the standard configuration machine in the algorithm competition, and the "absolute value" in the measured results has low reference.
summary
According to the results in the above table, under the given environment and relevant parameters, the running time of Python in scenario 1 is about one fifth of that of string::find in C + +, which is close to std:strstr; In scenario 2, the running time of Python increases significantly, but the running time of the first two test methods of C + + is close to or even shorter than that before. In the four tests, the running time of KMP algorithm implemented by C + + users is longer than that of Python under the same conditions.
The fast lookup (. find()) and count (. count()) algorithms for the built-in type str in Python are based on Boyer Moore algorithm and Horspool algorithm The latter is the simplification of the former, while the former and Knuth Morris Pratt algorithm of
For information about the fact that string::find in C + + runs longer than STD:: STR, see Bug 66414 - string::find ten times slower than strstr.
It is noteworthy that the running time of KMP algorithm implemented in C + + is much longer than that in C + + standard library and even Python. This is also similar to the situation that "the running efficiency of self-designed assembly code is lower than that of compiler". A problem of Stack Overflow strstr faster than algorithms? There's someone down there answer As follows:
Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better. In which case you don't stand a chance of out-performing it in C for such small benchmarks.
KMP algorithm is not the fastest of all linear complexity algorithms. In different environments (software and hardware, test data, etc.), KMP, its variants and even other linear complexity algorithms cannot be judged. The compiler takes into account many possible factors in the design, so that there can be relatively better strategies to get results in different environments as far as possible. Therefore, under the condition of ensuring the correct results, it is better to directly use the functions provided in the standard library rather than write them according to the algorithm principle.
At the same time, the actual measurement also confirms the statement that Python is not suitable to achieve high results in the algorithm competition from the perspective of running time.
reference resources
- https://stackoverflow.com/questions/22387586/measuring-execution-time-of-a-function-in-c
- https://www.cplusplus.com/reference/string/string/find/
- https://stackoverflow.com/questions/681649/how-is-string-find-implemented-in-cpython
- https://github.com/python/cpython/blob/main/Objects/stringlib/fastsearch.h#L5
- https://stackoverflow.com/questions/8869605/c-stringfind-complexity
- https://stackoverflow.com/questions/19506571/can-it-be-faster-to-find-the-minimum-periodic-string-inside-another-string-in-te
- https://gcc.gnu.org/onlinedocs/gcc-9.4.0/libstdc++/api/a17342_source.html
- https://opensource.apple.com/source/tcl/tcl-10/tcl/compat/strstr.c.auto.html
- https://gist.github.com/hsinewu/44a1ce38a1baf47893922e3f54807713
- https://stackoverflow.com/questions/11799956/performance-comparison-strstr-vs-stdstringfind
- https://stackoverflow.com/questions/7586990/strstr-faster-than-algorithms
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414
- http://0x80.pl/notesen/2016-10-08-slow-std-string-find.html