Multi-threaded processing of log files on Linux

One: Learning goals and plans

Two: Learning vi, grep, awk, sed and some Linux commands

1. grep: searches files for lines that match a pattern and prints them; suitable for simple text lookups

1. grep after test.log > res.log: find the lines in test.log that contain the keyword "after" and save them to res.log
2. find -name '*.log' -ls: find files whose names end in .log under the current directory and display their details
3. grep -n '^r' test.txt: find lines starting with r and print them with their line numbers
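The two grep commands above can be mimicked in C++ to make their semantics concrete. This is only a sketch: grep_lines is a hypothetical helper, and it uses std::regex (ECMAScript syntax by default), whose details differ from grep's basic regular expressions.

```cpp
#include <cassert>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the lines of `in` that match `pattern`
// anywhere in the line, roughly like `grep pattern file`.
// With with_numbers == true each hit is prefixed with "N:", like `grep -n`.
std::vector<std::string> grep_lines(std::istream& in, const std::string& pattern,
                                    bool with_numbers = false) {
    const std::regex re(pattern);
    std::vector<std::string> hits;
    std::string line;
    int n = 0;
    while (std::getline(in, line)) {
        ++n;  // 1-based line number
        if (std::regex_search(line, re)) {
            hits.push_back(with_numbers ? std::to_string(n) + ":" + line : line);
        }
    }
    return hits;
}
```

For the three lines "before x", "run after y" and "reboot", the pattern "after" selects only the second line, while "^r" with line numbers yields "2:run after y" and "3:reboot".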

2. Linux commands

1. View the IP address: ifconfig (on Windows: ipconfig)
2. List directory contents: ls, ll (ll is usually an alias for ls -l)
3. ping <ip>: check whether the host and the virtual machine can reach each other; ping www.baidu.com to check whether you can reach the Internet
4. route add default gw xx.xxx.xxx.xx: add a default gateway
5. Create and delete files/folders: mkdir a, rmdir a, touch a.txt, rm a.txt (rm -rf deletes a directory and its contents recursively)
6. Rename a file or folder: mv a.txt b.txt
7. Copy file aaa to the /usr directory: cp /usr/tmp/aaa /usr
8. Compress: tar -zcvf ab.tar *: z compresses with gzip, c creates an archive, v shows progress, f specifies the archive file name
9. Extract: tar -zxvf ab.tar
10. whereis: locate the binary, source and man page files of a command, e.g. whereis ls
11. Network configuration: vi /etc/sysconfig/network (host name), vi /etc/sysconfig/network-scripts/ifcfg-eth0 (network interface settings)
12. Show the current directory: pwd
13. View running processes: ps -ef
14. View network connections and ports: netstat -an
15. Clear the screen: Ctrl+L
16. df -l: display disk usage of local file systems
17. df -a: list all file systems, including pseudo file systems
18. Compile: g++ test.c -o test
19. Run: ./test
20. Check whether vim is installed: rpm -qa | grep vim
21. View a text file: cat test.txt
22. View file contents: more, less, cat, tail
23. Merge files: cat 1.log 2.log > 3.log

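3. vi

1. Esc: return to normal mode; then Shift+; (that is, :) enters command-line mode
2. :q: quit
3. :q!: quit without saving
4. :wq: save changes and quit
5. i: enter insert mode
6. dd: delete the current line
7. yy: copy (yank) the line under the cursor
8. p: paste after the cursor (a yanked line goes below the current line)
9. r/R: r replaces the character under the cursor; R enters replace mode
10. Ctrl+G: show the file name and the current line position
11. /keyword: search forward for keyword
12. :set nu: display a line number for each line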

4. sed: a stream editor that applies script commands to text, typically to edit the lines that match a pattern

1. Options:
--help: display help information
-n: suppress automatic printing; only lines explicitly printed (usually with p) are output
-e: give the script on the command line; the source file is not modified and the result goes to standard output
-i: edit the file in place instead of printing the result to the terminal
2. Commands:
a: append: sed -e '4a newline' test.txt (adds a line "newline" after line 4)
c: change: sed -e '2,5c abc' test.txt (replaces lines 2-5 with abc)
d: delete: sed -e '2,5d' test.txt (deletes lines 2-5)
i: insert: like a, but inserts before the given line
p: print: sed -n '3,5p' test.txt (prints lines 3-5)

sed -n '1~2p' test.txt >> a.log: print the odd-numbered lines (1, 3, 5, ...) and append them to a.log
sed -n '1~2p' test.txt > a.log: same, but overwrite a.log
sed -e 's/1/one/' test.txt: replace the first 1 on each line with one (s/1/one/g replaces every occurrence)
sed "2c hello" test.txt: replace the second line with hello
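Here is a C++ sketch (hypothetical helpers print_every and substitute) of two sed behaviours listed above: the GNU first~step address used in '1~2p', and the difference between s/1/one/ and s/1/one/g. Regex details differ from sed's, so treat it as an illustration only.

```cpp
#include <cassert>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Like `sed -n 'first~step p'`: keep lines first, first+step, ... (1-based).
std::vector<std::string> print_every(std::istream& in, int first, int step) {
    std::vector<std::string> out;
    std::string line;
    for (int n = 1; std::getline(in, line); ++n) {
        if (n >= first && (n - first) % step == 0) out.push_back(line);
    }
    return out;
}

// Like `sed 's/from/to/'` (global == false: first match only)
// or `sed 's/from/to/g'` (global == true: every match), for one line.
std::string substitute(const std::string& line, const std::string& from,
                       const std::string& to, bool global) {
    auto flags = global ? std::regex_constants::format_default
                        : std::regex_constants::format_first_only;
    return std::regex_replace(line, std::regex(from), to, flags);
}
```

So substitute("1 plus 1", "1", "one", false) gives "one plus 1", while the global form gives "one plus one".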

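5. awk: processes text line by line, splits each line into fields and extracts information according to the specified rules; for example, awk '{print $1}' test.log prints the first field of every line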
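By default awk splits each line into fields ($1, $2, ...) separated by runs of blanks. A rough C++ equivalent follows; awk_fields and awk_field are hypothetical names, and operator>> splits on any whitespace, which is close to but not identical to awk's default.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a line into whitespace-separated fields, ignoring leading and
// trailing blanks, roughly like awk's default field splitting.
std::vector<std::string> awk_fields(const std::string& line) {
    std::istringstream ss(line);
    std::vector<std::string> fields;
    std::string f;
    while (ss >> f) fields.push_back(f);  // operator>> skips any whitespace
    return fields;
}

// Equivalent of awk '{print $n}' for one line: $0 is the whole line,
// and a field past the end yields an empty string, as in awk.
std::string awk_field(const std::string& line, std::size_t n) {
    if (n == 0) return line;
    auto fields = awk_fields(line);
    return n <= fields.size() ? fields[n - 1] : "";
}
```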

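6. GDB debugging

Compile with debug information: g++ -g test.cpp -o test
Start debugging: gdb test
Exit: quit
Set a breakpoint at line 5: break 5 (or b 5)
Run: run
Execute the next line, stepping over function calls: next
Common commands at the (gdb) prompt: b (break), r (run), c (continue), n (next), p (print), l (list), q (quit), until, finish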

Three: Multi-threaded text processing

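1. File read/write operations

The single-threaded version below reads res1.log line by line, extracts the host and the 36-character did from each line, and writes every distinct host:did pair with its count to res1_1.log.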

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#define ID_LEN    36   // length of the did value

using namespace std;

int main()
{
    ifstream f("res1.log");
    if (!f)
    {
        cerr << "cannot open res1.log" << endl;
        return 1;
    }
    string line;
    vector<string> vec; // all extracted "host:did" lines
    while (getline(f, line))
    {
        size_t pos_1 = line.find("did=");
        size_t pos_2 = line.find("host:");
        size_t pos_gt = line.find(">");
        // Skip lines that lack the expected fields
        if (pos_1 == string::npos || pos_2 == string::npos ||
            pos_gt == string::npos || pos_gt < pos_2 + 10)
            continue;

        string part_1 = line.substr(pos_1 + 4, ID_LEN);        // did: 36 characters after "did="
        size_t begin_2 = pos_2 + 5;                             // host starts after "host:"
        size_t end_2 = pos_gt - 5;                              // and ends 5 characters before '>'
        string part_2 = line.substr(begin_2, end_2 - begin_2);  // substr(start, length)

        string newline = part_2 + ":" + part_1;
        vec.push_back(newline);
        cout << newline << endl;
    }

    ofstream outfile("res1_1.log");  // write each distinct line with its count
    for (size_t i = 0; i < vec.size(); i++)
    {
        int number = 1;  // occurrences of vec[i], counting itself
        for (size_t j = i + 1; j < vec.size(); )
        {
            if (vec[i] == vec[j])
            {
                number++;
                vec.erase(vec.begin() + j);  // the next element shifts into j, so do not advance
            }
            else
            {
                j++;
            }
        }
        outfile << vec[i] << "(" << to_string(number) << ")" << endl;
    }
    outfile.close();

    return 0;
}
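The nested loop above compares every pair of lines, which is O(n²), and has to erase duplicates as it goes. A std::map counts each distinct line in one pass; the multi-threaded version in section Four uses the same idea (mp[newline]++). A sketch with hypothetical helper names, counting total occurrences:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Count how often each extracted "host:did" string occurs, in one pass.
// std::map keeps the keys sorted, so the output order is deterministic.
std::map<std::string, int> count_lines(const std::vector<std::string>& lines) {
    std::map<std::string, int> counts;
    for (const auto& l : lines) ++counts[l];  // operator[] starts a new key at 0
    return counts;
}

// Format the counts as "key(count)" lines, the same output format as above.
std::vector<std::string> format_counts(const std::map<std::string, int>& counts) {
    std::vector<std::string> out;
    for (const auto& kv : counts)
        out.push_back(kv.first + "(" + std::to_string(kv.second) + ")");
    return out;
}
```

If the original input order matters, std::unordered_map gives the same counts in average O(1) per line, but the output order is then unspecified.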

2. Multithreading basics

On Linux, threads are usually created with the POSIX pthread library; C++11 also provides std::thread, which is simpler to use. This article uses pthread.

1. Basic functions:
pthread_create(): creates a new thread. The original thread keeps running while the new thread executes the start function passed to it. The original thread calls pthread_join() to wait for the new thread to finish.

2. Semaphores and mutexes:
When several threads run at the same time, semaphores and mutexes are used to control their execution and their access to critical sections of code.
Semaphore: limits how many threads may use a resource at once, like assigning each thread one of five phone lines
Mutex: protects a critical resource so that only one thread can access it at a time
3. Mutual exclusion: threads take turns executing a locked section of code. A thread locks the mutex before operating on the shared data so other threads cannot access it at the same time; without the lock, simultaneous updates can corrupt the data
4. Concurrency and parallelism:
Concurrency: several tasks make progress over the same period, possibly by taking turns on one CPU core
Parallelism: several tasks run at the same instant, for example threads on different CPU cores
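The phone-line analogy can be sketched with a POSIX unnamed semaphore (sem_init, sem_wait, sem_post, sem_destroy from <semaphore.h>), assuming Linux. The sketch also shows pthread_create and pthread_join; LINES, caller and run_callers are hypothetical names, and the mutex only guards the bookkeeping counters. Compile with g++ -pthread.

```cpp
#include <cassert>
#include <pthread.h>
#include <semaphore.h>
#include <vector>

const int LINES = 5;    // number of "phone lines" (initial semaphore count)
sem_t phone_lines;
pthread_mutex_t stats_mutex = PTHREAD_MUTEX_INITIALIZER;
int in_use = 0;         // threads currently holding a line
int max_in_use = 0;     // highest value in_use ever reached
int calls_done = 0;     // total completed calls

void* caller(void*) {
    sem_wait(&phone_lines);               // take a line; blocks if all are busy
    pthread_mutex_lock(&stats_mutex);
    ++in_use;
    if (in_use > max_in_use) max_in_use = in_use;
    pthread_mutex_unlock(&stats_mutex);

    // ... the actual "call" (work on the limited resource) would go here ...

    pthread_mutex_lock(&stats_mutex);
    --in_use;
    ++calls_done;
    pthread_mutex_unlock(&stats_mutex);
    sem_post(&phone_lines);               // hang up, freeing the line
    return nullptr;
}

// Start n caller threads, wait for all of them, return completed calls.
int run_callers(int n) {
    sem_init(&phone_lines, 0, LINES);     // 0: shared between threads of this process
    std::vector<pthread_t> ids;
    for (int i = 0; i < n; ++i) {
        pthread_t t;
        if (pthread_create(&t, nullptr, caller, nullptr) == 0) ids.push_back(t);
    }
    for (pthread_t t : ids) pthread_join(t, nullptr);
    sem_destroy(&phone_lines);
    return calls_done;
}
```

run_callers(20) starts 20 threads, but max_in_use never exceeds 5, because a sixth thread blocks in sem_wait until another one calls sem_post.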

3. Lock

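pthread_mutex_t work_mutex;              // declare the mutex
pthread_mutex_init(&work_mutex, NULL);   // initialize it before use
pthread_t thread[THREAD_NUMBER];         // thread id array
pthread_mutex_lock(&work_mutex);         // lock
pthread_mutex_unlock(&work_mutex);       // unlock
pthread_mutex_destroy(&work_mutex);      // destroy when no longer needed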
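The lock/unlock calls above can be tried in a small self-contained program. In the sketch below (counter, add_many, ITERATIONS and run_two_threads are hypothetical names), two threads increment a shared counter, each increment guarded by the mutex; it uses PTHREAD_MUTEX_INITIALIZER, the static alternative to pthread_mutex_init. Compile with g++ -pthread.

```cpp
#include <cassert>
#include <pthread.h>

const int ITERATIONS = 100000;
pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;  // static initialization
long counter = 0;                                        // shared data

// Each thread adds ITERATIONS to the shared counter under the mutex.
void* add_many(void*) {
    for (int i = 0; i < ITERATIONS; ++i) {
        pthread_mutex_lock(&work_mutex);    // enter the critical section
        ++counter;
        pthread_mutex_unlock(&work_mutex);  // leave the critical section
    }
    return nullptr;
}

// Run two threads and return the final counter value.
long run_two_threads() {
    pthread_t t1, t2;
    pthread_create(&t1, nullptr, add_many, nullptr);
    pthread_create(&t2, nullptr, add_many, nullptr);
    pthread_join(t1, nullptr);
    pthread_join(t2, nullptr);
    return counter;
}
```

Without the lock/unlock pair, the two unsynchronized ++counter operations form a data race, and the final value can come out below 200000.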

Four: Code

Idea: grep is used to filter the original log file into two files. The program creates two threads, and each thread reads one of the filtered files. Each thread extracts the useful fields (host and did) from every line, combines them into one string and counts it in a map; access to the map must be mutually exclusive. When both threads have finished, each map key is host:did (the IP plus the request id) and each value is its number of occurrences; the two are combined as host:did(count) and written to the final output file.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <iostream>
#include <fstream>
#include <string>
#include <map>

using namespace std;
#define THREAD_NUMBER 2   /* Number of threads */
#define ID_LEN    36      /* Length of the did value */
pthread_mutex_t work_mutex;     /* Mutex protecting mp */

map<string, int> mp;  // key: "host:did", value: number of occurrences

void* thrd_func1(void* arg) {
    cout << "first start" << endl;
    // The lock is held while the whole file is processed, so the two
    // threads take turns instead of parsing in parallel.
    pthread_mutex_lock(&work_mutex);

    ifstream f("res1.log");
    string line;
    while (getline(f, line))
    {
        size_t pos_1 = line.find("did=");
        size_t pos_2 = line.find("host:");
        size_t pos_gt = line.find(">");
        // Skip lines that lack the expected fields
        if (pos_1 == string::npos || pos_2 == string::npos ||
            pos_gt == string::npos || pos_gt < pos_2 + 10)
            continue;

        string part_1 = line.substr(pos_1 + 4, ID_LEN);         // did
        size_t begin_2 = pos_2 + 5;
        size_t end_2 = pos_gt - 5;
        string part_2 = line.substr(begin_2, end_2 - begin_2);  // host

        string newline = part_2 + ":" + part_1;
        mp[newline]++;
        cout << "first print:" << newline << endl;
    }
    pthread_mutex_unlock(&work_mutex);
    sleep(1);
    return NULL;
}

void* thrd_func2(void* arg) {
    cout << "second start" << endl;
    pthread_mutex_lock(&work_mutex);

    ifstream f("res2.log");
    string line;
    while (getline(f, line))
    {
        size_t pos_1 = line.find("did=");
        size_t pos_2 = line.find("host:");
        size_t pos_gt = line.find(">");
        if (pos_1 == string::npos || pos_2 == string::npos ||
            pos_gt == string::npos || pos_gt < pos_2 + 10)
            continue;

        string part_1 = line.substr(pos_1 + 4, ID_LEN);         // did
        size_t begin_2 = pos_2 + 5;
        size_t end_2 = pos_gt - 5;
        string part_2 = line.substr(begin_2, end_2 - begin_2);  // host

        string newline = part_2 + ":" + part_1;
        mp[newline]++;
        cout << "second print:" << newline << endl;
    }
    pthread_mutex_unlock(&work_mutex);
    sleep(1);
    return NULL;
}

int main(void) {
    pthread_t thread[THREAD_NUMBER];  // Thread id array

    int res = pthread_mutex_init(&work_mutex, NULL);
    if (res != 0)
    {
        cout << "mutex init failed" << endl;
        exit(EXIT_FAILURE);
    }

    // Arguments: thread id output, thread attributes (NULL = default),
    // start routine, argument passed to the start routine
    res = pthread_create(&thread[0], NULL, thrd_func1, NULL);
    if (res == 0)
        res = pthread_create(&thread[1], NULL, thrd_func2, NULL);
    if (res != 0)
    {
        cout << "thread creation failed" << endl;
        exit(EXIT_FAILURE);
    }
    printf("Create threads success\nWaiting for threads to finish...\n");

    for (int i = 0; i < THREAD_NUMBER; i++) {       /* Wait for each thread to end */
        res = pthread_join(thread[i], NULL);
        if (!res) {
            printf("Thread %d joined\n", i);
        }
        else {
            printf("Thread %d join failed\n", i);
        }
    }
    pthread_mutex_destroy(&work_mutex);

    ofstream outfile("res4.log");  // write "host:did(count)" lines
    string write_line;
    for (auto iter = mp.begin(); iter != mp.end(); ++iter)
    {
        write_line = iter->first + "(" + to_string(iter->second) + ")";
        outfile << write_line << endl;
    }
    cout << "finish write" << endl;
    outfile.close();
    return 0;
}
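In the program above, each thread locks work_mutex before reading its file and unlocks it only after the whole file is processed, so the two threads effectively run one after the other. A common alternative, sketched below, is to count into a thread-local map without locking and hold the mutex only while merging. To stay self-contained, the sketch assumes the host:did strings are already in memory instead of reading res1.log/res2.log; shared_counts, merge_mutex, count_worker and count_in_parallel are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <pthread.h>
#include <string>
#include <vector>

std::map<std::string, int> shared_counts;                  // merged result
pthread_mutex_t merge_mutex = PTHREAD_MUTEX_INITIALIZER;

// Thread body: count this thread's lines locally, then merge once.
void* count_worker(void* arg) {
    const auto* lines = static_cast<const std::vector<std::string>*>(arg);
    std::map<std::string, int> local;
    for (const auto& l : *lines) ++local[l];   // no lock needed: the map is local

    pthread_mutex_lock(&merge_mutex);          // short critical section
    for (const auto& kv : local) shared_counts[kv.first] += kv.second;
    pthread_mutex_unlock(&merge_mutex);
    return nullptr;
}

// One thread per input batch (e.g. one per filtered log file).
void count_in_parallel(std::vector<std::vector<std::string>>& inputs) {
    std::vector<pthread_t> ids;
    for (auto& in : inputs) {
        pthread_t t;
        if (pthread_create(&t, nullptr, count_worker, &in) == 0) ids.push_back(t);
    }
    for (pthread_t t : ids) pthread_join(t, nullptr);
}
```

The trade-off: the shared map is locked only once per thread, so the counting loops can overlap; the cost is extra memory for the local maps.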

Five: Result

Six: Problems encountered

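1. sleep(1) was placed before the unlock, so right after unlocking, the thread went straight back into the loop and locked the mutex again before the other thread could get it. Sleeping after the unlock, as below, gives the other thread a chance to acquire the lock:

for (int i = 0; i < 500; i++)
{
    pthread_mutex_lock(&work_mutex);
    cout << "first plus:" << n << endl;   // n: shared counter
    n = n + 1;
    pthread_mutex_unlock(&work_mutex);
    usleep(10);   // pause after unlocking so the other thread can take the lock
}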

2. The string::substr method is substr(pos, len): the first parameter is the start position and the second is the number of characters to extract, not the end position.

3. In the vector version, after counting a duplicate you must erase it from the vector; otherwise the duplicate lines, already counted with number++, still appear again when the results are written to the file. Also, j must not be incremented right after vec.erase(), because the next element has moved into position j:
for (size_t i = 0; i < vec.size(); i++)
{
    for (size_t j = i + 1; j < vec.size(); )
    {
        if (vec[i] == vec[j])
        {
            number++;
            vec.erase(vec.begin() + j);  // do not advance j after erasing
        }
        else
        {
            j++;
        }
    }
    // ... write vec[i] and number to the file
}

4. to_string() requires C++11, so specify the standard at compile time: g++ thread_4.cpp -o thread_4 -std=c++11 -lpthread
5. Use the mount command on Linux to mount a Windows shared folder:
mount -t cifs -o username=abc,vers=2.1 //ip/source /home/a
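Two of the pitfalls above, the substr(pos, len) signature and erasing inside the comparison loop, can be checked with a short sketch; between and count_and_erase are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Item 2: substr takes (start, length). To take the text between two
// indices [begin, end), pass end - begin as the length.
std::string between(const std::string& s, std::size_t begin, std::size_t end) {
    return s.substr(begin, end - begin);
}

// Item 3: counts how many times vec[i] occurs (including itself) and erases
// the later copies. j advances only when nothing was erased, because
// erase() shifts the following element into position j.
int count_and_erase(std::vector<std::string>& vec, std::size_t i) {
    int number = 1;
    for (std::size_t j = i + 1; j < vec.size();) {
        if (vec[j] == vec[i]) {
            ++number;
            vec.erase(vec.begin() + j);
        } else {
            ++j;
        }
    }
    return number;
}
```

With j++ after erase(), the input {"a", "a", "a", "b"} would leave one extra "a" in the vector, which would then be reported again.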

Keywords: C++ Linux Multithreading

Added by dougp23 on Tue, 18 Jan 2022 00:28:35 +0200