Embedded basics -- parsing strings with pattern matching

Following on from the previous article, the author introduces a simpler parsing tool: sscanf.

warm-up

If you have taken a C class and done C course projects or lab exercises, you will have used printf and scanf often. The former prints formatted text to standard output, while the latter reads and parses text from standard input.

sscanf is similar to scanf, except that it does not read from standard input, but directly parses the string passed in by the user.

int sscanf(const char *str, const char *format, ...);
  • str: the string to be parsed
  • format: the format parameter
  • ...: variable-length arguments, the addresses of a series of variables that receive the parsing results
  • Returns the number of fields successfully parsed and stored

What does the format parameter mean? Let's warm up first.

static void test(void)
{
	const char *str = "Today is 2021.7.31";

	int year = 0;
	int month = 0;
	int day = 0;
	int ret;

	ret = sscanf(str, "Today is %d.%d.%d", &year, &month, &day);
	printf("ret=%d, year:%d, month:%d, day:%d\n", ret, year, month, day);
}

"Today is% d.% d.% d" is a format parameter, which contains ordinary characters, such as today is, and format specifier,% d table int type. Sscanf parses STR based on formatting parameters. For normal characters, sscanf checks whether STR is consistent with it. For the format specifier, the content in str is extracted according to its meaning, and the result is stored in the address parameter. The above code demonstrates the method of extracting the date information, and the results are as follows:

ret=3, year:2021, month:7, day:31
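
The return value is worth checking, because sscanf stops at the first mismatch and reports only the fields stored up to that point. A minimal sketch of this behavior follows; the helper name parse_date is made up for illustration and does not come from the author's project.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the original article): returns how many
 * of year/month/day sscanf managed to store.
 */
int parse_date(const char *str, int *year, int *month, int *day)
{
	return sscanf(str, "Today is %d.%d.%d", year, month, day);
}
```

With "Today is 2021.7.31" the helper returns 3. With "Today is 2021.7" it returns 2 and leaves *day untouched. With "Tomorrow is 2021.7.31" it returns 0, because the ordinary characters stop matching before any specifier is reached.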

If you still don't remember format parameters, dear reader, you clearly didn't pay attention in C class; go back to the textbook and review. Today the author does not intend to explain sscanf from scratch, but to introduce a little-known usage.

problem

Here is another example: parsing a domain name and a port number.

static void test2(void)
{
	const char *str = "www.baidu.com:80";

	char addr[64] = "";
	int port = 0;
	int ret;

	ret = sscanf(str, "%s:%d", addr, &port);
	printf("ret=%d, addr:%s, port:%d\n", ret, addr, port);
}

%s is used to parse the string (domain name) and %d is used to parse the int (port number). The results are as follows:

ret=1, addr:www.baidu.com:80, port:0

This is not what we expected: sscanf parsed the domain name and the port number together as one string. This is because the string matched by %s ends only at a whitespace character (space, newline) or at '\0'. sscanf does not look at :%d until %s has finished, and by then the input is used up, so only one field is stored and ret is 1.

The case above is one where the string should have ended but did not, so too much was parsed. Sometimes the opposite happens. Consider the following example:

static void test3(void)
{
	const char *str = "how are you";
	char buf[64] = "";

	sscanf(str, "%s", buf);
	printf("%s\n", buf);
}

I wanted to parse out the complete "how are you", but the output is only "how".

pattern

Matching strings with %s is very limited, but that does not mean sscanf is hard to use. There is another way to match strings: pattern matching.

The format is %[pattern], where pattern defines a character set, and the matched string consists only of characters from that set. The pattern can list multiple characters, use - to define a range, or start with ^ to invert the character set. That is a little abstract, so let's look at some concrete examples.

  • %[abcd] matches a string consisting of the characters a, b, c and d. For example, for the string abcdefg, it matches abcd.
  • %[^abcd] when the pattern starts with ^, it matches characters outside the pattern's character set. For the string gfedcba, it therefore matches gfe.
  • %[0-9a-fA-F] its character set is 0123456789abcdefABCDEF, that is, the hexadecimal digits.

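When using - to define a range, note that the start character must be less than the end character. %[z-a] does not match a range but the three characters z, - and a. (The C standard leaves the handling of - inside the brackets implementation-defined, so check your C library if portability matters.)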
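
The three patterns listed above can be tried directly. The following is only a sketch: the function name scanset_demo, the input "1aF9xyz" and the field width of 15 (added so a long match cannot overrun the 16-byte buffers) are assumptions, not part of the original article.

```c
#include <stdio.h>

/*
 * Hypothetical demo: runs the three character-set patterns described in
 * the list and returns how many of them matched at least one character.
 */
int scanset_demo(char set[16], char inverted[16], char hex[16])
{
	int ok = 0;

	ok += sscanf("abcdefg", "%15[abcd]", set) == 1;       /* stops at 'e' */
	ok += sscanf("gfedcba", "%15[^abcd]", inverted) == 1; /* stops at 'd' */
	ok += sscanf("1aF9xyz", "%15[0-9a-fA-F]", hex) == 1;  /* stops at 'x' */

	return ok;
}
```

After the call, set holds "abcd", inverted holds "gfe" and hex holds "1aF9".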

If you have studied regular expressions, the patterns above should look familiar. However, the pattern matching that sscanf provides is much simpler than regular expressions. Since learning this hidden usage of sscanf, the author has found that it works every time, and the form used most is ^.

Now do you know how to parse the domain name and port number?

Don't feel like working it out?

Well, the answer is as follows:

static void test2_fix(void)
{
	const char *str = "www.baidu.com:80";

	char addr[64] = "";
	int port = 0;
	int ret;

	ret = sscanf(str, "%[^:]:%d", addr, &port);
	printf("ret=%d, addr:%s, port:%d\n", ret, addr, port);
}

Simple, isn't it? Since the domain name is everything before the colon, it is written as %[^:].
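
One caveat: neither %s nor %[...] knows the size of the destination buffer, so a field width is a sensible guard in real code. The same ^ trick also fixes the earlier "how are you" case. The sketch below assumes 64-byte buffers as in the examples; the helper names parse_host_port and read_line are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: splits "host:port"; host must hold 64 bytes. */
int parse_host_port(const char *str, char host[64], int *port)
{
	/* %63[^:] stores at most 63 characters plus the terminating '\0'. */
	return sscanf(str, "%63[^:]:%d", host, port) == 2;
}

/* Hypothetical helper: copies everything up to the first newline. */
int read_line(const char *str, char line[64])
{
	/* [^\n] accepts spaces, so "how are you" is kept whole. */
	return sscanf(str, "%63[^\n]", line) == 1;
}
```

If the host part is longer than 63 characters, the literal : no longer lines up with the input, so parse_host_port reports failure instead of writing past the end of the buffer.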

Parsing GPS

Now you can use sscanf to parse GPS data. A sample GPS sentence is as follows:

$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00

Straight to the code:

static void parse_gps(const char *gps)
{
    char valid = ' ';
    double longitude = 0;
    double latitude = 0;
    int ret;

    ret = sscanf(gps,
             "$GNRMC,%*[^,],%c,%lf,%*c,%lf,%*c,",   /* UTC,valid,latitude,ns,longitude,ew,  */
             &valid, &latitude, &longitude);

    LOG_D("parse gps(%s)", gps);

    if (ret != 3)
    {
        LOG_E("fail");
    }
    else
    {
        LOG_D("succeed, valid:%c, latitude:%lf, longitude:%lf", valid, latitude, longitude);
    }
}

The following figure shows the fields corresponding to each format specifier in the format parameters.

The specifier %*[^,], which follows the literal $GNRMC, matches the time 122921.000. The difference is the additional *, which means the content of that field is matched but not stored, so there is no corresponding variable among the address parameters. As you can see, &valid, &latitude and &longitude store the valid flag character, the latitude and the longitude respectively, and there is no address for a time variable. The same is true of %*c.
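
The * flag works with any specifier, not only with character sets. As a small sketch (the helper name parse_seconds and the "hh:mm:ss" input are assumptions made up for this illustration), the following skips two numbers and stores only the third:

```c
#include <stdio.h>

/* Hypothetical helper: extracts only the seconds from "hh:mm:ss". */
int parse_seconds(const char *str, int *seconds)
{
	/* %*d consumes a number without storing it, so only one address is passed. */
	return sscanf(str, "%*d:%*d:%d", seconds);
}
```

For "12:29:21" the helper returns 1 and stores 21. Skipped fields are not counted in the return value, which is why the GPS code compares ret with 3 even though the format contains six specifiers.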

Three test cases are used to exercise both the success and the failure scenarios.

void parse_string_example(void)
{
    const char *strs[] =
    {
            "$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00",
            "hello world",
            "$GNRMC,,,,,,,,,,,,*00"
    };

    LOG_I("test parse string");

    for (int i = 0; i < ARRAY_SIZE(strs); i++)
    {
        parse_gps(strs[i]);
    }

}
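
LOG_D, LOG_E, LOG_I and ARRAY_SIZE come from the author's project and are not shown here. To try the parser on a PC, a standalone variant might look like the sketch below. The names parse_rmc and count_parsed and the ARRAY_SIZE definition are assumptions; the format string and the test strings are the ones used above.

```c
#include <stdio.h>

/* Assumed definition; the author's project supplies its own ARRAY_SIZE. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Same format string as parse_gps, but the result is returned, not logged. */
int parse_rmc(const char *gps, char *valid, double *latitude, double *longitude)
{
    return sscanf(gps, "$GNRMC,%*[^,],%c,%lf,%*c,%lf,%*c,",
                  valid, latitude, longitude) == 3;
}

/* Runs the three test strings and returns how many parsed successfully. */
int count_parsed(void)
{
    const char *strs[] =
    {
        "$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00",
        "hello world",
        "$GNRMC,,,,,,,,,,,,*00"
    };
    int ok = 0;

    for (size_t i = 0; i < ARRAY_SIZE(strs); i++)
    {
        char valid = ' ';
        double latitude = 0;
        double longitude = 0;

        if (parse_rmc(strs[i], &valid, &latitude, &longitude))
        {
            printf("succeed, valid:%c, latitude:%f, longitude:%f\n",
                   valid, latitude, longitude);
            ok++;
        }
        else
        {
            printf("fail: %s\n", strs[i]);
        }
    }

    return ok;
}
```

Only the first string parses. "hello world" fails at the literal $, and the sentence with empty fields fails at %*[^,], because a character set must match at least one character.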

The results are as follows:

For the complete example code in this article, see the demo project the author created for the STM32F407:

Address: git@gitee.com:wenbodong/mcu_demo.git
Example: examples/05_string/example.c
To use it, enable EXAMPLE_SHOW_STRING in examples/examples.h.

Keywords: C Embedded system string regex

Added by jerryroy on Sun, 02 Jan 2022 16:06:15 +0200