C regular expressions

Reference course: https://github.com/ziishaned/learn-regex/blob/master/translations/README-cn.md

POSIX systems such as Linux provide three main functions for working with regular expressions:

#include <sys/types.h>
#include <regex.h>

// Compile the pattern string regex into preg
int regcomp(regex_t *preg, const char *regex, int cflags);

// Match the target string against the compiled pattern
int regexec(const regex_t *preg, const char *string, size_t nmatch,
                   regmatch_t pmatch[], int eflags);

// Release the memory held by the compiled pattern
void regfree(regex_t *preg);
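
As a minimal sketch of how the three calls fit together, the function below compiles a pattern, tests one string against it, and frees the compiled pattern. The helper name matches is hypothetical and not part of the API. It also uses regerror, a fourth function declared in <regex.h>, to turn a regcomp error code into a readable message.

```c
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: returns 1 if text matches pattern (ERE syntax),
   0 if it does not, and -1 if the pattern fails to compile. */
static int matches(const char *pattern, const char *text)
{
    regex_t reg;
    int ret = regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB);
    if (ret != 0) {
        char msg[128];
        regerror(ret, &reg, msg, sizeof msg);   /* readable error text */
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return -1;
    }

    /* regexec returns 0 on a match and REG_NOMATCH otherwise */
    ret = regexec(&reg, text, 0, NULL, 0);
    regfree(&reg);
    return ret == 0 ? 1 : 0;
}
```

For example, matches("c[aeiou]t", "The fat cat") returns 1, and matches("dog", "The fat cat") returns 0.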

The cflags parameter of regcomp

  • REG_EXTENDED

    Use the extended regular expression grammar.

    With this flag, the pattern is interpreted using the extended grammar. The POSIX specification defines two regular expression grammars: basic regular expressions (BRE) and extended regular expressions (ERE).

    What is the difference between BRE and ERE? Mostly the set of metacharacters. In BRE, only ^, $, ., [, ], and * are metacharacters when unescaped; all other characters are literals, and grouping and intervals are written with backslashes, as \( \) and \{ \}. In ERE, (, ), {, }, ?, +, and | are metacharacters as well (along with their associated functions). The grep command uses BRE by default; to use ERE, pass the -E option.

  • REG_ICASE

    Ignore case.

  • REG_NOSUB

    If the pattern is compiled with this option, match positions are not reported: the nmatch and pmatch parameters of later regexec() calls are ignored, and regexec() only tells you whether the string matched.

  • REG_NEWLINE

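    REG_NEWLINE has two effects:

    1. ^ and $ also match at the start and end of each line (right after and right before a newline), not only at the start and end of the whole string.

    2. Match-any-character operators (. and non-matching bracket lists such as [^a]) never match a newline character.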
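
To see what these flags change in practice, here is a small sketch. The helper match_with_flags is a hypothetical name introduced for illustration; it compiles a pattern with whatever cflags it is given and reports whether the text matches.

```c
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: compile pattern with the given cflags and return
   1 if text matches, 0 if it does not, -1 if the pattern is invalid. */
static int match_with_flags(const char *pattern, int cflags, const char *text)
{
    regex_t reg;
    if (regcomp(&reg, pattern, cflags) != 0)
        return -1;

    /* nmatch = 0 and pmatch = NULL: we only want a yes/no answer */
    int ret = regexec(&reg, text, 0, NULL, 0);
    regfree(&reg);
    return ret == 0 ? 1 : 0;
}
```

With this helper, match_with_flags("ca+t", REG_EXTENDED, "caat") returns 1 because + means "one or more" in ERE, while match_with_flags("ca+t", 0, "caat") returns 0 because + is an ordinary character in BRE (and the BRE call does match the literal text "ca+t"). Likewise, match_with_flags("cat", REG_ICASE, "The CAT") returns 1, and the same call with cflags 0 returns 0.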

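The eflags parameter of regexec: pass 0 for the default behaviour, or a bitwise OR of REG_NOTBOL (the start of the string is not treated as the beginning of a line, so ^ does not match there) and REG_NOTEOL (the end of the string is not treated as the end of a line, so $ does not match there).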
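
POSIX defines two eflags values, REG_NOTBOL and REG_NOTEOL. The sketch below illustrates REG_NOTBOL only; the helper starts_with_cat is a hypothetical name, and the pattern "^cat" is chosen purely for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: does "^cat" match text when regexec is called
   with the given eflags? Returns 1 (match), 0 (no match), -1 (error). */
static int starts_with_cat(const char *text, int eflags)
{
    regex_t reg;
    if (regcomp(&reg, "^cat", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int ret = regexec(&reg, text, 0, NULL, eflags);
    regfree(&reg);
    return ret == 0 ? 1 : 0;
}
```

starts_with_cat("cat nap", 0) returns 1, but starts_with_cat("cat nap", REG_NOTBOL) returns 0, because REG_NOTBOL tells regexec that the first character is not the beginning of a line. This is useful when the pointer passed to regexec points into the middle of a longer line, for example when resuming a search after an earlier match.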

The role of parentheses ()

Suppose the regular expression is "name=[^ &]*" (the purpose is to match name=xxx in a URL). On a match, pmatch reports the position of "name=xxx". But what if I only want the xxx after the = sign?

Then add parentheses: "name=([^ &]*)". After a match with this expression, pmatch[0] covers "name=xxx" and pmatch[1] covers "xxx". That is the magic of parentheses. Note that plain parentheses only group when the pattern is compiled with REG_EXTENDED; in BRE the group is written \( \).
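
A minimal sketch of reading the captured group out of pmatch follows. The helper get_name_value and the sample URL in the usage note are hypothetical, chosen only to illustrate the technique; the offsets rm_so and rm_eo are relative to the start of the string passed to regexec.

```c
#include <string.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: copy the value of "name=..." from url into out.
   Returns 0 on success, -1 if there is no match or on error. */
static int get_name_value(const char *url, char *out, size_t outsz)
{
    regex_t reg;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first group */
    int rc = -1;

    if (outsz == 0 || regcomp(&reg, "name=([^ &]*)", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&reg, url, 2, pmatch, 0) == 0 && pmatch[1].rm_so != -1) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;            /* truncate to fit the buffer */
        memcpy(out, url + pmatch[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }
    regfree(&reg);
    return rc;
}
```

Called on "http://example.com/?id=7&name=xdd&age=22" with a 32-byte buffer, the helper returns 0 and leaves "xdd" in the buffer. Because the pattern is not anchored, it would also match the tail of a longer key such as username=, so a real parser would need a stricter pattern.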

Role of REG_NEWLINE

Suppose the regular expression is "^age=[^ &]*" (the purpose is to match the age=xxx field that follows \r\n in the data). If regcomp is called without REG_NEWLINE, the target string "username=xdd\r\nage=22&husband=qinger\r\n&like=study&look=pretty\r\n" cannot be matched, because ^ then matches only at the very start of the string;

When REG_NEWLINE is added, ^ also matches right after each newline, and "age=22" is matched.
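
The effect can be checked with a short sketch. It assumes the pattern is anchored with ^, written "^age=[^ &]*"; without the anchor the pattern would match with or without the flag. The helper find_age is a hypothetical name.

```c
#include <string.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: search text for "^age=[^ &]*", compiling with
   REG_EXTENDED plus extra_cflags. Copies the match into out.
   Returns 1 on a match, 0 on no match, -1 on error. */
static int find_age(const char *text, int extra_cflags, char *out, size_t outsz)
{
    regex_t reg;
    regmatch_t m[1];
    int found = 0;

    if (outsz == 0 ||
        regcomp(&reg, "^age=[^ &]*", REG_EXTENDED | extra_cflags) != 0)
        return -1;

    if (regexec(&reg, text, 1, m, 0) == 0) {
        size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
        if (len >= outsz)
            len = outsz - 1;            /* truncate to fit the buffer */
        memcpy(out, text + m[0].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&reg);
    return found;
}
```

For the text "username=xdd\r\nage=22&husband=qinger\r\n&like=study&look=pretty\r\n", find_age with extra_cflags 0 returns 0, while find_age with REG_NEWLINE returns 1 and stores "age=22".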

Limitation of regexec: a single call reports only the first match. Even if the string contains further matches, that call does not find them.

So, what if you want to match multiple times? Advance through the target string by hand: skip past the part that has already matched and call regexec again on the remainder, until no more matches are found. Here is a small example of matching multiple times.

Principle: rm_eo is the end offset of the match, relative to the start of the string passed to regexec. So point sbuf at the end of the previous match and call regexec again; that is, sbuf moves forward on every iteration.

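/* Copy str[start, end) into a static buffer (truncated to fit) and return it */
static char* substr(const char* str, unsigned start, unsigned end)
{
    static char stbuf[256];
    unsigned n = end - start;
    if (n >= sizeof(stbuf))
        n = sizeof(stbuf) - 1;   /* do not overflow the buffer */
    strncpy(stbuf, str + start, n);
    stbuf[n] = 0;
    return stbuf;
}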


size_t nmatch = 3;
regmatch_t pmatch[3];
const char* lbuf = "The fat cat. sat. on the mat.";
const char* sbuf = lbuf;
size_t x;

while(regexec(&reg, sbuf, nmatch, pmatch, 0) == 0){

  /* pmatch[0] is the whole match, pmatch[1..] are the groups; unused slots have rm_so == -1 */
  for (x = 0; x < nmatch && pmatch[x].rm_so != -1; ++x) {
    printf("    $%zu='%s'\n", x, substr(sbuf, pmatch[x].rm_so, pmatch[x].rm_eo));
  }

  if (pmatch[0].rm_eo == 0)
    break;  /* empty match at the current position: stop instead of looping forever */

  /* pmatch[0].rm_eo is the end of the whole match: continue searching from there */
  sbuf = &sbuf[pmatch[0].rm_eo];
}

A complete small example:

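#include <stdio.h>
#include <sys/types.h>
#include <regex.h>
#include <string.h>

/* Copy str[start, end) into a static buffer (truncated to fit) and return it */
static char* substr(const char* str, unsigned start, unsigned end)
{
    static char stbuf[256];
    unsigned n = end - start;
    if (n >= sizeof(stbuf))
        n = sizeof(stbuf) - 1;   /* do not overflow the buffer */
    strncpy(stbuf, str + start, n);
    stbuf[n] = 0;
    return stbuf;
}

int main(){

  regex_t reg;

  /* "at." at the end of a line, captured as group 1.
     Because of the trailing $, only the final "at." matches here;
     drop the $ to match all three occurrences. */
  int ret = regcomp(&reg, "(at\\.)$", REG_NEWLINE | REG_EXTENDED);
  if(ret != 0){
    printf("regcomp error %d\n", ret);
    return 1;
  }

  size_t nmatch = 3;
  regmatch_t pmatch[3];
  const char* lbuf = "The fat cat. sat. on the mat.";
  const char* sbuf = lbuf;
  size_t x = 0;

  while(regexec(&reg, sbuf, nmatch, pmatch, 0) == 0){

    /* pmatch[0] is the whole match, pmatch[1..] are the groups */
    for (x = 0; x < nmatch && pmatch[x].rm_so != -1; ++x) {
      printf("    $%zu='%s'\n", x, substr(sbuf, pmatch[x].rm_so, pmatch[x].rm_eo));
    }

    if (pmatch[0].rm_eo == 0)
      break;  /* empty match: stop instead of looping forever */

    /* continue searching after the end of the whole match */
    sbuf = &sbuf[pmatch[0].rm_eo];
  }

  regfree(&reg);
  return 0;
}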
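
As a variation on the same idea, here is a sketch of a reusable loop that visits every non-overlapping match of an arbitrary pattern. The helper print_all_matches is a hypothetical name. It advances by the end offset of the whole match, steps one character past empty matches so the loop always terminates, and passes REG_NOTBOL after the first call because the search pointer no longer sits at the start of the line (a simplification that assumes single-line input compiled without REG_NEWLINE).

```c
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: print every non-overlapping match of pattern
   (ERE syntax) in text. Returns the number of matches, or -1 if the
   pattern does not compile. */
static int print_all_matches(const char *pattern, const char *text)
{
    regex_t reg;
    regmatch_t m[1];
    const char *p = text;
    int count = 0;
    int eflags = 0;

    if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&reg, p, 1, m, eflags) == 0) {
        /* offsets in m are relative to p, so add (p - text) */
        printf("match at offset %ld: '%.*s'\n",
               (long)(p - text) + (long)m[0].rm_so,
               (int)(m[0].rm_eo - m[0].rm_so), p + m[0].rm_so);
        ++count;

        if (m[0].rm_eo == m[0].rm_so) {
            /* empty match: step over one character to make progress */
            if (p[m[0].rm_eo] == '\0')
                break;
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;
        }
        eflags = REG_NOTBOL;   /* p is no longer the start of the line */
    }

    regfree(&reg);
    return count;
}
```

For instance, print_all_matches("at\\.", "The fat cat. sat. on the mat.") finds three matches (in cat., sat. and mat.) and returns 3.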

Reference: https://www.cnblogs.com/qingergege/p/7359935.html

Keywords: C github Linux

Added by Hebbs on Mon, 09 Sep 2019 14:12:23 +0300