Embedded C language knowledge summary

Overview

Syntactically, C is not a complex language, but writing high-quality, reliable embedded C is not easy. It requires familiarity with the hardware's characteristics and defects, as well as some understanding of compiler principles and computer fundamentals. Over many years of embedded development I have accumulated some experience and thoughts in this area, and this article is my attempt to summarize the important points of embedded C systematically. Drawing on my own embedded practice and on related material, it explains the C knowledge and key points that embedded developers need, and I hope everyone who reads it takes something away.

1. Keywords

Keywords are identifiers reserved by C with special meanings. Together with the operators and preprocessor directives used alongside them, they can be grouped as follows:

1). Data types (commonly char, short, int, long, unsigned, float, double)

2). Operators, expressions and control flow (=, +, -, *, while, do...while, if, goto, switch...case)

3). Storage classes and qualifiers (auto, static, extern, const, register, volatile, restrict)

4). Structured types (struct, enum, union, typedef)

5). Bit and logical operations (<<, >>, &, |, ~, ^, &&)

6). Preprocessing (#define, #include, #error, #if...#elif...#else...#endif, etc.)

7). Compiler-specific platform extensions (_asm, _inline, _syscall)

Together, these make up the C syntax used on embedded platforms.

Logically, an embedded application can be abstracted into three parts:

1). Data input (e.g. sensors, signals, interface input)

2). Data processing (e.g. protocol decoding and packing, ADC sample conversion)

3). Data output (e.g. GUI display, output pin states, DAC output voltage, PWM duty cycle)

Data management runs through the entire development of an embedded application: data types, storage management, bit and logical operations, and data structures. C supports all of these at the language level and provides optimization mechanisms suited to the more constrained resources of embedded environments.

2. Data types

C supports the commonly used character, integer and floating-point types. Some compilers, such as Keil C51, also provide bit and sfr (special function register) types for special address operations. The C standard only specifies the minimum range of each basic type, so the same type may occupy a different amount of storage on different chips. Code therefore has to be written with later porting in mind, and typedef is the keyword C provides for this situation. Most cross-platform software projects use it, typically as follows:

typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
......
typedef signed int int32_t;

Because the widths of the basic types differ between platforms, the sizeof operator is used to find them on the current platform, for example the width of int:

printf("int size:%zu, short size:%zu, char size:%zu\n", sizeof(int), sizeof(short), sizeof(char));

Another important point is the width of a pointer, for example:

char *p;
printf("point p size:%zu\n", sizeof(p));

This is related to the chip's address width: a pointer is typically 4 bytes on a 32-bit MCU and 8 bytes on a 64-bit processor, so sizeof a pointer is sometimes a quick way to check the platform's bit width. It is not a universal rule, though: on many 8-bit and 16-bit MCUs the pointer size differs from the data width (for example, pointers are 2 bytes on an 8-bit AVR).

3. Memory management and storage architecture

In C, where a variable is stored is determined by how it is defined, and scope together with the keywords extern and static gives fine-grained control over it. Memory is allocated in three ways (excerpted from the High-Quality C/C++ Programming Guide):

1). Static storage area. Memory is allocated at compile time and exists for the whole run of the program, e.g. global variables and static variables.

2). Stack. When a function executes, its local variables are created on the stack and are released automatically when the function returns. Stack allocation is built into the processor's instruction set, so it is very efficient, but the available capacity is limited.

3). Heap, also called dynamic memory allocation. At run time the program requests an arbitrary amount of memory with malloc (or new in C++) and is responsible for releasing it with free (or delete). The programmer decides the lifetime of dynamic memory, which is very flexible but also causes the most problems.

Here is a simple C example:

//main.c
#include <stdio.h>
#include <stdlib.h>

static int st_val;                   //Static global variable -- static storage area
int ex_val;                          //Global variable -- static storage area
int main(void)
{
   int a = 0;                         //Local variable -- allocated on the stack
   int *ptr = NULL;                   //Pointer variable (the pointer itself is on the stack)
   static int local_st_val = 0;       //Static local variable -- static storage area
   local_st_val += 1;
   a = local_st_val;
   ptr = (int *)malloc(sizeof(int));  //Request space from the heap
   if(ptr != NULL)
   {
      *ptr = a;                       //malloc memory is uninitialized: write it before reading
      printf("*ptr value:%d\n", *ptr);
      free(ptr);
      ptr = NULL;
      //After free, set ptr to NULL; otherwise later NULL checks are useless and ptr becomes a dangling ("wild") pointer
   }
   return 0;
}

Scope in C describes not only where an identifier is visible but also, together with storage-class keywords, where a variable is stored. st_val and ex_val have file scope and are placed in the static storage area; the static keyword on st_val only controls whether other files can access it. The block-scope variables a, ptr and local_st_val go to different areas: a is an ordinary local variable on the stack; ptr is itself a local variable on the stack, but the memory it points to is obtained from the heap by malloc; and local_st_val is declared static, so it lives in the static storage area. This leads to an important point: static means different things at file scope and at block scope. At file scope it limits the linkage of functions and variables (other files cannot access them); at block scope it places the variable in the static storage area, so it keeps its value between calls.

For standard C, this is essentially enough to understand memory management. In embedded C, however, a variable is not necessarily placed in RAM (SRAM): it may also live in FLASH or be held directly in a register (variables declared register, or some locals at high optimization levels). For example, global variables defined as const are usually placed in FLASH, and local variables declared register may be kept directly in a general-purpose register. When optimizing for speed or for limited storage, understanding this is very helpful for maintaining the code. In addition, embedded C compilers extend memory management, for example with scatter loading and __attribute__((section("name"))), which let you place specified variables in special regions such as SDRAM or SQI FLASH, adapting memory management to complex application scenarios. The following Keil MDK (armlink) scatter file and declarations show the idea:

LD_ROM 0x00800000 0x10000 {      ; load region: base address, size
  EX_ROM 0x00800000 0x10000 {    ; load address = execution address
    *.o (RESET, +First)
    *(InRoot$$Sections)
    .ANY (+RO)
  }
  EX_RAM 0x20000000 0xC000 {     ; RW and ZI data
    .ANY (+RW +ZI)
  }
  EX_RAM1 0x2000C000 0x2000 {    ; user-defined section MySection
    .ANY (MySection)
  }
  EX_RAM2 0x40000000 0x20000 {   ; external memory for section Sdram
    .ANY (Sdram)
  }
}

int a[10] __attribute__((section("MySection"))); //spell the section name exactly as in the scatter file
int b[100] __attribute__((section("Sdram")));

This lets us place variables in whichever region we need, which is sometimes necessary. For example, a GUI or web-page application may need to store many images and documents, and the internal FLASH may not be large enough, so those variables can be placed in external memory. Also, some data in external memory may be important; to keep it from being overwritten by other content, a separate SRAM region may need to be reserved for it, so that an erroneous write elsewhere cannot cause a fatal error. These situations are common and important in real product development, but for reasons of space only brief examples are given here. If you meet such requirements at work, it is worth studying them in detail.

As for the heap, on embedded Linux it is used exactly as in standard C: check the result of malloc, and set the pointer to NULL after freeing it to avoid dangling ("wild") pointers. On resource-constrained microcontrollers, however, malloc is rarely used. If memory blocks must be requested frequently, a memory management scheme is usually built on the static storage area by dividing it into blocks. On the one hand this is more efficient (memory is split into fixed-size blocks in advance, and an allocation simply takes a free block); on the other hand memory use stays controllable, which effectively avoids fragmentation. RTOSes and network stacks such as lwIP commonly use this kind of mechanism, and it is what I usually use as well, so the details of the heap are not covered here. If you want to know more, see the storage-related chapters of C Primer Plus.

4. Pointers and arrays

Arrays and pointers are among the main sources of bugs: out-of-bounds array access, pointers running past their buffers, illegal address access and unaligned access can usually be traced back to them. Understanding and mastering pointers and arrays is therefore essential for any competent C developer.

An array is a sequence of elements of the same type; when it is declared, the compiler allocates a block of memory sized for its elements. C also supports multi-dimensional arrays for special needs. A pointer is a symbolic way of using an address and is only meaningful when it points to a valid address. C pointers are extremely flexible: before being dereferenced they can point to any address, which makes hardware access very convenient but also places high demands on the developer. Consider the following code:

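#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
  char cval[] = "hello";
  size_t i;
  int ival[] = {1, 2, 3, 4};
  int arr_val[][2] = {{1, 2}, {3, 4}};
  const char *pconst = "hello";
  char *p;
  int *pi;
  int *pa;
  int (*par)[2];  //pointer to a row of 2 ints; int ** would be the wrong type for a 2D array

  p = cval;
  p++;            //addr increases by sizeof(char) = 1
  pi = ival;
  pi += 1;        //addr increases by sizeof(int) = 4
  pa = arr_val[0];
  pa += 1;        //addr increases by sizeof(int) = 4
  par = arr_val;
  par++;          //addr increases by sizeof(int[2]) = 8
  for(i = 0; i < sizeof(cval); i++)
  {
      printf("0x%x ", (unsigned)cval[i]);
  }
  printf("\n");
  printf("pconst:%s\n", pconst);
  //Print addresses through uintptr_t; plain %d truncates 64-bit pointers
  printf("addr:%" PRIuPTR ", %" PRIuPTR "\n", (uintptr_t)cval, (uintptr_t)p);
  printf("addr:%" PRIuPTR ", %" PRIuPTR "\n", (uintptr_t)ival, (uintptr_t)pi);
  printf("addr:%" PRIuPTR ", %" PRIuPTR "\n", (uintptr_t)arr_val, (uintptr_t)pa);
  printf("addr:%" PRIuPTR ", %" PRIuPTR "\n", (uintptr_t)arr_val, (uintptr_t)par);
  return 0;
}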

/* Output on a 64-bit PC (addresses differ from run to run)
0x68 0x65 0x6c 0x6c 0x6f 0x0
pconst:hello
addr:6421994, 6421995
addr:6421968, 6421972
addr:6421936, 6421940
addr:6421936, 6421944 */

Array indices normally run from 0 to length-1, i.e. the half-open interval [0, length). This is usually fine, but when traversing an array backwards it is easy to start at length by mistake and access out of bounds. Also, to save space, the index variable i is sometimes declared as unsigned char, whose range is only 0 to 255 (with 8-bit char). If the array has more than 255 elements, the index wraps around to 0 before the loop condition becomes false, and the loop never ends. This is easy to avoid when the code is first written, but if requirements change later and the array is enlarged, other code that indexes it may hide exactly this problem, so it deserves special attention.

As mentioned earlier, the size of a pointer depends on the chip's address width: typically 4 bytes on a 32-bit platform and 8 bytes on a 64-bit platform. The step of pointer arithmetic, on the other hand, depends on the pointed-to type: 1 byte for char and 4 bytes for int (where int is 4 bytes). In the code above, par advances by 8. The correct type for a pointer into a two-dimensional array is a pointer to a row, int (*)[2], whose step is sizeof(int[2]) = 2 × 4 = 8 bytes. Declaring it as int ** (a pointer to a pointer) is a type mismatch, and its step would instead be the size of a pointer: 8 on a 64-bit platform and 4 on a 32-bit platform. None of this is hard to understand, but a moment of carelessness with these rules in real projects can bury problems that are hard to detect. Pointers also support casts, which are quite useful in some cases. Refer to the following code:

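#include <stdio.h>

typedef struct
{
  int b;
  int a;
}STRUCT_VAL;
//_Alignas(4) is the C11 spelling; Keil ARMCC uses __align(4) for the same purpose
static _Alignas(4) char arr[8] = {0x12, 0x23, 0x34, 0x45, 0x56, 0x12, 0x24, 0x53};
int main(void)
{
    STRUCT_VAL *pval;
    int *ptr;
    //Cast-based access is common in embedded code, but it relies on alignment and
    //technically breaks C's strict-aliasing rules; memcpy is the portable alternative
    pval = (STRUCT_VAL *)arr;
    ptr = (int *)&arr[4];
    printf("val:0x%x, 0x%x\n", (unsigned)pval->b, (unsigned)pval->a);
    printf("val:0x%x\n", (unsigned)*ptr);
    return 0;
}
//Output on a little-endian CPU with 4-byte int:
//val:0x45342312, 0x53241256
//val:0x53241256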

Pointer casts solve data parsing in protocol handling and storage management quickly and efficiently, but alignment and endianness are common, error-prone issues in this process. For example, the arr character array above is forced to 4-byte alignment with __align(4) (Keil ARMCC syntax; _Alignas(4) in C11) so that the later access through an int pointer cannot trigger an unaligned-access exception. Without it, a char array is only 1-byte aligned by default. That does not necessarily cause an exception: the address of arr depends on the overall memory layout, and whether a fault occurs also depends on whether the memory being accessed supports unaligned access (for example, unaligned access to some SDRAM triggers an exception). As a result, adding or removing some unrelated variable can shift arr and trigger the exception, the fault often appears in a place that has nothing to do with the variable that was changed, and code that runs fine on one platform may fault after switching to another. Such hidden problems are hard to find and fix in embedded systems. C pointers also have two other special uses: accessing a specific physical address through a cast, and implementing callbacks through function pointers, as follows:

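#include <stdio.h>
#include <stdint.h>

typedef int (*pfunc)(int, int);
int func_add(int a, int b){
  return a+b;
}
int main(void)
{
    pfunc func_ptr;   //pfunc is already a pointer type, so no extra '*'
    //Absolute-address write: only valid on a target where 0x20001000 is writable RAM
    //(the SRAM region on Cortex-M MCUs); on a PC this line would crash
    *(volatile uint32_t *)0x20001000 = 0x01a23131;
    func_ptr = func_add;
    printf("%d\n", func_ptr(1, 2));
    return 0;
}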

Here, volatile tells the compiler that the value may change in ways it cannot see. It is typically used in the following situations:

1) Hardware registers of peripherals (e.g. status registers)

2) Non-automatic variables accessed by an interrupt service routine

3) Variables shared by several tasks in a multithreaded application

volatile forces every access to really read or write memory. This addresses values that change asynchronously when normal code and an interrupt or exception handler access the same variable, and it prevents the compiler from optimizing away accesses to hardware addresses, so the actual address is always accessed. Note that volatile guarantees neither atomicity nor ordering between tasks, so data shared between threads still needs critical sections or other synchronization. Mastering volatile is essential at the embedded low level and is one of the basic requirements for embedded C developers. Function pointers are less common in general embedded application code, but for many important parts, such as asynchronous callbacks and driver modules, they make simple implementations possible. Of course, this is only meant to get you started; many details are worth studying and mastering in depth.

5. Structures and alignment

C lets users define their own data types to describe things that share the same characteristics, mainly structures, enumerations and unions. An enumeration gives names to a restricted set of values, which makes data more intuitive and easier to read:

typedef enum {spring=1, summer, autumn, winter }season;  

season s1 = summer; 

A union stores different types of data in the same memory. Its size is determined by its largest member (plus any padding needed for alignment), as follows:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

typedef union{
  char c;
  short s;
  int i;
}UNION_VAL;

UNION_VAL val;
int main(void)
{
  //All members start at the same address
  printf("addr:0x%" PRIxPTR ", 0x%" PRIxPTR ", 0x%" PRIxPTR "\n",
         (uintptr_t)&val.c, (uintptr_t)&val.s, (uintptr_t)&val.i);
  val.i = 0x12345678;
  if(val.s == 0x5678)
    printf("Little-endian\n");
  else
    printf("Big-endian\n");
  return 0;
}
/*
addr:0x407970, 0x407970, 0x407970
Little-endian
*/

A union mainly lets you access parts of a value through a shared memory address, which offers a simpler way to parse some variables. Testing a chip's endianness is another common use of unions. The same can also be done with a pointer cast, as follows:

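int data = 0x12345678;
unsigned char *pdata = (unsigned char *)&data;  //accessing any object through a char pointer is always allowed
if(*pdata == 0x78)                               //the lowest-addressed byte holds the least significant byte
  printf("%s\n", "Little-endian");
else
  printf("%s\n", "Big-endian");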

As this shows, a union can avoid unnecessary pointer casts in some cases.

A structure groups variables that belong together. Compared with a C++ class it has no access control and cannot contain member functions directly, but with custom data types and function pointers it can still do many class-like things. In most embedded projects, organizing data in structures greatly helps the overall architecture and later maintenance, as the following example shows:

#include <stdio.h>

typedef int (*pfunc)(int, int);
typedef struct{     
  int num;     
  int profit;     
  pfunc get_total; 
}STRUCT_VAL;
  
int GetTotalProfit(int a, int b)
{     
  return a*b; 
}  

int main(void){     
  STRUCT_VAL Val;     
  STRUCT_VAL *pVal;      
  Val.get_total = GetTotalProfit;     
  Val.num = 1;     
  Val.profit = 10;     
  printf("Total:%d\n",  Val.get_total(Val.num, Val.profit));  //Variable access    
  pVal = &Val;     
  printf("Total:%d\n",  pVal->get_total(pVal->num, pVal->profit)); //Pointer access 
} 
/* 
Total:10 
Total:10 
*/ 

C structures can be accessed both through variables and through pointers, and any block of memory can be parsed through a cast (such as the pointer-cast protocol parsing mentioned earlier). Packaging data and function pointers together and passing them by pointer is also the basis for switching interfaces in the driver layer, which is of great practical importance. In addition, combining bit-fields, unions and structures provides another way to manipulate bits, which is very useful for wrapping low-level hardware registers:

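#include <stdio.h>
#include <stdint.h>

//Bit-field layout (bit order, allowed types) is implementation-defined; the output below
//is from a compiler that allocates from the least significant bit (e.g. GCC on x86 or little-endian ARM)
union reg{
  struct{
    uint8_t bit0:1;
    uint8_t bit1:1;
    uint8_t bit2_6:5;
    uint8_t bit7:1;
  }bit;
  uint8_t all;
};

int main(void)
{
  union reg RegData;
  RegData.all = 0;
  RegData.bit.bit0 = 1;
  RegData.bit.bit7 = 1;
  printf("0x%x\n", RegData.all);
  RegData.bit.bit2_6 = 0x3;
  printf("0x%x\n", RegData.all);
  return 0;
}
/*
0x81
0x8d
*/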

Unions and bit-fields give access to the individual bits of a value, a simple and intuitive approach on platforms with limited registers and memory. Another important topic for structures is alignment. Aligned access can greatly improve efficiency, but the padding that alignment introduces changes storage sizes and is a frequent source of errors. Alignment can be summarized as follows.

Basic data types: aligned to their natural size, e.g. char to 1 byte, short to 2 bytes, and so on.

Arrays: aligned like their element type; if the first element is aligned, the rest are naturally aligned too.

Unions: aligned according to the member with the largest alignment requirement.

Structures: every member is aligned, and the structure as a whole is aligned to (and its size padded to a multiple of) the largest alignment among its members.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

union DATA{
  int a;
  char b;
};
struct BUFFER0{
  union DATA data;
  char a;
  //reserved[3]
  int b;
  short s;
  //reserved[2]
}; //16 bytes
struct BUFFER1{
  char a;
  //reserved[1]
  short s;
  union DATA data;
  int b;
};//12 bytes

int main(void)
{
  struct BUFFER0 buf0;
  struct BUFFER1 buf1;
  printf("size:%zu, %zu\n", sizeof(buf0), sizeof(buf1));
  printf("addr:0x%" PRIxPTR ", 0x%" PRIxPTR ", 0x%" PRIxPTR ", 0x%" PRIxPTR "\n",
       (uintptr_t)&buf0.data, (uintptr_t)&buf0.a, (uintptr_t)&buf0.b, (uintptr_t)&buf0.s);
  printf("addr:0x%" PRIxPTR ", 0x%" PRIxPTR ", 0x%" PRIxPTR ", 0x%" PRIxPTR "\n",
       (uintptr_t)&buf1.a, (uintptr_t)&buf1.s, (uintptr_t)&buf1.data, (uintptr_t)&buf1.b);
  return 0;
}
/* 
size:16, 12 
addr:0x61fe10, 0x61fe14, 0x61fe18, 0x61fe1c 
addr:0x61fe04, 0x61fe06, 0x61fe08, 0x61fe0c 
*/ 

The size of union DATA equals that of its largest member, int, i.e. 4 bytes. The printed addresses confirm that the actual memory layout matches the padding marked in the comments. Working out the padding by hand like this is an effective and quick way to understand C's alignment rules.

6. Preprocessing

C provides a rich set of preprocessing mechanisms that make cross-platform code easier to write. The data and code-block substitution, stringizing and code-section switching that C achieves with macros are also very important in engineering practice. The common preprocessing mechanisms are described below by purpose.

#include: file inclusion. Its effect is to insert the entire contents of the included file at the current location. Besides header files, it can be used for parameter and configuration files, or to insert any file at a chosen point in the code. <> searches the standard (system) include paths, while "" first searches user-defined paths (typically the directory of the including file) and then falls back to the system paths.

#define: macro definition. Common uses include defining constants and aliases for code fragments. In some cases it can also be combined with the ## token-pasting operator to give interfaces a uniform naming scheme. Examples:

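#define MAX_SIZE  10
#define MODULE_ON  1
//do{...}while(0) makes a multi-statement macro behave like a single statement;
//leave the trailing ';' to the caller so that "if (x) ERROR_LOOP(); else ..." compiles
#define ERROR_LOOP() do{\
                     printf("error loop\n");\
                   }while(0)
#define global(val) g_##val
int global(v) = 10;             //expands to: int g_v = 10;
int global(add)(int a, int b)   //expands to: int g_add(int a, int b)
{
    return a+b;
}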

#if...#elif...#else...#endif, #ifdef...#endif, #ifndef...#endif: conditional compilation, mainly used to switch code blocks in and out. Large projects and cross-platform projects use it heavily to handle many different configurations.

#undef: removes a macro definition, e.g. to avoid redefinition problems.

#error and #warning emit user-defined diagnostics. Combined with #if and #ifdef, they can reject invalid predefined configurations at compile time (#warning is standard only since C23 but is widely supported as an extension).

#pragma: implementation-defined directives that take parameters. A common one is #pragma pack(1). After it is used, however, every subsequent structure in the translation unit, including those in headers included afterwards, is packed to the given alignment. This can be avoided by combining it with push and pop, as follows:

#pragma pack(push)
#pragma pack(1)
struct TestA
{
   char i;
   int b;
}A;
#pragma pack(pop)  //Remember to pop; otherwise every later structure, including those in headers included afterwards, stays packed
//Equivalent GCC/Clang attribute form:
struct TestB{
   char i;
   int b;
}__attribute__((packed)) B;

7. Summary

If you have read this far, you should have a clearer picture of C in embedded work. Embedded C gives developers full freedom with physical addresses, bit operations and memory access. Arrays, pointers and casts can effectively reduce data copying during processing, which is necessary at the low level and also helps the overall architecture. The price of this freedom is a set of problems: illegal access, overflow, out-of-bounds access, alignment differences between hardware platforms, data widths and endianness. A careful designer can usually handle them, but for whoever takes over the project later, any of these issues the original design did not think through clearly tends to mean trouble. Every embedded C developer should therefore have a solid grasp of these fundamentals.

That concludes this preliminary summary of embedded C, but these are not the only key and difficult points of C in embedded applications. Others include inline assembly supported by embedded C compilers, reliable communication, and verification and integrity of stored data. These engineering techniques are hard to explain in a few words, and how to locate and fix problems after an exception is triggered also deserves a detailed explanation. Since space is limited and I have not yet organized my thoughts on them, I will stop here and may share them later. The points in this article are only described briefly for reasons of space, without going into the underlying principles or further applications; if you run into them at work or in your studies, I highly recommend studying other materials. Finally, my ability is limited and there may be mistakes in my understanding; if anything is unclear or missing, please point it out and I will gladly learn from it.

References:

  1. C Primer Plus, 5th Edition
  2. High-Quality C/C++ Programming Guide, Lin Rui
  3. Keil MDK documentation


Added by The Saint on Sun, 16 Jan 2022 12:54:58 +0200