A precise delay method using the Cortex-M core

This article introduces a precise delay method based on the Cortex-M core.

Preface

Why study this delay method?

  1. When running an operating system, we usually dedicate a hardware timer, SysTick, to the OS tick. The tick rate is generally set to 100-1000 Hz, that is, one interrupt every 1 ms to 10 ms. Many bare-metal tutorials also build their delay functions on SysTick, which inevitably leads to conflicts.
  2. Many people will say: aren't there still general-purpose timers? Their timing is very accurate. I don't deny that, but if the system keeps entering a timer interrupt (once every 10 us / 1 us / 0.5 us), it is interrupted so often that threads cannot run properly. It also ties up a hardware timer that may be needed for other things.
  3. The ST HAL library needs modification. Jay thinks everything about ST is good except the HAL library, which is unpleasant to work with, but there is no way around it. The HAL library has a HAL_Delay() that also uses SysTick, which causes many inconveniences when porting an operating system. Fortunately, HAL_Delay() is a weak definition, so we can rewrite its implementation. A delay based on the core is the best way (I think so). Of course, you can also write a simple delay with a for loop.
  4. Maybe what I say is not authoritative, so I quote the Cortex-M3 Definitive Guide: "There are remaining counters in DWT, which are typically used for 'profiling' of program code. By programming them, you can have them emit events (in the form of trace packets) when the counter overflows. Typically, the CYCCNT register is used to measure the number of cycles taken to perform a task, which can also be used for time benchmarking purposes (it can be used to count CPU utilization in the operating system)."

DWT in Cortex-M

In Cortex-M, there is a peripheral called DWT (Data Watchpoint and Trace), which is used for system debugging and tracing.
It has a 32-bit register called CYCCNT, an up counter that records the number of core clock cycles: every core clock tick adds 1 to the counter. The resolution is very high and is determined by the core frequency. On the F103 series the core clock is 72 MHz, so the resolution is 1/72M ≈ 14 ns; program running times are on the order of microseconds, so 14 ns is more than enough. The longest time that can be recorded is about 60 s (2^32 / 72000000 ≈ 59.65 s). On the H7, with a 400 MHz main frequency, the resolution is as fine as 2.5 ns (1 / 400000000 s = 2.5 ns). On the i.MX RT1052, the longest time that can be recorded is 8.13 s (2^32 / 528000000, assuming a core frequency of 528 MHz, where one core clock tick takes about 1/528M ≈ 1.9 ns). When CYCCNT overflows, it resets to 0 and starts counting up again.

Tested and working on Cortex-M3, M4 and M7 (not available on M0).
Resolution: 1 / core frequency (in seconds).
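
The figures above can be checked with a few lines of C. This is only a sketch of the arithmetic; the function names cyccnt_resolution_ns and cyccnt_wrap_seconds are made up for illustration and are not part of any library.

```c
#include <stdint.h>

/* Duration of one CYCCNT tick in nanoseconds: 1 / core frequency. */
static double cyccnt_resolution_ns(uint32_t core_hz)
{
    return 1e9 / (double)core_hz;
}

/* Time in seconds until the 32-bit counter wraps: 2^32 / core frequency. */
static double cyccnt_wrap_seconds(uint32_t core_hz)
{
    return 4294967296.0 / (double)core_hz;
}
```

For example, cyccnt_wrap_seconds(72000000u) gives roughly 59.65 and cyccnt_wrap_seconds(528000000u) roughly 8.13, which matches the values quoted for the F103 and the i.MX RT1052.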

To implement the delay, three registers are needed: DEMCR, DWT_CTRL and DWT_CYCCNT, which are used to enable the DWT function, enable CYCCNT, and read the cycle count, respectively.

DEMCR

The DWT peripheral is enabled through bit 24 (TRCENA) of another core debug register, DEMCR: write 1 to enable it (key point, this will be on the exam!!).
The address of DEMCR is 0xE000EDFC.

About DWT_CYCCNT

Clear the DWT_CYCCNT register to 0 before enabling it.
Let's look at the address of DWT_CYCCNT. From the ARM Cortex-M manual, we can see that its address is 0xE0001004, its reset value is 0, and it is readable and writable. Writing 0 to 0xE0001004 clears DWT_CYCCNT.

About CYCCNTENA

The manual describes CYCCNTENA as follows: "Enable the CYCCNT counter. If not enabled, the counter does not count and no event is generated for PS sampling or CYCCNTENA. In normal use, the debugger must initialize the CYCCNT counter to 0."
It is bit 0 of the DWT control register. Writing 1 enables the CYCCNT counter; otherwise the counter does not run.

In summary

Steps to use the CYCCNT counter of DWT:

  1. Enable the DWT peripheral by writing 1 to bit 24 of the core debug register DEMCR.
  2. Clear the CYCCNT register to 0 before enabling it.
  3. Enable the CYCCNT counter by writing 1 to CYCCNTENA, that is, bit 0 of the DWT control register.
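
The three steps can also be written against a small table of register pointers, which makes the sequence easy to exercise off-target with ordinary variables. This is a sketch only: the type cyccnt_regs_t, the function cyccnt_enable and the two bit macros are hypothetical names introduced here; the bit positions are the ones listed in the steps.

```c
#include <stdint.h>

#define DEMCR_TRCENA_BIT        ((uint32_t)1u << 24)  /* DEMCR bit 24 */
#define DWT_CTRL_CYCCNTENA_BIT  ((uint32_t)1u << 0)   /* DWT control register bit 0 */

/* Hypothetical helper type: pointers to the three registers involved. */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *dwt_ctrl;
    volatile uint32_t *dwt_cyccnt;
} cyccnt_regs_t;

/* Enable the cycle counter following the three steps in order. */
static void cyccnt_enable(const cyccnt_regs_t *regs)
{
    *regs->demcr      |= DEMCR_TRCENA_BIT;        /* 1. enable the DWT peripheral */
    *regs->dwt_cyccnt  = 0u;                      /* 2. clear the counter */
    *regs->dwt_ctrl   |= DWT_CTRL_CYCCNTENA_BIT;  /* 3. start counting */
}
```

On real hardware the three pointers would be set to 0xE000EDFC, 0xE0001000 and 0xE0001004. With the CMSIS core headers, the same registers are exposed as CoreDebug->DEMCR, DWT->CTRL and DWT->CYCCNT.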

Code implementation

/**
  ******************************************************************
  * @file    core_delay.c
  * @author  fire
  * @version V1.0
  * @date    2018-xx-xx
  * @brief   Precise delay using kernel registers
  ******************************************************************
  * @attention
  *
  * Experimental platform: Wildfire STM32 development board  
  * Forum: http://www.firebbs.cn
  * Taobao: https://fire-stm32.taobao.com
  *
  ******************************************************************
  */
  
#include "./delay/core_delay.h"   

/*
**********************************************************************
*         Time stamp related register definition
**********************************************************************
*/
/*
 In Cortex-M, there is a peripheral called DWT (Data Watchpoint and Trace).
 The peripheral has a 32-bit register called CYCCNT, which is an up counter
 that records the number of core clock cycles. The longest time that can be recorded is:
 10.74 s = 2^32 / 400000000
 (assuming a core frequency of 400 MHz, where one core clock tick takes about 1/400M = 2.5 ns)
 When CYCCNT overflows, it resets to 0 and starts counting up again.
 Steps to enable CYCCNT counting:
 1. Enable the DWT peripheral by writing 1 to bit 24 of the core debug register DEMCR
 2. Clear the CYCCNT register to 0 before enabling it
 3. Enable the CYCCNT counter by writing 1 to bit 0 of DWT_CTRL (defined as the macro DWT_CR in this code)
 */


#define  DWT_CR      *(__IO uint32_t *)0xE0001000
#define  DWT_CYCCNT  *(__IO uint32_t *)0xE0001004
#define  DEM_CR      *(__IO uint32_t *)0xE000EDFC


#define  DEM_CR_TRCENA                   (1 << 24)
#define  DWT_CR_CYCCNTENA                (1 <<  0)


/**
  * @brief  Initialize the timestamp counter (replaces the HAL's weak HAL_InitTick)
  * @param  TickPriority: unused, kept to match the HAL prototype
  * @retval HAL_OK
  * @note   This function must be called before using the delay function
  */
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority)
{
    /* Enable DWT peripherals */
    DEM_CR |= (uint32_t)DEM_CR_TRCENA;                

    /* Clear the DWT CYCCNT counter */
    DWT_CYCCNT = (uint32_t)0u;

    /* Enable Cortex-M DWT CYCCNT register */
    DWT_CR |= (uint32_t)DWT_CR_CYCCNTENA;
  
    return HAL_OK;
}

/**
  * @brief  Read current timestamp
  * @param  none
  * @retval The current timestamp, which is the value of the DWT_CYCCNT register
  */
uint32_t CPU_TS_TmrRd(void)
{        
  return ((uint32_t)DWT_CYCCNT);
}

/**
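  * @brief  Get the current tick in milliseconds (replaces the HAL's weak HAL_GetTick)
  * @param  none
  * @retval Milliseconds derived from DWT_CYCCNT; wraps when the 32-bit counter overflows
  */
uint32_t HAL_GetTick(void)
{        
  /* Divide by cycles per millisecond. Dividing by SysClockFreq first would
     truncate to whole seconds before the multiplication by 1000. */
  return ((uint32_t)DWT_CYCCNT / (SysClockFreq / 1000));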
}


/**
  * @brief  Precise delay using the CPU's internal 32-bit cycle counter
  * @param  us : Delay length in us
  * @retval none
  * @note   Before using this function, HAL_InitTick() (CPU_TS_TmrInit in the original code) must be called to enable the counter,
            or the macro CPU_TS_INIT_IN_DELAY_FUNCTION must be enabled.
            The maximum delay value is 8 seconds, i.e. 8 * 1000 * 1000
  */
void CPU_TS_Tmr_Delay_US(uint32_t us)
{
  uint32_t ticks;
  uint32_t told,tnow,tcnt=0;

  /* Initialize the time stamp register inside the function. */  
#if (CPU_TS_INIT_IN_DELAY_FUNCTION)  
  /* Initialize time stamp and clear */
  HAL_InitTick(5);
#endif
  
  ticks = us * (GET_CPU_ClkFreq() / 1000000);  /* Number of clock cycles required */      
  tcnt = 0;
  told = (uint32_t)CPU_TS_TmrRd();         /* Counter value at first entry */

  while(1)
  {
    tnow = (uint32_t)CPU_TS_TmrRd();  
    if(tnow != told)
    { 
        /* CYCCNT is a 32-bit up counter */    
      if(tnow > told)
      {
        tcnt += tnow - told;  
      }
      /* Counter wrapped around: the step from UINT32_MAX to 0 counts as one cycle too */
      else 
      {
        tcnt += UINT32_MAX - told + tnow + 1; 
      } 
      
      told = tnow;

      /* Exit once the elapsed time reaches or exceeds the requested delay */
      if(tcnt >= ticks)break;
    }  
  }
}

/*********************************************END OF FILE**********************/

The header file core_delay.h:

#ifndef __CORE_DELAY_H
#define __CORE_DELAY_H

#include "stm32h7xx.h"

/* Get core clock frequency */
#define GET_CPU_ClkFreq()       HAL_RCC_GetSysClockFreq()
#define SysClockFreq            (218000000)
/* For convenience, HAL_InitTick() (CPU_TS_TmrInit in the original code) can be called inside the delay function
   to initialize the timestamp register, but then the initialization runs on every call.
   Set the macro to 0 and call the initialization function once at the start of main() to avoid re-initializing every time */  

#define CPU_TS_INIT_IN_DELAY_FUNCTION   0  


/*******************************************************************************
 * Function declaration
 ******************************************************************************/
uint32_t CPU_TS_TmrRd(void);
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority);

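//Before using the following functions, call HAL_InitTick() (CPU_TS_TmrInit in the original code) to enable the counter,
//or enable the macro CPU_TS_INIT_IN_DELAY_FUNCTION
//The maximum delay is 8 seconds
void CPU_TS_Tmr_Delay_US(uint32_t us);
//Macro arguments are parenthesized so that expressions such as HAL_Delay(a + b) expand correctly
#define CPU_TS_Tmr_Delay_MS(ms)     CPU_TS_Tmr_Delay_US((ms) * 1000)
#define HAL_Delay(ms)               CPU_TS_Tmr_Delay_MS(ms)
#define CPU_TS_Tmr_Delay_S(s)       CPU_TS_Tmr_Delay_MS((s) * 1000)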


#endif /* __CORE_DELAY_H */

Notes:

If you are not using the HAL library, comment out:

uint32_t HAL_GetTick(void)
{        
  return ((uint32_t)DWT_CYCCNT / (SysClockFreq / 1000));
}

At the same time, it is recommended to rename the HAL_InitTick() function.

Rewrite the following macro definitions according to your platform:

/* Get core clock frequency */
#define GET_CPU_ClkFreq()       HAL_RCC_GetSysClockFreq()
#define SysClockFreq            (218000000)

Epilogue

In fact, the uC/OS-III source code has a feature for measuring interrupt-disable time. It uses the STM32 timestamp to record a point in time while the program is running. If the timestamps before and after a piece of code are recorded, its running time can be calculated.
However, there is very little documentation describing the core registers. Fortunately, we found an ARM manual that describes these core registers in detail. The timestamp-related registers are covered in Chapter 10 and Chapter 11. If you want the material, you can ask me for it through the public account.
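
The idea of recording a timestamp before and after a piece of code can be sketched as follows. The helpers cyccnt_elapsed and cycles_to_us are hypothetical names introduced for illustration; on the target, the two readings would come from CPU_TS_TmrRd() as defined in the code above.

```c
#include <stdint.h>

/* Cycles elapsed between two CYCCNT readings. Unsigned subtraction is
   performed modulo 2^32, so one counter wrap between the two readings
   is handled without a special case. */
static uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert cycles to whole microseconds. Assumes core_hz is at least
   1 MHz; any remainder below 1 us is truncated. */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    return cycles / (core_hz / 1000000u);
}

/* Intended use on the target (not compiled here):
 *   uint32_t t0 = CPU_TS_TmrRd();
 *   do_work();                         // hypothetical code under test
 *   uint32_t dt = cyccnt_elapsed(t0, CPU_TS_TmrRd());
 */
```

The measured interval must be shorter than one full counter period (2^32 cycles); a longer interval cannot be distinguished from a short one.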

The related code can be obtained by replying "DWT" to the "IoT development" public account.

Added by ERuiz on Thu, 17 Oct 2019 00:26:12 +0300