Accessing Baidu Intelligent Cloud speech recognition from an ESP32 (online speech recognition)

1, Overview

This post uses an ESP32 to access Baidu Intelligent Cloud and perform online speech recognition. Getting the most basic recognition working is fairly simple, but I still ran into a few small problems, so I am recording them here.

2, Step summary

(1) In the Baidu cloud console, select "speech recognition" and create an application to obtain an API Key and Secret Key.
(2) Use the API Key and Secret Key of that application to obtain a token.
(3) Collect audio data, package it in the specified format, and POST it to the request API.
(4) Receive the returned data.

3, Concrete implementation

1, Create a speech recognition application

2, Obtain the token according to the API Key and Secret Key generated by the created application

After creating the application, click "manage applications". The API Key and Secret Key are listed there, as shown in the following figure.

With the API Key and Secret Key you can get the token. The ESP32 code for getting the token is attached below.

void gain_token(void)   //Get the token
{
    int httpCode;
    //Replace your_apikey and your_secretkey in the URL below with your own API Key and Secret Key
    http_client.begin("https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=your_apikey&client_secret=your_secretkey");
    httpCode = http_client.GET();
    if(httpCode > 0) {
        if(httpCode == HTTP_CODE_OK) {
            String payload = http_client.getString();   //JSON response that contains access_token
            Serial.println(payload);
        }
        else {
            //The server answered, but not with 200 OK
            Serial.printf("[HTTP] GET... unexpected status code: %d\n", httpCode);
        }
    }
    else {
        Serial.printf("[HTTP] GET... failed, error: %s\n", http_client.errorToString(httpCode).c_str());
    }
    http_client.end();
}
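
The token URL above has the two keys buried in the middle of one long string literal, which is easy to get wrong when pasting. A minimal sketch of assembling it from separate values is shown here. The helper name build_token_url is my own and not part of any library, and I am assuming the keys contain only URL-safe characters, so no percent-encoding is done.

```cpp
#include <string>

// Hypothetical helper (not from the original code): assembles the token
// request URL from the API Key and Secret Key.
// Assumption: both keys contain only URL-safe characters.
std::string build_token_url(const std::string& api_key,
                            const std::string& secret_key) {
    std::string url = "https://aip.baidubce.com/oauth/2.0/token";
    url += "?grant_type=client_credentials";
    url += "&client_id=";
    url += api_key;
    url += "&client_secret=";
    url += secret_key;
    return url;
}
```

On the ESP32 the result could then be handed to http_client.begin() through c_str().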

The following data is returned if the request succeeds:

{
  "refresh_token": "25.b55fe1d287227ca97aab219bb249b8ab.315360000.1798284651.282335-8574074",
  "expires_in": 2592000,
  "scope": "public wise_adapt",
  "session_key": "9mzdDZXu3dENdFZQurfg0Vz8slgSgvvOAUebNFzyzcpQ5EnbxbF+hfG9DQkpUVQdh4p6HbQcAiz5RmuBAja1JJGgIdJI",
  "access_token": "24.6c5e1ff107f0e8bcef8c46d3424a0e78.2592000.1485516651.282335-8574074",
  "session_secret": "dfac94a3489fe9fca7c3221cbf7525ff"
}

The value of access_token is the usable token. Each token is valid for 30 days (expires_in is 2592000 seconds). Once it expires, you need to apply again. You can apply for more than one token, and the program does not need to request a token on every run: apply once, use it for up to 30 days, and refresh it regularly.
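
To use the token in a program, it has to be pulled out of the response and refreshed before it expires. The following is only a rough sketch using plain string searching, not a real JSON parser. The function names are my own, and it assumes the response has the flat shape shown above with no escaped quotes inside the values.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Naive lookup of a string field such as "access_token":"...".
// Returns an empty string if the key is not found.
// Assumption: the value is a quoted string without escaped quotes.
std::string json_string_field(const std::string& json, const std::string& key) {
    const std::string pattern = "\"" + key + "\"";
    std::size_t pos = json.find(pattern);
    if (pos == std::string::npos) return "";
    pos = json.find(':', pos + pattern.size());
    if (pos == std::string::npos) return "";
    const std::size_t start = json.find('"', pos + 1);
    if (start == std::string::npos) return "";
    const std::size_t end = json.find('"', start + 1);
    if (end == std::string::npos) return "";
    return json.substr(start + 1, end - start - 1);
}

// Naive lookup of an integer field such as "expires_in": 2592000.
// Returns -1 if the key is not found.
long json_number_field(const std::string& json, const std::string& key) {
    const std::string pattern = "\"" + key + "\"";
    std::size_t pos = json.find(pattern);
    if (pos == std::string::npos) return -1;
    pos = json.find(':', pos + pattern.size());
    if (pos == std::string::npos) return -1;
    return std::strtol(json.c_str() + pos + 1, nullptr, 10);
}

// True when the token should be requested again. All times are in seconds;
// margin_s is how long before the real expiry the refresh should happen.
bool token_needs_refresh(long obtained_at_s, long expires_in_s,
                         long now_s, long margin_s) {
    return now_s >= obtained_at_s + expires_in_s - margin_s;
}
```

With the response above, json_number_field(payload, "expires_in") / 86400 gives the 30 days mentioned earlier.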

3, Collect data and POST it to the request API

There are two POST modes for uploading data: JSON format and RAW format.
Here I describe uploading in JSON format. The following figure shows the necessary parameters for uploading in JSON format.

The data types and contents in the figure are very clear. You only need to package the data in this format and send it. The following is the ESP32 implementation.

if(digitalRead(key)==0) //Press the key
{
    Serial.printf("Start recognition\r\n\r\n");
    digitalWrite(led,HIGH);
    adc_start_flag=1;
    
    timerStart(timer);
    while(!adc_complete_flag)  //Wait for the acquisition to complete
    {
        ets_delay_us(10);
    }
    timerStop(timer);
    adc_complete_flag=0;        //Clear the flag

    digitalWrite(led,LOW);
    
    data_json[0] = '\0';                        //Reset the buffer to an empty string before appending
    strcat(data_json,"{");
    strcat(data_json,"\"format\":\"pcm\",");
    strcat(data_json,"\"rate\":8000,");         //Sampling rate; if it changes, update this value. Only 16000 and 8000 are supported
    strcat(data_json,"\"dev_pid\":1537,");      //Mandarin Chinese
    strcat(data_json,"\"channel\":1,");         //Mono
    strcat(data_json,"\"cuid\":\"123456\",");   //Device identifier; any short string works, but it is better to be unique
    strcat(data_json,"\"token\":\"XXXXXXXXXXXXXXXXXX\",");  //Token; replace it with the token you applied for
    strcat(data_json,"\"len\":32000,");         //Number of raw bytes collected by the ADC, not the length after base64 encoding; update it if the amount of data changes
    strcat(data_json,"\"speech\":\"");
    strcat(data_json,base64::encode((uint8_t *)adc_data,sizeof(adc_data)).c_str());     //base64 encoded data
    strcat(data_json,"\"");
    strcat(data_json,"}");

    int httpCode;
    http_client.begin("http://vop.baidu.com/server_api");      //Request API
    http_client.addHeader("Content-Type","application/json");  //Required header: Content-Type: application/json
    httpCode = http_client.POST(data_json);

    if(httpCode > 0) {
        if(httpCode == HTTP_CODE_OK) {
            String payload = http_client.getString();	//receive data 
            Serial.println(payload);
        }
    }
    else {
        Serial.printf("[HTTP] POST... failed, error: %s\n", http_client.errorToString(httpCode).c_str());
    }
    http_client.end();

    while (!digitalRead(key));   //Wait for the key to be released
    Serial.printf("Recognition complete\r\n");
}

The code above splices the data into the required JSON format, sends it to the request API with POST, and prints the response it receives. The timer runs at 8 kHz so that audio samples are collected at regular intervals. The timer setup is not shown in the code above; the complete code will be attached later.
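
The numbers in the request (rate 8000, len 32000) and the size of the data_json buffer are all tied together, so here is a small sketch of the arithmetic. The function names are my own. I am assuming 16-bit (2-byte) samples, which would make 32000 bytes about 2 seconds of audio, and a base64 encoder that emits no line breaks.

```cpp
#include <cstddef>

// Raw PCM bytes produced by recording `seconds` of mono audio.
constexpr std::size_t pcm_bytes(std::size_t sample_rate_hz,
                                std::size_t bytes_per_sample,
                                std::size_t seconds) {
    return sample_rate_hz * bytes_per_sample * seconds;
}

// Length of the base64 text for n input bytes:
// every 3 input bytes become 4 characters, with padding at the end.
constexpr std::size_t base64_length(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Timer period in microseconds for a given sampling rate.
constexpr unsigned timer_period_us(unsigned sample_rate_hz) {
    return 1000000u / sample_rate_hz;
}
```

Under these assumptions, pcm_bytes(8000, 2, 2) is 32000, timer_period_us(8000) is 125, and base64_length(32000) is 42668. The JSON buffer therefore has to hold those 42668 characters plus the fixed fields and the terminating '\0'.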
The ESP32 has a JSON library, in the "cJSON.h" header, but I did not use it: it produced inexplicable errors when the data was too long, and I did not dig into the cause. Instead I use strcat() to splice the data into the specified format. It is more tedious to write, but it is not a big problem.
The POST request needs a fixed header, Content-Type: application/json, which must be set before calling POST.
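
One drawback of strcat() is that nothing stops it from writing past the end of the buffer. As a possible alternative, here is a minimal sketch that builds the same JSON body in a std::string, which grows as needed. The function build_asr_json and its parameters are my own naming, and it assumes cuid and token contain no characters that need JSON escaping.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the request body with the same fields and
// order as the strcat() version. raw_len is the number of raw audio bytes
// before base64 encoding; speech_base64 is the already encoded audio.
std::string build_asr_json(int rate, int dev_pid,
                           const std::string& cuid,
                           const std::string& token,
                           std::size_t raw_len,
                           const std::string& speech_base64) {
    std::string body;
    body.reserve(speech_base64.size() + 256);  // room for the fixed fields
    body += "{\"format\":\"pcm\"";
    body += ",\"rate\":";     body += std::to_string(rate);
    body += ",\"dev_pid\":";  body += std::to_string(dev_pid);
    body += ",\"channel\":1";
    body += ",\"cuid\":\"";   body += cuid;   body += "\"";
    body += ",\"token\":\"";  body += token;  body += "\"";
    body += ",\"len\":";      body += std::to_string(raw_len);
    body += ",\"speech\":\""; body += speech_base64;
    body += "\"}";
    return body;
}
```

For example, build_asr_json(8000, 1537, "123456", token, 32000, encoded) would produce the body used above. Whether a std::string of this size fits comfortably in the heap of a particular board is something I have not checked.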

4, Receive data

The code in the previous step already receives the data. Here are some of the returned results.

{"corpus_no":"6990616182318679817","err_msg":"success.","err_no":0,"result":["It's a lovely day."],"sn":"440339165021627629665"}

{"corpus_no":"6990616203881655850","err_msg":"success.","err_no":0,"result":["What do you have for lunch?"],"sn":"204332180621627629670"}

{"corpus_no":"6990616272746191297","err_msg":"success.","err_no":0,"result":["Turn on the light."],"sn":"657868059871627629686"}

If the data is sent successfully, the recognition result is returned. Of course, the recognition will be inaccurate when the audio signal is poor.
Remember that the returned recognition result is encoded in UTF-8. At first I wondered why the returned text looked like garbled traditional Chinese characters, even though I had chosen Mandarin. I tried changing one setting after another without success and got a little frustrated. That evening, lying in bed, it suddenly occurred to me that the result is Chinese text, so this might simply be a character-encoding mismatch. I got up, turned on the computer, switched the encoding to UTF-8, and tried again. It worked!
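
To act on a result instead of only printing it, the program has to check err_no and pick the text out of the result array. Below is a rough sketch using plain string searching rather than a JSON library. The names AsrResult and parse_asr_response are my own, and it assumes the response looks like the examples above, with no escaped quotes in the text. The text bytes are copied unchanged, so they stay UTF-8.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical result type: err_no is -1 if the field was not found.
struct AsrResult {
    long err_no;
    std::string text;  // first entry of "result", still UTF-8 encoded
};

AsrResult parse_asr_response(const std::string& json) {
    AsrResult r{-1, ""};

    // Read the integer after "err_no":
    std::size_t pos = json.find("\"err_no\"");
    if (pos != std::string::npos) {
        pos = json.find(':', pos);
        if (pos != std::string::npos) {
            r.err_no = std::strtol(json.c_str() + pos + 1, nullptr, 10);
        }
    }
    if (r.err_no != 0) return r;  // error or malformed response: no text

    // Take the first quoted string inside the "result" array.
    pos = json.find("\"result\"");
    if (pos == std::string::npos) return r;
    pos = json.find('[', pos);
    if (pos == std::string::npos) return r;
    const std::size_t start = json.find('"', pos);
    if (start == std::string::npos) return r;
    const std::size_t end = json.find('"', start + 1);
    if (end == std::string::npos) return r;
    r.text = json.substr(start + 1, end - start - 1);
    return r;
}
```

A caller could then compare r.text with known commands, for example the "Turn on the light." result above, once r.err_no is 0.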

4, Summary

The speech recognition service of Baidu Intelligent Cloud comes with a free quota of calls. The 1,500,000 free calls seem to be more than enough for testing.
The steps and code shown above cover only the key parts. Some parts, such as ESP32 networking and data acquisition, are not listed, but they are simple to implement. I will upload the complete code later; it implements the most basic short speech recognition.
I wrote this mainly as a record. I ran into some problems along the way, and I could not find many examples online of an ESP32 accessing Baidu speech recognition. So I am recording and sharing what I did, to help others and my future self. Of course, it only covers the basics.

The above is purely personal experience, and my ability is limited, so there may be errors. If you find any problems, please point them out and I will correct them.

Some related links:
Baidu speech recognition documentation: https://ai.baidu.com/ai-doc/SPEECH/Vk38lxily . It contains the request description, response description, error codes, and so on.
Token application documentation: https://ai.baidu.com/ai-doc/REFERENCE/Ck3dwjhhu . It describes how to use the API Key and Secret Key to apply for a token.
An online HTTP request simulation tool: https://www.jsonla.com/http/test.html . It can simulate POST and GET requests and can be used to obtain tokens.
ESP32 JSON library description: https://blog.csdn.net/qq_36347513/article/details/116481167 . Written by someone else, it describes in detail how to use the functions of the ESP32 JSON library.

Added by nathanblogs on Tue, 04 Jan 2022 06:59:34 +0200
