Outbound call system: intelligent outbound calling in MRCP mode

Introduction to MRCP

This article explains how to connect an outbound call system to an MRCP server (such as iFLYTEK's MRCP). First, a brief introduction to MRCP:
The basic architecture of MRCP supports multiple media resources on the server side, covering various media types. MRCP defines six media resource types:
basicsynth, basic speech synthesis
speechsynth, standard speech synthesis
dtmfrecog, DTMF recognition
speechrecog, speech recognition
recorder, voice recording
speakverify, speaker verification (voiceprint matching)

From these type definitions, the role of MRCP is obvious: it makes full use of the SIP protocol to solve media management and session control cleanly. From SIP's point of view, the session itself is not the main concern; SIP is used to locate media resources and tie them together. Through the media-resource-server lookup that SIP provides, an MRCP client can discover which media capabilities a server supports.

In essence, MRCP is middleware for speech recognition (ASR) and speech synthesis (TTS): the MRCP server interacts with FreeSWITCH, and FreeSWITCH in turn interacts with the real users. Concretely, the MRCP server implements the SIP and RTP protocols, acting as the SIP server with FreeSWITCH as its client, while FreeSWITCH relays its RTP stream to the MRCP server in real time.
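
To make the resource positioning concrete: when FreeSWITCH needs a resource, its SIP INVITE carries an SDP offer that asks for an MRCPv2 control channel for that resource plus an RTP audio stream, roughly like the examples in RFC 6787 (addresses and ports below are illustrative):

m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 4000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=mid:1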

Main text

This article uses FreeSWITCH's media bug to tap the media stream, builds the unimrcp server against the iFLYTEK SDK, and connects the unimrcp server to FreeSWITCH.

Let's first refer to the existing material on the Internet (linked from the original post).
It is the most-starred Chinese-language FreeSWITCH MRCP demo on GitHub, but it has a few pitfalls that need fixing.

1. Build and deploy the MRCP server. After compiling the dependencies, execute the following command:

./configure --with-apr=/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0/libs/apr --with-apr-util=/opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-deps-1.5.0/libs/apr-util

The --with-apr and --with-apr-util paths above correspond to my own install directory and source tree; adjust them to match yours, otherwise compilation may fail and waste a lot of time.
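
For completeness, a rough sketch of the remaining steps (the source directory name is assumed from the paths above; the /usr/local/unimrcp prefix matches the logs later in this article):

cd /opt/mrcp/MRCP-Plugin-Demo-master/unimrcp-1.5.0    # assumed source directory
./configure --with-apr=... --with-apr-util=...        # as shown above
make && make install                                  # installs under /usr/local/unimrcp
/usr/local/unimrcp/bin/unimrcpserver                  # run the server in the foreground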

2. Configure the MRCP server and FreeSWITCH.
On the FreeSWITCH side you need to configure two places:
a. Add a conf/mrcp_profiles/unimrcpserver-mrcp-v2.xml (the second place is shown after the profile below):

<include>
 <profile name="unimrcpserver-mrcp-v2" version="2">
   <param name="client-ip" value="127.0.0.1"/>
   <param name="client-port" value="9060"/>
   <param name="server-ip" value="192.168.0.190"/>
   <param name="server-port" value="8060"/>
   <param name="sip-transport" value="udp"/>
   <param name="rtp-ip" value="192.168.0.190"/>
   <param name="rtp-port-min" value="4000"/>
   <param name="rtp-port-max" value="5000"/>
   <param name="codecs" value="PCMU PCMA L16/96/8000"/>
    <param name="speechsynth" value="speechsynthesizer"/>
    <param name="speechrecog" value="speechrecognizer"/>
   <synthparams>
   </synthparams>
   <recogparams>
       <param name="start-input-timers" value="false"/>
   </recogparams>
 </profile>
</include>

As mentioned above, the client side here is FreeSWITCH, so set client-ip and client-port according to your own FreeSWITCH.

server-ip and server-port are the SIP address configured on your MRCP server.
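
b. The second place, which the original post doesn't spell out: on a stock FreeSWITCH it is normally making sure mod_unimrcp is loaded and pointed at the profile above. A minimal sketch, assuming the stock config file names:

<!-- conf/autoload_configs/modules.conf.xml: make sure the module is loaded -->
<load module="mod_unimrcp"/>

<!-- conf/autoload_configs/unimrcp.conf.xml: use the profile above by default -->
<configuration name="unimrcp.conf" description="UniMRCP Client">
  <settings>
    <param name="default-tts-profile" value="unimrcpserver-mrcp-v2"/>
    <param name="default-asr-profile" value="unimrcpserver-mrcp-v2"/>
  </settings>
</configuration>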

3. Download the corresponding iFLYTEK SDK.

Note that two capabilities should be checked when generating the SDK package: voice dictation (ASR) and speech synthesis (TTS).

Replace the corresponding directory, plugins/third-party/xfyun, and remember to modify the appkey in the code.

Otherwise you will run into problems during TTS synthesis:

MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: ede2ac36452811ec@speechsynth

2021-11-14 16:57:43:392587 [WARN]   [xfyun] Synthesizing ...
2021-11-14 16:57:43:543463 [WARN]   [xfyun] QTTSAudioGet failed, error code: 10407.
2021-11-14 16:57:43:551022 [INFO]   Process SPEAK-COMPLETE Event <ede2ac36452811ec@speechsynth> [1]
2021-11-14 16:57:43:551051 [NOTICE] State Transition SPEAKING -> IDLE <ede2ac36452811ec@speechsynth>
2021-11-14 16:57:43:551089 [INFO]   Send MRCPv2 Data 192.168.0.190:1544 <-> 192.168.0.190:41578 [122 bytes]
MRCP/2.0 122 SPEAK-COMPLETE 1 COMPLETE
Channel-Identifier: ede2ac36452811ec@speec

10407 is a permissions problem, so remember to replace the asterisks after appid= in the xfyun_login method with your own appid; nothing else there needs to change.
After that you will hit the following problem:

2021-11-14 16:41:52:936062 [NOTICE] Create RTP Termination Factory 192.168.0.190:[5000,6000]
2021-11-14 16:41:52:936073 [INFO]   Register RTP Termination Factory [RTP-Factory-1]
2021-11-14 16:41:52:936086 [INFO]   Load Plugin [XFyun-Recog-1] [/usr/local/unimrcp/plugin/xfyunrecog.so]
2021-11-14 16:41:52:936626 [WARN]   Failed to Load DSO: /lib/libmsc.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
2021-11-14 16:41:52:936653 [INFO]   Load Plugin [XFyun-Synth-1] [/usr/local/unimrcp/plugin/xfyunsynth.so]
2021-11-14 16:41:52:937044 [WARN]   Failed to Load DSO: /lib/libmsc.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
2021-11-14 16:41:52:937076 [INFO]   Register RTP Settings [RTP-Settings-1]

The project is otherwise very good; I don't know why, but although people have reported this issue, nobody has updated the code. Semi open source? [kidding]

The fix is to add -lstdc++ to the relevant Makefiles:

./third-party/xfyun/samples/sch_translate_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/iat_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/ise_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./third-party/xfyun/samples/tts_online_sample/Makefile:18:LDFLAGS += -lmsc -lrt -ldl -lpthread -lstdc++
./xfyun-recog/xfyunrecog.la:20:dependency_libs=' -L../../plugins/third-party/xfyun/libs/x64 -lmsc -ldl -lpthread -lrt -lstdc++'
./xfyun-recog/Makefile.in:359:                              -lmsc -ldl -lpthread -lrt -lstdc++
./xfyun-recog/Makefile:359:                              -lmsc -ldl -lpthread -lrt -lstdc++
./xfyun-recog/.libs/xfyunrecog.lai:20:dependency_libs=' -L../../plugins/third-party/xfyun/libs/x64 -lmsc -ldl -lpthread -lrt -lstdc++'
./xfyun-recog/Makefile.am:8:                              -lmsc -ldl -lpthread -lrt -lstdc++

Once the configuration is complete, startup should show no more errors.

Write the dialplan route, using Python instead of Lua. mod_python's data attribute names the module to import (here mrcp, i.e. an mrcp.py on FreeSWITCH's PYTHONPATH), and its handler(session, args) function is invoked:

   <extension name="mrcq_demo">
     <condition field="destination_number" expression="^5001$">
       <action application="set" data="RECORD_TITLE=Recording ${destination_number} ${caller_id_number} ${strftime(%Y-%m-%d %H:%M)}"/>
       <action application="set" data="RECORD_COPYRIGHT=(c) 2011"/>
       <action application="set" data="RECORD_SOFTWARE=FreeSWITCH"/>
       <action application="set" data="RECORD_ARTIST=FreeSWITCH"/>
       <action application="set" data="RECORD_COMMENT=FreeSWITCH"/>
       <action application="set" data="RECORD_DATE=${strftime(%Y-%m-%d %H:%M)}"/>
       <action application="set" data="RECORD_STEREO=true"/>
       <action application="record_session" data="$${base_dir}/recordings/archive/${strftime(%Y-%m-%d-%H-%M-%S)}_${destination_number}_${caller_id_number}_${call_uuid}.wav"/>
       <action application="answer"/>
       <action application="sleep" data="2000"/>
       <action application="python" data="mrcp"/>
     </condition>
   </extension>
#encoding=utf-8
from freeswitch import *
def handler1(session, args):
    call_addr='user/1018'
    session.execute("bridge", call_addr)

def handler(session, args):
    #uuid = "ggg"
    #console_log("1", "... test from my python program\n")
    #session = PySession(uuid)
    session.answer()
    session.set_tts_params("unimrcp", "xiaofang")
    session.speak("Hello, I love you, China, hey, you, love you�")
    #session.execute()
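    # play_and_detect_speech takes "<prompt> detect:<engine> {params}<grammar>";
    # the prompt below is inline TTS ("say:..."), the engine is unimrcp with its
    # timeout params in braces, and the grammar is the built-in boolean grammar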
    session.execute("play_and_detect_speech", "say:please say yes or no. please say no or yes. please say something! detect:unimrcp {start-input-timers=false,no-input-timeout=5000,recognition-timeout=5000}builtin:grammar/boolean?language=en-US;y=1;n=2")

    session.hangup()

A quick demo like this works fine.

Optimization: turn it into a simple interactive robot. The Python code is as follows:

#encoding=utf-8
import json
import uuid
import requests
import xml.etree.ElementTree as ET
import freeswitch as fs
from freeswitch import *


# `UNI_ENGINE`: the unimrcp detect engine plus its parameters
# (adjacent quoted strings concatenate implicitly in Python)
UNI_ENGINE = 'detect:unimrcp {start-input-timers=false,' \
        'no-input-timeout=5000,recognition-timeout=5000}'
# the grammar is ignored by some backends (e.g. Baidu ASR); `chat-empty` also works there
UNI_GRAMMAR = 'builtin:grammar/boolean?language=en-US;y=1;n=2'

def asr2text(result):
    """fetch recognized text from asr result (xml)"""
    root = ET.fromstring(result)
    node = root.find('.//input[@mode="speech"]')
    text = None
    if node is not None and node.text:
        # node.text is unicode
        text = node.text.encode('utf-8')
    return text
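
# For reference, `detect_speech_result` usually carries an NLSML body shaped
# roughly like this (a hedged example; real values depend on the recognizer):
#
#   <result>
#     <interpretation grammar="..." confidence="0.92">
#       <instance>yes</instance>
#       <input mode="speech">yes</input>
#     </interpretation>
#   </result>
#
# asr2text() above pulls out the text of the <input mode="speech"> node.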

def handler1(session, args):
    call_addr='user/1018'
    session.execute("bridge", call_addr)

def handler(session, args):
    fs.consoleLog('info', '>>> start chatbot service')
    #uuid = "ggg"
    #console_log("1", "... test from my python program\n")
    #session = PySession(uuid)
    session.answer()

    # first turn: synthesize the opening sentence to play to the caller
    answer_sound = Synthesizer()('How do you do, baby. ')

    while session.ready():
        # play the answer sound and detect user input in a loop;
        # note the space: play_and_detect_speech expects "<file> detect:..."
        session.execute('play_and_detect_speech',
                answer_sound + ' ' + UNI_ENGINE + UNI_GRAMMAR)
        asr_result = session.getVariable('detect_speech_result')
        if asr_result is None:
            # if result is None, it means session closed or timeout
            fs.consoleLog('CRIT', '>>> ASR NONE')
            break
        try:
            text = asr2text(asr_result)
        except Exception as e:
            fs.consoleLog('CRIT', '>>> ASR result parse failed \n%s' % e)
            continue
        fs.consoleLog('CRIT', '>>> ASR result is %s' % text)
        # len will get correct length with unicode
        if text is None or len(unicode(text, encoding='utf-8')) < 2:
            fs.consoleLog('CRIT', '>>> ASR result TOO SHORT')
            # answer_sound = sound_query('inaudible')
            answer_sound = Synthesizer()("Sorry, I didn't catch what you said. Please say it again.")
            continue
        # chat with robot
        # text = Robot()(text)
        fs.consoleLog('CRIT', 'Robot result is %s' % text)
        if not text:
            text = 'Sorry, I just lost myself on the road of life. What else can I do for you?'
        # speech synthesis
        answer_sound = Synthesizer()(text)
    
    # session close
    fs.msleep(800)
    session.hangup(1)
    # session.set_tts_params("unimrcp", "xiaofang")
    # session.speak("Hello, I love you, China, hey, you, love you")
    #session.playFile("/path/to/your.mp3", "")
    #session.speak("Please enter telephone number with area code and press pound sign. ")
    #input = session.getDigits("", 11, "*#", "#", 10000)
    # session.hangup(1)



class Synthesizer:
    """Posts text to a local TTS HTTP service, returns a playable wav path."""

    def __call__(self, text):
        if isinstance(text, unicode):
            text = text.encode('utf-8')
        # (None, text) sends `text` as a plain multipart form field
        audio = requests.post("http://127.0.0.1:8001/tts_text",
                files=dict(text=(None, text))).content
        # write the raw wav bytes to a unique file FreeSWITCH can play;
        # the .wav suffix lets FreeSWITCH pick the right format
        filename = "/tmp/%s.wav" % uuid.uuid1()
        with open(filename, "wb") as f:
            f.write(audio)
        return filename

The Synthesizer above just posts to a local HTTP service, so you can put your own trained offline speech synthesis (TTS) model behind it.
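
For reference, a minimal sketch of what that HTTP service could look like, assuming Flask and a hypothetical synthesize() wrapper around your own offline model (the /tts_text route and port 8001 match the Synthesizer above):

#encoding=utf-8
# a hedged sketch, not the author's actual service
from flask import Flask, request, Response

app = Flask(__name__)

def synthesize(text):
    """Hypothetical: run your offline TTS model, return mono WAV bytes."""
    raise NotImplementedError

@app.route('/tts_text', methods=['POST'])
def tts_text():
    # Synthesizer sends the text as a plain multipart form field named 'text'
    text = request.form['text']
    return Response(synthesize(text), mimetype='audio/wav')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8001)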

Summary

I hope this article improves your understanding of what MRCP is for and how to integrate with it. We also have a fully self-developed stack of FreeSwitch, ASR, TTS and related capabilities, now updated to a privatized deployment version that is safe and fast. In follow-ups I will cover each part of the outbound call center and telephony infrastructure in more detail. If you like this, follow me; if you have questions, leave a comment or send me a private message.
