iOS voice to text implementation

At present, an IM APP is being developed. Internal friends can send voice. Long press is required to realize the function of voice to text. Previously, Alibaba NUI was used Framework, but this crap often repeats the text transferred out. It can't be handled even with multi-channel control. The experience is too poor. No way, I decided to replace it with apple's own implementation. After all, siri is so powerful! This implementation includes local audio and remote audio. You only need to save the corresponding path in the data model on time, and the internal will be recognized automatically.

Now let's look at the implementation conditions:

In info Add two key value pairs to plist:

1. Privacy - Speech Recognition Usage Description (for requesting voice recognition) 2. Privacy - Microphone Usage Description (for requesting microphone voice input authorization).

And give the corresponding text description.

Import library file:

#import <Speech/Speech.h>

The following is the implementation header file and logical file:

Header file: nsvoice2text h

#import <Foundation/Foundation.h>
#import <Speech/Speech.h>

NS_ASSUME_NONNULL_BEGIN

typedef NS_ENUM(NSUInteger, NSVoice2TextAuthorationStatus) {
    NSVoice2TextAuthorizationStatusNotDetermined,  //Speech recognition not authorized
    NSVoice2TextAuthorizationStatusDenied,         //The user refused to use speech recognition
    NSVoice2TextAuthorizationStatusRestricted,     //Speech recognition is limited on this device
    NSVoice2TextAuthorizationStatusAuthorized,     //Speech recognition
};


@interface NSVoiceModel : NSObject
@property (nonatomic,copy) NSString *path;

@property (nonatomic,assign) NSInteger taskId;
@end


@interface NSVoice2TextFinal : NSObject
@property (nonatomic,copy) NSString *value;

@property (nonatomic,assign) NSInteger taskId;

@property (nonatomic,copy) NSError * __nullable error;
@end


@interface NSVoice2Text : NSObject

+ (BOOL) isRunning;

//jurisdiction
+ (void)voice2TextRequestAuthorationStatus:(void (^)(NSVoice2TextAuthorationStatus status))requestBlock;

+ (void)voice2TextGotter:(NSArray <NSVoiceModel *>*)glist runningModelBlock:(void (^__nullable)(NSVoiceModel *amodel))runningModelBlock resultsBlock:(void (^)(NSVoice2TextFinal *finalValue))resultsBlock;

@end

NS_ASSUME_NONNULL_END

Implementation file: nsvoice2text m

#import "NSVoice2Text.h"

typedef void (^VoiceConversionResultsBlock) (NSVoice2TextFinal *finalValue);

@interface NSVoiceModel ()
@property (nonatomic, copy) VoiceConversionResultsBlock voiceConversionBlock;

@property (nonatomic, copy) void (^voiceConversionRunningBlock)(NSVoiceModel *md);
@end

@implementation NSVoiceModel

@end

@implementation NSVoice2TextFinal
@end


static NSVoice2Text *v2text = nil;

@interface NSVoice2Text ()<SFSpeechRecognizerDelegate>
{
    BOOL isRunning;
    NSMutableArray <NSVoiceModel *>* taskList;
}

@property (nonatomic, assign) NSVoice2TextAuthorationStatus authorationStatus;

@property(nonatomic,strong)SFSpeechRecognizer *speechRecognizer;//Speech recognizer
@property(nonatomic,strong) SFSpeechURLRecognitionRequest *recognitionRequest;//Speech recognition request
@property (nonatomic, strong) SFSpeechRecognitionTask *recognitionTask;//Voice Task Manager

@end

@implementation NSVoice2Text
- (instancetype)init
{
    self = [super init];
    if (self)
    {
        taskList = [NSMutableArray arrayWithCapacity:0];
    }
    return self;
}

+ (instancetype)shareInstance
{
    if (!v2text)
    {
        v2text = [[NSVoice2Text alloc] init];
    }
    
    return v2text;
}


+ (void)releaseInstance
{
    if (v2text)
    {
        v2text = nil;
    }
}


- (SFSpeechRecognizer *)speechRecognizer
{
    if (_speechRecognizer == nil) {
        NSLocale *cale = [[NSLocale alloc]initWithLocaleIdentifier:@"zh-CN"];
        _speechRecognizer = [[SFSpeechRecognizer alloc]initWithLocale:cale];
        _speechRecognizer.delegate = self;
    }
    return _speechRecognizer;
}


+ (BOOL) isRunning
{
    return [NSVoice2Text shareInstance]->isRunning;
}


- (void)resume
{
    isRunning = YES;
    NSVoiceModel *md = [self->taskList firstObject];
    if (md)
    {
        if (md.voiceConversionRunningBlock)
        {
            md.voiceConversionRunningBlock(md);
        }
        if (md.path && md.path > 0)
        {
            NSString *text = @"^(http|https)+.*";
            NSPredicate *regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", text];
            BOOL flag = [regextest evaluateWithObject:md.path];
            if (flag)
            {
                [self startVoiceConversionWithURL:md.path];
            }
            else
            {
                [self startVoiceConversionWithFilePath:md.path];
            }
        }
        else
        {
            NSVoice2TextFinal *el = [[NSVoice2TextFinal alloc] init];
            el.taskId = -1;
            el.error = [NSError errorWithDomain:@"Voice path error or empty" code:404 userInfo:nil];
            md.voiceConversionBlock(el);
        }
    }
    else
    {
        isRunning = NO;
        [NSVoice2Text releaseInstance];
    }
}


- (void)addItToTask:(NSVoiceModel *)md
{
    [taskList addObject:md];
}


+ (void)voice2TextRequestAuthorationStatus:(void (^)(NSVoice2TextAuthorationStatus status))requestBlock
{
    //Send voice authentication request (first judge whether the device supports voice recognition function)
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status)
    {
        [[NSVoice2Text shareInstance] setAuthorationStatus:status];
        requestBlock(status);
    }];
}

+ (void)voice2TextGotter:(NSArray <NSVoiceModel *>*)glist runningModelBlock:(void (^__nullable)(NSVoiceModel *amodel))runningModelBlock resultsBlock:(void (^)(NSVoice2TextFinal *finalValue))resultsBlock
{
    [glist enumerateObjectsUsingBlock:^(NSVoiceModel * _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
        [obj setVoiceConversionRunningBlock:runningModelBlock];
        [obj setVoiceConversionBlock:resultsBlock];
        [[NSVoice2Text shareInstance] addItToTask:obj];
    }];
    
    if (![NSVoice2Text shareInstance]->isRunning)
    {
        [[NSVoice2Text shareInstance] resume];
    }
}


- (void)startVoiceConversionWithFilePath:(NSString *)path
{
    self.recognitionRequest = [[SFSpeechURLRecognitionRequest alloc]initWithURL:[NSURL fileURLWithPath:path]];
    [self startVoiceConversion];
}

- (void)startVoiceConversionWithURL:(NSString *)url
{
    self.recognitionRequest = [[SFSpeechURLRecognitionRequest alloc]initWithURL:[NSURL URLWithString:url]];
    [self startVoiceConversion];
}

#pragma mark - private methods
///Start conversion
- (void)startVoiceConversion
{
    __weak typeof(taskList) weakTaskList = taskList;
    __weak typeof(self) this = self;
    self.recognitionTask =  [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError *  error){
        if (!error)
        {
            NSVoiceModel *md = [weakTaskList firstObject];
            if (result)
            {
                BOOL isFinal = [result isFinal];//End
                if (isFinal)
                {
                    NSString *str = [[result bestTranscription]formattedString];
                    NSVoice2TextFinal *el = [[NSVoice2TextFinal alloc] init];
                    el.taskId = md.taskId;
                    el.error = nil;
                    el.value = str;
                    md.voiceConversionBlock(el);
                    
                    [weakTaskList removeObject:md];
                    [this resume];
                }
            }
            else
            {
                NSVoice2TextFinal *el = [[NSVoice2TextFinal alloc] init];
                el.taskId = md.taskId;
                el.error = error;
                md.voiceConversionBlock(el);
                
                [weakTaskList removeObject:md];
                [this resume];
            }
        }
    }];
}
@end

The queue to text function has been implemented inside this implementation. You only need to pass in the data model at any time.

Code analysis:

1. Permission request

+ (void)voice2TextRequestAuthorationStatus:(void (^)(NSVoice2TextAuthorationStatus status))requestBlock;

It is used to request privacy permission. This function can only be used with the consent of the user. Otherwise, this function cannot be used.

2. Incoming audio file path

+ (void)voice2TextGotter:(NSArray <NSVoiceModel *>*)glist 
runningModelBlock:(void (^__nullable)(NSVoiceModel *amodel))runningModelBlock 
resultsBlock:(void (^)(NSVoice2TextFinal *finalValue))resultsBlock

The audio is transferred in as NSVoiceModel. The mapping relationship between your audio file and this model is realized. taskID is used to realize binding. Refer to the definition and implementation of header file.

2.1 runningModelBlock supports queue transactions, so it will be output to the public if it is currently processing. The "converting" text can be displayed on this page

2.2 resultsBlock, convert the result text and output it with NSVoice2TextFinal. You only need to deal with the logic in it.

3. Complete use:

[NSVoice2Text voice2TextRequestAuthorationStatus:^(NSVoice2TextAuthorationStatus status)
    {
        if (status == NSVoice2TextAuthorizationStatusAuthorized)
        {
            NSVoiceModel *md = [[NSVoiceModel alloc] init];
            [md setTaskId:[bmodel.messageId integerValue]];
            [md setPath:bmodel.audioFilePath];
            
            [NSVoice2Text voice2TextGotter:@[md] runningModelBlock:^(NSVoiceModel * _Nonnull amodel)
            {
                NSString *taskId = intToStr(amodel.taskId);
                //Find the UI of the corresponding processing through taskId, and display "converting"
            }
            resultsBlock:^(NSVoice2TextFinal * _Nonnull finalValue)
            {
                if (!finalValue.error)
                {
                    NSString *taskId = intToStr(finalValue.taskId);
                    NSString *trTexgt = [finalValue value];

                    //Find the corresponding UI through taskId, complete the conversion, and get the converted text
                }
                else
                {
                    NSString *taskId = intToStr(finalValue.taskId);
                    //If the voice conversion corresponding to this taskId fails, you can also find the corresponding UI and display the text "conversion failed"
                }
            }];
        }
        else
        {
            [weakSelf showToastMessageThenHide:@"Unauthorized speech recognition"];
        }
    }];

Keywords: iOS AI

Added by nickmanners on Sat, 01 Jan 2022 04:17:56 +0200

Programming VIP

iOS voice to text implementation

Popular Keywords