1. Download and install openSMILE
Download it from the official website: openSMILE - audEERING
On Windows it can be used directly after unpacking the archive.
2. Purpose of openSMILE
openSMILE extracts a wide range of audio features from sound signals, mainly low-level descriptors (LLDs).
3. Using openSMILE
3.1 Direct use under Windows
Audio features are extracted by running the SMILExtract executable from the command line.
① First change to the directory that contains SMILExtract_Release.exe
② Then run a command of the form:
SMILExtract_Release -C "configuration file" -I "audio to be processed" -O "path and file name of the feature vector to be saved"
----> Options that control the output data format (parameters)
=============================
-instname <string>              Usually the name of the input file; saved in the first column of the CSV and ARFF output. The default is "unknown".
=============================
-lldcsvoutput, -D <filename>    Enables frame-wise LLD output to a CSV file.
-appendcsvlld <0/1>             Set to 1 to append to an existing CSV file. The default is 0 (overwrite).
-timestampcsvlld <0/1>          Set to 0 to disable the time stamp output in the second CSV column. The default is 1.
-headercsvlld <0/1>             Set to 0 to disable the header line in the CSV. The default is 1.
=============================
-lldhtkoutput <filename>        Enables frame-wise LLD output to an HTK format file.
=============================
-lldarffoutput, -D <filename>   Enables frame-wise LLD output to an ARFF file.
-appendarfflld <0/1>            Set to 1 to append to an existing ARFF file. The default is 0 (overwrite).
-timestamparfflld <0/1>         Set to 0 to disable the time stamp output in the second ARFF column. The default is 1.
-lldarfftargetsfile <file>      Specifies the configuration include file that defines the target fields (classes). The default is shared/arff_targets_conf.inc
=============================
-output, -O <filename>          The default output option. Writes feature summaries in ARFF format.
-appendarff <0/1>               Set to 0 to not append to an existing ARFF file. The default is 1 (append).
-timestamparff <0/1>            Set to 1 to enable the time stamp output in the second ARFF column. The default is 0.
-arfftargetsfile <file>         Specifies the configuration include file that defines the target fields (classes). The default is shared/arff_targets_conf.inc
=============================
-csvoutput <filename>           The default output option. Writes feature summaries in CSV format.
-appendcsv <0/1>                Set to 0 to not append to an existing CSV file. The default is 1 (append).
-timestampcsv <0/1>             Set to 0 to disable the time stamp output in the second CSV column. The default is 1.
-headercsv <0/1>                Set to 0 to disable the header line in the CSV. The default is 1.
=============================
-htkoutput <filename>           Enables output of feature summaries (functionals) to an HTK format file.
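The options above can also be assembled programmatically. The following is a minimal Python sketch that only uses flags listed above; the helper name `build_smilextract_args` and the example file names are hypothetical, and the executable name and configuration path depend on your installation.

```python
def build_smilextract_args(exe, config, wav, lld_csv, instname=None,
                           append=False, timestamps=True, header=True):
    """Build an argument list for SMILExtract that writes frame-wise LLDs to CSV.

    Only flags from the option list above are used; the function name is
    a hypothetical helper, not part of openSMILE.
    """
    args = [
        exe,
        "-C", config,                  # configuration file
        "-I", wav,                     # audio to be processed
        "-lldcsvoutput", lld_csv,      # frame-wise LLD output in CSV format
        "-appendcsvlld", "1" if append else "0",
        "-timestampcsvlld", "1" if timestamps else "0",
        "-headercsvlld", "1" if header else "0",
    ]
    if instname is not None:
        args += ["-instname", instname]
    return args


# Example file names are placeholders
args = build_smilextract_args(
    "SMILExtract_Release", "config/IS09_emotion.conf",
    "example.wav", "example_lld.csv", instname="example",
)
print(" ".join(args))
```

The resulting list can be handed to `subprocess.run(args, check=True)`. Passing a list rather than one long string avoids quoting problems when a path contains spaces.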
3.2 Use in Python
3.2.1 LLD feature extraction for a single audio file
1. Set the openSMILE path
2. Select and set the configuration file to use
3. Extract the features through a system command
import os

infilename = 'Ses01F_impro01_F002.wav'
outfilename = 'Ses01F_impro01_F002.csv'

# Set the openSMILE path
exe_opensmile = 'D:/opensmile-2.3.0/bin/Win32/SMILExtract_Release'
# Select and set the configuration file to use
path_config = 'D:/opensmile-2.3.0/config/ComParE_2016.conf'

# Build the system command
opensmile_options = '-configfile ' + path_config + ' -appendcsvlld 0 -timestampcsvlld 1 -headercsvlld 1'
outputoption = '-lldcsvoutput'
opensmile_call = exe_opensmile + ' ' + opensmile_options + ' -inputfile ' + infilename + ' ' + outputoption + ' ' + outfilename

# Run the command
os.system(opensmile_call)
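Once the CSV has been written, the frame-wise features can be loaded back into Python. The following is a minimal sketch using only the standard library. It assumes a header row, the instance name in the first column and the time stamp in the second, as the options in 3.1 describe. The `;` delimiter is an assumption to check against your own file, and the sample text is made up for illustration; it is not real openSMILE output.

```python
import csv
import io


def read_lld_csv(handle, delimiter=";"):
    """Return (feature_names, frames) from a frame-wise LLD CSV.

    Assumes a header row, the instance name in column 1 and the time stamp
    in column 2 (i.e. -headercsvlld 1 and -timestampcsvlld 1). The ';'
    delimiter is an assumption; change it if your file uses another one.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    feature_names = header[2:]
    frames = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        frames.append({
            "name": row[0],
            "time": float(row[1]),
            "features": [float(v) for v in row[2:]],
        })
    return feature_names, frames


# Made-up sample in the assumed layout, NOT real openSMILE output
sample = io.StringIO(
    "name;frameTime;feat_a;feat_b\n"
    "'demo';0.00;0.10;1.5\n"
    "'demo';0.01;0.20;2.5\n"
)
names, frames = read_lld_csv(sample)
print(names, len(frames))
```

For a real file, open it with `open(outfilename, newline="")` and pass the handle to `read_lld_csv`.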
3.2.2 Batch processing
import os
from multiprocessing.dummy import Pool as ThreadPool

# Set your openSMILE extractor and configuration file here
exe_opensmile = 'D:/opensmile-2.3.0/bin/Win32/SMILExtract_Release'
path_config = 'D:/opensmile-2.3.0/config/ComParE_2016.conf'

# Set your data path and output path here
data_path = 'E:/Dataset/IEMOCAP_full_release/allwave'
save_path = './audio_features_ComParE2016'  # output folder

# Extractor set-up
opensmile_options = '-configfile ' + path_config + ' -appendcsvlld 0 -timestampcsvlld 1 -headercsvlld 1'
outputoption = '-lldcsvoutput'

def feature_extract(infilename):
    # Name the instance and the output CSV after the input file
    instname = os.path.splitext(os.path.basename(infilename))[0]
    outfilename = save_path + '/' + instname + '.csv'
    # '-output ?' leaves the default ARFF summary output disabled
    opensmile_call = (exe_opensmile + ' ' + opensmile_options
                      + ' -inputfile ' + infilename + ' ' + outputoption
                      + ' ' + outfilename + ' -instname ' + instname + ' -output ?')
    os.system(opensmile_call)

# Make sure the output folder exists before extraction starts
os.makedirs(save_path, exist_ok=True)

for root, dirs, files in os.walk(data_path):
    for subdir in dirs:
        # Build the folder path from root so that nested folders also resolve
        addr_files = root + '/' + subdir
        # Only hand audio files to the extractor
        wav_files = [addr_files + '/' + fn for fn in os.listdir(addr_files)
                     if fn.lower().endswith('.wav')]
        pool = ThreadPool()
        pool.map(feature_extract, wav_files)
        pool.close()
        pool.join()
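Before launching a long batch run it can help to list which input file maps to which output file. The following is a minimal sketch with `pathlib`; the helper name `plan_jobs` and the demo folder layout are hypothetical, and the demo uses empty placeholder files rather than real audio.

```python
import tempfile
from pathlib import Path


def plan_jobs(data_dir, out_dir):
    """Pair every .wav file under data_dir with an output CSV path in out_dir."""
    data_dir, out_dir = Path(data_dir), Path(out_dir)
    jobs = []
    for wav in sorted(data_dir.rglob("*.wav")):
        # The output file is named after the input file, as in the script above
        jobs.append((wav, out_dir / (wav.stem + ".csv")))
    return jobs


# Small self-contained demo on a temporary folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "session1").mkdir()
    (root / "session1" / "a.wav").touch()
    (root / "session1" / "notes.txt").touch()   # ignored: not a .wav file
    (root / "session2").mkdir()
    (root / "session2" / "b.wav").touch()
    demo_jobs = [(w.name, c.name) for w, c in plan_jobs(root, root / "out")]

print(demo_jobs)
```

Because the output name is derived from the file name alone, two audio files with the same name in different folders would write to the same CSV; checking the planned list for duplicates first avoids silently overwriting results.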
4. Introduction to the configuration files (the feature sets provided)
4.1 Feature sets for emotion analysis
4.2 Detailed introduction
1. IS09_emotion.conf
The names of the 16 low-level descriptors (LLDs) as they appear in the CSV file:
- pcm_RMSenergy: root-mean-square (RMS) energy of the signal frame
- mfcc: Mel-frequency cepstral coefficients 1-12
- pcm_zcr: zero-crossing rate of the time signal (frame-based)
- voiceProb: voicing probability computed from the ACF
- F0: fundamental frequency computed from the cepstrum
The suffix _sma appended to the name of a low-level descriptor indicates that it was smoothed by a moving average filter with a window length of 3.
The suffix _de appended after _sma indicates that the feature is the first-order delta coefficient (differential) of the smoothed low-level descriptor.
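The suffix convention described above can be decoded mechanically when working with the CSV header. The following is a minimal sketch; the helper name `describe_lld_column` is hypothetical, and the handling of a trailing bracketed index such as `[3]` (and the example name `mfcc_sma_de[3]`) is an assumption for illustration, so check it against the header of your own output file.

```python
import re


def describe_lld_column(column):
    """Split an LLD column name into base name, index and processing flags."""
    index = None
    match = re.search(r"\[(\d+)\]$", column)   # optional trailing index (assumed form)
    if match:
        index = int(match.group(1))
        column = column[:match.start()]
    delta = column.endswith("_de")             # first-order delta coefficient
    if delta:
        column = column[:-len("_de")]
    smoothed = column.endswith("_sma")         # moving average smoothing
    if smoothed:
        column = column[:-len("_sma")]
    return {"base": column, "index": index, "smoothed": smoothed, "delta": delta}


# The third name is a made-up example of an indexed column
for name in ["pcm_RMSenergy_sma", "pcm_RMSenergy_sma_de", "mfcc_sma_de[3]"]:
    print(name, "->", describe_lld_column(name))
```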
2. IS10_paraling.conf
The names of the 34 low-level descriptors (LLDs) as they appear in the CSV file:
- pcm_loudness: loudness, computed as the normalized intensity raised to a power of 0.3
- mfcc: Mel-frequency cepstral coefficients 0-14
- logMelFreqBand: logarithmic power of Mel-frequency bands 0-7 (distributed over a range from 0 to 8 kHz)
- lspFreq: the 8 line spectral pair frequencies computed from 8 LPC coefficients
- F0finEnv: envelope of the smoothed fundamental frequency contour
- voicingFinalUnclipped: voicing probability of the final fundamental frequency candidate
The names of the four pitch-related LLDs:
- F0final: smoothed fundamental frequency contour
- jitterLocal: local (frame-to-frame) jitter (pitch period length deviations)
- jitterDDP: differential frame-to-frame jitter (the "jitter of the jitter")
- shimmerLocal: local (frame-to-frame) shimmer (amplitude deviations between pitch periods)
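To make the jitter and shimmer descriptions more concrete, the following is a minimal sketch using common textbook-style definitions: local jitter as the mean absolute difference between consecutive pitch period lengths divided by the mean period, DDP jitter as the same measure applied to the differences of the differences, and local shimmer as the local measure applied to per-period amplitudes. These formulas are an assumption for illustration and are not taken from openSMILE's implementation, so the numbers will not necessarily match its output. The input values are made up.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def ddp_perturbation(values):
    """Mean absolute difference of consecutive differences, relative to the mean."""
    if len(values) < 3:
        return 0.0
    diffs = [b - a for a, b in zip(values, values[1:])]
    second = [abs(b - a) for a, b in zip(diffs, diffs[1:])]
    return (sum(second) / len(second)) / (sum(values) / len(values))


periods = [0.010, 0.011, 0.010, 0.011]   # made-up pitch period lengths in seconds
amplitudes = [1.0, 0.9, 1.0, 0.9]        # made-up peak amplitudes per period

print("jitter (local): ", local_perturbation(periods))
print("jitter (DDP):   ", ddp_perturbation(periods))
print("shimmer (local):", local_perturbation(amplitudes))
```

A perfectly regular signal (identical periods and amplitudes) gives 0 for all three measures; larger values indicate more cycle-to-cycle irregularity.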
3. The other configuration files follow a similar pattern
Ref:
Introduction to openSMILE, qq_22237367's blog, CSDN