Introduction to and use of openSMILE

1. Download and install openSMILE

Download it from the official website: openSMILE - audEERING

On Windows it can be used directly after unpacking the archive.

2. Purpose of openSMILE

openSMILE extracts a wide range of audio features from sound signals, mainly low-level descriptors (LLDs).

3. Using openSMILE

3.1 Direct use under Windows

Run SMILExtract from the command line to extract audio features.
① First change to the directory that contains the executable SMILExtract_Release.exe
② Then run a command of the form:
    SMILExtract_Release -C <configuration file> -I <audio to be processed> -O <path and file name of the feature vector to be saved>

----> Parameters that control the output data format

=============================
-instname <string>             Usually the name of the input file; saved in the first column of the CSV and ARFF output. The default is "unknown"
=============================
-lldcsvoutput, -D <filename>   Enables frame-wise LLD output to a CSV file
-appendcsvlld <0/1>            Set to 1 to append to an existing CSV file. The default is 0 (overwrite)
-timestampcsvlld <0/1>         Set to 0 to disable timestamp output in the second column of the CSV. The default is 1
-headercsvlld <0/1>            Set to 0 to disable the header line of the CSV. The default is 1
=============================
-lldhtkoutput <filename>       Enables frame-wise LLD output to an HTK file
=============================
-lldarffoutput, -D <filename>  Enables frame-wise LLD output to an ARFF file
-appendarfflld <0/1>           Set to 1 to append to an existing ARFF file. The default is 0 (overwrite)
-timestamparfflld <0/1>        Set to 0 to disable timestamp output in the second column of the ARFF. The default is 1
-lldarfftargetsfile <file>     Specifies the configuration include file that defines the target fields (classes). The default is shared/arff_targets_conf.inc
=============================
-output, -O <filename>         The default output option. Writes the feature summaries in ARFF format
-appendarff <0/1>              Set to 0 to not append to an existing ARFF file. The default is 1 (append)
-timestamparff <0/1>           Set to 1 to enable timestamp output in the second column of the ARFF. The default is 0
-arfftargetsfile <file>        Specifies the configuration include file that defines the target fields (classes). The default is shared/arff_targets_conf.inc
=============================
-csvoutput <filename>          The default output option for CSV. Writes the feature summaries in CSV format
-appendcsv <0/1>               Set to 0 to not append to an existing CSV file. The default is 1 (append)
-timestampcsv <0/1>            Set to 0 to disable timestamp output in the second column of the CSV. The default is 1
-headercsv <0/1>               Set to 0 to disable the header line of the CSV. The default is 1
=============================
-htkoutput <filename>          Enables output of the feature summaries (functionals) to an HTK file
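
The LLD-to-CSV options listed above can be assembled programmatically. The following is a minimal sketch, not part of openSMILE itself: the helper name build_lld_csv_args and the file names are made up for illustration, and the options only take effect for configuration files that support these standard output options.

```python
def build_lld_csv_args(config, wav, out_csv, instname=None,
                       append=False, timestamp=True, header=True):
    """Return a SMILExtract argument list for frame-wise LLD output to CSV.

    Hypothetical helper: it only arranges the options documented above.
    """
    args = [
        "-C", config,                             # configuration file
        "-I", wav,                                # audio file to process
        "-lldcsvoutput", out_csv,                 # frame-wise LLD output as CSV
        "-appendcsvlld", str(int(append)),        # 0 = overwrite, 1 = append
        "-timestampcsvlld", str(int(timestamp)),  # 1 = write the timestamp column
        "-headercsvlld", str(int(header)),        # 1 = write the header line
    ]
    if instname is not None:
        args += ["-instname", instname]           # value of the first output column
    return args


# Example with placeholder file names
args = build_lld_csv_args("IS09_emotion.conf", "speech.wav", "speech.csv",
                          instname="speech")
print("SMILExtract_Release " + " ".join(args))
```

Keeping the arguments in a list rather than one long string makes it easier to pass them to a process call later without worrying about quoting.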

3.2 Use in Python

3.2.1 LLD feature extraction for a single audio file

1. Set the path to the openSMILE executable

2. Select the configuration file to use

3. Extract the features by running the command from Python

import subprocess

infilename = 'Ses01F_impro01_F002.wav'
outfilename = 'Ses01F_impro01_F002.csv'

# Path to the openSMILE executable
exe_opensmile = 'D:/opensmile-2.3.0/bin/Win32/SMILExtract_Release'
# Configuration file to use
path_config = 'D:/opensmile-2.3.0/config/ComParE_2016.conf'

# Build the command as an argument list so that paths containing spaces are passed intact
opensmile_call = [exe_opensmile, '-configfile', path_config,
                  '-appendcsvlld', '0', '-timestampcsvlld', '1', '-headercsvlld', '1',
                  '-inputfile', infilename,
                  '-lldcsvoutput', outfilename]
# Run the extraction
subprocess.run(opensmile_call)
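
Once the CSV has been written, it can be loaded back into Python. The sketch below assumes the file is semicolon-separated and starts with a name column and a frameTime column (as produced with -headercsvlld 1 and -timestampcsvlld 1); the sample rows and their values are invented for illustration, so check them against a real output file.

```python
import csv
import io

# Invented sample in the assumed layout: instance name, timestamp, then one column per LLD
sample = (
    "name;frameTime;pcm_RMSenergy_sma;pcm_zcr_sma\n"
    "'Ses01F_impro01_F002';0.00;0.0012;0.031\n"
    "'Ses01F_impro01_F002';0.01;0.0015;0.029\n"
)


def read_lld_csv(handle):
    """Return (feature_names, times, frames) from an LLD CSV file object."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    feature_names = header[2:]          # skip the name and frameTime columns
    times, frames = [], []
    for row in reader:
        if not row:                     # ignore empty lines
            continue
        times.append(float(row[1]))
        frames.append([float(value) for value in row[2:]])
    return feature_names, times, frames


feature_names, times, frames = read_lld_csv(io.StringIO(sample))
print(feature_names, len(frames))
```

For a real file, replace io.StringIO(sample) with open(outfilename, newline='').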

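3.2.2 Batch processing

import os
import subprocess
from multiprocessing.dummy import Pool as ThreadPool

# Set your openSMILE extractor and configuration file here
exe_opensmile = 'D:/opensmile-2.3.0/bin/Win32/SMILExtract_Release'
path_config = 'D:/opensmile-2.3.0/config/ComParE_2016.conf'

# Set your data path and output path here
data_path = "E:/Dataset/IEMOCAP_full_release/allwave"
save_path = './audio_features_ComParE2016/'  # output folder

# Extractor set-up: frame-wise LLD output to CSV with timestamps and a header line
opensmile_options = ['-configfile', path_config,
                     '-appendcsvlld', '0', '-timestampcsvlld', '1', '-headercsvlld', '1']
outputoption = '-lldcsvoutput'

def feature_extract(infilename):
    # Use the file name without its extension as the instance name
    instname = os.path.splitext(os.path.basename(infilename))[0]
    outfilename = os.path.join(save_path, instname + '.csv')
    # '-output ?' passes '?' as the file name of the default ARFF output,
    # which switches that output off so that only the LLD CSV is written
    opensmile_call = [exe_opensmile] + opensmile_options + [
        '-inputfile', infilename, outputoption, outfilename,
        '-instname', instname, '-output', '?']
    subprocess.run(opensmile_call)

# Create the output folder if it does not exist yet
os.makedirs(save_path, exist_ok=True)

# Collect the .wav files in every subfolder of data_path
wav_files = []
for root, dirs, files in os.walk(data_path):
    for fn in files:
        if fn.lower().endswith('.wav'):
            wav_files.append(os.path.join(root, fn))

# Run the extractions in a thread pool; each worker only waits for an external process
pool = ThreadPool()
pool.map(feature_extract, wav_files)
pool.close()
pool.join()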
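
In a large batch, individual extractions can fail without being noticed. A possible way to keep track of them is sketched below, assuming that SMILExtract reports failure through a non-zero exit code. The helper names run_one and run_batch are hypothetical, and the Python interpreter is used as a stand-in command so that the sketch runs without openSMILE installed.

```python
import subprocess
import sys
from multiprocessing.dummy import Pool as ThreadPool


def run_one(job):
    """Run one command and return (name, exit code)."""
    name, command = job
    result = subprocess.run(command,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return name, result.returncode


def run_batch(jobs, workers=4):
    """Run all jobs in a thread pool and return the names whose command failed."""
    with ThreadPool(workers) as pool:
        results = pool.map(run_one, jobs)   # blocks until every job has finished
    return [name for name, code in results if code != 0]


# Stand-in commands: one exits with 0 (success), one with 3 (failure)
jobs = [
    ("ok.wav", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("bad.wav", [sys.executable, "-c", "raise SystemExit(3)"]),
]
failed = run_batch(jobs)
print("failed:", failed)
```

In the batch script, each job would pair a .wav file name with its SMILExtract argument list, and the returned names could be retried or logged.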

4. Introduction to the configuration files (the feature sets provided)

4.1 Feature sets for emotion analysis

4.2 Detailed introduction

1.  IS09_emotion.conf

The names of the 16 low-level descriptors (LLDs) as they appear in the CSV file:

  1. pcm_RMSenergy: root-mean-square (RMS) energy of the signal frame
  2. mfcc: Mel-frequency cepstral coefficients 1-12
  3. pcm_zcr: zero-crossing rate of the time signal (frame-based)
  4. voiceProb: voicing probability computed from the ACF (autocorrelation function)
  5. F0: fundamental frequency computed from the cepstrum
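
To make the first and third descriptors concrete, here is a minimal sketch of the textbook definitions of RMS energy and zero-crossing rate for one frame. It illustrates the quantities only and is not openSMILE's implementation; details such as windowing and normalisation may differ.

```python
import math


def rms_energy(frame):
    """Root of the mean of the squared samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


frame = [0.5, -0.5, 0.5, -0.5]      # toy frame that changes sign at every sample
print(rms_energy(frame))            # 0.5
print(zero_crossing_rate(frame))    # 1.0
```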

The suffix _sma appended to the name of a low-level descriptor indicates that it was smoothed by a moving average filter with a window length of 3.
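
As an illustration of this kind of smoothing, the sketch below applies a plain centred moving average over 3 frames. How the first and last frames are treated is an assumption made here (they are averaged over the neighbours that exist); openSMILE may handle the edges differently.

```python
def moving_average3(values):
    """Centred moving average with a window of 3 frames.

    Edge frames are averaged over the available neighbours only (an assumption).
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]   # previous, current and next frame
        smoothed.append(sum(window) / len(window))
    return smoothed


contour = [0.0, 3.0, 6.0, 3.0]
print(moving_average3(contour))   # [1.5, 3.0, 4.0, 4.5]
```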

The suffix _de appended after _sma indicates that the feature is the first-order delta coefficient (differential) of the smoothed low-level descriptor.
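
The naming scheme can be decoded mechanically when working with the CSV header. The following sketch splits a column name into its descriptor and the two suffixes. The helper name describe_column is hypothetical, and the trailing "[n]" index used for multi-valued descriptors such as the MFCCs is an assumption about the header format.

```python
def describe_column(name):
    """Split a column name into descriptor, optional index and suffix flags."""
    index = None
    if name.endswith("]") and "[" in name:
        # Assumed form of multi-valued descriptors: "<name>[<index>]"
        name, _, raw_index = name[:-1].rpartition("[")
        index = int(raw_index)
    is_delta = name.endswith("_de")        # first-order delta coefficient
    if is_delta:
        name = name[:-len("_de")]
    is_smoothed = name.endswith("_sma")    # moving-average smoothed
    if is_smoothed:
        name = name[:-len("_sma")]
    return {"descriptor": name, "index": index,
            "smoothed": is_smoothed, "delta": is_delta}


print(describe_column("pcm_RMSenergy_sma"))
print(describe_column("mfcc_sma_de[3]"))
```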

2.  IS10_paraling.conf

The names of the 34 low-level descriptors (LLDs) as they appear in the CSV file:

  1. pcm_loudness: loudness, computed as the normalised intensity raised to a power of 0.3
  2. mfcc: Mel-frequency cepstral coefficients 0-14
  3. logMelFreqBand: logarithmic power of Mel-frequency bands 0-7 (distributed over the range from 0 to 8 kHz)
  4. lspFreq: the 8 line spectral pair frequencies computed from 8 LPC coefficients
  5. F0finEnv: envelope of the smoothed fundamental frequency contour
  6. voicingFinalUnclipped: voicing probability of the final fundamental frequency candidate
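
The loudness definition in item 1 can be written out in a few lines. This is a sketch under stated assumptions: intensity is taken as the mean of the squared samples without any window function, and the reference intensity of 1e-6 used for normalisation is an assumed value passed as a parameter, not something taken from the text above.

```python
def loudness(frame, reference_intensity=1e-6, exponent=0.3):
    """Normalised frame intensity raised to the given power.

    reference_intensity is an assumed normalisation constant.
    """
    intensity = sum(x * x for x in frame) / len(frame)   # mean squared amplitude
    return (intensity / reference_intensity) ** exponent


quiet = [1e-3, -1e-3, 1e-3, -1e-3]
louder = [2e-3, -2e-3, 2e-3, -2e-3]
print(loudness(quiet), loudness(louder))
```

Because of the exponent 0.3, doubling the amplitude (four times the intensity) raises the loudness value by a factor of only about 1.5.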

The names of the four pitch-related LLDs:

  1. F0final: smoothed fundamental frequency contour
  2. jitterLocal: local (frame-to-frame) jitter (deviation in pitch period length)
  3. jitterDDP: differential frame-to-frame jitter (the "jitter of the jitter")
  4. shimmerLocal: local (frame-to-frame) shimmer (deviation in amplitude between pitch periods)
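
The jitter and shimmer descriptors can be illustrated with the common textbook definitions, applied to a list of pitch period lengths and peak amplitudes. This is a sketch of those definitions only; openSMILE computes the values per frame from its own pitch period detection, and the sample values below are invented.

```python
def local_perturbation(values):
    """Mean absolute difference of neighbouring values, relative to their mean.

    Applied to period lengths this gives local jitter,
    applied to period amplitudes it gives local shimmer.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_ddp(periods):
    """Mean absolute difference of consecutive period differences, relative to the mean period."""
    first = [b - a for a, b in zip(periods, periods[1:])]
    second = [abs(b - a) for a, b in zip(first, first[1:])]
    return (sum(second) / len(second)) / (sum(periods) / len(periods))


periods = [0.010, 0.011, 0.010, 0.011]   # pitch period lengths in seconds
amplitudes = [1.0, 0.9, 1.0, 0.9]        # peak amplitude of each period

print(local_perturbation(periods))       # local jitter
print(jitter_ddp(periods))               # differential jitter
print(local_perturbation(amplitudes))    # local shimmer
```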

3. The other configuration files are structured similarly

Ref:

Introduction to openSMILE, qq_22237367's blog, CSDN

Keywords: Python AI Deep Learning

Added by kjtocool on Sat, 15 Jan 2022 23:23:21 +0200