I would like to add a gender detection capability to a news video translator app I'm working on, so that the app can switch between male and female voice according to the voice onscreen. I'm not expecting 100% accuracy.
I used EZAudio to obtain waveform data of a time period of audio and used the average RMS value to set a threshold(cutOff) value between male and female. Initially cutOff = 3.3.
- (void)setInitialVoiceGenderDetectionParameters:(NSArray *)arrayAudioDetails
{
float initialMaleAvg = ((ConvertedTextDetails *)[arrayAudioDetails firstObject]).audioAverageRMS;
// The average RMS value of a time period of Audio, say 5 sec
float initialMaleVector = initialMaleAvg * 80;
// MaleVector is the parameter to change the threshold according to different news clippings
cutOff = (initialMaleVector < 5.3) ? initialMaleVector : 5.3;
cutOff = (initialMaleVector > 23) ? initialMaleVector/2 : 5.3;
}
Initially adjustValue = -0.9 and tanCutOff = 0.45. These values 5.3, 23, cutOff, adjustValue and tanCutOff are obtained from rigorous testing. Also tan of values are used to magnify the difference in values.
- (BOOL)checkGenderWithPeekRMS:(float)pRMS andAverageRMS:(float)aRMS
{
//pRMS is the peak RMS value in the audio snippet and aRMS is the average RMS value
BOOL male = NO;
if(tan(pRMS) < tanCutOff)
{
if(pRMS/aRMS > cutOff)
{
cutOff = cutOff + adjustValue;
NSLog(@"FEMALE....");
male = NO;
}
else
{
NSLog(@"MALE....");
male = YES;
cutOff = cutOff - adjustValue;
}
}
else
{
NSLog(@"FEMALE.");
male = NO;
}
return male;
}
Usage of the adjustValue is to calibrate the threshold each time a news video is translated as each video has different noise levels. But I know this method is noob-ish. What can I do create a stable threshold? or How can I normalise each audio snippet?
Alternate or more efficient ways to determine gender from audio wave data is also welcome.
Edit: From Nikolay's suggestion I researched on gender recognition using CMU Sphinx. Can anybody suggest how can I extract MFCC features and feed into a GMM/SVM classifier using Open Ears (CMU Sphinx for iOS platform) ?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…