Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


tensorflow - What does the shape of a spectrogram really mean?

I have the following code taken from this tutorial.

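import tensorflow as tf

def get_spectrogram(waveform):
  # Zero-pad the waveform at the end so every clip has exactly 4900 samples.
  zero_padding = tf.zeros([4900] - tf.shape(waveform), dtype=tf.float32)
  waveform = tf.cast(waveform, tf.float32)
  equal_length = tf.concat([waveform, zero_padding], 0)
  # Short-time Fourier transform: 256-sample frames taken every 128 samples.
  spectrogram = tf.signal.stft(equal_length, frame_length=256, frame_step=128)
  # Keep only the magnitude of each complex STFT value.
  spectrogram = tf.abs(spectrogram)

  return spectrogram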

spectrogram = get_spectrogram(waveform)
print('Spectrogram shape:', spectrogram.shape)

And I get the following output for the spectrogram shape.

Spectrogram shape: (37, 129)

What do the first and second values mean?

If I have 4900 samples and a frame_step of 128, shouldn't the first value be 38?

4900/128 = 38.28125 -> 38 rounded

It also happens that with a Kotlin library I get a shape of (38, 127).

I need to understand this because I am implementing a model on Android with TFLite, so I am pre-processing the data on the mobile device.

question from:https://stackoverflow.com/questions/65838342/what-does-the-shape-of-a-spectrogram-really-mean


1 Answer


I'm not familiar with the Python API specifically, but assuming it does something similar to WaveBeans, which I know very well, what you've got is a 2-dimensional matrix.

What you're doing is a Short-Time Fourier Transform (STFT), which is basically taking the FFT repeatedly over time. While the magnitude or phase of a single FFT is 2-dimensional data (frequency and value) that can be represented as a 1-dimensional vector, the STFT is 3-dimensional data because it also has a time axis, which is why it is represented as a 2-dimensional matrix.

So the first dimension (37 in your TensorFlow output, 38 with the Kotlin library) is the time index and the second dimension (129 or 127) is the frequency index. The values are the FFT values for a specific time-frequency bin, which are originally complex numbers. Thinking of a complex number in polar coordinates, the phase is the angle and the magnitude is the length. Your code takes the magnitude by calling tf.abs(), so you've already got rid of the complex number representation.
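
As a rough check of the shape in the question, the following sketch predicts the number of frames and frequency bins from arithmetic alone. It assumes tf.signal.stft is called with its default pad_end=False and its default fft_length (the smallest power of two that covers frame_length). stft_shape is a hypothetical helper name, not a TensorFlow function.

```python
def stft_shape(num_samples, frame_length, frame_step, fft_length=None):
    """Predict the (frames, bins) shape of an STFT without end padding."""
    if fft_length is None:
        # Assumption: the FFT size defaults to the smallest power of two
        # that is greater than or equal to frame_length.
        fft_length = 1
        while fft_length < frame_length:
            fft_length *= 2
    # Only frames that fit entirely inside the signal are kept.
    if num_samples < frame_length:
        num_frames = 0
    else:
        num_frames = 1 + (num_samples - frame_length) // frame_step
    # A real-valued input yields fft_length // 2 + 1 non-redundant bins.
    num_bins = fft_length // 2 + 1
    return num_frames, num_bins


print(stft_shape(4900, 256, 128))  # (37, 129)
```

Dividing 4900 by 128 counts steps rather than whole frames: the last 256-sample frame has to fit entirely inside the signal, which gives 1 + (4900 - 256) // 128 = 37 rather than 38. The 129 is fft_length / 2 + 1 for fft_length = 256. A library that handles the trailing partial frame differently, or that drops some bins, would report a different shape; that may be the reason for the (38, 127) from the Kotlin library, though this is only a guess.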

WaveBeans has an API for working with the FFT, specifically for extracting the phase and magnitude as well as the frequency and time values.

To make the answer complete, here is a code snippet:

// let's take simple sine as an example
val waveformAsAStream = 440.sine().trim(1000)

val fftStream = waveformAsAStream
    .window(256,128)
    // zero padding is already done inside, but if window.size == fft.size it doesn't really do anything
    .fft(256)

// evaluate it, for example as a kotlin sequence
val stft = fftStream.asSequence(44100.0f)
    .toList()

// get the specific sample for the sake of the demonstration
val fftSample = stft.drop(10).first()

// get time in nanoseconds
fftSample.time()
// outputs the time of the taken sample:
// 29024943

// get frequencies values
fftSample.frequency().toList()
// outputs a list of size 128, each element is a frequency in Hz :
// [0.0, 172.265625, 344.53125, 516.796875, 689.0625, ..., 21360.9375, 21533.203125, 21705.46875, 21877.734375]

// get magnitude values
fftSample.magnitude().toList()
//  outputs a list of size 128, each element is magnitude value for specific bin in dB:
// [29.629418039768613, 31.125367384785786, 38.077554502661705, 38.480916556622745, ..., -11.57802246867041]

// get the index of the bin closest to the given frequency
fftSample.bin(440.0)
// outputs:
// 3

// get the magnitude in the FFT spectrogram of the specific frequency
fftSample.magnitude().toList()[fftSample.bin(440.0)]
// outputs:
// 38.480916556622745
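
The time, frequency, and bin values printed above can be reproduced with plain arithmetic. The sketch below is in Python and assumes a 44100 Hz sample rate, an FFT size of 256, and a step of 128, as in the snippet. The function names are hypothetical, and whether WaveBeans computes the values exactly this way is an assumption, although the numbers agree with the outputs shown.

```python
SAMPLE_RATE = 44100  # Hz, the rate passed to asSequence() in the snippet
FFT_SIZE = 256       # number of points in each FFT
STEP = 128           # samples between the starts of consecutive windows


def frame_time_ns(frame_index):
    """Start time of a frame in nanoseconds (integer arithmetic)."""
    return frame_index * STEP * 1_000_000_000 // SAMPLE_RATE


def bin_frequency(bin_index):
    """Frequency in Hz at the given FFT bin."""
    return bin_index * SAMPLE_RATE / FFT_SIZE


def closest_bin(frequency_hz):
    """Index of the FFT bin whose frequency is nearest to frequency_hz."""
    return round(frequency_hz * FFT_SIZE / SAMPLE_RATE)


print(frame_time_ns(10))   # 29024943
print(bin_frequency(1))    # 172.265625
print(closest_bin(440.0))  # 3
```

Each bin is 44100 / 256 = 172.265625 Hz wide, so 440 Hz falls between bins 2 and 3 and is closer to bin 3 (516.796875 Hz).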

For better FFT output, though, I would recommend applying a window function (Hamming is a popular one) and using smaller windows. Zero padding takes care of the alignment in that case, since the FFT requires a specific input length. Something like this:

waveformAsAStream
    .window(101, 85)
    .hamming()
    .fft(256)
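
To illustrate what windowing followed by zero padding does to a single frame, here is a minimal Python sketch. It uses the textbook Hamming formula with coefficients 0.54 and 0.46, which is an assumption: the exact coefficients and padding placement used by WaveBeans may differ. hamming and window_and_pad are hypothetical helper names.

```python
import math


def hamming(n):
    """Textbook symmetric Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def window_and_pad(frame, fft_size):
    """Apply a Hamming window to the frame, then zero-pad it to fft_size."""
    weights = hamming(len(frame))
    windowed = [x * w for x, w in zip(frame, weights)]
    # Assumption: the padding zeros are appended at the end of the frame.
    return windowed + [0.0] * (fft_size - len(windowed))


frame = [1.0] * 101                  # a 101-sample frame of constant value
padded = window_and_pad(frame, 256)  # tapered, then padded to the FFT size
print(len(padded))                   # 256
```

The window tapers the edges of the frame towards zero, which reduces spectral leakage, and the padding brings the 101 samples up to the 256 points the FFT expects.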

If you want to play around with the values, you can use a Kotlin Jupyter notebook with the WaveBeans library; check it out on GitHub.

