In this post, I’m going to explain all the different parts of a Speech Analysis window in WaveSurfer.
Going from the top, there’s a waveform (波形 in Japanese), which shows the changes in air pressure that the microphone detects. All of the information that you can hear is in that waveform, but it’s hard to read directly. That’s why we break it down into the other parts, which I’ll explain below.
Next is the spectrogram, which shows how strong each frequency is at each moment. Time runs from left to right, frequency runs from bottom to top, and the darker a band is, the stronger that frequency is. If you play with it a little, you’ll notice that sounds like ‘sh’, ‘f’ and ‘s’ have a lot of energy at high frequencies, so there’s a lot of black at the top. Sounds like ‘a’ and ‘o’ mostly have dark bands at the bottom. In WaveSurfer, you can also see four colored lines. The red one helps you find the first formant frequency, which we abbreviate ‘F1’. Green shows ‘F2’, blue shows ‘F3’, and yellow shows ‘F4’. Formants are the frequencies that get stronger as the sound wave resonates on its way through your throat, past your tongue, and out your mouth.
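To make ‘which frequencies are strongest’ a bit more concrete, here is a minimal sketch of how one vertical slice of a spectrogram can be computed. It is not WaveSurfer’s actual code. The sample rate, the frame size and the function names are all assumptions made for this example, and real tools also apply a window function and use a much faster transform.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this example)
FRAME_SIZE = 256    # samples in one short slice of sound (assumed)

def magnitude_spectrum(frame):
    """Strength of each frequency bin in one frame, using a plain DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):  # only bins up to half the sample rate are useful
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))
    return spectrum

def strongest_frequency(frame, sample_rate):
    """Frequency in Hertz of the darkest point in this slice."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(frame)

# A pure 1000 Hz tone should give one dark band at 1000 Hz.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
        for t in range(FRAME_SIZE)]
peak_hz = strongest_frequency(tone, SAMPLE_RATE)
print(peak_hz)  # 1000.0
```

A spectrogram is this calculation repeated for one short frame after another, with each result drawn as a column of light and dark pixels.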
Third is the intensity curve or ‘power plot’, which shows how loud the sound is. By default, WaveSurfer doesn’t show this, so you need to right-click, choose ‘Create Pane’, and click ‘Power Plot’. Notice that it’s usually high where the waveform is thick and low where the waveform is thin. When sounds are stressed in English (and Korean, Mandarin, and many other languages), they get louder, and we can measure that with this graph. The units are decibels (dB). A whisper is about 20 decibels, a conversation is usually around 60 decibels, and thunder is around 120 decibels. Anything over 125 decibels will hurt your ears quickly, and anything over 85 or 90 can damage your hearing if you listen to it for a long time. More info on that here.
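If you are curious how a decibel value comes out of a waveform, here is a rough sketch. The function names and the reference value of 1.0 are assumptions for this example, not WaveSurfer’s settings. The everyday figures above are sound pressure levels, measured against a fixed reference pressure, so the numbers on a software plot will only match them if the recording setup has been calibrated.

```python
import math

def rms(samples):
    """Root-mean-square amplitude: roughly how 'thick' the waveform is."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, reference=1.0):
    """Level in decibels relative to an (assumed) reference amplitude."""
    return 20 * math.log10(rms(samples) / reference)

# The same sound at two volumes: the second is exactly twice as big.
quiet = [0.1 * math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
loud = [2 * s for s in quiet]

difference_db = level_db(loud) - level_db(quiet)
print(round(difference_db, 2))  # 6.02
```

Because the scale is logarithmic, doubling the amplitude always adds about 6 decibels, no matter how loud the sound was to begin with.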
Finally, at the bottom there’s a ‘pitch contour’, which shows how high or low the sound is, like notes on a musical scale. Strictly speaking, pitch is how high or low something sounds to our brain, and human brains aren’t very accurate at judging it. The physical property that we actually measure is called the fundamental frequency, and we abbreviate that ‘F0’. In Japanese, we use pitch to tell certain words apart, like 端 (edge), 箸 (chopsticks) and 橋 (bridge), 腫れる (to swell) and 晴れる (to clear up), or 海 (sea) and 膿 (pus). Most languages, maybe even all of them, also use pitch to signal other meanings, such as showing that something is a question.
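One simple way to estimate F0 is to slide a copy of the waveform along itself and look for the shift where the two copies line up best, since a voiced sound repeats once per cycle. The sketch below shows only that basic idea. The sample rate, the function name and the test signal are assumptions for this example, the 75 to 500 Hertz search range is just a typical range for voices, and the pitch trackers in real tools are more careful than this.

```python
import math

SAMPLE_RATE = 8000       # samples per second (assumed for this example)
MIN_F0, MAX_F0 = 75, 500  # search range in Hertz (assumed)

def estimate_f0(frame, sample_rate):
    """Estimate F0 by finding the shift at which the frame best matches itself."""
    shortest_lag = sample_rate // MAX_F0  # highest pitch = shortest cycle
    longest_lag = sample_rate // MIN_F0   # lowest pitch = longest cycle

    def similarity(lag):
        # Multiply the frame by a shifted copy of itself and add it all up.
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    best_lag = max(range(shortest_lag, longest_lag + 1), key=similarity)
    return sample_rate / best_lag

# A voice-like test sound: a 200 Hz fundamental plus a weaker 400 Hz harmonic.
voice_like = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
              + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
              for t in range(400)]
f0 = estimate_f0(voice_like, SAMPLE_RATE)
print(f0)  # 200.0
```

A pitch contour is what you get when you repeat this for one short frame after another and connect the results.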
The same sample will look a little different in Praat, but we can get the same information. The top is still a waveform, and the second part is still a spectrogram. If you want Praat to show you a pitch contour or an intensity curve, or to help you find formants, you can use the menus at the top.
The red dots on the spectrogram help you to find the formant frequencies. The blue lines show the pitch, but be careful! The pitch is on a different scale from the spectrogram. The spectrogram goes from 0 to 5000 Hertz by default (scale on the left side), but the pitch contour goes from 75 to 500 Hertz (scale on the right side). Finally, the yellow line shows the intensity curve.
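To see why the two scales matter, here is a small worked example of where the same frequency lands on each one. The function name is made up for this example, and it assumes both axes are drawn linearly in Hertz with the default ranges mentioned above.

```python
def height_on_axis(frequency_hz, axis_min_hz, axis_max_hz):
    """How far up the picture a frequency is drawn (0 = bottom, 1 = top)."""
    return (frequency_hz - axis_min_hz) / (axis_max_hz - axis_min_hz)

SPECTROGRAM_AXIS = (0, 5000)  # left-hand scale, in Hertz
PITCH_AXIS = (75, 500)        # right-hand scale, in Hertz

f0_hz = 250
on_spectrogram = height_on_axis(f0_hz, *SPECTROGRAM_AXIS)
on_pitch_axis = height_on_axis(f0_hz, *PITCH_AXIS)

print(round(on_spectrogram, 2))  # 0.05
print(round(on_pitch_axis, 2))   # 0.41
```

So a blue pitch line drawn about 40% of the way up means roughly 250 Hertz, not the 2000 Hertz you would get by reading it against the spectrogram scale on the left.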
That should cover the very basics of what you’re seeing and what everything means. Please leave a comment if you have any questions!