Understanding speech analysis software

In this post, I’m going to explain all the different parts of a Speech Analysis window in WaveSurfer.

Speech Analysis in WaveSurfer

Going from the top, there’s a waveform (波形 in Japanese), and that shows the wave of sound in the air that the microphone detects. Really, all of the information that you can hear is in that waveform, but it’s hard to read. That’s why we divide it up into the other parts, which I’ll explain further.

Next is the spectrogram, which shows which frequencies are strongest at each moment: the darker the band, the stronger that frequency. If you play with it a little, you’ll notice that sounds like ‘sh’, ‘f’, and ‘s’ have really high frequencies, with a lot of black stuff at the top. Sounds like ‘a’ and ‘o’ mostly have dark bands at the bottom. In WaveSurfer, you can also see four colored lines. The red one helps you find the first formant frequency, which we abbreviate ‘F1’. Green shows ‘F2’, blue shows ‘F3’, and yellow shows ‘F4’. These are the frequencies that get stronger when the sound wave passes through your throat, past your tongue, and out your mouth.
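If you like to see ideas in code, here is a minimal Python sketch of what one vertical slice of a spectrogram measures: how strong each frequency is in a short piece of the waveform. This is only an illustration, not how WaveSurfer itself is implemented. The function names, the 8000 Hz sample rate, and the 500 Hz test tone are all assumptions made up for the example.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Strength of each frequency bin in one short window (plain DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):  # only bins below half the sample rate
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags

def strongest_frequency(samples, sample_rate):
    """Frequency in Hz of the bin with the largest magnitude."""
    mags = magnitude_spectrum(samples)
    peak_bin = max(range(len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(samples)

# Hypothetical test signal: a pure 500 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(256)]
print(strongest_frequency(tone, sample_rate))  # 500.0
```

A real spectrogram repeats this for many short windows, one after another, and paints the stronger frequencies darker.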

Third is the intensity curve or ‘power plot’, and that shows how loud the sound is. By default, WaveSurfer doesn’t show this, so you need to right-click, choose ‘Create Pane’, and click ‘Power Plot’. Notice that it’s usually high when the waveform is thick and low when the waveform is thin. When sounds are stressed in English (and Korean and Mandarin and a bunch of other languages), they get louder, and we can measure that with this graph. The units are decibels. A whisper is about 20 decibels, a conversation is usually around 60 decibels, and thunder is around 120 decibels. Anything over 125 will hurt your ears fast, but even 85 or 90 can be bad if you listen to it for a long time. More info on that here.
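To make ‘thick waveform means high intensity’ a bit more concrete, here is a small Python sketch that turns the size of a waveform into decibels. It is a sketch under assumptions, not WaveSurfer’s own calculation: the function names and the test signals are made up, and the reference level of 1.0 is arbitrary, so the numbers are relative levels and not the real-world decibel values listed above.

```python
import math

def rms(samples):
    """Root-mean-square amplitude: roughly how 'thick' the waveform is."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, reference=1.0):
    """Level in decibels relative to an assumed reference amplitude."""
    return 20 * math.log10(rms(samples) / reference)

# Hypothetical signals: the same 200 Hz tone, once quiet and once twice as big.
sample_rate = 8000
quiet = [0.1 * math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(800)]
loud = [2 * s for s in quiet]

print(round(level_db(loud) - level_db(quiet), 2))  # 6.02
```

Doubling the height of the waveform adds about 6 decibels, which is why the scale can cover everything from a whisper to thunder with fairly small numbers.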

Finally, on the bottom there’s a ‘pitch contour’, and that shows how high or low the sound is. It’s like a music scale. Pitch really means how high or low something sounds in our brain, but human brains aren’t very accurate, actually. The real, physical measurement is called the fundamental frequency, and we abbreviate that ‘F0’. In Japanese, we use pitch to tell certain words apart, like 端 (edge), 箸 (chopsticks) and 橋 (bridge), 腫れる (to swell) and 晴れる (to clear up), or 海 (sea) and 膿 (pus). Most languages, maybe even all, also use pitch to give other meanings, like showing when something is a question.
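Here is a rough Python sketch of one common way to estimate F0 from a waveform: find how far you have to shift the wave before it lines up with itself again (autocorrelation). This is an illustration of the idea only, and I am not claiming it is the exact method WaveSurfer or Praat uses. The function name, the 75 to 500 Hz search range, and the test signal are assumptions for the example.

```python
import math

def estimate_f0(samples, sample_rate, min_hz=75.0, max_hz=500.0):
    """Estimate F0 by finding the shift (lag) where the wave best matches itself."""
    min_lag = int(sample_rate / max_hz)   # short lag = high pitch
    max_lag = int(sample_rate / min_hz)   # long lag = low pitch
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[t] * samples[t + lag]
                    for t in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical 'voiced' sound: a 200 Hz fundamental plus two weaker harmonics.
sample_rate = 8000
voiced = [math.sin(2 * math.pi * 200 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
          + 0.25 * math.sin(2 * math.pi * 600 * t / sample_rate)
          for t in range(800)]

print(estimate_f0(voiced, sample_rate))  # 200.0
```

Real speech is much messier than this test signal, so real pitch trackers add extra steps to avoid jumping to the wrong octave or reporting a pitch for sounds like ‘s’ that have none.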

A speech sample in Praat

The same sample will look a little different in Praat, but we can get the same information. The top is still a waveform, and the second part is still a spectrogram. If you want Praat to show you a pitch contour or an intensity curve, or to help you find formants, you can use the menus at the top.

A speech sample in Praat with pitch, formants and intensity shown.

The red dots on the spectrogram help you to find the formant frequencies. The blue lines show the pitch, but be careful! The pitch is on a different scale from the spectrogram. The spectrogram goes from 0 to 5000 Hertz by default (scale on the left side), but the pitch contour goes from 75 to 500 Hertz (scale on the right side). Finally, the yellow line shows the intensity curve.
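To see why the two scales matter, here is a tiny Python sketch that works out how high up the picture the same 200 Hz value would be drawn on each scale. It assumes both axes are linear and uses the default ranges mentioned above; the function name and the 200 Hz value are made up for the example.

```python
def relative_height(frequency_hz, axis_min_hz, axis_max_hz):
    """Return 0.0 for the bottom of the axis and 1.0 for the top."""
    return (frequency_hz - axis_min_hz) / (axis_max_hz - axis_min_hz)

f0 = 200.0  # hypothetical pitch value in Hz
on_pitch_axis = relative_height(f0, 75.0, 500.0)         # right-hand scale
on_spectrogram_axis = relative_height(f0, 0.0, 5000.0)   # left-hand scale

print(round(on_pitch_axis, 2), round(on_spectrogram_axis, 2))  # 0.29 0.04
```

So a blue pitch line drawn about 29% of the way up would sit at only 4% of the way up if you read it against the spectrogram scale. Always read the pitch from the right side.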

That should cover the very basics of what you’re seeing and what everything means. Please leave a comment if you have any questions!


  1. Hitomi in the pronunciation class of Sophi says:

    Thank you for your nice posts about pronunciation.
    I have some questions.

    What should I do to make a clear, easily understood wedge sound, other than putting my tongue in the middle of my mouth?

    • gengojeff says:

      /ʌ/ is kind of difficult. Actually, not everyone agrees on where we should write it on the chart. Some people think it’s in the back, not the middle. If you look on my vowel chart in Excel, you can see that my /ʌ/ is very very close to /ɑ/!

      One easy way is to say an /o/ sound, but don’t make your lips round. I think this sound is really close to how I make an English /ʌ/. It sounds very different, so it’s easy to hear. I think this is the same sound we make in Japanese when we try to have ‘cute’ pronunciation. Try to say 「はしゃぎましょう!」 (“Let’s have fun!”) with a big smile and kind of メイド喫茶 (maid café), モエ (moe)-type voice. Usually when we make that voice, our tongue goes forward and we don’t round our lips.

      Another way is to say an /e/ sound, like へぇぇぇぇ. Then, pull your tongue back. Try to sound like you’re really, really dumb. That sounds a lot like a “huh” sound /hʌ/ in English, I think.

  2. Hitomi in the pronunciation class of Sophi says:

    What should I do to make a clear ‘funny u’ sound, other than putting my tongue high in the back of my mouth (further front and lower than the [u] sound)?


    • gengojeff says:

      That’s a great question! Unfortunately, the really, really correct answer is complicated. Actually, there are a lot of ways, and even scientists don’t completely agree.

      Here’s what I think. English has two kinds of vowels: tense vowels (緊張母音) and lax vowels (弛緩母音). /i/, /e/, /u/, and /o/ are tense. /ɪ/, /ɛ/, /ʊ/, and /ɔ/ are lax. There are different rules for when we can use lax and tense vowels. English words can never end with a lax vowel, so that’s why we say ‘karady’ when we pronounce 空手 (karate), or ‘saki’ when we pronounce 酒 (sake).

      For me, it feels like the back of my tongue relaxes to make lax vowels. Actually, I think Japanese people make lax vowels, too. It’s just that in Japanese, it doesn’t make a different word. Think about how you would say the る (ru) sound in 帰る (kaeru, ‘to go home’) if you were on NHK news. Now, think about how a 19-year-old boy says 帰る to his friends. I think the boy will usually make a lazy kind of sound, and that sounds a lot like /ʊ/ to me. So if you pronounce a う (u) sound in a lazy way, that can sound like the English /ʊ/.

      Another way is to make a diphthong (二重母音). That’s what I do in my native accent. In class, I try really hard to make a clear /ʊ/ sound to help you hear, but I really don’t use it when I talk. My pronunciation sounds like /uə/. I start with my tongue far back and high, with my lips rounded a little. Then I let my tongue relax to the middle of my mouth and unround my lips. So it’s like I take a little trip, and I pass through /ʊ/. I think this way is probably a lot easier to learn, actually.

      Thank you for your question!

