A dataset containing information on 3,504 vowel tokens. It is formatted very nearly the way FAVE outputs its spreadsheets, so it is useful for demonstrating code that assumes you have a FAVE-produced spreadsheet.

darla

Format

A data frame with 3,504 rows and 43 variables. There are probably better explanations of these variables elsewhere on the internet, but here's my description (a quick way to peek at these columns in R follows the list):

name

speaker name. Since this is my own data, it's just "joey"

sex

speaker sex. Since it's me, it's "M" for "male"

vowel

the vowel, in ARPABET. Since DARLA uses the CMU Pronouncing Dictionary, vowels are transcribed in ARPABET. This is a handy transcription system since all General American English vowels are represented with two-letter codes.

Descriptions of the other columns will be added later.
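
If you want a quick look at the columns documented above, something like the following works. This is a minimal sketch that assumes the dataset can be loaded with data() from this package and only refers to the columns described so far.

library(dplyr)

data(darla)

# How many tokens of each ARPABET vowel are there?
darla %>%
  count(vowel, sort = TRUE)

# Confirm the speaker metadata (one speaker, "joey", coded "M")
distinct(darla, name, sex)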

Details

To create this dataset, I first selected 300 sentences from COCA. This was part of a research project, and before having participants read the sentences, I wanted to run through them myself. The sentences were chosen because they contained key words I wanted to elicit.

It is not the cleanest dataset. I sat at my kitchen table and read the 300 sentences with a decent microphone, but I had a bit of a sore throat, so my voice did not sound the way it normally does. Plus, there was an infant screaming during part of the recording. For my purposes, I didn't need a particularly clean dataset, so I was fine with that. This was sometime around June 2017.

I processed the data using DARLA (http://darla.dartmouth.edu) with its automatic speech recognition option. I didn't have the time to transcribe the data myself, and I didn't find it necessary, really. DARLA admits that its word-level transcriptions are not completely reliable, but that it does a pretty good job of at least getting the vowels correct. Comparing this output to the script I read, I'd say that's about right.

After DARLA did its automatic transcription, it sent the audio and transcription to be force-aligned. At the time, DARLA used the Prosody-Lab aligner for this step.

Finally, DARLA sent the audio and phoneme-level transcriptions to FAVE for formant extraction. The resulting dataset is very nearly the same as the one FAVE produced; the only modifications are in the first few metadata columns.
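
If you have your own FAVE output and want to read it into R so it lines up with this dataset, something like the following should work. This is just a sketch: the file name is hypothetical, and I'm assuming the output is a tab-delimited text file, which is how FAVE spreadsheets typically come; adjust the reading function if yours differs.

library(readr)

# Hypothetical file name; replace with your own FAVE output
my_vowels <- read_tsv("my_speaker_formants.txt")

# Compare the column names to those in darla; apart from the first few
# metadata columns, which I modified, they should largely match
names(my_vowels)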

This dataset is useful for two reasons. First, I use this dataset in a lot of my R workshops; if you have FAVE-produced data yourself, you'll be able to follow along using your own data with minimal modifications to the code. Second, it's a great example of noisy data, which is useful when demonstrating functions that detect outliers.
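
As an illustration of that second point, here is a hedged sketch of flagging outlying tokens with Mahalanobis distances. It assumes the data has F1 and F2 columns, which FAVE output normally includes but which aren't documented above, so treat those column names as an assumption.

library(dplyr)

data(darla)

flagged <- darla %>%
  group_by(vowel) %>%
  filter(n() > 10) %>%  # need enough tokens per vowel to estimate a covariance matrix
  mutate(mahal = mahalanobis(cbind(F1, F2),
                             center = colMeans(cbind(F1, F2)),
                             cov    = cov(cbind(F1, F2))),
         # roughly the top 5% of tokens per vowel, under a bivariate normal assumption
         is_outlier = mahal > qchisq(0.95, df = 2)) %>%
  ungroup()

# How many tokens get flagged for each vowel?
flagged %>%
  count(vowel, is_outlier)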

Metadata about the speaker: White male, born in 1989 in suburban St. Louis, where I lived until I was 18. Parents are from upstate New York and Minnesota. Lived in Utah, Brazil, and Georgia as an adult. Data was recorded in July 2020 (age 31).