This is a tidyverse-compatible function that makes it easy to normalize your data using the method described in the Atlas of North American English (Labov, Ash, & Boberg 2006).
norm_anae(df, hz_cols, token_id, speaker_id, g = "telsur")
df: The dataframe containing the formant measurements you want to normalize.
hz_cols: A list of columns (unquoted) containing the formant measurements themselves.
token_id: The name of the column containing unique identifiers per vowel token. If your data is set up so that there is one row per token, you can put row.names(.) here instead.
speaker_id: The name of the column containing unique identifiers per speaker (usually the column containing the speaker name).
g: By default, "telsur", which will use the Telsur G value (6.896874) listed in the ANAE. If set to "calculate", it will calculate the G value based on the dataset. This can also be set to any arbitrary number, such as 0.
The same dataframe, but with new column(s), suffixed with "_anae" that have the normalized data.
The data must be grouped by speaker prior to running the function.
The function works best when only F1 and F2 data are included. F3 can be included but the results may not be comparable with other studies.
By default, the function will use the Telsur G value listed in the ANAE (6.896874), which will make the results most compatible with the ANAE and other studies that use the same normalization procedure. The function can calculate a G value based on the dataset provided when g is set to "calculate". Alternatively, g can be set to an arbitrary number, such as zero.
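To make the role of G concrete, here is a minimal base-R sketch of log-mean normalization as I understand the ANAE procedure: each speaker's Hz values are multiplied by exp(G - S), where S is that speaker's mean log-Hz pooled over F1 and F2. The helper `anae_sketch()` and the `toy` data are hypothetical illustrations, not part of the package, and the sketch assumes one row per token with exactly two formants; it is not necessarily how norm_anae() is implemented internally.

```r
# Hypothetical helper for illustration only; NOT the package's implementation.
# Assumes one row per token and exactly two formants (F1 and F2).
anae_sketch <- function(f1, f2, speaker, g = "telsur") {
  # S: each speaker's mean log-Hz, pooled over F1 and F2 (one value per row)
  s <- ave(log(f1) + log(f2), speaker, FUN = mean) / 2

  # Resolve G: the Telsur constant, the grand mean of this dataset, or a number
  if (identical(g, "telsur")) {
    g <- 6.896874
  } else if (identical(g, "calculate")) {
    g <- mean(log(c(f1, f2)))
  }

  # Scale every Hz value by the speaker-specific factor exp(G - S)
  scale <- exp(g - s)
  data.frame(F1_anae = f1 * scale, F2_anae = f2 * scale)
}

# Made-up measurements for two speakers
toy <- data.frame(
  speaker = c("A", "A", "B", "B"),
  F1 = c(700, 300, 800, 350),
  F2 = c(1200, 2200, 1400, 2600)
)

normed_telsur <- anae_sketch(toy$F1, toy$F2, toy$speaker)
normed_zero   <- anae_sketch(toy$F1, toy$F2, toy$speaker, g = 0)
```

Under this sketch, setting g to 0 leaves only the speaker-specific factor exp(-S), which is why the normalized values in that case are small numbers near 1 rather than Hz-like values.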
It is unclear how the ANAE method should be applied to trajectory data. This function pools all measurements and normalizes them together, which required one small modification to the calculation of G when the Telsur G is not used: I had to add the average number of time points per vowel token to the denominator. I'm not sure that's how it should be done, but it makes sense to me and returns sensible results.
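One way to read the adjustment described above is sketched here in base R, assuming G is the sum of all log-Hz values divided by (number of formants × number of tokens × average time points per token). The `traj` data and all variable names are made up for illustration, and this is only my interpretation of the description, not the package's source. Under that assumption, the adjusted denominator equals the total number of measurements, so G reduces to the plain mean of all log-Hz values.

```r
# Made-up trajectory data: 2 tokens, 3 time points each (illustration only)
traj <- data.frame(
  token = rep(c("t1", "t2"), each = 3),
  F1 = c(650, 700, 720, 400, 380, 360),
  F2 = c(1500, 1450, 1400, 2100, 2200, 2250)
)

m        <- 2                             # number of formants (F1, F2)
n_tokens <- length(unique(traj$token))    # number of vowel tokens
t_bar    <- nrow(traj) / n_tokens         # average time points per token

# G with the extra term in the denominator (assumed form)
log_total <- sum(log(traj$F1)) + sum(log(traj$F2))
g_traj    <- log_total / (m * n_tokens * t_bar)

# Equivalent to averaging every log-Hz measurement in the dataset
g_pooled <- mean(log(c(traj$F1, traj$F2)))
same_g   <- isTRUE(all.equal(g_traj, g_pooled))
```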
Labov, William, Sharon Ash, and Charles Boberg. The Atlas of North American English: Phonetics, Phonology and Sound Change. Berlin: Walter de Gruyter, 2006.
library(tidyverse)
df <- joeysvowels::idahoans
df %>%
  group_by(speaker) %>%
  norm_anae(hz_cols = c(F1, F2), speaker_id = speaker) %>%
  ungroup() %>%
  select(F1, F2, F1_anae, F2_anae) # <- just the relevant columns
#> # A tibble: 1,100 × 4
#> F1 F2 F1_anae F2_anae
#> <dbl> <dbl> <dbl> <dbl>
#> 1 699. 1655. 714. 1690.
#> 2 685. 1360. 700. 1388.
#> 3 713. 1507. 728. 1539.
#> 4 801. 1143. 818. 1167.
#> 5 757. 1258. 772. 1284.
#> 6 804. 1403. 821. 1432.
#> 7 664. 1279. 678. 1306.
#> 8 757. 1325. 773. 1353.
#> 9 730. 1578. 746. 1611.
#> 10 700. 1546. 715. 1578.
#> # … with 1,090 more rows
# Slightly different if G is calculated internally.
df %>%
  group_by(speaker) %>%
  norm_anae(hz_cols = c(F1, F2), speaker_id = speaker, g = "calculate") %>%
  ungroup() %>%
  select(F1, F2, F1_anae, F2_anae) # <- just the relevant columns
#> # A tibble: 1,100 × 4
#> F1 F2 F1_anae F2_anae
#> <dbl> <dbl> <dbl> <dbl>
#> 1 699. 1655. 660. 1562.
#> 2 685. 1360. 646. 1283.
#> 3 713. 1507. 673. 1422.
#> 4 801. 1143. 756. 1078.
#> 5 757. 1258. 714. 1187.
#> 6 804. 1403. 759. 1323.
#> 7 664. 1279. 626. 1207.
#> 8 757. 1325. 715. 1250.
#> 9 730. 1578. 689. 1489.
#> 10 700. 1546. 660. 1459.
#> # … with 1,090 more rows
# G can be set to an arbitrary value.
df %>%
  group_by(speaker) %>%
  norm_anae(hz_cols = c(F1, F2), speaker_id = speaker, g = 0) %>%
  ungroup() %>%
  select(F1, F2, F1_anae, F2_anae) # <- just the relevant columns
#> # A tibble: 1,100 × 4
#> F1 F2 F1_anae F2_anae
#> <dbl> <dbl> <dbl> <dbl>
#> 1 699. 1655. 0.722 1.71
#> 2 685. 1360. 0.707 1.40
#> 3 713. 1507. 0.736 1.56
#> 4 801. 1143. 0.827 1.18
#> 5 757. 1258. 0.781 1.30
#> 6 804. 1403. 0.830 1.45
#> 7 664. 1279. 0.685 1.32
#> 8 757. 1325. 0.782 1.37
#> 9 730. 1578. 0.754 1.63
#> 10 700. 1546. 0.722 1.60
#> # … with 1,090 more rows