Normalize vowel formant measurements using the log-means normalization procedure described in Barreda & Nearey (2018). This function is intended to be used within a tidyverse pipeline.

norm_logmeans(
  .df,
  .formant_cols,
  .speaker_col,
  .vowel_col,
  .return = "data",
  i_know_more_than_you = FALSE
)

Arguments

.df

The data frame containing the formant measurements you want to normalize. The formant data must be log10-transformed! See the example code below.

.formant_cols

The (unquoted) name(s) of the column(s) containing the formant measurements.

.speaker_col

The (unquoted) name of the column containing the unique identifiers per speaker.

.vowel_col

The (unquoted) name of the column containing the unique identifiers per vowel.

.return

A string. By default, "data", which returns your original data with the normalized data appended. If you set this to "params", you'll get a data frame with the normalization parameters for each speaker.

i_know_more_than_you

Logical. The function won't work if you've got data that doesn't look like log10-transformed formant data. If you want to force the function to run anyway, set this to `TRUE`.

Value

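The original data frame with new columns containing the normalized measurements. These new columns have "_logmeans" appended to the column names. If `.return = "params"`, a data frame with one row per speaker and that speaker's normalization parameter is returned instead.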
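
Judging from the example output further down, each normalized value appears to be the log10 formant minus that speaker's `gbar` parameter. The sketch below checks that arithmetic using numbers copied from the printed output for speaker 01. It is only an illustration of how the two return types seem to relate, not the function's internal code, and the variable names are made up for this sketch.

```r
# Values copied from the example output for speaker 01 (see Examples).
gbar_01         <- 2.986298                               # from .return = "params"
F1_log          <- c(2.844668, 2.835819, 2.853286)        # log10(F1), first three rows
F1_norm_printed <- c(-0.14163058, -0.15047920, -0.13301223)

# Assumed relationship: normalized value = log formant - speaker parameter.
F1_norm_manual <- F1_log - gbar_01

# Largest discrepancy between the manual result and the printed output;
# any difference should come only from rounding in the printed values.
max_diff <- max(abs(F1_norm_manual - F1_norm_printed))
print(max_diff < 1e-5)
```

If that relationship holds, the "params" output is all you need to reproduce the normalized columns yourself.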

Details

The data should not be grouped beforehand (e.g. with group_by). The data must be numeric, and there cannot be any NAs.
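
The requirements above can be met with a short preparation step before calling the function. The following is a minimal sketch, assuming dplyr and tidyr are installed; the toy data frame and its values are hypothetical and stand in for your own measurements.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one missing F1 value, and grouped by speaker.
toy <- tibble(
  speaker = c("s1", "s1", "s2", "s2"),
  vowel   = c("AA", "IY", "AA", "IY"),
  F1      = c(700, 300, NA, 280),
  F2      = c(1200, 2300, 1100, 2200)
) %>%
  group_by(speaker)

prepped <- toy %>%
  ungroup() %>%                                     # remove any existing grouping
  drop_na(F1, F2) %>%                               # drop rows with missing formants
  mutate(F1_log = log10(F1), F2_log = log10(F2))    # log10-transform the formants

# Sanity checks before normalizing.
stopifnot(!is_grouped_df(prepped))
stopifnot(is.numeric(prepped$F1_log), is.numeric(prepped$F2_log))
stopifnot(!anyNA(prepped$F1_log), !anyNA(prepped$F2_log))
```

After this step, `prepped` could be passed to `norm_logmeans()` with `.formant_cols = c(F1_log, F2_log)`. Whether to drop rows with missing values or handle them some other way depends on your data.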

Note

Thanks to Santiago Barreda for providing most of the code for this function.

References

Barreda, Santiago, and Terrance M. Nearey. 2018. "A Regression Approach to Vowel Normalization for Missing and Unbalanced Data." The Journal of the Acoustical Society of America 144(1): 500–520. https://doi.org/10.1121/1.5047742.

Examples

library(tidyverse)
idaho <- joeysvowels::idahoans

# Basic usage. Note that the data has to be log10-transformed.
idaho %>%
    mutate(F1_log = log10(F1), F2_log = log10(F2)) %>%
    norm_logmeans(.formant_cols = c(F1_log, F2_log),
                  .speaker_col = speaker,
                  .vowel_col = vowel) %>%
    head()
#>   speaker    sex vowel      F1       F2       F3       F4   F1_log   F2_log
#> 1      01 female    AA 699.307 1655.421 2019.293 3801.101 2.844668 3.218908
#> 2      01 female    AA 685.203 1359.909 1913.979 4257.150 2.835819 3.133510
#> 3      01 female    AA 713.323 1507.440 2460.500 3616.576 2.853286 3.178240
#> 4      01 female    AA 801.302 1142.841 1867.670 2907.539 2.903796 3.057986
#> 5      01 female    AA 756.531 1257.887 1772.368 2778.159 2.878827 3.099642
#> 6      01 female    AA 804.281 1402.582 2339.445 4299.040 2.905408 3.146928
#>   F1_log_logmeans F2_log_logmeans
#> 1     -0.14163058      0.23261001
#> 2     -0.15047920      0.14721139
#> 3     -0.13301223      0.19194158
#> 4     -0.08250223      0.07168736
#> 5     -0.10747173      0.11334317
#> 6     -0.08089065      0.16062981

# Return the speaker parameters instead.
idaho %>%
    mutate(F1_log = log10(F1), F2_log = log10(F2)) %>%
    norm_logmeans(.formant_cols = c(F1_log, F2_log),
                  .speaker_col = speaker,
                  .vowel_col = vowel,
                  .return = "params") %>%
    head()
#>   speakers     gbar
#> 1       01 2.986298
#> 2       02 2.875643
#> 3       03 2.996838
#> 4       04 2.969478
#> 5       05 2.957691
#> 6       06 2.899768

# If you forget to log-transform the data, it'll throw an error.
idaho %>%
    norm_logmeans(.formant_cols = c(F1, F2),
                  .speaker_col = speaker,
                  .vowel_col = vowel)
#> Error in norm_logmeans(., .formant_cols = c(F1, F2), .speaker_col = speaker,     .vowel_col = vowel): Are you sure your formant data is log-transformed? If it's not, see ?tidy_norm for code on how to do that. If you are certain you're right, please add `i_know_more_than_you = TRUE` to this function.

# But you can force the function to run on non-transformed data if you're sure you know what you're doing.
idaho %>%
    norm_logmeans(.formant_cols = c(F1, F2),
                  .speaker_col = speaker,
                  .vowel_col = vowel,
                  i_know_more_than_you = TRUE) %>%
    head()
#>   speaker    sex vowel      F1       F2       F3       F4 F1_logmeans
#> 1      01 female    AA 699.307 1655.421 2019.293 3801.101  -0.3261165
#> 2      01 female    AA 685.203 1359.909 1913.979 4257.150  -0.3464912
#> 3      01 female    AA 713.323 1507.440 2460.500 3616.576  -0.3062720
#> 4      01 female    AA 801.302 1142.841 1867.670 2907.539  -0.1899684
#> 5      01 female    AA 756.531 1257.887 1772.368 2778.159  -0.2474628
#> 6      01 female    AA 804.281 1402.582 2339.445 4299.040  -0.1862576
#>   F2_logmeans
#> 1   0.5356043
#> 2   0.3389668
#> 3   0.4419618
#> 4   0.1650662
#> 5   0.2609823
#> 6   0.3698638