A function to classify vowel data into contextual allophones.

code_allophones(
  .df,
  .old_col,
  .new_cols = c("allophone", "allophone_environment"),
  .pre_seg,
  .fol_seg,
  .coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N"),
  .voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH")
)

Arguments

.df

The dataset containing vowel data.

.old_col

The unquoted name of the column containing the vowel labels, often called "vowel" or "phoneme". Note that the function assumes Wells lexical sets (FLEECE, TRAP, etc.) rather than ARPABET (IY, AE, etc.) or IPA (i, æ, etc.). If your vowels are not already coded with Wells' labels, you can quickly recode them with switch_transcriptions or one of its shortcuts, such as arpa_to_wells.
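If you would rather see what that recoding involves, it is essentially a lookup table. The sketch below is a hypothetical, hand-rolled version: the names arpa_wells and to_wells are made up for illustration and are not part of the package. Its pairings mirror the vowel and phoneme columns in the example output, plus OY for CHOICE. It strips ARPABET stress digits before the lookup and returns NA for codes it does not know.

```r
# Hypothetical lookup table (not the package's own): ARPABET vowel -> Wells lexical set.
# Pairings match the example output below; OY -> CHOICE is the standard Wells label.
arpa_wells <- c(
  IY = "FLEECE", IH = "KIT",     EY = "FACE",  EH = "DRESS", AE = "TRAP",
  AA = "LOT",    AO = "THOUGHT", AH = "STRUT", OW = "GOAT",  UH = "FOOT",
  UW = "GOOSE",  AY = "PRICE",   AW = "MOUTH", OY = "CHOICE", ER = "NURSE"
)

to_wells <- function(arpa) {
  # Drop a trailing stress digit (e.g. "IY1" -> "IY"), then look up the label;
  # unknown codes come back as NA
  unname(arpa_wells[sub("[0-2]$", "", arpa)])
}

to_wells(c("IY1", "AO", "ER0"))
# returns c("FLEECE", "THOUGHT", "NURSE")
```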

.new_cols

A vector of two strings naming the new columns. By default c("allophone", "allophone_environment"). The first string becomes the name of the column containing the new allophone labels; the second becomes the name of the column describing the environments that condition those allophones.

.pre_seg

The unquoted name of the column that contains the labels for the previous segment. In DARLA-generated spreadsheets, this is `pre_seg` and in FastTrack-generated spreadsheets, it's `previous_sound`. Assumes ARPABET labels.

.fol_seg

The unquoted name of the column that contains the labels for the following segment. In DARLA-generated spreadsheets, this is `fol_seg` and in FastTrack-generated spreadsheets, it's `next_sound`. Assumes ARPABET labels.

.coronals

A vector of strings containing ARPABET labels for coronal consonants. By default, c("T", "D", "S", "Z", "SH", "ZH", "JH", "N"). This is used to identify the `TOOT` allophone of `GOOSE`, which occurs after a coronal.

.voiceless

A vector of strings containing ARPABET labels for voiceless consonants. By default, c("P", "T", "K", "CH", "F", "TH", "S", "SH"). This is used to separate the `PRICE` allophone (before voiceless segments) from the `PRIZE` allophone of the `PRICE` vowel.
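Judging from the example output (FORTUNE, with a preceding CH, is coded BOOT rather than TOOT), both vectors appear to be matched against the segment columns as exact strings. Under that assumption, any label that is not listed, such as CH for coronals, a stressed vowel like AH0, or the empty string used when there is no neighboring segment, never counts as a match. A minimal sketch of that behavior with `%in%`:

```r
# Default segment sets, copied from the function signature above
coronals  <- c("T", "D", "S", "Z", "SH", "ZH", "JH", "N")
voiceless <- c("P", "T", "K", "CH", "F", "TH", "S", "SH")

# Toy segment labels; "" stands in for a missing neighbor (e.g. word-final)
segs <- c("T", "CH", "AH0", "", "Y")

segs %in% coronals           # TRUE FALSE FALSE FALSE FALSE
segs %in% voiceless          # TRUE TRUE FALSE FALSE FALSE

# Extending a set is plain concatenation
segs %in% c(coronals, "Y")   # TRUE FALSE FALSE FALSE TRUE
```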

Value

A data frame with two additional columns. One column contains labels for the allophones and the other contains category labels for those allophones' environments. The second column can be useful for quickly excluding families of allophones, such as prelaterals or prerhotics, or for coloring them in visualizations (for example, turning all prelateral allophones gray). The two new columns are positioned immediately after the vowel column indicated in .old_col. Vowels that have no contextual allophones in this function (such as MOUTH and NURSE) keep their original label and are coded as "elsewhere".

Note

Here is the list of the contextual allophones that the function creates. Note that I largely follow my own advice about what to call elsewhere allophones, prelateral allophones, and other allophones. This list is admittedly subjective and largely based on what my own research has needed, so it may not work completely for you and your research. Please contact me at joey_stanley@byu.edu if you want to see an allophone added or if you spot an error in the coding.

  • FLEECE becomes

    • ZEAL before laterals

    • BEET elsewhere

  • KIT becomes

    • GUILT before laterals

    • NEAR before rhotics

    • BIG before G

    • BIN before M and N

    • BING before NG

    • BIT elsewhere

  • FACE becomes

    • FLAIL before laterals

    • VAGUE before G

    • BAIT elsewhere

  • DRESS becomes

    • SHELF before laterals

    • SQUARE before rhotics

    • BEG before G

    • BEN before M and N

    • BENG before NG

    • BET elsewhere

  • TRAP becomes

    • TALC before laterals

    • BAG before G

    • BAN before M and N

    • BANG before NG

    • BAT elsewhere

  • LOT becomes

    • GOLF before laterals

    • START before rhotics

    • BOT elsewhere

  • THOUGHT becomes

    • FAULT before laterals

    • FORCE before rhotics

    • BOUGHT elsewhere

  • STRUT becomes

    • MULCH before laterals

    • BUT elsewhere

  • GOAT becomes

    • JOLT before laterals

    • BOAT elsewhere

  • FOOT becomes

    • WOLF before laterals

    • CURE before rhotics

    • PUT elsewhere

  • GOOSE becomes

    • MULE after Y

    • TOOT after coronals

    • SPOOL before laterals

    • BOOT elsewhere

  • PRICE becomes

    • PRICE before voiceless segments

    • PRIZE elsewhere
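To make the conditioning concrete, here is a hypothetical base-R sketch of the GOOSE and PRICE rules. classify_goose and classify_price are illustrative names, not the package's code. The order of the checks is an assumption chosen to agree with the example output: a following L wins over a preceding coronal (TO before L is coded SPOOL), and a preceding Y wins over the coronal check (MULE survives even when Y is added to .coronals).

```r
# Hypothetical re-creation of two rules from the list above; not the package source.
classify_goose <- function(pre_seg, fol_seg,
                           coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N")) {
  ifelse(fol_seg == "L", "SPOOL",                          # before laterals
         ifelse(pre_seg == "Y", "MULE",                    # after Y
                ifelse(pre_seg %in% coronals, "TOOT",      # after coronals
                       "BOOT")))                           # elsewhere
}

classify_price <- function(fol_seg,
                           voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH")) {
  ifelse(fol_seg %in% voiceless, "PRICE", "PRIZE")
}

# Segments taken from STUDENTS, HUMAN, TO (before L), and GROUP in the examples
classify_goose(pre_seg = c("T", "Y", "T", "R"), fol_seg = c("D", "M", "L", "P"))
# returns c("TOOT", "MULE", "SPOOL", "BOOT")

# Following segments from WHITE, SIDE, and BY
classify_price(fol_seg = c("T", "D", ""))
# returns c("PRICE", "PRIZE", "PRIZE")
```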

Unfortunately, this list is not straightforward to customize, but you can always copy the source code and modify the list yourself.

Alternatively, you can use forcats::fct_collapse() to collapse distinctions that you don't need. See example code below.
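As a self-contained illustration, the snippet below folds the prenasal KIT and DRESS allophones back into their elsewhere categories. It works on a toy character vector invented for the example rather than on real data. Labels not named in the call, like BING here, pass through unchanged.

```r
library(forcats)

# Toy allophone labels, invented for this example
allophones <- c("BIT", "BIN", "BET", "BEN", "BING")

# Each named argument gives a new level and the old levels it absorbs
collapsed <- fct_collapse(allophones,
                          BIT = c("BIT", "BIN"),
                          BET = c("BET", "BEN"))

as.character(collapsed)
# returns c("BIT", "BIT", "BET", "BET", "BING")
```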

You can of course also create your own allophones. Note that some allophones depend on other environmental information, such as syllable structure or morpheme and word boundaries, and some are entirely lexical (FORCE vs. NORTH). Others are more complicated than what ARPABET can encode (MARY, MERRY, and MARRY) or are simply coded inconsistently. For the sake of simplicity, such allophones are not included in this function.
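Because a lexical split like FORCE vs. NORTH cannot be read off the neighboring segments, one workable approach is to recode from a word list after running code_allophones(). The sketch below assumes a user-supplied vector of NORTH words; north_words and recode_north are hypothetical names, and the list is deliberately tiny. Tokens already coded FORCE whose word appears in the list are relabeled NORTH.

```r
# Hypothetical, user-supplied list of NORTH words; a real list would be much longer.
north_words <- c("NORTH", "NORTHERN", "HORSE")

recode_north <- function(word, allophone, north_words) {
  # Only touch tokens already coded as FORCE; everything else passes through
  ifelse(allophone == "FORCE" & word %in% north_words, "NORTH", allophone)
}

# NORTHERN and MORE are both coded FORCE in the example output
recode_north(word = c("NORTHERN", "MORE", "COST"),
             allophone = c("FORCE", "FORCE", "BOUGHT"),
             north_words = north_words)
# returns c("NORTH", "FORCE", "BOUGHT")
```

In a pipeline this would sit inside mutate(), for example mutate(allophone = recode_north(word, allophone, north_words)). The environment column can stay "prerhotic" because both allophones occur before R.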

The environment labels are therefore the following:

  • "prelateral" includes ZEAL, GUILT, FLAIL, SHELF, TALC, GOLF, FAULT, MULCH, JOLT, WOLF, SPOOL

  • "prerhotic" includes NEAR, SQUARE, START, FORCE, CURE

  • "prevelar" includes BIG, VAGUE, BEG, BAG

  • "prenasal" includes BIN, BEN, BAN

  • "prevelarnasal" includes BING, BENG, BANG

  • "prevoiceless" includes PRICE

  • "post-Y" includes MULE

  • "postcoronal" includes TOOT

  • "elsewhere" includes BEET, BIT, BAIT, BET, BAT, BOT, BOUGHT, BUT, BOAT, PUT, BOOT, PRIZE
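Because each allophone maps to exactly one environment, the list above can be turned into a lookup table. That is one way to keep the environment column in sync after collapsing or adding allophones. The sketch below is hypothetical base R (env_groups, env_lookup, and allophone_env are illustrative names). Labels missing from the table fall back to "elsewhere", matching how vowels such as MOUTH and NURSE appear in the example output.

```r
# Environment groups, transcribed from the list above
env_groups <- list(
  prelateral    = c("ZEAL", "GUILT", "FLAIL", "SHELF", "TALC", "GOLF",
                    "FAULT", "MULCH", "JOLT", "WOLF", "SPOOL"),
  prerhotic     = c("NEAR", "SQUARE", "START", "FORCE", "CURE"),
  prevelar      = c("BIG", "VAGUE", "BEG", "BAG"),
  prenasal      = c("BIN", "BEN", "BAN"),
  prevelarnasal = c("BING", "BENG", "BANG"),
  prevoiceless  = "PRICE",
  `post-Y`      = "MULE",
  postcoronal   = "TOOT",
  elsewhere     = c("BEET", "BIT", "BAIT", "BET", "BAT", "BOT", "BOUGHT",
                    "BUT", "BOAT", "PUT", "BOOT", "PRIZE")
)

# Flip the list into a named vector: allophone label -> environment label
env_lookup <- setNames(rep(names(env_groups), lengths(env_groups)),
                       unlist(env_groups, use.names = FALSE))

allophone_env <- function(allophone) {
  env <- unname(env_lookup[allophone])
  # Unlisted labels (e.g. MOUTH, NURSE) default to "elsewhere"
  env[is.na(env)] <- "elsewhere"
  env
}

allophone_env(c("SPOOL", "BIG", "MOUTH", "MULE"))
# returns c("prelateral", "prevelar", "elsewhere", "post-Y")
```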

Examples

suppressPackageStartupMessages(library(tidyverse))

# Get some sample DARLA data to play with
darla <- joeysvowels::darla %>%
  select(word, vowel, pre_seg, fol_seg) %>%
  mutate(phoneme = joeyr:::arpa_to_wells(vowel), .after = vowel)

# Basic usage
darla %>%
  code_allophones(.old_col = phoneme, .fol_seg = fol_seg, .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#>            word vowel phoneme allophone allophone_environment pre_seg fol_seg
#> 1         BRIAN    AY   PRICE     PRIZE             elsewhere       R     AH0
#> 2           PUT    UH    FOOT       PUT             elsewhere       P       T
#> 3        REAGAN    IY  FLEECE      BEET             elsewhere       R       G
#> 4       BALANCE    AE    TRAP      TALC            prelateral       B       L
#> 5       HIGHEST    AY   PRICE     PRIZE             elsewhere      HH     AH0
#> 6          KIDS    IH     KIT       BIT             elsewhere       K       D
#> 7         CROWN    AW   MOUTH     MOUTH             elsewhere       R       N
#> 8        LOADED    OW    GOAT      BOAT             elsewhere       L       D
#> 9      INVOLVED    AA     LOT      GOLF            prelateral       V       L
#> 10     NORTHERN    AO THOUGHT     FORCE             prerhotic       N       R
#> 11           Q.    UW   GOOSE      MULE                post-Y       Y        
#> 12 SUBSEQUENTLY    AH   STRUT       BUT             elsewhere       S       B
#> 13         GAME    EY    FACE      BAIT             elsewhere       G       M
#> 14       COTTON    AO THOUGHT    BOUGHT             elsewhere       K       T
#> 15      EXISTED    AH   STRUT       BUT             elsewhere       T       D
#> 16         WIRE    AY   PRICE     PRIZE             elsewhere       W       R
#> 17            A    EY    FACE      BAIT             elsewhere     IY1      CH
#> 18         MORE    AO THOUGHT     FORCE             prerhotic       M       R
#> 19      RAINBOW    EY    FACE      BAIT             elsewhere       R       N
#> 20     INTERPOL    IH     KIT       BIN              prenasal     IY1       N

# Specify the names of the new columns with the `.new_cols` argument
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#>             word vowel phoneme allophone environment pre_seg fol_seg
#> 1            SEE    IY  FLEECE      BEET   elsewhere       S       W
#> 2        BALLOON    AH   STRUT     MULCH  prelateral       B       L
#> 3     CRITICALLY    IY  FLEECE      BEET   elsewhere       L     IH0
#> 4           HALF    AE    TRAP       BAT   elsewhere      HH       F
#> 5         AROUND    ER   NURSE     NURSE   elsewhere       S     AW1
#> 6             GO    OW    GOAT      BOAT   elsewhere       G     AH0
#> 7          ABOUT    AW   MOUTH     MOUTH   elsewhere       B       T
#> 8         THIRTY    ER   NURSE     NURSE   elsewhere      TH       T
#> 9       OVERLAND    ER   NURSE     NURSE   elsewhere       V       L
#> 10      ELEPHANT    AH   STRUT       BUT   elsewhere       F       N
#> 11           HER    ER   NURSE     NURSE   elsewhere      HH      CH
#> 12 RELATIONSHIPS    IH     KIT       BIT   elsewhere      SH       P
#> 13            ON    AO THOUGHT    BOUGHT   elsewhere       T       N
#> 14      ACTUALLY    AE    TRAP       BAT   elsewhere       T       K
#> 15         LEAST    IY  FLEECE      BEET   elsewhere       L       S
#> 16       SOUNDED    AW   MOUTH     MOUTH   elsewhere       S       N
#> 17    DISRUPTION    AH   STRUT       BUT   elsewhere       R       P
#> 18      LEARNING    ER   NURSE     NURSE   elsewhere       L       N
#> 19       COUNTRY    AH   STRUT       BUT   elsewhere       K       N
#> 20       FEDERAL    EH   DRESS       BET   elsewhere       F       D

# Filtering by environment is straightforward
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(environment == "elsewhere") %>%
  slice_sample(n = 20)
#>                word vowel phoneme allophone environment pre_seg fol_seg
#> 1           READING    IY  FLEECE      BEET   elsewhere       R       D
#> 2               MIA    IY  FLEECE      BEET   elsewhere       M     AH0
#> 3             ABOUT    AW   MOUTH     MOUTH   elsewhere       B       T
#> 4              DOWN    AW   MOUTH     MOUTH   elsewhere       D       N
#> 5  RESPONSIBILITIES    AA     LOT       BOT   elsewhere       P       N
#> 6             SOLAR    ER   NURSE     NURSE   elsewhere       L     IY0
#> 7              OPEN    OW    GOAT      BOAT   elsewhere       T       P
#> 8                AT    AE    TRAP       BAT   elsewhere       N       T
#> 9       INTERRUPTED    AH   STRUT       BUT   elsewhere     ER0       P
#> 10            SILLY    IY  FLEECE      BEET   elsewhere       L       T
#> 11               US    EH   DRESS       BET   elsewhere     UW1       S
#> 12            BOXES    AA     LOT       BOT   elsewhere       B       K
#> 13        COMMUNITY    IY  FLEECE      BEET   elsewhere       T       T
#> 14         MAGAZINE    IY  FLEECE      BEET   elsewhere       Z       N
#> 15          ENTRIES    IY  FLEECE      BEET   elsewhere       R       Z
#> 16      INFORMATION    ER   NURSE     NURSE   elsewhere       F       M
#> 17          DIGITAL    IH     KIT       BIT   elsewhere       D      JH
#> 18            SLATE    EY    FACE      BAIT   elsewhere       L       T
#> 19             GAME    EY    FACE      BAIT   elsewhere       G       M
#> 20             COST    AO THOUGHT    BOUGHT   elsewhere       K       S
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(!environment %in% c("prerhotic", "prevelarnasal", "prevelar")) %>%
  slice_sample(n = 20)
#>              word vowel phoneme allophone environment pre_seg fol_seg
#> 1        PATIENTS    AH   STRUT       BUT   elsewhere      SH       N
#> 2             WHY    AY   PRICE     PRIZE   elsewhere       W      HH
#> 3         ANYBODY    IY  FLEECE      BEET   elsewhere       D        
#> 4           DOING    UW   GOOSE      TOOT postcoronal       D     IH0
#> 5           KNOWN    OW    GOAT      BOAT   elsewhere       N       N
#> 6             HER    ER   NURSE     NURSE   elsewhere      HH      CH
#> 7  IMPLEMENTATION    EY    FACE      BAIT   elsewhere       T      SH
#> 8           MAPLE    EY    FACE      BAIT   elsewhere       M       P
#> 9       AVAILABLE    EY    FACE     FLAIL  prelateral       V       L
#> 10         THAT'S    AE    TRAP       BAT   elsewhere      DH       T
#> 11          YOUNG    AH   STRUT       BUT   elsewhere       Y      NG
#> 12             ON    AO THOUGHT    BOUGHT   elsewhere       V       N
#> 13       OVERLAND    OW    GOAT      BOAT   elsewhere       V       V
#> 14            NOT    AA     LOT       BOT   elsewhere       N       T
#> 15           TEST    EH   DRESS       BET   elsewhere       T       S
#> 16            OWN    OW    GOAT      BOAT   elsewhere       R       N
#> 17           VAST    AE    TRAP       BAT   elsewhere       V       S
#> 18        CHOOSES    UW   GOOSE      BOOT   elsewhere      CH       Z
#> 19           OVER    ER   NURSE     NURSE   elsewhere       V       T
#> 20           PUCK    AH   STRUT       BUT   elsewhere       P       K

# Some users may want to supply their own list of coronal consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N", "Y")) %>%
  filter(phoneme == "GOOSE") %>%
  slice_sample(n = 20)
#>        word vowel phoneme allophone environment pre_seg fol_seg
#> 1  STUDENTS    UW   GOOSE      TOOT postcoronal       T       D
#> 2     HUMAN    UW   GOOSE      MULE      post-Y       Y       M
#> 3      INTO    UW   GOOSE      TOOT postcoronal       T     EY1
#> 4       WHO    UW   GOOSE     SPOOL  prelateral      HH       L
#> 5       CUE    UW   GOOSE      MULE      post-Y       Y      DH
#> 6        TO    UW   GOOSE      TOOT postcoronal       T       F
#> 7  NUMEROUS    UW   GOOSE      TOOT postcoronal       N       M
#> 8       USE    UW   GOOSE      MULE      post-Y       Y       Z
#> 9        TO    UW   GOOSE     SPOOL  prelateral       T       L
#> 10       TO    UW   GOOSE      TOOT postcoronal       T      DH
#> 11       TO    UW   GOOSE      TOOT postcoronal       T        
#> 12     INTO    UW   GOOSE      TOOT postcoronal       T       W
#> 13     RULE    UW   GOOSE     SPOOL  prelateral       R       L
#> 14  FORTUNE    UW   GOOSE      BOOT   elsewhere      CH       N
#> 15    GROUP    UW   GOOSE      BOOT   elsewhere       R       P
#> 16       TO    UW   GOOSE      TOOT postcoronal       T       K
#> 17       TO    UW   GOOSE      TOOT postcoronal       T     IH0
#> 18      TWO    UW   GOOSE      TOOT postcoronal       T     ER0
#> 19       TO    UW   GOOSE      TOOT postcoronal       T        
#> 20       TO    UW   GOOSE      TOOT postcoronal       T     AE1

# Other users may want to specify their own list of voiceless consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH", "X")) %>%
  filter(phoneme == "PRICE") %>%
  slice_sample(n = 20)
#>          word vowel phoneme allophone  environment pre_seg fol_seg
#> 1     ANXIETY    AY   PRICE     PRIZE    elsewhere       Z     AH0
#> 2       PRICE    AY   PRICE     PRICE prevoiceless       R       S
#> 3    POLITELY    AY   PRICE     PRICE prevoiceless       L       T
#> 4       WHITE    AY   PRICE     PRICE prevoiceless       W       T
#> 5        FIND    AY   PRICE     PRIZE    elsewhere       F       N
#> 6        I'VE    AY   PRICE     PRIZE    elsewhere       K       V
#> 7  IDENTIFIED    AY   PRICE     PRIZE    elsewhere     IY0       D
#> 8        I'VE    AY   PRICE     PRIZE    elsewhere       T       V
#> 9       WHILE    AY   PRICE     PRIZE    elsewhere       W       L
#> 10       TIME    AY   PRICE     PRIZE    elsewhere       T       M
#> 11     HYBRID    AY   PRICE     PRIZE    elsewhere      HH       B
#> 12       LINE    AY   PRICE     PRIZE    elsewhere       L       N
#> 13      BLIND    AY   PRICE     PRIZE    elsewhere       L       N
#> 14       SIDE    AY   PRICE     PRIZE    elsewhere       S       D
#> 15      BRYCE    AY   PRICE     PRICE prevoiceless       R       S
#> 16         BY    AY   PRICE     PRIZE    elsewhere       B        
#> 17       FIVE    AY   PRICE     PRIZE    elsewhere       F       V
#> 18   BRIGHTLY    AY   PRICE     PRICE prevoiceless       R       T
#> 19    MICHAEL    AY   PRICE     PRICE prevoiceless       M       K
#> 20       FIND    AY   PRICE     PRIZE    elsewhere       F       N

# Collapsing distinctions can be done post hoc (though it may take extra work to get the environment column to match).
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Get a subset for demonstration purposes
  filter(allophone %in% c("BIT", "BIG")) %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup() %>%
  # Now collapse distinctions
  mutate(allophone = fct_collapse(allophone, "BIT" = c("BIT", "BIG")),
         # Relabel collapsed tokens as "elsewhere"; leave other environments as they were
         environment = if_else(allophone == "BIT", "elsewhere", environment))
#> # A tibble: 10 × 7
#>    word        vowel phoneme allophone environment pre_seg fol_seg
#>    <chr>       <chr> <fct>   <fct>     <chr>       <chr>   <chr>  
#>  1 BEGINNING   IH    KIT     BIT       elsewhere   B       G      
#>  2 TRIGGER     IH    KIT     BIT       elsewhere   R       G      
#>  3 EXISTENCE   IH    KIT     BIT       elsewhere   IY0     G      
#>  4 BIG         IH    KIT     BIT       elsewhere   B       G      
#>  5 EXOTIC      IH    KIT     BIT       elsewhere   V       G      
#>  6 HEADED      IH    KIT     BIT       elsewhere   D       D      
#>  7 CULTIVATED  IH    KIT     BIT       elsewhere   T       D      
#>  8 HIS         IH    KIT     BIT       elsewhere   HH      Z      
#>  9 ADMITTING   IH    KIT     BIT       elsewhere   M       T      
#> 10 ARBITRATION IH    KIT     BIT       elsewhere   B       T      

# Creating new allophones depends on the complexity of the allophone
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Split MOUTH into a prevoiceless allophone (BOUT) and an elsewhere allophone (LOUD)
  mutate(allophone = case_when(phoneme == "MOUTH" &
                                 fol_seg %in% c("P", "T", "K", "CH", "F", "TH", "S", "SH") ~ "BOUT",
                               phoneme == "MOUTH" ~ "LOUD",
                               TRUE ~ allophone),
         environment = if_else(allophone == "BOUT", "prevoiceless", environment)) %>%
  # Get a subset for demonstration purposes
  filter(phoneme == "MOUTH") %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup()
#> # A tibble: 10 × 7
#>    word       vowel phoneme allophone environment  pre_seg fol_seg
#>    <chr>      <chr> <fct>   <chr>     <chr>        <chr>   <chr>  
#>  1 WITHOUT    AW    MOUTH   BOUT      prevoiceless TH      T      
#>  2 HOUSE      AW    MOUTH   BOUT      prevoiceless HH      S      
#>  3 OUT        AW    MOUTH   BOUT      prevoiceless T       T      
#>  4 OUT        AW    MOUTH   BOUT      prevoiceless T       T      
#>  5 ABOUT      AW    MOUTH   BOUT      prevoiceless B       T      
#>  6 COUNTY     AW    MOUTH   LOUD      elsewhere    K       N      
#>  7 BOUNDARIES AW    MOUTH   LOUD      elsewhere    B       N      
#>  8 OUR        AW    MOUTH   LOUD      elsewhere    V       R      
#>  9 NOW        AW    MOUTH   LOUD      elsewhere    N       DH     
#> 10 HOWEVER    AW    MOUTH   LOUD      elsewhere    HH      EH1