A function to classify vowel data into contextual allophones.

  .new_cols = c("allophone", "allophone_environment"),
  .coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N"),
  .voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH")



The dataset containing vowel data.


The unquoted name of the column containing the vowel labels. Often called "vowel" or "phoneme" in many datasets. Note that the function assumes Wells lexical sets (FLEECE, TRAP, etc.) rather than ARPABET (IY, AE, etc.) or IPA (i, æ, etc.). If your vowels are not already coded using Wells' labels, you can quickly convert them with switch_transcriptions or one of the shortcuts like arpa_to_wells.
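To make the expected input concrete, here is a minimal base-R sketch of what an ARPABET-to-Wells recoding amounts to. The lookup vector `arpa_to_wells_lookup` is a hypothetical name used only for illustration; it is not the package's implementation of arpa_to_wells, and it assumes the ARPABET labels carry no stress digits. The pairings are the ones visible in the example output further down.

```r
# Illustrative lookup from ARPABET vowel labels to Wells lexical sets.
# A hand-rolled sketch (assumption), not the joeyr implementation.
arpa_to_wells_lookup <- c(
  IY = "FLEECE", IH = "KIT",     EY = "FACE",   EH = "DRESS", AE = "TRAP",
  AA = "LOT",    AO = "THOUGHT", AH = "STRUT",  OW = "GOAT",  UH = "FOOT",
  UW = "GOOSE",  AY = "PRICE",   AW = "MOUTH",  OY = "CHOICE", ER = "NURSE"
)

vowels <- c("AE", "IY", "UW", "AY")

# Index the named vector by label; unname() drops the ARPABET names.
wells <- unname(arpa_to_wells_lookup[vowels])
wells
```

Labels that are missing from the lookup come back as `NA`, which is a convenient way to spot unexpected codes before calling the function.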


A vector of two strings containing the names of the columns you would like to create. By default c("allophone", "allophone_environment"). The first name becomes the name of the column containing the new allophone labels. The second name becomes the name of the column describing the environments of those labels.


The unquoted name of the column that contains the labels for the previous segment. In DARLA-generated spreadsheets, this is `pre_seg` and in FastTrack-generated spreadsheets, it's `previous_sound`. Assumes ARPABET labels.


The unquoted name of the column that contains the labels for the following segment. In DARLA-generated spreadsheets, this is `fol_seg` and in FastTrack-generated spreadsheets, it's `next_sound`. Assumes ARPABET labels.


A vector of strings containing ARPABET labels for coronal consonants. By default, c("T", "D", "S", "Z", "SH", "ZH", "JH", "N"). This is used to create the `TOOT` allophone of `GOOSE`.


A vector of strings containing ARPABET labels for voiceless consonants. By default, c("P", "T", "K", "CH", "F", "TH", "S", "SH"). This is used to separate the prevoiceless `PRICE` allophone from the elsewhere allophone `PRIZE`.


A dataframe with two additional columns. One column contains labels for the allophones and the other contains category labels for those allophones' contexts. The second column can be useful for quickly excluding certain allophones, like prelaterals or prerhotics, or for coloring families of allophones in visualizations (such as turning all prelateral allophones gray). These two new columns are positioned immediately after the original vowel column indicated in .old_col.


Here is the list of the contextual allophones that are created. Note that I largely follow my own advice about what to call elsewhere allophones, prelateral allophones, and other allophones. Obviously, this list is pretty subjective and largely based on what my own research has needed, so it may not work completely for you and your research. Please contact me at joey_stanley@byu.edu if you want to see an allophone get added or if you spot an error in the coding.

  • FLEECE becomes

    • ZEAL before laterals

    • BEET elsewhere

  • KIT becomes

    • GUILT before laterals

    • NEAR before rhotics

    • BIG before G

    • BIN before M and N

    • BING before NG

    • BIT elsewhere

  • FACE becomes

    • FLAIL before laterals

    • VAGUE before G

    • BAIT elsewhere

  • DRESS becomes

    • SHELF before laterals

    • SQUARE before rhotics

    • BEG before G

    • BEN before M and N

    • BENG before NG

    • BET elsewhere

  • TRAP becomes

    • TALC before laterals

    • BAG before G

    • BAN before M and N

    • BANG before NG

    • BAT elsewhere

  • LOT becomes

    • GOLF before laterals

    • START before rhotics

    • BOT elsewhere

  • THOUGHT becomes

    • FAULT before laterals

    • FORCE before rhotics

    • BOUGHT elsewhere

  • STRUT becomes

    • MULCH before laterals

    • BUT elsewhere

  • GOAT becomes

    • JOLT before laterals

    • BOAT elsewhere

  • FOOT becomes

    • WOLF before laterals

    • CURE before rhotics

    • PUT elsewhere

  • GOOSE becomes

    • MULE after Y

    • TOOT after coronals

    • SPOOL before laterals

    • BOOT elsewhere

  • PRICE becomes

    • PRICE before voiceless segments

    • PRIZE elsewhere

Unfortunately, it is not straightforward to customize this list but you can always copy the source code and modify the list yourself.
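If you do end up writing your own rules, the logic for a single vowel is short. The following is a minimal base-R sketch of the KIT rules from the list above, keyed on the following segment. The helper name `code_kit` is hypothetical, and the ARPABET labels used for laterals ("L") and rhotics ("R") are assumptions; this is not the package's source code.

```r
# Hypothetical helper: hand-code the KIT allophones from the following segment.
# Assumes ARPABET labels, with "L" for laterals and "R" for rhotics.
code_kit <- function(fol_seg) {
  out <- rep("BIT", length(fol_seg))        # elsewhere
  out[fol_seg %in% "L"]         <- "GUILT"  # before laterals
  out[fol_seg %in% "R"]         <- "NEAR"   # before rhotics
  out[fol_seg %in% "G"]         <- "BIG"    # before G
  out[fol_seg %in% c("M", "N")] <- "BIN"    # before M and N
  out[fol_seg %in% "NG"]        <- "BING"   # before NG
  out
}

kit_allophones <- code_kit(c("L", "R", "G", "N", "NG", "T", ""))
kit_allophones
```

Using `%in%` rather than `==` means missing or empty following segments simply fall through to the elsewhere label.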

Alternatively, you can use forcats::fct_collapse() to collapse distinctions that you don't need. See example code below.

You can also of course create your own allophones if desired. Note that some allophones depend on other environmental information like syllable structure and morpheme/word boundaries, or they may be entirely lexical (FORCE vs. NORTH). They may be more complicated than what ARPABET can code for (MARY, MERRY, and MARRY) or just inconsistently coded. For the sake of simplicity, these allophones are not included in this function.

The environments therefore are the following:

  • "prelateral" includes ZEAL, GUILT, FLAIL, SHELF, TALC, GOLF, FAULT, MULCH, JOLT, WOLF, SPOOL

  • "prerhotic" includes NEAR, SQUARE, START, FORCE, CURE

  • "prevelar" includes BIG, VAGUE, BEG, BAG

  • "prenasal" includes BIN, BEN, BAN

  • "prevelarnasal" includes BING, BENG, BANG

  • "prevoiceless" includes PRICE

  • "post-Y" includes MULE

  • "postcoronal" includes TOOT

  • "elsewhere" includes BEET, BIT, BAIT, BET, BAT, BOT, BOUGHT, BUT, BOAT, PUT, BOOT, PRIZE
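If you relabel or collapse allophones after the fact, the environment column has to be rebuilt to match. One way to do that is a plain lookup from allophone label to environment label, sketched below in base R. The names `env_groups`, `env_lookup`, and `lookup_environment` are hypothetical, and the prelateral group is assembled from the "before laterals" entries in the allophone list; treat this as an illustration rather than what the function does internally.

```r
# Environment labels and the allophones they cover, as listed above.
env_groups <- list(
  prelateral    = c("ZEAL", "GUILT", "FLAIL", "SHELF", "TALC", "GOLF",
                    "FAULT", "MULCH", "JOLT", "WOLF", "SPOOL"),
  prerhotic     = c("NEAR", "SQUARE", "START", "FORCE", "CURE"),
  prevelar      = c("BIG", "VAGUE", "BEG", "BAG"),
  prenasal      = c("BIN", "BEN", "BAN"),
  prevelarnasal = c("BING", "BENG", "BANG"),
  prevoiceless  = "PRICE",
  `post-Y`      = "MULE",
  postcoronal   = "TOOT"
)

# Flatten into a named vector: names are allophones, values are environments.
env_lookup <- setNames(
  rep(names(env_groups), lengths(env_groups)),
  unlist(env_groups, use.names = FALSE)
)

# Hypothetical helper: anything not listed is treated as "elsewhere".
lookup_environment <- function(allophone) {
  env <- unname(env_lookup[as.character(allophone)])
  env[is.na(env)] <- "elsewhere"
  env
}

envs <- lookup_environment(c("BIG", "BIT", "TOOT", "NURSE", "ZEAL"))
envs
```

Because unlisted labels default to "elsewhere", custom allophones need their own entry in `env_groups` before the lookup will classify them.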



# Packages used in the examples
library(dplyr)
library(forcats)
library(joeyr)

# Get some sample DARLA data to play with
darla <- joeysvowels::darla %>%
  select(word, vowel, pre_seg, fol_seg) %>%
  mutate(phoneme = joeyr:::arpa_to_wells(vowel), .after = vowel)

# Basic usage
darla %>%
  code_allophones(.old_col = phoneme, .fol_seg = fol_seg, .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#>            word vowel phoneme allophone allophone_environment pre_seg fol_seg
#> 1       JACKSON    AE    TRAP       BAT             elsewhere      JH       K
#> 2         WEEKS    IY  FLEECE      BEET             elsewhere       W       K
#> 3       ECLIPSE    IH     KIT       BIT             elsewhere       L       P
#> 4     WORKPLACE    ER   NURSE     NURSE             elsewhere       W       K
#> 5            IN    IH     KIT       BIN              prenasal     IY0       N
#> 6        CHANCE    AE    TRAP       BAN              prenasal      CH       N
#> 7         BREAD    EH   DRESS       BET             elsewhere       R       D
#> 8  INTELLECTUAL    IH     KIT       BIN              prenasal       N       N
#> 9          MOST    OW    GOAT      BOAT             elsewhere       M       S
#> 10        STEEL    IY  FLEECE      ZEAL            prelateral       T       L
#> 11      HOWEVER    AW   MOUTH     MOUTH             elsewhere      HH     EH1
#> 12      STRAINS    EY    FACE      BAIT             elsewhere       R       N
#> 13    CELLPHONE    EH   DRESS     SHELF            prelateral       S       L
#> 14         EVER    EH   DRESS       BET             elsewhere       Z       V
#> 15       MARKET    AA     LOT     START             prerhotic       M       R
#> 16        RIGHT    AY   PRICE     PRICE          prevoiceless       R       T
#> 17       COFFEE    IY  FLEECE      BEET             elsewhere       F        
#> 18         BODY    AA     LOT       BOT             elsewhere       B       D
#> 19       AMOUNT    AW   MOUTH     MOUTH             elsewhere       M       N
#> 20      ECLIPSE    IY  FLEECE      BEET             elsewhere     ER0       K

# Specify the names of the new columns with the `.new_cols` argument
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#>           word vowel phoneme allophone   environment pre_seg fol_seg
#> 1        SUDAN    UW   GOOSE      TOOT   postcoronal       S       D
#> 2    PREPARING    IH     KIT      BING prevelarnasal       R      NG
#> 3         VOID    OY  CHOICE    CHOICE     elsewhere       V       D
#> 4       CARLOS    OW    GOAT      BOAT     elsewhere       L       S
#> 5      NOTICED    OW    GOAT      BOAT     elsewhere       N       T
#> 6       EASILY    IY  FLEECE      BEET     elsewhere               Z
#> 7      READING    IY  FLEECE      BEET     elsewhere       R       D
#> 8     INSIDERS    AY   PRICE     PRIZE     elsewhere       S       D
#> 9           BY    AY   PRICE     PRICE  prevoiceless       B       S
#> 10     SETTING    EH   DRESS       BET     elsewhere       S       T
#> 11       THREE    IY  FLEECE      BEET     elsewhere       R     EY1
#> 12          MR    ER   NURSE     NURSE     elsewhere       T       M
#> 13        CALM    AA     LOT       BOT     elsewhere       K       M
#> 14          AT    AE    TRAP       BAT     elsewhere               T
#> 15 DISAPPEARED    IH     KIT       BIT     elsewhere       D       S
#> 16   ELEPHANTS    EH   DRESS     SHELF    prelateral       L       L
#> 17      CANCEL    AE    TRAP       BAN      prenasal       K       N
#> 18        WOOD    UH    FOOT       PUT     elsewhere       W       D
#> 19         ARE    ER   NURSE     NURSE     elsewhere       T       K
#> 20      PLENTY    IY  FLEECE      BEET     elsewhere       T     AH1

# Filtering by environment is straightforward
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(environment == "elsewhere") %>%
  slice_sample(n = 20)
#>        word vowel phoneme allophone environment pre_seg fol_seg
#> 1   STATION    EY    FACE      BAIT   elsewhere       T      SH
#> 2    LOOKED    UH    FOOT       PUT   elsewhere       L       K
#> 3     AFTER    AE    TRAP       BAT   elsewhere       Z       F
#> 4        BE    IY  FLEECE      BEET   elsewhere       B     EY1
#> 5      EASY    IY  FLEECE      BEET   elsewhere     ER0       Z
#> 6     WORST    ER   NURSE     NURSE   elsewhere       W       S
#> 7      TERM    ER   NURSE     NURSE   elsewhere       T       M
#> 8     COLOR    ER   NURSE     NURSE   elsewhere       L     IH1
#> 9      FOOD    UW   GOOSE      BOOT   elsewhere       F       D
#> 10    NEVER    EH   DRESS       BET   elsewhere       N       V
#> 11       D.    IY  FLEECE      BEET   elsewhere       D       F
#> 12     GAME    EY    FACE      BAIT   elsewhere       G       M
#> 13   HARDLY    IY  FLEECE      BEET   elsewhere       L       S
#> 14   WANTED    IH     KIT       BIT   elsewhere       T       D
#> 15       ON    AO THOUGHT    BOUGHT   elsewhere       N       N
#> 16    DEATH    EH   DRESS       BET   elsewhere       D      TH
#> 17 BUSINESS    IH     KIT       BIT   elsewhere       N       S
#> 18     CLUB    AH   STRUT       BUT   elsewhere       L       B
#> 19    COULD    UH    FOOT       PUT   elsewhere       K       D
#> 20     LAST    AE    TRAP       BAT   elsewhere       L       S

# Several environments can be excluded at once
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(!environment %in% c("prerhotic", "prevelarnasal", "prevelar")) %>%
  slice_sample(n = 20)
#>          word vowel phoneme allophone environment pre_seg fol_seg
#> 1    SETTLERS    ER   NURSE     NURSE   elsewhere       L       Z
#> 2    TOGETHER    EH   DRESS       BET   elsewhere       G      DH
#> 3   ADMITTING    IH     KIT       BIT   elsewhere       M       T
#> 4          U.    UW   GOOSE      MULE      post-Y       Y     EH1
#> 5        TOOL    UW   GOOSE     SPOOL  prelateral       T       L
#> 6        MANY    EH   DRESS       BEN    prenasal       M       N
#> 7         WAY    EY    FACE      BAIT   elsewhere       W        
#> 8       SPELL    EH   DRESS     SHELF  prelateral       P       L
#> 9    SUPPOSED    OW    GOAT      BOAT   elsewhere       P       Z
#> 10      ORBIT    AH   STRUT       BUT   elsewhere       B       T
#> 11        ANY    IY  FLEECE      BEET   elsewhere       N        
#> 12       ONES    AH   STRUT       BUT   elsewhere       W       N
#> 13          A    EY    FACE      BAIT   elsewhere       Z       D
#> 14       POLL    OW    GOAT      JOLT  prelateral       P       L
#> 15       SOLO    OW    GOAT      BOAT   elsewhere       L       S
#> 16        HER    ER   NURSE     NURSE   elsewhere      HH       L
#> 17     COUPLE    AH   STRUT       BUT   elsewhere       K       P
#> 18 CRITICALLY    IY  FLEECE      BEET   elsewhere       L     IH0
#> 19     REVEAL    IY  FLEECE      ZEAL  prelateral       V       L
#> 20     MATTER    AE    TRAP       BAT   elsewhere       M       T

# Some users may want to supply their own list of coronal consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N", "Y")) %>%
  filter(phoneme == "GOOSE") %>%
  slice_sample(n = 20)
#>        word vowel phoneme allophone environment pre_seg fol_seg
#> 1      MOVE    UW   GOOSE      BOOT   elsewhere       M       V
#> 2      INTO    UW   GOOSE      TOOT postcoronal       T       P
#> 3       WHO    UW   GOOSE      BOOT   elsewhere      HH      HH
#> 4    SCHOOL    UW   GOOSE     SPOOL  prelateral       K       L
#> 5       WHO    UW   GOOSE      BOOT   elsewhere      HH       W
#> 6      TOOL    UW   GOOSE     SPOOL  prelateral       T       L
#> 7        TO    UW   GOOSE      TOOT postcoronal       T     IY0
#> 8     BOOTS    UW   GOOSE      BOOT   elsewhere       B       T
#> 9        TO    UW   GOOSE     SPOOL  prelateral       T       L
#> 10       Q.    UW   GOOSE      MULE      post-Y       Y        
#> 11       TO    UW   GOOSE      TOOT postcoronal       T       F
#> 12       TO    UW   GOOSE      TOOT postcoronal       T       B
#> 13       TO    UW   GOOSE      TOOT postcoronal       T      HH
#> 14     FOOD    UW   GOOSE      BOOT   elsewhere       F       D
#> 15     TOOL    UW   GOOSE     SPOOL  prelateral       T       L
#> 16       TO    UW   GOOSE      TOOT postcoronal       T       N
#> 17       TO    UW   GOOSE      TOOT postcoronal       T     IH0
#> 18 STUDENTS    UW   GOOSE      TOOT postcoronal       T       D
#> 19   SCHOOL    UW   GOOSE     SPOOL  prelateral       K       L
#> 20     FOOD    UW   GOOSE      BOOT   elsewhere       F       D

# Other users may want to specify their own list of voiceless consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH", "X")) %>%
  filter(phoneme == "PRICE") %>%
  slice_sample(n = 20)
#>           word vowel phoneme allophone  environment pre_seg fol_seg
#> 1         LIKE    AY   PRICE     PRICE prevoiceless       L       K
#> 2       BRIGHT    AY   PRICE     PRICE prevoiceless       R       T
#> 3    EMPHASIZE    AY   PRICE     PRIZE    elsewhere       S       Z
#> 4         FIVE    AY   PRICE     PRIZE    elsewhere       F       V
#> 5     SUPPLIES    AY   PRICE     PRIZE    elsewhere       L       Z
#> 6       STRIKE    AY   PRICE     PRICE prevoiceless       R       K
#> 7      MICHAEL    AY   PRICE     PRICE prevoiceless       M       K
#> 8      ARRIVED    AY   PRICE     PRIZE    elsewhere     ER0       V
#> 9         WILD    AY   PRICE     PRIZE    elsewhere       W       L
#> 10      ADVISE    AY   PRICE     PRIZE    elsewhere       V       Z
#> 11        LIES    AY   PRICE     PRIZE    elsewhere       L       Z
#> 12        VICE    AY   PRICE     PRICE prevoiceless       V       S
#> 13 MERCHANDISE    AY   PRICE     PRIZE    elsewhere       D       Z
#> 14        LIFE    AY   PRICE     PRICE prevoiceless       L       F
#> 15        LIKE    AY   PRICE     PRICE prevoiceless       L       K
#> 16        EYES    AY   PRICE     PRIZE    elsewhere       N       Z
#> 17       RIVAL    AY   PRICE     PRIZE    elsewhere       R       V
#> 18      RIDING    AY   PRICE     PRIZE    elsewhere       R       D
#> 19        FINE    AY   PRICE     PRIZE    elsewhere       F       N
#> 20          BY    AY   PRICE     PRIZE    elsewhere       B      DH

# Collapsing distinctions can be done post hoc (though it may take extra work to get the environment column to match.)
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Get a subset for demonstration purposes
  filter(allophone %in% c("BIT", "BIG")) %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup() %>%
  # Now collapse distinctions
  mutate(allophone = fct_collapse(allophone, "BIT" = c("BIT", "BIG")),
         # allophone is now a factor, so convert it before mixing it with strings
         environment = ifelse(allophone == "BIT", "elsewhere", as.character(allophone)))
#> # A tibble: 10 × 7
#>    word      vowel phoneme allophone environment pre_seg fol_seg
#>    <chr>     <chr> <fct>   <fct>     <chr>       <chr>   <chr>  
#>  1 TRIGGER   IH    KIT     BIT       elsewhere   R       G      
#>  2 BEGINNING IH    KIT     BIT       elsewhere   B       G      
#>  3 EXISTED   IH    KIT     BIT       elsewhere   Z       G      
#>  4 EXISTENCE IH    KIT     BIT       elsewhere   IY0     G      
#>  5 EXOTIC    IH    KIT     BIT       elsewhere   V       G      
#>  6 DID       IH    KIT     BIT       elsewhere   D       D      
#>  7 CROCKETT  IH    KIT     BIT       elsewhere   K       T      
#>  8 LISTEN    IH    KIT     BIT       elsewhere   L       S      
#>  9 PERFECT   IH    KIT     BIT       elsewhere   F       K      
#> 10 SOUNDED   IH    KIT     BIT       elsewhere   D       D      

# Creating new allophones depends on the complexity of the allophone
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Create voiced and voiceless distinctions for MOUTH
  mutate(allophone = case_when(phoneme == "MOUTH" & fol_seg %in% c("P", "T", "K", "CH", "F", "TH", "S", "SH") ~ "BOUT",
                               phoneme == "MOUTH" ~ "LOUD",
                               TRUE ~ allophone),
         environment = if_else(allophone == "BOUT",  "prevoiceless", environment)) %>%
  # Get a subset for demonstration purposes
  filter(phoneme == "MOUTH") %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup()
#> # A tibble: 10 × 7
#>    word    vowel phoneme allophone environment  pre_seg fol_seg
#>    <chr>   <chr> <fct>   <chr>     <chr>        <chr>   <chr>  
#>  1 HOUSE   AW    MOUTH   BOUT      prevoiceless HH      "S"    
#>  2 ABOUT   AW    MOUTH   BOUT      prevoiceless B       "T"    
#>  3 ABOUT   AW    MOUTH   BOUT      prevoiceless B       "T"    
#>  4 HOUSE   AW    MOUTH   BOUT      prevoiceless HH      "S"    
#>  5 WITHOUT AW    MOUTH   BOUT      prevoiceless TH      "T"    
#>  6 NOW     AW    MOUTH   LOUD      elsewhere    N       ""     
#>  7 HOWEVER AW    MOUTH   LOUD      elsewhere    HH      "EH1"  
#>  8 AMOUNT  AW    MOUTH   LOUD      elsewhere    M       "N"    
#>  9 HOUSED  AW    MOUTH   LOUD      elsewhere    HH      "Z"    
#> 10 ALLOWED AW    MOUTH   LOUD      elsewhere    L       "D"