A function to classify vowel data into contextual allophones.
The dataset containing vowel data.
The unquoted name of the column containing the vowel labels.
Often called "vowel" or "phoneme". Note that the function
assumes Wells lexical sets (FLEECE, TRAP, etc.) rather than ARPABET (IY, AE, etc.)
or IPA (i, æ, etc.). If your vowels are not already coded using Wells' labels,
you can quickly do so with switch_transcriptions or one of the
shortcuts like arpa_to_wells.
A vector of two strings containing the names of the new columns
to create. By default, c("allophone", "allophone_environment").
The first name becomes the name of the column containing the new allophone
labels. The second name becomes the name of the column describing those
labels.
The unquoted name of the column that contains the labels for the previous segment. In DARLA-generated spreadsheets, this is `pre_seg` and in FastTrack-generated spreadsheets, it's `previous_sound`. Assumes ARPABET labels.
The unquoted name of the column that contains the labels for the following segment. In DARLA-generated spreadsheets, this is `fol_seg` and in FastTrack-generated spreadsheets, it's `next_sound`. Assumes ARPABET labels.
A vector of strings containing ARPABET labels for coronal consonants.
By default, c("T", "D", "S", "Z", "SH", "ZH", "JH", "N"). This is used
to create the `TOOT` allophone of `GOOSE`.
A vector of strings containing ARPABET labels for voiceless
consonants. By default, c("P", "T", "K", "CH", "F", "TH", "S", "SH").
This is used to separate the prevoiceless `PRICE` allophone from `PRIZE`.
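If your vowel labels are in ARPABET, converting them to Wells lexical sets amounts to a lookup. The following is a minimal base R sketch of that idea, not the package's own `arpa_to_wells` implementation. The helper name `arpa_to_wells_sketch` and the lookup table are hypothetical, the 15-vowel mapping is an assumption that happens to match the labels in the example output on this page, and stripping stress digits (as in `EH1`) is also an assumption.

```r
# Hypothetical lookup table from ARPABET vowels to Wells lexical sets
arpa_wells_lookup <- c(
  IY = "FLEECE", IH = "KIT",     EY = "FACE",  EH = "DRESS",  AE = "TRAP",
  AA = "LOT",    AO = "THOUGHT", AH = "STRUT", OW = "GOAT",   UH = "FOOT",
  UW = "GOOSE",  AY = "PRICE",   AW = "MOUTH", OY = "CHOICE", ER = "NURSE"
)

arpa_to_wells_sketch <- function(x) {
  # Remove stress digits such as the "1" in "EH1"
  bare <- gsub("[0-9]", "", x)
  # Labels that are not in the table come back as NA
  unname(arpa_wells_lookup[bare])
}

arpa_to_wells_sketch(c("IY1", "AE", "UW0", "XX"))
```

Returning NA for unknown labels makes unconverted rows easy to find before they are passed on as the vowel column.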
A dataframe with two additional columns. One column contains labels
for the allophones and the other contains category labels for those
allophones' contexts. The second column can be useful for quickly excluding
certain allophones like prelaterals or prerhotics or coloring families of
allophones in visualizations (such as turning all prelateral allophones gray).
These two new columns are positioned immediately after the original vowel
column indicated in .old_col.
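As a rough illustration of the excluding and coloring use cases, the sketch below works on the environment column in base R. The toy data frame and its values are invented for demonstration, and the column names assume the defaults of `.new_cols`.

```r
# Toy data standing in for the output of code_allophones() (values invented)
toy <- data.frame(
  allophone = c("BEET", "ZEAL", "BAT", "TALC", "START"),
  allophone_environment = c("elsewhere", "prelateral", "elsewhere",
                            "prelateral", "prerhotic"),
  stringsAsFactors = FALSE
)

# Gray out every prelateral allophone; keep everything else black
toy$color <- ifelse(toy$allophone_environment == "prelateral", "gray60", "black")

# Drop prerhotics before plotting
plot_data <- toy[toy$allophone_environment != "prerhotic", ]
plot_data
```

The resulting `color` column can then be passed to whichever plotting function you use.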
Here is the list of contextual allophones that are created. Note that I largely follow my own advice about what to call elsewhere allophones, prelateral allophones, and other allophones. This list is subjective and largely based on what my own research has needed, so it may not work completely for you and your research. Please contact me at joey_stanley@byu.edu if you want to see an allophone added or if you spot an error in the coding.
FLEECE becomes
ZEAL before laterals
BEET elsewhere
KIT becomes
GUILT before laterals
NEAR before rhotics
BIG before G
BIN before M and N
BING before NG
BIT elsewhere
FACE becomes
FLAIL before laterals
VAGUE before G
BAIT elsewhere
DRESS becomes
SHELF before laterals
SQUARE before rhotics
BEG before G
BEN before M and N
BENG before NG
BET elsewhere
TRAP becomes
TALC before laterals
BAG before G
BAN before M and N
BANG before NG
BAT elsewhere
LOT becomes
GOLF before laterals
START before rhotics
BOT elsewhere
THOUGHT becomes
FAULT before laterals
FORCE before rhotics
BOUGHT elsewhere
STRUT becomes
MULCH before laterals
BUT elsewhere
GOAT becomes
JOLT before laterals
BOAT elsewhere
FOOT becomes
WOLF before laterals
CURE before rhotics
PUT elsewhere
GOOSE becomes
MULE after Y
TOOT after coronals
SPOOL before laterals
BOOT elsewhere
PRICE becomes
PRICE before voiceless segments
PRIZE elsewhere
Unfortunately, it is not straightforward to customize this list, but you can always copy the source code and modify the list yourself.
Alternatively, you can use forcats::fct_collapse() to collapse
distinctions that you don't need. See the example code below.
You can also of course create your own allophones if desired. Note that some allophones depend on other environmental information like syllable structure and morpheme/word boundaries, or they may be entirely lexical (FORCE vs. NORTH). They may be more complicated than what ARPABET can code for (MARY, MERRY, and MARRY) or just inconsistently coded. For the sake of simplicity, these allophones are not included in this function.
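To make the mapping from segmental context to allophone and environment labels concrete, here is a minimal base R sketch for a single lexical set, KIT. It is not the package's source code. The function name `code_kit_sketch` is hypothetical, and it assumes that laterals are coded as `L` and rhotics as `R` in ARPABET and that every KIT rule looks only at the following segment.

```r
code_kit_sketch <- function(fol_seg) {
  # One row per rule: following segment, allophone label, environment label
  rules <- data.frame(
    fol_seg     = c("L", "R", "G", "M", "N", "NG"),
    allophone   = c("GUILT", "NEAR", "BIG", "BIN", "BIN", "BING"),
    environment = c("prelateral", "prerhotic", "prevelar",
                    "prenasal", "prenasal", "prevelarnasal"),
    stringsAsFactors = FALSE
  )
  # Position of each following segment in the rule table (NA if no rule applies)
  idx <- match(fol_seg, rules$fol_seg)
  data.frame(
    fol_seg     = fol_seg,
    allophone   = ifelse(is.na(idx), "BIT", rules$allophone[idx]),
    environment = ifelse(is.na(idx), "elsewhere", rules$environment[idx]),
    stringsAsFactors = FALSE
  )
}

code_kit_sketch(c("L", "NG", "T", "N", ""))
```

A rule table like this is one way to start if you copy the logic and add allophones of your own: each new context is one more row, and anything without a rule falls through to the elsewhere label.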
The environments are therefore the following:
"prelateral" includes ZEAL, GUILT, FLAIL, SHELF, TALC, GOLF, FAULT, MULCH, JOLT, WOLF, SPOOL
"prerhotic" includes NEAR, SQUARE, START, FORCE, CURE
"prevelar" includes BIG, VAGUE, BEG, BAG
"prenasal" includes BIN, BEN, BAN
"prevelarnasal" includes BING, BENG, BANG
"prevoiceless" includes PRICE
"post-Y" includes MULE
"postcoronal" includes TOOT
"elsewhere" includes BEET, BIT, BAIT, BET, BAT, BOT, BOUGHT, BUT, BOAT, PUT, BOOT, PRIZE
suppressPackageStartupMessages(library(tidyverse))
# Get some sample DARLA data to play with
darla <- joeysvowels::darla %>%
  select(word, vowel, pre_seg, fol_seg) %>%
  mutate(phoneme = joeyr:::arpa_to_wells(vowel), .after = vowel)
# Basic usage
darla %>%
  code_allophones(.old_col = phoneme, .fol_seg = fol_seg, .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone allophone_environment pre_seg fol_seg
#> 1 JACKSON AE TRAP BAT elsewhere JH K
#> 2 WEEKS IY FLEECE BEET elsewhere W K
#> 3 ECLIPSE IH KIT BIT elsewhere L P
#> 4 WORKPLACE ER NURSE NURSE elsewhere W K
#> 5 IN IH KIT BIN prenasal IY0 N
#> 6 CHANCE AE TRAP BAN prenasal CH N
#> 7 BREAD EH DRESS BET elsewhere R D
#> 8 INTELLECTUAL IH KIT BIN prenasal N N
#> 9 MOST OW GOAT BOAT elsewhere M S
#> 10 STEEL IY FLEECE ZEAL prelateral T L
#> 11 HOWEVER AW MOUTH MOUTH elsewhere HH EH1
#> 12 STRAINS EY FACE BAIT elsewhere R N
#> 13 CELLPHONE EH DRESS SHELF prelateral S L
#> 14 EVER EH DRESS BET elsewhere Z V
#> 15 MARKET AA LOT START prerhotic M R
#> 16 RIGHT AY PRICE PRICE prevoiceless R T
#> 17 COFFEE IY FLEECE BEET elsewhere F
#> 18 BODY AA LOT BOT elsewhere B D
#> 19 AMOUNT AW MOUTH MOUTH elsewhere M N
#> 20 ECLIPSE IY FLEECE BEET elsewhere ER0 K
# Specify the names of the new columns with the `.new_cols` argument
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 SUDAN UW GOOSE TOOT postcoronal S D
#> 2 PREPARING IH KIT BING prevelarnasal R NG
#> 3 VOID OY CHOICE CHOICE elsewhere V D
#> 4 CARLOS OW GOAT BOAT elsewhere L S
#> 5 NOTICED OW GOAT BOAT elsewhere N T
#> 6 EASILY IY FLEECE BEET elsewhere Z
#> 7 READING IY FLEECE BEET elsewhere R D
#> 8 INSIDERS AY PRICE PRIZE elsewhere S D
#> 9 BY AY PRICE PRICE prevoiceless B S
#> 10 SETTING EH DRESS BET elsewhere S T
#> 11 THREE IY FLEECE BEET elsewhere R EY1
#> 12 MR ER NURSE NURSE elsewhere T M
#> 13 CALM AA LOT BOT elsewhere K M
#> 14 AT AE TRAP BAT elsewhere T
#> 15 DISAPPEARED IH KIT BIT elsewhere D S
#> 16 ELEPHANTS EH DRESS SHELF prelateral L L
#> 17 CANCEL AE TRAP BAN prenasal K N
#> 18 WOOD UH FOOT PUT elsewhere W D
#> 19 ARE ER NURSE NURSE elsewhere T K
#> 20 PLENTY IY FLEECE BEET elsewhere T AH1
# Filtering by environment is straightforward
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(environment == "elsewhere") %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 STATION EY FACE BAIT elsewhere T SH
#> 2 LOOKED UH FOOT PUT elsewhere L K
#> 3 AFTER AE TRAP BAT elsewhere Z F
#> 4 BE IY FLEECE BEET elsewhere B EY1
#> 5 EASY IY FLEECE BEET elsewhere ER0 Z
#> 6 WORST ER NURSE NURSE elsewhere W S
#> 7 TERM ER NURSE NURSE elsewhere T M
#> 8 COLOR ER NURSE NURSE elsewhere L IH1
#> 9 FOOD UW GOOSE BOOT elsewhere F D
#> 10 NEVER EH DRESS BET elsewhere N V
#> 11 D. IY FLEECE BEET elsewhere D F
#> 12 GAME EY FACE BAIT elsewhere G M
#> 13 HARDLY IY FLEECE BEET elsewhere L S
#> 14 WANTED IH KIT BIT elsewhere T D
#> 15 ON AO THOUGHT BOUGHT elsewhere N N
#> 16 DEATH EH DRESS BET elsewhere D TH
#> 17 BUSINESS IH KIT BIT elsewhere N S
#> 18 CLUB AH STRUT BUT elsewhere L B
#> 19 COULD UH FOOT PUT elsewhere K D
#> 20 LAST AE TRAP BAT elsewhere L S
# Several environments can be excluded at once
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(!environment %in% c("prerhotic", "prevelarnasal", "prevelar")) %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 SETTLERS ER NURSE NURSE elsewhere L Z
#> 2 TOGETHER EH DRESS BET elsewhere G DH
#> 3 ADMITTING IH KIT BIT elsewhere M T
#> 4 U. UW GOOSE MULE post-Y Y EH1
#> 5 TOOL UW GOOSE SPOOL prelateral T L
#> 6 MANY EH DRESS BEN prenasal M N
#> 7 WAY EY FACE BAIT elsewhere W
#> 8 SPELL EH DRESS SHELF prelateral P L
#> 9 SUPPOSED OW GOAT BOAT elsewhere P Z
#> 10 ORBIT AH STRUT BUT elsewhere B T
#> 11 ANY IY FLEECE BEET elsewhere N
#> 12 ONES AH STRUT BUT elsewhere W N
#> 13 A EY FACE BAIT elsewhere Z D
#> 14 POLL OW GOAT JOLT prelateral P L
#> 15 SOLO OW GOAT BOAT elsewhere L S
#> 16 HER ER NURSE NURSE elsewhere HH L
#> 17 COUPLE AH STRUT BUT elsewhere K P
#> 18 CRITICALLY IY FLEECE BEET elsewhere L IH0
#> 19 REVEAL IY FLEECE ZEAL prelateral V L
#> 20 MATTER AE TRAP BAT elsewhere M T
# Some users may want to supply their own list of coronal consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N", "Y")) %>%
  filter(phoneme == "GOOSE") %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 MOVE UW GOOSE BOOT elsewhere M V
#> 2 INTO UW GOOSE TOOT postcoronal T P
#> 3 WHO UW GOOSE BOOT elsewhere HH HH
#> 4 SCHOOL UW GOOSE SPOOL prelateral K L
#> 5 WHO UW GOOSE BOOT elsewhere HH W
#> 6 TOOL UW GOOSE SPOOL prelateral T L
#> 7 TO UW GOOSE TOOT postcoronal T IY0
#> 8 BOOTS UW GOOSE BOOT elsewhere B T
#> 9 TO UW GOOSE SPOOL prelateral T L
#> 10 Q. UW GOOSE MULE post-Y Y
#> 11 TO UW GOOSE TOOT postcoronal T F
#> 12 TO UW GOOSE TOOT postcoronal T B
#> 13 TO UW GOOSE TOOT postcoronal T HH
#> 14 FOOD UW GOOSE BOOT elsewhere F D
#> 15 TOOL UW GOOSE SPOOL prelateral T L
#> 16 TO UW GOOSE TOOT postcoronal T N
#> 17 TO UW GOOSE TOOT postcoronal T IH0
#> 18 STUDENTS UW GOOSE TOOT postcoronal T D
#> 19 SCHOOL UW GOOSE SPOOL prelateral K L
#> 20 FOOD UW GOOSE BOOT elsewhere F D
# Other users may want to specify their own list of voiceless consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH", "X")) %>%
  filter(phoneme == "PRICE") %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 LIKE AY PRICE PRICE prevoiceless L K
#> 2 BRIGHT AY PRICE PRICE prevoiceless R T
#> 3 EMPHASIZE AY PRICE PRIZE elsewhere S Z
#> 4 FIVE AY PRICE PRIZE elsewhere F V
#> 5 SUPPLIES AY PRICE PRIZE elsewhere L Z
#> 6 STRIKE AY PRICE PRICE prevoiceless R K
#> 7 MICHAEL AY PRICE PRICE prevoiceless M K
#> 8 ARRIVED AY PRICE PRIZE elsewhere ER0 V
#> 9 WILD AY PRICE PRIZE elsewhere W L
#> 10 ADVISE AY PRICE PRIZE elsewhere V Z
#> 11 LIES AY PRICE PRIZE elsewhere L Z
#> 12 VICE AY PRICE PRICE prevoiceless V S
#> 13 MERCHANDISE AY PRICE PRIZE elsewhere D Z
#> 14 LIFE AY PRICE PRICE prevoiceless L F
#> 15 LIKE AY PRICE PRICE prevoiceless L K
#> 16 EYES AY PRICE PRIZE elsewhere N Z
#> 17 RIVAL AY PRICE PRIZE elsewhere R V
#> 18 RIDING AY PRICE PRIZE elsewhere R D
#> 19 FINE AY PRICE PRIZE elsewhere F N
#> 20 BY AY PRICE PRIZE elsewhere B DH
# Collapsing distinctions can be done post hoc (though it may take extra work to get the environment column to match).
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Get a subset for demonstration purposes
  filter(allophone %in% c("BIT", "BIG")) %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup() %>%
  # Now collapse distinctions
  mutate(allophone = fct_collapse(allophone, "BIT" = c("BIT", "BIG")),
         # Relabel the collapsed rows; any other row keeps its environment
         environment = ifelse(allophone == "BIT", "elsewhere", environment))
#> # A tibble: 10 × 7
#> word vowel phoneme allophone environment pre_seg fol_seg
#> <chr> <chr> <fct> <fct> <chr> <chr> <chr>
#> 1 TRIGGER IH KIT BIT elsewhere R G
#> 2 BEGINNING IH KIT BIT elsewhere B G
#> 3 EXISTED IH KIT BIT elsewhere Z G
#> 4 EXISTENCE IH KIT BIT elsewhere IY0 G
#> 5 EXOTIC IH KIT BIT elsewhere V G
#> 6 DID IH KIT BIT elsewhere D D
#> 7 CROCKETT IH KIT BIT elsewhere K T
#> 8 LISTEN IH KIT BIT elsewhere L S
#> 9 PERFECT IH KIT BIT elsewhere F K
#> 10 SOUNDED IH KIT BIT elsewhere D D
# How you create a new allophone depends on how complex it is
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Create voiced and voiceless distinctions for MOUTH
  mutate(allophone = case_when(phoneme == "MOUTH" & fol_seg %in% c("P", "T", "K", "CH", "F", "TH", "S", "SH") ~ "BOUT",
                               phoneme == "MOUTH" ~ "LOUD",
                               TRUE ~ allophone),
         environment = if_else(allophone == "BOUT", "prevoiceless", environment)) %>%
  # Get a subset for demonstration purposes
  filter(phoneme == "MOUTH") %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup()
#> # A tibble: 10 × 7
#> word vowel phoneme allophone environment pre_seg fol_seg
#> <chr> <chr> <fct> <chr> <chr> <chr> <chr>
#> 1 HOUSE AW MOUTH BOUT prevoiceless HH "S"
#> 2 ABOUT AW MOUTH BOUT prevoiceless B "T"
#> 3 ABOUT AW MOUTH BOUT prevoiceless B "T"
#> 4 HOUSE AW MOUTH BOUT prevoiceless HH "S"
#> 5 WITHOUT AW MOUTH BOUT prevoiceless TH "T"
#> 6 NOW AW MOUTH LOUD elsewhere N ""
#> 7 HOWEVER AW MOUTH LOUD elsewhere HH "EH1"
#> 8 AMOUNT AW MOUTH LOUD elsewhere M "N"
#> 9 HOUSED AW MOUTH LOUD elsewhere HH "Z"
#> 10 ALLOWED AW MOUTH LOUD elsewhere L "D"