A function to classify vowel data into contextual allophones.
The dataset containing vowel data.
The unquoted name of the column containing the vowel labels.
Often called "vowel" or "phoneme" in datasets. Note that the function
assumes Wells lexical sets (FLEECE, TRAP, etc.) rather than ARPABET (IY, AE, etc.)
or IPA (i, æ, etc.). If your vowels are not already coded using Wells' labels,
you can quickly convert them with switch_transcriptions or one of the
shortcuts like arpa_to_wells.
A vector of two strings containing the names of the new columns
you would like to use. By default, c("allophone", "allophone_environment").
The first name becomes the name of the column containing the new allophone
labels. The second name becomes the name of the column describing those
labels.
The unquoted name of the column that contains the labels for the previous segment. In DARLA-generated spreadsheets, this is `pre_seg` and in FastTrack-generated spreadsheets, it's `previous_sound`. Assumes ARPABET labels.
The unquoted name of the column that contains the labels for the following segment. In DARLA-generated spreadsheets, this is `fol_seg` and in FastTrack-generated spreadsheets, it's `next_sound`. Assumes ARPABET labels.
A vector of strings containing ARPABET labels for coronal consonants.
By default, c("T", "D", "S", "Z", "SH", "ZH", "JH", "N"). This is used
to create the `TOOT` allophone of `GOOSE`.
A vector of strings containing ARPABET labels for voiceless
consonants. By default, c("P", "T", "K", "CH", "F", "TH", "S", "SH").
This is used to separate the prevoiceless `PRICE` allophone from `PRIZE`, which is used elsewhere.
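The vowel column has to hold Wells lexical sets, while the segment columns hold ARPABET labels. As a rough illustration of what converting the vowel column involves, here is a minimal base R sketch of an ARPABET-to-Wells lookup. The helper name `arpa_to_wells_sketch`, the lookup table, and the stripping of stress digits are all assumptions made for this example; this is not the implementation of arpa_to_wells or switch_transcriptions.

```r
# Hypothetical lookup table (not joeyr's own): ARPABET vowel -> Wells lexical set
arpa_wells <- c(
  IY = "FLEECE", IH = "KIT",    EY = "FACE",    EH = "DRESS",
  AE = "TRAP",   AA = "LOT",    AO = "THOUGHT", AH = "STRUT",
  OW = "GOAT",   UH = "FOOT",   UW = "GOOSE",   AY = "PRICE",
  AW = "MOUTH",  OY = "CHOICE", ER = "NURSE"
)

arpa_to_wells_sketch <- function(x) {
  # Assumption: labels may carry a trailing stress digit (e.g. "AE1"); drop it
  bare <- sub("[0-9]$", "", as.character(x))
  # Labels missing from the table come back as NA rather than being guessed
  unname(arpa_wells[bare])
}

arpa_to_wells_sketch(c("IY1", "AE", "UW0", "XX"))
#> [1] "FLEECE" "TRAP"   "GOOSE"  NA
```

Returning NA for unrecognized labels makes coding gaps easy to spot before the allophones are assigned.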
A dataframe with two additional columns. One column contains labels
for the allophones and the other contains category labels for those
allophones' contexts. The second column can be useful for quickly excluding
certain allophones like prelaterals or prerhotics or coloring families of
allophones in visualizations (such as turning all prelateral allophones gray).
These two new columns are positioned immediately after the original vowel
column indicated in .old_col.
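As a small illustration of the coloring idea, the sketch below turns environment labels into plotting colors in base R. The toy vector and the color names are assumptions for this example only; in practice the labels would come from the new environment column.

```r
# Toy environment labels standing in for the second new column
environment <- c("elsewhere", "prelateral", "prenasal", "prelateral", "prerhotic")

# Gray for prelaterals, one highlight color for everything else
point_color <- ifelse(environment == "prelateral", "gray70", "steelblue")
point_color
#> [1] "steelblue" "gray70"    "steelblue" "gray70"    "steelblue"
```

The resulting vector can be passed to the `col` argument of base plotting functions, or the same mapping can be expressed as a manual color scale in a plotting package.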
Here is the list of the contextual allophones that are created. Note that I largely follow my own advice about what to call elsewhere allophones, what to call prelateral allophones, and what to call other allophones. This list is subjective and largely based on what my own research has needed, so it may not work completely for you and your research. Please contact me at joey_stanley@byu.edu if you want to see an allophone added or if you spot an error in the coding.
FLEECE becomes
ZEAL before laterals
BEET elsewhere
KIT becomes
GUILT before laterals
NEAR before rhotics
BIG before G
BIN before M and N
BING before NG
BIT elsewhere
FACE becomes
FLAIL before laterals
VAGUE before G
BAIT elsewhere
DRESS becomes
SHELF before laterals
SQUARE before rhotics
BEG before G
BEN before M and N
BENG before NG
BET elsewhere
TRAP becomes
TALC before laterals
BAG before G
BAN before M and N
BANG before NG
BAT elsewhere
LOT becomes
GOLF before laterals
START before rhotics
BOT elsewhere
THOUGHT becomes
FAULT before laterals
FORCE before rhotics
BOUGHT elsewhere
STRUT becomes
MULCH before laterals
BUT elsewhere
GOAT becomes
JOLT before laterals
BOAT elsewhere
FOOT becomes
WOLF before laterals
CURE before rhotics
PUT elsewhere
GOOSE becomes
MULE after Y
TOOT after coronals
SPOOL before laterals
BOOT elsewhere
PRICE becomes
PRICE before voiceless segments
PRIZE elsewhere
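To make the conditioning concrete, here is a minimal base R sketch of how rules like these could be written for just two vowels, KIT and GOOSE. It is not the package's implementation. The helper name `code_sketch` is hypothetical, and laterals and rhotics are assumed to be the bare ARPABET labels "L" and "R". The order in which overlapping GOOSE environments are resolved is also an assumption: the sample output in the examples codes a GOOSE token between T and L as SPOOL, so laterals override coronals here, and ranking MULE above both is a guess.

```r
# Hypothetical helper, not part of joeyr: codes KIT and GOOSE tokens only
code_sketch <- function(phoneme, pre_seg, fol_seg,
                        coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N")) {
  phoneme <- as.character(phoneme)
  out <- phoneme                      # unhandled vowels keep their label
  kit   <- phoneme %in% "KIT"
  goose <- phoneme %in% "GOOSE"

  # KIT depends on the following segment only
  out[kit] <- "BIT"
  out[kit & fol_seg %in% c("M", "N")] <- "BIN"
  out[kit & fol_seg %in% "NG"] <- "BING"
  out[kit & fol_seg %in% "G"]  <- "BIG"
  out[kit & fol_seg %in% "R"]  <- "NEAR"
  out[kit & fol_seg %in% "L"]  <- "GUILT"

  # GOOSE depends on both neighbors; later assignments override earlier ones
  out[goose] <- "BOOT"
  out[goose & pre_seg %in% coronals] <- "TOOT"
  out[goose & fol_seg %in% "L"] <- "SPOOL"
  out[goose & pre_seg %in% "Y"] <- "MULE"
  out
}

code_sketch(
  phoneme = c("KIT", "KIT", "KIT", "GOOSE", "GOOSE", "GOOSE", "GOOSE", "FLEECE"),
  pre_seg = c("B",   "K",   "IY1", "T",     "T",     "Y",     "R",     "R"),
  fol_seg = c("G",   "D",   "N",   "D",     "L",     "M",     "P",     "G")
)
#> [1] "BIG"    "BIT"    "BIN"    "TOOT"   "SPOOL"  "MULE"   "BOOT"   "FLEECE"
```

Using `%in%` rather than `==` keeps missing segment labels from producing NA conditions, so word-initial and word-final vowels fall through to the elsewhere label.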
Unfortunately, it is not straightforward to customize this list, but you can always copy the source code and modify the list yourself.
Alternatively, you can use forcats::fct_collapse() to collapse
distinctions that you don't need. See example code below.
You can also of course create your own allophones if desired. Note that some allophones depend on other environmental information like syllable structure and morpheme/word boundaries, or they may be entirely lexical (FORCE vs. NORTH). They may be more complicated than what ARPABET can code for (MARY, MERRY, and MARRY) or just inconsistently coded. For the sake of simplicity, these allophones are not included in this function.
The environments therefore are the following:
"prelateral" includes ZEAL, GUILT, FLAIL, SHELF, TALC, GOLF, FAULT, MULCH, JOLT, WOLF, SPOOL
"prerhotic" includes NEAR, SQUARE, START, FORCE, CURE
"prevelar" includes BIG, VAGUE, BEG, BAG
"prenasal" includes BIN, BEN, BAN
"prevelarnasal" includes BING, BENG, BANG
"prevoiceless" includes PRICE
"post-Y" includes MULE
"postcoronal" includes TOOT
"elsewhere" includes BEET, BIT, BAIT, BET, BAT, BOT, BOUGHT, BUT, BOAT, PUT, BOOT, PRIZE
suppressPackageStartupMessages(library(tidyverse))
library(joeyr) # provides code_allophones()
# Get some sample DARLA data to play with
darla <- joeysvowels::darla %>%
  select(word, vowel, pre_seg, fol_seg) %>%
  mutate(phoneme = joeyr:::arpa_to_wells(vowel), .after = vowel)
# Basic usage
darla %>%
  code_allophones(.old_col = phoneme, .fol_seg = fol_seg, .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone allophone_environment pre_seg fol_seg
#> 1 BRIAN AY PRICE PRIZE elsewhere R AH0
#> 2 PUT UH FOOT PUT elsewhere P T
#> 3 REAGAN IY FLEECE BEET elsewhere R G
#> 4 BALANCE AE TRAP TALC prelateral B L
#> 5 HIGHEST AY PRICE PRIZE elsewhere HH AH0
#> 6 KIDS IH KIT BIT elsewhere K D
#> 7 CROWN AW MOUTH MOUTH elsewhere R N
#> 8 LOADED OW GOAT BOAT elsewhere L D
#> 9 INVOLVED AA LOT GOLF prelateral V L
#> 10 NORTHERN AO THOUGHT FORCE prerhotic N R
#> 11 Q. UW GOOSE MULE post-Y Y
#> 12 SUBSEQUENTLY AH STRUT BUT elsewhere S B
#> 13 GAME EY FACE BAIT elsewhere G M
#> 14 COTTON AO THOUGHT BOUGHT elsewhere K T
#> 15 EXISTED AH STRUT BUT elsewhere T D
#> 16 WIRE AY PRICE PRIZE elsewhere W R
#> 17 A EY FACE BAIT elsewhere IY1 CH
#> 18 MORE AO THOUGHT FORCE prerhotic M R
#> 19 RAINBOW EY FACE BAIT elsewhere R N
#> 20 INTERPOL IH KIT BIN prenasal IY1 N
# Specify the names of the new columns with the `.new_cols` argument
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 SEE IY FLEECE BEET elsewhere S W
#> 2 BALLOON AH STRUT MULCH prelateral B L
#> 3 CRITICALLY IY FLEECE BEET elsewhere L IH0
#> 4 HALF AE TRAP BAT elsewhere HH F
#> 5 AROUND ER NURSE NURSE elsewhere S AW1
#> 6 GO OW GOAT BOAT elsewhere G AH0
#> 7 ABOUT AW MOUTH MOUTH elsewhere B T
#> 8 THIRTY ER NURSE NURSE elsewhere TH T
#> 9 OVERLAND ER NURSE NURSE elsewhere V L
#> 10 ELEPHANT AH STRUT BUT elsewhere F N
#> 11 HER ER NURSE NURSE elsewhere HH CH
#> 12 RELATIONSHIPS IH KIT BIT elsewhere SH P
#> 13 ON AO THOUGHT BOUGHT elsewhere T N
#> 14 ACTUALLY AE TRAP BAT elsewhere T K
#> 15 LEAST IY FLEECE BEET elsewhere L S
#> 16 SOUNDED AW MOUTH MOUTH elsewhere S N
#> 17 DISRUPTION AH STRUT BUT elsewhere R P
#> 18 LEARNING ER NURSE NURSE elsewhere L N
#> 19 COUNTRY AH STRUT BUT elsewhere K N
#> 20 FEDERAL EH DRESS BET elsewhere F D
# Filtering by environment is straightforward
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(environment == "elsewhere") %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 READING IY FLEECE BEET elsewhere R D
#> 2 MIA IY FLEECE BEET elsewhere M AH0
#> 3 ABOUT AW MOUTH MOUTH elsewhere B T
#> 4 DOWN AW MOUTH MOUTH elsewhere D N
#> 5 RESPONSIBILITIES AA LOT BOT elsewhere P N
#> 6 SOLAR ER NURSE NURSE elsewhere L IY0
#> 7 OPEN OW GOAT BOAT elsewhere T P
#> 8 AT AE TRAP BAT elsewhere N T
#> 9 INTERRUPTED AH STRUT BUT elsewhere ER0 P
#> 10 SILLY IY FLEECE BEET elsewhere L T
#> 11 US EH DRESS BET elsewhere UW1 S
#> 12 BOXES AA LOT BOT elsewhere B K
#> 13 COMMUNITY IY FLEECE BEET elsewhere T T
#> 14 MAGAZINE IY FLEECE BEET elsewhere Z N
#> 15 ENTRIES IY FLEECE BEET elsewhere R Z
#> 16 INFORMATION ER NURSE NURSE elsewhere F M
#> 17 DIGITAL IH KIT BIT elsewhere D JH
#> 18 SLATE EY FACE BAIT elsewhere L T
#> 19 GAME EY FACE BAIT elsewhere G M
#> 20 COST AO THOUGHT BOUGHT elsewhere K S
# Several environments can be excluded at once
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  filter(!environment %in% c("prerhotic", "prevelarnasal", "prevelar")) %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 PATIENTS AH STRUT BUT elsewhere SH N
#> 2 WHY AY PRICE PRIZE elsewhere W HH
#> 3 ANYBODY IY FLEECE BEET elsewhere D
#> 4 DOING UW GOOSE TOOT postcoronal D IH0
#> 5 KNOWN OW GOAT BOAT elsewhere N N
#> 6 HER ER NURSE NURSE elsewhere HH CH
#> 7 IMPLEMENTATION EY FACE BAIT elsewhere T SH
#> 8 MAPLE EY FACE BAIT elsewhere M P
#> 9 AVAILABLE EY FACE FLAIL prelateral V L
#> 10 THAT'S AE TRAP BAT elsewhere DH T
#> 11 YOUNG AH STRUT BUT elsewhere Y NG
#> 12 ON AO THOUGHT BOUGHT elsewhere V N
#> 13 OVERLAND OW GOAT BOAT elsewhere V V
#> 14 NOT AA LOT BOT elsewhere N T
#> 15 TEST EH DRESS BET elsewhere T S
#> 16 OWN OW GOAT BOAT elsewhere R N
#> 17 VAST AE TRAP BAT elsewhere V S
#> 18 CHOOSES UW GOOSE BOOT elsewhere CH Z
#> 19 OVER ER NURSE NURSE elsewhere V T
#> 20 PUCK AH STRUT BUT elsewhere P K
# Some users may want to supply their own list of coronal consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .coronals = c("T", "D", "S", "Z", "SH", "ZH", "JH", "N", "Y")) %>%
  filter(phoneme == "GOOSE") %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 STUDENTS UW GOOSE TOOT postcoronal T D
#> 2 HUMAN UW GOOSE MULE post-Y Y M
#> 3 INTO UW GOOSE TOOT postcoronal T EY1
#> 4 WHO UW GOOSE SPOOL prelateral HH L
#> 5 CUE UW GOOSE MULE post-Y Y DH
#> 6 TO UW GOOSE TOOT postcoronal T F
#> 7 NUMEROUS UW GOOSE TOOT postcoronal N M
#> 8 USE UW GOOSE MULE post-Y Y Z
#> 9 TO UW GOOSE SPOOL prelateral T L
#> 10 TO UW GOOSE TOOT postcoronal T DH
#> 11 TO UW GOOSE TOOT postcoronal T
#> 12 INTO UW GOOSE TOOT postcoronal T W
#> 13 RULE UW GOOSE SPOOL prelateral R L
#> 14 FORTUNE UW GOOSE BOOT elsewhere CH N
#> 15 GROUP UW GOOSE BOOT elsewhere R P
#> 16 TO UW GOOSE TOOT postcoronal T K
#> 17 TO UW GOOSE TOOT postcoronal T IH0
#> 18 TWO UW GOOSE TOOT postcoronal T ER0
#> 19 TO UW GOOSE TOOT postcoronal T
#> 20 TO UW GOOSE TOOT postcoronal T AE1
# Other users may want to specify their own list of voiceless consonants.
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg,
                  .voiceless = c("P", "T", "K", "CH", "F", "TH", "S", "SH", "X")) %>%
  filter(phoneme == "PRICE") %>%
  slice_sample(n = 20)
#> word vowel phoneme allophone environment pre_seg fol_seg
#> 1 ANXIETY AY PRICE PRIZE elsewhere Z AH0
#> 2 PRICE AY PRICE PRICE prevoiceless R S
#> 3 POLITELY AY PRICE PRICE prevoiceless L T
#> 4 WHITE AY PRICE PRICE prevoiceless W T
#> 5 FIND AY PRICE PRIZE elsewhere F N
#> 6 I'VE AY PRICE PRIZE elsewhere K V
#> 7 IDENTIFIED AY PRICE PRIZE elsewhere IY0 D
#> 8 I'VE AY PRICE PRIZE elsewhere T V
#> 9 WHILE AY PRICE PRIZE elsewhere W L
#> 10 TIME AY PRICE PRIZE elsewhere T M
#> 11 HYBRID AY PRICE PRIZE elsewhere HH B
#> 12 LINE AY PRICE PRIZE elsewhere L N
#> 13 BLIND AY PRICE PRIZE elsewhere L N
#> 14 SIDE AY PRICE PRIZE elsewhere S D
#> 15 BRYCE AY PRICE PRICE prevoiceless R S
#> 16 BY AY PRICE PRIZE elsewhere B
#> 17 FIVE AY PRICE PRIZE elsewhere F V
#> 18 BRIGHTLY AY PRICE PRICE prevoiceless R T
#> 19 MICHAEL AY PRICE PRICE prevoiceless M K
#> 20 FIND AY PRICE PRIZE elsewhere F N
# Collapsing distinctions can be done post hoc (though it may take extra work to get the environment column to match).
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Get a subset for demonstration purposes
  filter(allophone %in% c("BIT", "BIG")) %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup() %>%
  # Now collapse distinctions
  mutate(allophone = fct_collapse(allophone, "BIT" = c("BIT", "BIG")),
         # Relabel the merged tokens; any other rows keep their environment
         environment = if_else(allophone == "BIT", "elsewhere", environment))
#> # A tibble: 10 × 7
#> word vowel phoneme allophone environment pre_seg fol_seg
#> <chr> <chr> <fct> <fct> <chr> <chr> <chr>
#> 1 BEGINNING IH KIT BIT elsewhere B G
#> 2 TRIGGER IH KIT BIT elsewhere R G
#> 3 EXISTENCE IH KIT BIT elsewhere IY0 G
#> 4 BIG IH KIT BIT elsewhere B G
#> 5 EXOTIC IH KIT BIT elsewhere V G
#> 6 HEADED IH KIT BIT elsewhere D D
#> 7 CULTIVATED IH KIT BIT elsewhere T D
#> 8 HIS IH KIT BIT elsewhere HH Z
#> 9 ADMITTING IH KIT BIT elsewhere M T
#> 10 ARBITRATION IH KIT BIT elsewhere B T
# Creating new allophones depends on the complexity of the allophone
darla %>%
  code_allophones(.old_col = phoneme,
                  .new_cols = c("allophone", "environment"),
                  .fol_seg = fol_seg,
                  .pre_seg = pre_seg) %>%
  # Create voiced and voiceless distinctions for MOUTH
  mutate(allophone = case_when(phoneme == "MOUTH" & fol_seg %in% c("P", "T", "K", "CH", "F", "TH", "S", "SH") ~ "BOUT",
                               phoneme == "MOUTH" ~ "LOUD",
                               TRUE ~ allophone),
         environment = if_else(allophone == "BOUT", "prevoiceless", environment)) %>%
  # Get a subset for demonstration purposes
  filter(phoneme == "MOUTH") %>%
  group_by(allophone) %>%
  slice_sample(n = 5) %>%
  ungroup()
#> # A tibble: 10 × 7
#> word vowel phoneme allophone environment pre_seg fol_seg
#> <chr> <chr> <fct> <chr> <chr> <chr> <chr>
#> 1 WITHOUT AW MOUTH BOUT prevoiceless TH T
#> 2 HOUSE AW MOUTH BOUT prevoiceless HH S
#> 3 OUT AW MOUTH BOUT prevoiceless T T
#> 4 OUT AW MOUTH BOUT prevoiceless T T
#> 5 ABOUT AW MOUTH BOUT prevoiceless B T
#> 6 COUNTY AW MOUTH LOUD elsewhere K N
#> 7 BOUNDARIES AW MOUTH LOUD elsewhere B N
#> 8 OUR AW MOUTH LOUD elsewhere V R
#> 9 NOW AW MOUTH LOUD elsewhere N DH
#> 10 HOWEVER AW MOUTH LOUD elsewhere HH EH1