Phoneme Model


The feature set that Evan Kirshenbaum uses in his ASCII transcription of the International Phonetic Alphabet (IPA) [1], [2] describes phonemes in a way that is consistent with how they are organised in the IPA chart. The Phonemes document uses this approach to describe the phonemes in a phoneme definition file.

A single phoneme feature often represents the action of more than one articulatory mechanism used to produce speech, and different features can affect the same area of the vocal tract. Internally, espeak-ng uses the articulatory model rather than the IPA descriptions. This document describes how the feature-based IPA model is mapped to the articulatory model.

People working on adding new voices or languages do not need to read this document, and should read the Phonemes document instead. This document is intended for people working on the espeak-ng codebase, or people interested in how espeak-ng works internally.

NOTE: This model is in the process of being implemented. As such, the current implementation does not reflect this document.

Manner of Articulation

The manner of articulation is described in terms of several distinct feature types. The possible manners of articulation are:

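| Manner of Articulation | Feature | Phoneme Model       |
|------------------------|---------|---------------------|
| nasal                  | nas     | pmc egs nsl occ     |
| plosive (stop)         | stp     | pmc egs orl occ     |
| affricate              | afr     | pmc egs orl occ frr |
| fricative              | frc     | pmc egs orl frv     |
| tap/flap               | flp     | pmc egs orl fla     |
| trill                  | trl     | pmc egs orl tri     |
| approximant            | apr     | pmc egs orl app     |
| click                  | clk     | vlc igs orl         |
| ejective               | ejc     | gtc egs orl occ     |
| implosive              | imp     | gtc igs             |
| vowel                  | vwl     | pmc egs orl vow     |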

Implosive (imp) consonants use the features of the base phoneme, except that the pmc and egs features are replaced with gtc and igs. Thus, a nas imp is described as gtc igs nsl occ.
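
The implosive rule can be sketched as follows. This is a minimal illustration rather than espeak-ng's implementation: the `MANNER_MODEL` table and the `expand_manner` helper are hypothetical names, and only a subset of the manners from the table above is included.

```python
# Hypothetical sketch: expand a manner feature into articulatory-model
# features. The names here are illustrative, not part of espeak-ng.

# Subset of the manner table: manner feature -> phoneme model features.
MANNER_MODEL = {
    "nas": ["pmc", "egs", "nsl", "occ"],
    "stp": ["pmc", "egs", "orl", "occ"],
    "frc": ["pmc", "egs", "orl", "frv"],
    "apr": ["pmc", "egs", "orl", "app"],
}


def expand_manner(manner, implosive=False):
    """Return the phoneme model features for a manner feature.

    With implosive=True, the initiator and air flow of the base manner
    (pmc egs) are replaced by gtc igs; the remaining features are kept.
    """
    features = list(MANNER_MODEL[manner])
    if implosive:
        kept = [f for f in features if f not in ("pmc", "egs")]
        features = ["gtc", "igs"] + kept
    return features


print(" ".join(expand_manner("nas")))                  # pmc egs nsl occ
print(" ".join(expand_manner("nas", implosive=True)))  # gtc igs nsl occ
```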

The vwl phonemes are described using vowel height and backness features, while consonants (the other manners of articulation) are described using place of articulation features.

Additionally, the manner of articulation can be refined using the following features:

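| Feature | Name     | Description                                                 |
|---------|----------|-------------------------------------------------------------|
| lat     | lateral  | The air flow is directed along the sides of the tongue.     |
| sib     | sibilant | The air flow is directed through the teeth with the tongue. |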

Air Flow

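| Feature | Name       | Description                                                        |
|---------|------------|--------------------------------------------------------------------|
| egs     | egressive  | The air flow is moving outwards from the initiator to the target.  |
| igs     | ingressive | The air flow is moving inwards from the target to the initiator.   |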

Initiator

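| Feature | Name       | Description                                                            |
|---------|------------|------------------------------------------------------------------------|
| pmc     | pulmonic   | The diaphragm and lungs are used to generate the airstream.            |
| gtc     | glottalic  | The glottis is used to generate the airstream.                         |
| vlc     | velaric    | The velum is closed and the tongue is used to generate the airstream.  |
| pcv     | percussive | There is no airstream used to produce this sound.                      |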

Target

| Feature | Name  | Description                      |
|---------|-------|----------------------------------|
| nsl     | nasal | The air flows through the nose.  |
| orl     | oral  | The air flows through the mouth. |

Co-articulation

| Feature | Name      | Target |
|---------|-----------|--------|
| nzd     | nasalized | nsl    |

Manner

| Feature | Name        | Description                                                                         |
|---------|-------------|-------------------------------------------------------------------------------------|
| occ     | occlusive   | The air flow is blocked within the vocal tract.                                     |
| frv     | fricative   | The air flow is constricted, causing turbulence.                                    |
| fla     | flap        | A single tap of the tongue against the secondary articulator.                       |
| tri     | trill       | A rapid vibration of the primary articulator against the secondary articulator.     |
| app     | approximant | The vocal tract is narrowed at the place of articulation without becoming turbulent. |
| vow     | vowel       | The phoneme is articulated as a vowel instead of a consonant.                       |

Place of Articulation

The place of articulation is described in terms of an active articulator and one or more passive articulators[9]. The possible places of articulation are:

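| Place of Articulation | Feature | Active | Lips | Teeth | Passive |
|-----------------------|---------|--------|------|-------|---------|
| bilabial              | blb     | lbl    | ulp  |       |         |
| linguolabial          | lgl     | lmn    | ulp  |       |         |
| labiodental           | lbd     | lbl    |      | utt   |         |
| bilabial-labiodental  | bld     | lbl    | ulp  | utt   |         |
| interdental           | idt     | lmn    |      | utt   |         |
| dental                | dnt     | apc    |      | utt   |         |
| denti-alveolar        | dta     | lmn    |      | utt   | alf     |
| alveolar              | alv     | lmn    |      |       | alf     |
| apico-alveolar        | apa     | apc    |      |       | alf     |
| palato-alveolar       | pla     | lmn    |      |       | alb     |
| apical retroflex      | arf     | sac    |      |       | alb     |
| retroflex             | rfx     | apc    |      |       | hpl     |
| alveolo-palatal       | alp     | dsl    |      |       | alb     |
| palatal               | pal     | dsl    |      |       | hpl     |
| velar                 | vel     | dsl    |      |       | spl     |
| labio-velar           | lbv     | dsl    | ulp  |       | spl     |
| uvular                | uvl     | dsl    |      |       | uvu     |
| pharyngeal            | phr     | rdl    |      |       | prx     |
| epiglotto-pharyngeal  | epp     | lyx    |      |       | prx     |
| (ary-)epiglottal      | epg     | lyx    |      |       | egs     |
| glottal               | glt     | lyx    |      |       | gts     |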
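
As a rough illustration of how a place of articulation feature expands into articulators, the lookup below covers a few rows of the table. The `PLACE_MODEL` table and the `articulators` helper are hypothetical names chosen for this sketch, and treating the Lips and Teeth columns as additional passive articulators is an assumption based on the description above.

```python
# Hypothetical sketch: place feature -> (active articulator, passive
# articulators). Only a few rows of the table are included, and the
# Lips and Teeth columns are treated as passive articulators.
PLACE_MODEL = {
    "blb": ("lbl", ("ulp",)),
    "lbd": ("lbl", ("utt",)),
    "dta": ("lmn", ("utt", "alf")),
    "alv": ("lmn", ("alf",)),
    "vel": ("dsl", ("spl",)),
    "lbv": ("dsl", ("ulp", "spl")),
    "glt": ("lyx", ("gts",)),
}


def articulators(place):
    """Return the active articulator followed by the passive articulators."""
    active, passive = PLACE_MODEL[place]
    return [active, *passive]


print(" ".join(articulators("alv")))  # lmn alf
print(" ".join(articulators("lbv")))  # dsl ulp spl
```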

Active Articulators

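| Feature | Name      | Articulator             |
|---------|-----------|-------------------------|
| lbl     | labial    | lower lip               |
| lmn     | laminal   | tongue blade            |
| apc     | apical    | tongue tip              |
| sac     | subapical | underside of the tongue |
| dsl     | dorsal    | tongue body             |
| rdl     | radical   | tongue root             |
| lyx     | laryngeal | larynx                  |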

Passive Articulators

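| Feature | Articulator            |
|---------|------------------------|
| ulp     | upper lip              |
| utt     | upper teeth            |
| alf     | alveolar ridge (front) |
| alb     | alveolar ridge (back)  |
| hpl     | hard palate            |
| spl     | soft palate (velum)    |
| uvu     | uvula                  |
| prx     | pharynx                |
| egs     | epiglottis             |
| gts     | glottis                |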

Co-articulation

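| Feature | Name           | Articulator |
|---------|----------------|-------------|
| pzd     | palatalized    | hpl         |
| vzd     | velarized      | spl         |
| fzd     | pharyngealized | prx         |
| nzd     | nasalized      | nsl         |
| rzd     | rhoticized     | apc hpl     |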

Phonation

The phonation features describe the degree to which the glottis (vocal cords) is open or closed.

| Feature | Name            | Description                                                                 |
|---------|-----------------|-----------------------------------------------------------------------------|
| vls     | voiceless       | The glottis is fully open, such that the vocal cords do not vibrate.        |
| brv     | breathy voice   | The glottis is closed slightly, to produce a whispered or murmured sound.   |
| slv     | slack voice     | The glottis is opened wider than mdv, but not enough to be brv.             |
| mdv     | modal voice     | The glottis is opened to provide the optimal vibration of the vocal cords.  |
| stv     | stiff voice     | The glottis is closed narrower than mdv, but not enough to be crv.          |
| crv     | creaky voice    | The glottis is closed to produce a vocal or glottal fry.                    |
| glc     | glottal closure | The glottis is fully closed.                                                |

Voice

| Voice     | Feature | Phoneme Model |
|-----------|---------|---------------|
| voiceless | vls     | vls           |
| voiced    | vcd     | mdv           |
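
The phonation features form a scale from a fully open glottis to a fully closed one, and the voice features map onto two points on that scale. The following sketch illustrates this; `PHONATION_SCALE`, `VOICE_MODEL`, and the helper functions are hypothetical names, not espeak-ng identifiers.

```python
# Hypothetical sketch: phonation features ordered from a fully open
# glottis (vls) to a fully closed one (glc), following the order of
# the phonation table.
PHONATION_SCALE = ["vls", "brv", "slv", "mdv", "stv", "crv", "glc"]

# Voice table: voice feature -> phoneme model (phonation) feature.
VOICE_MODEL = {"vls": "vls", "vcd": "mdv"}


def closure_rank(phonation):
    """Return 0 for a fully open glottis, up to 6 for a fully closed one."""
    return PHONATION_SCALE.index(phonation)


def is_more_closed(a, b):
    """Return True if phonation a has a more closed glottis than b."""
    return closure_rank(a) > closure_rank(b)


print(VOICE_MODEL["vcd"])            # mdv
print(is_more_closed("crv", "mdv"))  # True
print(is_more_closed("brv", "mdv"))  # False
```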

Vowel Height

| Feature | Name                   |
|---------|------------------------|
| hgh     | close (high)           |
| smh     | near-close (semi-high) |
| umd     | close-mid (upper-mid)  |
| mid     | mid                    |
| lmd     | open-mid (lower-mid)   |
| sml     | near-open (semi-low)   |
| low     | open (low)             |

Vowel Backness

| Feature | Name   |
|---------|--------|
| fnt     | front  |
| cnt     | center |
| bck     | back   |

Rounding and Labialization

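| Feature | Name       | Rounded | Position                       |
|---------|------------|---------|--------------------------------|
| unr     | unrounded  | No      | Close to the jaw.              |
| ptr     | protruded  | Yes     | Protrude outward from the jaw. |
| cmp     | compressed | Yes     | Close to the jaw.              |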

The degree of rounding/labialization is specified using the following features:

| Feature | Name         |
|---------|--------------|
| mrd     | more rounded |
| lrd     | less rounded |

Vowel Rounding

| Rounding  | Feature | Phoneme Model               |
|-----------|---------|-----------------------------|
| unrounded | unr     | unr                         |
| rounded   | rnd     | ptr if bck or cnt; cmp if fnt. |
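
The vowel rounding rule depends on the backness of the vowel. A minimal sketch of that rule, using a hypothetical `model_rounding` helper that is not part of espeak-ng:

```python
# Hypothetical sketch of the vowel rounding rule: a rounded (rnd) vowel
# is protruded (ptr) when it is back or central, and compressed (cmp)
# when it is front. An unrounded (unr) vowel stays unr.
BACKNESS = ("fnt", "cnt", "bck")


def model_rounding(rounding, backness):
    """Map a rounding feature (unr or rnd) to the phoneme model feature."""
    if backness not in BACKNESS:
        raise ValueError(f"unknown backness feature: {backness}")
    if rounding == "unr":
        return "unr"
    if rounding == "rnd":
        return "cmp" if backness == "fnt" else "ptr"
    raise ValueError(f"unknown rounding feature: {rounding}")


print(model_rounding("rnd", "fnt"))  # cmp
print(model_rounding("rnd", "bck"))  # ptr
print(model_rounding("unr", "cnt"))  # unr
```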

Syllabicity

| Feature | Name         |
|---------|--------------|
| syl     | syllabic     |
| nsy     | non-syllabic |

Consonant Release

| Feature | Name                            |
|---------|---------------------------------|
| frr     | fricative release               |
| asp     | aspirated                       |
| nrs     | nasal release                   |
| lrs     | lateral release                 |
| unx     | no audible release (unexploded) |

Tongue Root

The tongue root position can be specified using the following features:

| Feature | Symbol | Name                  |
|---------|--------|-----------------------|
| atr     | ◌̘      | advanced tongue root  |
| rtr     | ◌̙      | retracted tongue root |

Fortis and Lenis

| Feature | Name   |
|---------|--------|
| fts     | fortis |
| lns     | lenis  |

Stress

| Feature | Name             |
|---------|------------------|
| st1     | primary stress   |
| st2     | secondary stress |
| st3     | extra stress     |

Length

| Feature | Name        |
|---------|-------------|
| est     | extra short |
| hlg     | half-long   |
| lng     | long        |

Rhythm

| Feature | Name              |
|---------|-------------------|
| sbr     | syllable break    |
| lnk     | linked (no break) |

Intonation

| Feature | Name                     |
|---------|--------------------------|
| fbr     | minor (foot) break       |
| ibr     | major (intonation) break |
| glr     | global rise              |
| glf     | global fall              |

Tone Stepping

| Feature | Name     |
|---------|----------|
| ust     | upstep   |
| dst     | downstep |

Tones

Tones are defined using the following 3 properties:

tone_start  <value>
tone_middle <value>
tone_end    <value>

The <value> field for these properties is a number with one of the following values:

| Tone               | <value> |
|--------------------|---------|
| extra high (top)   | 5       |
| high               | 4       |
| mid                | 3       |
| low                | 2       |
| extra low (bottom) | 1       |

A level tone can be specified using just the tone_start value. A rising or falling tone can be specified using the tone_start and tone_end values. A rising-falling (peaking) or falling-rising (dipping) tone can be specified using all three values.
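
The way the three tone properties combine can be sketched as a small classifier. The `tone_contour` function is a hypothetical helper, not part of espeak-ng. How it handles cases the text does not cover is an assumption of this sketch: equal start and end values count as "level", and three-value contours that neither peak nor dip are reported as "other".

```python
# Hypothetical sketch: classify a tone contour from the tone_start,
# tone_middle and tone_end values (1 = extra low ... 5 = extra high).
def tone_contour(tone_start, tone_middle=None, tone_end=None):
    """Return the contour name described by the given tone values."""
    for value in (tone_start, tone_middle, tone_end):
        if value is not None and value not in range(1, 6):
            raise ValueError(f"tone value out of range: {value}")

    if tone_end is None:
        if tone_middle is not None:
            raise ValueError("tone_middle requires tone_end")
        return "level"

    if tone_middle is None:
        if tone_end > tone_start:
            return "rising"
        if tone_end < tone_start:
            return "falling"
        return "level"  # assumption: equal start and end is a level tone

    if tone_middle > tone_start and tone_middle > tone_end:
        return "peaking"
    if tone_middle < tone_start and tone_middle < tone_end:
        return "dipping"
    return "other"  # assumption: contours not named in the text


print(tone_contour(3))              # level
print(tone_contour(2, tone_end=4))  # rising
print(tone_contour(3, 5, 2))        # peaking
print(tone_contour(4, 1, 3))        # dipping
```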

References

  1. Kirshenbaum, Evan, Representing IPA phonetics in ASCII (HTML). 1993.

  2. Kirshenbaum, Evan, Representing IPA phonetics in ASCII (PDF). 2001.

  3. International Phonetic Association, The International Phonetic Alphabet and the IPA Chart. 2015. Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA).

  4. Wikipedia. International Phonetic Alphabet. 2017. Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA).

  5. Dunn, R. H., Cainteoir Text-to-Speech Phoneme Features. 2013-2015.

  6. Wikipedia. Voiced glottal fricative. 2017. Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA).

  7. Wikipedia. Extensions to the International Phonetic Alphabet. 2017. Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA).

  8. Wikipedia. Fortis and lenis. 2017. Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA).

  9. Wikipedia. Place of articulation. 2017. Creative Commons Attribution-Sharealike 3.0 Unported License (CC-BY-SA).