Aperiodicity, a characteristic of noise added to the vocal fold output, is responsible for distortion, raspiness, breathiness, jitter and shimmer.


‘Periodicity’ describes sounds that repeat exactly over and over again. In reality, these don’t exist; so, normally, sounds that we label as ‘periodic’ are actually ‘quasi-periodic’ (i.e., almost periodic; they repeat in a noticeably similar way over and over). Aperiodicities are, therefore, sounds that do not repeat. They’re effectively noise. However, a supraglottic (i.e., occurring above the vocal fold level) vibration that is itself periodic can still count as an aperiodicity: if that vibratory behavior is out of sync with the vocal fold motion, it is also considered an aperiodicity. So, in general, aperiodicities are sounds that are out of sync with vocal fold motion.

Despite the usually quasi-periodic output of the vocal fold vibratory motion, we often come across aperiodicity in phonation.

In a spectrogram, aperiodicity is identified as:

(a) bands of noise in between the harmonics produced by the vocal folds in the cases of distortion, raspiness and breathiness, or

(b) irregular fluctuations in pitch (jitter) or intensity (shimmer) in all or part of the harmonic series


Used intentionally, aperiodicity is a perfectly safe technique. When it happens unintentionally, it can be a sign of vocal deterioration, but even then those aperiodicities may be non-issues, especially if they’re not intense, consistent or widely undulating (wide jitter/shimmer).

Aperiodicities may be learned or unlearned behaviorally by any normal voice with training.



Distortion occurs when the aperiodic noise (which can be of glottic or supraglottic origin; i.e., it can be produced by the true vocal folds or by vibrating bodies above it) picks up enough intensity to significantly compete with the quasi-periodic sound output originating from the vocal folds.
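One way to put a number on how strongly the noise “competes” with the quasi-periodic output is a harmonics-to-noise ratio expressed in decibels. The sketch below is only an illustration: it assumes you already have separate energy estimates for the harmonic and noise components (how to obtain them is outside its scope), and the function names are hypothetical.

```python
import math


def harmonics_to_noise_db(harmonic_energy, noise_energy):
    """Return the harmonics-to-noise ratio in decibels.

    Both arguments are assumed to be positive energy estimates
    expressed in the same unit.
    """
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10 * math.log10(harmonic_energy / noise_energy)


def noise_to_harmonics_ratio(harmonic_energy, noise_energy):
    """Return the plain (linear) noise-to-harmonics ratio."""
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return noise_energy / harmonic_energy


if __name__ == "__main__":
    # Illustrative values only: equal energies give 0 dB,
    # i.e. the noise is as strong as the harmonic component.
    print(harmonics_to_noise_db(1.0, 1.0))
    print(harmonics_to_noise_db(100.0, 1.0))
```

A value close to 0 dB (or below) means the noise is as strong as, or stronger than, the harmonic component, which matches the description of distortion above; a clearly positive value means the harmonics dominate.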


Raspiness is the result of very audible aperiodicities with formants that hover around specific pitches but do not compete with harmonics in intensity.

It includes certain types of glottic and supraglottic distortion, which may be interpreted as raspiness, and, more importantly, also includes aspirate rasp (which happens when the false vocal folds approximate so that air slightly vibrates their edges, producing a pitch).


Breathiness occurs when a significant volume of air leaks through the glottis without being vibrated to a pitch, resulting in white noise.

Jitter & Shimmer

Jitter is characterized by an abnormal pitch flux, whereas shimmer is characterized by an abnormal amplitude flux.
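As a rough illustration, both can be quantified from cycle-to-cycle measurements. The sketch below uses the “local” perturbation measure common in acoustic voice analysis: the mean absolute difference between consecutive cycles, divided by the mean value. It assumes the cycle periods and peak amplitudes have already been extracted from a recording; the function names are hypothetical.

```python
from statistics import mean


def _local_perturbation(values):
    """Mean absolute difference between neighbours, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    differences = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(differences) / mean(values)


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the period (pitch flux), as a fraction."""
    return _local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the amplitude (amplitude flux), as a fraction."""
    return _local_perturbation(peak_amplitudes)


if __name__ == "__main__":
    steady = [0.010] * 6                       # perfectly regular 100 Hz cycles
    wobbly = [0.010, 0.011, 0.010, 0.011]      # alternating cycle lengths
    print(local_jitter(steady), local_jitter(wobbly))
```

A perfectly regular series gives 0; the more the cycles differ from one another, the larger the value. What counts as “abnormal” depends on the analysis tool and the population, so no threshold is hard-coded here.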

Artistic freedom allows jitter and shimmer to be performed purposefully, for effect. However, they can also be signs of involuntary destabilization or instability, as usually happens with age and during physical exercise. The direct causes involve loss of stabilizer muscle tone and the occurrence of tremors in muscles involving the Power and/or Source of the voice.


American Speech-Language-Hearing Association. Voice Disorders: Dysphonia. (n.d.). Retrieved from https://www.asha.org/PRPSpecificTopic.aspx?folderid=8589942600&section=Signs_and_Symptoms

Jotz, G. P., Cervantes, O., Abrahão, M., Settanni, F. A. P., & Angelis, E. C. D. (2002). Noise-to-Harmonics Ratio as an Acoustic Measure of Voice Disorders in Boys. Journal of Voice, 16(1), 28–31. doi: 10.1016/s0892-1997(02)00068-1

Kreiman, J., & Gerratt, B. R. (2005). Perception of aperiodicity in pathological voice. The Journal of the Acoustical Society of America, 117(4), 2201–2211. doi: 10.1121/1.1858351


Laryngeal Vibratory Mechanisms


The laryngeal vibratory mechanisms M0, M1, M2 and M3 refer to the patterns of true vocal fold vibrating behavior that often give rise to some of the sensations prototypically designated by registers/voices, such as pulse phonation, strohbass, grave, chest voice, mixed voice, falsetto, head voice, flageolet, super head voice, and whistle. Some of these, like super head voice, exist as customized extensions/aspects of other registers/voices; extensions and aspects are specialized versions of a given voice/register. The sensations felt in different voices, registers, extensions and aspects can usually be more concretely correlated with these laryngeal vibratory mechanisms.


In music and speech, humans have long needed to classify the various ways in which the true vocal folds vibrate to produce sound. Before the advent of scientifically visualizing the larynx, first popularized by Manuel Garcia, primitive vocal fold descriptions relied on bodily perceptions secondary to true vocal fold activity at different pitch ranges. Consequently, the classification system based on vocal registers — vocal fry, chest voice, mixed voice, falsetto, head voice, flageolet, whistle register — became commonplace. Each of these is named for perceptions and/or misconceptions about the effects that vocal fold vibrations have on the voice and the body at large. For instance, head register/head voice is believed to be “placed” in the head, while chest register/chest voice is believed to be “placed” in the chest.

As the definition of each vocal register varies drastically between different singers and different vocal traditions, it is hard to use these register/voice terms productively. Alternatively, the concept of laryngeal vibratory mechanisms (LVMs) more objectively correlates with proven vocal fold behaviors.


What is a laryngeal vibratory mechanism?

The laryngeal vibratory mechanisms, which describe the different configurations that the true vocal folds can take independently of pitch range, have concrete borders. Therefore, each laryngeal vibratory mechanism can be thought of as a vocal ‘gear shift’: there are overlaps in range where each may be used, but it’s only possible to use one at a time. Behavior in one LVM can parallel features correlated with another LVM, but there is only one true LVM that can describe the behavior of the vocal folds at any given instant. For example, vocal fold contact pulsations known as “fry” are a constitutive feature of M0, but these pulsations can happen optionally in mechanisms M1, M2 and M3. Using fry in M1 would emulate a feature of M0, but this is not “mixing” with M0. Only one mechanism can be used at a given moment.

As you transition between laryngeal vibratory mechanisms, the vocal folds undergo different behavioral changes. For instance, due to the existing thresholds for these parameters in each laryngeal vibratory mechanism, open quotient increases with the succession of LVMs M0-M3, while closed quotient and recruited vocal fold mass in vibration decrease M0-M3. These changes are realized by varying levels of activation of the cricothyroid, thyroarytenoid, interarytenoid and cricoarytenoid muscles. The activity of these muscles largely controls the adduction and abduction of the vocal folds.

Consequently, these physical changes result in a decay of the harmonic series produced: the amplitude of the harmonics decreases with the succession of LVMs from M0 to M3.

What are the laryngeal vibratory mechanisms?

The different laryngeal vibratory mechanisms, which have distinct common usage ranges and are created by different patterns of vocal fold vibration, can be further subdivided into vocal aspects:

  • M0 (?? – D2): pulse phonation, strohbass
    • Slack, extended, full-bodied vocal fold motion with vocal fry
  • M1 (G1 – A5): grave, chest voice/register, mixed voice
    • Modal speech mechanism; majority true vocal fold engagement in vibration.
  • M2 (Bb3 – A6): mixed voice, falsetto, head voice/register
    • Thinner vocal folds; outer edge (“cover”) engagement in vibration.
  • M3 (A5 – ??): flageolet, whistle
    • Partial anterior vocal fold tip engagement in vibration.
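To relate the note names in these ranges to vibration frequencies, the sketch below converts a note name to hertz. It rests on assumptions the text doesn’t state explicitly: scientific pitch notation (where A4 is the A above middle C), a reference of A4 = 440 Hz and twelve-tone equal temperament. The function and table names are hypothetical, and the open-ended “??” bounds are kept as unknowns rather than guessed.

```python
SEMITONES_FROM_C = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}


def note_to_hz(note, a4_hz=440.0):
    """Convert a note name such as 'Bb3' or 'A5' to a frequency in Hz.

    Only single-digit octaves are handled in this sketch.
    """
    letter, accidental, octave = note[0].upper(), note[1:-1], int(note[-1])
    midi_number = 12 * (octave + 1) + SEMITONES_FROM_C[letter] + ACCIDENTALS[accidental]
    return a4_hz * 2 ** ((midi_number - 69) / 12)


# Common usage ranges as listed above; None stands for the open-ended "??" bounds.
COMMON_RANGES = {
    "M0": (None, "D2"),
    "M1": ("G1", "A5"),
    "M2": ("Bb3", "A6"),
    "M3": ("A5", None),
}


def range_in_hz(mechanism):
    """Return the (low, high) bounds of a mechanism in Hz, or None where unknown."""
    low, high = COMMON_RANGES[mechanism]
    return (
        note_to_hz(low) if low else None,
        note_to_hz(high) if high else None,
    )


if __name__ == "__main__":
    for name in COMMON_RANGES:
        print(name, range_in_hz(name))
```

Printing the ranges side by side makes the overlaps easy to see: under these assumptions, the M1 and M2 ranges share everything from Bb3 up to A5.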


Further topics in mechanism usage


In the instants of transition between laryngeal vibratory mechanisms, the concrete borders that demarcate and define them often produce what is usually referred to as the pop, passaggio, break or, usually in the case of switching between M1 and M2, yodel. These breaks are defined as the frequency/amplitude jumps and pauses in phonation caused by committed changes in vocal fold configuration between mechanisms. Between different aspects of the same mechanism, one might observe a “pseudopassaggio” caused by changes in vocal fold coordination.


Each laryngeal vibratory mechanism can be further subdivided into different aspects: specialized usages of mechanisms with respect to range and vocal fold configurations.


Roubeau B, Henrich N, Castellengo M. Laryngeal Vibratory Mechanisms: The Notion of Vocal Register Revisited. 2009.

Nathalie Henrich, Christophe D’Alessandro, Boris Doval, Michèle Castellengo. Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency. Journal of the Acoustical Society of America, Acoustical Society of America, 2005, 117 (3), pp.1417-1430.

Common usage ranges were inferred from proprioception and spectrographic analysis due to the lack of research on the topic.



Nasality is the subjective interpretation of nasalance and is achieved by allowing airflow through the nasal cavity. A voiced sound can be classified according to its nasality as hyponasal, homeonasal or hypernasal.

Velopharyngeal Port

From Anatomy and Physiology of the Velopharyngeal Mechanism:

The Velopharyngeal Port (highlighted in green)
Reused with modification from Blausen.com staff (2014). “Medical gallery of Blausen Medical 2014”. WikiJournal of Medicine 1 (2). DOI:10.15347/wjm/2014.010. ISSN 2002-4436.

The velopharyngeal mechanism consists of a muscular valve that extends from the posterior surface of the hard palate (roof of mouth) to the posterior pharyngeal wall and includes the velum (soft palate), lateral pharyngeal walls (sides of the throat), and the posterior pharyngeal wall (back wall of the throat).

At the level of the oropharynx, the outflowing air is able to escape through the oral cavity (that of the mouth) and/or through the nasal cavity. Your velopharyngeal port (VPP) is mostly responsible for determining how much of the outflowing air escapes through each cavity. Given that the oral and nasal cavities are morphologically (and in other aspects) different, they’re different filters, which will interact with the outflowing vibrating air in different ways, yielding different end sounds that were, however, both produced by the same source. These two different sounds can be isolated: the oral output sound by phonating with pinched nostrils and the nasal output sound by phonating with a closed mouth.

By controlling the degree of opening of the velopharyngeal port (VPP), you should be able to control the volume of air that flows through the nose, rather than through the mouth. However, this is not the full story. For example, if you’re experiencing nasal congestion or if you pinch your nostrils, your low (or even null) nasal airflow won’t be the result of a closed velopharyngeal port. Instead, there’s no airflow because the air isn’t allowed to escape through the nostrils, due to obstruction. In addition, among other factors, the position of the tongue may help redirect the airflow more in the direction of either cavity, influencing the relative airflow at the nose without affecting the degree of opening of the velopharyngeal port.


Nasality can be defined as the shared tonal quality of voiced sounds produced with nasal airflow. Depending on how pronounced this tonal quality is, each voiced sound can be further classified as hypernasal, homeonasal or hyponasal. It’s important, though, not to think of these classifications as discrete/distinct, but of nasality as a gradient that includes these classifications.

In addition, it’s also important to distinguish the cause of the hypernasality/hyponasality. It’s one thing to choose to be consistently hypernasal or hyponasal in your voice acting or singing (like Heather Headley, Marie Daulne of Zap Mama, Gwen Stefani, Yukimi Nagano of Little Dragon, Missy Elliott, Charli XCX and Patrick Stump of Fall Out Boy). It’s another thing altogether to be hypernasal or hyponasal as a result of illness, allergies, turbinate hypertrophy, velopharyngeal dysfunctions (such as that of cleft palate) or other medical conditions.

Similarly, hypernasality can also be involuntary but behaviorally correctable, as is often the case for deaf speakers. Because they’re usually only able to learn speech from what they see or what is taught to them (mouth shape and tongue positioning), the nuance of nasality isn’t something they’re usually made aware of. Therefore, their velum almost always remains in a default, lowered position, resulting in constant and heavy hypernasality.

On the other hand, in voice feminization, due to the effect of the larger nasal cavity of people assigned male at birth (AMAB) on voiced sounds, it might be useful to eliminate nasalance in all vowels and consonants that don’t inherently require nasality.


Nasalance, however, is a measure calculated as the intensity of the nasal sound output relative to the oral sound output.

So how do these concepts differ? They differ in that, while nasalance constitutes an objective, physical measurement, nasality constitutes the subjective dimension of nasalance. In other words, nasality is how we mentally interpret nasalance, depending on the vowel/consonant being pronounced, among other aspects. Why does it depend on the vowel/consonant? Because some are only distinguishable by their nasality. Additionally, the nasality of some vowels depends on which consonants surround them.
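As a rough illustration of nasalance as a physical measurement, the sketch below computes it as a percentage. It assumes the convention commonly used in nasometry, where the nasal output is divided by the sum of the nasal and oral outputs; a plain nasal-to-oral ratio, which is closer to the wording above, is included as an alternative. The function names are hypothetical, and the inputs are assumed to be energy (or amplitude) estimates measured separately at the nose and the mouth.

```python
def nasalance_percent(nasal_energy, oral_energy):
    """Nasal share of the total (nasal + oral) output, in percent."""
    if nasal_energy < 0 or oral_energy < 0:
        raise ValueError("energies must be non-negative")
    total = nasal_energy + oral_energy
    if total == 0:
        raise ValueError("no acoustic output to measure")
    return 100.0 * nasal_energy / total


def nasal_to_oral_ratio(nasal_energy, oral_energy):
    """Alternative reading: nasal output relative to oral output alone."""
    if nasal_energy < 0 or oral_energy <= 0:
        raise ValueError("oral energy must be positive, nasal non-negative")
    return nasal_energy / oral_energy


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    print(nasalance_percent(1.0, 1.0))
    print(nasalance_percent(3.0, 1.0))
```

Either way, the result is a number read off an instrument; whether a given value sounds hypernasal or hyponasal still depends on the vowel or consonant being pronounced, which is the subjective side described above.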

Practical application

In practice, the most important component of the velopharyngeal port is, by far, the velum (the soft palate). Lowering the velum increases the opening of the velopharyngeal port (VPP). Raising the velum decreases the opening of the velopharyngeal port. The pharyngeal wall then allows for a tighter seal between the oropharynx and nasopharynx. It’s important not to think of the velum as a structure that’s either raised or lowered. Between the extremes of velum position, there are a number of different positions that partially account for the gradient of hypernasality-hyponasality.

There are plenty of exercises you can do in order to learn to control the velopharyngeal port.

When you walk past foul-smelling garbage, you might have a tendency to raise your velum and breathe only through the mouth, so as to not sense the smell. Knowing this, after imitating this scenario a couple of times, you should start being able to recognize movement of the velum.

Another useful exercise is to cover your mouth with your hand and feel the vibrations in your hand as you sing or speak. The more nasal you are, the less you will feel the vibrations in your hand. Alternatively, the less nasal you are, the more you will feel the vibrations in your hand. As you do this, pay attention to the sensation of the different positions of the velum.

One more tip is to use nasal phonemes, like /ŋ/ (as in hanger, spring, belong), in order to find the configuration for hypernasality, and really try to accentuate the nasal airflow. Pay attention to the sensation of the velum moving as you increase/decrease your nasality.

Once you’re able to, when practicing, confidently associate the opening of the velopharyngeal port with the degree of nasality, you should explore how your nasality and nasalance change as you go through your entire range. Be wary of uncontrolled hyponasality in belting and in the higher range of your falsetto/head voice (M2) and try to counteract it if you so wish.

A very important notion to keep in mind is that hyponasality and hypernasality are not necessarily problematic. They can be used for artistic effect. They’re not inherently “wrong”. A problem comes up only when you’re not able to control your nasality to sound how you want to sound.


Perry, J. (2011). Anatomy and Physiology of the Velopharyngeal Mechanism. Seminars in Speech and Language, 32(02), 083-092. doi:10.1055/s-0031-1277712

Bunton, K., & Story, B. H. (2011). The relation of nasality and nasalance to nasal port area based on a computational model. The Cleft Palate-Craniofacial Journal, 49(6), 741–749.

Kim, E. Y., Yoon, M. S., Kim, H. H., Nam, C. M., Park, E. S., & Hong, S. H. (2012). Characteristics of nasal resonance and perceptual rating in prelingual hearing impaired adults. Clinical and Experimental Otorhinolaryngology, 5(1), 1–9.

Supraglottic Distortion


Distortion is an aperiodic effect that introduces especially loud ‘noise’ to the voice. Supraglottic distortion is a classification of distortion originating from vibratory activity above the level of the vocal folds.

Definition and classification

Periodic sounds in a strict sense are sounds that repeat exactly over and over again. In reality, the human voice is not a perfect machine, so normal vocal fold vibration tends to actually be ‘quasi-periodic’ (i.e., almost periodic; it repeats in a noticeably similar way over and over). This too is scientifically considered a periodic sound.

Aperiodicities are, therefore, sounds that do not repeat. They’re effectively noise. However, this doesn’t mean that a periodic supraglottic vibration doesn’t count as an aperiodicity. If that vibratory behavior is out of sync with the vocal fold motion, that is also considered an aperiodicity. So, in general, aperiodicities are sounds that are out of sync with vocal fold motion.

Distortion occurs when the aperiodic noise picks up enough intensity to significantly compete with the periodic sound output originating from the vocal folds.

Growl is a type of distortion. Like most things in voice, because it’s a colloquial term, its definition is highly variable in different communities. However, the approximate definition is that it’s a heavier distortion with supraglottic vibration (vibration of bodies above the vocal folds) that tends to be lower in pitch than what people would call scream and/or less ringy.

In addition to growl, there are several more specific terms used to describe supraglottic distortions in the voice.

Anatomy of supraglottic distortions

The supraglottic vibrations involve three accessory vibrating bodies that are positioned above the vocal fold level. These are the epiglottis, aryepiglottic folds, and ventricular folds (which are also known as false vocal folds):

A top-down view of the larynx.

The epiglottis is the highest of these vibrating bodies. It’s shaped like a leaf hanging back from the root of the tongue over the larynx (the voice box). When the tongue rises during swallowing, the epiglottis goes down to cover the opening to the larynx, preventing food and liquid from entering the larynx and, consequently, the trachea and lungs.

To locate and feel the epiglottis, one can produce a sound known as knödel. This borrowed German word describes the sound of phonating with a dumpling in your mouth. During knödel, the back of the tongue pushes the epiglottis back so it folds over the larynx. In this position, it is predisposed to vibrate with sufficient air pressure.

The aryepiglottic folds extend all the way from the sides of the epiglottis down to the arytenoid cartilages, where the vocal folds attach. Their constriction approximates the epiglottis and arytenoid cartilages, highlighting piercing frequencies above the note we perceive. It’s this characteristic that allows babies’ voices to carry over loud ambient noise with high spectral energies that instinctively trigger human attention.

A witch voice can act as a trigger to locate and feel the aryepiglottic folds constrict. From a controlled position of constriction, the aryepiglottic folds can be triggered to vibrate with sufficient air pressure.

The false vocal folds (FVFs; also called vestibular folds or ventricular folds) sit right above the true vocal folds. Like the epiglottis, they help prevent food aspiration.

Making a wheezing sound can act as a trigger for extending the false vocal folds. While extended, sufficient air pressure can move the false vocal folds to vibrate.

In paradoxical vocal fold motion disorder (PVFMD), both the true vocal folds and the false vocal folds constrict during inhalation and relax during exhalation. This paradoxical breathing behavior causes difficulty in breathing accompanied by a wheezing sound. Symptoms are usually confused with those of asthma.


Supraglottic distortion involves a contraction that allows the vibration of the body in question to be motivated by airflow. The vocal folds shouldn’t come closer to each other in order to reach the threshold pressure for supraglottic vibration — their contraction should be minimized.

With that in mind, supraglottic distortion has been found to be safe in both the short and long term.


The definitions vary from source to source. A definition is proposed here in “Anatomy of a growl”.


TVF: Mid to high M1/M2

FVF: more aggressively vibrated

Aryepiglottic folds: vibration and twang are common


TVF: Lower pitched M1/M2 (usually M1)

FVF: variable vibration levels

Low larynx position


TVF: Variable

Epiglottis: Vibrated


TVF: High pitch M2/M3

FVF: Aggressively vibrated


TVF: Low pitched M1

FVF: Aggressively vibrated at a frequency about half the TVF vibration

Aryepiglottic folds: Variable


The dynamic combinatory engagement of the true vocal folds and different supraglottic vibratory bodies creates a vast array of types of supraglottic distortion.

Whether the aryepiglottic folds participate in supraglottic distortion is still under scientific review.


Aaen, M., Mcglashan, J., & Sadolin, C. (2020). Laryngostroboscopic Exploration of Rough Vocal Effects in Singing and their Statistical Recognizability: An Anatomical and Physiological Description and Visual Recognizability Study of Distortion, Growl, Rattle, and Grunt using laryngostroboscopic Imaging and Panel Assessment. Journal of Voice, 34(1). doi: 10.1016/j.jvoice.2017.12.020

Caffier, P. P., Nasr, A. I., Rendon, M. D. M. R., Wienhausen, S., Forbes, E., Seidner, W., & Nawka, T. (2018). Common Vocal Effects and Partial Glottal Vibration in Professional Nonclassical Singers. Journal of Voice, 32(3), 340–346. doi: 10.1016/j.jvoice.2017.06.009

Paradoxical Vocal Fold Motion Disorder. Retrieved from http://www.otolaryngology.pitt.edu/centers-excellence/voice-center/conditions-we-treat/paradoxical-vocal-fold-motion-disorder


The Power-Source-Filter Model of Voice Production


Diagram of the respiratory system, by Theresa Knott (CC BY-SA 3.0)

Phonation — the act of using one’s voice — is a process that involves many interconnected parts. The complex system that is the voice can be divided into three subsystems: a power source (the respiratory system), a sound source (typically the vocal folds), and an acoustic filter (the vocal tract).

This Power-Source-Filter (PSF) model of voice production adapts the Source-Filter model, widely used in speech pathology, speech analysis and speech synthesis, to singing. It serves as the central dogma of singing: the foundation for understanding how the human voice works.


The lungs are two large organs of respiration that are located in the chest cavity. They are responsible for the exchange of oxygen and carbon dioxide that makes human life possible. They are also integral to the supply of airflow to the vocal folds (also mistakenly known as vocal cords or, even worse, vocal chords) during exhalation, where this air pressure is converted to sound.


The lungs are not capable of inflating by themselves. They will expand only when there is an increase in the volume of the chest cavity, as they are pneumatically (i.e., through suction) attached to the inside wall of the chest.

At rest, this is achieved primarily through contraction of the diaphragm and, to a much lesser extent, the external intercostal muscles (muscles that run between the ribs and help form and move the chest wall), whereby the diaphragm flattens from a domed saddle-like shape to a shape resembling a flat disc, and the external intercostal muscles increase the circumference of the chest by raising the ribs. The negative pressure generated by the increase in the volume of the lungs leads to air streaming into them — the process of inhalation. The depression of the diaphragm during contraction consequently leads to pressure that promotes the downward displacement of the abdominal organs.

  • Diaphragmatic breathing (belly breathing): If this downward displacement isn’t resisted by the abdominal wall muscles, it leads to a bulging out of the abdominal wall. Although it’s not possible to breathe or sing through the diaphragm, this is the coordination that the common imagery of doing so is meant to elicit. It’s important to keep in mind that exaggerating this position does not improve breathing.
  • Thoracic breathing (chest breathing): This type of coordination is characterized by a circumferential expansion of the ribs, caused by the contraction of the external intercostal muscles, as well as various accessory muscles of inhalation — muscles that aid in inhalation when contraction of the diaphragm doesn’t elicit sufficient air intake.
  • Clavicular breathing (high, shoulder breathing): In this type of coordination, the descent of the diaphragm is limited. Instead, you might notice, for example, contraction of the sternocleidomastoid (SCM) muscle or of the scalene muscles (all accessory muscles of inhalation), which elevate the sternum and upper ribs, respectively, allowing for expansion of the lungs. However, the smaller size of the upper ribs heavily limits the volume of this expansion.

Breathing coordination isn’t, though, all black and white — it likely won’t adhere strictly to any one of the above coordinations.

Despite common belief, like all skeletal muscles, the diaphragm can be consciously controlled.


Unlike inhalation, at rest, exhalation isn’t initiated through muscular contraction. This is due to the lungs’ rubber band-like material properties, which make it so that the more stretched they are, the higher their tendency to collapse into the smallest shape possible.

So, with the release of diaphragmatic contraction — the end of inhalation —, the rubber band-like fibers of the lungs lead to the lungs’ completely passive collapse, decreasing chest volume and stretching the diaphragm back into its domed saddle-like shape as a result, while also leading to air flow out of the body — the process of exhalation. The lungs don’t completely collapse, however, because of resistance from the rigidity of the inner chest wall.
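The pressure changes that drive inhalation and exhalation can be illustrated with Boyle’s law: at constant temperature, pressure times volume stays constant for a fixed amount of gas. The sketch below is a deliberate simplification. It treats the lungs as a sealed container at the instant their volume changes, before any air has moved, and the volumes used are illustrative values rather than measurements; the function name is hypothetical.

```python
ATMOSPHERIC_KPA = 101.325  # standard atmospheric pressure at sea level


def pressure_after_volume_change(pressure_kpa, old_volume_l, new_volume_l):
    """Boyle's law: p1 * V1 = p2 * V2 for a fixed amount of gas."""
    if old_volume_l <= 0 or new_volume_l <= 0:
        raise ValueError("volumes must be positive")
    return pressure_kpa * old_volume_l / new_volume_l


if __name__ == "__main__":
    # Illustrative volumes only.
    expanded = pressure_after_volume_change(ATMOSPHERIC_KPA, 2.50, 2.55)
    recoiled = pressure_after_volume_change(ATMOSPHERIC_KPA, 2.55, 2.50)
    print("expanded chest:", expanded, "kPa (below atmospheric, air flows in)")
    print("recoiled lungs:", recoiled, "kPa (above atmospheric, air flows out)")
```

In a real breath the airway is open, so air starts moving as soon as the pressures differ and the actual pressure swings are much smaller than in this sealed-container picture. The direction of the effect is the same, though: a larger chest volume pulls air in, and the passive recoil pushes it out.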

Just as there are accessory muscles of inhalation — muscles that aid in breathing in conditions that demand a higher respiratory rate —, there are also accessory muscles of exhalation, which act by pulling the ribcage down or, in some other way, increasing pressure within the abdominal cavity, leading to the upward movement of the diaphragm. In turn, this upward movement causes a faster reduction of the volume of the thoracic cavity (i.e., the chest cavity), causing air to flow out of the body at a faster rate than that of exhalation at rest.

The contraction of these accessory muscles of exhalation during singing can, therefore, be used to indirectly move the still-contracted diaphragm. For example, you can contract your abdominal wall muscles to push back on the abdominal organs and, thus, push the diaphragm upward. This creates a mechanism for pressure-controlled exhalation, a process which you may have heard of under the misleading name “breath support”.

Although the Italian term appoggio is defined in different ways by different singers and pedagogues — it’s often indirectly taught using imagery —, it’s generally agreed to work by allowing for pressure-controlled exhalation as well as increased air intake during inhalation.

Both inhalation and exhalation have an effect on subglottal air pressure — i.e., the air pressure just underneath the plane of the vocal folds —, with higher subglottal air pressure being associated primarily with higher amplitude sounds (louder sounds), but also, to a lesser extent, an increased fundamental frequency (higher pitches).


True Vocal Folds

Image by Alan Hoofring through https://visualsonline.cancer.gov

The true vocal folds (TVFs), which are located inside the larynx, are the main sound source of the voice. When phonating, the vocal folds alternate between open and closed positions. Each vibration allows a very brief puff of air to come through the glottis (the vocal folds and the slit-like opening between them), dividing the outflowing air into regions of high and low air pressure. Therefore, the frequency of vocal fold vibration (how many glottal opening-closing cycles occur in each second) determines the pitch at which you’re phonating.
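Counting cycles per second can be sketched in a few lines. The example assumes you have a list of timestamps, in seconds, marking successive glottal closures (for instance, from an electroglottograph trace); how those timestamps are detected is outside its scope, and the function name is hypothetical.

```python
def fundamental_frequency_hz(closure_times_s):
    """Estimate the vibration frequency from successive glottal closure times.

    Each pair of neighbouring closures marks one complete opening-closing cycle.
    """
    if len(closure_times_s) < 2:
        raise ValueError("need at least two closures to form one cycle")
    cycles = len(closure_times_s) - 1
    duration = closure_times_s[-1] - closure_times_s[0]
    if duration <= 0:
        raise ValueError("closure times must be increasing")
    return cycles / duration


if __name__ == "__main__":
    # Synthetic closures spaced 1/220 s apart (22 cycles in a tenth of a second).
    closures = [n / 220 for n in range(23)]
    print(fundamental_frequency_hz(closures))
```

Averaging over many cycles, as done here, smooths out the small cycle-to-cycle irregularities of a quasi-periodic voice.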

The vocal folds aren’t structurally uniform; they’re layered and are composed of five tissue layers. The different stiffness characteristics of these layers result in three mechanically decoupled groupings: the body, the transitional layer, and the cover of the vocal folds. The transitional layer consists of connective tissue, which connects the softer cover of the vocal folds to the more rigid thyrovocalis muscle of the body.

Cover (mucosa): the epithelium and the superficial lamina propria.

Transitional layer (vocal ligament): the intermediate lamina propria and the deep lamina propria.

Body: the thyrovocalis muscle.

The frequency at which the vocal folds vibrate can be controlled through coordination of muscular engagement, by partially determining the elasticity and proximity of the vocal folds and the subglottal pressure.

Understanding of laryngeal anatomy is fundamental to understanding the processes behind the enormous diversity of voice qualities that this organ can produce.

Your vocal folds are directly responsible for a variety of aspects of your voice, such as:

  • Onsets and offsets: how phonation is initiated (in the case of onsets) and ended (in the case of offsets). The type of onset is determined by the state of glottal opening/closure before breath flow is initiated, whereas the type of offset is determined by the state of glottal opening/closure right after breath flow is terminated.
  • Laryngeal vibratory mechanisms: the four different patterns of vocal fold vibrating behavior that are responsible for vocal registers and the passaggios (or pops, or yodels, or cracks, or breaks) in the voice when transitioning between mechanisms.
  • Open Quotient (OQ): the ratio of the duration of the open phase (when the glottis is open in each cycle of vibration) to the duration of each complete cycle of vibration. Closed quotient (CQ) is calculated as CQ = 1 - OQ; therefore, a high closed quotient implies a low open quotient, and vice versa. A higher closed quotient produces a buzzier sound at the vocal fold level.
  • Desynchronization:
    • Breathiness: the turbulent sound that might noticeably accompany your voice and which results from a significant volume of air passing through the glottis without being vibrated to a pitch.
    • Fry: the popping or rattling sound produced by a slack pattern of vocal fold vibrating behavior, which can be added to any laryngeal vibratory mechanism (not just M0).
    • Cordal polyphony: the simultaneous production of multiple pitches through specific desynchronous vibration of the vocal folds.
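The open and closed quotients defined in the list above can be sketched as a short calculation. The example assumes the open-phase duration and the cycle period have already been measured for one cycle, in the same time unit; the function names and the sample values are hypothetical.

```python
def open_quotient(open_phase_duration, cycle_period):
    """OQ: fraction of one vibration cycle during which the glottis is open."""
    if cycle_period <= 0:
        raise ValueError("cycle period must be positive")
    if not 0 <= open_phase_duration <= cycle_period:
        raise ValueError("open phase must fit within one cycle")
    return open_phase_duration / cycle_period


def closed_quotient(oq):
    """CQ = 1 - OQ: the remaining fraction of the cycle."""
    if not 0 <= oq <= 1:
        raise ValueError("OQ must be between 0 and 1")
    return 1.0 - oq


if __name__ == "__main__":
    # Illustrative cycle: 5 ms period with the glottis open for 3 ms.
    oq = open_quotient(0.003, 0.005)
    print(oq, closed_quotient(oq))
```

Because the two quotients always sum to 1, measuring either one is enough to describe how the cycle is split between its open and closed phases.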

False Vocal Folds

The false vocal folds (or vestibular folds or ventricular folds or superior vocal folds) lie above the vocal folds. During swallowing, the false vocal folds (FVFs) squeeze together to help prevent food and drink from entering the airway.

In extended vocal technique (the nontraditional extension of traditional vocal technique), the false vocal folds can vibrate to produce a growl, or they can be partially squeezed together to produce a grunt.

In addition, squeezing the false vocal folds together can cause the true vocal folds to approximate.

Aryepiglottic Folds

The aryepiglottic folds (AFs), located at the entrance of the larynx, compose the sides of the aryepiglottic funnel (or aryepiglottic sphincter, though it’s not an actual sphincter), whose opening can be constricted to produce twang.

In extended technique, the aryepiglottic folds can vibrate to produce a growl.


In phonation, though, sound isn’t always produced by the larynx. It can also be produced without any contribution from laryngeal structures, in buccal, esophageal or other alaryngeal speech, for example.

Pictured are the true vocal folds (red), the false vocal folds (blue) and the aryepiglottic folds (green).


The vocal tract
Illustration from Anatomy & Physiology, Connexions Web site. http://cnx.org/content/col11496/1.6/, Jun 19, 2013, OpenStax College

Although we perceive only one pitch when listening to someone speak or sing, in reality we’re hearing many different frequencies at once, called overtones, each with its own perceived loudness (determined by the amplitude of the sound wave at that frequency). The pitch we hear is determined by the frequency of largest amplitude, which is usually the fundamental frequency (fo), the frequency of vocal fold vibration.

As the sound that was produced by the source passes through the filter (the remainder of the larynx, the pharynx, the mouth and the nose), the amplitudes of the different overtones that compose the sound are each lowered or increased. The way in which the filter heightens or dampens each frequency is responsible for the tone, timbre or texture of the voice. The frequencies amplified the most by the geometric properties of the vocal tract are called formants.

This doesn’t mean, however, that your tone is fixed. Your vocal tract is highly modifiable through movement of many different structures: the larynx, the false vocal folds, the true vocal folds, the epiglottis (the protective cover flap of the larynx), the aryepiglottic funnel, the velum (or soft palate), the jaw, the tongue, and the lips, for example, from shortest to longest distance from the source.
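
To make the source-filter idea a little more concrete, here is a minimal Python sketch. It is a simplification under stated assumptions, not a model of any real voice: the source is assumed to lose 12 dB per octave, the filter is assumed to be a cascade of simple resonances, and the fundamental frequency, formant centres and bandwidths are hypothetical values chosen only for illustration.

```python
import math


def source_level_db(harmonic: int, tilt_db_per_octave: float = -12.0) -> float:
    """Level of a source harmonic relative to the first one (assumed spectral tilt)."""
    return tilt_db_per_octave * math.log2(harmonic)


def filter_gain_db(freq_hz: float, formants: list[tuple[float, float]]) -> float:
    """Gain of a cascade of simple resonances, one per (centre, bandwidth) pair in Hz."""
    gain = 1.0
    for centre_hz, bandwidth_hz in formants:
        gain *= centre_hz**2 / math.sqrt(
            (centre_hz**2 - freq_hz**2) ** 2 + (bandwidth_hz * freq_hz) ** 2
        )
    return 20 * math.log10(gain)


fo = 110.0  # hypothetical fundamental frequency in Hz
formants = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]  # hypothetical values

print("harmonic  freq (Hz)  source (dB)  filter (dB)  output (dB)")
for k in range(1, 13):
    freq = k * fo
    source = source_level_db(k)
    gain = filter_gain_db(freq, formants)
    print(f"{k:8d}  {freq:9.0f}  {source:11.1f}  {gain:11.1f}  {source + gain:11.1f}")
```

In this sketch the source is the same for every vocal tract shape; only the `formants` list changes when you move the structures listed above, and harmonics that fall near a formant come out relatively stronger.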

The effect of the filter on the source sound.

This video by Kim Neely, a licensed speech-language pathologist with a Master of Music in vocal performance, is an incredible introduction to this model and how it’s used clinically, with examples of how it can be used for troubleshooting during singing.


Zhang, Z. (2016). Cause-effect relationship between vocal fold physiology and voice production in a three-dimensional phonation model. The Journal of the Acoustical Society of America, 139(4), 1493–1507. http://doi.org/10.1121/1.4944754

Vocal Tract Length


Manipulation of the vocal tract length (VTL), measured from the glottis¹ to the lips, is the main way to alter the vocal tract volume (VTV), a measure of the space inside the vocal tract. By altering the vocal tract volume, one can make their voice sound lighter or darker.




The geometric volume of the vocal tract (i.e. vocal tract volume, or VTV) affects how the voice sounds for the same reason that a trombone sounds deeper and darker than a trumpet: the smaller the resonating space, the higher the frequencies that are emphasized as the sound travels from the source through the filter. Therefore, a smaller vocal tract (created by raising the larynx) emphasizes higher frequencies, creating a brighter sound, whereas a larger vocal tract (created by lowering the larynx) emphasizes lower frequencies, creating a darker sound.
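
A common textbook idealization, which is an assumption brought in here rather than something this article states, treats the vocal tract as a uniform tube closed at the glottis and open at the lips. Its resonances then sit at F_n = (2n − 1) · c / (4 · L), where c is the speed of sound and L is the tube length. The Python sketch below uses that formula with hypothetical tract lengths to show how shortening the tube shifts every resonance upward.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # approximate value for warm, humid air


def tube_resonances(length_cm: float, count: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    if length_cm <= 0:
        raise ValueError("length must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical vocal tract lengths, chosen only for illustration.
examples = [
    ("raised larynx (shorter tract)", 15.0),
    ("neutral larynx", 17.5),
    ("lowered larynx (longer tract)", 20.0),
]
for label, length_cm in examples:
    resonances = ", ".join(f"{f:.0f} Hz" for f in tube_resonances(length_cm))
    print(f"{label:30s} {length_cm:4.1f} cm -> {resonances}")
```

A real vocal tract is not a uniform tube, so the numbers are only indicative, but the direction of the effect matches the description above: the shorter the tract, the higher the emphasized frequencies.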

When you swallow, your larynx quickly rises and then slides back down. This illustrates the broad range of motion your larynx is capable of, and, by extension, the amount by which you can shorten or lengthen the vocal tract.

For singers, public speakers and voice actors

For singers, public speakers and voice actors, raising and lowering the larynx is imperative for artistic freedom. The tonal changes engendered by this vocal tract modification help tell a story of emotion, since sounds produced with a raised larynx are usually associated with brightness and happiness, and sounds produced with a lowered larynx are usually associated with darkness, gloom or authority.

For transgender voice users

For transgender voice users, raising and lowering the larynx is the primary (but not the only) way to significantly alter one’s perceived vocal gender. With puberty, the larynx increases in size, the laryngeal prominence (or Adam’s apple) of people who were assigned male at birth becomes more pronounced, the larynx drops and the vocal folds increase in size. For transgender men, hormone replacement therapy (HRT) triggers these changes in what is commonly known as second puberty³. However, even without HRT, lowering the larynx can still be a useful tool for transgender men in obtaining a passing voice⁴. Because HRT can’t reverse the effects that puberty has had on the voices of transgender women, they have to work with the instrument they already have, raising their larynx to emulate the shorter vocal tract length of cisgender⁵ women. Fortunately, with practice and dedication, this is completely achievable.

Practical application

The easiest and safest method⁶ for controlling larynx height (and, therefore, vocal tract length) is the big dog small dog (BDSD) exercise. The instructions are simple: pant like a small dog to raise your larynx or pant like a big dog to lower your larynx. From there, you can break off into words that start with an ‘h’, then into full sentences that begin with the same letter, and, finally, onto improvised speech.

Additionally, yawning is very effective in lowering the larynx.

Keeping the larynx in a non-neutral position takes practice, both to develop the larynx-raising or larynx-lowering musculature and to build the mind-body connection. While practicing, take care to avoid excessive constriction at the vocal fold level and the activation of unnecessary muscles (like the masseter muscle).

Zheanna, from transvoicelessons.com, explains how to manipulate vocal tract length for transgender voice modification.