Phonation — the act of using one’s voice — is a process that involves many interconnected parts. The complex system that is the voice can be divided into three subsystems: a power source (the respiratory system), a sound source (typically the vocal folds), and an acoustic filter (the vocal tract).
This Power-Source-Filter (PSF) model of voice production applies the Source-Filter model, which is widely used in speech pathology, speech analysis and speech synthesis, to singing. It constitutes the central dogma of singing: the foundation for understanding how the human voice works.
The lungs are two large organs of respiration located in the chest cavity. They are responsible for the exchange of oxygen and carbon dioxide that makes human life possible. They are also integral to supplying airflow to the vocal folds (also mistakenly known as vocal cords or, even worse, vocal chords) during exhalation; at the vocal folds, the pressure of this air is converted to sound.
The lungs are not capable of inflating by themselves. They will expand only when there is an increase in the volume of the chest cavity, as they are pneumatically (i.e., through suction) attached to the inside wall of the chest.
At rest, this is achieved primarily through contraction of the diaphragm and, to a much lesser extent, the external intercostal muscles (muscles that run between the ribs and help form and move the chest wall), whereby the diaphragm flattens from a domed saddle-like shape to a shape resembling a flat disc, and the external intercostal muscles increase the circumference of the chest by raising the ribs. The negative pressure generated by the increase in the volume of the lungs leads to air streaming into them — the process of inhalation. The depression of the diaphragm during contraction consequently leads to pressure that promotes the downward displacement of the abdominal organs.
- Diaphragmatic breathing (belly breathing): If this downward displacement isn’t resisted by the abdominal wall muscles, it leads to a bulging out of the abdominal wall. Although it’s not possible to breathe or sing through the diaphragm, this is the coordination that such common imagery is meant to elicit. It’s important to keep in mind that exaggerating this position does not improve breathing.
- Thoracic breathing (chest breathing): This type of coordination is characterized by a circumferential expansion of the ribs, caused by the contraction of the external intercostal muscles, as well as various accessory muscles of inhalation — muscles that aid in inhalation when contraction of the diaphragm doesn’t elicit sufficient air intake.
- Clavicular breathing (high, shoulder breathing): In this type of coordination, the descent of the diaphragm is limited. Instead, you might notice, for example, contraction of the sternocleidomastoid (SCM) muscle or of the scalene muscles (all accessory muscles of inhalation), which elevate the sternum and upper ribs, respectively, allowing for expansion of the lungs. However, the smaller size of the upper ribs heavily limits the volume of this expansion.
Breathing coordination isn’t all black and white, though: in practice, it likely won’t adhere strictly to any one of the above coordinations.
Contrary to common belief, the diaphragm, like all skeletal muscles, can be consciously controlled.
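The link between an increase in lung volume and the negative pressure that draws air in can be approximated with Boyle’s law (pressure times volume stays constant for a fixed amount of gas at constant temperature). The sketch below is a simplification: it assumes the lungs expand slightly before any air has flowed in, and the volumes used are illustrative assumptions rather than measurements.

```python
# Boyle's law sketch: P1 * V1 = P2 * V2 for a fixed amount of gas at
# constant temperature. All volumes below are illustrative assumptions.

ATMOSPHERIC_PRESSURE_KPA = 101.325  # standard atmosphere


def pressure_after_volume_change(p_start_kpa: float,
                                 v_start_l: float,
                                 v_end_l: float) -> float:
    """Pressure of a fixed amount of gas after its volume changes."""
    if v_start_l <= 0 or v_end_l <= 0:
        raise ValueError("volumes must be positive")
    return p_start_kpa * v_start_l / v_end_l


# Hypothetical lungs holding 2.5 L at atmospheric pressure, expanded by 1 %
# by the diaphragm before any air has had time to flow in.
p_lungs = pressure_after_volume_change(ATMOSPHERIC_PRESSURE_KPA, 2.5, 2.525)
pressure_difference = p_lungs - ATMOSPHERIC_PRESSURE_KPA

print(f"pressure inside the lungs: {p_lungs:.2f} kPa")
print(f"difference from atmosphere: {pressure_difference:.2f} kPa")
```

A pressure inside the lungs that is lower than the pressure outside is what makes air stream in until the two equalize. Swapping the two volumes models the opposite case, where a shrinking lung volume raises the pressure and pushes air out.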
Unlike inhalation, exhalation at rest isn’t initiated through muscular contraction. This is due to the lungs’ rubber band-like elasticity: the more they are stretched, the stronger their tendency to recoil into the smallest shape possible.
So, with the release of diaphragmatic contraction at the end of inhalation, the rubber band-like fibers of the lungs cause the lungs to recoil completely passively. This decreases chest volume, stretches the diaphragm back into its domed saddle-like shape, and drives air out of the body: the process of exhalation. The lungs don’t completely collapse, however, because of resistance from the rigidity of the inner chest wall.
Just as there are accessory muscles of inhalation (muscles that aid in breathing in conditions that demand a higher respiratory rate), there are also accessory muscles of exhalation. These act by pulling the ribcage down or by otherwise increasing pressure within the abdominal cavity, which pushes the diaphragm upward. In turn, this upward movement reduces the volume of the thoracic cavity (i.e., the chest cavity) more quickly, causing air to flow out of the body at a faster rate than in exhalation at rest.
The contraction of these accessory muscles of exhalation during singing can, therefore, be used to indirectly move the still-contracted diaphragm, creating a mechanism for pressure-controlled exhalation — a process which you may have heard of under the misleading name “breath support”.
Although the Italian term appoggio is defined in different ways by different singers and pedagogues (it’s often taught indirectly, using imagery), it’s generally agreed to work by allowing for pressure-controlled exhalation as well as increased air intake during inhalation.
Both inhalation and exhalation have an effect on subglottal air pressure (i.e., the air pressure just underneath the plane of the vocal folds). Higher subglottal air pressure is associated primarily with higher-amplitude (louder) sounds but also, to a lesser extent, with an increased fundamental frequency (higher pitch).
True Vocal Folds
The true vocal folds (TVFs), which are located inside the larynx, are the main sound source of the voice. When phonating, the vocal folds alternate between open and closed positions. Each vibration allows a very brief puff of air to come through the glottis (the vocal folds and the slit-like opening between them), dividing the outflowing air into regions of high and low air pressure. Therefore, the frequency of vocal fold vibration (how many glottal opening-closing cycles occur in each second) determines the pitch at which you’re phonating.
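As a small illustration of the relationship between vibration frequency and pitch, the sketch below counts opening-closing cycles per second and maps the result to the nearest note name. The mapping assumes twelve-tone equal temperament with A4 tuned to 440 Hz and the convention that MIDI note 60 is C4; these are assumptions of the example, not something the text above prescribes.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fundamental_frequency_hz(cycles: int, duration_s: float) -> float:
    """Glottal opening-closing cycles per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return cycles / duration_s


def nearest_midi_note(frequency_hz: float, a4_hz: float = 440.0) -> int:
    """Nearest equal-tempered note, as a MIDI note number (A4 = 69)."""
    return round(69 + 12 * math.log2(frequency_hz / a4_hz))


def note_name(midi_note: int) -> str:
    """Note name with octave, assuming MIDI note 60 is written C4."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"


# Hypothetical measurement: 110 glottal cycles counted in half a second.
fo = fundamental_frequency_hz(110, 0.5)
print(fo, note_name(nearest_midi_note(fo)))
```

With these assumptions, doubling the number of cycles per second raises the note by one octave.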
The vocal folds aren’t structurally uniform; they’re composed of five tissue layers.
The different stiffness characteristics of these layers result in three mechanically decoupled groupings: the body, the transitional layer, and the cover of the vocal folds. The transitional layer consists of connective tissue, which connects the softer cover of the vocal folds to the more rigid thyrovocalis muscle of the body.
- Cover (mucosa): the epithelium and the superficial lamina propria.
- Transitional layer (vocal ligament): the intermediate lamina propria and the deep lamina propria.
- Body: the thyrovocalis muscle.
The frequency at which the vocal folds vibrate can be controlled through coordinated muscular engagement, which partially determines the elasticity and proximity of the vocal folds as well as the subglottal pressure.
Your true vocal folds (pictured in red) are directly responsible for a variety of aspects of your voice, such as:
- Onsets and offsets: how phonation is initiated (in the case of onsets) and ended (in the case of offsets). The type of onset is determined by the state of glottal opening/closure before breath flow is initiated, whereas the type of offset is determined by the state of glottal opening/closure right after breath flow is terminated.
- Laryngeal vibratory mechanisms: the four different patterns of vocal fold vibrating behavior that are responsible for vocal registers and the passaggios (or pops, or yodels, or cracks, or breaks) in the voice when transitioning between mechanisms.
- Open Quotient (OQ): the ratio of the duration of the open phase (when the glottis is open in each cycle of vibration) to the duration of each complete cycle of vibration. Closed quotient (CQ) is calculated as CQ = 1 − OQ; therefore, a high closed quotient implies a low open quotient, and vice versa. A higher closed quotient produces a buzzier sound at the vocal fold level.
- Breathiness: the turbulent sound that might noticeably accompany your voice and which results from a significant volume of air passing through the glottis without being vibrated to a pitch.
- Fry: the popping or rattling sound produced by a slack pattern of vocal fold vibrating behavior, which can be added to any laryngeal vibratory mechanism (not just M0).
- Cordal polyphony: the simultaneous production of multiple pitches through specific desynchronous vibration of the vocal folds.
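The open and closed quotients described in the list above follow directly from two durations within a single cycle of vibration. Here is a minimal sketch; the function names and the example durations are hypothetical and chosen only for illustration.

```python
def open_quotient(open_phase_s: float, cycle_s: float) -> float:
    """OQ: duration of the open phase divided by the duration of the cycle."""
    if cycle_s <= 0:
        raise ValueError("cycle duration must be positive")
    if not 0 <= open_phase_s <= cycle_s:
        raise ValueError("open phase must fit inside one cycle")
    return open_phase_s / cycle_s


def closed_quotient(oq: float) -> float:
    """CQ = 1 - OQ."""
    return 1.0 - oq


# Hypothetical cycle: 5 ms long (200 Hz), glottis open for 3 ms of it.
oq = open_quotient(0.003, 0.005)
cq = closed_quotient(oq)
print(f"OQ = {oq:.2f}, CQ = {cq:.2f}")
```

Shortening the open phase while keeping the cycle length fixed lowers OQ and raises CQ by the same amount, since the two always sum to 1.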
False Vocal Folds
The false vocal folds (FVFs; also called vestibular folds, ventricular folds or superior vocal folds; pictured in blue) lie above the true vocal folds. During swallowing, the false vocal folds squeeze together to help prevent food and drink from entering the airway.
In extended vocal technique (the nontraditional extension of traditional vocal technique), the false vocal folds can vibrate to produce a growl, or they can be partially squeezed together to produce a grunt.
In addition, squeezing the false vocal folds together can cause the true vocal folds to approximate.
The aryepiglottic folds (AFs; pictured in green), located at the entrance of the larynx, compose the sides of the aryepiglottic funnel (or aryepiglottic sphincter, though it’s not an actual sphincter), whose opening can be constricted to produce twang.
In extended technique, the aryepiglottic folds can vibrate to produce a growl.
In phonation, though, sound isn’t always produced by the larynx. It can also be produced without any contribution from laryngeal structures, as in buccal, esophageal or pharyngeal speech, for example.
Although we perceive only one pitch when listening to someone speak or sing, in reality we’re hearing a multiplicity of different frequencies, each with a different perceivable loudness (determined by the amplitude of the sound wave at that frequency). The pitch of a voice corresponds to its fundamental frequency, fo, the frequency at which the vocal folds vibrate; the frequencies above it are called overtones.
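To make the idea of many simultaneous frequencies concrete, the sketch below lists the frequencies present in an idealized voiced sound. It assumes perfectly periodic vocal fold vibration, in which case the frequencies above the fundamental fall at whole-number multiples of fo; real voices only approximate this.

```python
def harmonic_frequencies(fo_hz: float, count: int) -> list[float]:
    """First `count` harmonics of fo: the fundamental, then the overtones."""
    if fo_hz <= 0 or count < 1:
        raise ValueError("fo must be positive and count at least 1")
    return [n * fo_hz for n in range(1, count + 1)]


# Hypothetical sung note with fo = 220 Hz.
harmonics = harmonic_frequencies(220.0, 5)
fundamental, overtones = harmonics[0], harmonics[1:]
print("fundamental:", fundamental)
print("overtones:", overtones)
```

Only the first entry is heard as the pitch; the relative strengths of the remaining entries shape how the voice sounds.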
As the sound that was produced by the source passes through the filter (the remainder of the larynx, the pharynx, the mouth and the nose), the amplitudes of the different overtones that compose the sound are each lowered or increased. The way in which the filter heightens or dampens each frequency is responsible for the tone, timbre or texture of the voice. The frequencies amplified the most by the geometric properties of the vocal tract are called formants.
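The filtering step can be sketched numerically as a gain applied to each frequency of the source sound. Everything specific in this toy model is an assumption made for illustration: the formant centres and bandwidths, the bell-shaped gain curve, and the source amplitudes that fall off with the square of the harmonic number. It is not a physical model of the vocal tract.

```python
import math

# Hypothetical formant centres and bandwidths in Hz (illustrative values).
FORMANTS = [(700.0, 100.0), (1100.0, 120.0), (2500.0, 160.0)]


def filter_gain(frequency_hz: float) -> float:
    """Toy gain: above 1 near an assumed formant (boost), below 1 far away."""
    peaks = sum(math.exp(-0.5 * ((frequency_hz - centre) / width) ** 2)
                for centre, width in FORMANTS)
    return 0.5 + 2.0 * peaks


def source_amplitude(harmonic_number: int) -> float:
    """Assumed source spectrum: amplitude falls off as 1 / n^2."""
    return 1.0 / harmonic_number ** 2


def filtered_spectrum(fo_hz: float, count: int) -> list[tuple[float, float, float]]:
    """(frequency, source amplitude, amplitude after the filter) per harmonic."""
    rows = []
    for n in range(1, count + 1):
        frequency = n * fo_hz
        amplitude = source_amplitude(n)
        rows.append((frequency, amplitude, amplitude * filter_gain(frequency)))
    return rows


# Hypothetical voice with fo = 175 Hz, so the 4th harmonic sits at 700 Hz.
spectrum = filtered_spectrum(175.0, 16)
most_boosted = max(spectrum, key=lambda row: row[2] / row[1])[0]
for frequency, before, after in spectrum[:6]:
    print(f"{frequency:7.1f} Hz  source {before:.4f}  filtered {after:.4f}")
print("most boosted harmonic:", most_boosted, "Hz")
```

Changing the values in `FORMANTS` while leaving the source untouched changes which harmonics stand out, which is the sense in which the filter, rather than the source, shapes the timbre.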
This doesn’t mean, however, that your tone is fixed. Your vocal tract is highly modifiable through movement of many different structures. Listed from shortest to longest distance from the source, these include the larynx, the false vocal folds, the true vocal folds, the epiglottis (the protective cover flap of the larynx), the aryepiglottic funnel, the velum (or soft palate), the jaw, the tongue, and the lips.