Trans Voice Visualized: How to Read Spectrograms

What are they?

Spectrograms look like the following:

In essence they are a particular graphical way of representing sound. They have several properties which make them useful for voice training or for exploring different techniques. You should have a spectrogram ready and running; it will make it easier to understand the rest of this guide.


Each platform has several different spectrogram apps. These have been recommended:

  • For Android: Spectroid is a good choice with reasonable default settings. It also works on Android smartwatches, with the caveat that the menu button is in the top right corner and won’t be accessible if you have a round watchface.
  • For iOS: Spectrogram Pro ($3) or Visual Audio (free).
  • For PC (GNU/Linux, macOS, Windows): Friture has a widget called “2D spectrogram” which is what you want.
  • Special mention: inFormant, the so-called Clo’s app. With a good microphone and silent background it shows the first few formants. Modifying the location of these is one way to shift the perceived gender of the voice.

Colors and lines!

The picture below shows the spectrogram in Friture. What you would see is the colored part moving to the left and new colors being generated on the right. In some spectrograms this moves from left to right, from bottom to top, or from top to bottom (e.g. in Spectroid). To understand what this all means you need to understand each of the three axes shown: Frequency, Time, and Power Spectral Density, or PSD for short.

In essence the spectrogram tells you how much energy (PSD) there is at a given frequency at a given time. In the picture above the black region on the left corresponds to the time before the spectrogram was activated. The part with purple and pink haze represents the background noise from my device (a laptop with noisy fans), fridge, wind and other noise that reached the microphone. Then, the orange / yellow lines were produced by pronouncing a vowel. I did a pitch slide, starting with a low pitch (in the middle of the picture) and raising it and then keeping the pitch fixed high.
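To make this concrete, here is a minimal Python sketch of how a spectrogram could be computed. It is not how Friture or Spectroid actually do it; the function names, the frame size of 256 samples, the hop of 128 samples and the Hann window are all assumptions chosen for illustration. The signal is cut into short overlapping frames, and for each frame the energy at each frequency is measured and converted to dB. Each frame becomes one column of the picture.

```python
import cmath
import math

def frame_levels_db(frame, sample_rate):
    """Return (frequencies_hz, levels_db) for one short frame of samples."""
    n = len(frame)
    # A Hann window fades the frame edges so energy leaks less between bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    freqs, levels = [], []
    for k in range(n // 2 + 1):
        # Plain DFT: correlate the frame with a complex sinusoid at bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        power = abs(coeff) ** 2 / n
        freqs.append(k * sample_rate / n)
        levels.append(10 * math.log10(power + 1e-12))  # tiny floor avoids log(0)
    return freqs, levels

def spectrogram(samples, sample_rate, frame_size=256, hop=128):
    """Return the frequency axis and one (start_time_s, levels_db) column per frame."""
    freqs, columns = [], []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        freqs, levels = frame_levels_db(frame, sample_rate)
        columns.append((start / sample_rate, levels))
    return freqs, columns
```

A real app would use a fast Fourier transform instead of this slow plain DFT, but the picture it draws is built from the same three quantities: time (which column), frequency (which row) and level in dB (which color).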


Frequency represents how many times per second a sound-producing element vibrates. Each sound, no matter how simple or complicated, can be split up into frequencies, just as a picture can be split up into pixels. A sound that consists of a single pure frequency is a sine wave. A speaker playing such a wave at 360Hz would have its element going back and forth 360 times a second in a pattern given by the sine function. Believe it or not, any sound, including voices, can be split into sine waves of different frequencies and loudness.
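The claim that a sound can be split into sine waves can be tried out in a few lines of Python. The sketch below builds a "complicated" sound from two pure tones and then measures how strongly a given frequency is present in it. The sample rate, the second tone at 720Hz, the amplitudes and the function names are assumptions made up for this illustration.

```python
import math

SAMPLE_RATE = 7200  # samples per second (assumed value, convenient for 360Hz)

def sine_wave(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """Samples of a pure tone: one frequency, one loudness."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def amplitude_at(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Estimate how much of one frequency is in the sound
    by correlating it with a cosine and a sine at that frequency."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# One second of sound: a loud 360Hz tone plus a quieter 720Hz tone.
n = SAMPLE_RATE
mixture = [a + b for a, b in zip(sine_wave(360, 1.0, n),
                                 sine_wave(720, 0.25, n))]

for freq in (360, 500, 720):
    print(freq, "Hz:", round(amplitude_at(mixture, freq), 3))
```

The two tones that were mixed in come back out with their own loudness, and a frequency that was never put in (500Hz here) shows up as essentially zero. A spectrogram does this measurement for many frequencies at once.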


The time-axis at the bottom simply tells how long ago something happened. The right side of the spectrogram in Friture is the only part that shows what’s happening NOW. The rest is what happened before. The time-axis of Friture is a bit wonky: 10 seconds on the axis corresponds to the sound that’s being recorded currently, and so 6 seconds on the axis represents what was recorded 4 seconds ago. In the picture, you can see that I started speaking at roughly 4.5 seconds, which means it was 10-4.5=5.5 seconds before I took the screenshot.
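The arithmetic for this kind of time-axis is simple enough to write as a small helper. This is only a sketch: the function name is made up, and the 10 second window is the one from the Friture example in the text (other apps or settings will differ).

```python
def seconds_ago(axis_position_s, window_length_s=10.0):
    """Convert a position on a right-is-now time axis into 'how long ago'."""
    if not 0 <= axis_position_s <= window_length_s:
        raise ValueError("position must lie within the displayed window")
    # The right edge (window_length_s) is the present moment.
    return window_length_s - axis_position_s

print(seconds_ago(4.5))  # the moment I started speaking
print(seconds_ago(6.0))  # the example from the text
```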

PSD (power spectral density)

The various colors in the spectrogram tell how loud a given frequency is at a given time. Usually the colder colors (black, blue, purple, dark pink) represent sounds that are less loud than those represented by warmer colors (bright pink, red, orange, yellow, white). This is shown in the axis on the right. Loudness is measured in units of decibels, dB. The higher the number the louder it is. However, negative values are common (for example, -30dB is louder than -60dB), so pay attention.
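For the curious, decibels are a logarithmic way of comparing a power to a reference power. The sketch below assumes the reference is the loudest level the app can represent, which would explain why most values come out negative; whether a particular app uses that reference is an assumption, and the function names are made up for illustration.

```python
import math

def power_to_db(power, reference=1.0):
    """Level in dB of a power relative to a reference power."""
    return 10 * math.log10(power / reference)

def db_difference_to_power_ratio(db_a, db_b):
    """How many times more power level db_a carries than level db_b."""
    return 10 ** ((db_a - db_b) / 10)

print(power_to_db(1.0))      # equal to the reference
print(power_to_db(0.00001))  # far below the reference, so negative
print(db_difference_to_power_ratio(-50, -80))
```

Every step of 10dB is a factor of ten in power, so a voice harmonic that sits 30dB above the background noise carries a thousand times more power than the noise at that frequency.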

So, how to read the spectrogram?

Once you know the meaning of the three axes you can understand what each individual pixel in the spectrogram means at any instant. Let us look at what happens at the pixel where the cursor is located in the picture below. I was doing some pitch slides up and down and up and down…

The label kindly tells us that the cursor is located at 9.65s on the time-axis and 275Hz on the frequency axis. The vertical and horizontal lines show these also. Most spectrogram apps display these helpful lines, at least for frequency. Then there’s the third axis: the color. It looks like it’s orange, so maybe around -50dB on the color axis displayed to the right. This means that the frequency 275Hz had loudness -50dB at time 9.65s (which is 0.35 seconds ago because the spectrogram slides to the left in Friture).

The actual value of -50dB is not important. This loudness value depends on the type of microphone you use, how far from the microphone you are talking and whether you are talking towards the microphone or away from it. Instead you should compare it to other colors around! The color below the cursor is orange which is much higher than the black or purple of the background. This means that my voice is doing something at 275Hz!

What is all of this good for?

Reading spectrograms is not about looking at the individual pixels, much like when watching a video you don’t think: “Great! The pixel at 500×750 was red 24 seconds after the movie started!” To get anything useful you need to focus on the various shapes the spectrogram shows. Here are a few voice parameters that can be read from a spectrogram.


The pitch of the voice is given by the middle of the lowest line that your voice makes. The picture below shows my voice at 124Hz, 180Hz, and, lastly, 254Hz. Note that the lines become thicker along the frequency axis as the frequency becomes lower. This is normal and is caused by the logarithmic frequency axis; the lines would all have the same thickness on a linear axis. Experiment and choose whichever axis type you prefer in the settings.
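Why the same line looks thicker lower down on a logarithmic axis can be checked with a little arithmetic. On a log axis, distance on screen corresponds to the ratio between two frequencies, not their difference. The sketch below assumes each line is 20Hz wide (a made-up number purely for illustration) and measures its on-screen thickness in octaves.

```python
import math

def width_on_log_axis(center_hz, width_hz):
    """Thickness in octaves of a band that is width_hz wide around center_hz."""
    lower = center_hz - width_hz / 2
    upper = center_hz + width_hz / 2
    return math.log2(upper / lower)

for pitch_hz in (124, 180, 254):
    print(pitch_hz, "Hz:", round(width_on_log_axis(pitch_hz, 20), 3), "octaves")
```

The 20Hz band takes up the most room around 124Hz and the least around 254Hz, even though it is the same width in Hz every time.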

Harmonics and breathiness

The pitch of a voice is often called the fundamental frequency or the 1st harmonic. The 2nd harmonic is the line right above the 1st harmonic, and its frequency is twice the frequency of the 1st harmonic. The 3rd harmonic is the third line and so forth. Harmonic number N has frequency N times that of the fundamental frequency. This applies to any “periodic” sound, i.e. those that are produced by a vibrating or rotating element, e.g. the vocal folds. Background noise, the wind, and breath are not periodic, which is why you don’t see these lines of concentrated energy or loudness in those sounds.
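The rule that harmonic N sits at N times the fundamental is easy to turn into a small helper, for example to predict where the lines should appear for a given pitch. The function names are made up for this sketch; the 124Hz value is the pitch from the earlier example.

```python
def harmonic_frequencies(fundamental_hz, count):
    """Frequencies of harmonics 1..count for a given fundamental."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def nearest_harmonic(frequency_hz, fundamental_hz):
    """Which harmonic number a line at frequency_hz most likely is."""
    return max(1, round(frequency_hz / fundamental_hz))

print(harmonic_frequencies(124, 5))
print(nearest_harmonic(370, 124))
```

If the lines on your spectrogram are evenly spaced on a linear axis, the spacing between neighbouring lines is the pitch itself.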

If you see some color between the harmonics it can mean a few different things. The most likely scenarios are that a) there is background or microphone noise, or b) there is breathiness in the voice. The picture below shows a breathy voice first and then no breathiness. Note that, compared to earlier pictures, I changed the frequency axis to linear to see details between frequencies up to 2kHz better than in the logarithmic scale.

Larynx height

It is possible to keep track of how high or low you can move the larynx using a spectrogram. The pictures below show what the big dog small dog (BDSD) and whisper siren exercises look like.

If you want to feminize the voice, you want to raise the larynx, which moves the bright spots higher. For masculinization you lower the larynx and move the spots lower. A drill for raising the larynx is the BDSD, where you start by panting like a big dog, then try to pant like a small dog. The picture on the left shows this; the black stripes between the purple stripes are the moments in time between the pants, where barely any sound is produced. You can notice two brighter spots in the thick vertical purple lines. These are where the first and second resonances of your vocal tract are on the frequency scale. Raising the larynx raises these resonances. NOTE: other movements, such as tongue position, might also move the resonances, so be sure to isolate the larynx as much as possible if your goal is to track your progress with larynx height. On my first pant (time 0.4s in the picture) the first resonance was at 700Hz. On the last pant (time 4.2s) that resonance was at around 900Hz. With more practice at raising the larynx I could probably reach higher than 900Hz while keeping other parts of the vocal tract constant.

The picture on the right is the whisper siren, in which you start whispering some vowel, and try to change the perceived pitch of that whisper. Just as for BDSD, the brighter spots vertically show the frequencies which are the first two resonances (you will see more of them if your frequency axis has a bigger range). The first resonance was around 600Hz at the beginning, and it ended up at roughly 900Hz.

Spectrogram configuration

Various spectrogram apps have different settings. In general modifying the following is useful in certain situations:

  • Frequency axis: minimum and maximum frequency, logarithmic or linear (or other) scale. These will help with making the various harmonic lines clearer, by zooming in or out, or packing the frequencies close together to see more global properties.
  • Time axis: what can be changed here is just the length of time in the past that’s being displayed. Making this shorter makes the spectrogram move faster and vice versa.
  • Loudness / energy / PSD axis: Here it’s often possible to decide which loudness or power level correspond to the maximum and minimum color. The main use is for removing background noise from the display or to make the peaks of the harmonics brighter or more clearly defined.

There are other options too, for example technical parameters related to how the spectrogram is calculated. This is an advanced topic with very particular applications for voice practice, so they will not be described here.

Okay, here are the effects of just one technical parameter: Friture and Spectroid have an option called “FFT size”. Increasing this makes the spectrogram more precise in frequency, but less responsive in time. This is useful for analyzing situations where the voice doesn’t change over time. Decreasing it does the opposite. You cannot have ultimate precision both in time and frequency. It’s just like Heisenberg’s uncertainty principle.
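The trade-off can be put into numbers. The sketch below assumes a sample rate of 48000 samples per second, which is common but not guaranteed for your device, and ignores details such as windowing and overlap that real apps add on top. The function name is made up.

```python
def fft_resolution(fft_size, sample_rate=48000):
    """Return (frequency step in Hz, length of analyzed sound in seconds)."""
    freq_step_hz = sample_rate / fft_size   # spacing between frequency bins
    window_s = fft_size / sample_rate       # how much sound one column covers
    return freq_step_hz, window_s

for size in (512, 2048, 8192):
    step, window = fft_resolution(size)
    print(f"FFT size {size}: {step:.2f}Hz steps, {window * 1000:.1f}ms per column")
```

Whatever FFT size you pick, the frequency step multiplied by the window length is always 1, so making one of them smaller necessarily makes the other larger.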


In summary, the spectrogram shows how the energy of your voice is distributed among the different frequencies contained in the voice, and shows how that has changed in the past several seconds. The various shapes displayed by it can reveal the pitch of the voice, breathiness (or background noise) and changes in larynx height. Many other voice parameters can be analyzed too. The main way to do this is to produce an unmodified voice and, right after that, the modified one. Look at the difference in the spectrogram, and see if the difference increases when the voice is further modified. Not all voice modifications can be seen in the spectrogram as easily as the above, but many can be seen with enough practice.

Oral vs Nasal Breathing: Facts to Consider

Whether one should breathe through the mouth or the nose when singing or speaking is a matter of great contention.

Inside your nasal cavities, the nasal turbinates are richly supplied with blood vessels. In fact, it’s the dilation of these blood vessels that sometimes causes your nose to congest. However, the turbinates also have a very important function: they warm the air you breathe in as it passes above, between, around and under them, and moisture evaporating from their walls increases the humidity of the inhaled air. So, when you inhale through the mouth, the air is neither warmed nor humidified to the same extent.

There are two types of hydration when it comes to the larynx, the organ which houses the vocal folds: superficial hydration, which concerns the fluid over the vocal fold and laryngeal surfaces; and systemic hydration, which concerns the fluid within the body and vocal folds.

Dehydration may be superficial, and therefore result from more temporary, localized settings, such as being at the beach or running against the wind, or systemic, and therefore less temporary because it reflects the amount of water in your body. The use of clean steamers and nebulizers will only increase superficial hydration, not systemic hydration.

We must also explore phonation threshold pressure (PTP), defined as the lowest subglottic pressure (pressure under the vocal folds) required to initiate and sustain vocal fold oscillation. PTP is considered a measure of ease of phonation and an indicator of vocal fold health. By increasing the viscosity of the vocal folds, in general, dehydration increases PTP, making it harder to initiate and sustain phonation, and may also decrease vocal range.

In summary: dehydration reduces the stability and ease of phonation. In the long term, it may also contribute to vocal disorders. For example, superficial dehydration increases jitter and shimmer in the voice, which are irregular fluctuations in pitch and intensity, respectively.
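To make jitter and shimmer less abstract, here is a minimal sketch of one common way to put a number on them: the average change from one vocal fold cycle to the next, relative to the average value. The cycle lengths and amplitudes below are invented for illustration, the function names are made up, and real voice analysis software uses more careful variants of this idea.

```python
def relative_perturbation(values):
    """Mean absolute change between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    changes = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(changes) / len(changes)) / (sum(values) / len(values))

def jitter(cycle_lengths_ms):
    """Cycle-to-cycle irregularity in pitch (via the length of each cycle)."""
    return relative_perturbation(cycle_lengths_ms)

def shimmer(cycle_amplitudes):
    """Cycle-to-cycle irregularity in intensity (via the peak of each cycle)."""
    return relative_perturbation(cycle_amplitudes)

# Invented measurements of four consecutive vocal fold cycles.
print(f"jitter:  {jitter([5.0, 5.1, 4.9, 5.0]):.1%}")
print(f"shimmer: {shimmer([0.80, 0.84, 0.78, 0.80]):.1%}")
```

A perfectly steady voice would score zero on both; higher values mean a less regular vibration.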

As you’ll learn by reading our encyclopedia article about the Power-Source-Filter model of voice production, the vocal folds have multiple layers. In general, they can be divided into two: the body, which is located deeper and includes the vocalis muscle, and the cover, which is more superficial. To avoid confusion, keep in mind that there’s also a three-layer model for the structure of the vocal folds.

Intuitively, dry air is more likely to decrease superficial hydration, by sucking water from the cover and overlying mucus covering, and not the deeper body of the vocal folds. This may be part of the reason why superficial dehydration affects the laryngeal vibratory mechanism M2 more than M1.

However, nasal breathing takes longer for the same volume of air. This disparity is especially pronounced in people with allergies, turbinate hypertrophy or a deviated septum because their small nasal cavities allow less (or no) air to pass through in each second. Consult with your doctor to know what steps could be taken to minimize or solve these issues if they’re significant to you.

So, whether you’re a singer, a public speaker or a voice actor, you must find a compromise: breathe through the nose when you can, and through the mouth whenever there’s not enough time to breathe through the nose.



Where the Boys At? — A Rant on the Disparities of the Transmasculine Voice

No, seriously. Where they at?

A query of “transgender speech therapy” yields many results for voice feminization. But when it comes to the transmasculine side of things, resources, information, and anecdotes are few and far between.  As a trans guy early in his transition, the process of scouring search results only to be met with the commonly dismissive, “Just wait for testosterone to do its magic!”™ is demoralizing.

Will I wake up one day and not be “ma’am”ed on the phone? Without any conscious intervention of my own, will I be able to join an online game without stirring confusion amongst my teammates? And will my voice stop being the contributing factor that washes away my physical presentation in public?

The unfortunate answer is… maybe.  For some trans men (and non-binary transmasculine people), testosterone truly is the magic solution to sounding masculine.  For others, we’re left in the dark wondering if we just need more time on the juice or if there’s something wrong with us.

It’s an issue that isn’t unique to trans men either.  Cis men, as detailed in David Thorpe’s documentary, “Do I Sound Gay?”, also feel the anxiety of developing a voice that puts them at odds with their peers. For some, embracing one’s voice— however it may be —is liberating. With gender dysphoria breathing down one’s neck, ‘just accepting it’ is rarely an option.

Testosterone can be a blessing in the pursuit of trans masculinization. An increase in muscle mass, fat redistribution, and facial hair all help to produce society’s view of a male physique.  Included amongst these changes is the thickening of the vocal cords.  The only issue being that thick vocal cords on their own have little bearing on whether one has a deep or masculine-sounding voice.

As detailed in a 2011 study by Ingo R. Titze, “The greatest conceptual error is made with regard to vocal fold thickness. It seems intuitively correct to assume that if vocal fold thickness is doubled, mass is doubled, and hence f0 is lowered. This is entirely erroneous, however.” You can think of f0 as the pitch you’re talking or singing in (although this is not absolutely, completely true).

All of that said, I’d be remiss not to say I’m thankful for all of the changes testosterone affords me. Trans women receive no such added assistance in voice training from estrogen. Even though I am early in my transition, in six months I’ve already noticed a significant drop in the pitch of my voice. What used to sit around 150 Hz is now around 90 Hz.

That still doesn’t answer the reason why trans men and transmasculine people are often left out when the topic of vocal transition is brought up.

The dismissive stance is that masculinization is easier. The misconception is that all one needs is a haircut, baggy clothing, and a scowling expression before the Man Fairy™ swoops down to fill in the rest. After all, it’s a lack of information that leads people down the path of assumptions. And the world of trans masculinization is short on data.

Some state that there are just fewer trans men than trans women. A study in the Journal of Sexual Medicine conducted in two separate clinics found that transmasculine transitions are on the rise. Between 1989 and 2005, the sample of 420 adolescents diagnosed with gender dysphoria consisted of 58.6% AMAB (assigned male at birth) people and 41.4% AFAB (assigned female at birth) people. The researchers noticed a shift between 2006 and 2013: the ratios changed to 36.7% AMAB people and 63.3% AFAB people.

While the researchers noticed the swing occurring in two separate countries independent of each other, the cause was… inconclusive.  But it does showcase that transmasculine people are by no means a minority among the transgender community. If anything, the population is statistically on the rise.

Our question of why trans men are underrepresented is one that remains unanswered in a statistical sense. Perhaps that’s why the topic deserves discussion. If trans men aren’t as much of a rarity as our lack of portrayal makes us seem, is there an underlying social issue?

Who better to ask than fellow trans men themselves? A few videos by Boyform touch on the topic. When it comes to interacting with people outside of trans spaces, trans men rarely make it into the conversation for better or for worse. Transmasculine people are spared the constant slander that popular media uses to discriminate against trans women. In return, we transmasculine folk gain invisibility.

Perhaps not the most definitive source, but Wikipedia’s list of transgender characters in films shows 15 films featuring trans men and 73 featuring trans women.  As far as the quality of representation, that’s another story.

Popular media’s depiction of trans men is often as abused ‘women’, tomboys, or butch lesbians. Still, these depictions are often tame compared to trans women being treated as a plot twist, a joke, or purely sexualized. Add the fact that these characters are often played by cis actors, and it becomes clear why so many of these depictions lack any understanding of what it means to be transgender.

This in itself brings up another disheartening paradox. Trans women are both more represented and more discriminated against. Moving from film to reality, the state of transmisogyny is even more damaging. Media witch hunts, often fueled by TERF rhetoric, paint an image of trans women ‘invading’ cis women’s spaces or exhibiting predatory behavior.

Statistically, 50% of the transgender population has experienced some type of sexual assault in their lifetime.  While these numbers, amongst other statistics, often fail to separate trans women, trans men, and non-binary people, NCAVP’s Hate Violence Report cites trans women of color as the leading victims of hate crimes.

Why bring up any of this in a discussion of transmasculine speech training resources?  Well, while the exact culprit for the lack of masculinization resources remains inconclusive, the reason for an abundance of feminization resources is clear.  For me, my voice leads to being called a slur or two in an online video game.  For a trans woman, her voice may lead to physical violence. 

Invisibility is both the blessing and curse of transmasculine people. We are given a reprieve from some of the violence and discrimination that plagues the transgender community, but in turn, our resources are often limited.

After all, as Boyform discusses in his video, many passing trans men choose to live ‘stealth’. They often leave trans spaces and communities, leaving transmasculine people early in their transitions stranded to discover a means of masculinization on their own.

But we’ve got this, boys! As mentioned earlier, we are fortunate enough that testosterone provides an additional benefit to our vocal ability during transition.  The process can sometimes take years before one’s f0 reaches a masculine range, but that’s all the more reason why masculinizing speech therapy is important. 

Six months into my journey and I can tell you, the road to sounding like Alucard from Symphony of the Night is not for the faint of heart.  It feels like I’m learning how to speak all over again.  I no longer know how to sing. Some days, I struggle to even project my voice at all.  Some of this is due to my vocal folds thickening while my larynx has remained the same length.  It doesn’t help that I’ve been using the same vocal patterns I’ve always used since before I transitioned.

Larynx height and embouchure are not affected by testosterone. But luckily for us, they can be trained! A lowered larynx increases pharynx space which creates a deeper, warmer tone. Using open vowel sounds increases this effect further and maintaining a high closed quotient (in short, CQ is the ratio of how long the vocal folds are touching to how long the cycle of vibration lasts) further helps one capture what is typically a masculine-sounding voice.
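Since CQ is just a ratio, it can be written down in a couple of lines. The numbers below are invented for illustration (they are not measurements of anyone’s voice), and the function name is made up.

```python
def closed_quotient(closed_time_ms, pitch_hz):
    """Fraction of each vibration cycle during which the vocal folds are touching."""
    cycle_ms = 1000.0 / pitch_hz  # one cycle lasts 1/pitch seconds
    if not 0 <= closed_time_ms <= cycle_ms:
        raise ValueError("closed time cannot exceed the length of the cycle")
    return closed_time_ms / cycle_ms

# At 100 Hz each cycle lasts 10 ms; folds touching for 6 ms of that gives a CQ of 0.6.
print(closed_quotient(6.0, 100))
```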

That’s not to say there isn’t the potential for further complications. Other than the initial ‘settling period’ of cracking and inconsistency, loss of the M2 range is common.  Entrapped vocality— a condition caused by one’s vocal folds enlarging to a degree far larger than their larynx —creates a constant ‘hoarse-sounding’ effect. Understanding one’s own voice is imperative to being able to manipulate the sounds to achieve the desired effect. 

The beauty of speech masculinization training is that anyone can benefit from it— even people who are not taking testosterone! 

With all of that said, what are the tips and tricks to taking control of one’s voice? 

How can one lower their larynx?

How does one maintain a high CQ?

How can one avoid vocal damage while in the early stages of testosterone?

Stay tuned for future posts where we finally give a voice to trans voice masculinization. 


Aitken, Madison, et al. “Evidence for an Altered Sex Ratio in Clinic‐Referred Adolescents with Gender Dysphoria.” Wiley Online Library, John Wiley & Sons, Ltd, 22 Jan. 2015.
The Changing Female-To-Male (FTM) Voice.
“List of Transgender Characters in Film and Television.” Wikipedia, Wikimedia Foundation, 20 Apr. 2020.
Stotzer, Rebecca L. “Data Sources Hinder Our Understanding of Transgender Murders.” American Journal of Public Health, American Public Health Association, Sept. 2017.
Titze, Ingo R. “Vocal Fold Mass Is Not a Useful Quantity for Describing F0 in Vocalization.” Journal of Speech, Language, and Hearing Research: JSLHR, U.S. National Library of Medicine, Apr. 2011.
“Violence Against Transgender People and People of Color Is Disproportionately High, LGBTQH Murder Rate Peaks.” GLAAD, 4 June 2012.
Zeigen, Laura, and Health & Science University Otolaryngology Departmental. “Effectiveness of Testosterone Therapy for Masculinizing Voice in Transgender Patients: A Meta-Analytic Review.” Taylor & Francis.

The higher the soft palate, the closer to God — A Rant

Ahh, the soft palate. Or, in scientific terms, the powerhouse of the voice… er, the velum:

It’s that floppy bit at the back end of the roof of the mouth, which touches the back wall of the throat when raised. You see it? Look at it go! Truly a powerhouse.

The nasality that the soft palate “powerhouse” provides access to is VERY often debated as one of the biggest keys, or crutches, to the voice… depending on who you ask. In other words, should you leave your soft palate “light-switch” in the “off” (lowered/nasal) or “on” (raised/non-nasal) position? We’re about to ask YOU in a second so we can find you and fight you. But, first, let’s catch up those of you who are new to nasality and the soft palate because not one of your souls will be exempted from our judgment.

So what’s a soft palate even? You might wanna check out our nasality article and hop back over here… Okay, welcome back, you remember that? The soft palate controls the opening to the nasal cavity for any of that nasality stuff to even be a thing. There are other articulators that assist in hearing what we know as nasality, but for some reason there is a lot of discussion single-heartedly devoted to the soft palate. Related terms for the soft palate include velum, velopharyngeal port, velopharyngeal opening, “the only part of the vocal tract that apparently ever mattered or will matter —  all praise to the floppy bit at the top-back of your mouth; may its mercy be ever swift and painless.” Debates about optimal soft palate position can even exceed discussions of nasality, invoking incidental points of contention from “openness of the throat” to “is it Tuesday on Mars?” 

Caught up? Well, prepare to catch these Scinguistics hands if you pick the wrong path on this soft palate positioning flow chart. Which is the ideal position of the soft palate for singing?

VPO: velopharyngeal opening, the opening that connects the oropharynx to the nasopharynx

Even if you picked the right answer, you probably still gotta fight us… we know half of y’all picked the right answer out of negligence. Fess up. For the rest of y’all, we got some more fire. Like how about this round letting y’all know that even untrained singers tend to lift the soft palate on high notes (because it’s probably just a reflex). We got even more rounds popping off with the truth that a lower soft palate doesn’t block up your throat, and that lowered soft palate activities like humming actually make the onset of phonation EASIER. And for the rest of y’all making claims about the sound of a lifted or lowered soft palate getting nasty on your sound, BANG BANG BANG, turns out that soft palate position doesn’t independently influence what we hear as nasality. Yezzzz that’s right, chomp on that TRUUUUTH. Now where does all this foolery come from to step to science?!?! (And don’t tell us that nobody believes any of the debunked answers, because we got those answers straight from the mouths of vocal pedagogy and tip sites via the same kinds of searches we linked earlier in this rant.) We can probably divide this soft palate rabblerousery into three main camps: classical puritans, modern mix zealots, and “the nose is in the throat” cultists.

Ok, you know that this wouldn’t be a Scinguistics rant if we didn’t yell at classical. Classical seems to unironically worship the soft palate elevation supremacy cited in most of the rationales for the elevated soft palate hype pathways in the flowchart. Why this elevation of soft palate elevation? Classical vocal styles, such as opera, tend to favor timbral profiles that often exclude nasality. Nasality can be seen as darkening, skewing a chiaroscuro (light — dark) balance already skewed dark by the typical low larynx and seasonal knodel timbral settings featured on the majority of opera singers. Furthermore, this darkness can be thought of as counter to the bright “ring” endowed by the high, amplified “singer’s formant” that allows opera singers to be heard above the orchestra. When this ring is achieved by squillo (intralaryngeal compression), nasality can be seen as a competing force that drowns the high, bright timbral ping of squillo in darkness. To bring the ping out of the nose, some opera practitioners may advise their large academic audience towards elevating the soft palate to close off entry to the nose. We’ve already established that the soft palate doesn’t need to be elevated to exclude nasality, but we know classical ain’t tryna hear from us on no science. Speaking of, science seems to suggest that soft palate elevation on high notes is a pretty universal reflex for even “untrained” (literary shade for “non-classically trained”) singers. You know classical has to go and say that soft palate elevation encourages ideal phonation on high notes, which science responds to with studies that show that humming can ease onset of phonation. As such, there is the possibility that soft palate elevation in classical stems more from instinct or textural preference than technical necessity. Now what of lowered soft palate stans?

Here at Scinguistics, we are equal opportunity squawkers. We squawk at modern voice pedagogy with equitable fervor compared to classical. Modern voice pedagogy at large appears to stan the lowered soft palate position. Since the main rationales for this are more subjective qualities such as ability to produce ring and mixed voice qualities, our disagreements about these claims amount more to counterclaims that a raised soft-palate does not inherently preclude potential sources of these aforementioned timbral qualities. Sufficient vocal fold compression can provide ring in the form of squillo and mixed voice quality in the form of elevating the glottal closed quotient values of M2 towards those exhibited by M1. Both of these qualities, and other laryngeal co-ordinations, are not subject to the doings of the soft palate. This leads us to the final, most important, yelling of all:


But the voice IS. The articulators closer to the actual vocal folds (you know… the source of the VOICE), such as the muscles pulling on the folds and the larynx housing them, can more immediately determine ease of singing than the soft palate which does not co-ordinate directly with the vocal folds. The main point of this rant is to free the voice from the constraints of necessitating certain soft palate positions in order to “properly” use the voice. As substantiated by the science referenced earlier on, this freeing of the voice from soft palate conspiracy allows voice users to pick soft palate position based on stylistic preference and not mythos about which position is safer or “right”. This also has bearings for the larynx height/soft palate position conflations often seen in TRANS VOICE (no we didn’t forget about you). High larynx can accidentally trigger excessive nasality while low larynx can be mixed up with hyponasality. Coming soon to a future near you we have a conspiracy sesh/rant about this common tricky spot in trans voice pedagogy powered by this rant about the soft palate and that other one about nasality.

So, as suggested by the flowchart and this rant, the soft palate can do whatever it wants, and we like to yell. That being said, we’ll still fight you if you gonna act like the lifting and lowering of the soft palate is virtually meaningless. Timbrally speaking, soft palate position factors into the ability to generate audible nasal and hyponasal texture. Check out our nasality encyclopedia article for more info on that. There is an epidemic of people misusing “nasal” so don’t think you’re just gonna skirt through the middle of our flowchart on some “everything that sounds weird is nasal lmfao y33t” without getting wrecked. Picking the right answer to our flowchart without being bothered about the intricacies of nasality is still WRONG. Sorry, don’t make the rules.

To those of you who are aware of the true impact of soft palate on nasality but still believe that this is a technical and not a stylistic choice… I guess we might let you live a little bit or something.

Does mixed voice exist? — A Rant

So you’ve read our article on the four laryngeal vibratory mechanisms and you start thinking about the mixed voice. You start wondering…. surely, since people say that mixed voice is a mix between chest voice (usually considered to be in M1) and head voice or falsetto (usually considered to be in M2), it must be a register between M1 and M2… so M1.5.

The existence of an M1.5 is a very prevalent myth. Plenty of people think that mixed voice is a distinct mechanism of vocal fold vibration, i.e., a distinct register. But M1.5 doesn’t actually exist.

If you’ve read our article on the Power-Source-Filter Model of Voice Production, then you know that the vocal folds can be divided into three layers: the body, the transitional layer and the cover. There’s one fundamental difference between M1 and M2: in M1, both the body and the cover of the vocal folds vibrate; whereas in M2 only the cover vibrates.

So, when you’re doing a glissando from chest voice to head voice, at some point, the body of your vocal folds is going to stop vibrating. At that point, you’re instantly in M2 because only the cover is vibrating. There’s no transitional mechanism, no M1.2, no M1.3, no M1.5. It’s this instant break from M1 to M2 that can make that cracking sound we’re all too familiar with. When, on purpose, you don’t disguise this break, it’s called a yodel or a flip.

But then why don’t all transitions from M1 to M2 produce a yodel? Trained singers are able to disguise this break when they want to give the illusion of a sound that’s fully connected throughout their range. In order to explain how they do this, we need to introduce the concepts of open quotient (OQ) and closed quotient (CQ).

  • Open Quotient (OQ): the ratio of the duration of the open phase (when the glottis is open in each cycle of vibration) to that of the duration of each complete cycle of vibration; in simpler terms, it’s a cyclic measure of how long the vocal folds stay apart.
  • Closed quotient (CQ): the ratio of the duration of the closed phase (when the glottis is closed in each cycle of vibration) to that of the duration of each complete cycle of vibration; in simpler terms, it’s a cyclic measure of how long the vocal folds stay together.

    Closed quotient is calculated with the formula CQ = 1 – OQ; therefore, a high closed quotient implies a low open quotient and vice versa. In more practical terms, a higher CQ produces a buzzier sound at the vocal fold level. A word of caution: you can’t say you’re “in an open quotient” or “in a closed quotient”; that’s like saying you’re in a frequency. You can, however, say that you’re using a high OQ or a low OQ.
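To make the two definitions concrete, here is a minimal Python sketch that computes both quotients for a single cycle of vibration. The function name `quotients` and the example durations (a 5 ms cycle, i.e. 200 Hz, with the glottis open for 3 ms) are illustrative assumptions, not measurements from any study. The sketch also assumes each cycle splits cleanly into one open phase and one closed phase.

```python
def quotients(open_ms: float, closed_ms: float):
    """Return (OQ, CQ) for one vibration cycle, given phase durations in ms.

    Assumes the cycle consists of exactly one open phase and one closed phase.
    """
    period_ms = open_ms + closed_ms
    if period_ms <= 0:
        raise ValueError("the cycle must have a positive duration")
    oq = open_ms / period_ms    # share of the cycle the folds spend apart
    cq = closed_ms / period_ms  # share of the cycle the folds spend together
    return oq, cq


# Hypothetical example: a 5 ms cycle (200 Hz) with the glottis open for 3 ms.
oq, cq = quotients(open_ms=3.0, closed_ms=2.0)
print(f"OQ = {oq:.2f}, CQ = {cq:.2f}")
```

Because both ratios share the same denominator, they always add up to 1, which is where CQ = 1 – OQ comes from.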

Now we’re able to actually understand how singers are able to, when they want to, navigate their M1-M2 range without a noticeable break.

On a glissando from chest voice to head voice, by slowly increasing the open quotient, they’re able to lighten their voice, making it more tonally similar to M2, yet the body of the vocal folds (the vocalis muscle) is still vibrating in sync with the cover, so it’s M1. When the transition actually occurs, the similarity in tonal quality hides the break, but it’s still there (and can almost always be picked up by a trained ear). The body and cover of the vocal folds decouple from each other, leaving only the cover vibrating, producing a spike in frequency (which is also usually hidden by decreasing the volume and reduced with training). The exact opposite thing happens on downward glissandi from head voice to chest voice.

By (its very vague) definition, mixed voice encompasses both part of M1, where it takes the notation of mx1, and part of M2, where it takes the notation of mx2. The m doesn’t stand for mechanism, it’s just the m in mix.

So, does mixed voice exist? Is it a vocal register? I’d say that the answer to that question doesn’t matter. What matters is knowing that M1.5 doesn’t exist, that mixed voice isn’t a distinct pattern of vocal fold vibration and that mixed voice can’t be exactly defined (there’s no one point at which you can say you’re in mixed voice or not in mixed voice, unless you’re in M0 or M3).

This also means that you don’t have to “find your mix”. It’s not a mechanism you have to find. It’s just a matter of lightening/darkening your voice so as to disguise a physiological break in your voice when transitioning between laryngeal vibratory mechanisms.


Sylvain Lamesch, Robert Expert, Michèle Castellengo, Nathalie Henrich Bernardoni, Bertrand Chuberre. Investigating voix mixte: A scientific challenge towards a renewed vocal pedagogy. 3rd Conference on Interdisciplinary Musicology, CIM07, Aug 2007, Tallinn, Estonia. hal-0020799

Falsetto for fem voice — A Rant

Ah, the world of trans voice. Full of cis people who just one day decided to start teaching trans voice despite having done absolutely no training or reading on the subject…

I’m sure you’ve heard the common advice: if you want to achieve a passing female voice, all you have to do is use your falsetto (or your head voice). Well, you’ve been lied to.

Don’t get me wrong, there is a place for this sound in fem voice, but that place is not as your primary speaking mechanism. It can be used for laughing and squeaking and singing, and, in those contexts, it’ll sound great and feminine, but not in speaking, not ever.

People seem to labor under this common presumption that the main difference between a man’s and a woman’s voice is pitch, and that thus the most effective way to correct this difference is to raise your pitch as high as you possibly can. What’s the best way to raise your pitch that high? Using falsetto. But you’ve heard low-voiced women, and they completely pass… so pitch isn’t the problem. And cis women don’t use falsetto either (unless they have puberphonia, in which case they’ll have difficulty being heard), so why people are teaching the technique that male comedians use to comedically “imitate” women (and in some cases mock trans women) is completely beyond me.

So what should you do? You should work on raising your larynx (which is different from raising your pitch) in order to emulate the vocal tract length of cis women. We have an entire encyclopedia article (with instructions!) dedicated to this topic.
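As a rough illustration of why tract length matters independently of pitch, the sketch below treats the vocal tract as an idealized uniform tube, closed at the vocal folds and open at the lips, whose resonances sit at odd multiples of c / 4L. This tube model, the speed of sound, and both lengths are simplifying assumptions chosen for illustration; they are not figures from this article, and real vocal tracts are not uniform tubes.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed: roughly 350 m/s in warm, humid air


def tube_resonances(length_cm: float, count: int = 3):
    """Resonances (Hz) of an idealized uniform tube closed at one end."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


longer = tube_resonances(17.5)   # hypothetical longer tract
shorter = tube_resonances(16.0)  # same tract, hypothetically shortened by a raised larynx

for before, after in zip(longer, shorter):
    print(f"{before:7.1f} Hz -> {after:7.1f} Hz")
```

In this simplified picture, shortening the tube raises every resonance at once, and nothing in the calculation depends on the pitch of the note being produced.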

But larynx height isn’t the only fundamental aspect of voice feminization, even if it’s the most important one. The Scinguistics Discord server has a compilation of valuable resources, as well as a community of trans men and women working together to obtain the voices they want!

The Swallowing Method — A Rant

After you read this title, some of you might’ve been left with a question: what is the swallowing method? Well, first and foremost, it’s a lie.

It’s just… it’s a damn lie. Now that we’ve got that out of the way, let me try to properly explain.

For voice feminization, the larynx (also known as the voice box) has to be in a higher position. In fact, this is the primary aspect of voice feminization. Now, place your finger on your Adam’s Apple and swallow. You’ll notice that during the process of swallowing, the larynx rises, stops for a fraction of a second and then goes back down.

An approach that initially makes sense is to hold the larynx in its highest position, mid-swallow. So, it seems to make sense and it’s a popular method, so why do we hate it? There are actually a few reasons:

  1. We’ve been swallowing since birth. When you hold the larynx mid-swallow, it will still want to rebound back down, like it always has. Consciously fighting this rebound can cause overactive, unnecessary muscular engagement.
  2. When you swallow, the vocal folds come together in order to prevent food or liquid from going into the lungs, which can cause overcompression at the vocal fold level. The same overcompression can happen when phonating while lifting weights or during childbirth. Speaking with overcompression can end up causing vocal fold inflammation and possibly vocal damage in the long run.
  3. When you swallow, the epiglottis (like the vocal folds) obstructs the airway to stop food and drink from entering the lungs, which is directly counterproductive to any type of voice production.

What usually happens is that, even if people know about the dangers of the swallowing method, they do it anyways because they think the big dog small dog exercise takes too long… but you’re not somehow the exception to the rule. Just do as you’re told, oh my god. We’ve seen this happen way too many times… it’s frustrating. What’s worse is that that excessive muscular tension can then find its way into your everyday life and make you feel uncomfortable all day.

In conclusion, when someone asks you what the swallowing method is, you’ll know what to say:

A special thanks to Zheanna from, whose consultation was imperative in the writing of this article!

Singing from the Diaphragm — A Rant

What do singers mean when they talk about singing from the diaphragm? They either mean it as a reference for trying to guide you into feeling the right sensations for abdominal breathing, without explaining the underlying processes, or they legitimately don’t know any better. 

Diagram of the respiratory system, by Theresa Knott (CC BY-SA 3.0)

Besides the very popular diaphragm, the intercostal muscles (the muscles between all of your ribs) also act as breathing muscles. So, although you can’t breathe without your diaphragm, you can use your diaphragm a little less by using the intercostal muscles. That’s usually referred to as chest breathing because it feels like you are mainly using your chest to breathe. Similarly, using your diaphragm alone is called abdominal breathing.

When the diaphragm contracts, it flattens from a dome shape into something closer to a flat muscle disc. This creates more space in the lungs and lowers the pressure inside them relative to the outside air, so air streams in: this is the process of inhalation.

At the same time, however, the space created in the lungs, above the diaphragm, is compensated below: the abdominal organs (liver, kidneys, intestines, etc.) get pushed and squished together; then, the added pressure tries to relieve itself in the front, where your abdominal wall muscles (your “abs”) are holding it all together. The result is that feeling which may be described as “breathing through the diaphragm”, but, of course, no air is actually getting into the abdomen.

You can then contract your abdominal wall muscles to push back on the guts and, thus, indirectly move the still contracted diaphragm, giving you a mechanism for pressure-controlled exhalation — a process which you may have heard of under the misleading name “breath support”.

That’s the whole point of the entire exercise of bringing your breathing down into or through your diaphragm: so you have better mechanical control over the air release during singing, with more control over the airflow, as opposed to in “chest breathing”, where your diaphragm is pushing down less. If the abdominal wall gets less stretched, it’s harder to sufficiently engage the abdominal wall muscles in the same way for air release.

It’s also possible to engage both the intercostals and the diaphragm to their fullest in a mixed type of breathing (as in without the detriment of “chest breathing”, where your abs can’t sufficiently “support”), but I assume your teacher’s priority is probably to sensitize you to those body parts first before getting fancy.

That’s probably your most important lesson in singing: singing terminology as it stands is very nebulous and nonsensical, so you’ve always got to reflect on what each term that describes certain sensations means in a real, physical sense. It doesn’t mean that you have got to give up on working with sensations and imagery, but you need to be aware that these sensations and imagery only work when you’re being guided by someone who actually understands the physical and physiological bases behind each one, which is still a rare thing.