Converting your music the right way

In our shop, you’ll find audio files in different resolutions and will come across terms like bit depth, sample rate, dithering, LUFS, and true peak. These terms matter when you select music for your projects, especially if you plan to edit the music further and release it in different ways, whether for a film, commercial, music video, CD, DVD, or streaming platforms like Spotify, Apple Music or Audible. Many of these media have specific requirements for delivering your tracks. Some require specific loudness levels and peak limits, while others specify the resolution in which your music should be delivered. To help you make the most of the music you’ve purchased on emaer-music.com, here’s a detailed guide on how to prepare your music for each use case.

Note: If you’d like to save time and effort, we’d be happy to take care of converting your music for you at an affordable rate. Just send us your project details and we’ll provide you with a no-obligation quote.

Send a message to emaer

On which platform or for which medium are you releasing your tracks?

If you’re using music for films and videos, specifics like bit depth and sample rate aren’t as crucial at first. Film and video projects almost always use a sample rate of 48 kHz, usually at a bit depth of 24-bit. But even if your source material is 16-bit at 44.1 kHz, most video editing programs can adjust these files to fit your project settings without audible loss in sound quality. Generally, upsampling isn’t a problem, although it does not add any detail that wasn’t in the source. When downsampling, however, it’s important to use good conversion software to keep the loss as small as possible.

If upsampling isn’t an issue, why not just record, produce and deliver everything in 16-bit and 44.1kHz?

Here’s the thing: a higher bit depth gives you a lower noise floor and more dynamic range, and a higher sample rate captures a wider range of frequencies. When you work with higher values, you have more control over the sound because it is represented more accurately and finely, and every editing step has more margin before its side effects become audible. It’s like taking a photo in 8K versus 720p: more resolution is an advantage during production, even if the final delivery is smaller. However, the higher the values, the larger the files become. And that’s not always a good thing. For example, streaming platforms would need incredibly large bandwidth to stream all music in extremely high resolution to your device. You’d likely spend more time waiting for files to load than actually listening. That’s why many streaming services specify exactly in which formats music should be submitted.
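To put numbers on the file-size trade-off, here is a minimal Python sketch for uncompressed PCM audio (such as WAV). The function name `pcm_size_mb` is our own invention for illustration, and the calculation ignores the small file header.

```python
def pcm_size_mb(sample_rate_hz, bit_depth, channels=2, seconds=60):
    """Size of uncompressed PCM audio in megabytes (1 MB = 1,000,000 bytes)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * seconds / 1_000_000


# Compare one minute of stereo audio at three common resolutions.
for rate, bits in [(44_100, 16), (48_000, 24), (96_000, 24)]:
    size = pcm_size_mb(rate, bits)
    print(f"{rate / 1000:g} kHz / {bits}-bit: {size:.1f} MB per minute")
```

One minute of stereo audio at 16-bit and 44.1 kHz comes to roughly 10.6 MB, while 24-bit at 96 kHz needs more than three times as much.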

If you plan to release your music on CD, the criteria are different from releasing it for film or on DVD. Spotify has different requirements than Apple Music, and in general, it’s definitely a good idea to know what you need to do in order to convert your tracks to the right format for the specific requirements. (As mentioned, if you don’t want to do it yourself, feel free to reach out to us.)

Sample Rate and Bit Depth

Music producers often aim for the best possible balance between file size and resolution during editing. Many studios work with 24 or 32-bit depth and sample rates ranging from 48kHz to 96kHz. However, these formats need to be reduced to the desired final formats without significant loss of sound quality. To do this properly without major mistakes, it’s important to understand what you’re doing. That’s why it’s helpful to know what bit depth and sample rate mean.

Bit Depth

Bit depth in music production refers to how much information is stored per sample. It determines the dynamic range and the noise floor of a digital audio signal. Common bit depths are 16-bit (CD quality) and 24-bit (professional recordings). 16-bit is also the standard for many streaming platforms, though 24-bit is sometimes accepted as well. Each additional bit adds about 6 dB of dynamic range, so 16-bit offers roughly 96 dB and 24-bit roughly 144 dB. A higher bit depth therefore lets quiet passages sit further above the noise floor while loud passages still fit below the maximum level. In short: the higher the bit depth, the more dynamic range you have to work with.
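The link between bit depth and dynamic range can be worked out with a short calculation. The following Python sketch uses the standard rule of thumb for integer PCM (about 6 dB per bit); the function name is ours, and real converters and 32-bit float files behave differently.

```python
import math


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of integer PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.0f} dB")
```

This is where the commonly quoted figures of about 96 dB for 16-bit and about 144 dB for 24-bit come from.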

Sample Rate

The sample rate in music production refers to how often per second an analog signal (such as a sound) is measured and digitized. It is measured in Hertz (Hz). A higher sample rate means that more data is captured per second and that higher frequencies can be represented: the upper limit is half the sample rate. Common sample rates are 44.1 kHz (CD quality) and 48 kHz (film sound). Essentially, the higher the sample rate, the wider the frequency range that is captured, but it also requires more storage space.

Different sample rates in music production are used to meet various requirements related to sound quality and specific applications:

Sound Quality: Higher sample rates (e.g. 96 kHz or 192 kHz) capture more detail in the sound and offer better audio quality, especially for demanding music productions and high-resolution recordings.
Application: Different industries use different sample rates. For example, 44.1 kHz is commonly used for CDs, while 48 kHz is standard in film and video productions.
Editing: When editing audio, a higher sample rate can be helpful for applying effects and processing with greater precision.
Compromise between Quality and File Size: Lower sample rates require less storage space, making them more suitable for applications like streaming, where bandwidth and storage are a concern.

Overall, producers choose the sample rate based on the specific needs of the project and the desired sound quality.
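A sample rate can only represent frequencies below half its value, the so-called Nyquist frequency. The small Python sketch below (with function names of our own choosing) shows what that means for the common rates.

```python
def nyquist_hz(sample_rate_hz):
    """Upper frequency limit of a given sample rate."""
    return sample_rate_hz / 2


def can_represent(frequency_hz, sample_rate_hz):
    """True if the frequency lies below the Nyquist limit of the sample rate."""
    return frequency_hz < nyquist_hz(sample_rate_hz)


for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sample rate -> content up to {nyquist_hz(rate):g} Hz")
```

A 23 kHz component, for example, fits into a 48 kHz file but has to be filtered out before converting to 44.1 kHz.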

True Peak and Loudness (LUFS)

In addition to requirements for bit depth and sample rate, streaming services have other specifications, such as True Peak and Loudness. But it’s not just for streaming that this matters; understanding LUFS (Loudness Units relative to Full Scale) is also essential for achieving a balanced volume across an album, where all tracks should sound relatively consistent in loudness. On the other hand, True Peak knowledge becomes even more crucial when downsampling your music, e.g. from 48kHz to 44.1kHz, or when converting your tracks to MP3.

So, what happens when you convert and downsample and what should you watch out for?

When you downsample (e.g. from 48 kHz to 44.1 kHz), you’re reducing the sample rate, so everything above half the new rate (22.05 kHz for a 44.1 kHz file) has to be filtered out. A poor converter can leave artifacts, such as unwanted distortion or a loss of clarity, and the filtering itself can change the peak levels of the signal.

Similarly, when you convert your tracks to MP3, you’re compressing the audio, which also reduces the quality due to the loss of data. The more compression, the more the audio quality suffers. This is especially true when converting from high-quality formats (like WAV or FLAC) to MP3.

LUFS (Loudness Units relative to Full Scale)

LUFS is crucial for maintaining a consistent perceived loudness across an album or collection of tracks. Streaming platforms often normalize tracks to a certain LUFS value, so it’s important to master your tracks at the right level to avoid them being too quiet or overly loud compared to others. Knowing the LUFS value ensures a smooth listening experience, whether it’s for streaming or for creating a well-balanced album.

Why is LUFS important?

Unlike traditional peak level measurements, which only measure the highest point of a signal (i.e. the loudest volume the signal reaches without considering the listener’s perception), LUFS takes into account the average loudness over a longer period and how the human ear perceives loudness. This means that LUFS provides a more accurate representation of how loud a track actually sounds to the listener.
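The difference between a peak reading and an averaged reading can be illustrated with a few lines of Python. Note that this sketch uses a plain RMS average as a stand-in: real LUFS metering additionally applies a frequency weighting and gating, which are not reproduced here. All names are our own.

```python
import math


def to_db(value):
    return 20 * math.log10(value) if value > 0 else float("-inf")


def sample_peak_db(samples):
    """Highest absolute sample value, in dBFS."""
    return to_db(max(abs(s) for s in samples))


def rms_db(samples):
    """Average (RMS) level in dBFS; a rough stand-in for loudness, not LUFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))


rate = 48_000
# One second of a 1 kHz tone at half of full scale.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
# The same tone, but only sounding for the first tenth of the second.
burst = tone[: rate // 10] + [0.0] * (rate - rate // 10)

for name, signal in (("tone", tone), ("burst", burst)):
    print(f"{name}: peak {sample_peak_db(signal):.1f} dBFS, RMS {rms_db(signal):.1f} dBFS")
```

Both signals show the same peak level, yet the short burst averages about 10 dB lower, which is closer to how differently the two would be perceived.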

LUFS is increasingly used in music production and mixing, especially in regard to loudness normalization across various media and streaming platforms.

LUFS and Loudness Normalization

Streaming platforms like Spotify, YouTube, or Apple Music use loudness normalization to ensure tracks stay within a specific loudness range so that users don’t have to constantly adjust the volume. The loudness of a song is adjusted to a target value in LUFS, for example around -14 LUFS for many streaming services.

So, if you produce a track that is too loud (e.g. -5 LUFS), the streaming platform will automatically turn it down. You gain no extra loudness from the heavy limiting that was needed to reach that level, but its side effects remain: reduced dynamics and potentially lower sound quality.
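The adjustment a platform applies is simple arithmetic: the target loudness minus the measured loudness. Here is a minimal Python sketch, assuming a target of -14 LUFS and a hypothetical function name; actual platforms differ in their targets and in whether they turn quiet tracks up.

```python
def normalization_gain_db(measured_lufs, target_lufs=-14.0):
    """Gain in dB needed to move a track from its measured loudness to the target."""
    return target_lufs - measured_lufs


for lufs in (-5.0, -14.0, -18.0):
    gain = normalization_gain_db(lufs)
    print(f"Track at {lufs:g} LUFS -> adjusted by {gain:+.1f} dB")
```

A track mastered at -5 LUFS would be turned down by 9 dB, so it ends up no louder than a more dynamic master that was delivered at -14 LUFS.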

In summary, LUFS is a crucial measure for ensuring consistency in perceived loudness across different platforms, and understanding it helps avoid unwanted loudness adjustments by streaming services, preserving both the dynamics and quality of your track.

Target LUFS in Music Production:

Mastering for Streaming Platforms: Many streaming services normalize to around -14 LUFS (integrated), which provides a good balance between loudness and dynamics. The exact target varies from service to service.
Radio & CD Productions: For other formats like radio or CD, the loudness can be around -9 LUFS to -10 LUFS, as these formats apply little or no loudness normalization.
Peak Levels: The maximum True Peak level should ideally stay at or below -1 dBFS for streaming and MP3, or about -0.3 dBFS for CD and radio, to avoid distortion (see the section on Headroom and Clipping).
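If you measure your masters with a loudness meter, you can compare the readings against these targets in a small script. The following Python sketch is only an illustration: the preset names, the `DeliveryTarget` class, and the 1 LU tolerance are our assumptions, and the numbers are the guideline values from this article, not official platform specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryTarget:
    min_lufs: float
    max_lufs: float
    max_true_peak_dbfs: float


# Hypothetical presets built from the guideline figures in this article.
TARGETS = {
    "streaming": DeliveryTarget(-14.0, -14.0, -1.0),
    "cd_radio": DeliveryTarget(-10.0, -9.0, -0.3),
}


def check_master(lufs, true_peak_dbfs, target, tolerance_lu=1.0):
    """Return a list of problems; an empty list means the master fits the target."""
    issues = []
    if lufs > target.max_lufs + tolerance_lu:
        issues.append("louder than target")
    if lufs < target.min_lufs - tolerance_lu:
        issues.append("quieter than target")
    if true_peak_dbfs > target.max_true_peak_dbfs:
        issues.append("true peak above ceiling")
    return issues


print(check_master(-14.2, -1.1, TARGETS["streaming"]))
print(check_master(-5.0, -0.3, TARGETS["streaming"]))
```

The first master passes, while the second is flagged as too loud and as exceeding the True Peak ceiling for streaming.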

True Peak

When downsampling or converting, True Peak levels (the highest point of the reconstructed audio waveform, which can lie between two stored samples) can cause clipping or distortion if they exceed 0 dBFS. This is particularly important when converting to lossy formats (like MP3) or when adjusting the sample rate, as these processes may push the peaks out of range. Ensure you apply True Peak Limiting to avoid distortion, especially when your tracks are played back on different systems.

True Peak Limiting is a process in music production that ensures the level of an audio track does not exceed a certain ceiling. It takes into account not only the stored sample values (in dBFS) but also the peaks that appear between samples when the audio is converted to analog or to another format. When you edit a track, you can use a limiter to prevent the signal from “clipping” or distorting. A True Peak Limiter analyzes the signal and ensures that the maximum level (the “True Peak”) does not exceed the ceiling you set, which must be no higher than 0 dBFS. This is important because digital audio files can sometimes sound distorted or clipped when converted to other formats or streamed, even if their stored samples never exceed the 0 dBFS threshold.

In summary: True Peak Limiting helps preserve audio quality and prevent unwanted distortion by ensuring that the volume does not exceed a safe threshold, even when the audio is processed or played back.
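Why the True Peak can be higher than any stored sample is easiest to see with a test tone. The Python sketch below uses a sine at one quarter of the sample rate whose samples all land beside the crest of the wave. To keep it short, it evaluates the known sine formula on a finer time grid instead of oversampling the stored samples with an interpolation filter, which is what real True Peak meters do; treat it as an illustration only.

```python
import math


def to_db(value):
    return 20 * math.log10(value)


def sine(n_points, cycles_per_point, phase=math.pi / 4):
    return [math.sin(2 * math.pi * cycles_per_point * n + phase) for n in range(n_points)]


# A sine at one quarter of the sample rate, shifted by 45 degrees:
# every stored sample lands beside the crest, never on it.
stored = sine(64, 1 / 4)
# The same sine evaluated 8 times more finely, standing in for the
# reconstructed (analog) waveform.
fine = sine(64 * 8, 1 / 32)

sample_peak_db = to_db(max(abs(s) for s in stored))
true_peak_db = to_db(max(abs(s) for s in fine))
overshoot_db = true_peak_db - sample_peak_db

print(f"Sample peak: {sample_peak_db:.2f} dBFS")
print(f"True peak:   {true_peak_db:.2f} dBFS")
print(f"Difference:  {overshoot_db:.2f} dB")
```

In this constructed worst case the real waveform peaks about 3 dB above the highest stored sample. If the samples were normalized to 0 dBFS, the reconstructed signal would reach roughly +3 dBFS.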

For streaming platforms, it is recommended that the maximum level of a track is set to around -1 dBFS or even -1.5 dBFS. This buffer helps prevent clipping or distortion that can occur during conversion or streaming.

Headroom and Clipping

Leave headroom below the maximum level of your tracks. When releasing music on streaming platforms, you should not exceed a True Peak value of -1.0 dBFS, as this could lead to rejection from the platform or cause your music to sound distorted. But why is that the case, when audio files only clip once they exceed 0.0 dBFS, for example at +0.1 dBFS?

The issue lies in the conversion process: When you downsample high-resolution files from a higher sample rate to a lower one, new peak levels can emerge that may lead to clipping. The problem worsens when you then convert these files to MP3. Let’s take a closer look:

In this image, you can see an audio track that was recorded at 48 kHz and rendered with a True Peak of -0.3 dBFS. After being downsampled from 48 kHz to 44.1 kHz, the overall peak level now sits at -0.2 dBFS (True Peak). Everything is still fine here, as no clipping occurs. However, if the song had been produced at -0.1 dBFS or even 0.0 dBFS, the track would have clipped after the downsampling.

Conclusion: If you plan to play your music on the radio or release it on CD at 44.1 kHz, but your original material is, for example, 48 kHz, be sure that the True Peak of your source material has enough “room” (headroom). A safe setting here is -0.3 dBFS True Peak.

Why do streaming services usually require -1.0 dBFS True Peak, and what happens when converting to MP3?

Since streaming services want to deliver music even when your internet connection is poor, they can’t always transfer large amounts of data. Therefore, the music is converted into smaller, lossy files and may also be downsampled. This results in smaller data sizes but also increases sound degradation. It’s better to listen to lower-quality music than none at all. However, to keep the music from sounding overly distorted or clipping during the conversion process, a True Peak ceiling of -1.0 dBFS (or sometimes even lower) is required. The real question, though, is: Do I really want to listen to music that has been extremely compressed and degraded in quality?

Let’s take MP3 as an example

While a high-resolution audio file in WAV format is large, MP3 files are quite small, fitting thousands onto an MP3 player and allowing streaming platforms to deliver them more easily over poor internet connections compared to large WAV files.

To reduce the file size, the encoder discards parts of the signal that are considered hard to hear, and at lower bitrates it often cuts off the highest frequencies as well. Some encoders also reduce the sample rate at very low bitrates. The audio file is compressed. However, removing and filtering parts of the signal changes the shape of the waveform, so the decoded peaks can end up higher than in the original, potentially leading to clipping. The following example illustrates this clearly.

The track was mastered at -0.3 dBFS True Peak. When converting to MP3 at a high bitrate of 320 kbps, the signal overshoots, rising from -0.3 to +0.3 dBFS. It becomes more problematic when we reduce the bitrate of the MP3. In this example, the level increases from -0.3 to +1.1 dBFS True Peak, which is an increase of 1.4 dB and would require a True Peak ceiling of about -1.5 dBFS. However, for streaming platforms, a ceiling of “only” -1.0 dBFS is usually sufficient. This is because lower-bitrate MP3s already sound poor and any occasional overdriven peak doesn’t really matter in the end.
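The arithmetic behind these figures can be written down in a few lines of Python. The sketch assumes that the overshoot caused by the encoder stays about the same when the master is turned down, which is a simplification; the function names are ours.

```python
def encoder_overshoot_db(peak_before_dbfs, peak_after_dbfs):
    """How much the True Peak rose during encoding, in dB."""
    return peak_after_dbfs - peak_before_dbfs


def ceiling_for_clean_encode(overshoot_db, limit_dbfs=0.0):
    """Highest pre-encode True Peak that keeps the encoded file at or below
    limit_dbfs, assuming the overshoot does not change with level."""
    return limit_dbfs - overshoot_db


# The two measurements from the example: 320 kbps and a lower bitrate.
for peak_after in (0.3, 1.1):
    overshoot = encoder_overshoot_db(-0.3, peak_after)
    ceiling = ceiling_for_clean_encode(overshoot)
    print(f"Overshoot {overshoot:.1f} dB -> master no higher than {ceiling:.1f} dBFS True Peak")
```

For the lower-bitrate example, the 1.4 dB overshoot means the master would have to peak at -1.4 dBFS or lower to survive the conversion without clipping, which rounds to the -1.5 dBFS mentioned above.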

Conclusion: If you’re releasing your song on CD or for radio, you can increase the volume to levels up to -0.3 dBFS True Peak. However, if you plan to convert your audio files to MP3 or release them on streaming platforms, make sure the files do not exceed a True Peak level of -1.0 dBFS.
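If a finished master peaks too high for its destination, the simplest fix is to turn the whole track down by the difference. Here is a minimal Python sketch with a hypothetical helper name; it assumes you have already measured the True Peak with a meter. Note that a plain gain change also lowers the loudness by the same amount, whereas a True Peak limiter only catches the peaks.

```python
def trim_to_ceiling(samples, measured_true_peak_dbfs, ceiling_dbfs=-1.0):
    """Scale samples down so the True Peak lands on the ceiling; never boosts."""
    gain_db = min(0.0, ceiling_dbfs - measured_true_peak_dbfs)
    factor = 10 ** (gain_db / 20)  # convert dB to a linear multiplier
    return [s * factor for s in samples], gain_db


# A master measured at -0.3 dBFS True Peak, prepared for streaming.
scaled, gain_db = trim_to_ceiling([1.0, -0.5, 0.25], measured_true_peak_dbfs=-0.3)
print(f"Applied {gain_db:.1f} dB, first sample is now {scaled[0]:.4f}")
```

A master at -0.3 dBFS needs 0.7 dB of reduction to meet a -1.0 dBFS ceiling, which corresponds to multiplying every sample by about 0.92.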

Sample Rate Conversion

To convert your sample rate, you can follow these steps: Import your audio file into your music production software, such as Logic Pro X. Go to the export (often called “bounce”) option. From there, you can select a new sample rate and convert it, for example from 48 kHz to 44.1 kHz. However, it’s worth noting that the internal converters in some programs are not always the best. For our conversions, we use iZotope RX, as its conversion is significantly more detailed and sounds better compared to the one in Logic Pro X.
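One reason converter quality matters is that 48 kHz and 44.1 kHz are not related by a simple factor. The Python sketch below only works out the ratio and the resulting length; it does not resample any audio, and it makes no claim about how Logic Pro X or iZotope RX work internally. The function names are ours.

```python
from fractions import Fraction


def resample_ratio(source_hz, target_hz):
    """Exact ratio between target and source sample rate, reduced to lowest terms."""
    return Fraction(target_hz, source_hz)


def resampled_length(n_samples, source_hz, target_hz):
    """Approximate number of samples after conversion."""
    return round(n_samples * resample_ratio(source_hz, target_hz))


ratio = resample_ratio(48_000, 44_100)
print(f"48 kHz -> 44.1 kHz is a ratio of {ratio.numerator}/{ratio.denominator}")
print(f"One second: 48000 samples become {resampled_length(48_000, 48_000, 44_100)}")
```

Conceptually, a converter has to create 147 new samples for every 160 original ones, so almost every output sample falls between two input samples and must be calculated by a filter. The quality of that filter is what separates a good converter from a mediocre one.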

Dithering

After you’ve converted the sample rate of your song, you also need to adjust the bit depth. If you want to convert from 24-bit or 32-bit to 16-bit, you can do this as follows: The easiest way is to set the desired bit depth in your music production software when bouncing. Alternatively, you can use specialized programs that perform this conversion more accurately than the internal converters. When converting from, for example, 24-bit to 16-bit, there won’t be any clipping, but rounding to fewer bits causes quantization distortion that can affect the sound quality, especially in quiet passages. To smooth out this distortion, there is a process called “dithering.”

Dithering is a process in which a specific type of noise is added to the actual signal to mask artifacts that arise from requantization. Dithering helps maintain sound quality by reducing audible differences that can occur when reducing bit depth. Simply put: dithering ensures that the sound stays good even after the reduction.
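The effect is easiest to see with a signal that is too quiet for 16-bit to store. The following Python sketch requantizes with and without simple triangular (TPDF) dither; it is a bare-bones illustration with names of our own choosing, not the algorithm of any particular plugin.

```python
import random


def to_16_bit(samples, dither=True, rng=None):
    """Requantize floats in [-1.0, 1.0) to 16-bit integers, optionally with TPDF dither."""
    rng = rng or random.Random(0)
    out = []
    for s in samples:
        scaled = s * 32768  # express the sample in 16-bit steps (LSBs)
        if dither:
            # Two uniform values summed give triangular noise spanning +/- 1 LSB.
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        out.append(max(-32768, min(32767, round(scaled))))
    return out


# A constant signal of 0.3 LSB: too small for 16-bit to store directly.
quiet = [0.3 / 32768] * 20_000
plain = to_16_bit(quiet, dither=False)
dithered = to_16_bit(quiet, dither=True)

print("Without dither, values used:", sorted(set(plain)))
print("With dither, average value: ", round(sum(dithered) / len(dithered), 2))
```

Without dither, the signal is rounded away completely. With dither, the individual samples are noisy, but their average still carries the original level, which is why low-level detail survives as signal plus noise instead of disappearing or distorting.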

Recommendations include the plugin Ghz Good Dither, iZotope Ozone, or our favorite, iZotope RX. Some limiters, such as the Sonnox Oxford Limiter, also offer dithering. However, it’s important to note: dither only once, as the very last step. If you let one program handle the dithering process, all other dithering options in your chain should be turned off!

Dither Algorithms and their differences:

Different dithering algorithms and settings are used, each with its own way of adding noise and distributing the quantization error in the audio data. Here’s a brief explanation of the common dithering settings and which might be most suitable for your needs:

Flat-Top Dithering: This algorithm adds constant noise across the entire frequency spectrum. It results in uniform dithering effects but can sound somewhat “noisy” for certain audio content.

When to use?: Less suited for music with fine details and clear highs.

Noise-Shaping Dithering: Noise-shaping attempts to mold the noise to make it less noticeable. It shifts the noise to frequency areas that are less perceptible (usually the higher frequencies). This results in a “cleaner” sound quality.

When to use?: Ideal when you want to minimize the dithering effect, especially for music with fine details or complex tonal colors.

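Triangle and Pow-R (Psychoacoustically Optimized Wordlength Reduction): These algorithms offer different methods for shaping the noise. Triangle usually refers to plain triangular (TPDF) dither without noise shaping. Pow-R 1, Pow-R 2, and Pow-R 3 are specific versions that distribute the noise more intelligently across different frequencies, with Pow-R 2 and Pow-R 3 applying progressively stronger noise shaping. Pow-R 2 and Pow-R 3 are generally preferred for finer control over noise shaping.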

When to use?: Pow-R 2 is often one of the best choices for most music productions because it strikes a good balance between reducing noise and maintaining sound quality.

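TPDF (Triangular Probability Density Function): TPDF is one of the simplest and most established dithering methods. The amplitude of the added noise follows a triangular distribution, and the noise is spread evenly across the frequency spectrum without any shaping.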

When to use?: Can be sufficient for simpler productions, but less effective for avoiding noise in complex musical pieces.
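The noise-shaping idea described above can be sketched in its most basic form: each sample’s rounding error is fed back and subtracted from the next sample, which pushes the error away from low frequencies toward high ones. The Python example below is a first-order textbook scheme with TPDF dither, written for illustration only. It is not Pow-R or the algorithm of any commercial plugin, and the function name is ours.

```python
import random


def shaped_requantize(samples_in_lsb, rng=None):
    """First-order error-feedback noise shaping with TPDF dither.

    Input and output are expressed in steps (LSBs) of the target bit depth.
    """
    rng = rng or random.Random(0)
    out, previous_error = [], 0.0
    for x in samples_in_lsb:
        wanted = x - previous_error  # compensate the last rounding error
        dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        y = round(wanted + dither)
        previous_error = y - wanted  # error made on this sample
        out.append(y)
    return out


# A constant level of 0.3 LSB over 10,000 samples.
signal = [0.3] * 10_000
shaped = shaped_requantize(signal)
drift = sum(shaped) - sum(signal)
print(f"Accumulated error after {len(signal)} samples: {drift:.2f} LSB")
```

Because every error is compensated on the following sample, the accumulated error never grows beyond about one and a half steps, no matter how long the signal is. In other words, the slow, low-frequency part of the error is nearly gone, and what remains is fast-changing noise in the high frequencies.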

What is the best setting?

There is no “one-size-fits-all” answer, as it depends on what you’re trying to achieve with your material and the type of music you’re producing. However, here are some general recommendations:

For most music productions: Pow-R 2 is a very good choice, as it effectively controls the dithering effect and preserves the sound quality of the final product without introducing too much audible noise.
For maximum transparency and detail retention: A noise-shaping option such as Pow-R 2 or Pow-R 3 moves most of the dither noise into frequency ranges where it is hardest to hear. This is especially important for music with fine details, complex textures, or delicate elements.
For simpler applications: If you need less complexity and just want to reduce the bit depth, Flat-Top Dithering or TPDF may be sufficient.

Conclusion

Dithering is an important process in music production that helps preserve sound quality when reducing the bit depth, especially when converting from 24-bit to 16-bit. It replaces the distortion caused by requantization with a low, steady noise that is far less noticeable. Without dithering, this distortion could negatively impact the sound of your tracks, especially in quiet passages and in more complex or detailed productions.

Overall, it’s crucial to choose the right dither algorithm to maintain the best possible sound quality when converting your audio files to a lower bit depth. This ensures precise and clear playback – both during production and in the final output.

The choice of the best dither algorithm depends on your goal. Pow-R 2 or Noise Shaping dithering are generally the best options for high sound quality without too much audible noise, especially in complex music productions. However, if you need a quick and simple solution without paying too much attention to detail, TPDF or Flat-Top may be sufficient.

Many dither plugins have their own terminology. Instead of confusing names like Pow-R 2, you can simply select how much dithering is applied and what kind of noise-shaping should be used. Setting both to a good middle range (such as “normal” or “medium”) is typically enough for most productions.

In summary, when downsampling or converting, be mindful of True Peak levels to avoid distortion and ensure that your tracks are in the ideal LUFS range for consistent loudness. Using proper tools for conversion and understanding these technical aspects can greatly improve the quality and compatibility of your audio for various platforms.

And as mentioned: if you don’t want to deal with this yourself, we’re happy to take care of it! Send us your request.
