Every major OTT platform requires a specific set of audio stems alongside the programme mix. These stems allow platforms to generate alternative language versions, accessibility mixes, and localised content without access to the original session files. Getting stem delivery wrong is one of the most common reasons audio deliveries are rejected at QC — and rejections cost production money and schedule.
Core Stem Types and Their Purpose
The four primary stems required by virtually all OTT platforms are:
- DX (Dialogue) — all on-screen spoken dialogue, including production sound and ADR. Does not include narration or voice-over unless specified as "full dialogue."
- FX (Sound Effects) — all designed sound effects, Foley, and backgrounds. Music and dialogue must be absent.
- MX (Music) — all music, including score and licensed tracks. No dialogue or effects.
- M&E (Music and Effects) — FX and MX combined. This is the primary deliverable for international versioning: dubbing studios replace the DX layer while keeping the M&E intact. The M&E must therefore be complete. Any effect that exists only on the production dialogue tracks (footsteps, prop handling, or ambience recorded along with the dialogue) must be rebuilt with Foley or designed equivalents. Incomplete M&E is the most common stem delivery failure.
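A null test is one way to confirm that the stems agree with each other. If the M&E really is FX and MX combined, then DX plus M&E should cancel against the programme mix, and FX plus MX should cancel against the M&E. The Python sketch below assumes each stem has already been decoded to a list of floating-point samples for one channel, sample-aligned from the same start timecode. It also assumes the stems are printed at unity gain with no processing on the master bus after the stem split. The function names and the –90 dBFS pass threshold are illustrative choices, not a platform requirement.

```python
import math

# Hypothetical pass threshold: residual peaks at or below this level count as a null.
NULL_THRESHOLD_DBFS = -90.0


def peak_dbfs(samples: list[float]) -> float:
    """Sample peak of a signal in dBFS, or -inf for digital silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def null_residual_dbfs(reference: list[float], *parts: list[float]) -> float:
    """Subtract the sum of `parts` from `reference` and return the residual peak."""
    if any(len(p) != len(reference) for p in parts):
        raise ValueError("all stems must have the same length")
    residual = [ref - sum(p[i] for p in parts) for i, ref in enumerate(reference)]
    return peak_dbfs(residual)


def check_stems(printmaster, dx, fx, mx, me, threshold_dbfs=NULL_THRESHOLD_DBFS) -> dict:
    """Run both null tests on single-channel, sample-aligned stems."""
    return {
        # DX + M&E should rebuild the full programme mix.
        "dx_plus_me_nulls_printmaster": null_residual_dbfs(printmaster, dx, me) <= threshold_dbfs,
        # FX + MX should rebuild the M&E.
        "fx_plus_mx_nulls_me": null_residual_dbfs(me, fx, mx) <= threshold_dbfs,
    }
```

A passing null test shows that the stems sum correctly. It cannot show that the M&E is complete, which still needs the mute-and-listen pass.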
Netflix Audio Delivery Specifications
Netflix requires a Dolby Atmos ADM BWF master (–27 LKFS, –2 dBTP), a 5.1 mix in MXF IMF format, stereo versions (LtRt and LoRo), and separate DX, FX, MX, and M&E stems in both 5.1 and stereo. File names must follow Netflix's naming conventions exactly (programme ID, version, stem type, format, reel number). Any deviation triggers automated rejection before the delivery reaches human QC review.
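Every stem is needed in both 5.1 and stereo, so it helps to generate the full list of expected stem files and compare it against what is actually in the delivery folder. This sketch assumes the generic [ProgrammeID]_[Version]_[StemType]_[Format]_[Reel].wav pattern described under File Naming and Metadata, with "ME" standing in for M&E and "20" for stereo. Both tokens are hypothetical, and the real naming convention should be taken from Netflix's own specification. The sketch covers the stems only, not the Atmos, 5.1, and stereo mixes.

```python
from itertools import product

# Hypothetical tokens: "ME" for M&E and "20" for stereo are illustrative only.
REQUIRED_STEMS = ("DX", "FX", "MX", "ME")
REQUIRED_FORMATS = ("51", "20")


def expected_stem_files(programme_id: str, version: str, reels: list[int]) -> set[str]:
    """Build every stem file name a delivery should contain (stems x formats x reels)."""
    return {
        f"{programme_id}_{version}_{stem}_{fmt}_R{reel:02d}.wav"
        for stem, fmt, reel in product(REQUIRED_STEMS, REQUIRED_FORMATS, reels)
    }


def missing_stem_files(delivered, programme_id: str, version: str, reels: list[int]) -> list[str]:
    """Return the sorted list of expected stem files that are absent from `delivered`."""
    return sorted(expected_stem_files(programme_id, version, reels) - set(delivered))
```

For example, `missing_stem_files(os.listdir(folder), "SHOW001", "EN", [1, 2])` would list any stem that was never printed or exported.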
All audio files must be 24-bit, 48 kHz. Netflix does not accept 96 kHz or 32-bit float for delivery. Timecode must be continuous (no gaps or discontinuities) and must match the video master exactly. Even a single-frame audio/video offset results in rejection.
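The format and timecode requirements can be checked directly from a file's RIFF chunks without any audio library. The sketch below reads the sample rate and bit depth from the fmt chunk and the TimeReference field (samples since midnight) from the BWF bext chunk. It then compares that origin with the video master's start timecode. It assumes non-drop-frame timecode at an integer frame rate such as 24 or 25 fps. Material at 23.976 fps or 29.97 fps drop-frame needs extra handling that is deliberately left out. The helper names are hypothetical.

```python
import struct

# Offset of TimeReference inside the bext chunk (EBU Tech 3285):
# Description (256) + Originator (32) + OriginatorReference (32)
# + OriginationDate (10) + OriginationTime (8) bytes.
BEXT_TIME_REFERENCE_OFFSET = 256 + 32 + 32 + 10 + 8


def read_wav_info(data: bytes) -> dict:
    """Read channel count, sample rate, bit depth and bext TimeReference from WAV bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = {}
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = data[pos + 8 : pos + 8 + size]
        if chunk_id == b"fmt ":
            # For WAVE_FORMAT_EXTENSIBLE files this is the container size;
            # the valid-bits field in the extension is not read here.
            _tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from("<HHIIHH", body)
            info.update(channels=channels, sample_rate=rate, bits_per_sample=bits)
        elif chunk_id == b"bext":
            (info["time_reference"],) = struct.unpack_from("<Q", body, BEXT_TIME_REFERENCE_OFFSET)
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even length
    return info


def timecode_to_samples(tc: str, fps: int, sample_rate: int = 48_000) -> int:
    """Convert non-drop-frame HH:MM:SS:FF (integer fps) to samples since midnight."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    frames = (h * 3600 + m * 60 + s) * fps + f
    return frames * sample_rate // fps


def check_audio_file(data: bytes, video_start_tc: str, fps: int) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    info = read_wav_info(data)
    problems = []
    if info.get("sample_rate") != 48_000:
        problems.append(f"sample rate {info.get('sample_rate')} Hz, expected 48000")
    if info.get("bits_per_sample") != 24:
        problems.append(f"bit depth {info.get('bits_per_sample')}, expected 24")
    if "time_reference" not in info:
        problems.append("no bext chunk, so no timecode origin")
    else:
        rate = info.get("sample_rate", 48_000)
        offset = info["time_reference"] - timecode_to_samples(video_start_tc, fps, rate)
        if offset != 0:
            problems.append(f"timecode origin differs from video start by {offset:+d} samples")
    return problems
```

In practice you would pass `pathlib.Path(name).read_bytes()`. For very large masters, a reader that seeks past the data chunk instead of loading it would be kinder to memory. Comparing the origin sample by sample is stricter than a single-frame tolerance, which makes it an early warning rather than a replica of platform QC.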
M&E Construction: The Critical Deliverable
The M&E is the most labour-intensive stem to produce correctly because every sonic hole the dialogue leaves behind has to be filled. Wherever the production sound carries background effects under the dialogue, that ambience and any incidental effects must be rebuilt from backgrounds, Foley, and designed FX so that they survive once the dialogue is removed. This work, called "filling M&E holes", is done by the sound effects editor before the final mix and requires access to all the Foley, backgrounds, and designed FX in the session.
The rule is: when you mute all dialogue tracks from the final mix, the remaining audio (FX + music) should sound complete. If there are obvious holes where dialogue was present — sudden silence in an active scene, missing footsteps during a walking dialogue sequence, absent ambient sound under a conversation — the M&E will fail QC and require remedial work.
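The mute-and-listen test remains the real check, but a rough automated pass can point the listener at likely holes. This sketch flags stretches where the DX stem is clearly active while the M&E drops close to silence, using windowed RMS levels on single-channel float samples. The window length and both dBFS thresholds are hypothetical starting values. The sketch only catches the "sudden silence" case. Missing footsteps under continuous ambience will not show up as a level drop, and genuinely quiet scenes may be flagged, so the results are candidates for listening rather than verdicts.

```python
import math


def window_rms_dbfs(samples: list[float], start: int, length: int) -> float:
    """RMS level of samples[start:start + length] in dBFS, or -inf for silence."""
    window = samples[start : start + length]
    if not window:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def find_me_holes(dx, me, sample_rate=48_000, window_s=0.5,
                  dx_active_dbfs=-40.0, me_silent_dbfs=-60.0):
    """Return (start_s, end_s) spans where dialogue is active but the M&E is near silent.

    All thresholds are hypothetical defaults, not platform requirements.
    """
    win = max(1, int(sample_rate * window_s))
    n = min(len(dx), len(me))
    holes = []
    for start in range(0, n, win):
        dx_level = window_rms_dbfs(dx, start, win)
        me_level = window_rms_dbfs(me, start, win)
        if dx_level > dx_active_dbfs and me_level < me_silent_dbfs:
            span = (start / sample_rate, min(start + win, n) / sample_rate)
            # Merge with the previous span when the flagged windows are contiguous.
            if holes and holes[-1][1] == span[0]:
                holes[-1] = (holes[-1][0], span[1])
            else:
                holes.append(span)
    return holes
```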
File Naming and Metadata
File naming conventions vary slightly between platforms, but the general structure is [ProgrammeID]_[Version]_[StemType]_[Format]_[Reel].wav, for example SHOW001_EN_MX_51_R01.wav. All audio metadata (sample rate, bit depth, timecode origin, programme title) must be embedded in the BWF file itself. Sample rate and bit depth live in the fmt chunk, and the timecode origin is stored in the bext chunk as the TimeReference field, a count of samples since midnight. Missing or incorrect BWF metadata is a common QC failure.
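A regular expression makes the generic naming pattern enforceable before upload. The allowed stem and format tokens below (DX, FX, MX, ME; 51, 20), the two- or three-letter version code, and the two-digit reel number are assumptions for illustration. Each platform's own specification defines the real values.

```python
import re

# Generic pattern: [ProgrammeID]_[Version]_[StemType]_[Format]_[Reel].wav
# The token values are illustrative assumptions, not any platform's published list.
STEM_NAME = re.compile(
    r"(?P<programme>[A-Z0-9]+)_"
    r"(?P<version>[A-Z]{2,3})_"
    r"(?P<stem>DX|FX|MX|ME)_"
    r"(?P<format>51|20)_"
    r"R(?P<reel>\d{2})\.wav"
)


def parse_stem_name(filename: str) -> dict:
    """Split a stem file name into its fields, or raise ValueError if it does not match."""
    match = STEM_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"file name does not follow the naming pattern: {filename}")
    fields = match.groupdict()
    fields["reel"] = int(fields["reel"])
    return fields
```

Using `fullmatch` rather than `search` means stray suffixes such as `.wav.bak` or lowercase variants are rejected instead of silently accepted.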
Delivery checklist. Before submitting, verify:
- Loudness compliance on all formats
- True Peak compliance on all formats
- M&E completeness (mute all DX and listen)
- File naming against the platform spec
- Timecode continuity
Most QC failures could be caught with a 30-minute self-check pass.
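The loudness and True Peak items can be automated once the figures have been measured with an ITU-R BS.1770-compliant meter. This sketch compares those measurements with the Netflix figures quoted earlier (–27 LKFS, –2 dBTP). The ±2 LU tolerance window and the `Measurement` record are assumptions, and the meter should be set to whatever gating mode the platform specifies.

```python
from dataclasses import dataclass

# Targets from the Netflix figures above; the ±2 LU window is an assumption
# to confirm against the current platform specification.
TARGET_LKFS = -27.0
TOLERANCE_LU = 2.0
MAX_TRUE_PEAK_DBTP = -2.0


@dataclass
class Measurement:
    """Hypothetical record of meter readings for one deliverable."""
    name: str
    loudness_lkfs: float
    true_peak_dbtp: float


def loudness_failures(measurements: list[Measurement]) -> list[str]:
    """Return one message per limit that a deliverable exceeds."""
    failures = []
    for m in measurements:
        if abs(m.loudness_lkfs - TARGET_LKFS) > TOLERANCE_LU:
            failures.append(f"{m.name}: {m.loudness_lkfs:.1f} LKFS is outside {TARGET_LKFS} ± {TOLERANCE_LU} LU")
        if m.true_peak_dbtp > MAX_TRUE_PEAK_DBTP:
            failures.append(f"{m.name}: true peak {m.true_peak_dbtp:.1f} dBTP exceeds {MAX_TRUE_PEAK_DBTP} dBTP")
    return failures
```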