The audiobook market has grown dramatically alongside the rise of streaming platforms like Audible, Apple Books, and Google Play Books. For audio engineers, it represents a consistent and rewarding category of work — one that demands technical precision, a thorough understanding of voice recording, and a meticulous approach to platform compliance. This guide covers the complete professional workflow: from preparing the recording environment and capturing the performance, through to editing, mastering, and delivering files that pass ACX and platform quality checks first time.
1. Pre-Production: Before the Microphone Goes Up
Good audiobook production starts long before recording. The most important investment you can make is in preparation — it saves significant time in the editing room.
Script Preparation
Work from a properly formatted script — page-numbered, clearly laid out, with chapter markers and consistent character names. If you are producing the audiobook yourself or for an independent author, create a proofed, final version of the manuscript before recording begins. Changes made mid-production are expensive in time. Mark any words with unusual pronunciation (technical terms, foreign words, character names) and agree on the pronunciations with the narrator upfront.
Room Treatment and Acoustic Environment
The quality of the recording environment is the single biggest determinant of audiobook quality. ACX and most major platforms have strict noise floor requirements: your recording space must measure −60 dBFS RMS or lower, which is the ACX limit. This is significantly quieter than what most home studios achieve without proper treatment. Key considerations:
- Eliminate HVAC noise — turn off air conditioning and heating during recording sessions if possible.
- Use heavy curtains, moving blankets, or acoustic panels to reduce room reflections.
- Record in the quietest time of day — avoid traffic peaks, construction noise, and activity in adjacent rooms.
- A purpose-built vocal booth or portable isolation enclosure (e.g., Kaotica Eyeball, sE Reflexion Filter, or a walk-in wardrobe lined with clothing) can provide a significant noise floor reduction.
Microphone Selection and Positioning
A large-diaphragm condenser microphone is the standard choice for audiobook narration: it captures the full frequency range of the voice with low self-noise. Popular options include the Neumann U87, Audio-Technica AT4050, and Rode NT1, with the Audio-Technica AT2020 at a lower price point. Position the microphone approximately 15–25 cm from the narrator's mouth, slightly above or at lip level, with a pop filter 5–8 cm in front. With a cardioid pattern, point the front of the microphone at the narrator and the rear, where the pattern is least sensitive, toward the room's primary reflection points and noise sources.
Gain staging: Record with peaks landing between −12 dBFS and −6 dBFS. Never clip the input signal. Leave headroom for dynamic variation: a whispered line should still register clearly, and a louder emphasis should not clip.
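As a rough illustration of the gain-staging window above, the sketch below measures the sample peak of a take and classifies it against the −12 to −6 dBFS range. It assumes samples are already decoded to floats in the range −1.0 to 1.0; the function names and the sample values are made up for the example.

```python
import math


def peak_dbfs(samples):
    """Return the sample peak in dBFS for floats normalised to [-1.0, 1.0]."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def gain_staging_verdict(peak_db, low_db=-12.0, high_db=-6.0):
    """Classify a take's peak against the -12 to -6 dBFS recording window."""
    if peak_db >= 0.0:
        return "clipped"      # at or beyond full scale
    if peak_db > high_db:
        return "too hot"      # little headroom left for louder emphasis
    if peak_db < low_db:
        return "too quiet"    # quiet passages will sit close to the noise floor
    return "ok"


take = [0.0, 0.31, -0.42, 0.18]  # hypothetical sample values
print(gain_staging_verdict(peak_dbfs(take)))
```

This only looks at sample peaks, so it is a quick sanity check on input gain rather than a true-peak measurement.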
2. Recording Session Best Practices
Audiobook recording sessions are long — a single narrator may record 8–10 hours to complete a full-length novel. Maintaining consistency across those sessions is one of the central challenges of audiobook production.
Chapter-by-Chapter Recording
Record one chapter at a time and export each chapter as a separate file. This makes editing and quality control manageable. Use a consistent naming convention from the start: AuthorName_ChapterXX_Take01.wav. Record at 44.1 kHz or 48 kHz, 24-bit. Either rate works for capture; 44.1 kHz is the CD-audio standard and the rate ACX expects for delivered files, so recording at 44.1 kHz avoids a sample-rate conversion at export.
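One way to keep the naming convention and recording format consistent is to generate names and check file headers in a small script. The sketch below uses Python's standard-library wave module, which reads plain PCM WAV files; the helper names and the zero-padded take number are assumptions made for the example.

```python
import wave


def take_filename(author, chapter, take):
    """Build a name following the AuthorName_ChapterXX_TakeNN.wav pattern."""
    return f"{author}_Chapter{chapter:02d}_Take{take:02d}.wav"


def wav_format_ok(path, allowed_rates=(44100, 48000), bit_depth=24):
    """Return True if the WAV header shows an allowed sample rate and bit depth."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
    return rate in allowed_rates and bits == bit_depth


print(take_filename("JaneDoe", 3, 1))
```

Running a check like this at the end of each session catches a wrong project sample rate before it spreads across a whole book.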
Pickups and Retakes
Establish a clear retake protocol before the session starts. The narrator should pause, breathe, and then re-read from the beginning of the sentence (not mid-sentence) when they make an error. Leave a visible marker in the recording software (a clap, a tone, or a comment marker) at the point of the retake. This makes the editor's job significantly faster. Never stop the recording between retakes — keep the session rolling.
Room Tone Reference
Record 30–60 seconds of room tone at the beginning of each session, with the narrator sitting in position and the microphone active, but without speaking. This reference is essential for noise reduction and for filling gaps in the edited audio.
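The room tone recording also doubles as a noise floor measurement. The sketch below computes an unweighted RMS level in dBFS and compares it with the −60 dBFS limit. It assumes float samples in the range −1.0 to 1.0 and the convention where a full-scale sine reads about −3 dBFS RMS; some meters use a different reference, so treat the numbers as indicative.

```python
import math


def rms_dbfs(samples):
    """Unweighted RMS level in dBFS for floats normalised to [-1.0, 1.0]."""
    if not samples:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    return -math.inf if mean_square == 0.0 else 10.0 * math.log10(mean_square)


def noise_floor_ok(room_tone, limit_db=-60.0):
    """True if the room tone sits at or below the noise floor limit."""
    return rms_dbfs(room_tone) <= limit_db


room_tone = [0.0005, -0.0005] * 500  # hypothetical, very quiet room tone
print(round(rms_dbfs(room_tone), 1), noise_floor_ok(room_tone))
```

Measuring before the narrator starts reading gives you a chance to fix a noisy room while it is still cheap to do so.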
3. Editing the Narration
Audiobook editing is detailed, time-consuming work. A rough guide is that editing takes two to three times the runtime of the finished audio — a ten-hour audiobook may take 20–30 hours to edit properly.
Removing Errors and Noise
Work through each chapter sequentially. Remove false starts, mouth clicks, lip smacks, breath sounds that are distractingly loud, and any external noise events (traffic, aircraft, phone notifications). Use clip gain to level individual phrases before applying any processing — this gives your dynamics plugins a consistent input to work with.
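Clip gain is simple arithmetic: the gain to apply is the difference between the level you want and the level the phrase currently has. The sketch below illustrates that, with a clamp so that no phrase is moved by more than a few dB. The −20 dBFS target and the 6 dB clamp are assumptions chosen for the example, not values from any platform specification.

```python
def clip_gain_db(phrase_rms_db, target_db=-20.0, max_change_db=6.0):
    """Gain in dB that moves a phrase toward the target, limited to +/- max_change_db."""
    gain = target_db - phrase_rms_db
    return max(-max_change_db, min(max_change_db, gain))


def apply_gain(samples, gain_db):
    """Scale float samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


phrase_levels_db = [-26.0, -18.0, -30.0]  # hypothetical per-phrase RMS readings
print([clip_gain_db(level) for level in phrase_levels_db])
```

Clamping the change keeps a badly under-read phrase from being lifted so far that its noise floor comes up with it; such phrases are usually better handled as pickups.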
Breath Editing
This is a significant and often underestimated part of audiobook editing. Not all breaths should be removed — some are natural and improve the listening experience. Overly loud, intrusive, or mis-timed breaths should be reduced in level (not necessarily deleted) using clip gain. A good rule: if a breath draws your attention as a listener, it should be reduced or removed. If it sounds like a natural human pause, leave it.
Pacing and Pause Management
Audiobook listeners rely on consistent pacing to stay engaged over many hours. Manage inter-sentence and inter-paragraph pauses deliberately. ACX specifies that no pause, meaning audio at or below the −60 dBFS RMS noise floor, should last more than 5 seconds within the body of a chapter. Use room tone to fill edited gaps rather than digital silence: room tone sounds natural, whereas digital silence sounds like a technical fault.
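Long pauses can be found automatically by scanning the audio in short windows and flagging runs that stay at or below the noise floor. The following is a minimal sketch of that idea, assuming float samples in the range −1.0 to 1.0; the 50 ms window is an arbitrary choice for the example, and the test signal uses a deliberately low sample rate to keep it small.

```python
def find_long_pauses(samples, sample_rate, threshold_db=-60.0,
                     max_seconds=5.0, window_seconds=0.05):
    """Return (start_s, end_s) for quiet runs longer than max_seconds."""
    window = max(1, int(sample_rate * window_seconds))
    threshold_sq = 10.0 ** (threshold_db / 10.0)  # threshold as mean-square power
    pauses = []
    run_start = None

    def close_run(end_index):
        # Record the run only if it exceeds the allowed pause length.
        if (end_index - run_start) / sample_rate > max_seconds:
            pauses.append((run_start / sample_rate, end_index / sample_rate))

    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        mean_sq = sum(s * s for s in block) / len(block)
        if mean_sq <= threshold_sq:
            if run_start is None:
                run_start = start
        elif run_start is not None:
            close_run(start)
            run_start = None
    if run_start is not None:
        close_run(len(samples))
    return pauses


rate = 100  # hypothetical low rate, for a compact demonstration
signal = [0.5] * rate + [0.0] * (6 * rate) + [0.5] * rate
print(find_long_pauses(signal, rate))
```

The same scan will also surface stretches of pure digital silence, which are good candidates for replacement with room tone.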
4. Processing and Mastering
Once editing is complete, apply your processing chain to achieve a professional, compliant final master.
Noise Reduction
If the recording has any residual background noise, apply spectral noise reduction (iZotope RX, Adobe Audition's Noise Reduction, or Cedar) using the room tone reference recorded at the start of the session. Learn the noise profile from the room tone, then apply a transparent reduction to the narration. Use the minimum amount of reduction that achieves a clean result — aggressive noise reduction creates metallic artefacts and thin-sounding voices.
EQ
A high-pass filter at 80–100 Hz removes low-frequency handling noise and room rumble. Gentle cuts in the 200–400 Hz range can reduce muddiness or boxiness in the recording. A presence lift at 2–5 kHz aids intelligibility. Cut any harshness in the 3–6 kHz range if the voice recording is bright. The target is a voice that sounds natural, clear, and easy to listen to for extended periods.
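To show what the high-pass stage does, here is a minimal sketch of a second-order high-pass filter using the widely published biquad ("Audio EQ Cookbook") coefficient formulas. It is not how any particular EQ plugin is implemented; the 80 Hz cutoff comes from the range above, and the Q of about 0.707 (a Butterworth response) is an assumption.

```python
import math


def highpass_coefficients(cutoff_hz, sample_rate, q=0.7071):
    """Normalised second-order high-pass biquad coefficients."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = ((1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0)
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def highpass(samples, cutoff_hz=80.0, sample_rate=44100):
    """Filter float samples with a direct form I biquad high-pass."""
    b, a = highpass_coefficients(cutoff_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


# A constant (0 Hz) offset is removed, while speech-range content passes through.
settled = highpass([1.0] * 44100)[-1]
print(abs(settled) < 1e-6)
```

A second-order filter rolls off at 12 dB per octave below the cutoff, which is gentle enough to leave the body of most voices intact at 80 Hz.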
Compression
A gentle, transparent compressor helps even out dynamic variation. A ratio of 2:1 to 3:1 with a medium attack (15–25 ms) and a longer release (150–300 ms) works well for narration. Use subtle gain reduction — 2–4 dB of GR on peaks is appropriate. A limiter at the end of the chain to control true peak is essential for compliance.
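The relationship between ratio and gain reduction can be worked out directly. For a hard-knee compressor, a signal that overshoots the threshold by some amount comes out overshooting by that amount divided by the ratio, so the gain reduction is the difference. The sketch below shows that static calculation; the −18 dBFS threshold and −12 dBFS peak are example values, and attack and release behaviour is ignored.

```python
def gain_reduction_db(level_db, threshold_db, ratio):
    """Static gain reduction in dB of a hard-knee compressor."""
    overshoot = level_db - threshold_db
    if overshoot <= 0.0:
        return 0.0  # below threshold: the compressor does nothing
    return overshoot * (1.0 - 1.0 / ratio)


for ratio in (2.0, 3.0):
    gr = gain_reduction_db(level_db=-12.0, threshold_db=-18.0, ratio=ratio)
    print(f"{ratio:.0f}:1 -> {gr:.1f} dB of gain reduction")
```

With a peak 6 dB over the threshold, this gives 3 dB of reduction at 2:1 and 4 dB at 3:1, which is in line with the 2–4 dB range suggested above.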
De-Essing
Apply a de-esser to control sibilance. Set the frequency range to target the specific sibilance peak in the recording (typically 5–8 kHz) and apply only as much reduction as needed — over-de-essing creates a lispy sound that is unpleasant over long listening sessions.
5. Platform Delivery Standards
Each major platform has specific technical requirements. Meeting these is non-negotiable for publication.
| Platform | RMS Level | Peak Level | Noise Floor | Format |
|---|---|---|---|---|
| ACX (Audible/Amazon) | −23 to −18 dBFS RMS | ≤ −3 dBFS | ≤ −60 dBFS RMS | MP3 192 kbps or WAV 44.1 kHz / 16-bit |
| Apple Books | −23 to −18 dBFS RMS | ≤ −3 dBFS | ≤ −60 dBFS RMS | MP3 192 kbps |
| Findaway / Libro.fm | −23 to −18 dBFS RMS | ≤ −3 dBFS | ≤ −60 dBFS RMS | MP3 192 kbps |
| Author's Republic | −23 to −18 dBFS RMS | ≤ −3 dBFS | ≤ −60 dBFS RMS | MP3 192 kbps or WAV |
RMS vs LUFS: ACX and audiobook platforms use RMS measurement rather than integrated LUFS. These are different measurements. Ensure your metering plugin is set to RMS (not LUFS) when checking compliance against ACX specifications.
File Structure
ACX requires each chapter to be submitted as a separate file, plus an opening file containing any front matter (title, author, narrator, copyright notice) and a closing credits file. Each file must begin with 0.5–1 second of room tone and end with 1–5 seconds of room tone; use recorded room tone here, not digital silence. Chapter files should be named clearly and sequentially.
6. Quality Control
Before submission, run each chapter through ACX Check (available as a free Audacity plugin) or your DAW's analysis tools to verify:
- RMS level is within the −23 to −18 dBFS window.
- Peak level does not exceed −3 dBFS.
- Noise floor (measured from the room tone at the beginning or end of the file) is at or below −60 dBFS RMS.
- No internal silence exceeds 5 seconds.
- The opening and closing room tone handles are present.
- Encoding is correct (MP3 192 kbps CBR, or WAV at the correct bit depth).
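The checklist above can be turned into a simple pass/fail function once you have the measurements from your metering tools. This is a sketch only: the ChapterMeasurement fields are hypothetical names, the 0.5 second minimum handle length is an assumption, and the encoding check covers only the MP3 delivery path.

```python
from dataclasses import dataclass


@dataclass
class ChapterMeasurement:
    rms_db: float
    peak_db: float
    noise_floor_db: float
    longest_pause_s: float
    head_room_tone_s: float
    tail_room_tone_s: float
    mp3_bitrate_kbps: int
    constant_bitrate: bool


def qc_failures(m, min_handle_s=0.5):
    """Return a list of human-readable QC failures (empty if the file passes)."""
    failures = []
    if not -23.0 <= m.rms_db <= -18.0:
        failures.append("RMS outside -23 to -18 dBFS")
    if m.peak_db > -3.0:
        failures.append("peak above -3 dBFS")
    if m.noise_floor_db > -60.0:
        failures.append("noise floor above -60 dBFS RMS")
    if m.longest_pause_s > 5.0:
        failures.append("internal silence longer than 5 seconds")
    if m.head_room_tone_s < min_handle_s or m.tail_room_tone_s < min_handle_s:
        failures.append("room tone handle missing or too short")
    if m.mp3_bitrate_kbps != 192 or not m.constant_bitrate:
        failures.append("encoding is not MP3 192 kbps CBR")
    return failures


chapter = ChapterMeasurement(-20.5, -3.4, -62.0, 2.1, 0.7, 0.8, 192, True)
print(qc_failures(chapter) or "pass")
```

Returning every failure at once, rather than stopping at the first, makes it easier to fix a chapter in a single pass.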
Listen to the first few minutes and last few minutes of each file on headphones and on a speaker, not just on studio monitors. Audiobook listeners predominantly use earbuds and car speakers. Low-frequency muddiness or high-frequency harshness that seems tolerable on full-range monitors is often more obvious on typical consumer playback devices.
7. Common Rejection Reasons and How to Avoid Them
ACX and other platforms reject a significant proportion of first-time submissions. The most common technical rejection reasons are:
- Noise floor too high: HVAC, room noise, or computer fan noise is audible. Treat the recording environment before re-recording.
- RMS level out of range: Too quiet (below −23 dBFS) or too loud (above −18 dBFS). Adjust the overall gain to bring the RMS into the target window, then use a limiter to keep peaks under the ceiling.
- Peak exceeds −3 dBFS: A peak limiter with a ceiling set to −3 dBFS or lower prevents this.
- Audible noise reduction artefacts: Metallic or watery sounds caused by aggressive noise reduction. Reduce the amount of NR applied and re-process.
- Internal silence too long: Edit all long pauses to within the 5-second limit, using room tone to fill the gap naturally.
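The RMS and peak rejections are linked: raising the gain to fix a quiet file also raises its peaks. The sketch below works through that arithmetic for a single file. The −20 dBFS target inside the RMS window is an assumption, and the function only reports whether a limiter is needed; it does not perform the limiting.

```python
def gain_to_target(rms_db, peak_db, target_rms_db=-20.0, peak_ceiling_db=-3.0):
    """Return (gain_db, peak_after_gain_db, needs_limiter) for a whole file."""
    gain_db = target_rms_db - rms_db
    peak_after_gain_db = peak_db + gain_db
    needs_limiter = peak_after_gain_db > peak_ceiling_db
    return gain_db, peak_after_gain_db, needs_limiter


# Hypothetical file measured at -27 dBFS RMS with peaks at -8 dBFS.
print(gain_to_target(rms_db=-27.0, peak_db=-8.0))
```

In this example the file needs 7 dB of gain, which would push peaks to −1 dBFS, so a limiter with a −3 dBFS ceiling has to follow the gain stage. Limiting lowers the RMS slightly, so measure again after processing.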
Conclusion
Professional audiobook production is a discipline that rewards methodical, patient work. The technical standards are clear and achievable, but they demand attention to every stage of the process — from the acoustic environment through to the final export settings. A well-produced audiobook is a pleasure to listen to over many hours; poor technical quality will fatigue the listener and result in negative reviews regardless of the quality of the writing or performance. Master the workflow described here and you will consistently deliver audiobooks that pass platform QC first time and serve the listener well.