Author Nation Live 25 B1-42 Digitally-Narrated Audiobooks... I know, I know. Hear me out.
Phil Marshall, AI technologist and founder of Spoken.press, delivered a session on the current state and future of digital narration for indie authors. The presentation addressed the barriers that have historically limited audiobook production (cost, time, and workflow complexity) and demonstrated how AI narration has evolved from a novelty into a professional-grade tool. Marshall said that half of Americans now consume spoken-word media daily, with audio growing 26% year over year while other formats remain flat. The session covered the partnership ecosystem between narration platforms (ElevenLabs, Hume) and distribution channels (Voices by INaudio, Spotify, Kobo, YouTube, BookFunnel), while acknowledging that ACX/Audible does not yet accept AI-narrated content. Marshall introduced Spoken's "pay when perfect" pricing model and previewed the upcoming "Motion" technology for AI-generated video book trailers. Throughout, the session emphasized workflow transformation: treating audiobook production as an organic, iterative process rather than a one-time handoff.
Tools/Software
- Spoken (spoken.press): AI narration platform designed specifically for authors, integrating ElevenLabs and Hume voice technologies with mastering to ACX standards
- ElevenLabs: AI voice generation company (valued at roughly $3 billion) offering 32-language support and voice actor libraries; known for reliable, consistent output
- Hume: AI voice technology partner specializing in more natural, emotive voice delivery; uses single-model architecture
- Voices by INaudio (formerly Findaway Voices): Distribution platform acquired and later spun out by Spotify, offering 10 distribution endpoints
- ACX (Audiobook Creation Exchange): Amazon/Audible's audiobook platform; currently does not accept AI narration
- Kobo: Ebook/audiobook retailer accepting AI-narrated content
- BookFunnel: Distribution platform accepting digital narration
- Spotify: Music/podcast platform accepting AI-narrated audiobooks
- YouTube: Emerging audiobook distribution channel with ad monetization potential
- Kurzweil Reading Machine: 1976 text-to-speech device for the blind (historical reference)
- WaveNet: 2016 Google neural network for text-to-speech (historical reference)
- Scribe Shadow: Translation service mentioned for foreign language audiobook production
Key Concepts
- Digital Narration: AI-generated voice narration for audiobooks
- Multi-Voice Narration: Full-cast audiobook production with unique voices for each character
- Duet Narration: Alternating POV narration typically using male/female narrators
- Single Narrator: Traditional one-voice audiobook format
- Pay When Perfect: Spoken's pricing model—free to use during production, fixed cost only when finalized
- Voice Cloning/Custom Voice Generation: Creating unique AI voices based on character descriptions
- Passage Attribution: Technology identifying which character speaks each line of dialogue
- Dialogue Tag Removal: Editing technique removing "he said/she said" tags unnecessary in multi-voice audio
- Mastering to ACX Standards: Audio processing including a 192 kbps bit rate, floor/ceiling leveling, background noise removal, and normalization
- Motion: Spoken's upcoming AI video trailer generation feature
- Hallucinations: AI voice errors including mispronunciations or "demon speak"
- Voice Actor Voices vs. System Generated Voices: Library voices from real actors versus AI-generated custom voices
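To make the "Mastering to ACX Standards" concept more concrete, here is a minimal sketch of a level check. The thresholds (peak no higher than -3 dB, RMS between -23 dB and -18 dB) are taken from ACX's published submission requirements, not from the session, and every function name is hypothetical. It only measures levels on a list of float samples; it is not Spoken's mastering pipeline and does not cover noise removal or encoding.

```python
import math

# Thresholds from ACX's published submission requirements (not from the session).
ACX_PEAK_MAX_DB = -3.0
ACX_RMS_MIN_DB = -23.0
ACX_RMS_MAX_DB = -18.0


def to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def measure_levels(samples: list[float]) -> tuple[float, float]:
    """Return (peak_db, rms_db) for float samples in the range -1.0..1.0."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


def meets_acx_levels(samples: list[float]) -> bool:
    """Check only the peak and RMS criteria (hypothetical helper)."""
    peak_db, rms_db = measure_levels(samples)
    return peak_db <= ACX_PEAK_MAX_DB and ACX_RMS_MIN_DB <= rms_db <= ACX_RMS_MAX_DB


def sine(amplitude: float, freq_hz: float = 440.0,
         rate: int = 44_100, n: int = 4_410) -> list[float]:
    """Generate a short test tone to exercise the check."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


if __name__ == "__main__":
    print(meets_acx_levels(sine(0.15)))  # moderate tone, inside the RMS window
    print(meets_acx_levels(sine(1.0)))   # full-scale tone, exceeds the peak ceiling
```

A real mastering chain would also need to measure the noise floor and encode to a constant-bit-rate MP3, which this sketch leaves out.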
Specific Strategies
- Workflow-First Approach: Treating digital narration as an iterative, organic process rather than a fixed product
- Dual Manuscript Strategy: Maintaining separate manuscripts for print (with dialogue tags) and audio (tags removed)
- YouTube Audiobook Monetization: Publishing full audiobooks on YouTube with ad revenue
- Single Narrator + Multi-Voice Strategy: Creating a single-narrator version for YouTube (free) and a multi-voice version for paid distribution
- Chapter-by-Chapter Proofing: Narrating and proofing one chapter at a time before proceeding
- Custom Voice Generation from Character Descriptions: Using character prompts to generate unique voices
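The Dual Manuscript Strategy depends on producing an audio copy with dialogue tags removed. The sketch below shows one rough way to strip only the simplest tags with a regular expression. The pattern, the pronoun and verb lists, and the function names are all illustrative assumptions, not how Spoken performs this step. It deliberately leaves mid-sentence tags and tags with named speakers untouched, since those need human judgment.

```python
import re

# Matches closing punctuation + closing quote + a simple pronoun tag that
# ends the sentence, e.g.:  ," she said.   or   ?" he asked.
SIMPLE_TAG = re.compile(
    r'([,.!?])(["”])\s+(?:he|she|they)\s+(?:said|asked|replied)\.'
)


def _close_sentence(match: re.Match) -> str:
    """Keep the quote, and turn a trailing comma into a period."""
    punct, quote = match.group(1), match.group(2)
    return ("." if punct == "," else punct) + quote


def strip_simple_tags(text: str) -> str:
    """Return an audio-manuscript copy with simple trailing tags removed."""
    return SIMPLE_TAG.sub(_close_sentence, text)


if __name__ == "__main__":
    print_version = '"I\'m leaving," she said. "Now."'
    print(strip_simple_tags(print_version))
```

Keeping the print manuscript as the source and generating the audio copy from it means the two versions never drift apart by hand.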
In the full video, Phil Marshall shares a complete screen-share walkthrough of the Spoken platform—from manuscript upload through voice selection, custom voice generation, passage-by-passage proofing, and final mastering. Watch him build a multi-voice short story in real-time, showing exactly how the system automatically attributes dialogue to characters and separates it from narration.
Q: How much does AI audiobook narration cost per word with Spoken's pricing model?
A: $10 per 5,000 finished words (approximately $0.002 per word). Phil Marshall announced Spoken's "pay when perfect" fixed pricing, eliminating credit-based systems where re-narrations deplete your balance. A 90,000-word book would cost approximately $180, with unlimited re-narrations included until the author is satisfied with the final product.
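The pricing arithmetic can be sketched as a small calculation. The rate of $10 per 5,000 finished words comes from the session; the function name and the `round_up` option are assumptions, since the session summary does not say whether partial 5,000-word blocks are billed pro rata or rounded up.

```python
import math

PRICE_PER_BLOCK_USD = 10   # from the session: $10 per 5,000 finished words
WORDS_PER_BLOCK = 5_000


def narration_cost(word_count: int, round_up: bool = False) -> float:
    """Estimate the cost in USD for a finished manuscript.

    round_up=True bills whole 5,000-word blocks (an assumption, not a
    confirmed billing rule); the default bills pro rata.
    """
    blocks = word_count / WORDS_PER_BLOCK
    if round_up:
        blocks = math.ceil(blocks)
    return blocks * PRICE_PER_BLOCK_USD


if __name__ == "__main__":
    print(f"${narration_cost(90_000):.2f}")  # $180.00
```

For the 90,000-word example this gives 18 blocks at $10 each, matching the $180 figure quoted in the session.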
Q: What percentage of Americans consume spoken word media daily?
A: Half of all Americans listen to spoken word media every single day. Phil Marshall cited this statistic to emphasize the growing dominance of audio consumption, particularly among younger demographics. He noted that audio is experiencing 26% year-over-year growth while other reading modalities have remained relatively flat.
Q: How long does it take to proof a single-narrator AI audiobook?
A: Approximately 13 to 14 hours for a 90,000-word book (10 hours of listening plus 3 to 4 hours of updating). Marshall explained that a 90,000-word manuscript produces roughly 10 hours of audio. Using an ElevenLabs single narrator with minimal editing, authors should expect to listen through the entire work and spend additional time on corrections and tweaks.
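As a rough planning aid, the proofing estimate can be scaled to other manuscript lengths. The 9,000 words per audio hour is implied by the session's figure of 90,000 words producing about 10 hours of audio. Treating correction time as 30% to 40% of listening time is an assumption extrapolated from the "3-4 hours" quoted for a 10-hour book, and the function name is hypothetical.

```python
WORDS_PER_AUDIO_HOUR = 90_000 / 10  # 9,000 words/hour, implied by the session


def proofing_estimate(word_count: int,
                      correction_ratio: tuple[float, float] = (0.3, 0.4)):
    """Return (listening_hours, total_low, total_high).

    correction_ratio is the assumed correction time per hour of audio;
    it is extrapolated from one data point, so treat results as rough.
    """
    listening = word_count / WORDS_PER_AUDIO_HOUR
    low = listening * (1 + correction_ratio[0])
    high = listening * (1 + correction_ratio[1])
    return listening, low, high


if __name__ == "__main__":
    listening, low, high = proofing_estimate(90_000)
    print(f"{listening:.0f} h listening, {low:.0f}-{high:.0f} h total")
    # 10 h listening, 13-14 h total
```

Multi-voice projects would likely need more correction time than this single-narrator estimate suggests, so the ratio is worth adjusting per project.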
Q: Does Spoken work with Findaway Voices for audiobook distribution?
A: Yes. Findaway Voices has been acquired and rebranded as "Voices by INaudio," and Spoken has established a direct integration with the platform, which distributes to 10 different endpoints, including Kobo. Files that Voices by INaudio receives from Spoken are automatically cleared for distribution without additional review.
Q: Can AI narration platforms create children's voices for kids' books?
A: ElevenLabs prohibits children's voices due to safety concerns across its many applications. However, Hume generates quality children's voices, and Spoken's stock library includes child voice options. Spoken conducts safety reviews across six different ratings for all content, with strict blocks on any content involving minors in inappropriate contexts.