Text to Speech
Convert text into natural-sounding speech using advanced AI models from multiple providers.
What is the Text to Speech Node?
The Text to Speech (TTS) node is a unified interface to several text-to-speech services. It converts text into spoken audio using models from OpenAI, ElevenLabs, and AWS Polly. Each provider offers different voices, languages, and quality levels, so you can choose the option that best fits your use case.
Supported Models
OpenAI Models
- GPT-4o Mini TTS: Cost-effective, fast generation with high-quality output
- TTS 1: Standard quality with fast generation
- TTS 1 HD: High-definition audio quality
ElevenLabs Models
- ElevenLabs Flash V2.5: Ultra-fast generation with excellent quality
- ElevenLabs Turbo V2.5: Balanced speed and quality for production use
- ElevenLabs Multilingual V2: Support for 30+ languages with natural intonation
Amazon AWS Models
- AWS Polly: Multiple voice engines (Standard, Neural, Long-form, Generative) with extensive language support
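If you also call the providers directly, it can help to know how the display names above relate to provider-side identifiers. The table below is a minimal sketch: the identifier strings are assumed to match the public OpenAI and ElevenLabs model IDs and the AWS Polly engine values, the node itself may use different internal names, and `MODELS`, `POLLY_ENGINES`, and `provider_model_id` are hypothetical names introduced for illustration.

```python
from typing import Tuple

# Hypothetical lookup table: node display name -> (provider, provider-side model ID).
# The IDs are assumed to match the providers' public APIs; the node may differ.
MODELS = {
    "GPT-4o Mini TTS": ("openai", "gpt-4o-mini-tts"),
    "TTS 1": ("openai", "tts-1"),
    "TTS 1 HD": ("openai", "tts-1-hd"),
    "ElevenLabs Flash V2.5": ("elevenlabs", "eleven_flash_v2_5"),
    "ElevenLabs Turbo V2.5": ("elevenlabs", "eleven_turbo_v2_5"),
    "ElevenLabs Multilingual V2": ("elevenlabs", "eleven_multilingual_v2"),
}

# AWS Polly selects quality through an engine value rather than a model ID.
POLLY_ENGINES = ("standard", "neural", "long-form", "generative")


def provider_model_id(display_name: str) -> Tuple[str, str]:
    """Return (provider, model ID) for a display name, or raise ValueError."""
    try:
        return MODELS[display_name]
    except KeyError:
        raise ValueError(f"Unknown model: {display_name!r}") from None


print(provider_model_id("TTS 1 HD"))
```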
How to use it?
- Add the Text to Speech node: Drag and drop the Text to Speech node into your workflow from the Speech category.
- Select Your Model: Choose from the available providers and models based on your requirements:
  - For speed: Use ElevenLabs Flash V2.5 or OpenAI GPT-4o Mini TTS
  - For quality: Use OpenAI TTS 1 HD or ElevenLabs Turbo V2.5
  - For multilingual: Use ElevenLabs Multilingual V2 or AWS Polly
- Configure Credentials: Select appropriate credentials based on your chosen model:
  - OpenAI models require OpenAI API credentials
  - ElevenLabs models require an ElevenLabs API Key
  - AWS Polly requires AWS credentials
- Provider-Specific Configuration:
  - OpenAI Configuration
    - Voice ID: Choose from 10 voice options (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer)
    - Speed: Adjust playback speed from 0.25x to 4x (default: 1.0)
  - ElevenLabs Configuration
    - Voice ID: Enter the ID of the voice you want to use. The voice must be in your ElevenLabs library. With managed credentials, only the default voices are supported.
    - Language: Optionally enforce a specific language (supports 30+ languages)
    - Voice Stability: Control consistency (0-1, default: 0.5)
    - Voice Speed: Adjust speaking rate (0.7-1.2, default: 1.0)
    - Similarity: Control voice similarity to original (0-1, default: 0.75)
    - Style Exaggeration: Adjust emotional expression (0-1, default: 0)
    - Speaker Boost: Enable for enhanced clarity (default: true)
    - Seed: Optional seed for reproducible results (0-4,294,967,295)
  - AWS Polly Configuration
    - Region: Select AWS region for processing
    - Voice Engine: Choose from Standard, Neural, Long-form, or Generative
    - Voice ID: Select from 60+ voices across multiple languages and accents
- Connect Input: Connect a text input to provide the content you want to convert to speech.
- Connect Output: The audio output can be connected to:
  - File Writer to save the audio file
  - Other nodes for further processing
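The numeric ranges listed in the provider configuration can be checked before a workflow runs. The following is a minimal sketch that uses only the ranges stated above; the setting keys (`speed`, `stability`, and so on) and the `validate_settings` helper are hypothetical names chosen for this example, not the node's actual schema.

```python
# Allowed ranges taken from the provider configuration lists.
# The keys are hypothetical names chosen for this sketch.
RANGES = {
    "openai": {"speed": (0.25, 4.0)},
    "elevenlabs": {
        "stability": (0.0, 1.0),
        "speed": (0.7, 1.2),
        "similarity": (0.0, 1.0),
        "style_exaggeration": (0.0, 1.0),
        "seed": (0, 4_294_967_295),
    },
}


def validate_settings(provider, settings):
    """Return a list of problems; an empty list means every ranged value is valid."""
    problems = []
    allowed = RANGES.get(provider, {})
    for name, value in settings.items():
        if name not in allowed:
            continue  # not a ranged numeric setting (for example a voice ID)
        low, high = allowed[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


print(validate_settings("openai", {"speed": 1.0}))
print(validate_settings("elevenlabs", {"speed": 1.5, "stability": 0.5}))
```

Returning a list of problems rather than raising on the first one lets you report every out-of-range value at once.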
Example Task: Creating an Audiobook
Objective: Convert written text into a professional audiobook with natural-sounding narration.
Step-by-Step Setup
- Add a Text Input:
  - Drag and drop a Text Input node into your workflow
  - Enter or paste the text you want to convert to speech
- Add and Configure Text to Speech:
  - Drag the Text to Speech node into your workflow
  - Select Model: Choose "OpenAI TTS 1 HD" for high-quality narration
  - Select Credentials: Choose your OpenAI credentials
  - Select Voice: Choose "Nova" for a warm, engaging narration voice
  - Set Speed: Keep at 1.0 for natural pacing
- Connect Text Input to TTS:
  - Connect the output of the Text Input node to the text input of the TTS node
- Add File Writer:
  - Drag a File Writer node to save the generated audio
  - Storage Provider: Select AWS S3
  - File Path: Enter audiobooks/chapter-1.mp3
  - Bucket Name: Enter your S3 bucket name
- Connect TTS to File Writer:
  - Connect the audio output from the TTS node to the file input of the File Writer
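Audiobook chapters are often longer than a provider accepts in one request. At the time of writing, OpenAI's speech endpoint documents a 4096-character limit on its input, and this page does not say whether the node splits long text automatically. The sketch below assumes it does not, and shows one way to split a chapter at sentence boundaries before feeding it to the workflow. `split_into_chunks`, `chunk_paths`, and the `-part-NNN` file naming are hypothetical, introduced only for illustration.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 4096) -> List[str]:
    """Split text at sentence boundaries so each chunk has at most max_chars characters.

    A single sentence longer than max_chars is hard-split as a fallback.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        while len(sentence) > max_chars:
            # Flush what we have, then cut the oversized sentence into pieces.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def chunk_paths(prefix: str, count: int) -> List[str]:
    """Build one output path per chunk, e.g. audiobooks/chapter-1-part-001.mp3."""
    return [f"{prefix}-part-{i:03d}.mp3" for i in range(1, count + 1)]


parts = split_into_chunks("One two. Three four. Five six.", max_chars=20)
print(parts)
print(chunk_paths("audiobooks/chapter-1", len(parts)))
```

Splitting at sentence boundaries keeps each audio file from starting or ending mid-sentence, which makes the parts easier to join afterwards.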
Example Task: Multilingual Content Generation
Objective: Generate speech in multiple languages for international audiences.
Step-by-Step Setup
- Configure for Multilingual Output:
  - Add the Text to Speech node
  - Select Model: Choose "ElevenLabs Multilingual V2"
  - Select Credentials: Use Nocodo Managed Credentials
  - Language: Select your target language (e.g., "Spanish" for es, "French" for fr)
  - Voice Stability: Set to 0.6 for consistent pronunciation
  - Similarity: Set to 0.8 for authentic accent
- Connect Your Workflow:
  - Connect translated text input to the TTS node
  - Connect the audio output to your desired destination
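When the same content is produced in several languages, it can help to plan one job per language up front. The sketch below only builds plain dictionaries using the settings from this example (stability 0.6, similarity 0.8). The dictionary keys, the `build_jobs` helper, and the `audio/<code>/welcome.mp3` output path are hypothetical and do not reflect the node's actual configuration schema.

```python
# Language names mapped to ISO 639-1 codes, as in the example above.
LANGUAGE_CODES = {"Spanish": "es", "French": "fr"}


def build_jobs(translations):
    """Build one hypothetical TTS job description per translated text.

    translations maps a language name (e.g. "Spanish") to already translated text.
    """
    jobs = []
    for language, text in translations.items():
        code = LANGUAGE_CODES[language]
        jobs.append({
            "model": "ElevenLabs Multilingual V2",
            "language": code,
            "voice_stability": 0.6,
            "similarity": 0.8,
            "text": text,
            "output_path": f"audio/{code}/welcome.mp3",  # illustrative path
        })
    return jobs


for job in build_jobs({"Spanish": "Hola", "French": "Bonjour"}):
    print(job["language"], job["output_path"])
```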
Cost Optimization Tips
- Model Selection:
  - Use GPT-4o Mini TTS for high-volume, cost-sensitive applications
  - Use ElevenLabs Flash V2.5 for rapid prototyping
  - Reserve HD models for final production content
- Text Preprocessing:
  - Remove unnecessary characters and formatting
  - Batch similar content together
  - Use shorter text segments for testing
- Voice Settings:
  - Start with default settings and adjust incrementally
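The text preprocessing tip can be sketched as a small cleanup function that runs before the text reaches the node. This is a minimal example, assuming the input may contain HTML-style tags and a few Markdown markers. `clean_for_tts` is a hypothetical helper, and the set of characters it removes is a starting point rather than a complete list.

```python
import re


def clean_for_tts(text: str) -> str:
    """Strip common markup and collapse whitespace before synthesis."""
    text = re.sub(r"<[^>]+>", " ", text)            # replace HTML-style tags with a space
    text = re.sub(r"[*#`~]+", "", text)             # drop a few Markdown markers
    text = re.sub(r"\s+", " ", text)                # collapse runs of whitespace
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)    # no space before punctuation
    return text.strip()


print(clean_for_tts("## Chapter 1\n\n**Hello**,   <b>world</b>!"))
```

Fewer stray characters means fewer billed characters and fewer symbols read aloud by mistake.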
Required AWS IAM Roles and Permissions
When using AWS Polly, ensure your IAM user has the following permissions:
- polly:SynthesizeSpeech
- polly:DescribeVoices
When saving to S3 (via File Writer):
- s3:PutObject
- s3:GetObject
- s3:ListBucket
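The permissions above can be combined into a single IAM policy document. The sketch below builds one as a Python dictionary and prints it as JSON. The bucket name `my-audio-bucket`, the `Sid` labels, and the `build_policy` helper are placeholders introduced for illustration. Scoping the S3 statements to one bucket is a suggested practice, not a requirement stated on this page.

```python
import json


def build_policy(bucket_name: str) -> dict:
    """Return an IAM policy granting the Polly and S3 actions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PollySynthesis",
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech", "polly:DescribeVoices"],
                "Resource": "*",
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Sid": "AudioObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # ListBucket applies to the bucket itself, not to its objects.
                "Sid": "AudioBucketListing",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


print(json.dumps(build_policy("my-audio-bucket"), indent=2))
```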
Output Formats
The Text to Speech node outputs audio in the following formats:
- OpenAI: MP3, OPUS, AAC, FLAC, WAV, PCM
- ElevenLabs: MP3 (various bitrates and formats available)
- AWS Polly: MP3, OGG, PCM
Useful Resources
- OpenAI TTS API Documentation: Official OpenAI text-to-speech documentation
- ElevenLabs API Documentation: Comprehensive ElevenLabs API guide
- AWS Polly Documentation: Complete AWS Polly service documentation
- File Writer Node Documentation: Learn how to save generated audio files
Troubleshooting
Common Issues
- Audio Quality Issues:
  - Try using a higher-quality model (TTS 1 HD for OpenAI)
  - Adjust stability and similarity settings for ElevenLabs
  - Ensure text is properly formatted without special characters
- Voice Sounds Unnatural:
  - Reduce voice speed if too fast
  - Adjust style exaggeration settings
  - Try different voices to find the best match
- Language Support:
  - Verify the selected model supports your target language
  - Use ElevenLabs Multilingual V2 for best multilingual support
  - Check AWS Polly regional voice availability
- API Errors:
  - Verify credentials are valid and active
  - Check API usage limits and quotas
  - Ensure proper IAM permissions for AWS Polly
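If you call a provider from your own code and run into usage limits, retrying with exponential backoff is a common remedy. The sketch below is generic and not tied to any provider SDK. `TransientTTSError` is a hypothetical exception standing in for whatever rate-limit or quota error your client raises, and `call_with_backoff` is an illustrative helper.

```python
import time


class TransientTTSError(Exception):
    """Hypothetical error type standing in for a rate-limit or quota response."""


def call_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry on TransientTTSError, doubling the wait each time.

    The last failure is re-raised once max_attempts calls have been made.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientTTSError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the helper can be tested without actually waiting.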
By following these guidelines and leveraging the appropriate model for your use case, you can create high-quality speech synthesis for any application, from audiobooks and podcasts to multilingual content and accessibility features.