Text to Speech

Convert text into natural-sounding speech using advanced AI models from multiple providers.

What is the Text to Speech Node?

The Text to Speech (TTS) node is a unified interface to several text-to-speech services. It converts text into spoken audio using models from OpenAI, ElevenLabs, and AWS Polly. Each provider offers different voices, languages, and quality levels, so you can choose the option that best fits your use case.

Supported Models

OpenAI Models

GPT-4o Mini TTS: Cost-effective, fast generation with high-quality output
TTS 1: Standard quality with fast generation
TTS 1 HD: High-definition audio quality

ElevenLabs Models

ElevenLabs Flash V2.5: Ultra-fast generation with excellent quality
ElevenLabs Turbo V2.5: Balanced speed and quality for production use
ElevenLabs Multilingual V2: Support for 30+ languages with natural intonation

Amazon AWS Models

AWS Polly: Multiple voice engines (Standard, Neural, Long-form, Generative) with extensive language support

How to use it?

Add the Text to Speech node: Drag and drop the Text to Speech node into your workflow from the Speech category.
Select Your Model: Choose from the available providers and models based on your requirements:
- For speed: Use ElevenLabs Flash V2.5 or OpenAI GPT-4o Mini TTS
- For quality: Use OpenAI TTS 1 HD or ElevenLabs Turbo V2.5
- For multilingual: Use ElevenLabs Multilingual V2 or AWS Polly
Configure Credentials: Select appropriate credentials based on your chosen model:
- OpenAI models require OpenAI API credentials
- ElevenLabs models require an ElevenLabs API Key
- AWS Polly requires AWS credentials
Provider-Specific Configuration:
- OpenAI Configuration
  - Voice ID: Choose from 10 voice options (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer)
  - Speed: Adjust playback speed from 0.25x to 4x (default: 1.0)
- ElevenLabs Configuration
  - Voice ID: Enter the ID of the voice you want to use. The voice must be in your ElevenLabs library. With managed credentials, only the default voices are supported.
  - Language: Optionally enforce a specific language (supports 30+ languages)
  - Voice Stability: Control consistency (0-1, default: 0.5)
  - Voice Speed: Adjust speaking rate (0.7-1.2, default: 1.0)
  - Similarity: Control voice similarity to original (0-1, default: 0.75)
  - Style Exaggeration: Adjust emotional expression (0-1, default: 0)
  - Speaker Boost: Enable for enhanced clarity (default: true)
  - Seed: Optional seed for reproducible results (0-4,294,967,295)
- AWS Polly Configuration
  - Region: Select AWS region for processing
  - Voice Engine: Choose from Standard, Neural, Long-form, or Generative
  - Voice ID: Select from 60+ voices across multiple languages and accents
Connect Input: Connect a text input to provide the content you want to convert to speech.
Connect Output: The audio output can be connected to:
- File Writer to save the audio file
- Other nodes for further processing
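
The numeric ranges listed in the provider configuration above are easy to get wrong when values come from another node or a spreadsheet. The following is a minimal sketch of a range check that uses only the ranges and defaults stated on this page. The class and field names (`ElevenLabsSettings`, `style_exaggeration`, and so on) are hypothetical and are not part of the node's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ranges copied from the configuration lists above; all names are illustrative.
OPENAI_SPEED = (0.25, 4.0)
ELEVENLABS_RANGES = {
    "stability": (0.0, 1.0),
    "speed": (0.7, 1.2),
    "similarity": (0.0, 1.0),
    "style_exaggeration": (0.0, 1.0),
}
SEED_RANGE = (0, 4_294_967_295)


@dataclass
class ElevenLabsSettings:
    # Defaults match the defaults documented above.
    stability: float = 0.5
    speed: float = 1.0
    similarity: float = 0.75
    style_exaggeration: float = 0.0
    speaker_boost: bool = True
    seed: Optional[int] = None

    def problems(self) -> List[str]:
        """Return human-readable range violations (empty list if valid)."""
        found = []
        for name, (low, high) in ELEVENLABS_RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                found.append(f"{name}={value} is outside {low}-{high}")
        if self.seed is not None and not SEED_RANGE[0] <= self.seed <= SEED_RANGE[1]:
            found.append(f"seed={self.seed} is outside {SEED_RANGE[0]}-{SEED_RANGE[1]}")
        return found


def openai_speed_ok(speed: float) -> bool:
    """Check an OpenAI speed value against the documented 0.25x to 4x range."""
    return OPENAI_SPEED[0] <= speed <= OPENAI_SPEED[1]
```

Calling `ElevenLabsSettings(speed=1.5).problems()` returns one message, because 1.5 exceeds the documented maximum of 1.2.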

Example Task: Creating an Audiobook

Objective: Convert written text into a professional audiobook with natural-sounding narration.

Step-by-Step Setup

Add a Text Input:
- Drag and drop a Text Input node into your workflow
- Enter or paste the text you want to convert to speech
Add and Configure Text to Speech:
- Drag the Text to Speech node into your workflow
- Select Model: Choose "OpenAI TTS 1 HD" for high-quality narration
- Select Credentials: Choose your OpenAI credentials
- Select Voice: Choose "Nova" for a warm, engaging narration voice
- Set Speed: Keep at 1.0 for natural pacing
Connect Text Input to TTS:
- Connect the output of the Text Input node to the text input of the TTS node
Add File Writer:
- Drag a File Writer node to save the generated audio
- Storage Provider: Select AWS S3
- File Path: Enter audiobooks/chapter-1.mp3
- Bucket Name: Enter your S3 bucket name
Connect TTS to File Writer:
- Connect the audio output from the TTS node to the file input of the File Writer
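
For readers who want to see what the audiobook settings correspond to at the API level, here is a sketch of the request body that OpenAI's speech endpoint accepts for the same choices (TTS 1 HD, Nova, speed 1.0, MP3). It assumes the node maps its fields one-to-one onto the API, which this page does not confirm. The helper name `build_speech_request` is hypothetical.

```python
def build_speech_request(text: str) -> dict:
    """Build a request body mirroring the audiobook example's settings.

    The keys are the documented parameters of OpenAI's speech endpoint;
    whether the node sends exactly this body is an assumption.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "tts-1-hd",       # "OpenAI TTS 1 HD" in the node
        "voice": "nova",           # "Nova" in the node
        "input": text,
        "speed": 1.0,              # natural pacing
        "response_format": "mp3",  # matches audiobooks/chapter-1.mp3
    }
```

Keeping the settings in one function makes it easy to compare a chapter rendered through the node against one rendered directly through the API.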

Example Task: Multilingual Content Generation

Objective: Generate speech in multiple languages for international audiences.

Step-by-Step Setup

Configure for Multilingual Output:
- Add the Text to Speech node
- Select Model: Choose "ElevenLabs Multilingual V2"
- Select Credentials: Use Nocodo Managed Credentials
- Language: Select your target language (e.g., "Spanish" (es) or "French" (fr))
- Voice Stability: Set to 0.6 for consistent pronunciation
- Similarity: Set to 0.8 for authentic accent
Connect Your Workflow:
- Connect translated text input to the TTS node
- Connect the audio output to your desired destination
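
When the same content is produced in several languages, it can help to write down one settings record per language before configuring the nodes. The sketch below does that with the values from this example. The dictionary keys and the `plan_jobs` helper are hypothetical, and the language table covers only the two languages named above.

```python
# Display name -> language code; only the two examples named in the text.
LANGUAGE_CODES = {"Spanish": "es", "French": "fr"}


def plan_jobs(translations: dict) -> list:
    """Return one settings record per translated text.

    `translations` maps a language display name to its translated text.
    A language missing from LANGUAGE_CODES raises KeyError.
    """
    jobs = []
    for language, text in translations.items():
        jobs.append({
            "model": "ElevenLabs Multilingual V2",
            "language": LANGUAGE_CODES[language],
            "voice_stability": 0.6,  # value suggested in the example
            "similarity": 0.8,       # value suggested in the example
            "text": text,
        })
    return jobs
```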

Cost Optimization Tips

Model Selection:
- Use GPT-4o Mini TTS for high-volume, cost-sensitive applications
- Use ElevenLabs Flash V2.5 for rapid prototyping
- Reserve HD models for final production content
Text Preprocessing:
- Remove unnecessary characters and formatting
- Batch similar content together
- Use shorter text segments for testing
Voice Settings:
- Start with default settings and adjust incrementally
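
The Text Preprocessing tips above can be applied before the text reaches the node. This is a minimal sketch, assuming the input is lightly formatted Markdown-style text. The function names are hypothetical and the patterns are deliberately simple, so review the output before a large run.

```python
import re


def clean_for_tts(text: str) -> str:
    """Strip common formatting marks and collapse whitespace."""
    # Replace [label](url) links with just the label.
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Drop heading markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    # Drop emphasis and inline-code markers (this also removes underscores
    # inside words, which is usually fine for spoken output).
    text = re.sub(r"[*_`~]+", "", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def preview_segment(text: str, max_chars: int = 200) -> str:
    """Return a short leading segment for test runs, cut at a word boundary."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars + 1)
    return text[: cut if cut > 0 else max_chars]
```

Running a test generation on `preview_segment(clean_for_tts(chapter))` first keeps trial runs short while voice settings are still being tuned.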

Required AWS IAM Roles and Permissions

When using AWS Polly, ensure the IAM user or role behind your AWS credentials has the following permissions:

polly:SynthesizeSpeech
polly:DescribeVoices

When saving to S3 (via File Writer):

s3:PutObject
s3:GetObject
s3:ListBucket
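
The permissions above can be combined into a single IAM policy document. The sketch below builds one as a Python dictionary and prints it as JSON. The bucket name is a placeholder you must replace, and the statement IDs are arbitrary labels. Object actions are scoped to the bucket's objects and `s3:ListBucket` to the bucket itself, which is how S3 expects these resources to be written.

```python
import json


def build_policy(bucket: str) -> dict:
    """Return an IAM policy granting the Polly and S3 actions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PollySynthesis",
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech", "polly:DescribeVoices"],
                "Resource": "*",
            },
            {
                "Sid": "AudioObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "AudioBucketListing",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


# "my-audio-bucket" is a placeholder, not a real bucket.
print(json.dumps(build_policy("my-audio-bucket"), indent=2))
```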

Output Formats

The Text to Speech node outputs audio in the following formats:

OpenAI: MP3, OPUS, AAC, FLAC, WAV, PCM
ElevenLabs: MP3 (various bitrates and formats available)
AWS Polly: MP3, OGG, PCM
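
A mismatch between the provider's output format and the File Writer path (for example, a `.flac` path with AWS Polly) is easy to miss. The sketch below checks a file path against the formats listed above. It assumes the file extension equals the lowercase format name, which is a simplification, and the `extension_matches` helper is hypothetical.

```python
from pathlib import PurePosixPath

# Formats per provider, as listed above (lowercased).
OUTPUT_FORMATS = {
    "OpenAI": {"mp3", "opus", "aac", "flac", "wav", "pcm"},
    "ElevenLabs": {"mp3"},
    "AWS Polly": {"mp3", "ogg", "pcm"},
}


def extension_matches(provider: str, file_path: str) -> bool:
    """Return True if the path's extension is a listed format for the provider."""
    extension = PurePosixPath(file_path).suffix.lstrip(".").lower()
    return extension in OUTPUT_FORMATS[provider]
```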

Useful Resources

OpenAI TTS API Documentation: Official OpenAI text-to-speech documentation
ElevenLabs API Documentation: Comprehensive ElevenLabs API guide
AWS Polly Documentation: Complete AWS Polly service documentation
File Writer Node Documentation: Learn how to save generated audio files

Troubleshooting

Common Issues

Audio Quality Issues:
- Try using a higher-quality model (TTS 1 HD for OpenAI)
- Adjust stability and similarity settings for ElevenLabs
- Ensure text is properly formatted without special characters
Voice Sounds Unnatural:
- Reduce voice speed if too fast
- Adjust style exaggeration settings
- Try different voices to find the best match
Language Support:
- Verify the selected model supports your target language
- Use ElevenLabs Multilingual V2 for best multilingual support
- Check AWS Polly regional voice availability
API Errors:
- Verify credentials are valid and active
- Check API usage limits and quotas
- Ensure proper IAM permissions for AWS Polly

By following these guidelines and choosing the model that fits your use case, you can produce high-quality speech for applications ranging from audiobooks and podcasts to multilingual content and accessibility features.
