
How to Generate Audio with Gemini Text-to-Speech

Updated March 5, 2026
Tags: gemini, tts, text-to-speech, audio generation, pcm, wav, voices

Generating Audio with Gemini Text-to-Speech

Two approaches are covered here: simple single-call generation, and pipeline-grade parallel generation that produces audio per slide. Both are valid.

Models

Approach A: Basic Single-call TTS

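// Assumes the @google/genai SDK and an API key in the GEMINI_API_KEY environment variable.
import { GoogleGenAI } from '@google/genai';

const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const text = 'Hello from Gemini text-to-speech.'; // sample input; replace with your own text

const response = await genAI.models.generateContent({
  model: 'gemini-2.5-pro-preview-tts',
  contents: text,
  config: {
    responseModalities: ['AUDIO'], // request audio output instead of text
    speechConfig: {
      // Prebuilt voice used for the generated speech
      voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Leda' } },
    },
  },
});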

Example voices often used: Aoede, Charon, Fenrir, Kore, Leda, Puck.

Approach B: Pipeline Per-slide TTS
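
A per-slide pipeline could be sketched as follows, assuming each slide's narration is a string and that a function wraps the single-call request from Approach A. The helper name mapWithConcurrency and the idea of a fixed concurrency limit are assumptions made for illustration, not part of the Gemini SDK.

```typescript
// Run `worker` over every item with at most `limit` calls in flight.
// Results keep the input order, so results[i] belongs to items[i].
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner keeps claiming items until none are left.
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  };

  // Start at least one runner, and never more runners than items.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}
```

A call might look like mapWithConcurrency(slideTexts, 3, synthesize), where slideTexts and synthesize are hypothetical names for the list of narration strings and the function that performs one TTS request. In this sketch a single rejected call rejects the whole batch; retries are left out.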

Extract and Convert Audio

Gemini returns raw PCM audio as a base64 string (24 kHz, mono, 16-bit). Raw PCM carries no header, so convert it to WAV for downstream compatibility.

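// With the @google/genai SDK, candidates sit directly on the response object.
const parts = response.candidates?.[0]?.content?.parts;
const audioPart = parts?.find((part: any) => part.inlineData?.data);
if (!audioPart) throw new Error('No audio data in response');
const pcmBytes = Buffer.from(audioPart.inlineData.data, 'base64');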
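
The WAV conversion itself can be sketched as prepending a 44-byte RIFF header to the PCM bytes. The function name pcmToWav is hypothetical, the defaults follow the format stated above, and the samples are assumed to be little-endian, which is what the WAV container expects.

```typescript
// Wrap raw PCM samples in a minimal 44-byte RIFF/WAVE header.
// Defaults match the format described above: 24 kHz, mono, 16-bit.
// Assumption: the samples are little-endian, as WAV requires.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  // Write an ASCII chunk tag such as 'RIFF' at the given offset.
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };

  writeTag(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, 'data');
  view.setUint32(40, pcm.length, true); // size of the sample data
  wav.set(pcm, 44); // samples follow directly after the header
  return wav;
}
```

Because a Node Buffer is a Uint8Array, pcmBytes can be passed in directly, and the result can be written to disk with fs.writeFileSync('speech.wav', pcmToWav(pcmBytes)).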

Cost and Logging

Which Approach to Choose