Converting Raw PCM to WAV Format
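Gemini TTS returns raw PCM audio with no WAV header, so tools that expect a WAV file cannot read it as-is. Prepend a 44-byte WAV header that describes the sample rate, channel count, and bit depth.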
WAV Header Structure
export function createWavHeader(dataLength: number, sampleRate: number, channels: number, bitsPerSample: number): Buffer {
  const header = Buffer.alloc(44);
  // RIFF chunk descriptor
  header.write('RIFF', 0); // ChunkID
  header.writeUInt32LE(36 + dataLength, 4); // ChunkSize (file size minus the first 8 bytes)
  header.write('WAVE', 8); // Format
  // fmt sub-chunk
  header.write('fmt ', 12); // Subchunk1ID (note the trailing space)
  header.writeUInt32LE(16, 16); // Subchunk1Size (16 for PCM)
  header.writeUInt16LE(1, 20); // AudioFormat (1 = PCM)
  header.writeUInt16LE(channels, 22); // NumChannels
  header.writeUInt32LE(sampleRate, 24); // SampleRate
  header.writeUInt32LE(sampleRate * channels * bitsPerSample / 8, 28); // ByteRate (bytes per second)
  header.writeUInt16LE(channels * bitsPerSample / 8, 32); // BlockAlign (bytes per sample frame)
  header.writeUInt16LE(bitsPerSample, 34); // BitsPerSample
  // data sub-chunk
  header.write('data', 36); // Subchunk2ID
  header.writeUInt32LE(dataLength, 40); // Subchunk2Size (PCM payload length in bytes)
  return header;
}
Complete Conversion
export function pcmToWav(pcmBuffer: Buffer, sampleRate: number = 24000, channels: number = 1, bitsPerSample: number = 16): Buffer {
  // Build a header sized for this payload, then prepend it to the PCM bytes
  const header = createWavHeader(pcmBuffer.length, sampleRate, channels, bitsPerSample);
  return Buffer.concat([header, pcmBuffer]);
}
Usage with Gemini TTS
import * as fs from 'fs';
import { pcmToWav } from './audio-utils';
// Gemini returns raw PCM, base64-encoded (audioPart is the audio part of the response)
const pcmBuffer = Buffer.from(audioPart.inlineData.data, 'base64');
// Convert to WAV
const wavBuffer = pcmToWav(pcmBuffer, 24000, 1, 16);
// Save WAV file
fs.writeFileSync('output.wav', wavBuffer);
Constants for Gemini TTS
const AUDIO_CONFIG = {
  SAMPLE_RATE: 24000, // 24 kHz
  CHANNELS: 1, // Mono
  BYTES_PER_SAMPLE: 2, // 16-bit = 2 bytes (pass bitsPerSample = 16 to pcmToWav)
};
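These constants also determine how many bytes one second of audio occupies, which is useful for estimating clip length from a raw PCM buffer. The sketch below redeclares the values so it runs on its own; `pcmDurationSeconds` is a hypothetical helper name, not part of any library.

```typescript
// Values mirror AUDIO_CONFIG above; redeclared so the snippet is self-contained.
const SAMPLE_RATE = 24000;
const CHANNELS = 1;
const BYTES_PER_SAMPLE = 2;

// Hypothetical helper: playback length in seconds of a raw PCM payload.
function pcmDurationSeconds(byteLength: number): number {
  const bytesPerSecond = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE; // 48000
  return byteLength / bytesPerSecond;
}

// createWavHeader takes bits rather than bytes per sample.
const BITS_PER_SAMPLE = BYTES_PER_SAMPLE * 8; // 16

console.log(pcmDurationSeconds(96000)); // 2
```

The same `bytesPerSecond` figure is what the header stores in its ByteRate field.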
Header Field Explanations
| Offset (bytes) | Field | Size (bytes) | Description |
|---|---|---|---|
| 0-3 | ChunkID | 4 | "RIFF" |
| 4-7 | ChunkSize | 4 | 36 + dataLength |
| 8-11 | Format | 4 | "WAVE" |
| 12-15 | Subchunk1ID | 4 | "fmt " |
| 16-19 | Subchunk1Size | 4 | 16 for PCM |
| 20-21 | AudioFormat | 2 | 1 = PCM |
| 22-23 | NumChannels | 2 | 1 = mono |
| 24-27 | SampleRate | 4 | Samples per second (24000 for Gemini TTS) |
| 28-31 | ByteRate | 4 | SampleRate × Channels × BitsPerSample/8 |
| 32-33 | BlockAlign | 2 | Channels × BitsPerSample/8 |
| 34-35 | BitsPerSample | 2 | Bits per sample (16 for Gemini TTS) |
| 36-39 | Subchunk2ID | 4 | "data" |
| 40-43 | Subchunk2Size | 4 | dataLength |
| 44+ | Data | - | PCM audio data |
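One way to check a generated file against this table is to read the fields back from the same offsets. The following is a minimal sketch, assuming the canonical 44-byte layout shown above; `parseWavHeader` and `WavHeaderInfo` are hypothetical names, and WAV files from other tools may contain extra chunks that this does not handle.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical shape for the numeric fields of the 44-byte header.
interface WavHeaderInfo {
  audioFormat: number;
  channels: number;
  sampleRate: number;
  byteRate: number;
  blockAlign: number;
  bitsPerSample: number;
  dataLength: number;
}

// Reads each field from the offset listed in the table.
function parseWavHeader(wav: Buffer): WavHeaderInfo {
  if (wav.length < 44) {
    throw new Error('Buffer is shorter than a 44-byte WAV header');
  }
  // Chunk IDs are 4 ASCII characters each.
  const tag = (offset: number): string => wav.toString('ascii', offset, offset + 4);
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ' || tag(36) !== 'data') {
    throw new Error('Not a canonical 44-byte PCM WAV header');
  }
  return {
    audioFormat: wav.readUInt16LE(20),
    channels: wav.readUInt16LE(22),
    sampleRate: wav.readUInt32LE(24),
    byteRate: wav.readUInt32LE(28),
    blockAlign: wav.readUInt16LE(32),
    bitsPerSample: wav.readUInt16LE(34),
    dataLength: wav.readUInt32LE(40),
  };
}
```

For output produced with the Gemini TTS settings in this document, the parsed values would be expected to show `sampleRate` 24000, `channels` 1, and `bitsPerSample` 16, with `dataLength` equal to the file size minus 44.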
Byte Order
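All multi-byte numeric fields use little-endian byte order, which is why the code writes them with writeUInt32LE and writeUInt16LE. The four-character chunk IDs ("RIFF", "WAVE", "fmt ", "data") are plain ASCII written in reading order.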
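To illustrate what little-endian means in practice, the short sketch below writes the sample rate 24000 (0x5DC0) in both byte orders using Node's `Buffer` and prints the resulting bytes. The variable names are illustrative only.

```typescript
import { Buffer } from 'node:buffer';

// Little-endian: least significant byte first (what WAV expects).
const le = Buffer.alloc(4);
le.writeUInt32LE(24000, 0);

// Big-endian: most significant byte first (shown only for contrast).
const be = Buffer.alloc(4);
be.writeUInt32BE(24000, 0);

console.log(le.toString('hex')); // c05d0000
console.log(be.toString('hex')); // 00005dc0
```

Writing a field with the big-endian variant by mistake would make a reader interpret the sample rate as a very different number, so a wrong byte order typically shows up as distorted playback speed or a file that fails to open.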