Generating Video Scripts with Gemini AI
Either of two output schemas works, depending on your pipeline: the legacy scene-based schema or the newer slides-and-highlights schema.
Approach A: Legacy Scene-based Schema
Good for standalone script generation or older renderers.
{
  title: string;
  description: string;
  scenes: Array<{
    sceneNumber: number;
    headline?: string;
    narration: string;
    visualDescription: string;
    duration: number;
  }>;
}
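Model output is only a string until it is parsed and checked. The TypeScript sketch below shows one way to verify that parsed JSON matches the legacy schema before it reaches a renderer. The names isRecord and isLegacyScript are hypothetical, and the two extra rules (at least one scene, positive duration) are assumptions, not part of the schema above.

```typescript
// Shape of the legacy scene-based schema listed above.
interface LegacyScene {
  sceneNumber: number;
  headline?: string;
  narration: string;
  visualDescription: string;
  duration: number;
}

interface LegacyScript {
  title: string;
  description: string;
  scenes: LegacyScene[];
}

// Helper: true when value is a non-null, non-array object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical runtime guard for parsed model output.
// Assumption: an empty scene list or a non-positive duration counts as invalid.
function isLegacyScript(value: unknown): value is LegacyScript {
  if (!isRecord(value)) return false;
  if (typeof value.title !== "string" || typeof value.description !== "string") return false;
  const scenes = value.scenes;
  if (!Array.isArray(scenes) || scenes.length === 0) return false;
  return scenes.every((scene: unknown) => {
    if (!isRecord(scene)) return false;
    return (
      typeof scene.sceneNumber === "number" &&
      (scene.headline === undefined || typeof scene.headline === "string") &&
      typeof scene.narration === "string" &&
      typeof scene.visualDescription === "string" &&
      typeof scene.duration === "number" &&
      scene.duration > 0
    );
  });
}
```

Returning a type predicate lets the caller use the value as a LegacyScript after a single check, instead of casting the JSON.parse result blindly.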
Approach B: Pipeline Schema (Slides + Highlights)
Good for modern multi-stage pipelines.
{
  videoFormat: 'long' | 'single_short' | 'multi_short';
  title: string;
  description: string;
  thumbnailDescription: string;
  highlights: Array<{ title: string; description: string; reason: string; slides: number[] }>;
  slides: Array<{
    slideIndex: number;
    headline: string;
    imageDescription: string; // English
    audioNarration: string; // Japanese
    estimatedDuration: number;
    directorNotes?: string;
    audioProfile?: 'urgent' | 'calm' | 'excited' | 'serious' | 'casual' | 'dramatic';
  }>;
}
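Type checks alone do not catch cross-field mistakes in the pipeline schema. The sketch below (checkPipelineScript is a hypothetical name) assumes that each number in highlights[].slides refers to a slideIndex, and reports duplicate indices, dangling highlight references, non-positive durations, and enum values outside the listed options. The enum checks matter because a JSON.parse result is only cast to the type, never validated.

```typescript
type VideoFormat = "long" | "single_short" | "multi_short";
type AudioProfile = "urgent" | "calm" | "excited" | "serious" | "casual" | "dramatic";

interface Slide {
  slideIndex: number;
  headline: string;
  imageDescription: string; // English
  audioNarration: string; // Japanese
  estimatedDuration: number;
  directorNotes?: string;
  audioProfile?: AudioProfile;
}

interface Highlight {
  title: string;
  description: string;
  reason: string;
  slides: number[];
}

interface PipelineScript {
  videoFormat: VideoFormat;
  title: string;
  description: string;
  thumbnailDescription: string;
  highlights: Highlight[];
  slides: Slide[];
}

const VIDEO_FORMATS: readonly string[] = ["long", "single_short", "multi_short"];
const AUDIO_PROFILES: readonly string[] = ["urgent", "calm", "excited", "serious", "casual", "dramatic"];

// Hypothetical consistency check; returns an empty array when nothing looks wrong.
function checkPipelineScript(script: PipelineScript): string[] {
  const problems: string[] = [];
  if (!VIDEO_FORMATS.includes(script.videoFormat)) {
    problems.push(`unknown videoFormat ${script.videoFormat}`);
  }
  const indices = new Set<number>();
  for (const slide of script.slides) {
    if (indices.has(slide.slideIndex)) problems.push(`duplicate slideIndex ${slide.slideIndex}`);
    indices.add(slide.slideIndex);
    if (slide.estimatedDuration <= 0) {
      problems.push(`slide ${slide.slideIndex}: non-positive estimatedDuration`);
    }
    if (slide.audioProfile !== undefined && !AUDIO_PROFILES.includes(slide.audioProfile)) {
      problems.push(`slide ${slide.slideIndex}: unknown audioProfile ${slide.audioProfile}`);
    }
  }
  // Assumption: highlight.slides holds slideIndex values, not array positions.
  for (const highlight of script.highlights) {
    for (const ref of highlight.slides) {
      if (!indices.has(ref)) problems.push(`highlight "${highlight.title}" references missing slide ${ref}`);
    }
  }
  return problems;
}
```

Collecting every problem instead of failing on the first one makes it easier to feed a single, complete correction request back to the model on retry.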
Prompting Rules (Works for Both)
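- Require JSON-only output, with no prose or Markdown fences around it.
- State explicitly which fields are written in which language (in Approach B, imageDescription is English and audioNarration is Japanese).
- Ground every fact in the provided source material.
- Require narration and visual guidance for every unit (scene or slide).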
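One way to apply these rules is to spell each one out in the prompt and still parse the response defensively. In the sketch below, buildScriptPrompt and parseJsonOnly are hypothetical helpers, and the field-to-language map is supplied by the caller rather than hard-coded. Where your Gemini client exposes it, setting the response MIME type to application/json in the generation config is a complementary safeguard.

```typescript
// Hypothetical input for the prompt builder.
interface PromptOptions {
  sourceText: string; // material every fact must come from
  schemaDescription: string; // e.g. the TypeScript shape from Approach A or B
  unitName: "scene" | "slide";
  fieldLanguages: Record<string, string>; // field name -> output language
}

function buildScriptPrompt(opts: PromptOptions): string {
  // Language boundaries: one explicit line per field, never implied.
  const languageLines = Object.entries(opts.fieldLanguages).map(
    ([field, language]) => `- Write "${field}" in ${language}.`,
  );
  return [
    // JSON-only output.
    "Respond with a single JSON object and nothing else: no prose, no Markdown code fences.",
    `The JSON must match this shape:\n${opts.schemaDescription}`,
    `Language rules:\n${languageLines.join("\n")}`,
    // Grounding in source material.
    "Use only facts stated in the source material below. Do not invent names, numbers, or dates.",
    // Narration and visual guidance per unit.
    `Every ${opts.unitName} must include both narration and visual guidance.`,
    `Source material:\n${opts.sourceText}`,
  ].join("\n\n");
}

// Even with JSON-only instructions, a model may wrap its answer in a
// Markdown code fence; strip one surrounding fence if present, then parse.
function parseJsonOnly(raw: string): unknown {
  const trimmed = raw.trim();
  const fenced = trimmed.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  const body = fenced?.[1] ?? trimmed;
  return JSON.parse(body);
}
```

JSON.parse still throws on anything that is not valid JSON, so callers should treat that error as a failed attempt and retry or fall back.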
Normalization and Ownership Rules
In stage-based pipelines:
- the selection stage owns the final format choice
- the script stage may normalize model output to the selected format
- saving the script should not overwrite selection-owned fields (e.g., video_format)
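A minimal sketch of these ownership rules. ScriptOutput, ScriptRow, normalizeToSelectedFormat, and buildScriptUpdate are hypothetical; only video_format is taken from the rules above, and the other column names are assumptions. The script stage replaces the model's videoFormat with the selected one, and the save path strips selection-owned columns from its update payload.

```typescript
type VideoFormat = "long" | "single_short" | "multi_short";

// Hypothetical model output (other schema fields omitted for brevity).
interface ScriptOutput {
  videoFormat: VideoFormat;
  title: string;
  description: string;
}

// Hypothetical persisted row; column names other than video_format are assumed.
interface ScriptRow {
  video_format: VideoFormat;
  title: string;
  description: string;
}

// Script stage: the selection stage's choice wins over the model's.
function normalizeToSelectedFormat(output: ScriptOutput, selected: VideoFormat): ScriptOutput {
  if (output.videoFormat !== selected) {
    console.warn(`model returned ${output.videoFormat}; normalizing to ${selected}`);
  }
  return { ...output, videoFormat: selected };
}

// Columns owned by the selection stage; the script save must never write them.
const SELECTION_OWNED: ReadonlyArray<keyof ScriptRow> = ["video_format"];

function buildScriptUpdate(output: ScriptOutput): Partial<ScriptRow> {
  const candidate: Partial<ScriptRow> = {
    video_format: output.videoFormat,
    title: output.title,
    description: output.description,
  };
  // Strip selection-owned columns even if the mapping above includes them.
  for (const key of SELECTION_OWNED) delete candidate[key];
  return candidate;
}
```

Filtering at the last step, rather than relying on the mapping to leave owned columns out, keeps the guarantee intact if someone later adds a field to the mapping.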
Cost Tracking
For every script-generation attempt, capture the token counts from the response's usageMetadata (promptTokenCount, candidatesTokenCount, totalTokenCount) and log them.
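A sketch of per-attempt usage logging. UsageMetadata mirrors a subset of the usageMetadata object on a Gemini response; recordUsage, the log record shape, and the pricing parameters are hypothetical, and no real prices are assumed. Models that report additional token counters may need those included in the estimate.

```typescript
// Subset of usageMetadata on a Gemini response; optional because not every
// response populates every field.
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
}

// Caller-supplied prices in USD per million tokens (hypothetical parameters).
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Hypothetical log record, one per generation attempt.
interface ScriptGenerationUsage {
  jobId: string;
  attempt: number;
  model: string;
  promptTokens: number;
  outputTokens: number;
  totalTokens: number;
  estimatedCostUsd: number;
}

function recordUsage(
  jobId: string,
  attempt: number,
  model: string,
  usage: UsageMetadata | undefined,
  pricing: Pricing,
): ScriptGenerationUsage {
  const promptTokens = usage?.promptTokenCount ?? 0;
  const outputTokens = usage?.candidatesTokenCount ?? 0;
  // Fall back to the sum when the total is not reported.
  const totalTokens = usage?.totalTokenCount ?? (promptTokens + outputTokens);
  const estimatedCostUsd =
    (promptTokens * pricing.inputPerMillionUsd + outputTokens * pricing.outputPerMillionUsd) / 1e6;
  const record: ScriptGenerationUsage = {
    jobId, attempt, model, promptTokens, outputTokens, totalTokens, estimatedCostUsd,
  };
  // One structured log line per attempt.
  console.log(JSON.stringify(record));
  return record;
}
```

Logging failed and retried attempts as well as the final successful one keeps per-script cost totals accurate.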