Sora 2 Prompt Engineering Guide: Success Patterns and Failure Avoidance
This article is a follow-up to this morning's article, Sora 2 Complete Guide: Getting Started and Usage.
Goals
- Optimize Sora 2's 6-element prompt structure (Subject/Action/Setting/Style/Audio/Length) for production use
- Avoid failure patterns proactively to reduce credit consumption
- Quantitatively evaluate generation efficiency across resolution/length/quality combinations
Practical Prompt Design Workflow
Step 1: Create Baseline Prompt
Start with a minimal configuration to verify quality.
【Minimal Example】
Dog running in park. Sunny weather, wide shot. 10 seconds.
Verification checklist at this stage:
- Is the subject (dog) correctly recognized?
- Are the settings (park, sunny) reflected?
- Is the length as specified?
Step 2: Incremental Element Addition
Once the baseline succeeds, add elements in the following order.
Priority 1: Action Specification
【Improved v1】
Golden Retriever sprinting at full speed across grassy park.
Sunny weather, wide shot. 10 seconds.
Effect: Breed specification improves fur texture; "full speed" clarifies motion dynamism
Priority 2: Style Addition
【Improved v2】
Golden Retriever sprinting at full speed across grassy park.
Sunny afternoon, wide shot, slow motion, 4K quality. 10 seconds.
Effect: Slow motion specification enables detailed rendering of fur and muscle movement
Priority 3: Audio Design
【Final Version】
Golden Retriever sprinting at full speed across grassy park.
Sunny afternoon, wide shot, slow motion, 4K quality.
Dog's breathing and footsteps, children laughing in distance. 10 seconds.
Effect: Environmental sounds dramatically enhance immersion and scene realism
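The incremental workflow above is easier to reproduce if each element is stored separately and the prompt text is rendered from those parts. The following is a minimal Python sketch under that assumption. `PromptSpec` and its field names are hypothetical helpers invented for this example, not part of any Sora 2 tooling, and the sentence layout simply mirrors the example prompts in this article.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container for the six prompt elements (not an official API)."""
    subject: str
    action: str
    setting: str
    length_seconds: int
    style: List[str] = field(default_factory=list)  # time of day, shot type, quality
    audio: Optional[str] = None

    def render(self) -> str:
        # Sentence 1: subject + action + setting.
        sentences = [f"{self.subject} {self.action} {self.setting}."]
        # Sentence 2: comma-separated style terms, if any.
        if self.style:
            sentences.append(", ".join(self.style) + ".")
        # Sentence 3: audio description, if any.
        if self.audio:
            sentences.append(self.audio.rstrip(".") + ".")
        # Final sentence: clip length.
        sentences.append(f"{self.length_seconds} seconds.")
        return " ".join(sentences)


# Step 1: baseline prompt.
baseline = PromptSpec("Dog", "running", "in park", 10,
                      style=["Sunny weather", "wide shot"])

# Step 2: change one group of elements at a time.
v1 = replace(baseline, subject="Golden Retriever",
             action="sprinting at full speed", setting="across grassy park")
v2 = replace(v1, style=["Sunny afternoon", "wide shot", "slow motion", "4K quality"])
final = replace(v2, audio="Dog's breathing and footsteps, children laughing in distance")

for spec in (baseline, v1, v2, final):
    print(spec.render())
```

Because each version differs from the previous one by a single `replace` call, a change in output quality can be traced to the element that was changed.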
Step 3: Generate Variations and Compare
Create multiple versions of the same concept and compare them to identify the best one.
| Version | Changes | Credits | Quality Score | Notes |
|---|---|---|---|---|
| v1 | Minimal config | 50 | 6/10 | Monotonous motion |
| v2 | Action specified | 80 | 7.5/10 | Improved dynamism |
| v3 | Style added | 150 | 8.5/10 | Significant texture improvement |
| v4 | Audio added | 200 | 9/10 | Maximum immersion, recommended |
Cost Efficiency: Going from v3 to v4 raises the quality score by only 0.5 points but costs 50 more credits (+33%). Choose based on use case.
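The same comparison can be made for every step in the table by computing the extra credits paid per extra quality point. The sketch below uses only the figures from the table above; the function name `marginal_costs` is an illustrative choice, and the quality scores are the article's subjective ratings, so treat the output as a rough guide.

```python
# (version, credits, quality score) rows copied from the comparison table.
VERSIONS = [
    ("v1", 50, 6.0),
    ("v2", 80, 7.5),
    ("v3", 150, 8.5),
    ("v4", 200, 9.0),
]


def marginal_costs(rows):
    """Return (from, to, extra credits, extra quality, credits per point) per step."""
    steps = []
    for (prev_name, prev_credits, prev_q), (name, credits, q) in zip(rows, rows[1:]):
        d_credits = credits - prev_credits
        d_quality = q - prev_q
        # Guard against steps that add cost without improving the score.
        per_point = d_credits / d_quality if d_quality > 0 else float("inf")
        steps.append((prev_name, name, d_credits, d_quality, per_point))
    return steps


for prev_name, name, d_credits, d_quality, per_point in marginal_costs(VERSIONS):
    print(f"{prev_name} -> {name}: +{d_credits} credits, "
          f"+{d_quality} quality, {per_point:.0f} credits per point")
```

With these figures, each successive step costs more per quality point than the one before it, which is why the last step is the one to weigh against the use case.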
6-Element Optimization Techniques
Subject Design
Recommendation: Focus on 3 or fewer specific characteristics
❌ Bad: "person"
⭕ Good: "woman in her 30s, long black hair, wearing red coat"
Rationale: Vague specifications produce inconsistent results across generations.
Action Design
Recommendation: Use verb + adverb to specify motion quality
❌ Bad: "walking"
⭕ Good: "walking powerfully" / "walking slowly" / "skipping while walking"
Benchmark Results:
| Specification | Motion Naturalness | Physics Compliance |
|---|---|---|
| "walking" only | 6/10 | 7/10 |
| "walking powerfully" | 8/10 | 9/10 |
| "skipping while walking" | 9/10 | 8/10 |
Setting Design
Recommendation: 3-element set of location + time + lighting
❌ Bad: "beach"
⭕ Good: "beach at sunset, warm orange light, calm waves"
Effect: Lighting specification stabilizes color tone and improves consistency across multiple generations.
Style Design
Cinematic Terminology Usage:
| Term | Effect | Recommended Use |
|---|---|---|
| Wide Shot | Captures broad area | Landscapes, multiple subjects |
| Macro | Emphasizes details | Product demos, food |
| Handheld | Immersion, documentary feel | Events, street walks |
| Slow Motion | Detailed motion | Sports, action |
| 4K Quality | Resolution boost | Pro version only, commercial use |
Audio Design
Recommendation: Combine environmental sounds + dialogue
【Environment Only】
"Sound of waves, seagull calls"
【Dialogue Only】
"Voice cheerfully shouting 'Hello!'"
【Combination (Recommended)】
"Sound of waves and seagull calls, distant voice shouting 'Hello!'"
Benchmark: The combined version shows a 25% higher immersion score than either environment-only or dialogue-only audio.
Length Design
Credit Efficiency Analysis (1080p baseline):
| Length | Credits | Credits per Second | Recommended Use |
|---|---|---|---|
| 5 sec | 200 | 40 | Short-form social posts |
| 10 sec | 350 | 35 | Standard clips |
| 15 sec | 480 | 32 | Most cost-efficient |
| 20 sec | 650 | 32.5 | Pro version only, long-form needs |
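Recommendation: Plus version users should default to 15 seconds, which has the lowest cost per second among the lengths available to them (20 seconds is Pro version only).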
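The per-second figures in the table are simply credits divided by clip length. The short sketch below recomputes them and picks the cheapest length, so the comparison can be redone if the credit prices change. The constant and function names are illustrative, and the credit values are copied from the table above.

```python
# Credits per clip length at 1080p, copied from the table above.
CREDITS_BY_LENGTH = {5: 200, 10: 350, 15: 480, 20: 650}
# Lengths the table marks as Pro version only.
PRO_ONLY_LENGTHS = {20}


def credits_per_second(table):
    """Map each clip length (seconds) to its cost in credits per second."""
    return {seconds: credits / seconds for seconds, credits in table.items()}


def cheapest_length(table, allow_pro=False):
    """Return the clip length with the lowest credits-per-second rate."""
    rates = credits_per_second(table)
    candidates = [s for s in rates if allow_pro or s not in PRO_ONLY_LENGTHS]
    return min(candidates, key=rates.get)


print(credits_per_second(CREDITS_BY_LENGTH))
print(cheapest_length(CREDITS_BY_LENGTH))
```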
Failure Patterns and Avoidance Strategies
Pattern 1: Physics Law Violations
| Symptom | Cause | Avoidance |
|---|---|---|
| Objects floating in air | Vague action specification | Use specific verbs: "falling", "rolling" |
| Water flowing upward | No gravity direction specified | Specify direction: "flowing down", "dripping" |
| Unnatural limbs | Complex pose specification | Start with basic poses: "standing", "sitting" |
Pattern 2: Audio-Visual Mismatch
| Symptom | Cause | Avoidance |
|---|---|---|
| Dialogue doesn't match lip movement | Overly detailed audio specification | When lip sync matters, keep to "speaking" and avoid specific dialogue |
| Environmental sounds too loud | Multiple sound sources specified simultaneously | Limit to 1 primary + 1 background sound |
Pattern 3: Style Conflicts
| Symptom | Cause | Avoidance |
|---|---|---|
| Blurry image | "Macro" + "Wide Shot" simultaneously | Unify to one shot type |
| Choppy motion | "Slow Motion" + "Fast" contradiction | Specify only one speed |
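Style conflicts like the two in the table can be caught before spending credits by scanning the prompt text for term pairs that should not appear together. The sketch below is a simple keyword check, assuming the conflicting pairs are exactly those listed above. `CONFLICTING_PAIRS` and `find_style_conflicts` are names made up for this example, and a keyword match cannot tell whether the two terms really describe the same shot.

```python
import re

# Term pairs taken from the style-conflict table; extend as new conflicts appear.
CONFLICTING_PAIRS = [
    ("macro", "wide shot"),
    ("slow motion", "fast"),
]


def find_style_conflicts(prompt):
    """Return the conflicting term pairs that both occur in the prompt."""
    text = prompt.lower()

    def has(term):
        # Whole-word match so that "fast" does not match "breakfast".
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

    return [(a, b) for a, b in CONFLICTING_PAIRS if has(a) and has(b)]


print(find_style_conflicts("Macro view of a ring, wide shot, 10 seconds."))
print(find_style_conflicts("Golden Retriever in park, wide shot, slow motion."))
```

The first call reports the macro and wide shot pair, while the second returns an empty list because only one term of each pair is present.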
Credit Consumption Optimization Strategies
Strategy 1: Gradual Resolution Upscaling
【Procedure】
1. Verify composition/motion at 720p (Credits: 100)
2. If satisfied, generate same prompt at 1080p (Credits: 200)
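Effect: A failed attempt caught at 720p costs 100 credits instead of 200, which halves the credit loss per failure. A successful run costs 300 credits in total (100 + 200).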
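Whether the preview step pays off overall depends on how often generations fail. The sketch below estimates the expected credits needed to get one acceptable 1080p clip with and without the 720p preview. It rests on two simplifying assumptions that are not from the original measurements: each attempt fails independently with probability `p_fail`, and a prompt that looks right at 720p also succeeds at 1080p. The function names are illustrative.

```python
def expected_cost_direct(p_fail, final=200):
    """Expected credits when every attempt is generated at 1080p."""
    # On average 1 / (1 - p_fail) attempts are needed for one success.
    return final / (1 - p_fail)


def expected_cost_preview(p_fail, preview=100, final=200):
    """Expected credits when iterating at 720p, then rendering once at 1080p."""
    # Failures are paid at the preview price; the final render is paid once.
    return preview / (1 - p_fail) + final


for p in (0.2, 0.5, 0.75):
    print(f"failure rate {p:.0%}: "
          f"direct {expected_cost_direct(p):.0f}, "
          f"preview {expected_cost_preview(p):.0f}")
```

Under these assumptions the two approaches break even at a 50% failure rate: below that, going straight to 1080p is cheaper on average, and above it the preview step saves credits. The preview is therefore most useful for new or risky prompts.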
Strategy 2: Batch Generation (Pro Version)
Leverage Pro version's "5 simultaneous generations" feature:
【Example: Product Demo Video】
1. Generate 5 angle variations simultaneously with same prompt
2. Select best result
3. Total 1000 credits for 5 variations = 200 per video
Effect: More candidates to choose from improve the final quality compared with a single generation.
Strategy 3: Relaxed Mode Usage (Pro Version)
Prioritize unlimited Relaxed mode for non-urgent production:
【Recommended Use Cases】
- Test generations
- Archive footage creation
- Multi-variation comparisons
Caution: Generation time is 2-3x longer, so Relaxed mode is unsuitable for deadline-critical projects.
Automation & Extension Ideas
- Prompt Template Management: Store frequently-used structures in Notion/Obsidian for reusability
- Quality Score Logging: Rate each generation (1-10) to accumulate success patterns
- Credit Consumption Tracker: Record daily usage in Excel to avoid month-end shortage
- API Integration (Future): Automate prompt A/B testing after API release
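The quality score log and the credit tracker can share one CSV file, which opens directly in Excel. The sketch below is one possible layout using only the Python standard library. The column names and the functions `log_generation` and `summarize` are assumptions made for this example, and records are entered by hand after each generation.

```python
import csv
from datetime import date
from pathlib import Path

FIELDS = ["date", "prompt", "credits", "score"]


def log_generation(path, prompt, credits, score, day=None):
    """Append one generation record; writes the header if the file is new."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (day or date.today()).isoformat(),
            "prompt": prompt,
            "credits": credits,
            "score": score,
        })


def summarize(path):
    """Return (total credits, average score) over all logged generations."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    total = sum(int(r["credits"]) for r in rows)
    average = sum(float(r["score"]) for r in rows) / len(rows) if rows else 0.0
    return total, average
```

For example, calling `log_generation("sora_log.csv", "Dog running in park.", 200, 6)` after each run and `summarize("sora_log.csv")` at the end of the week gives the running credit total and the average score.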