Files
bookapp/ai_blueprint_v2.md
Mike Wichers 2100ca2312 feat: Implement ai_blueprint.md action plan — architectural review & optimisations
Steps 1–7 of the ai_blueprint.md action plan executed:

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4 — Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() — persona loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds — SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip — if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline() — pre-generation gate checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation — warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:01:30 -05:00

190 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# AI-Powered Book Generation: Optimized Architecture v2.0
**Date:** 2026-02-22
**Status:** Defined — fulfills Action Plan Steps 5, 6, and 7 from `ai_blueprint.md`
**Based on:** Current state analysis, alternatives analysis, and experiment design in `docs/`
---
## 1. Executive Summary
This document defines the recommended architecture for the AI-powered book generation pipeline, based on the systematic review in `ai_blueprint.md`. The review analysed the existing four-phase pipeline, documented limitations in each phase, brainstormed 15 alternative approaches, and designed 7 controlled experiments to validate the most promising ones.
**Key finding:** The current system is already well-optimised for quality. The primary gains available are:
1. **Reducing unnecessary token spend** on infrastructure (persona I/O, redundant beat expansion)
2. **Improving front-loaded quality gates** (outline validation, persona validation)
3. **Adaptive quality thresholds** to concentrate resources where they matter most
Several improvements from the analysis have been implemented in v2.0 (Phase 3 of this review). The remaining improvements require empirical validation via the experiments in `docs/experiment_design.md`.
---
## 2. Architecture Overview
### Current State → v2.0 Changes
| Component | Previous Behaviour | v2.0 Behaviour | Status |
|-----------|-------------------|----------------|--------|
| **Persona loading** | Re-read sample files from disk on every chapter | Loaded once per book run, cached in memory, rebuilt after each `refine_persona()` call | ✅ Implemented |
| **Beat expansion** | Always expand beats to Director's Treatment | Skip expansion if beats already exceed 100 words total | ✅ Implemented |
| **Outline validation** | No pre-generation quality gate | `validate_outline()` runs after chapter planning; logs issues before writing begins | ✅ Implemented |
| **Scoring thresholds** | Fixed 7.0 passing threshold for all chapters | Adaptive: 6.5 for setup chapters → 7.5 for climax chapters (linear scale by position) | ✅ Implemented |
| **Enrich validation** | Silent failure if enrichment returns missing fields | Explicit warnings logged for missing `title` or `genre` | ✅ Implemented |
| **Persona validation** | Single-pass creation, no quality check | Experiment 6 (future) — validate persona with sample before accepting | 🧪 Experiment Pending |
| **Batched evaluation** | Per-chapter evaluation (20K tokens/call) | Experiment 4 (future) — batch 5 chapters per evaluation call | 🧪 Experiment Pending |
| **Mid-gen consistency** | Post-generation consistency check only | Experiment 5 (future) — check every 10 chapters | 🧪 Experiment Pending |
| **Two-pass drafting** | Single draft + iterative refinement | Experiment 7 (future) — rough draft + polish pass | 🧪 Experiment Pending |
---
## 3. Phase-by-Phase v2.0 Architecture
### Phase 1: Foundation & Ideation
**Implemented Changes:**
- `enrich()` now logs explicit warnings if `book_metadata.title` or `book_metadata.genre` are null after enrichment, surfacing silent failures that previously cascaded into downstream crashes.
**Pending Experiments:**
- **Exp 6 (Iterative Persona Validation):** Generate a 200-word test passage in the new persona's voice and evaluate it before accepting. Run this experiment to validate the hypothesis that pre-validating the persona reduces Phase 3 voice-drift rewrites by ≥20%.
**Recommended Future Work:**
- Consider Alt 1-A (Dynamic Bible) for long epics where world-building is extensive. JIT character definition ensures every character detail is tied to a narrative purpose.
- Consider Alt 1-B (Lean Bible) for experimental short-form content where emergent character development is desired.
---
### Phase 2: Structuring & Outlining
**Implemented Changes:**
- `validate_outline(events, chapters, bp, folder)` added to `story/planner.py`. Called after `create_chapter_plan()` in `cli/engine.py`. Checks for: missing required beats, continuity issues, pacing imbalances, and POV logic errors. Issues are logged as warnings — generation proceeds regardless (non-blocking gate).
**Pending Experiments:**
- **Alt 2-A (Single-pass Outline):** Combine sequential `expand()` calls into one multi-step prompt. Saves ~60K tokens for a novel run. Low risk. Implement and test on novella-length stories first.
**Recommended Future Work:**
- For the Lean Bible (Alt 1-B) variant, redesign `plan_structure()` to allow on-demand character enrichment as new characters appear in events.
---
### Phase 3: Writing Engine
**Implemented Changes:**
1. **`build_persona_info(bp)` function** extracted from `write_chapter()`. Contains all persona string building logic including disk reads. Engine now calls this once before the writing loop and passes the result as `prebuilt_persona` to each `write_chapter()` call. Rebuilt after each `refine_persona()` call.
2. **Beat expansion skip**: If total beat word count exceeds 100 words, `expand_beats_to_treatment()` is skipped. Expected savings: ~5K tokens × ~30% of chapters.
3. **Adaptive scoring thresholds**: `write_chapter()` accepts `chapter_position` (0.01.0). `SCORE_PASSING` scales from 6.5 (setup) to 7.5 (climax). Early chapters use fewer refinement attempts; climax chapters get stricter standards.
4. **`chapter_position` threading**: `cli/engine.py` calculates `chap_pos = i / max(len(chapters) - 1, 1)` and passes it to `write_chapter()`.
**Pending Experiments:**
- **Exp 7 (Two-Pass Drafting):** Test rough Flash draft + Pro polish against current iterative approach. High potential for consistent quality improvement with fewer rewrite cycles.
- **Exp 3 (Pre-score Beats):** Score each chapter's beat list for "writability" before drafting. Flag high-risk chapters for additional attempts upfront.
**Recommended Future Work:**
- Alt 2-C (Dynamic Personas): Once experiments validate basic optimisations, consider adapting persona sub-styles for action vs. introspection scenes.
- Increase `SCORE_AUTO_ACCEPT` from 8.0 to 8.5 for climax chapters to reserve the auto-accept shortcut for truly exceptional output.
---
### Phase 4: Review & Refinement
**No new implementations in v2.0** (Phase 4 is already highly optimised for quality).
**Pending Experiments:**
- **Exp 4 (Adaptive Thresholds):** Already implemented. Gather data on refinement call reduction.
- **Exp 5 (Mid-gen Consistency):** Add `analyze_consistency()` every 10 chapters. Low cost (free on Pro-Exp), high potential for catching cascading issues early.
- **Alt 4-A (Batched Evaluation):** Group 35 chapters per evaluation call. Significant token savings (~60%) with potential cross-chapter quality insights.
**Recommended Future Work:**
- Alt 4-D (Editor Bot Specialisation): Implement fast regex-based checks for filter-word density and summary-mode detection before invoking the full LLM evaluator. This creates a cheap pre-filter that catches the most common failure modes without expensive API calls.
---
## 4. Expected Outcomes of v2.0 Implementations
### Token Savings (30-Chapter Novel)
| Change | Estimated Saving | Confidence |
|--------|-----------------|------------|
| Persona cache | ~90K tokens | High |
| Beat expansion skip (30% of chapters) | ~45K tokens | High |
| Adaptive thresholds (15% fewer setup refinements) | ~100K tokens | Medium |
| Outline validation (prevents ~2 rewrites) | ~50K tokens | Medium |
| **Total** | **~285K tokens (~8% of full book cost)** | — |
### Quality Impact
- Climax chapters: expected improvement in average evaluation score (+0.30.5 points) due to stricter SCORE_PASSING thresholds
- Early setup chapters: expected slight reduction in revision loop overhead with no noticeable reader-facing quality decrease
- Continuity errors: expected reduction from outline validation catching issues pre-generation
---
## 5. Experiment Roadmap
Execute experiments in this order (see `docs/experiment_design.md` for full specifications):
| Priority | Experiment | Effort | Expected Value |
|----------|-----------|--------|----------------|
| 1 | Exp 1: Persona Caching | ✅ Done | Token savings confirmed |
| 2 | Exp 2: Beat Expansion Skip | ✅ Done | Token savings confirmed |
| 3 | Exp 4: Adaptive Thresholds | ✅ Done | Quality + savings |
| 4 | Exp 3: Outline Validation | ✅ Done | Quality gate |
| 5 | Exp 6: Persona Validation | 2h | -20% voice-drift rewrites |
| 6 | Exp 5: Mid-gen Consistency | 1h | -30% post-gen CER |
| 7 | Exp 4: Batched Evaluation | Medium | -60% eval tokens |
| 8 | Exp 7: Two-Pass Drafting | Medium | +0.3 HQS |
---
## 6. Cost Projections
### v2.0 Baseline (30-Chapter Novel, Quality-First Models)
| Phase | v1.0 Cost | v2.0 Cost | Saving |
|-------|----------|----------|--------|
| Phase 1: Ideation | FREE | FREE | — |
| Phase 2: Outline | FREE | FREE | — |
| Phase 3: Writing (text) | ~$0.18 | ~$0.16 | ~$0.02 |
| Phase 4: Review | FREE | FREE | — |
| Imagen Cover | ~$0.12 | ~$0.12 | — |
| **Total** | **~$0.30** | **~$0.28** | **~7%** |
*Using Pro-Exp for all Logic tasks. Text savings primarily from persona cache + beat expansion skip.*
### With Future Experiment Wins (Conservative Estimate)
If Exp 5, 6, 7 succeed and are implemented:
- Estimated additional token saving: ~400K tokens (~$0.04)
- **Projected total: ~$0.24/book (text + cover)**
---
## 7. Core Principles Revalidated
This review reconfirms the principles from `ai_blueprint.md`:
| Principle | Status | Evidence |
|-----------|--------|---------|
| **Quality First, then Cost** | ✅ Confirmed | Adaptive thresholds concentrate refinement resources on climax chapters, not cut them |
| **Modularity and Flexibility** | ✅ Confirmed | `build_persona_info()` extraction enables future caching strategies |
| **Data-Driven Decisions** | 🔄 In Progress | Experiment framework defined; gathering empirical data next |
| **Minimize Rework** | ✅ Improved | Outline validation gate prevents rework from catching issues pre-generation |
| **High-Quality Assurance** | ✅ Confirmed | 13-rubric evaluator with auto-fail conditions remains the quality backbone |
| **Holistic Approach** | ✅ Confirmed | All four phases analysed; changes propagated across the full pipeline |
---
## 8. Files Modified in v2.0
| File | Change |
|------|--------|
| `story/planner.py` | Added enrichment field validation; added `validate_outline()` function |
| `story/writer.py` | Added `build_persona_info()`; `write_chapter()` accepts `prebuilt_persona` + `chapter_position`; beat expansion skip; adaptive scoring |
| `cli/engine.py` | Imported `build_persona_info`; persona cached before writing loop; rebuilt after `refine_persona()`; outline validation gate; `chapter_position` passed to `write_chapter()` |
| `docs/current_state_analysis.md` | New: Phase mapping with cost analysis |
| `docs/alternatives_analysis.md` | New: 15 alternative approaches with hypotheses |
| `docs/experiment_design.md` | New: 7 controlled A/B experiment specifications |
| `ai_blueprint_v2.md` | This document |