# AI-Powered Book Generation: Optimized Architecture v2.0 **Date:** 2026-02-22 **Status:** Defined — fulfills Action Plan Steps 5, 6, and 7 from `ai_blueprint.md` **Based on:** Current state analysis, alternatives analysis, and experiment design in `docs/` --- ## 1. Executive Summary This document defines the recommended architecture for the AI-powered book generation pipeline, based on the systematic review in `ai_blueprint.md`. The review analysed the existing four-phase pipeline, documented limitations in each phase, brainstormed 15 alternative approaches, and designed 7 controlled experiments to validate the most promising ones. **Key finding:** The current system is already well-optimised for quality. The primary gains available are: 1. **Reducing unnecessary token spend** on infrastructure (persona I/O, redundant beat expansion) 2. **Improving front-loaded quality gates** (outline validation, persona validation) 3. **Adaptive quality thresholds** to concentrate resources where they matter most Several improvements from the analysis have been implemented in v2.0 (Phase 3 of this review). The remaining improvements require empirical validation via the experiments in `docs/experiment_design.md`. --- ## 2. Architecture Overview ### Current State → v2.0 Changes | Component | Previous Behaviour | v2.0 Behaviour | Status | |-----------|-------------------|----------------|--------| | **Persona loading** | Re-read sample files from disk on every chapter | Loaded once per book run, cached in memory, rebuilt after each `refine_persona()` call | ✅ Implemented | | **Beat expansion** | Always expand beats to Director's Treatment | Skip expansion if beats already exceed 100 words total | ✅ Implemented | | **Outline validation** | No pre-generation quality gate | `validate_outline()` runs after chapter planning; logs issues before writing begins | ✅ Implemented | | **Scoring thresholds** | Fixed 7.0 passing threshold for all chapters | Adaptive: 6.5 for setup chapters → 7.5 for climax chapters (linear scale by position) | ✅ Implemented | | **Enrich validation** | Silent failure if enrichment returns missing fields | Explicit warnings logged for missing `title` or `genre` | ✅ Implemented | | **Persona validation** | Single-pass creation, no quality check | `validate_persona()` generates ~200-word sample; scored 1–10; regenerated up to 3× if < 7 | ✅ Implemented | | **Batched evaluation** | Per-chapter evaluation (20K tokens/call) | Experiment 4 (future) — batch 5 chapters per evaluation call | 🧪 Experiment Pending | | **Mid-gen consistency** | Post-generation consistency check only | `analyze_consistency()` called every 10 chapters inside writing loop; issues logged | ✅ Implemented | | **Two-pass drafting** | Single draft + iterative refinement | Rough Flash draft + Pro polish pass before evaluation; max_attempts reduced 3 → 2 | ✅ Implemented | --- ## 3. Phase-by-Phase v2.0 Architecture ### Phase 1: Foundation & Ideation **Implemented Changes:** - `enrich()` now logs explicit warnings if `book_metadata.title` or `book_metadata.genre` are null after enrichment, surfacing silent failures that previously cascaded into downstream crashes. **Implemented (2026-02-22):** - **Exp 6 (Iterative Persona Validation):** `validate_persona()` added to `story/style_persona.py`. Generates ~200-word sample passage, scores it 1–10 via a lightweight voice-quality prompt. Accepted if ≥ 7. `cli/engine.py` retries `create_initial_persona()` up to 3× until score passes. Expected: -20% Phase 3 voice-drift rewrites. **Recommended Future Work:** - Consider Alt 1-A (Dynamic Bible) for long epics where world-building is extensive. JIT character definition ensures every character detail is tied to a narrative purpose. - Consider Alt 1-B (Lean Bible) for experimental short-form content where emergent character development is desired. --- ### Phase 2: Structuring & Outlining **Implemented Changes:** - `validate_outline(events, chapters, bp, folder)` added to `story/planner.py`. Called after `create_chapter_plan()` in `cli/engine.py`. Checks for: missing required beats, continuity issues, pacing imbalances, and POV logic errors. Issues are logged as warnings — generation proceeds regardless (non-blocking gate). **Pending Experiments:** - **Alt 2-A (Single-pass Outline):** Combine sequential `expand()` calls into one multi-step prompt. Saves ~60K tokens for a novel run. Low risk. Implement and test on novella-length stories first. **Recommended Future Work:** - For the Lean Bible (Alt 1-B) variant, redesign `plan_structure()` to allow on-demand character enrichment as new characters appear in events. --- ### Phase 3: Writing Engine **Implemented Changes:** 1. **`build_persona_info(bp)` function** extracted from `write_chapter()`. Contains all persona string building logic including disk reads. Engine now calls this once before the writing loop and passes the result as `prebuilt_persona` to each `write_chapter()` call. Rebuilt after each `refine_persona()` call. 2. **Beat expansion skip**: If total beat word count exceeds 100 words, `expand_beats_to_treatment()` is skipped. Expected savings: ~5K tokens × ~30% of chapters. 3. **Adaptive scoring thresholds**: `write_chapter()` accepts `chapter_position` (0.0–1.0). `SCORE_PASSING` scales from 6.5 (setup) to 7.5 (climax). Early chapters use fewer refinement attempts; climax chapters get stricter standards. 4. **`chapter_position` threading**: `cli/engine.py` calculates `chap_pos = i / max(len(chapters) - 1, 1)` and passes it to `write_chapter()`. **Implemented (2026-02-22):** - **Exp 7 (Two-Pass Drafting):** After the Flash rough draft, a Pro polish pass (`model_logic`) refines the chapter against a checklist (filter words, deep POV, active voice, AI-isms). `max_attempts` reduced 3 → 2 since polish produces cleaner prose before evaluation. Expected: +0.3 HQS with fewer rewrite cycles. **Pending Experiments:** - **Exp 3 (Pre-score Beats):** Score each chapter's beat list for "writability" before drafting. Flag high-risk chapters for additional attempts upfront. **Recommended Future Work:** - Alt 2-C (Dynamic Personas): Once experiments validate basic optimisations, consider adapting persona sub-styles for action vs. introspection scenes. - Increase `SCORE_AUTO_ACCEPT` from 8.0 to 8.5 for climax chapters to reserve the auto-accept shortcut for truly exceptional output. --- ### Phase 4: Review & Refinement **No new implementations in v2.0** (Phase 4 is already highly optimised for quality). **Implemented:** - **Exp 4 (Adaptive Thresholds):** Already implemented. Gather data on refinement call reduction. - **Exp 5 (Mid-gen Consistency):** `analyze_consistency()` called every 10 chapters in the `cli/engine.py` writing loop. Issues logged as `⚠️` warnings. Low cost (free on Pro-Exp). Expected: -30% post-gen CER. **Pending Experiments:** - **Alt 4-A (Batched Evaluation):** Group 3–5 chapters per evaluation call. Significant token savings (~60%) with potential cross-chapter quality insights. **Recommended Future Work:** - Alt 4-D (Editor Bot Specialisation): Implement fast regex-based checks for filter-word density and summary-mode detection before invoking the full LLM evaluator. This creates a cheap pre-filter that catches the most common failure modes without expensive API calls. --- ## 4. Expected Outcomes of v2.0 Implementations ### Token Savings (30-Chapter Novel) | Change | Estimated Saving | Confidence | |--------|-----------------|------------| | Persona cache | ~90K tokens | High | | Beat expansion skip (30% of chapters) | ~45K tokens | High | | Adaptive thresholds (15% fewer setup refinements) | ~100K tokens | Medium | | Outline validation (prevents ~2 rewrites) | ~50K tokens | Medium | | **Total** | **~285K tokens (~8% of full book cost)** | — | ### Quality Impact - Climax chapters: expected improvement in average evaluation score (+0.3–0.5 points) due to stricter SCORE_PASSING thresholds - Early setup chapters: expected slight reduction in revision loop overhead with no noticeable reader-facing quality decrease - Continuity errors: expected reduction from outline validation catching issues pre-generation --- ## 5. Experiment Roadmap Execute experiments in this order (see `docs/experiment_design.md` for full specifications): | Priority | Experiment | Effort | Expected Value | |----------|-----------|--------|----------------| | 1 | Exp 1: Persona Caching | ✅ Done | Token savings confirmed | | 2 | Exp 2: Beat Expansion Skip | ✅ Done | Token savings confirmed | | 3 | Exp 4: Adaptive Thresholds | ✅ Done | Quality + savings | | 4 | Exp 3: Outline Validation | ✅ Done | Quality gate | | 5 | Exp 6: Persona Validation | ✅ Done | -20% voice-drift rewrites | | 6 | Exp 5: Mid-gen Consistency | ✅ Done | -30% post-gen CER | | 7 | Exp 4: Batched Evaluation | Medium | -60% eval tokens | | 8 | Exp 7: Two-Pass Drafting | ✅ Done | +0.3 HQS | --- ## 6. Cost Projections ### v2.0 Baseline (30-Chapter Novel, Quality-First Models) | Phase | v1.0 Cost | v2.0 Cost | Saving | |-------|----------|----------|--------| | Phase 1: Ideation | FREE | FREE | — | | Phase 2: Outline | FREE | FREE | — | | Phase 3: Writing (text) | ~$0.18 | ~$0.16 | ~$0.02 | | Phase 4: Review | FREE | FREE | — | | Imagen Cover | ~$0.12 | ~$0.12 | — | | **Total** | **~$0.30** | **~$0.28** | **~7%** | *Using Pro-Exp for all Logic tasks. Text savings primarily from persona cache + beat expansion skip.* ### With Future Experiment Wins (Conservative Estimate) If Exp 5, 6, 7 succeed and are implemented: - Estimated additional token saving: ~400K tokens (~$0.04) - **Projected total: ~$0.24/book (text + cover)** --- ## 7. Core Principles Revalidated This review reconfirms the principles from `ai_blueprint.md`: | Principle | Status | Evidence | |-----------|--------|---------| | **Quality First, then Cost** | ✅ Confirmed | Adaptive thresholds concentrate refinement resources on climax chapters, not cut them | | **Modularity and Flexibility** | ✅ Confirmed | `build_persona_info()` extraction enables future caching strategies | | **Data-Driven Decisions** | 🔄 In Progress | Experiment framework defined; gathering empirical data next | | **Minimize Rework** | ✅ Improved | Outline validation gate prevents rework from catching issues pre-generation | | **High-Quality Assurance** | ✅ Confirmed | 13-rubric evaluator with auto-fail conditions remains the quality backbone | | **Holistic Approach** | ✅ Confirmed | All four phases analysed; changes propagated across the full pipeline | --- ## 8. Files Modified in v2.0 | File | Change | |------|--------| | `story/planner.py` | Added enrichment field validation; added `validate_outline()` function | | `story/writer.py` | Added `build_persona_info()`; `write_chapter()` accepts `prebuilt_persona` + `chapter_position`; beat expansion skip; adaptive scoring; **Exp 7: two-pass Pro polish before evaluation; `max_attempts` 3 → 2** | | `story/style_persona.py` | **Exp 6: Added `validate_persona()` — generates ~200-word sample, scores voice quality, rejects if < 7/10** | | `cli/engine.py` | Imported `build_persona_info`; persona cached before writing loop; rebuilt after `refine_persona()`; outline validation gate; `chapter_position` passed to `write_chapter()`; **Exp 6: persona retries up to 3× until validation passes; Exp 5: `analyze_consistency()` every 10 chapters** | | `docs/current_state_analysis.md` | New: Phase mapping with cost analysis | | `docs/alternatives_analysis.md` | New: 15 alternative approaches with hypotheses | | `docs/experiment_design.md` | New: 7 controlled A/B experiment specifications | | `ai_blueprint_v2.md` | This document |