bookapp/ai_blueprint_v2.md
Mike Wichers 2100ca2312 feat: Implement ai_blueprint.md action plan — architectural review & optimisations
Steps 1–7 of the ai_blueprint.md action plan executed:

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4 — Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() — persona loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds — SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip — if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline() — pre-generation gate checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation — warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:01:30 -05:00


# AI-Powered Book Generation: Optimized Architecture v2.0

**Date:** 2026-02-22
**Status:** Defined — fulfills Action Plan Steps 5, 6, and 7 from ai_blueprint.md
**Based on:** Current state analysis, alternatives analysis, and experiment design in docs/


## 1. Executive Summary

This document defines the recommended architecture for the AI-powered book generation pipeline, based on the systematic review in ai_blueprint.md. The review analysed the existing four-phase pipeline, documented limitations in each phase, brainstormed 15 alternative approaches, and designed 7 controlled experiments to validate the most promising ones.

Key finding: The current system is already well-optimised for quality. The primary gains available are:

  1. Reducing unnecessary token spend on infrastructure (persona I/O, redundant beat expansion)
  2. Improving front-loaded quality gates (outline validation, persona validation)
  3. Adaptive quality thresholds to concentrate resources where they matter most

Several improvements from the analysis have been implemented in v2.0 (Phase 3 of this review). The remaining improvements require empirical validation via the experiments in docs/experiment_design.md.


## 2. Architecture Overview

### Current State → v2.0 Changes

| Component | Previous Behaviour | v2.0 Behaviour | Status |
|-----------|--------------------|----------------|--------|
| Persona loading | Re-read sample files from disk on every chapter | Loaded once per book run, cached in memory, rebuilt after each refine_persona() call | Implemented |
| Beat expansion | Always expand beats to Director's Treatment | Skip expansion if beats already exceed 100 words total | Implemented |
| Outline validation | No pre-generation quality gate | validate_outline() runs after chapter planning; logs issues before writing begins | Implemented |
| Scoring thresholds | Fixed 7.0 passing threshold for all chapters | Adaptive: 6.5 for setup chapters → 7.5 for climax chapters (linear scale by position) | Implemented |
| Enrich validation | Silent failure if enrichment returns missing fields | Explicit warnings logged for missing title or genre | Implemented |
| Persona validation | Single-pass creation, no quality check | Experiment 6 (future) — validate persona with sample before accepting | 🧪 Experiment Pending |
| Batched evaluation | Per-chapter evaluation (20K tokens/call) | Experiment 4 (future) — batch 5 chapters per evaluation call | 🧪 Experiment Pending |
| Mid-gen consistency | Post-generation consistency check only | Experiment 5 (future) — check every 10 chapters | 🧪 Experiment Pending |
| Two-pass drafting | Single draft + iterative refinement | Experiment 7 (future) — rough draft + polish pass | 🧪 Experiment Pending |

## 3. Phase-by-Phase v2.0 Architecture

### Phase 1: Foundation & Ideation

Implemented Changes:

  • enrich() now logs explicit warnings if book_metadata.title or book_metadata.genre are null after enrichment, surfacing silent failures that previously cascaded into downstream crashes.
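A minimal sketch of this check (the function name `validate_enrichment` and the dict shape are illustrative assumptions; the real logic lives in story/planner.py's enrich() path):

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical shape of the post-enrichment check: warn loudly on
# missing fields instead of letting null values crash later phases.
REQUIRED_FIELDS = ("title", "genre")

def validate_enrichment(book_metadata: dict) -> list[str]:
    """Return the required fields that came back missing or empty."""
    missing = [f for f in REQUIRED_FIELDS if not book_metadata.get(f)]
    for field in missing:
        logger.warning("enrich() returned no value for %r", field)
    return missing
```

The key design choice is that the check only warns; it never raises, so a partially enriched book can still proceed while the gap is visible in the logs.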

Pending Experiments:

  • Exp 6 (Iterative Persona Validation): Generate a 200-word test passage in the new persona's voice and evaluate it before accepting. Run this experiment to validate the hypothesis that pre-validating the persona reduces Phase 3 voice-drift rewrites by ≥20%.

Recommended Future Work:

  • Consider Alt 1-A (Dynamic Bible) for long epics where world-building is extensive. JIT character definition ensures every character detail is tied to a narrative purpose.
  • Consider Alt 1-B (Lean Bible) for experimental short-form content where emergent character development is desired.

### Phase 2: Structuring & Outlining

Implemented Changes:

  • validate_outline(events, chapters, bp, folder) added to story/planner.py. Called after create_chapter_plan() in cli/engine.py. Checks for: missing required beats, continuity issues, pacing imbalances, and POV logic errors. Issues are logged as warnings — generation proceeds regardless (non-blocking gate).
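A simplified sketch of the non-blocking gate (the two issue checks shown are illustrative; the real validate_outline() also covers continuity, pacing, and POV logic, and takes the full `(events, chapters, bp, folder)` signature):

```python
# Sketch of a non-blocking outline gate: collect issues, let the
# caller log them as warnings, and proceed with generation either way.
def validate_outline(chapters: list[dict]) -> list[str]:
    issues = []
    for i, ch in enumerate(chapters, start=1):
        if not ch.get("beats"):
            issues.append(f"chapter {i}: no beats defined")
        if not ch.get("pov"):
            issues.append(f"chapter {i}: missing POV")
    return issues  # caller logs these; it never raises

# Non-blocking usage: the second chapter is flagged, writing continues.
issues = validate_outline([{"beats": ["opening"], "pov": "Mara"}, {"beats": []}])
```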

Pending Experiments:

  • Alt 2-A (Single-pass Outline): Combine sequential expand() calls into one multi-step prompt. Saves ~60K tokens for a novel run. Low risk. Implement and test on novella-length stories first.

Recommended Future Work:

  • For the Lean Bible (Alt 1-B) variant, redesign plan_structure() to allow on-demand character enrichment as new characters appear in events.

### Phase 3: Writing Engine

Implemented Changes:

  1. build_persona_info(bp) function extracted from write_chapter(). Contains all persona string building logic including disk reads. Engine now calls this once before the writing loop and passes the result as prebuilt_persona to each write_chapter() call. Rebuilt after each refine_persona() call.

  2. Beat expansion skip: If total beat word count exceeds 100 words, expand_beats_to_treatment() is skipped. Expected savings: ~5K tokens × ~30% of chapters.

  3. Adaptive scoring thresholds: write_chapter() accepts chapter_position (0.0–1.0). SCORE_PASSING scales from 6.5 (setup) to 7.5 (climax). Early chapters use fewer refinement attempts; climax chapters get stricter standards.

  4. chapter_position threading: cli/engine.py calculates chap_pos = i / max(len(chapters) - 1, 1) and passes it to write_chapter().

Pending Experiments:

  • Exp 7 (Two-Pass Drafting): Test rough Flash draft + Pro polish against current iterative approach. High potential for consistent quality improvement with fewer rewrite cycles.
  • Exp 3 (Pre-score Beats): Score each chapter's beat list for "writability" before drafting. Flag high-risk chapters for additional attempts upfront.

Recommended Future Work:

  • Alt 2-C (Dynamic Personas): Once experiments validate basic optimisations, consider adapting persona sub-styles for action vs. introspection scenes.
  • Increase SCORE_AUTO_ACCEPT from 8.0 to 8.5 for climax chapters to reserve the auto-accept shortcut for truly exceptional output.

### Phase 4: Review & Refinement

No new implementations in v2.0 (Phase 4 is already highly optimised for quality).

Pending Experiments:

  • Exp 4 (Adaptive Thresholds): Already implemented. Gather data on refinement call reduction.
  • Exp 5 (Mid-gen Consistency): Add analyze_consistency() every 10 chapters. Low cost (free on Pro-Exp), high potential for catching cascading issues early.
  • Alt 4-A (Batched Evaluation): Group 3–5 chapters per evaluation call. Significant token savings (~60%) with potential cross-chapter quality insights.
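A hedged sketch of how Alt 4-A's grouping might look (the helper name and default batch size are assumptions; the evaluation call itself is out of scope here):

```python
def batch_chapters(chapters: list[str], batch_size: int = 5) -> list[list[str]]:
    """Group chapters into fixed-size batches for a single evaluation call."""
    return [chapters[i:i + batch_size] for i in range(0, len(chapters), batch_size)]
```

For a 30-chapter novel with a batch size of 5 this turns 30 evaluation calls into 6, which is where the projected ~60% token saving on shared prompt overhead would come from.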

Recommended Future Work:

  • Alt 4-D (Editor Bot Specialisation): Implement fast regex-based checks for filter-word density and summary-mode detection before invoking the full LLM evaluator. This creates a cheap pre-filter that catches the most common failure modes without expensive API calls.
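A minimal sketch of such a pre-filter (the filter-word list and the density threshold are illustrative assumptions, not the project's actual values):

```python
import re

# Cheap regex pre-filter: flag drafts with a high density of common
# filter words before paying for the full LLM evaluator pass.
FILTER_WORDS = re.compile(r"\b(felt|saw|heard|noticed|realized|seemed)\b", re.IGNORECASE)

def filter_word_density(text: str) -> float:
    """Ratio of filter-word hits to total words (0.0 for empty text)."""
    words = text.split()
    if not words:
        return 0.0
    return len(FILTER_WORDS.findall(text)) / len(words)

def needs_llm_review(text: str, max_density: float = 0.02) -> bool:
    """Escalate to the full evaluator only when the cheap check fires."""
    return filter_word_density(text) > max_density
```

Because the regex check costs microseconds rather than an API call, it can run on every draft and reserve the expensive evaluator for the drafts it flags.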

## 4. Expected Outcomes of v2.0 Implementations

### Token Savings (30-Chapter Novel)

| Change | Estimated Saving | Confidence |
|--------|------------------|------------|
| Persona cache | ~90K tokens | High |
| Beat expansion skip (30% of chapters) | ~45K tokens | High |
| Adaptive thresholds (15% fewer setup refinements) | ~100K tokens | Medium |
| Outline validation (prevents ~2 rewrites) | ~50K tokens | Medium |
| **Total** | **~285K tokens (~7% of full book cost)** | |

### Quality Impact

  • Climax chapters: expected improvement in average evaluation score (+0.3–0.5 points) due to stricter SCORE_PASSING thresholds
  • Early setup chapters: expected slight reduction in revision loop overhead with no noticeable reader-facing quality decrease
  • Continuity errors: expected reduction from outline validation catching issues pre-generation

## 5. Experiment Roadmap

Execute experiments in this order (see docs/experiment_design.md for full specifications):

| Priority | Experiment | Effort | Expected Value |
|----------|------------|--------|----------------|
| 1 | Exp 1: Persona Caching | Done | Token savings confirmed |
| 2 | Exp 2: Beat Expansion Skip | Done | Token savings confirmed |
| 3 | Exp 4: Adaptive Thresholds | Done | Quality + savings |
| 4 | Exp 3: Outline Validation | Done | Quality gate |
| 5 | Exp 6: Persona Validation | 2h | -20% voice-drift rewrites |
| 6 | Exp 5: Mid-gen Consistency | 1h | -30% post-gen CER |
| 7 | Exp 4: Batched Evaluation | Medium | -60% eval tokens |
| 8 | Exp 7: Two-Pass Drafting | Medium | +0.3 HQS |

## 6. Cost Projections

### v2.0 Baseline (30-Chapter Novel, Quality-First Models)

| Phase | v1.0 Cost | v2.0 Cost | Saving |
|-------|-----------|-----------|--------|
| Phase 1: Ideation | FREE | FREE | |
| Phase 2: Outline | FREE | FREE | |
| Phase 3: Writing (text) | ~$0.18 | ~$0.16 | ~$0.02 |
| Phase 4: Review | FREE | FREE | |
| Imagen Cover | ~$0.12 | ~$0.12 | |
| **Total** | **~$0.30** | **~$0.28** | **~7%** |

Using Pro-Exp for all Logic tasks. Text savings primarily from persona cache + beat expansion skip.

### With Future Experiment Wins (Conservative Estimate)

If Exp 5, 6, 7 succeed and are implemented:

  • Estimated additional token saving: ~400K tokens (~$0.04)
  • Projected total: ~$0.24/book (text + cover)

## 7. Core Principles Revalidated

This review reconfirms the principles from ai_blueprint.md:

| Principle | Status | Evidence |
|-----------|--------|----------|
| Quality First, then Cost | Confirmed | Adaptive thresholds concentrate refinement resources on climax chapters, not cut them |
| Modularity and Flexibility | Confirmed | build_persona_info() extraction enables future caching strategies |
| Data-Driven Decisions | 🔄 In Progress | Experiment framework defined; gathering empirical data next |
| Minimize Rework | Improved | Outline validation gate prevents rework by catching issues pre-generation |
| High-Quality Assurance | Confirmed | 13-rubric evaluator with auto-fail conditions remains the quality backbone |
| Holistic Approach | Confirmed | All four phases analysed; changes propagated across the full pipeline |

## 8. Files Modified in v2.0

| File | Change |
|------|--------|
| story/planner.py | Added enrichment field validation; added validate_outline() function |
| story/writer.py | Added build_persona_info(); write_chapter() accepts prebuilt_persona + chapter_position; beat expansion skip; adaptive scoring |
| cli/engine.py | Imported build_persona_info; persona cached before writing loop; rebuilt after refine_persona(); outline validation gate; chapter_position passed to write_chapter() |
| docs/current_state_analysis.md | New: Phase mapping with cost analysis |
| docs/alternatives_analysis.md | New: 15 alternative approaches with hypotheses |
| docs/experiment_design.md | New: 7 controlled A/B experiment specifications |
| ai_blueprint_v2.md | This document |