bookapp/docs/alternatives_analysis.md
Mike Wichers 2100ca2312 feat: Implement ai_blueprint.md action plan — architectural review & optimisations
Steps 1–7 of the ai_blueprint.md action plan executed:

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4 — Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() — persona loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds — SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip — if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline() — pre-generation gate checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation — warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:01:30 -05:00


Alternatives Analysis: Hypotheses for Each Phase

Date: 2026-02-22
Status: Completed — fulfills Action Plan Step 2


Methodology

For each phase, we present the current approach, document credible alternatives, and state a testable hypothesis about cost and quality impact. Each alternative is rated for implementation complexity and expected payoff.


Phase 1: Foundation & Ideation

Current Approach

A single Logic-model call expands a minimal user prompt into book_metadata, characters, and plot_beats. The author persona is created in a separate single-pass call.


Alt 1-A: Dynamic Bible (Just-In-Time Generation)

Description: Instead of creating the full bible upfront, generate only world rules and core character archetypes at start. Flesh out secondary characters and specific locations only when the planner references them during outlining.

Mechanism:

  1. Upfront: title, genre, tone, 1–2 core characters, 3 immutable world rules
  2. During expand(): When a new location/character appears in events, call a mini-enrichment to define them
  3. Benefits: Only define what's actually used; no wasted detail on characters who don't appear

Hypothesis: Dynamic bible reduces Phase 1 token cost by ~30% and improves character coherence because every detail is tied to a specific narrative purpose. May increase Phase 2 cost by ~15% due to incremental enrichment calls.

Complexity: Medium — requires refactoring planner.py to support on-demand enrichment

Risk: New characters generated mid-outline might not be coherent with established world
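
The just-in-time mechanism can be sketched as a lookup that enriches on first reference. `get_or_enrich` and `enrich_fn` are hypothetical names, not existing planner.py functions; the real change would live in the expand() path:

```python
def get_or_enrich(bible: dict, kind: str, name: str, enrich_fn) -> dict:
    """Return an entry from the bible, defining it on first reference.

    kind is e.g. "characters" or "locations"; enrich_fn stands in for
    the mini-enrichment LLM call and is a hypothetical callable here.
    """
    entries = bible.setdefault(kind, {})
    if name not in entries:
        # First time expand() mentions this name: define it just in time.
        entries[name] = enrich_fn(kind, name, bible)
    return entries[name]
```

Because the enrichment callable receives the whole bible, each just-in-time definition can be grounded in the established world rules, which is the main defense against the coherence risk noted above.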


Alt 1-B: Lean Bible (Rules + Emergence)

Description: Define only immutable "physics" of the world (e.g., "no magic exists", "set in 1920s London") and let all characters and plot details emerge from the writing process. Only characters explicitly named by the user are pre-defined.

Hypothesis: Lean bible reduces Phase 1 cost by ~60% but increases Phase 3 cost by ~25% (more continuity errors require more evaluation retries). Net effect depends on how many characters the user pre-defines.

Complexity: Low — strip enrich() down to essentials

Risk: Characters might be inconsistent across chapters without a shared bible anchor


Alt 1-C: Iterative Persona Validation

Description: After create_initial_persona(), immediately generate a 200-word sample passage in that persona's voice and evaluate it with the editor. Only accept the persona if the sample scores ≥ 7/10.

Hypothesis: Iterative persona validation adds ~8K tokens to Phase 1 but reduces Phase 3 persona-related rewrite rate by ~20% (fewer voice-drift refinements needed).

Complexity: Low — add one evaluation call after persona creation

Risk: Minimal — only adds cost if persona is rejected
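
A minimal sketch of the accept/retry loop. `create_fn`, `sample_fn`, and `score_fn` are hypothetical stand-ins for create_initial_persona(), the 200-word sample generation, and the editor evaluation call:

```python
def validate_persona(create_fn, sample_fn, score_fn,
                     threshold: float = 7.0, max_tries: int = 3):
    """Accept a persona only once a sample passage in its voice scores
    at or above threshold; retry up to max_tries times.

    All three callables are hypothetical stand-ins for the LLM calls
    described above.
    """
    persona = None
    for _ in range(max_tries):
        persona = create_fn()            # create_initial_persona()
        sample = sample_fn(persona)      # ~200-word sample passage
        if score_fn(sample) >= threshold:
            return persona
    return persona  # keep the last attempt rather than fail the run
```

Falling back to the last attempt keeps the worst case bounded at max_tries persona generations instead of stalling the pipeline.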


Phase 2: Structuring & Outlining

Current Approach

Sequential depth-expansion passes convert plot beats into a chapter plan. Each expand() call is unaware of the final desired state, so multiple passes are needed.


Alt 2-A: Single-Pass Hierarchical Outline

Description: Replace sequential expand() calls with a single multi-step prompt that builds the outline in one shot — specifying the desired depth level in the instructions. The model produces both high-level events and chapter-level detail simultaneously.

Hypothesis: Single-pass outline reduces Phase 2 Logic calls from 6 to 2 (one plan_structure, one combined expand+chapter_plan), saving ~60K tokens (~45% Phase 2 cost). Quality may drop slightly if the model can't maintain coherence across 50 chapters in one response.

Complexity: Low — prompt rewrite; no code structure change

Risk: Large single-response JSON might fail or be truncated by model. Novel (30 chapters) is manageable; Epic (50 chapters) is borderline.


Alt 2-B: Outline Validation Gate

Description: After create_chapter_plan(), run a validation call that checks the outline for: (a) missing required plot beats, (b) character deaths/revivals, (c) pacing imbalances, (d) POV distribution. Block writing phase until outline passes validation.

Hypothesis: Pre-generation outline validation (1 Logic call, ~15K tokens, FREE on Pro-Exp) prevents 3–5 expensive rewrite cycles during Phase 3, saving 75K–125K Writer tokens ($0.05–$0.10 per book).

Complexity: Low — add validate_outline() function, call it before Phase 3 begins

Risk: Validation might be overly strict and reject valid creative choices
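
The cheap, non-LLM half of such a gate can be sketched directly. The chapter-dict keys ("beats", "characters", "dies") and the required_beats set are assumptions about the outline schema, not the production format:

```python
def validate_outline(chapters: list, required_beats: set) -> list:
    """Pre-generation gate: flag missing required beats and characters
    who appear after dying. A structural sketch only; the real gate
    would also check pacing and POV distribution via a Logic call."""
    problems = []
    covered = {b for ch in chapters for b in ch.get("beats", [])}
    for beat in sorted(required_beats - covered):
        problems.append(f"missing required beat: {beat}")
    dead = set()
    for i, ch in enumerate(chapters, 1):
        for name in ch.get("characters", []):
            if name in dead:
                problems.append(f"ch {i}: {name} appears after dying")
        dead |= set(ch.get("dies", []))
    return problems
```

An empty return value means the outline passes and Phase 3 may begin; a non-empty list blocks writing until the outline is repaired.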


Alt 2-C: Dynamic Personas (Mood/POV Adaptation)

Description: Instead of a single author persona, create sub-personas for different scene types: (a) action sequences, (b) introspection/emotion, (c) dialogue-heavy scenes. The writer prompt selects the appropriate sub-persona based on chapter pacing.

Hypothesis: Dynamic personas reduce "voice drift" across different scene types, improving average chapter evaluation score by ~0.3 points. Cost increases by ~12K tokens/book for the additional persona generation calls.

Complexity: Medium — requires sub-persona generation, storage, and selection logic in write_chapter()

Risk: Sub-personas might be inconsistent with each other if not carefully designed
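
The selection half is trivial once sub-personas exist. This sketch uses an illustrative scene-type taxonomy and a hypothetical pacing_tag chapter field:

```python
# Hypothetical store keyed by scene type; real sub-personas would be
# generated per book from the base author persona.
SUB_PERSONAS = {
    "action": "terse, kinetic sentences; minimal interiority",
    "introspection": "long periods, close interior monologue",
    "dialogue": "sparse attribution, voice-distinct exchanges",
}

def select_sub_persona(chapter: dict, default: str) -> str:
    """Pick a sub-persona from the chapter's pacing tag, falling back
    to the base persona when no tag matches."""
    return SUB_PERSONAS.get(chapter.get("pacing_tag", ""), default)
```

Keeping the base persona as the fallback means untagged chapters behave exactly as in the current single-persona approach.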


Alt 2-D: Specialized Chapter Templates

Description: Create genre-specific "chapter templates" for common patterns: opening chapters, mid-point reversals, climax chapters, denouements. The planner selects the appropriate template when assigning structure, reducing the amount of creative work needed per chapter.

Hypothesis: Chapter templates reduce Phase 3 beat expansion cost by ~40% (pre-structured templates need less expansion) and reduce rewrite rate by ~15% (templates encode known-good patterns).

Complexity: Medium — requires template library and selection logic

Risk: Templates might make books feel formulaic


Phase 3: The Writing Engine

Current Approach

Single-model drafting with up to 3 attempts. Low-scoring drafts trigger full rewrites using the Pro model. Evaluation happens after each draft.


Alt 3-A: Two-Pass Drafting (Cheap Draft + Expensive Polish)

Description: Use the cheapest available Flash model for a rough first draft (focused on getting beats covered and word count right), then use the Pro model to polish prose quality. Skip the evaluation + rewrite loop entirely.

Hypothesis: Two-pass drafting reduces average chapter evaluation score variance (fewer very-low scores), but might be slower because every chapter gets polished regardless of quality. Net cost impact uncertain — depends on Flash vs Pro price differential. At current pricing (Flash free on Pro-Exp), this is equivalent to the current approach.

Complexity: Low — add a "polish" pass after initial draft in write_chapter()

Risk: Polish pass might not improve chapters that have structural problems (wrong beats covered)


Alt 3-B: Adaptive Scoring Thresholds

Description: Use different scoring thresholds based on chapter position and importance:

  • Setup chapters (1–20% of book): SCORE_PASSING = 6.5 (accept imperfect early work)
  • Midpoint + rising action (20–70%): SCORE_PASSING = 7.0 (current standard)
  • Climax + resolution (70–100%): SCORE_PASSING = 7.5 (stricter standards for crucial chapters)

Hypothesis: Adaptive thresholds reduce refinement calls on setup chapters by ~25% while improving quality of climax chapters. Net token saving ~100K per book ($0.02) with no quality loss on high-stakes scenes.

Complexity: Very low — change 2 constants in write_chapter() to be position-aware

Risk: Lower-quality setup chapters might affect reader engagement in early pages
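
As a sketch, the position-aware threshold is a single function. chapter_position is assumed to be the chapter's fraction of the book in [0, 1]; the band edges and constants are the ones quoted above, not tuned values:

```python
def score_passing(chapter_position: float) -> float:
    """Position-aware SCORE_PASSING: lenient on setup, standard in
    the middle, strict at the climax."""
    if chapter_position < 0.20:
        return 6.5   # setup: accept imperfect early work
    if chapter_position < 0.70:
        return 7.0   # midpoint + rising action: current standard
    return 7.5       # climax + resolution: stricter standard
```

write_chapter() would call this instead of reading the SCORE_PASSING constant, which is why the complexity is rated very low.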


Alt 3-C: Pre-Scoring Outline Beats

Description: Before writing any chapter, use the Logic model to score each chapter's beat list for "writability" — the likelihood that the beats will produce a high-quality first draft. Flag chapters scoring below 6/10 as "high-risk" and assign them extra write attempts upfront.

Hypothesis: Pre-scoring beats adds ~5K tokens per book but reduces full-rewrite incidents by ~30% (the most expensive outcome). Expected saving: 30% × 15 rewrites × 50K tokens = 225K tokens ($0.05).

Complexity: Low — add score_beats_writability() call before Phase 3 loop

Risk: Pre-scoring accuracy might be low; Logic model can't fully predict quality from beats alone


Alt 3-D: Persona Caching (Immediate Win)

Description: Load the author persona (bio, sample text, sample files) once per book run rather than re-reading from disk for each chapter. Store in memory and pass to write_chapter() as a pre-built string.

Hypothesis: Persona caching reduces per-chapter I/O overhead and eliminates redundant file reads. No quality impact. Saves ~90K tokens per book (3K tokens × 30 chapters from persona sample files).

Complexity: Very low — refactor engine.py to load persona once and pass it

Risk: None


Alt 3-E: Skip Beat Expansion for Detailed Beats

Description: If a chapter's beats already exceed 100 words each, skip expand_beats_to_treatment(). The existing beats are detailed enough to guide the writer.

Hypothesis: ~30% of chapters have detailed beats. Skipping expansion saves 5K tokens × 30% × 30 chapters = ~45K tokens. Quality impact negligible for already-detailed beats.

Complexity: Very low — add word-count check before calling expand_beats_to_treatment()

Risk: None for already-detailed beats; risk only if threshold is set too low
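
The gate itself is a word count, sketched here with the 100-word threshold quoted above:

```python
BEAT_DETAIL_THRESHOLD = 100  # words per beat, as quoted above

def needs_expansion(beats: list) -> bool:
    """Return True if any beat is below the detail threshold and the
    chapter should still go through expand_beats_to_treatment()."""
    return any(len(beat.split()) < BEAT_DETAIL_THRESHOLD for beat in beats)
```

Only when needs_expansion() returns True does the writer pay for the Director's Treatment call; already-detailed beats go straight to drafting.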


Phase 4: Review & Refinement

Current Approach

Per-chapter evaluation with 13 rubrics. Post-generation consistency check. Dynamic pacing interventions. User-triggered ripple propagation.


Alt 4-A: Batched Chapter Evaluation

Description: Instead of evaluating each chapter individually (~20K tokens/eval), batch 3–5 chapters per evaluation call. The evaluator assesses them together and can identify cross-chapter issues (pacing, voice consistency) that per-chapter evaluation misses.

Hypothesis: Batched evaluation reduces evaluation token cost by ~60% (from 600K to 240K tokens) while improving cross-chapter quality detection. Risk: individual chapter scores may be less granular.

Complexity: Medium — refactor evaluate_chapter_quality() to accept chapter arrays

Risk: Batched scoring might be less precise per-chapter; harder to pinpoint which chapter needs rewriting
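
The batching step is a simple chunking helper; batch_size=4 is an arbitrary midpoint of the 3–5 range above:

```python
def batch_chapters(chapters: list, batch_size: int = 4) -> list:
    """Group chapters into consecutive batches so each batch can be
    sent to the evaluator in a single call."""
    return [chapters[i:i + batch_size]
            for i in range(0, len(chapters), batch_size)]
```

Consecutive grouping matters: cross-chapter issues like pacing and voice drift are only detectable when the batch preserves narrative order.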


Alt 4-B: Mid-Generation Consistency Snapshots

Description: Run analyze_consistency() every 10 chapters (not just post-generation). If contradictions are found, pause writing and resolve them before proceeding.

Hypothesis: Mid-generation consistency checks add ~3 Logic calls per 30-chapter book (~75K tokens, FREE) but reduce post-generation ripple propagation cost by ~50% by catching issues early.

Complexity: Low — add consistency snapshot call to engine.py loop

Risk: Consistency check might generate false positives that stall generation
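
The scheduling logic in the engine loop reduces to a modulo check; check_fn is a hypothetical stand-in for analyze_consistency():

```python
SNAPSHOT_EVERY = 10  # chapters between consistency snapshots

def maybe_run_snapshot(chapter_index: int, check_fn) -> list:
    """Run the consistency check every SNAPSHOT_EVERY chapters and
    return any contradictions found; otherwise return an empty list
    and let writing continue."""
    if chapter_index % SNAPSHOT_EVERY == 0:
        return check_fn(chapter_index)
    return []
```

For a 30-chapter book this fires at chapters 10, 20, and 30, matching the ~3 Logic calls estimated above.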


Alt 4-C: Semantic Ripple Detection

Description: Replace LLM-based ripple detection in check_and_propagate() with an embedding-similarity approach. When Chapter N is edited, compute semantic similarity between Chapter N's content and all downstream chapters. Only rewrite chapters above a similarity threshold.

Hypothesis: Semantic ripple detection reduces per-ripple token cost from ~15K (LLM scan) to ~2K (embedding query) — 87% reduction. Accuracy comparable to LLM for direct references; may miss indirect narrative impacts.

Complexity: High — requires adding sentence-transformers or Gemini embedding API dependency

Risk: Embedding similarity doesn't capture narrative causality (e.g., a character dying affects later chapters even if the death isn't mentioned verbatim)
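
The selection step reduces to a cosine comparison over precomputed chapter embeddings. The vectors, the 0.75 threshold, and the function names are all assumptions; the embedding provider (sentence-transformers or the Gemini embedding API) is left out of the sketch:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def ripple_candidates(edited_vec: list, downstream_vecs: list,
                      threshold: float = 0.75) -> list:
    """Indices of downstream chapters similar enough to the edited
    chapter to warrant a rewrite pass."""
    return [i for i, vec in enumerate(downstream_vecs)
            if cosine(edited_vec, vec) >= threshold]
```

This only catches surface overlap, which is exactly the stated risk: a causal consequence with no lexical echo of the edit can score below the threshold and be missed.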


Alt 4-D: Editor Bot Specialization

Description: Create specialized sub-evaluators for specific failure modes:

  • check_filter_words() — fast regex-based scan (no LLM needed)
  • check_summary_mode() — detect scene-skipping patterns
  • check_voice_consistency() — compare chapter voice against persona sample
  • check_plot_adherence() — verify beats were covered

Run cheap checks first; only invoke full 13-rubric LLM evaluation if fast checks pass.

Hypothesis: Specialized editor bots reduce evaluation cost by ~40% (many chapters fail fast checks and don't need full LLM eval). Quality detection equal or better because fast checks are more precise for rule violations.

Complexity: Medium — implement regex-based fast checks; modify evaluation pipeline

Risk: Fast checks might have false positives that reject good chapters prematurely
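
The first of these checks needs no LLM at all. A sketch with an illustrative word list and density limit, not the production rubric:

```python
import re

# Common "filter words" that distance the reader from the POV
# character; the list and the 5-per-1000 limit are illustrative.
FILTER_WORDS = re.compile(
    r"\b(felt|saw|heard|noticed|realized|seemed|watched)\b",
    re.IGNORECASE)

def check_filter_words(text: str, max_per_1000: float = 5.0) -> bool:
    """Fast regex check: True when filter-word density stays under
    max_per_1000 words, so the chapter may proceed to full eval."""
    words = len(text.split()) or 1
    hits = len(FILTER_WORDS.findall(text))
    return hits * 1000 / words <= max_per_1000
```

Chapters that fail this check can be sent straight back for revision without spending ~20K tokens on the full 13-rubric evaluation.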


Summary: Hypotheses Ranked by Expected Value

| Alt | Phase | Expected Token Saving | Quality Impact | Complexity |
|-----|-------|----------------------|----------------|------------|
| 3-D (Persona Cache) | 3 | ~90K | None | Very Low |
| 3-E (Skip Beat Expansion) | 3 | ~45K | None | Very Low |
| 2-B (Outline Validation) | 2 | Prevents ~100K rewrites | Positive | Low |
| 3-B (Adaptive Thresholds) | 3 | ~100K | Positive | Very Low |
| 1-C (Persona Validation) | 1 | ~60K (prevented rewrites) | Positive | Low |
| 4-B (Mid-gen Consistency) | 4 | ~75K (prevented rewrites) | Positive | Low |
| 3-C (Pre-score Beats) | 3 | ~225K | Positive | Low |
| 4-A (Batch Evaluation) | 4 | ~360K | Neutral/Positive | Medium |
| 2-A (Single-pass Outline) | 2 | ~60K | Neutral | Low |
| 3-A (Two-Pass Drafting) | 3 | Neutral | Potentially Positive | Low |
| 4-D (Editor Bots) | 4 | ~240K | Positive | Medium |
| 2-C (Dynamic Personas) | 2 | −12K (slight increase) | Positive | Medium |
| 4-C (Semantic Ripple) | 4 | ~200K | Neutral | High |