bookapp/docs/alternatives_analysis.md
Mike Wichers 2100ca2312 feat: Implement ai_blueprint.md action plan — architectural review & optimisations
Steps 1–7 of the ai_blueprint.md action plan executed:

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4 — Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() — persona loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds — SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip — if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline() — pre-generation gate checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation — warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:01:30 -05:00


Alternatives Analysis: Hypotheses for Each Phase

Date: 2026-02-22
Status: Completed — fulfills Action Plan Step 2


Methodology

For each phase, we present the current approach, document credible alternatives, and state a testable hypothesis about cost and quality impact. Each alternative is rated for implementation complexity and expected payoff.


Phase 1: Foundation & Ideation

Current Approach

A single Logic-model call expands a minimal user prompt into book_metadata, characters, and plot_beats. The author persona is created in a separate single-pass call.


Alt 1-A: Dynamic Bible (Just-In-Time Generation)

Description: Instead of creating the full bible upfront, generate only world rules and core character archetypes at start. Flesh out secondary characters and specific locations only when the planner references them during outlining.

Mechanism:

  1. Upfront: title, genre, tone, 1–2 core characters, 3 immutable world rules
  2. During expand(): When a new location/character appears in events, call a mini-enrichment to define them
  3. Benefits: Only define what's actually used; no wasted detail on characters who don't appear

Hypothesis: Dynamic bible reduces Phase 1 token cost by ~30% and improves character coherence because every detail is tied to a specific narrative purpose. May increase Phase 2 cost by ~15% due to incremental enrichment calls.

Complexity: Medium — requires refactoring planner.py to support on-demand enrichment

Risk: New characters generated mid-outline might not be coherent with established world
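
The just-in-time mechanism can be sketched as a lookup that enriches on first reference. `get_or_enrich` and `enrich_fn` are hypothetical names, not existing planner.py functions; the real change would live in the expand() path:

```python
def get_or_enrich(bible: dict, kind: str, name: str, enrich_fn) -> dict:
    """Return an entry from the bible, defining it on first reference.

    kind is e.g. "characters" or "locations"; enrich_fn stands in for
    the mini-enrichment LLM call and is a hypothetical callable here.
    """
    entries = bible.setdefault(kind, {})
    if name not in entries:
        # First time expand() mentions this name: define it just in time.
        entries[name] = enrich_fn(kind, name, bible)
    return entries[name]
```

Because the enrichment callable receives the whole bible, each just-in-time definition can be grounded in the established world rules, which is the main defense against the coherence risk noted above.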


Alt 1-B: Lean Bible (Rules + Emergence)

Description: Define only immutable "physics" of the world (e.g., "no magic exists", "set in 1920s London") and let all characters and plot details emerge from the writing process. Only characters explicitly named by the user are pre-defined.

Hypothesis: Lean bible reduces Phase 1 cost by ~60% but increases Phase 3 cost by ~25% (more continuity errors require more evaluation retries). Net effect depends on how many characters the user pre-defines.

Complexity: Low — strip enrich() down to essentials

Risk: Characters might be inconsistent across chapters without a shared bible anchor


Alt 1-C: Iterative Persona Validation

Description: After create_initial_persona(), immediately generate a 200-word sample passage in that persona's voice and evaluate it with the editor. Only accept the persona if the sample scores ≥ 7/10.

Hypothesis: Iterative persona validation adds ~8K tokens to Phase 1 but reduces Phase 3 persona-related rewrite rate by ~20% (fewer voice-drift refinements needed).

Complexity: Low — add one evaluation call after persona creation

Risk: Minimal — only adds cost if persona is rejected
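
A minimal sketch of the accept/retry loop. `create_fn`, `sample_fn`, and `score_fn` are hypothetical stand-ins for create_initial_persona(), the 200-word sample generation, and the editor evaluation call:

```python
def validate_persona(create_fn, sample_fn, score_fn,
                     threshold: float = 7.0, max_tries: int = 3):
    """Accept a persona only once a sample passage in its voice scores
    at or above threshold; retry up to max_tries times.

    All three callables are hypothetical stand-ins for the LLM calls
    described above.
    """
    persona = None
    for _ in range(max_tries):
        persona = create_fn()            # create_initial_persona()
        sample = sample_fn(persona)      # ~200-word sample passage
        if score_fn(sample) >= threshold:
            return persona
    return persona  # keep the last attempt rather than fail the run
```

Falling back to the last attempt keeps the worst case bounded at max_tries persona generations instead of stalling the pipeline.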


Phase 2: Structuring & Outlining

Current Approach

Sequential depth-expansion passes convert plot beats into a chapter plan. Each expand() call is unaware of the final desired state, so multiple passes are needed.


Alt 2-A: Single-Pass Hierarchical Outline

Description: Replace sequential expand() calls with a single multi-step prompt that builds the outline in one shot — specifying the desired depth level in the instructions. The model produces both high-level events and chapter-level detail simultaneously.

Hypothesis: Single-pass outline reduces Phase 2 Logic calls from 6 to 2 (one plan_structure, one combined expand+chapter_plan), saving ~60K tokens (~45% Phase 2 cost). Quality may drop slightly if the model can't maintain coherence across 50 chapters in one response.

Complexity: Low — prompt rewrite; no code structure change

Risk: Large single-response JSON might fail or be truncated by model. Novel (30 chapters) is manageable; Epic (50 chapters) is borderline.


Alt 2-B: Outline Validation Gate

Description: After create_chapter_plan(), run a validation call that checks the outline for: (a) missing required plot beats, (b) character deaths/revivals, (c) pacing imbalances, (d) POV distribution. Block writing phase until outline passes validation.

Hypothesis: Pre-generation outline validation (1 Logic call, ~15K tokens, FREE on Pro-Exp) prevents 3–5 expensive rewrite cycles during Phase 3, saving 75K–125K Writer tokens ($0.05–$0.10 per book).

Complexity: Low — add validate_outline() function, call it before Phase 3 begins

Risk: Validation might be overly strict and reject valid creative choices
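
The cheap, non-LLM half of such a gate can be sketched directly. The chapter-dict keys ("beats", "characters", "dies") and the required_beats set are assumptions about the outline schema, not the production format:

```python
def validate_outline(chapters: list, required_beats: set) -> list:
    """Pre-generation gate: flag missing required beats and characters
    who appear after dying. A structural sketch only; the real gate
    would also check pacing and POV distribution via a Logic call."""
    problems = []
    covered = {b for ch in chapters for b in ch.get("beats", [])}
    for beat in sorted(required_beats - covered):
        problems.append(f"missing required beat: {beat}")
    dead = set()
    for i, ch in enumerate(chapters, 1):
        for name in ch.get("characters", []):
            if name in dead:
                problems.append(f"ch {i}: {name} appears after dying")
        dead |= set(ch.get("dies", []))
    return problems
```

An empty return value means the outline passes and Phase 3 may begin; a non-empty list blocks writing until the outline is repaired.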


Alt 2-C: Dynamic Personas (Mood/POV Adaptation)

Description: Instead of a single author persona, create sub-personas for different scene types: (a) action sequences, (b) introspection/emotion, (c) dialogue-heavy scenes. The writer prompt selects the appropriate sub-persona based on chapter pacing.

Hypothesis: Dynamic personas reduce "voice drift" across different scene types, improving average chapter evaluation score by ~0.3 points. Cost increases by ~12K tokens/book for the additional persona generation calls.

Complexity: Medium — requires sub-persona generation, storage, and selection logic in write_chapter()

Risk: Sub-personas might be inconsistent with each other if not carefully designed
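
The selection half is trivial once sub-personas exist. This sketch uses an illustrative scene-type taxonomy and a hypothetical pacing_tag chapter field:

```python
# Hypothetical store keyed by scene type; real sub-personas would be
# generated per book from the base author persona.
SUB_PERSONAS = {
    "action": "terse, kinetic sentences; minimal interiority",
    "introspection": "long periods, close interior monologue",
    "dialogue": "sparse attribution, voice-distinct exchanges",
}

def select_sub_persona(chapter: dict, default: str) -> str:
    """Pick a sub-persona from the chapter's pacing tag, falling back
    to the base persona when no tag matches."""
    return SUB_PERSONAS.get(chapter.get("pacing_tag", ""), default)
```

Keeping the base persona as the fallback means untagged chapters behave exactly as in the current single-persona approach.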


Alt 2-D: Specialized Chapter Templates

Description: Create genre-specific "chapter templates" for common patterns: opening chapters, mid-point reversals, climax chapters, denouements. The planner selects the appropriate template when assigning structure, reducing the amount of creative work needed per chapter.

Hypothesis: Chapter templates reduce Phase 3 beat expansion cost by ~40% (pre-structured templates need less expansion) and reduce rewrite rate by ~15% (templates encode known-good patterns).

Complexity: Medium — requires template library and selection logic

Risk: Templates might make books feel formulaic


Phase 3: The Writing Engine

Current Approach

Single-model drafting with up to 3 attempts. Low-scoring drafts trigger full rewrites using the Pro model. Evaluation happens after each draft.


Alt 3-A: Two-Pass Drafting (Cheap Draft + Expensive Polish)

Description: Use the cheapest available Flash model for a rough first draft (focused on getting beats covered and word count right), then use the Pro model to polish prose quality. Skip the evaluation + rewrite loop entirely.

Hypothesis: Two-pass drafting reduces average chapter evaluation score variance (fewer very-low scores), but might be slower because every chapter gets polished regardless of quality. Net cost impact uncertain — depends on Flash vs Pro price differential. At current pricing (Flash free on Pro-Exp), this is equivalent to the current approach.

Complexity: Low — add a "polish" pass after initial draft in write_chapter()

Risk: Polish pass might not improve chapters that have structural problems (wrong beats covered)


Alt 3-B: Adaptive Scoring Thresholds

Description: Use different scoring thresholds based on chapter position and importance:

  • Setup chapters (1–20% of book): SCORE_PASSING = 6.5 (accept imperfect early work)
  • Midpoint + rising action (20–70%): SCORE_PASSING = 7.0 (current standard)
  • Climax + resolution (70–100%): SCORE_PASSING = 7.5 (stricter standards for crucial chapters)

Hypothesis: Adaptive thresholds reduce refinement calls on setup chapters by ~25% while improving quality of climax chapters. Net token saving ~100K per book ($0.02) with no quality loss on high-stakes scenes.

Complexity: Very low — change 2 constants in write_chapter() to be position-aware

Risk: Lower-quality setup chapters might affect reader engagement in early pages
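
As a sketch, the position-aware threshold is a single function. chapter_position is assumed to be the chapter's fraction of the book in [0, 1]; the band edges and constants are the ones quoted above, not tuned values:

```python
def score_passing(chapter_position: float) -> float:
    """Position-aware SCORE_PASSING: lenient on setup, standard in
    the middle, strict at the climax."""
    if chapter_position < 0.20:
        return 6.5   # setup: accept imperfect early work
    if chapter_position < 0.70:
        return 7.0   # midpoint + rising action: current standard
    return 7.5       # climax + resolution: stricter standard
```

write_chapter() would call this instead of reading the SCORE_PASSING constant, which is why the complexity is rated very low.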


Alt 3-C: Pre-Scoring Outline Beats

Description: Before writing any chapter, use the Logic model to score each chapter's beat list for "writability" — the likelihood that the beats will produce a high-quality first draft. Flag chapters scoring below 6/10 as "high-risk" and assign them extra write attempts upfront.

Hypothesis: Pre-scoring beats adds ~5K tokens per book but reduces full-rewrite incidents by ~30% (the most expensive outcome). Expected saving: 30% × 15 rewrites × 50K tokens = 225K tokens ($0.05).

Complexity: Low — add score_beats_writability() call before Phase 3 loop

Risk: Pre-scoring accuracy might be low; Logic model can't fully predict quality from beats alone


Alt 3-D: Persona Caching (Immediate Win)

Description: Load the author persona (bio, sample text, sample files) once per book run rather than re-reading from disk for each chapter. Store in memory and pass to write_chapter() as a pre-built string.

Hypothesis: Persona caching reduces per-chapter I/O overhead and eliminates redundant file reads. No quality impact. Saves ~90K tokens per book (3K tokens × 30 chapters from persona sample files).

Complexity: Very low — refactor engine.py to load persona once and pass it

Risk: None


Alt 3-E: Skip Beat Expansion for Detailed Beats

Description: If a chapter's beats already exceed 100 words each, skip expand_beats_to_treatment(). The existing beats are detailed enough to guide the writer.

Hypothesis: ~30% of chapters have detailed beats. Skipping expansion saves 5K tokens × 30% × 30 chapters = ~45K tokens. Quality impact negligible for already-detailed beats.

Complexity: Very low — add word-count check before calling expand_beats_to_treatment()

Risk: None for already-detailed beats; risk only if threshold is set too low
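
The gate itself is a word count, sketched here with the 100-word threshold quoted above:

```python
BEAT_DETAIL_THRESHOLD = 100  # words per beat, as quoted above

def needs_expansion(beats: list) -> bool:
    """Return True if any beat is below the detail threshold and the
    chapter should still go through expand_beats_to_treatment()."""
    return any(len(beat.split()) < BEAT_DETAIL_THRESHOLD for beat in beats)
```

Only when needs_expansion() returns True does the writer pay for the Director's Treatment call; already-detailed beats go straight to drafting.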


Phase 4: Review & Refinement

Current Approach

Per-chapter evaluation with 13 rubrics. Post-generation consistency check. Dynamic pacing interventions. User-triggered ripple propagation.


Alt 4-A: Batched Chapter Evaluation

Description: Instead of evaluating each chapter individually (~20K tokens/eval), batch 3–5 chapters per evaluation call. The evaluator assesses them together and can identify cross-chapter issues (pacing, voice consistency) that per-chapter evaluation misses.

Hypothesis: Batched evaluation reduces evaluation token cost by ~60% (from 600K to 240K tokens) while improving cross-chapter quality detection. Risk: individual chapter scores may be less granular.

Complexity: Medium — refactor evaluate_chapter_quality() to accept chapter arrays

Risk: Batched scoring might be less precise per-chapter; harder to pinpoint which chapter needs rewriting
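
The batching step is a simple chunking helper; batch_size=4 is an arbitrary midpoint of the 3–5 range above:

```python
def batch_chapters(chapters: list, batch_size: int = 4) -> list:
    """Group chapters into consecutive batches so each batch can be
    sent to the evaluator in a single call."""
    return [chapters[i:i + batch_size]
            for i in range(0, len(chapters), batch_size)]
```

Consecutive grouping matters: cross-chapter issues like pacing and voice drift are only detectable when the batch preserves narrative order.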


Alt 4-B: Mid-Generation Consistency Snapshots

Description: Run analyze_consistency() every 10 chapters (not just post-generation). If contradictions are found, pause writing and resolve them before proceeding.

Hypothesis: Mid-generation consistency checks add ~3 Logic calls per 30-chapter book (~75K tokens, FREE) but reduce post-generation ripple propagation cost by ~50% by catching issues early.

Complexity: Low — add consistency snapshot call to engine.py loop

Risk: Consistency check might generate false positives that stall generation
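
The scheduling logic in the engine loop reduces to a modulo check; check_fn is a hypothetical stand-in for analyze_consistency():

```python
SNAPSHOT_EVERY = 10  # chapters between consistency snapshots

def maybe_run_snapshot(chapter_index: int, check_fn) -> list:
    """Run the consistency check every SNAPSHOT_EVERY chapters and
    return any contradictions found; otherwise return an empty list
    and let writing continue."""
    if chapter_index % SNAPSHOT_EVERY == 0:
        return check_fn(chapter_index)
    return []
```

For a 30-chapter book this fires at chapters 10, 20, and 30, matching the ~3 Logic calls estimated above.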


Alt 4-C: Semantic Ripple Detection

Description: Replace LLM-based ripple detection in check_and_propagate() with an embedding-similarity approach. When Chapter N is edited, compute semantic similarity between Chapter N's content and all downstream chapters. Only rewrite chapters above a similarity threshold.

Hypothesis: Semantic ripple detection reduces per-ripple token cost from ~15K (LLM scan) to ~2K (embedding query) — 87% reduction. Accuracy comparable to LLM for direct references; may miss indirect narrative impacts.

Complexity: High — requires adding sentence-transformers or Gemini embedding API dependency

Risk: Embedding similarity doesn't capture narrative causality (e.g., a character dying affects later chapters even if the death isn't mentioned verbatim)
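
The selection step reduces to a cosine comparison over precomputed chapter embeddings. The vectors, the 0.75 threshold, and the function names are all assumptions; the embedding provider (sentence-transformers or the Gemini embedding API) is left out of the sketch:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def ripple_candidates(edited_vec: list, downstream_vecs: list,
                      threshold: float = 0.75) -> list:
    """Indices of downstream chapters similar enough to the edited
    chapter to warrant a rewrite pass."""
    return [i for i, vec in enumerate(downstream_vecs)
            if cosine(edited_vec, vec) >= threshold]
```

This only catches surface overlap, which is exactly the stated risk: a causal consequence with no lexical echo of the edit can score below the threshold and be missed.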


Alt 4-D: Editor Bot Specialization

Description: Create specialized sub-evaluators for specific failure modes:

  • check_filter_words() — fast regex-based scan (no LLM needed)
  • check_summary_mode() — detect scene-skipping patterns
  • check_voice_consistency() — compare chapter voice against persona sample
  • check_plot_adherence() — verify beats were covered

Run cheap checks first; only invoke full 13-rubric LLM evaluation if fast checks pass.

Hypothesis: Specialized editor bots reduce evaluation cost by ~40% (many chapters fail fast checks and don't need full LLM eval). Quality detection equal or better because fast checks are more precise for rule violations.

Complexity: Medium — implement regex-based fast checks; modify evaluation pipeline

Risk: Fast checks might have false positives that reject good chapters prematurely
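
The first of these checks needs no LLM at all. A sketch with an illustrative word list and density limit, not the production rubric:

```python
import re

# Common "filter words" that distance the reader from the POV
# character; the list and the 5-per-1000 limit are illustrative.
FILTER_WORDS = re.compile(
    r"\b(felt|saw|heard|noticed|realized|seemed|watched)\b",
    re.IGNORECASE)

def check_filter_words(text: str, max_per_1000: float = 5.0) -> bool:
    """Fast regex check: True when filter-word density stays under
    max_per_1000 words, so the chapter may proceed to full eval."""
    words = len(text.split()) or 1
    hits = len(FILTER_WORDS.findall(text))
    return hits * 1000 / words <= max_per_1000
```

Chapters that fail this check can be sent straight back for revision without spending ~20K tokens on the full 13-rubric evaluation.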


Summary: Hypotheses Ranked by Expected Value

| Alt | Phase | Expected Token Saving | Quality Impact | Complexity |
|-----|-------|----------------------|----------------|------------|
| 3-D (Persona Cache) | 3 | ~90K | None | Very Low |
| 3-E (Skip Beat Expansion) | 3 | ~45K | None | Very Low |
| 2-B (Outline Validation) | 2 | Prevents ~100K rewrites | Positive | Low |
| 3-B (Adaptive Thresholds) | 3 | ~100K | Positive | Very Low |
| 1-C (Persona Validation) | 1 | ~60K (prevented rewrites) | Positive | Low |
| 4-B (Mid-gen Consistency) | 4 | ~75K (prevented rewrites) | Positive | Low |
| 3-C (Pre-score Beats) | 3 | ~225K | Positive | Low |
| 4-A (Batch Evaluation) | 4 | ~360K | Neutral/Positive | Medium |
| 2-A (Single-pass Outline) | 2 | ~60K | Neutral | Low |
| 3-A (Two-Pass Drafting) | 3 | Neutral | Potentially Positive | Low |
| 4-D (Editor Bots) | 4 | ~240K | Positive | Medium |
| 2-C (Dynamic Personas) | 2 | −12K (slight increase) | Positive | Medium |
| 4-C (Semantic Ripple) | 4 | ~200K | Neutral | High |