Steps 1–7 of the ai_blueprint.md action plan executed.

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4 — Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() — persona loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds — SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip — if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline() — pre-generation gate checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation — warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Alternatives Analysis: Hypotheses for Each Phase
**Date:** 2026-02-22

**Status:** Completed — fulfills Action Plan Step 2

---
## Methodology
For each phase, we present the current approach, document credible alternatives, and state a testable hypothesis about cost and quality impact. Each alternative is rated for implementation complexity and expected payoff.

---
## Phase 1: Foundation & Ideation
### Current Approach
A single Logic-model call expands a minimal user prompt into `book_metadata`, `characters`, and `plot_beats`. The author persona is created in a separate single-pass call.

---
### Alt 1-A: Dynamic Bible (Just-In-Time Generation)
**Description:** Instead of creating the full bible upfront, generate only world rules and core character archetypes at start. Flesh out secondary characters and specific locations only when the planner references them during outlining.

**Mechanism:**

1. Upfront: title, genre, tone, 1–2 core characters, 3 immutable world rules
2. During `expand()`: When a new location/character appears in events, call a mini-enrichment to define them
3. Benefits: Only define what's actually used; no wasted detail on characters who don't appear
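The mechanism can be sketched as a lazy bible object that pays the enrichment cost only on first reference. This is an illustrative shape, not the shipped `planner.py` API: `mini_enrich()` and `LazyBible` are hypothetical names standing in for the real Logic-model call.

```python
def mini_enrich(entity_name: str, world_rules: list[str]) -> dict:
    """Stand-in for a small Logic-model call that defines one entity."""
    return {"name": entity_name, "rules": world_rules}

class LazyBible:
    """Holds the lean upfront bible; enriches entities on first reference."""

    def __init__(self, core: dict, world_rules: list[str]):
        self.core = core                  # title, genre, tone, 1-2 core characters
        self.world_rules = world_rules    # the 3 immutable rules
        self._entities: dict[str, dict] = {}

    def get_entity(self, name: str) -> dict:
        # Only pay the mini-enrichment cost the first time an entity appears.
        if name not in self._entities:
            self._entities[name] = mini_enrich(name, self.world_rules)
        return self._entities[name]
```

Repeated lookups return the cached definition, which is also what keeps mid-outline characters anchored to the same world rules.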
**Hypothesis:** Dynamic bible reduces Phase 1 token cost by ~30% and improves character coherence because every detail is tied to a specific narrative purpose. May increase Phase 2 cost by ~15% due to incremental enrichment calls.

**Complexity:** Medium — requires refactoring `planner.py` to support on-demand enrichment

**Risk:** New characters generated mid-outline might not be coherent with established world

---
### Alt 1-B: Lean Bible (Rules + Emergence)
**Description:** Define only immutable "physics" of the world (e.g., "no magic exists", "set in 1920s London") and let all characters and plot details emerge from the writing process. Only characters explicitly named by the user are pre-defined.

**Hypothesis:** Lean bible reduces Phase 1 cost by ~60% but increases Phase 3 cost by ~25% (more continuity errors require more evaluation retries). Net effect depends on how many characters the user pre-defines.

**Complexity:** Low — strip `enrich()` down to essentials

**Risk:** Characters might be inconsistent across chapters without a shared bible anchor

---
### Alt 1-C: Iterative Persona Validation
**Description:** After `create_initial_persona()`, immediately generate a 200-word sample passage in that persona's voice and evaluate it with the editor. Only accept the persona if the sample scores ≥ 7/10.

**Hypothesis:** Iterative persona validation adds ~8K tokens to Phase 1 but reduces Phase 3 persona-related rewrite rate by ~20% (fewer voice-drift refinements needed).

**Complexity:** Low — add one evaluation call after persona creation

**Risk:** Minimal — only adds cost if persona is rejected
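The accept/retry loop might look like the sketch below. All three callables are stand-ins for the real persona-creation, sample-writing, and editor-evaluation calls; the cap of three attempts is an assumption to bound cost.

```python
def validated_persona(create_persona, write_sample, evaluate_sample,
                      threshold: float = 7.0, max_attempts: int = 3):
    """Regenerate the persona until a 200-word sample scores >= threshold."""
    persona = None
    for _ in range(max_attempts):
        persona = create_persona()
        sample = write_sample(persona, words=200)
        if evaluate_sample(sample) >= threshold:
            return persona
    # Fall back to the last attempt rather than failing the whole run.
    return persona
```

Because a rejection only costs one more persona call plus one short sample, the worst case stays small relative to a single chapter draft.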
---
## Phase 2: Structuring & Outlining
### Current Approach

Sequential depth-expansion passes convert plot beats into a chapter plan. Each `expand()` call is unaware of the final desired state, so multiple passes are needed.

---
### Alt 2-A: Single-Pass Hierarchical Outline
**Description:** Replace sequential `expand()` calls with a single multi-step prompt that builds the outline in one shot — specifying the desired depth level in the instructions. The model produces both high-level events and chapter-level detail simultaneously.

**Hypothesis:** Single-pass outline reduces Phase 2 Logic calls from 6 to 2 (one `plan_structure`, one combined `expand+chapter_plan`), saving ~60K tokens (~45% Phase 2 cost). Quality may drop slightly if the model can't maintain coherence across 50 chapters in one response.

**Complexity:** Low — prompt rewrite; no code structure change

**Risk:** Large single-response JSON might fail or be truncated by model. Novel (30 chapters) is manageable; Epic (50 chapters) is borderline.

---
### Alt 2-B: Outline Validation Gate
**Description:** After `create_chapter_plan()`, run a validation call that checks the outline for: (a) missing required plot beats, (b) character deaths/revivals, (c) pacing imbalances, (d) POV distribution. Block writing phase until outline passes validation.

**Hypothesis:** Pre-generation outline validation (1 Logic call, ~15K tokens, FREE on Pro-Exp) prevents ~3–5 expensive rewrite cycles during Phase 3, saving 75K–125K Writer tokens (~$0.05–$0.10 per book).

**Complexity:** Low — add `validate_outline()` function, call it before Phase 3 begins

**Risk:** Validation might be overly strict and reject valid creative choices
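A minimal shape for the gate, covering checks (a)–(c). The field names (`beats`, `appearances`, `deaths`, `pace`) are assumptions about the chapter-plan schema; the shipped `validate_outline()` in story/planner.py may differ.

```python
def validate_outline(chapters: list[dict], required_beats: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the gate passes."""
    problems = []

    # (a) Every required plot beat must land in some chapter.
    covered = {b for ch in chapters for b in ch.get("beats", [])}
    missing = required_beats - covered
    if missing:
        problems.append(f"missing required plot beats: {sorted(missing)}")

    # (b) A character marked dead must not reappear in a later chapter.
    dead: set[str] = set()
    for i, ch in enumerate(chapters, start=1):
        for name in ch.get("appearances", []):
            if name in dead:
                problems.append(f"chapter {i}: {name} appears after dying")
        dead |= set(ch.get("deaths", []))

    # (c) Flag runs of 3+ consecutive slow chapters as a pacing imbalance.
    paces = [ch.get("pace", "medium") for ch in chapters]
    for i in range(len(paces) - 2):
        if paces[i] == paces[i + 1] == paces[i + 2] == "slow":
            problems.append(f"pacing: chapters {i + 1}-{i + 3} are all slow")
            break

    return problems
```

Returning a problem list (rather than a hard exception) lets the engine decide whether to block writing or merely warn, which mitigates the over-strictness risk.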
---
### Alt 2-C: Dynamic Personas (Mood/POV Adaptation)
**Description:** Instead of a single author persona, create sub-personas for different scene types: (a) action sequences, (b) introspection/emotion, (c) dialogue-heavy scenes. The writer prompt selects the appropriate sub-persona based on chapter pacing.

**Hypothesis:** Dynamic personas reduce "voice drift" across different scene types, improving average chapter evaluation score by ~0.3 points. Cost increases by ~12K tokens/book for the additional persona generation calls.

**Complexity:** Medium — requires sub-persona generation, storage, and selection logic in `write_chapter()`

**Risk:** Sub-personas might be inconsistent with each other if not carefully designed

---
### Alt 2-D: Specialized Chapter Templates
**Description:** Create genre-specific "chapter templates" for common patterns: opening chapters, mid-point reversals, climax chapters, denouements. The planner selects the appropriate template when assigning structure, reducing the amount of creative work needed per chapter.

**Hypothesis:** Chapter templates reduce Phase 3 beat expansion cost by ~40% (pre-structured templates need less expansion) and reduce rewrite rate by ~15% (templates encode known-good patterns).

**Complexity:** Medium — requires template library and selection logic

**Risk:** Templates might make books feel formulaic
---
## Phase 3: The Writing Engine
### Current Approach

Single-model drafting with up to 3 attempts. Low-scoring drafts trigger full rewrites using the Pro model. Evaluation happens after each draft.

---
### Alt 3-A: Two-Pass Drafting (Cheap Draft + Expensive Polish)
**Description:** Use the cheapest available Flash model for a rough first draft (focused on getting beats covered and word count right), then use the Pro model to polish prose quality. Skip the evaluation + rewrite loop entirely.

**Hypothesis:** Two-pass drafting reduces average chapter evaluation score variance (fewer very-low scores), but might be slower because every chapter gets polished regardless of quality. Net cost impact uncertain — depends on Flash vs Pro price differential. At current pricing (Flash free on Pro-Exp), this is equivalent to the current approach.

**Complexity:** Low — add a "polish" pass after initial draft in `write_chapter()`

**Risk:** Polish pass might not improve chapters that have structural problems (wrong beats covered)

---
### Alt 3-B: Adaptive Scoring Thresholds
**Description:** Use different scoring thresholds based on chapter position and importance:

- Setup chapters (1–20% of book): SCORE_PASSING = 6.5 (accept imperfect early work)
- Midpoint + rising action (20–70%): SCORE_PASSING = 7.0 (current standard)
- Climax + resolution (70–100%): SCORE_PASSING = 7.5 (stricter standards for crucial chapters)
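The tiers above collapse into one position-aware helper. This is a sketch; how story/writer.py actually threads `chapter_position` into the `SCORE_PASSING` constant may differ.

```python
def score_passing(chapter_index: int, total_chapters: int) -> float:
    """Return the SCORE_PASSING threshold for a chapter's position in the book."""
    position = chapter_index / total_chapters  # 1-based index -> 0.0-1.0
    if position <= 0.2:
        return 6.5   # setup: accept imperfect early work
    if position <= 0.7:
        return 7.0   # midpoint + rising action: current standard
    return 7.5       # climax + resolution: stricter standard
```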
**Hypothesis:** Adaptive thresholds reduce refinement calls on setup chapters by ~25% while improving quality of climax chapters. Net token saving ~100K per book (~$0.02) with no quality loss on high-stakes scenes.

**Complexity:** Very low — change 2 constants in `write_chapter()` to be position-aware

**Risk:** Lower-quality setup chapters might affect reader engagement in early pages

---
### Alt 3-C: Pre-Scoring Outline Beats
**Description:** Before writing any chapter, use the Logic model to score each chapter's beat list for "writability" — the likelihood that the beats will produce a high-quality first draft. Flag chapters scoring below 6/10 as "high-risk" and assign them extra write attempts upfront.

**Hypothesis:** Pre-scoring beats adds ~5K tokens per book but reduces full-rewrite incidents by ~30% (the most expensive outcome). Expected saving: 30% × 15 rewrites × 50K tokens = ~225K tokens (~$0.05).

**Complexity:** Low — add `score_beats_writability()` call before Phase 3 loop

**Risk:** Pre-scoring accuracy might be low; Logic model can't fully predict quality from beats alone

---
### Alt 3-D: Persona Caching (Immediate Win)
**Description:** Load the author persona (bio, sample text, sample files) once per book run rather than re-reading from disk for each chapter. Store in memory and pass to `write_chapter()` as a pre-built string.

**Hypothesis:** Persona caching reduces per-chapter I/O overhead and eliminates redundant file reads. No quality impact. Saves ~90K tokens per book (3K tokens × 30 chapters from persona sample files).

**Complexity:** Very low — refactor engine.py to load persona once and pass it

**Risk:** None
---
### Alt 3-E: Skip Beat Expansion for Detailed Beats
**Description:** If a chapter's beats already exceed 100 words each, skip `expand_beats_to_treatment()`. The existing beats are detailed enough to guide the writer.

**Hypothesis:** ~30% of chapters have detailed beats. Skipping expansion saves 5K tokens × 30% × 30 chapters = ~45K tokens. Quality impact negligible for already-detailed beats.

**Complexity:** Very low — add word-count check before calling `expand_beats_to_treatment()`

**Risk:** None for already-detailed beats; risk only if threshold is set too low
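The word-count gate is a one-liner. Helper and constant names are illustrative; the check the change log describes sits in front of `expand_beats_to_treatment()` in story/writer.py.

```python
BEAT_DETAIL_THRESHOLD = 100  # words per beat

def beats_need_expansion(beats: list[str]) -> bool:
    """True if any beat is too thin to guide the writer on its own."""
    return any(len(beat.split()) < BEAT_DETAIL_THRESHOLD for beat in beats)
```

Using `any()` means a single thin beat still triggers expansion for the whole chapter, which keeps the skip conservative.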
---
## Phase 4: Review & Refinement
### Current Approach

Per-chapter evaluation with 13 rubrics. Post-generation consistency check. Dynamic pacing interventions. User-triggered ripple propagation.

---
### Alt 4-A: Batched Chapter Evaluation
**Description:** Instead of evaluating each chapter individually (~20K tokens/eval), batch 3–5 chapters per evaluation call. The evaluator assesses them together and can identify cross-chapter issues (pacing, voice consistency) that per-chapter evaluation misses.

**Hypothesis:** Batched evaluation reduces evaluation token cost by ~60% (from 600K to 240K tokens) while improving cross-chapter quality detection. Risk: individual chapter scores may be less granular.

**Complexity:** Medium — refactor `evaluate_chapter_quality()` to accept chapter arrays

**Risk:** Batched scoring might be less precise per-chapter; harder to pinpoint which chapter needs rewriting
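The batching itself is trivial; the refactor cost is in the evaluator prompt. A batch size of 4 (inside the suggested 3–5 range) as a sketch:

```python
def batched(chapters: list[str], size: int = 4) -> list[list[str]]:
    """Split chapters into evaluation batches of at most `size`."""
    return [chapters[i:i + size] for i in range(0, len(chapters), size)]

# The refactored evaluate_chapter_quality() would then receive one batch per
# call and return a score per chapter plus any cross-chapter findings.
```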
---
### Alt 4-B: Mid-Generation Consistency Snapshots
**Description:** Run `analyze_consistency()` every 10 chapters (not just post-generation). If contradictions are found, pause writing and resolve them before proceeding.

**Hypothesis:** Mid-generation consistency checks add ~3 Logic calls per 30-chapter book (~75K tokens, FREE) but reduce post-generation ripple propagation cost by ~50% by catching issues early.

**Complexity:** Low — add consistency snapshot call to engine.py loop

**Risk:** Consistency check might generate false positives that stall generation
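The cadence check in the engine loop could look like this (`analyze_consistency` is the existing call, here invoked on the draft so far; the wrapper name is illustrative):

```python
SNAPSHOT_EVERY = 10

def maybe_snapshot(chapter_number: int, chapters_so_far: list[str],
                   analyze_consistency) -> list[str]:
    """Run a consistency pass on chapters 10, 20, 30, ...; return contradictions."""
    if chapter_number % SNAPSHOT_EVERY == 0:
        return analyze_consistency(chapters_so_far)
    return []
```

An empty return on off-cadence chapters keeps the loop's happy path free of extra Logic calls.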
---
### Alt 4-C: Semantic Ripple Detection
**Description:** Replace LLM-based ripple detection in `check_and_propagate()` with an embedding-similarity approach. When Chapter N is edited, compute semantic similarity between Chapter N's content and all downstream chapters. Only rewrite chapters above a similarity threshold.

**Hypothesis:** Semantic ripple detection reduces per-ripple token cost from ~15K (LLM scan) to ~2K (embedding query) — 87% reduction. Accuracy comparable to LLM for direct references; may miss indirect narrative impacts.

**Complexity:** High — requires adding `sentence-transformers` or Gemini embedding API dependency

**Risk:** Embedding similarity doesn't capture narrative causality (e.g., a character dying affects later chapters even if the death isn't mentioned verbatim)
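Only the thresholding layer is shown below; the embedding vectors themselves would come from sentence-transformers or the Gemini embedding API, and the 0.55 cutoff is a placeholder that would need tuning.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def chapters_to_rewrite(edited_vec: list[float],
                        downstream: dict[int, list[float]],
                        threshold: float = 0.55) -> list[int]:
    """Return downstream chapter numbers semantically close to the edited one."""
    return [n for n, vec in downstream.items()
            if cosine(edited_vec, vec) >= threshold]
```

This filter handles direct textual overlap; the causality risk noted above suggests keeping an LLM fallback for plot-critical edits such as character deaths.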
---
### Alt 4-D: Editor Bot Specialization
**Description:** Create specialized sub-evaluators for specific failure modes:

- `check_filter_words()` — fast regex-based scan (no LLM needed)
- `check_summary_mode()` — detect scene-skipping patterns
- `check_voice_consistency()` — compare chapter voice against persona sample
- `check_plot_adherence()` — verify beats were covered

Run cheap checks first; only invoke full 13-rubric LLM evaluation if fast checks pass.
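A `check_filter_words()`-style fast check can be pure regex. The word list and the density cutoff below are small illustrative assumptions, not the project's actual rubric:

```python
import re

FILTER_WORDS = re.compile(
    r"\b(felt|saw|heard|noticed|realized|seemed)\b", re.IGNORECASE
)

def check_filter_words(chapter_text: str, max_per_1000_words: float = 5.0) -> bool:
    """Pass/fail without an LLM: True if filter-word density is acceptable."""
    words = len(chapter_text.split())
    if words == 0:
        return True
    hits = len(FILTER_WORDS.findall(chapter_text))
    return hits / words * 1000 <= max_per_1000_words
```

A chapter failing this check goes straight back for refinement, skipping the ~20K-token full evaluation entirely.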
**Hypothesis:** Specialized editor bots reduce evaluation cost by ~40% (many chapters fail fast checks and don't need full LLM eval). Quality detection equal or better because fast checks are more precise for rule violations.

**Complexity:** Medium — implement regex-based fast checks; modify evaluation pipeline

**Risk:** Fast checks might have false positives that reject good chapters prematurely

---
## Summary: Hypotheses Ranked by Expected Value
| Alt | Phase | Expected Token Saving | Quality Impact | Complexity |
|-----|-------|----------------------|----------------|------------|
| 3-D (Persona Cache) | 3 | ~90K | None | Very Low |
| 3-E (Skip Beat Expansion) | 3 | ~45K | None | Very Low |
| 2-B (Outline Validation) | 2 | Prevents ~100K rewrites | Positive | Low |
| 3-B (Adaptive Thresholds) | 3 | ~100K | Positive | Very Low |
| 1-C (Persona Validation) | 1 | ~60K (prevented rewrites) | Positive | Low |
| 4-B (Mid-gen Consistency) | 4 | ~75K (prevented rewrites) | Positive | Low |
| 3-C (Pre-score Beats) | 3 | ~225K | Positive | Low |
| 4-A (Batch Evaluation) | 4 | ~360K | Neutral/Positive | Medium |
| 2-A (Single-pass Outline) | 2 | ~60K | Neutral | Low |
| 3-A (Two-Pass Drafting) | 3 | Neutral | Potentially Positive | Low |
| 4-D (Editor Bots) | 4 | ~240K | Positive | Medium |
| 2-C (Dynamic Personas) | 2 | -12K (slight increase) | Positive | Medium |
| 4-C (Semantic Ripple) | 4 | ~200K | Neutral | High |