bookapp/docs/current_state_analysis.md
Mike Wichers 2100ca2312 feat: Implement ai_blueprint.md action plan — architectural review & optimisations
Steps 1–7 of the ai_blueprint.md action plan executed:

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4 — Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() — persona loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds — SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip — if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline() — pre-generation gate checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation — warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:01:30 -05:00

# Current State Analysis: BookApp AI Pipeline
**Date:** 2026-02-22
**Scope:** Mapping existing codebase to the four phases defined in `ai_blueprint.md`
**Status:** Completed — fulfills Action Plan Step 1
---
## Overview
BookApp is an AI-powered novel generation engine using Google Gemini. The pipeline is structured into four phases that map directly to the review framework in `ai_blueprint.md`. This document catalogues the current implementation, identifies efficiency metrics, and surfaces limitations in each phase.
---
## Phase 1: Foundation & Ideation ("The Seed")
**Primary File:** `story/planner.py` (lines 1–86)
**Supporting:** `story/style_persona.py` (lines 81–104), `core/config.py`
### What Happens
1. User provides a minimal `manual_instruction` (can be a single sentence).
2. `enrich(bp, folder, context)` calls the Logic model to expand this into:
   - `book_metadata`: title, genre, tone, time period, structure type, formatting rules, content warnings
   - `characters`: 2–8 named characters with roles and descriptions
   - `plot_beats`: 5–7 concrete narrative beats
3. If the project is part of a series, context from previous books is injected.
4. `create_initial_persona()` generates a fictional author persona (name, bio, age, gender).
### Costs (Per Book)
| Task | Model | Input Tokens | Output Tokens | Cost (Pro-Exp) |
|------|-------|-------------|---------------|----------------|
| `enrich()` | Logic | ~10K | ~3K | FREE |
| `create_initial_persona()` | Logic | ~5.5K | ~1.5K | FREE |
| **Phase 1 Total** | — | ~15.5K | ~4.5K | **FREE** |
### Known Limitations
| ID | Issue | Impact |
|----|-------|--------|
| P1-L1 | `enrich()` silently returns original BP on exception (line 84) | Invalid enrichment passes downstream without warning |
| P1-L2 | `filter_characters()` blacklists keywords like "TBD", "protagonist" — can cull valid names | Characters named "The Protagonist" are silently dropped |
| P1-L3 | Single-pass persona creation — no quality check on output | Generic personas produce poor voice throughout book |
| P1-L4 | No validation that required `book_metadata` fields are non-null | Downstream crashes when title/genre are missing |
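
A minimal sketch of the check that P1-L4 says is missing, assuming the blueprint is a plain dict with a `book_metadata` mapping. The helper name `missing_metadata_fields` and the list of required fields are hypothetical; only `title` and `genre` are named in this document as fields whose absence causes trouble.

```python
# Assumed subset of required fields; adjust to the real blueprint schema.
REQUIRED_METADATA_FIELDS = ("title", "genre")


def missing_metadata_fields(bp: dict) -> list[str]:
    """Return required book_metadata fields that are absent, None, or blank."""
    metadata = bp.get("book_metadata") or {}
    missing = []
    for field in REQUIRED_METADATA_FIELDS:
        value = metadata.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


if __name__ == "__main__":
    blueprint = {"book_metadata": {"title": "Untitled Draft", "genre": ""}}
    print(missing_metadata_fields(blueprint))  # ['genre']
```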
---
## Phase 2: Structuring & Outlining
**Primary File:** `story/planner.py` (lines 89–290)
**Supporting:** `story/style_persona.py`
### What Happens
1. `plan_structure(bp, folder)` maps plot beats to a structural framework (Hero's Journey, Three-Act, etc.) and produces ~10–15 events.
2. `expand(events, pass_num, ...)` iteratively enriches the outline. It is called `depth` times (1–4, depending on the length preset), and each pass targets a ceiling of chapter count × 1.5 events.
3. `create_chapter_plan(events, bp, folder)` converts events into concrete chapter objects with POV, pacing, and estimated word count.
4. `get_style_guidelines()` loads or refreshes the AI-ism blacklist and filter-word list.
### Depth Strategy
| Preset | Depth | Expand Calls | Approx Events |
|--------|-------|-------------|---------------|
| Flash Fiction | 1 | 1 | 1 |
| Short Story | 1 | 1 | 5 |
| Novella | 2 | 2 | 15 |
| Novel | 3 | 3 | 30 |
| Epic | 4 | 4 | 50 |
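
The depth table and the per-pass event ceiling can be sketched as a lookup plus one multiplication. This is an illustration only: `DEPTH_BY_PRESET` and `expand_plan` are hypothetical names, and rounding the ceiling up is an assumption, since the source does not say how fractional values are handled.

```python
import math

# Depth values copied from the table above.
DEPTH_BY_PRESET = {
    "Flash Fiction": 1,
    "Short Story": 1,
    "Novella": 2,
    "Novel": 3,
    "Epic": 4,
}


def expand_plan(preset: str, chapter_count: int) -> tuple[int, int]:
    """Return (number of expand passes, event ceiling per pass)."""
    depth = DEPTH_BY_PRESET[preset]
    # Ceiling of chapter count x 1.5; rounding up is an assumption.
    event_ceiling = math.ceil(chapter_count * 1.5)
    return depth, event_ceiling


if __name__ == "__main__":
    print(expand_plan("Novel", 30))  # (3, 45)
```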
### Costs (30-Chapter Novel)
| Task | Calls | Input Tokens | Cost (Pro-Exp) |
|------|-------|-------------|----------------|
| `plan_structure` | 1 | ~15K | FREE |
| `expand` × 3 | 3 | ~12K each | FREE |
| `create_chapter_plan` | 1 | ~14K | FREE |
| `get_style_guidelines` | 1 | ~8K | FREE |
| **Phase 2 Total** | 6 | ~73K | **FREE** |
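
As a quick check of the totals row, the per-call figures can be multiplied out. The structure below is only a convenient way to hold the table for the arithmetic, not something taken from the codebase.

```python
# (task, calls, input tokens per call), copied from the table above.
PHASE2_CALLS = [
    ("plan_structure", 1, 15_000),
    ("expand", 3, 12_000),
    ("create_chapter_plan", 1, 14_000),
    ("get_style_guidelines", 1, 8_000),
]

total_calls = sum(calls for _, calls, _ in PHASE2_CALLS)
total_input_tokens = sum(calls * tokens for _, calls, tokens in PHASE2_CALLS)

print(total_calls, total_input_tokens)  # 6 73000
```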
### Known Limitations
| ID | Issue | Impact |
|----|-------|--------|
| P2-L1 | Sequential `expand()` calls — each call unaware of final state | Redundant inter-call work; could be one multi-step prompt |
| P2-L2 | No continuity validation on outline — character deaths/revivals not detected | Plot holes remain until expensive Phase 3 rewrite |
| P2-L3 | Static chapter plan — cannot adapt if early chapters reveal pacing problem | Dynamic interventions in Phase 4 are costly workarounds |
| P2-L4 | POV assignment is AI-generated, not validated against narrative logic | Wrong POV on key scenes; caught only during editing |
| P2-L5 | Word count estimates are rough (~±30% actual variance) | Writer overshoots/undershoots target; word count normalization fails |
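
One way the continuity gap in P2-L2 could be approached is a single pass over the outline that remembers which characters have died. The event schema used here (`characters` and `deaths` lists on each event) and the function name `find_revivals` are assumptions for illustration; the real outline format may differ, and intentional flashbacks would show up as false positives.

```python
def find_revivals(events: list[dict]) -> list[tuple[str, int, int]]:
    """Return (character, death_index, reappearance_index) for each violation."""
    died_at: dict[str, int] = {}
    problems = []
    for index, event in enumerate(events):
        # Flag anyone who appears after an earlier recorded death.
        for name in event.get("characters", []):
            if name in died_at:
                problems.append((name, died_at[name], index))
        # Record deaths after the check so the death scene itself is not flagged.
        for name in event.get("deaths", []):
            died_at.setdefault(name, index)
    return problems


if __name__ == "__main__":
    outline = [
        {"characters": ["Ana", "Ben"]},
        {"characters": ["Ben"], "deaths": ["Ben"]},
        {"characters": ["Ana", "Ben"]},
    ]
    print(find_revivals(outline))  # [('Ben', 1, 2)]
```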
---
## Phase 3: The Writing Engine (Drafting)
**Primary File:** `story/writer.py`
**Orchestrated by:** `cli/engine.py`
### What Happens
For each chapter:
1. `expand_beats_to_treatment()` — Logic model expands sparse beats into a "Director's Treatment" (staging, sensory anchors, emotional arc, subtext).
2. `write_chapter()` constructs a ~310-line prompt injecting:
   - Author persona (bio, sample text, sample files from disk)
   - Filtered characters (only those named in beats + POV character)
   - Character tracking state (location, clothing, held items)
   - Lore context (relevant locations/items from tracking)
   - Style guidelines + genre-specific mandates
   - Smart context tail: last ~1000 tokens of previous chapter
   - Director's Treatment
3. Writer model generates first draft.
4. Logic model evaluates on 13 rubrics (1–10 scale). Automatic fail conditions apply for filter-word density, summary mode, and labeled emotions.
5. Iterative quality loop (up to 3 attempts):
   - Score ≥ 8.0 → Auto-accept
   - 7.0 ≤ Score < 8.0 → Accept once max attempts are reached
   - 6.0 ≤ Score < 7.0 → Refinement pass (Writer model)
   - Score < 6.0 → Full rewrite (Pro model)
6. Every 5 chapters: `refine_persona()` updates author bio based on actual written text.
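
The quality loop in step 5 can be read as a small decision function. The sketch below is one interpretation, not the code in `story/writer.py`: `next_action` and its return strings are hypothetical, the `"retry"` action for scores between 7.0 and 8.0 with attempts remaining is an assumption, and `"exhausted"` is a placeholder for the case the source leaves unspecified (score still below 7.0 after the last attempt).

```python
def next_action(score: float, attempt: int, max_attempts: int = 3) -> str:
    """Map an evaluation score and 1-based attempt number to the next step."""
    if score >= 8.0:
        return "accept"  # auto-accept
    if attempt >= max_attempts:
        # Accept a 7.0+ draft once attempts run out; below that is unspecified.
        return "accept" if score >= 7.0 else "exhausted"
    if score < 6.0:
        return "full_rewrite"  # escalate to the Pro model
    if score < 7.0:
        return "refine"  # refinement pass with the Writer model
    return "retry"  # assumed: keep trying until max attempts


if __name__ == "__main__":
    for score, attempt in [(8.2, 1), (7.4, 1), (7.4, 3), (6.5, 1), (5.9, 2)]:
        print(score, attempt, next_action(score, attempt))
```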
### Key Innovations
- **Dynamic Character Injection:** Only injects characters named in chapter beats (saves ~5K tokens/chapter).
- **Smart Context Tail:** Takes last ~1000 tokens of previous chapter (not first 1000) — preserves handoff point.
- **Auto Model Escalation:** Low-scoring drafts trigger switch to Pro model for full rewrite.
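
The first two ideas can be sketched in a few lines. Both helpers are hypothetical stand-ins: `context_tail` approximates tokens with whitespace-separated words rather than a real tokenizer, and `select_characters` uses a case-insensitive substring match on names, which may over-match short names.

```python
def context_tail(text: str, max_tokens: int = 1000) -> str:
    """Keep the END of the previous chapter, where the handoff happens."""
    words = text.split()  # crude token approximation (assumption)
    return " ".join(words[-max_tokens:])


def select_characters(characters: list[dict], beats: list[str], pov_name: str) -> list[dict]:
    """Keep the POV character plus any character named in the chapter beats."""
    beat_text = " ".join(beats).lower()
    return [
        character
        for character in characters
        if character["name"] == pov_name or character["name"].lower() in beat_text
    ]


if __name__ == "__main__":
    cast = [{"name": "Ana"}, {"name": "Ben"}, {"name": "Cy"}]
    chosen = select_characters(cast, ["Ben opens the door."], pov_name="Ana")
    print([c["name"] for c in chosen])  # ['Ana', 'Ben']
    print(context_tail("one two three four five", max_tokens=2))  # four five
```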
### Costs (30-Chapter Novel, Mixed Model Strategy)
| Task | Calls | Input Tokens (per call) | Output Tokens (per call) | Cost Estimate |
|------|-------|-------------|---------------|---------------|
| `expand_beats_to_treatment` × 30 | 30 | ~5K | ~2K | FREE (Logic) |
| `write_chapter` draft × 30 | 30 | ~25K | ~3.5K | ~$0.087 (Writer) |
| Evaluation × 30 | 30 | ~20K | ~1.5K | FREE (Logic) |
| Refinement passes × 15 (est.) | 15 | ~20K | ~3K | ~$0.090 (Writer) |
| `refine_persona` × 6 | 6 | ~6K | ~1.5K | FREE (Logic) |
| **Phase 3 Total** | ~111 | ~1.9M (total) | ~310K (total) | **~$0.18** |
### Known Limitations
| ID | Issue | Impact |
|----|-------|--------|
| P3-L1 | Persona files re-read from disk on every chapter | I/O overhead; persona doesn't change between reads |
| P3-L2 | Beat expansion called even when beats are already detailed (>100 words) | Wastes ~5K tokens/chapter on ~30% of chapters |
| P3-L3 | Full rewrite triggered at score < 6.0 — discards entire draft | If a draft scores 5.9, the ~25K input and ~3.5K output tokens spent on it are wasted |
| P3-L4 | No priority weighting for climax chapters | Ch 28 (climax) uses same resources/attempts as Ch 3 (setup) |
| P3-L5 | Previous chapter context hard-capped at 1000 tokens | For long chapters, might miss setup context from earlier pages |
| P3-L6 | Scoring thresholds fixed regardless of book position | Strict standards in early chapters = expensive refinement for setup scenes |
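
Two of these limitations (P3-L2 and P3-L6) lend themselves to small guards. The sketch below is an assumption-heavy illustration: the function names are hypothetical, the 100-word limit is applied to the combined beat text, `chapter_position` is taken to be a fraction from 0.0 (first chapter) to 1.0 (last), and the 6.5 to 7.5 range comes from the commit summary with linear scaling assumed.

```python
def should_skip_expansion(beats: list[str], word_limit: int = 100) -> bool:
    """Skip the Director's Treatment call when beats are already detailed."""
    total_words = sum(len(beat.split()) for beat in beats)
    return total_words > word_limit


def passing_score(chapter_position: float, low: float = 6.5, high: float = 7.5) -> float:
    """Scale the passing threshold linearly with position in the book."""
    position = min(max(chapter_position, 0.0), 1.0)  # clamp to [0, 1]
    return low + (high - low) * position


if __name__ == "__main__":
    print(should_skip_expansion(["She waits."]))  # False
    print(passing_score(0.0), passing_score(0.5), passing_score(1.0))  # 6.5 7.0 7.5
```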
---
## Phase 4: Review & Refinement (Editing)
**Primary Files:** `story/editor.py`, `story/bible_tracker.py`
**Orchestrated by:** `cli/engine.py`
### What Happens
**During writing loop (every chapter):**
- `update_tracking()` refreshes character state (location, clothing, held items, speech style, events).
- `update_lore_index()` extracts canonical descriptions of locations and items.
**Every 2 chapters:**
- `check_pacing()` detects if story is rushing or repeating beats; triggers ADD_BRIDGE or CUT_NEXT interventions.
**After writing completes:**
- `analyze_consistency()` scans entire manuscript for plot holes and contradictions.
- `harvest_metadata()` extracts newly invented characters not in the original bible.
- `check_and_propagate()` cascades chapter edits forward through the manuscript.
### 13 Evaluation Rubrics
1. Engagement & tension
2. Scene execution (no summaries)
3. Voice & tone
4. Sensory immersion
5. Show, Don't Tell / Deep POV (**auto-fail trigger**)
6. Character agency
7. Pacing
8. Genre appropriateness
9. Dialogue authenticity
10. Plot relevance
11. Staging & flow
12. Prose dynamics (sentence variety)
13. Clarity & readability
**Automatic fail conditions:** filter-word density > 1 per 120 words → score capped at 5; summary mode detected → score capped at 6; more than 3 labeled emotions → score capped at 5.
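
The caps above can be applied as a post-processing step on the rubric score. This is a sketch under assumptions: `apply_score_caps` and its parameters are hypothetical, and it presumes the filter-word count, summary-mode flag, and labeled-emotion count have already been computed elsewhere.

```python
def apply_score_caps(
    score: float,
    word_count: int,
    filter_word_count: int,
    summary_mode: bool,
    labeled_emotions: int,
) -> float:
    """Return the score after applying every automatic-fail cap that triggers."""
    cap = 10.0
    # Density > 1 per 120 words, written with integers to avoid float edge cases.
    if filter_word_count * 120 > word_count:
        cap = min(cap, 5.0)
    if summary_mode:
        cap = min(cap, 6.0)
    if labeled_emotions > 3:
        cap = min(cap, 5.0)
    return min(score, cap)


if __name__ == "__main__":
    print(apply_score_caps(8.5, 1200, 11, False, 0))  # 5.0
    print(apply_score_caps(8.5, 1200, 5, True, 0))  # 6.0
```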
### Costs (30-Chapter Novel)
| Task | Calls | Input Tokens (per call) | Cost (Pro-Exp) |
|------|-------|-------------|----------------|
| `update_tracking` × 30 | 30 | ~18K | FREE |
| `update_lore_index` × 30 | 30 | ~15K | FREE |
| `check_pacing` × 15 | 15 | ~18K | FREE |
| `analyze_consistency` | 1 | ~25K | FREE |
| `harvest_metadata` | 1 | ~25K | FREE |
| **Phase 4 Total** | 77 | ~1.34M (total) | **FREE** |
### Known Limitations
| ID | Issue | Impact |
|----|-------|--------|
| P4-L1 | Consistency check is post-generation only | Plot holes caught too late to cheaply fix |
| P4-L2 | Ripple propagation (`check_and_propagate`) has no cost ceiling | A single user edit in Ch 5 can trigger 100K+ tokens of cascading rewrites |
| P4-L3 | `rewrite_chapter_content()` uses Logic model instead of Writer model | Less creative rewrite output — Logic model optimizes reasoning, not prose |
| P4-L4 | `check_pacing()` sampling only looks at recent chapters, not cumulative arc | Slow-building issues across 10+ chapters not detected until critical |
| P4-L5 | No quality metric for the evaluator itself | Can't confirm if 13-rubric scores are calibrated correctly |
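
For P4-L2, one possible shape for a cost ceiling is a running token budget that stops the cascade before it overspends. This is a hypothetical sketch and not how `check_and_propagate` works today: the function name, the per-chapter cost estimates, and the stop-at-first-overrun policy are all assumptions.

```python
def propagate_with_budget(chapter_costs: list[int], token_budget: int) -> tuple[list[int], int]:
    """Rewrite downstream chapters in order until the next one would exceed the budget.

    chapter_costs[i] is the estimated token cost of rewriting the i-th
    downstream chapter. Returns (indices rewritten, tokens spent).
    """
    rewritten: list[int] = []
    spent = 0
    for index, cost in enumerate(chapter_costs):
        if spent + cost > token_budget:
            break  # stop the cascade; remaining chapters need manual review
        rewritten.append(index)
        spent += cost
    return rewritten, spent


if __name__ == "__main__":
    print(propagate_with_budget([40_000, 40_000, 40_000], token_budget=100_000))
    # ([0, 1], 80000)
```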
---
## Cross-Phase Summary
### Total Costs (30-Chapter Novel)
| Phase | Token Budget | Cost Estimate |
|-------|-------------|---------------|
| Phase 1: Ideation | ~20K | FREE |
| Phase 2: Outline | ~73K | FREE |
| Phase 3: Writing | ~2.2M | ~$0.18 |
| Phase 4: Review | ~1.34M | FREE |
| Imagen Cover (3 images) | — | ~$0.12 |
| **Total** | **~3.63M** | **~$0.30** |
*Assumes quality-first model selection (Pro-Exp for Logic, Flash for Writer)*
### Efficiency Frontier
- **Best case** (all chapters pass first attempt): ~$0.18 text + $0.04 cover = ~$0.22
- **Worst case** (30% rewrite rate with Pro escalations): ~$0.45 text + $0.12 cover = ~$0.57
- **Budget per blueprint goal:** $2.00 total; the current system is at 15–29% of budget
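
The totals and budget shares can be reproduced with a few lines of arithmetic. The figures are copied from the tables above; the variable names are only for illustration, and the budget share is computed from the table total (~$0.30) and the worst case (~$0.57).

```python
# Token budgets per phase, from the cross-phase table.
PHASE_TOKENS = {
    "Phase 1": 20_000,
    "Phase 2": 73_000,
    "Phase 3": 2_200_000,
    "Phase 4": 1_340_000,
}
BUDGET_USD = 2.00

total_tokens = sum(PHASE_TOKENS.values())
typical_cost = 0.18 + 0.12  # text + three cover images
worst_cost = 0.45 + 0.12  # 30% rewrite rate + three cover images


def budget_share(cost_usd: float) -> float:
    """Cost as a percentage of the blueprint budget, to one decimal place."""
    return round(100 * cost_usd / BUDGET_USD, 1)


print(total_tokens)  # 3633000
print(budget_share(typical_cost), budget_share(worst_cost))  # 15.0 28.5
```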
### Top 5 Immediate Optimization Opportunities
| Priority | ID | Change | Savings |
|----------|----|--------|---------|
| 1 | P3-L1 | Cache persona per book (not per chapter) | ~90K tokens |
| 2 | P3-L2 | Skip beat expansion for detailed beats | ~45K tokens |
| 3 | P2-L2 | Add pre-generation outline validation | Prevent expensive rewrites |
| 4 | P1-L1 | Fix silent failure in `enrich()` | Prevent silent corrupt state |
| 5 | P3-L6 | Adaptive scoring thresholds by chapter position | ~15% fewer refinement passes |