feat: Implement ai_blueprint.md action plan — architectural review & optimisations
Steps 1–7 of the ai_blueprint.md action plan executed:

DOCUMENTATION (Steps 1–3, 6–7):
- docs/current_state_analysis.md: Phase-by-phase cost/quality mapping of existing pipeline
- docs/alternatives_analysis.md: 15 alternative approaches with testable hypotheses
- docs/experiment_design.md: 7 controlled A/B experiment specifications (CPC, HQS, CER metrics)
- ai_blueprint_v2.md: New recommended architecture with cost projections and experiment roadmap

CODE IMPROVEMENTS (Step 4; Experiments 1–4 implemented):
- story/writer.py: Extract build_persona_info() so the persona is loaded once per book, not per chapter
- story/writer.py: Adaptive scoring thresholds; SCORE_PASSING scales 6.5→7.5 by chapter position
- story/writer.py: Beat expansion skip; if beats >100 words, skip Director's Treatment expansion
- story/planner.py: validate_outline(), a pre-generation gate that checks missing beats, continuity, pacing
- story/planner.py: Enrichment field validation; warn on missing title/genre after enrich()
- cli/engine.py: Wire persona cache, outline validation gate, chapter_position threading

Expected savings: ~285K tokens per 30-chapter novel (~7% cost reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
238
docs/current_state_analysis.md
Normal file
@@ -0,0 +1,238 @@
# Current State Analysis: BookApp AI Pipeline

**Date:** 2026-02-22
**Scope:** Mapping existing codebase to the four phases defined in `ai_blueprint.md`
**Status:** Completed; fulfills Action Plan Step 1

---

## Overview

BookApp is an AI-powered novel generation engine using Google Gemini. The pipeline is structured into four phases that map directly to the review framework in `ai_blueprint.md`. This document catalogues the current implementation, identifies efficiency metrics, and surfaces limitations in each phase.

---

## Phase 1: Foundation & Ideation ("The Seed")

**Primary File:** `story/planner.py` (lines 1–86)
**Supporting:** `story/style_persona.py` (lines 81–104), `core/config.py`

### What Happens

1. User provides a minimal `manual_instruction` (can be a single sentence).
2. `enrich(bp, folder, context)` calls the Logic model to expand this into:
   - `book_metadata`: title, genre, tone, time period, structure type, formatting rules, content warnings
   - `characters`: 2–8 named characters with roles and descriptions
   - `plot_beats`: 5–7 concrete narrative beats
3. If the project is part of a series, context from previous books is injected.
4. `create_initial_persona()` generates a fictional author persona (name, bio, age, gender).

### Costs (Per Book)

| Task | Model | Input Tokens | Output Tokens | Cost (Pro-Exp) |
|------|-------|-------------|---------------|----------------|
| `enrich()` | Logic | ~10K | ~3K | FREE |
| `create_initial_persona()` | Logic | ~5.5K | ~1.5K | FREE |
| **Phase 1 Total** | n/a | ~15.5K | ~4.5K | **FREE** |

### Known Limitations

| ID | Issue | Impact |
|----|-------|--------|
| P1-L1 | `enrich()` silently returns original BP on exception (line 84) | Invalid enrichment passes downstream without warning |
| P1-L2 | `filter_characters()` blacklists keywords like "TBD" and "protagonist", which can cull valid names | Characters named "The Protagonist" are silently dropped |
| P1-L3 | Single-pass persona creation with no quality check on output | Generic personas produce poor voice throughout book |
| P1-L4 | No validation that required `book_metadata` fields are non-null | Downstream crashes when title/genre are missing |
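
A minimal sketch of the kind of check P1-L4 calls for, assuming the blueprint is a plain dict with a `book_metadata` mapping. The helper name `missing_metadata_fields` and the required-field tuple are hypothetical; the source only names title and genre as fields whose absence causes crashes.

```python
from typing import Any

# Hypothetical required-field list; the analysis only names title and genre.
REQUIRED_METADATA_FIELDS = ("title", "genre")


def missing_metadata_fields(bp: dict[str, Any]) -> list[str]:
    """Return required book_metadata fields that are absent, None, or blank."""
    metadata = bp.get("book_metadata") or {}
    return [
        field
        for field in REQUIRED_METADATA_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


if __name__ == "__main__":
    blueprint = {"book_metadata": {"title": "Untitled Draft", "genre": None}}
    for field in missing_metadata_fields(blueprint):
        print(f"warning: book_metadata.{field} is missing after enrichment")
```

Returning the list of missing fields, instead of raising, lets the caller decide whether to warn, retry enrichment, or stop.
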

---

## Phase 2: Structuring & Outlining

**Primary File:** `story/planner.py` (lines 89–290)
**Supporting:** `story/style_persona.py`

### What Happens

1. `plan_structure(bp, folder)` maps plot beats to a structural framework (Hero's Journey, Three-Act, etc.) and produces ~10–15 events.
2. `expand(events, pass_num, ...)` iteratively enriches the outline. It is called `depth` times (1–4, based on the length preset), and each pass targets a ceiling of chapter count × 1.5 events.
3. `create_chapter_plan(events, bp, folder)` converts events into concrete chapter objects with POV, pacing, and estimated word count.
4. `get_style_guidelines()` loads or refreshes the AI-ism blacklist and filter-word list.

### Depth Strategy

| Preset | Depth | Expand Calls | Approx Events |
|--------|-------|-------------|---------------|
| Flash Fiction | 1 | 1 | 1 |
| Short Story | 1 | 1 | 5 |
| Novella | 2 | 2 | 15 |
| Novel | 3 | 3 | 30 |
| Epic | 4 | 4 | 50 |
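
The depth presets and the per-pass event ceiling can be sketched as follows. This is an illustration, not the code in `story/planner.py`: the names `DEPTH_BY_PRESET`, `event_ceiling`, and `expand_passes` are hypothetical, and rounding the ceiling up is an assumption the source does not state.

```python
import math

# Depth per length preset, copied from the Depth Strategy table.
DEPTH_BY_PRESET = {
    "Flash Fiction": 1,
    "Short Story": 1,
    "Novella": 2,
    "Novel": 3,
    "Epic": 4,
}


def event_ceiling(chapter_count: int) -> int:
    """Upper bound on events per expand pass: chapter count x 1.5.

    Rounding up is an assumption; the analysis does not say how fractions are handled.
    """
    return math.ceil(chapter_count * 1.5)


def expand_passes(preset: str) -> list[int]:
    """Return the pass numbers (1-based) that expand() would be called with."""
    return list(range(1, DEPTH_BY_PRESET[preset] + 1))


if __name__ == "__main__":
    print(expand_passes("Novel"), event_ceiling(30))
```
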

### Costs (30-Chapter Novel)

| Task | Calls | Input Tokens (per call) | Cost (Pro-Exp) |
|------|-------|-------------|----------------|
| `plan_structure` | 1 | ~15K | FREE |
| `expand` × 3 | 3 | ~12K | FREE |
| `create_chapter_plan` | 1 | ~14K | FREE |
| `get_style_guidelines` | 1 | ~8K | FREE |
| **Phase 2 Total** | 6 | ~73K total | **FREE** |
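
The Phase 2 total is the sum of calls × per-call input tokens. A short check of that arithmetic, using only the figures from the table (the names `PHASE_2_TASKS` and `phase_totals` are illustrative):

```python
# (task, calls, approximate input tokens per call), copied from the table above.
PHASE_2_TASKS = [
    ("plan_structure", 1, 15_000),
    ("expand", 3, 12_000),
    ("create_chapter_plan", 1, 14_000),
    ("get_style_guidelines", 1, 8_000),
]


def phase_totals(tasks: list[tuple[str, int, int]]) -> tuple[int, int]:
    """Return (total calls, total input tokens) for a list of task rows."""
    total_calls = sum(calls for _, calls, _ in tasks)
    total_tokens = sum(calls * tokens for _, calls, tokens in tasks)
    return total_calls, total_tokens


if __name__ == "__main__":
    calls, tokens = phase_totals(PHASE_2_TASKS)
    print(f"{calls} calls, ~{tokens // 1000}K input tokens")
```
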

### Known Limitations

| ID | Issue | Impact |
|----|-------|--------|
| P2-L1 | Sequential `expand()` calls, each unaware of the final state | Redundant inter-call work; could be one multi-step prompt |
| P2-L2 | No continuity validation on outline; character deaths/revivals not detected | Plot holes remain until expensive Phase 3 rewrite |
| P2-L3 | Static chapter plan that cannot adapt if early chapters reveal a pacing problem | Dynamic interventions in Phase 4 are costly workarounds |
| P2-L4 | POV assignment is AI-generated, not validated against narrative logic | Wrong POV on key scenes; caught only during editing |
| P2-L5 | Word count estimates are rough (about ±30% variance against actual) | Writer overshoots/undershoots target; word count normalization fails |
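
To make P2-L2 concrete, here is a sketch of one continuity check an outline gate could run: flagging characters who appear after their recorded death. The event shape (dicts with `characters` and `deaths` lists) and the function name are assumptions for illustration; the real outline format is not shown in this analysis.

```python
def find_posthumous_appearances(events: list[dict]) -> list[str]:
    """Report characters who appear in an event after an earlier event killed them.

    Assumes each event is a dict with optional "characters" and "deaths" name lists.
    """
    died_at: dict[str, int] = {}  # character name -> index of the event where they died
    problems: list[str] = []
    for index, event in enumerate(events):
        # Check appearances first so a character may be present in their own death event.
        for name in event.get("characters", []):
            if name in died_at:
                problems.append(
                    f"{name} appears in event {index} after dying in event {died_at[name]}"
                )
        for name in event.get("deaths", []):
            died_at.setdefault(name, index)
    return problems


if __name__ == "__main__":
    outline = [
        {"characters": ["Ana", "Bo"], "deaths": ["Bo"]},
        {"characters": ["Ana"]},
        {"characters": ["Bo"]},
    ]
    for problem in find_posthumous_appearances(outline):
        print("continuity:", problem)
```

A check like this needs no model call, so it can run before any Phase 3 tokens are spent.
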

---

## Phase 3: The Writing Engine (Drafting)

**Primary File:** `story/writer.py`
**Orchestrated by:** `cli/engine.py`

### What Happens

For each chapter:
1. `expand_beats_to_treatment()`: the Logic model expands sparse beats into a "Director's Treatment" (staging, sensory anchors, emotional arc, subtext).
2. `write_chapter()` constructs a ~310-line prompt injecting:
   - Author persona (bio, sample text, sample files from disk)
   - Filtered characters (only those named in beats + POV character)
   - Character tracking state (location, clothing, held items)
   - Lore context (relevant locations/items from tracking)
   - Style guidelines + genre-specific mandates
   - Smart context tail: last ~1000 tokens of previous chapter
   - Director's Treatment
3. Writer model generates first draft.
4. Logic model evaluates on 13 rubrics (1–10 scale). Automatic fail conditions apply for filter-word density, summary mode, and labeled emotions.
5. Iterative quality loop (up to 3 attempts):
   - Score ≥ 8.0 → Auto-accept
   - Score ≥ 7.0 → Accept after max attempts
   - Score < 7.0 → Refinement pass (Writer model)
   - Score < 6.0 → Full rewrite (Pro model)
6. Every 5 chapters: `refine_persona()` updates author bio based on actual written text.
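
One possible reading of the quality loop in step 5, written as a decision function. The function name and action labels are hypothetical. Two points are assumptions the source does not settle: that drafts scoring 7.0 to 7.9 keep being refined until attempts run out, and what happens to a sub-7.0 draft at the attempt limit (labeled `"exhausted"` here).

```python
def next_action(score: float, attempt: int, max_attempts: int = 3) -> str:
    """Map an evaluation score and 1-based attempt number to the loop's next step."""
    if score >= 8.0:
        return "accept"  # auto-accept
    if attempt >= max_attempts:
        # 7.0+ is accepted once attempts run out; below that the outcome is
        # not specified in the analysis, so it is flagged here (assumption).
        return "accept" if score >= 7.0 else "exhausted"
    if score < 6.0:
        return "full_rewrite"  # escalate to the Pro model
    return "refine"  # refinement pass with the Writer model


if __name__ == "__main__":
    for score, attempt in [(8.2, 1), (7.4, 1), (7.4, 3), (6.5, 2), (5.9, 1)]:
        print(score, attempt, next_action(score, attempt))
```
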

### Key Innovations

- **Dynamic Character Injection:** Only injects characters named in chapter beats (saves ~5K tokens/chapter).
- **Smart Context Tail:** Takes the last ~1000 tokens of the previous chapter (not the first 1000), which preserves the handoff point.
- **Auto Model Escalation:** Low-scoring drafts trigger switch to Pro model for full rewrite.
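
The first two techniques can be sketched in a few lines. This is a simplified illustration under stated assumptions: whitespace-separated words stand in for model tokens, characters are dicts with a `name` key, and name matching is a naive case-insensitive substring test. The function names are hypothetical.

```python
def context_tail(previous_chapter: str, max_tokens: int = 1000) -> str:
    """Keep the end of the previous chapter so the handoff point is preserved.

    Words are used as a rough stand-in for tokens (assumption).
    """
    if max_tokens <= 0:
        return ""
    words = previous_chapter.split()
    return " ".join(words[-max_tokens:])


def characters_for_chapter(
    characters: list[dict], beats: list[str], pov_name: str
) -> list[dict]:
    """Keep only characters named in the chapter beats, plus the POV character."""
    beat_text = " ".join(beats).lower()
    return [
        character
        for character in characters
        if character["name"] == pov_name or character["name"].lower() in beat_text
    ]


if __name__ == "__main__":
    cast = [{"name": "Ana"}, {"name": "Bo"}, {"name": "Cy"}]
    print(characters_for_chapter(cast, ["Ana opens the door."], pov_name="Cy"))
    print(context_tail("one two three four five", max_tokens=2))
```
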

### Costs (30-Chapter Novel, Mixed Model Strategy)

| Task | Calls | Input Tokens (per call) | Output Tokens (per call) | Cost Estimate |
|------|-------|-------------|---------------|---------------|
| `expand_beats_to_treatment` × 30 | 30 | ~5K | ~2K | FREE (Logic) |
| `write_chapter` draft × 30 | 30 | ~25K | ~3.5K | ~$0.087 (Writer) |
| Evaluation × 30 | 30 | ~20K | ~1.5K | FREE (Logic) |
| Refinement passes × 15 (est.) | 15 | ~20K | ~3K | ~$0.090 (Writer) |
| `refine_persona` × 6 | 6 | ~6K | ~1.5K | FREE (Logic) |
| **Phase 3 Total** | ~111 | ~1.9M total | ~310K total | **~$0.18** |

### Known Limitations

| ID | Issue | Impact |
|----|-------|--------|
| P3-L1 | Persona files re-read from disk on every chapter | I/O overhead; persona doesn't change between reads |
| P3-L2 | Beat expansion called even when beats are already detailed (>100 words) | Wastes ~5K tokens/chapter on ~30% of chapters |
| P3-L3 | Full rewrite triggered at score < 6.0, discarding the entire draft | If a draft scores 5.9, the tokens already spent on it (~25K input, ~3.5K output per the cost table) are wasted |
| P3-L4 | No priority weighting for climax chapters | Ch 28 (climax) uses same resources/attempts as Ch 3 (setup) |
| P3-L5 | Previous chapter context hard-capped at 1000 tokens | For long chapters, might miss setup context from earlier pages |
| P3-L6 | Scoring thresholds fixed regardless of book position | Strict standards in early chapters = expensive refinement for setup scenes |
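
The commit message describes two remedies for P3-L2 and P3-L6: skipping expansion when beats exceed 100 words, and scaling the passing score from 6.5 to 7.5 by chapter position. A sketch of both follows. The function names are hypothetical, and the linear ramp is an assumption; the source gives only the two endpoints.

```python
def passing_score(
    chapter_number: int, total_chapters: int, low: float = 6.5, high: float = 7.5
) -> float:
    """Scale the passing threshold from `low` (first chapter) to `high` (last).

    A linear ramp is assumed; only the 6.5 and 7.5 endpoints come from the source.
    """
    if total_chapters <= 1:
        return high
    position = (chapter_number - 1) / (total_chapters - 1)
    position = min(max(position, 0.0), 1.0)  # clamp out-of-range chapter numbers
    return round(low + (high - low) * position, 2)


def needs_beat_expansion(beats: list[str], word_limit: int = 100) -> bool:
    """Expand beats only when they total `word_limit` words or fewer."""
    word_count = sum(len(beat.split()) for beat in beats)
    return word_count <= word_limit


if __name__ == "__main__":
    for chapter in (1, 15, 30):
        print(chapter, passing_score(chapter, 30))
    print(needs_beat_expansion(["Ana finds the letter.", "Bo lies about it."]))
```
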

---

## Phase 4: Review & Refinement (Editing)

**Primary Files:** `story/editor.py`, `story/bible_tracker.py`
**Orchestrated by:** `cli/engine.py`

### What Happens

**During writing loop (every chapter):**
- `update_tracking()` refreshes character state (location, clothing, held items, speech style, events).
- `update_lore_index()` extracts canonical descriptions of locations and items.

**Every 2 chapters:**
- `check_pacing()` detects if story is rushing or repeating beats; triggers ADD_BRIDGE or CUT_NEXT interventions.

**After writing completes:**
- `analyze_consistency()` scans entire manuscript for plot holes and contradictions.
- `harvest_metadata()` extracts newly invented characters not in the original bible.
- `check_and_propagate()` cascades chapter edits forward through the manuscript.
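
The in-loop cadence above can be expressed as a small scheduling helper. The helper name is hypothetical, and running `check_pacing` on even-numbered chapters is an assumption; the source says only "every 2 chapters".

```python
def hooks_after_chapter(chapter_number: int) -> list[str]:
    """Return the names of the review steps to run after a chapter is written."""
    hooks = ["update_tracking", "update_lore_index"]  # every chapter
    if chapter_number % 2 == 0:  # assumed to mean even-numbered chapters
        hooks.append("check_pacing")
    return hooks


if __name__ == "__main__":
    pacing_calls = sum(
        "check_pacing" in hooks_after_chapter(n) for n in range(1, 31)
    )
    print(f"check_pacing runs {pacing_calls} times over 30 chapters")
```

Under that assumption a 30-chapter novel triggers 15 pacing checks, which matches the call count in the Phase 4 cost table.
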

### 13 Evaluation Rubrics

1. Engagement & tension
2. Scene execution (no summaries)
3. Voice & tone
4. Sensory immersion
5. Show, Don't Tell / Deep POV (**auto-fail trigger**)
6. Character agency
7. Pacing
8. Genre appropriateness
9. Dialogue authenticity
10. Plot relevance
11. Staging & flow
12. Prose dynamics (sentence variety)
13. Clarity & readability

**Automatic fail conditions:** filter-word density > 1/120 words → cap at 5; summary mode detected → cap at 6; >3 labeled emotions → cap at 5.
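
The three caps can be applied as a post-processing step on the evaluator's score. This sketch assumes the counts and the summary-mode flag are already available as inputs; how the evaluator derives them is not described here, and the function name is hypothetical.

```python
def apply_auto_fail_caps(
    score: float,
    filter_words: int,
    total_words: int,
    summary_mode: bool,
    labeled_emotions: int,
) -> float:
    """Lower a rubric score to the caps listed under automatic fail conditions."""
    capped = score
    # Density above 1 filter word per 120 words; integer form avoids float rounding.
    if total_words > 0 and filter_words * 120 > total_words:
        capped = min(capped, 5.0)
    if summary_mode:
        capped = min(capped, 6.0)
    if labeled_emotions > 3:
        capped = min(capped, 5.0)
    return capped


if __name__ == "__main__":
    print(apply_auto_fail_caps(8.5, filter_words=20, total_words=1200,
                               summary_mode=False, labeled_emotions=0))
```

Using `min` means a cap never raises a score that is already below it.
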

### Costs (30-Chapter Novel)

| Task | Calls | Input Tokens (per call) | Cost (Pro-Exp) |
|------|-------|-------------|----------------|
| `update_tracking` × 30 | 30 | ~18K | FREE |
| `update_lore_index` × 30 | 30 | ~15K | FREE |
| `check_pacing` × 15 | 15 | ~18K | FREE |
| `analyze_consistency` | 1 | ~25K | FREE |
| `harvest_metadata` | 1 | ~25K | FREE |
| **Phase 4 Total** | 77 | ~1.34M total | **FREE** |

### Known Limitations

| ID | Issue | Impact |
|----|-------|--------|
| P4-L1 | Consistency check is post-generation only | Plot holes caught too late to cheaply fix |
| P4-L2 | Ripple propagation (`check_and_propagate`) has no cost ceiling | A single user edit in Ch 5 can trigger 100K+ tokens of cascading rewrites |
| P4-L3 | `rewrite_chapter_content()` uses Logic model instead of Writer model | Less creative rewrite output; the Logic model optimizes reasoning, not prose |
| P4-L4 | `check_pacing()` sampling only looks at recent chapters, not cumulative arc | Slow-building issues across 10+ chapters not detected until critical |
| P4-L5 | No quality metric for the evaluator itself | Can't confirm if 13-rubric scores are calibrated correctly |
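
As an illustration of the cost ceiling that P4-L2 says is missing, the sketch below plans forward rewrites until a token budget would be exceeded. Everything here is hypothetical: the function name, the per-chapter cost estimates, and the policy of stopping at the first chapter that does not fit.

```python
def plan_propagation(
    chapter_costs: list[tuple[int, int]], token_ceiling: int
) -> tuple[list[int], int]:
    """Select chapters to rewrite, in order, without exceeding a token ceiling.

    `chapter_costs` holds (chapter number, estimated tokens) pairs in manuscript
    order. Returns the chapters planned and the tokens they would consume.
    """
    planned: list[int] = []
    spent = 0
    for chapter, cost in chapter_costs:
        if spent + cost > token_ceiling:
            break  # stop cascading; later chapters would need manual review
        planned.append(chapter)
        spent += cost
    return planned, spent


if __name__ == "__main__":
    downstream = [(6, 40_000), (7, 40_000), (8, 40_000)]
    print(plan_propagation(downstream, token_ceiling=100_000))
```
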

---

## Cross-Phase Summary

### Total Costs (30-Chapter Novel)

| Phase | Token Budget | Cost Estimate |
|-------|-------------|---------------|
| Phase 1: Ideation | ~20K | FREE |
| Phase 2: Outline | ~73K | FREE |
| Phase 3: Writing | ~2.2M | ~$0.18 |
| Phase 4: Review | ~1.34M | FREE |
| Imagen Cover (3 images) | n/a | ~$0.12 |
| **Total** | **~3.63M** | **~$0.30** |

*Assumes quality-first model selection (Pro-Exp for Logic, Flash for Writer)*

### Efficiency Frontier

- **Best case** (all chapters pass first attempt): ~$0.18 text + $0.04 cover = ~$0.22
- **Worst case** (30% rewrite rate with Pro escalations): ~$0.45 text + $0.12 cover = ~$0.57
- **Budget per blueprint goal:** $2.00 total. The current system uses roughly 11–29% of that budget (about 15% at the ~$0.30 estimate above)
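
The budget shares follow directly from the scenario costs and the $2.00 goal. A quick check of the arithmetic, working in whole cents to avoid floating-point drift (the names are illustrative; the "typical" figure is the ~$0.30 total from the cost table):

```python
BUDGET_CENTS = 200  # $2.00 blueprint goal

# Scenario costs in cents, taken from the figures in this section.
SCENARIO_CENTS = {"best": 22, "typical": 30, "worst": 57}


def budget_share_percent(cost_cents: int, budget_cents: int = BUDGET_CENTS) -> float:
    """Return a cost as a percentage of the budget, rounded to one decimal."""
    return round(100 * cost_cents / budget_cents, 1)


if __name__ == "__main__":
    for name, cents in SCENARIO_CENTS.items():
        print(f"{name}: {budget_share_percent(cents)}% of budget")
```
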

### Top 5 Immediate Optimization Opportunities

| Priority | ID | Change | Savings |
|----------|----|--------|---------|
| 1 | P3-L1 | Cache persona per book (not per chapter) | ~90K tokens |
| 2 | P3-L2 | Skip beat expansion for detailed beats | ~45K tokens |
| 3 | P2-L2 | Add pre-generation outline validation | Prevent expensive rewrites |
| 4 | P1-L1 | Fix silent failure in `enrich()` | Prevent silent corrupt state |
| 5 | P3-L6 | Adaptive scoring thresholds by chapter position | ~15% fewer refinement passes |