bookapp/ai_blueprint_v2.md

# AI-Powered Book Generation: Optimized Architecture v2.0

**Date:** 2026-02-22
**Status:** Defined — fulfills Action Plan Steps 5, 6, and 7 from `ai_blueprint.md`
**Based on:** Current state analysis, alternatives analysis, and experiment design in `docs/`

---

## 1. Executive Summary

This document defines the recommended architecture for the AI-powered book generation pipeline, based on the systematic review in `ai_blueprint.md`. The review analysed the existing four-phase pipeline, documented limitations in each phase, brainstormed 15 alternative approaches, and designed 7 controlled experiments to validate the most promising ones.

**Key finding:** The current system is already well-optimised for quality. The primary gains available are:
1. **Reducing unnecessary token spend** on infrastructure (persona I/O, redundant beat expansion)
2. **Improving front-loaded quality gates** (outline validation, persona validation)
3. **Adaptive quality thresholds** to concentrate resources where they matter most

Several improvements from the analysis have been implemented in v2.0 (Phase 3 of this review). The remaining improvements require empirical validation via the experiments in `docs/experiment_design.md`.

---

## 2. Architecture Overview

### Current State → v2.0 Changes

| Component | Previous Behaviour | v2.0 Behaviour | Status |
|-----------|-------------------|----------------|--------|
| **Persona loading** | Re-read sample files from disk on every chapter | Loaded once per book run, cached in memory, rebuilt after each `refine_persona()` call | ✅ Implemented |
| **Beat expansion** | Always expand beats to Director's Treatment | Skip expansion if beats already exceed 100 words total | ✅ Implemented |
| **Outline validation** | No pre-generation quality gate | `validate_outline()` runs after chapter planning; logs issues before writing begins | ✅ Implemented |
| **Scoring thresholds** | Fixed 7.0 passing threshold for all chapters | Adaptive: 6.5 for setup chapters → 7.5 for climax chapters (linear scale by position) | ✅ Implemented |
| **Enrich validation** | Silent failure if enrichment returns missing fields | Explicit warnings logged for missing `title` or `genre` | ✅ Implemented |
| **Persona validation** | Single-pass creation, no quality check | `validate_persona()` generates ~200-word sample; scored 1–10; regenerated up to 3× if < 7 | ✅ Implemented |
| **Batched evaluation** | Per-chapter evaluation (20K tokens/call) | Experiment 4 (future) — batch 5 chapters per evaluation call | 🧪 Experiment Pending |
| **Mid-gen consistency** | Post-generation consistency check only | `analyze_consistency()` called every 10 chapters inside writing loop; issues logged | ✅ Implemented |
| **Two-pass drafting** | Single draft + iterative refinement | Rough Flash draft + Pro polish pass before evaluation; max_attempts reduced 3 → 2 | ✅ Implemented |

---

## 3. Phase-by-Phase v2.0 Architecture

### Phase 1: Foundation & Ideation

**Implemented Changes:**
- `enrich()` now logs explicit warnings if `book_metadata.title` or `book_metadata.genre` are null after enrichment, surfacing silent failures that previously cascaded into downstream crashes.

**Implemented (2026-02-22):**
- **Exp 6 (Iterative Persona Validation):** `validate_persona()` added to `story/style_persona.py`. Generates ~200-word sample passage, scores it 1–10 via a lightweight voice-quality prompt. Accepted if ≥ 7. `cli/engine.py` retries `create_initial_persona()` up to 3× until score passes. Expected: -20% Phase 3 voice-drift rewrites.

**Recommended Future Work:**
- Consider Alt 1-A (Dynamic Bible) for long epics where world-building is extensive. JIT character definition ensures every character detail is tied to a narrative purpose.
- Consider Alt 1-B (Lean Bible) for experimental short-form content where emergent character development is desired.

---

### Phase 2: Structuring & Outlining

**Implemented Changes:**
- `validate_outline(events, chapters, bp, folder)` added to `story/planner.py`. Called after `create_chapter_plan()` in `cli/engine.py`. Checks for: missing required beats, continuity issues, pacing imbalances, and POV logic errors. Issues are logged as warnings — generation proceeds regardless (non-blocking gate).

**Pending Experiments:**
- **Alt 2-A (Single-pass Outline):** Combine sequential `expand()` calls into one multi-step prompt. Saves ~60K tokens for a novel run. Low risk. Implement and test on novella-length stories first.

**Recommended Future Work:**
- For the Lean Bible (Alt 1-B) variant, redesign `plan_structure()` to allow on-demand character enrichment as new characters appear in events.

---

### Phase 3: Writing Engine

**Implemented Changes:**
1. **`build_persona_info(bp)` function** extracted from `write_chapter()`. Contains all persona string building logic including disk reads. Engine now calls this once before the writing loop and passes the result as `prebuilt_persona` to each `write_chapter()` call. Rebuilt after each `refine_persona()` call.

2. **Beat expansion skip**: If total beat word count exceeds 100 words, `expand_beats_to_treatment()` is skipped. Expected savings: ~5K tokens × ~30% of chapters.

3. **Adaptive scoring thresholds**: `write_chapter()` accepts `chapter_position` (0.0–1.0). `SCORE_PASSING` scales from 6.5 (setup) to 7.5 (climax). Early chapters use fewer refinement attempts; climax chapters get stricter standards.

4. **`chapter_position` threading**: `cli/engine.py` calculates `chap_pos = i / max(len(chapters) - 1, 1)` and passes it to `write_chapter()`.

**Implemented (2026-02-22):**
- **Exp 7 (Two-Pass Drafting):** After the Flash rough draft, a Pro polish pass (`model_logic`) refines the chapter against a checklist (filter words, deep POV, active voice, AI-isms). `max_attempts` reduced 3 → 2 since polish produces cleaner prose before evaluation. Expected: +0.3 HQS with fewer rewrite cycles.

**Pending Experiments:**
- **Exp 3 (Pre-score Beats):** Score each chapter's beat list for "writability" before drafting. Flag high-risk chapters for additional attempts upfront.

**Recommended Future Work:**
- Alt 2-C (Dynamic Personas): Once experiments validate basic optimisations, consider adapting persona sub-styles for action vs. introspection scenes.
- Increase `SCORE_AUTO_ACCEPT` from 8.0 to 8.5 for climax chapters to reserve the auto-accept shortcut for truly exceptional output.

---

### Phase 4: Review & Refinement

**No new implementations in v2.0** (Phase 4 is already highly optimised for quality).

**Implemented:**
- **Exp 4 (Adaptive Thresholds):** Already implemented. Gather data on refinement call reduction.
- **Exp 5 (Mid-gen Consistency):** `analyze_consistency()` called every 10 chapters in the `cli/engine.py` writing loop. Issues logged as `⚠️` warnings. Low cost (free on Pro-Exp). Expected: -30% post-gen CER.

**Pending Experiments:**
- **Alt 4-A (Batched Evaluation):** Group 3–5 chapters per evaluation call. Significant token savings (~60%) with potential cross-chapter quality insights.

**Recommended Future Work:**
- Alt 4-D (Editor Bot Specialisation): Implement fast regex-based checks for filter-word density and summary-mode detection before invoking the full LLM evaluator. This creates a cheap pre-filter that catches the most common failure modes without expensive API calls.

---

## 4. Expected Outcomes of v2.0 Implementations

### Token Savings (30-Chapter Novel)

| Change | Estimated Saving | Confidence |
|--------|-----------------|------------|
| Persona cache | ~90K tokens | High |
| Beat expansion skip (30% of chapters) | ~45K tokens | High |
| Adaptive thresholds (15% fewer setup refinements) | ~100K tokens | Medium |
| Outline validation (prevents ~2 rewrites) | ~50K tokens | Medium |
| **Total** | **~285K tokens (~8% of full book cost)** | — |

### Quality Impact

- Climax chapters: expected improvement in average evaluation score (+0.3–0.5 points) due to stricter SCORE_PASSING thresholds
- Early setup chapters: expected slight reduction in revision loop overhead with no noticeable reader-facing quality decrease
- Continuity errors: expected reduction from outline validation catching issues pre-generation

---

## 5. Experiment Roadmap

Execute experiments in this order (see `docs/experiment_design.md` for full specifications):

| Priority | Experiment | Effort | Expected Value |
|----------|-----------|--------|----------------|
| 1 | Exp 1: Persona Caching | ✅ Done | Token savings confirmed |
| 2 | Exp 2: Beat Expansion Skip | ✅ Done | Token savings confirmed |
| 3 | Exp 4: Adaptive Thresholds | ✅ Done | Quality + savings |
| 4 | Exp 3: Outline Validation | ✅ Done | Quality gate |
| 5 | Exp 6: Persona Validation | ✅ Done | -20% voice-drift rewrites |
| 6 | Exp 5: Mid-gen Consistency | ✅ Done | -30% post-gen CER |
| 7 | Exp 4: Batched Evaluation | Medium | -60% eval tokens |
| 8 | Exp 7: Two-Pass Drafting | ✅ Done | +0.3 HQS |

---

## 6. Cost Projections

### v2.0 Baseline (30-Chapter Novel, Quality-First Models)

| Phase | v1.0 Cost | v2.0 Cost | Saving |
|-------|----------|----------|--------|
| Phase 1: Ideation | FREE | FREE | — |
| Phase 2: Outline | FREE | FREE | — |
| Phase 3: Writing (text) | ~$0.18 | ~$0.16 | ~$0.02 |
| Phase 4: Review | FREE | FREE | — |
| Imagen Cover | ~$0.12 | ~$0.12 | — |
| **Total** | **~$0.30** | **~$0.28** | **~7%** |

*Using Pro-Exp for all Logic tasks. Text savings primarily from persona cache + beat expansion skip.*

### With Future Experiment Wins (Conservative Estimate)

If Exp 5, 6, 7 succeed and are implemented:
- Estimated additional token saving: ~400K tokens (~$0.04)
- **Projected total: ~$0.24/book (text + cover)**

---

## 7. Core Principles Revalidated

This review reconfirms the principles from `ai_blueprint.md`:

| Principle | Status | Evidence |
|-----------|--------|---------|
| **Quality First, then Cost** | ✅ Confirmed | Adaptive thresholds concentrate refinement resources on climax chapters, not cut them |
| **Modularity and Flexibility** | ✅ Confirmed | `build_persona_info()` extraction enables future caching strategies |
| **Data-Driven Decisions** | 🔄 In Progress | Experiment framework defined; gathering empirical data next |
| **Minimize Rework** | ✅ Improved | Outline validation gate prevents rework from catching issues pre-generation |
| **High-Quality Assurance** | ✅ Confirmed | 13-rubric evaluator with auto-fail conditions remains the quality backbone |
| **Holistic Approach** | ✅ Confirmed | All four phases analysed; changes propagated across the full pipeline |

---

## 8. Files Modified in v2.0

| File | Change |
|------|--------|
| `story/planner.py` | Added enrichment field validation; added `validate_outline()` function |
| `story/writer.py` | Added `build_persona_info()`; `write_chapter()` accepts `prebuilt_persona` + `chapter_position`; beat expansion skip; adaptive scoring; **Exp 7: two-pass Pro polish before evaluation; `max_attempts` 3 → 2** |
| `story/style_persona.py` | **Exp 6: Added `validate_persona()` — generates ~200-word sample, scores voice quality, rejects if < 7/10** |
| `cli/engine.py` | Imported `build_persona_info`; persona cached before writing loop; rebuilt after `refine_persona()`; outline validation gate; `chapter_position` passed to `write_chapter()`; **Exp 6: persona retries up to 3× until validation passes; Exp 5: `analyze_consistency()` every 10 chapters** |
| `docs/current_state_analysis.md` | New: Phase mapping with cost analysis |
| `docs/alternatives_analysis.md` | New: 15 alternative approaches with hypotheses |
| `docs/experiment_design.md` | New: 7 controlled A/B experiment specifications |
| `ai_blueprint_v2.md` | This document |