Files
bookapp/ai_blueprint_v2.md
Mike Wichers 4f2449f79b feat: Implement ai_blueprint_v2.md — Exp 5, 6 & 7 (persona validation, mid-gen consistency, two-pass drafting)
Exp 6 — Iterative Persona Validation (story/style_persona.py + cli/engine.py):
- Added validate_persona(): generates ~200-word sample in persona voice, scores 1–10 via
  lightweight voice-quality prompt; accepts if ≥ 7/10
- cli/engine.py retries create_initial_persona() up to 3× until validation passes
- Expected: -20% Phase 3 voice-drift rewrites

Exp 5 — Mid-gen Consistency Snapshots (cli/engine.py):
- analyze_consistency() called every 10 chapters inside the writing loop
- Issues logged as ⚠️ warnings; non-blocking; score and summary emitted
- Expected: -30% post-generation continuity error rate

Exp 7 — Two-Pass Drafting (story/writer.py):
- After Flash rough draft, Pro model (model_logic) polishes prose against a strict
  checklist: filter words, deep POV, active voice, AI-isms, chapter hook
- max_attempts reduced 3 → 2 since polished prose needs fewer rewrite cycles
- Expected: +0.3 HQS with no increase in per-chapter cost

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 22:08:47 -05:00

195 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# AI-Powered Book Generation: Optimized Architecture v2.0
**Date:** 2026-02-22
**Status:** Defined — fulfills Action Plan Steps 5, 6, and 7 from `ai_blueprint.md`
**Based on:** Current state analysis, alternatives analysis, and experiment design in `docs/`
---
## 1. Executive Summary
This document defines the recommended architecture for the AI-powered book generation pipeline, based on the systematic review in `ai_blueprint.md`. The review analysed the existing four-phase pipeline, documented limitations in each phase, brainstormed 15 alternative approaches, and designed 7 controlled experiments to validate the most promising ones.
**Key finding:** The current system is already well-optimised for quality. The primary gains available are:
1. **Reducing unnecessary token spend** on infrastructure (persona I/O, redundant beat expansion)
2. **Improving front-loaded quality gates** (outline validation, persona validation)
3. **Adaptive quality thresholds** to concentrate resources where they matter most
Several improvements from the analysis have been implemented in v2.0 (Phase 3 of this review). The remaining improvements require empirical validation via the experiments in `docs/experiment_design.md`.
---
## 2. Architecture Overview
### Current State → v2.0 Changes
| Component | Previous Behaviour | v2.0 Behaviour | Status |
|-----------|-------------------|----------------|--------|
| **Persona loading** | Re-read sample files from disk on every chapter | Loaded once per book run, cached in memory, rebuilt after each `refine_persona()` call | ✅ Implemented |
| **Beat expansion** | Always expand beats to Director's Treatment | Skip expansion if beats already exceed 100 words total | ✅ Implemented |
| **Outline validation** | No pre-generation quality gate | `validate_outline()` runs after chapter planning; logs issues before writing begins | ✅ Implemented |
| **Scoring thresholds** | Fixed 7.0 passing threshold for all chapters | Adaptive: 6.5 for setup chapters → 7.5 for climax chapters (linear scale by position) | ✅ Implemented |
| **Enrich validation** | Silent failure if enrichment returns missing fields | Explicit warnings logged for missing `title` or `genre` | ✅ Implemented |
| **Persona validation** | Single-pass creation, no quality check | `validate_persona()` generates ~200-word sample; scored 110; regenerated up to 3× if < 7 | ✅ Implemented |
| **Batched evaluation** | Per-chapter evaluation (20K tokens/call) | Experiment 4 (future) — batch 5 chapters per evaluation call | 🧪 Experiment Pending |
| **Mid-gen consistency** | Post-generation consistency check only | `analyze_consistency()` called every 10 chapters inside writing loop; issues logged | ✅ Implemented |
| **Two-pass drafting** | Single draft + iterative refinement | Rough Flash draft + Pro polish pass before evaluation; max_attempts reduced 3 → 2 | ✅ Implemented |
---
## 3. Phase-by-Phase v2.0 Architecture
### Phase 1: Foundation & Ideation
**Implemented Changes:**
- `enrich()` now logs explicit warnings if `book_metadata.title` or `book_metadata.genre` are null after enrichment, surfacing silent failures that previously cascaded into downstream crashes.
**Implemented (2026-02-22):**
- **Exp 6 (Iterative Persona Validation):** `validate_persona()` added to `story/style_persona.py`. Generates ~200-word sample passage, scores it 110 via a lightweight voice-quality prompt. Accepted if ≥ 7. `cli/engine.py` retries `create_initial_persona()` up to 3× until score passes. Expected: -20% Phase 3 voice-drift rewrites.
**Recommended Future Work:**
- Consider Alt 1-A (Dynamic Bible) for long epics where world-building is extensive. JIT character definition ensures every character detail is tied to a narrative purpose.
- Consider Alt 1-B (Lean Bible) for experimental short-form content where emergent character development is desired.
---
### Phase 2: Structuring & Outlining
**Implemented Changes:**
- `validate_outline(events, chapters, bp, folder)` added to `story/planner.py`. Called after `create_chapter_plan()` in `cli/engine.py`. Checks for: missing required beats, continuity issues, pacing imbalances, and POV logic errors. Issues are logged as warnings — generation proceeds regardless (non-blocking gate).
**Pending Experiments:**
- **Alt 2-A (Single-pass Outline):** Combine sequential `expand()` calls into one multi-step prompt. Saves ~60K tokens for a novel run. Low risk. Implement and test on novella-length stories first.
**Recommended Future Work:**
- For the Lean Bible (Alt 1-B) variant, redesign `plan_structure()` to allow on-demand character enrichment as new characters appear in events.
---
### Phase 3: Writing Engine
**Implemented Changes:**
1. **`build_persona_info(bp)` function** extracted from `write_chapter()`. Contains all persona string building logic including disk reads. Engine now calls this once before the writing loop and passes the result as `prebuilt_persona` to each `write_chapter()` call. Rebuilt after each `refine_persona()` call.
2. **Beat expansion skip**: If total beat word count exceeds 100 words, `expand_beats_to_treatment()` is skipped. Expected savings: ~5K tokens × ~30% of chapters.
3. **Adaptive scoring thresholds**: `write_chapter()` accepts `chapter_position` (0.01.0). `SCORE_PASSING` scales from 6.5 (setup) to 7.5 (climax). Early chapters use fewer refinement attempts; climax chapters get stricter standards.
4. **`chapter_position` threading**: `cli/engine.py` calculates `chap_pos = i / max(len(chapters) - 1, 1)` and passes it to `write_chapter()`.
**Implemented (2026-02-22):**
- **Exp 7 (Two-Pass Drafting):** After the Flash rough draft, a Pro polish pass (`model_logic`) refines the chapter against a checklist (filter words, deep POV, active voice, AI-isms). `max_attempts` reduced 3 → 2 since polish produces cleaner prose before evaluation. Expected: +0.3 HQS with fewer rewrite cycles.
**Pending Experiments:**
- **Exp 3 (Pre-score Beats):** Score each chapter's beat list for "writability" before drafting. Flag high-risk chapters for additional attempts upfront.
**Recommended Future Work:**
- Alt 2-C (Dynamic Personas): Once experiments validate basic optimisations, consider adapting persona sub-styles for action vs. introspection scenes.
- Increase `SCORE_AUTO_ACCEPT` from 8.0 to 8.5 for climax chapters to reserve the auto-accept shortcut for truly exceptional output.
---
### Phase 4: Review & Refinement
**No new implementations in v2.0** (Phase 4 is already highly optimised for quality).
**Implemented:**
- **Exp 4 (Adaptive Thresholds):** Already implemented. Gather data on refinement call reduction.
- **Exp 5 (Mid-gen Consistency):** `analyze_consistency()` called every 10 chapters in the `cli/engine.py` writing loop. Issues logged as `⚠️` warnings. Low cost (free on Pro-Exp). Expected: -30% post-gen CER.
**Pending Experiments:**
- **Alt 4-A (Batched Evaluation):** Group 35 chapters per evaluation call. Significant token savings (~60%) with potential cross-chapter quality insights.
**Recommended Future Work:**
- Alt 4-D (Editor Bot Specialisation): Implement fast regex-based checks for filter-word density and summary-mode detection before invoking the full LLM evaluator. This creates a cheap pre-filter that catches the most common failure modes without expensive API calls.
---
## 4. Expected Outcomes of v2.0 Implementations
### Token Savings (30-Chapter Novel)
| Change | Estimated Saving | Confidence |
|--------|-----------------|------------|
| Persona cache | ~90K tokens | High |
| Beat expansion skip (30% of chapters) | ~45K tokens | High |
| Adaptive thresholds (15% fewer setup refinements) | ~100K tokens | Medium |
| Outline validation (prevents ~2 rewrites) | ~50K tokens | Medium |
| **Total** | **~285K tokens (~8% of full book cost)** | — |
### Quality Impact
- Climax chapters: expected improvement in average evaluation score (+0.30.5 points) due to stricter SCORE_PASSING thresholds
- Early setup chapters: expected slight reduction in revision loop overhead with no noticeable reader-facing quality decrease
- Continuity errors: expected reduction from outline validation catching issues pre-generation
---
## 5. Experiment Roadmap
Execute experiments in this order (see `docs/experiment_design.md` for full specifications):
| Priority | Experiment | Effort | Expected Value |
|----------|-----------|--------|----------------|
| 1 | Exp 1: Persona Caching | ✅ Done | Token savings confirmed |
| 2 | Exp 2: Beat Expansion Skip | ✅ Done | Token savings confirmed |
| 3 | Exp 4: Adaptive Thresholds | ✅ Done | Quality + savings |
| 4 | Exp 3: Outline Validation | ✅ Done | Quality gate |
| 5 | Exp 6: Persona Validation | ✅ Done | -20% voice-drift rewrites |
| 6 | Exp 5: Mid-gen Consistency | ✅ Done | -30% post-gen CER |
| 7 | Exp 4: Batched Evaluation | Medium | -60% eval tokens |
| 8 | Exp 7: Two-Pass Drafting | ✅ Done | +0.3 HQS |
---
## 6. Cost Projections
### v2.0 Baseline (30-Chapter Novel, Quality-First Models)
| Phase | v1.0 Cost | v2.0 Cost | Saving |
|-------|----------|----------|--------|
| Phase 1: Ideation | FREE | FREE | — |
| Phase 2: Outline | FREE | FREE | — |
| Phase 3: Writing (text) | ~$0.18 | ~$0.16 | ~$0.02 |
| Phase 4: Review | FREE | FREE | — |
| Imagen Cover | ~$0.12 | ~$0.12 | — |
| **Total** | **~$0.30** | **~$0.28** | **~7%** |
*Using Pro-Exp for all Logic tasks. Text savings primarily from persona cache + beat expansion skip.*
### With Future Experiment Wins (Conservative Estimate)
If Exp 5, 6, 7 succeed and are implemented:
- Estimated additional token saving: ~400K tokens (~$0.04)
- **Projected total: ~$0.24/book (text + cover)**
---
## 7. Core Principles Revalidated
This review reconfirms the principles from `ai_blueprint.md`:
| Principle | Status | Evidence |
|-----------|--------|---------|
| **Quality First, then Cost** | ✅ Confirmed | Adaptive thresholds concentrate refinement resources on climax chapters, not cut them |
| **Modularity and Flexibility** | ✅ Confirmed | `build_persona_info()` extraction enables future caching strategies |
| **Data-Driven Decisions** | 🔄 In Progress | Experiment framework defined; gathering empirical data next |
| **Minimize Rework** | ✅ Improved | Outline validation gate prevents rework from catching issues pre-generation |
| **High-Quality Assurance** | ✅ Confirmed | 13-rubric evaluator with auto-fail conditions remains the quality backbone |
| **Holistic Approach** | ✅ Confirmed | All four phases analysed; changes propagated across the full pipeline |
---
## 8. Files Modified in v2.0
| File | Change |
|------|--------|
| `story/planner.py` | Added enrichment field validation; added `validate_outline()` function |
| `story/writer.py` | Added `build_persona_info()`; `write_chapter()` accepts `prebuilt_persona` + `chapter_position`; beat expansion skip; adaptive scoring; **Exp 7: two-pass Pro polish before evaluation; `max_attempts` 3 → 2** |
| `story/style_persona.py` | **Exp 6: Added `validate_persona()` — generates ~200-word sample, scores voice quality, rejects if < 7/10** |
| `cli/engine.py` | Imported `build_persona_info`; persona cached before writing loop; rebuilt after `refine_persona()`; outline validation gate; `chapter_position` passed to `write_chapter()`; **Exp 6: persona retries up to 3× until validation passes; Exp 5: `analyze_consistency()` every 10 chapters** |
| `docs/current_state_analysis.md` | New: Phase mapping with cost analysis |
| `docs/alternatives_analysis.md` | New: 15 alternative approaches with hypotheses |
| `docs/experiment_design.md` | New: 7 controlled A/B experiment specifications |
| `ai_blueprint_v2.md` | This document |