feat: Improve book quality — stronger evaluator, more refinement attempts, quality-first model selection

- Fix: chapter quality evaluation now uses model_logic (free Pro) instead of model_writer (Flash).
  The model that wrote the chapter was also scoring it, causing circular, lenient grading.
- Increase max_attempts in write_chapter from 2 to 3 for more refinement passes per chapter.
- Update auto model selection prompt (ai/setup.py) to prioritize quality over budget framing:
  free/preview/exp models preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5), not just cost.
  Writer role now allowed to use best free Flash/Pro preview — not restricted to basic Flash only.
- Bump version to 3.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Date: 2026-02-22 21:28:49 -05:00
Parent: f740174257
Commit: 6684ec2bf5
4 changed files with 20 additions and 16 deletions


@@ -115,7 +115,7 @@ Open `http://localhost:5000`.
 - **Dynamic Pacing:** Monitors story progress during writing and inserts bridge chapters to slow a rushing plot or removes redundant ones detected mid-stream — without restarting.
 - **Series Continuity:** When generating Book 2+, carries forward character visual tracking, established relationships, plot threads, and a cumulative "Story So Far" summary.
 - **Persona Refinement Loop:** Every 5 chapters, analyzes actual written text to refine the author persona model, maintaining stylistic consistency throughout the book.
-- **Consistency Checker (`editor.py`):** Scores chapters on 8 rubrics (engagement, voice, sensory detail, scene execution, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized").
+- **Consistency Checker (`editor.py`):** Scores chapters on 13 rubrics (engagement, voice, sensory detail, scene execution, dialogue, pacing, staging, prose dynamics, clarity, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized"). Chapter evaluation now uses the Logic model (free Pro) rather than the Writer model, ensuring stricter and more accurate scoring.
 - **Dynamic Character Injection (`writer.py`):** Only injects characters explicitly named in the chapter's `scene_beats` plus the POV character into the writer prompt. Eliminates token waste from unused characters and reduces hallucinated appearances.
 - **Smart Context Tail (`writer.py`):** Extracts the final ~1,000 tokens of the previous chapter (the actual ending) rather than blindly truncating from the front. Ensures the hand-off point — where characters are standing and what was last said — is always preserved.
 - **Stateful Scene Tracking (`bible_tracker.py`):** After each chapter, the tracker records each character's `current_location`, `time_of_day`, and `held_items` in addition to appearance and events. This scene state is injected into subsequent chapter prompts so the writer knows exactly where characters are, what time it is, and what they're carrying.
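The "Smart Context Tail" extraction described in the hunk above can be sketched roughly as follows. This is a minimal illustration assuming the common ~4-characters-per-token heuristic; the function name and paragraph-snapping detail are assumptions, not the repository's actual code:

```python
def tail_context(prev_chapter: str, max_tokens: int = 1000) -> str:
    """Return roughly the last `max_tokens` tokens of the previous chapter.

    Uses a ~4 characters-per-token estimate (an assumption) and snaps to a
    paragraph boundary so the hand-off point stays readable.
    """
    max_chars = max_tokens * 4
    if len(prev_chapter) <= max_chars:
        return prev_chapter
    tail = prev_chapter[-max_chars:]
    # Drop the (likely truncated) first paragraph so the tail starts cleanly.
    cut = tail.find("\n\n")
    return tail[cut + 2:] if cut != -1 else tail
```

The point of taking the tail rather than the head is that the previous chapter's ending, not its opening, is what the next chapter must hand off from.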
@@ -130,7 +130,7 @@ Open `http://localhost:5000`.
 ### AI Infrastructure (`ai/`)
 - **Resilient Model Wrapper:** Wraps every Gemini API call with up to 3 retries and exponential backoff, handles quota errors and rate limits, and can switch to an alternative model mid-stream.
-- **Auto Model Selection:** On startup, a bootstrapper model queries the Gemini API and selects the optimal models for Logic, Writer, Artist, and Image roles. Selection is cached for 24 hours.
+- **Auto Model Selection:** On startup, a bootstrapper model queries the Gemini API and selects the optimal models for Logic, Writer, Artist, and Image roles. Selection is cached for 24 hours. The selection algorithm now prioritizes quality — free/preview/exp models are preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5) rather than by cost alone.
 - **Vertex AI Support:** If `GCP_PROJECT` is set and OAuth credentials are present, initializes Vertex AI automatically for Imagen image generation.
 - **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.
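The retry-with-backoff behavior the "Resilient Model Wrapper" bullet describes can be sketched like this. Function names and the fallback hook are assumptions for illustration, not the project's actual API:

```python
import time

def generate_with_retry(call, max_retries=3, base_delay=1.0, fallback=None):
    """Retry a model call with exponential backoff; optionally switch models.

    `call` is a zero-argument function performing one API request.
    `fallback`, if given, is tried once after all retries are exhausted,
    standing in for "switch to an alternative model mid-stream".
    """
    last_err = None
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as err:  # real code would catch quota/rate errors specifically
            last_err = err
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    if fallback is not None:
        return fallback()
    raise last_err
```

Exponential backoff matters here because quota and rate-limit errors typically clear on their own; doubling the wait between attempts avoids hammering the API while it recovers.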


@@ -76,14 +76,14 @@ def select_best_models(force_refresh=False):
 prompt = f"""
 ROLE: AI Model Architect
 TASK: Select the optimal Gemini models for a book-writing application.
-PRIMARY OBJECTIVE: Keep total book generation cost under $2.00. Quality is secondary to this budget.
+PRIMARY OBJECTIVE: Maximize book quality. Free/preview/exp models are $0.00 — use the BEST quality free model available for every role. Only fall back to paid Flash when no free alternative exists, and only if it fits within the budget cap.
 AVAILABLE_MODELS:
 {json.dumps(compatible)}
 PRICING_CONTEXT (USD per 1M tokens — use these to calculate actual book cost):
 - FREE TIER: Any model with 'exp', 'beta', or 'preview' in name = $0.00. Always prefer these.
-  e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE.
+  e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE, gemini-2.5-flash-preview = FREE.
 - gemini-2.5-flash / gemini-2.5-flash-preview: ~$0.075 Input / $0.30 Output.
 - gemini-2.0-flash: ~$0.10 Input / $0.40 Output.
 - gemini-1.5-flash: ~$0.075 Input / $0.30 Output.
@@ -92,9 +92,9 @@ def select_best_models(force_refresh=False):
 BOOK TOKEN BUDGET (30-chapter novel — use this to calculate real cost before deciding):
 Logic role total: ~265,000 input tokens + ~55,000 output tokens
-  (planning, state tracking, consistency checks, director treatments per chapter)
+  (planning, state tracking, consistency checks, director treatments, chapter evaluation per chapter)
 Writer role total: ~450,000 input tokens + ~135,000 output tokens
-  (drafting, evaluation, refinement per chapter — 2 passes max)
+  (drafting, refinement per chapter — 3 passes max)
 Artist role total: ~30,000 input tokens + ~8,000 output tokens
 (cover art prompt design, cover layout, blurb, image quality evaluation — text calls only)
@@ -107,19 +107,23 @@ def select_best_models(force_refresh=False):
 (leaving $0.15 headroom for Imagen cover generation, total book target: $2.00).
 SELECTION RULES (apply in order):
-1. FREE FIRST: If a free/exp model exists (any tier, any quality), pick it for Logic. Cost = $0.
-2. FLASH FOR WRITER: Flash is sufficient for fiction prose. Never pick a paid Pro for Writer.
+1. FREE/PREVIEW ALWAYS WINS: Always pick the highest-quality free/exp/preview model for each role.
+   Free models cost $0 regardless of tier — a free Pro beats a paid Flash every time.
+2. QUALITY FOR WRITER: The Writer role produces all fiction prose. Prefer the best free Flash or
+   free Pro variant available. If no free model exists for Writer, use the cheapest paid Flash
+   that keeps the total budget under $1.85. Never use a paid stable Pro for Writer.
 3. CALCULATE: For non-free models, compute the actual book cost using the token budget above.
    Reject any combination that exceeds $2.00 total.
-4. QUALITY TIEBREAK: Among models with similar cost, prefer newer generation (2.x > 1.5).
+4. QUALITY TIEBREAK: Among models with identical cost (e.g. both free), prefer the highest
+   generation and capability: Pro > Flash, 2.5 > 2.0 > 1.5, stable > exp only if cost equal.
 5. NO THINKING MODELS: Too slow and expensive for any role.
 ROLES:
-- LOGIC: Planning, JSON adherence, plot consistency. Free/exp Pro ideal; Flash acceptable.
-- WRITER: Creative prose, chapter drafting. Flash 2.x is sufficient — do NOT use paid Pro.
-- ARTIST: Visual prompts for cover art. Cheapest capable Flash model.
+- LOGIC: Planning, JSON adherence, plot consistency, AND chapter quality evaluation. Best free/exp Pro is ideal; free Flash preview acceptable if no free Pro exists.
+- WRITER: Creative prose, chapter drafting and refinement. Best available free Flash or free Pro variant. Never use a paid stable Pro.
+- ARTIST: Visual prompts for cover art. Cheapest capable Flash model (free preferred).
 - PRO_REWRITE: Emergency full-chapter rewrite (rare, ~1-2x per book). Best free/exp Pro available.
-  If no free Pro exists, use best Flash — do not use paid Pro even here.
+  If no free Pro exists, use best free Flash preview — do not use paid models here.
 OUTPUT_FORMAT (JSON only, no markdown):
 {{
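The prompt above delegates the ranking to a model, but the ordering its rewritten rules describe (free/preview first, then Pro over Flash, then newer generation) could equally be expressed as a deterministic sort key. The following is a hypothetical sketch of that ordering, not code from the repository:

```python
def selection_key(model_name: str) -> tuple:
    """Sort key mirroring the prompt's rules: free/exp/preview first,
    then Pro over Flash, then newer generation (2.5 > 2.0 > 1.5)."""
    name = model_name.lower()
    is_free = any(tag in name for tag in ("exp", "beta", "preview"))
    is_pro = "pro" in name
    generation = 0.0
    for gen in (2.5, 2.0, 1.5):
        if str(gen) in name:
            generation = gen
            break
    # Higher tuples rank first when sorted with reverse=True.
    return (is_free, is_pro, generation)

models = ["gemini-1.5-flash", "gemini-2.5-pro-preview", "gemini-2.0-flash",
          "gemini-2.0-pro-exp", "gemini-2.5-flash-preview"]
ranked = sorted(models, key=selection_key, reverse=True)
```

With this key, a free 2.5 Pro preview outranks a free 2.0 Pro exp, which in turn outranks any paid Flash, matching the "a free Pro beats a paid Flash every time" rule.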


@@ -66,4 +66,4 @@ LENGTH_DEFINITIONS = {
 }
 # --- SYSTEM ---
-VERSION = "2.9"
+VERSION = "3.0"


@@ -327,7 +327,7 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
         utils.log("WRITER", f"⚠️ Failed Ch {chap['chapter_number']}: {e}")
         return f"## Chapter {chap['chapter_number']} Failed\n\nError: {e}"
-    max_attempts = 2
+    max_attempts = 3
     SCORE_AUTO_ACCEPT = 8
     SCORE_PASSING = 7
     SCORE_REWRITE_THRESHOLD = 6
@@ -338,7 +338,7 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
     for attempt in range(1, max_attempts + 1):
         utils.log("WRITER", f" -> Evaluating Ch {chap['chapter_number']} (Attempt {attempt}/{max_attempts})...")
-        score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_writer, folder, series_context=series_block.strip())
+        score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_logic, folder, series_context=series_block.strip())
         past_critiques.append(f"Attempt {attempt}: {critique}")
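The constants and loop in the hunks above suggest a control flow of roughly this shape. This is a simplified sketch: `evaluate` and `refine` stand in for the real model calls, and the branches on SCORE_PASSING and SCORE_REWRITE_THRESHOLD in the actual `write_chapter` are omitted here:

```python
SCORE_AUTO_ACCEPT = 8
max_attempts = 3

def refine_until_passing(text, evaluate, refine):
    """Evaluate a draft up to max_attempts times, refining between passes.

    `evaluate` returns (score, critique); `refine` returns an improved draft.
    Accepts immediately at SCORE_AUTO_ACCEPT; otherwise returns the
    best-scoring draft seen across all attempts.
    """
    best_text, best_score = text, -1
    for attempt in range(1, max_attempts + 1):
        score, critique = evaluate(text)
        if score > best_score:
            best_text, best_score = text, score
        if score >= SCORE_AUTO_ACCEPT:
            return text, score  # good enough, stop refining
        if attempt < max_attempts:
            text = refine(text, critique)  # feed the critique back in
    return best_text, best_score
```

Raising `max_attempts` from 2 to 3, as this commit does, gives each chapter one more evaluate-then-refine pass before the loop settles for the best draft it has seen.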