feat: Improve book quality — stronger evaluator, more refinement attempts, quality-first model selection
- Fix: chapter quality evaluation now uses model_logic (free Pro) instead of model_writer (Flash). The model that wrote the chapter was also scoring it, causing circular, lenient grading.
- Increase max_attempts in write_chapter from 2 to 3 for more refinement passes per chapter.
- Update auto model selection prompt (ai/setup.py) to prioritize quality over budget framing: free/preview/exp models are preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5), not just cost. The Writer role may now use the best free Flash/Pro preview, not just basic Flash.
- Bump version to 3.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -115,7 +115,7 @@ Open `http://localhost:5000`.
 - **Dynamic Pacing:** Monitors story progress during writing and inserts bridge chapters to slow a rushing plot or removes redundant ones detected mid-stream — without restarting.
 - **Series Continuity:** When generating Book 2+, carries forward character visual tracking, established relationships, plot threads, and a cumulative "Story So Far" summary.
 - **Persona Refinement Loop:** Every 5 chapters, analyzes actual written text to refine the author persona model, maintaining stylistic consistency throughout the book.
-- **Consistency Checker (`editor.py`):** Scores chapters on 8 rubrics (engagement, voice, sensory detail, scene execution, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized").
+- **Consistency Checker (`editor.py`):** Scores chapters on 13 rubrics (engagement, voice, sensory detail, scene execution, dialogue, pacing, staging, prose dynamics, clarity, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized"). Chapter evaluation now uses the Logic model (free Pro) rather than the Writer model, ensuring stricter and more accurate scoring.
 - **Dynamic Character Injection (`writer.py`):** Only injects characters explicitly named in the chapter's `scene_beats` plus the POV character into the writer prompt. Eliminates token waste from unused characters and reduces hallucinated appearances.
 - **Smart Context Tail (`writer.py`):** Extracts the final ~1,000 tokens of the previous chapter (the actual ending) rather than blindly truncating from the front. Ensures the hand-off point — where characters are standing and what was last said — is always preserved.
 - **Stateful Scene Tracking (`bible_tracker.py`):** After each chapter, the tracker records each character's `current_location`, `time_of_day`, and `held_items` in addition to appearance and events. This scene state is injected into subsequent chapter prompts so the writer knows exactly where characters are, what time it is, and what they're carrying.
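The Smart Context Tail behavior described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `context_tail` name and the 4-characters-per-token heuristic are assumptions.

```python
def context_tail(prev_chapter: str, max_tokens: int = 1000) -> str:
    """Return roughly the last `max_tokens` tokens of a chapter,
    cut at a paragraph boundary so the hand-off point stays intact."""
    # Rough heuristic: ~4 characters per token (an assumption, not
    # the project's real tokenizer).
    max_chars = max_tokens * 4
    if len(prev_chapter) <= max_chars:
        return prev_chapter
    tail = prev_chapter[-max_chars:]
    # Drop the first, likely truncated, paragraph so the excerpt
    # starts at a clean boundary.
    cut = tail.find("\n\n")
    return tail[cut + 2:] if cut != -1 else tail
```

Taking the tail rather than the head is the whole point: the writer needs where the previous chapter *ended*, not where it began.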
@@ -130,7 +130,7 @@ Open `http://localhost:5000`.
 ### AI Infrastructure (`ai/`)
 - **Resilient Model Wrapper:** Wraps every Gemini API call with up to 3 retries and exponential backoff, handles quota errors and rate limits, and can switch to an alternative model mid-stream.
-- **Auto Model Selection:** On startup, a bootstrapper model queries the Gemini API and selects the optimal models for Logic, Writer, Artist, and Image roles. Selection is cached for 24 hours.
+- **Auto Model Selection:** On startup, a bootstrapper model queries the Gemini API and selects the optimal models for Logic, Writer, Artist, and Image roles. Selection is cached for 24 hours. The selection algorithm now prioritizes quality — free/preview/exp models are preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5) rather than by cost alone.
 - **Vertex AI Support:** If `GCP_PROJECT` is set and OAuth credentials are present, initializes Vertex AI automatically for Imagen image generation.
 - **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.
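The resilient-wrapper pattern described above (retries, exponential backoff, fallback model) can be sketched as follows. The `resilient_generate` name and signature are assumptions for illustration; the real wrapper would also narrow the exception handling to quota and rate-limit errors.

```python
import time

def resilient_generate(model_call, prompt, fallback_call=None,
                       retries=3, base_delay=2.0):
    """Retry a model call with exponential backoff, then fall back
    to an alternative model if one is provided."""
    delay = base_delay
    for attempt in range(1, retries + 1):
        try:
            return model_call(prompt)
        except Exception:  # real code: match quota/rate-limit errors only
            if attempt == retries:
                if fallback_call is not None:
                    return fallback_call(prompt)  # switch models mid-stream
                raise
            time.sleep(delay)
            delay *= 2  # backoff: base_delay, 2x, 4x, ...
```

Raising only after the fallback has also been exhausted keeps transient 429s from killing a multi-hour book generation run.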
ai/setup.py (26 changed lines)
@@ -76,14 +76,14 @@ def select_best_models(force_refresh=False):
     prompt = f"""
 ROLE: AI Model Architect
 TASK: Select the optimal Gemini models for a book-writing application.
-PRIMARY OBJECTIVE: Keep total book generation cost under $2.00. Quality is secondary to this budget.
+PRIMARY OBJECTIVE: Maximize book quality. Free/preview/exp models are $0.00 — use the BEST quality free model available for every role. Only fall back to paid Flash when no free alternative exists, and only if it fits within the budget cap.

 AVAILABLE_MODELS:
 {json.dumps(compatible)}

 PRICING_CONTEXT (USD per 1M tokens — use these to calculate actual book cost):
 - FREE TIER: Any model with 'exp', 'beta', or 'preview' in name = $0.00. Always prefer these.
-  e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE.
+  e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE, gemini-2.5-flash-preview = FREE.
 - gemini-2.5-flash / gemini-2.5-flash-preview: ~$0.075 Input / $0.30 Output.
 - gemini-2.0-flash: ~$0.10 Input / $0.40 Output.
 - gemini-1.5-flash: ~$0.075 Input / $0.30 Output.
@@ -92,9 +92,9 @@ def select_best_models(force_refresh=False):

 BOOK TOKEN BUDGET (30-chapter novel — use this to calculate real cost before deciding):
 Logic role total: ~265,000 input tokens + ~55,000 output tokens
-  (planning, state tracking, consistency checks, director treatments per chapter)
+  (planning, state tracking, consistency checks, director treatments, chapter evaluation per chapter)
 Writer role total: ~450,000 input tokens + ~135,000 output tokens
-  (drafting, evaluation, refinement per chapter — 2 passes max)
+  (drafting, refinement per chapter — 3 passes max)
 Artist role total: ~30,000 input tokens + ~8,000 output tokens
   (cover art prompt design, cover layout, blurb, image quality evaluation — text calls only)
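The token budget above exists so the bootstrapper can compute a real dollar figure before committing to a model mix. As a worked example, using the budget and pricing figures quoted in this prompt (the model pairing is illustrative, not the selection the app actually makes):

```python
# Writer role on paid gemini-2.5-flash: ~$0.075/1M input, ~$0.30/1M output.
writer_in, writer_out = 450_000, 135_000
writer_cost = writer_in / 1e6 * 0.075 + writer_out / 1e6 * 0.30

# Logic role on paid gemini-2.0-flash: ~$0.10/1M input, ~$0.40/1M output.
logic_in, logic_out = 265_000, 55_000
logic_cost = logic_in / 1e6 * 0.10 + logic_out / 1e6 * 0.40

print(round(writer_cost, 4), round(logic_cost, 4))
```

Even an all-paid-Flash combination lands around $0.12, well under the $2.00 cap, which is why the rewritten prompt can afford to frame quality, not cost, as the primary objective.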
@@ -107,19 +107,23 @@ def select_best_models(force_refresh=False):
 (leaving $0.15 headroom for Imagen cover generation, total book target: $2.00).

 SELECTION RULES (apply in order):
-1. FREE FIRST: If a free/exp model exists (any tier, any quality), pick it for Logic. Cost = $0.
-2. FLASH FOR WRITER: Flash is sufficient for fiction prose. Never pick a paid Pro for Writer.
+1. FREE/PREVIEW ALWAYS WINS: Always pick the highest-quality free/exp/preview model for each role.
+   Free models cost $0 regardless of tier — a free Pro beats a paid Flash every time.
+2. QUALITY FOR WRITER: The Writer role produces all fiction prose. Prefer the best free Flash or
+   free Pro variant available. If no free model exists for Writer, use the cheapest paid Flash
+   that keeps the total budget under $1.85. Never use a paid stable Pro for Writer.
 3. CALCULATE: For non-free models, compute the actual book cost using the token budget above.
    Reject any combination that exceeds $2.00 total.
-4. QUALITY TIEBREAK: Among models with similar cost, prefer newer generation (2.x > 1.5).
+4. QUALITY TIEBREAK: Among models with identical cost (e.g. both free), prefer the highest
+   generation and capability: Pro > Flash, 2.5 > 2.0 > 1.5, stable > exp only if cost equal.
 5. NO THINKING MODELS: Too slow and expensive for any role.

 ROLES:
-- LOGIC: Planning, JSON adherence, plot consistency. Free/exp Pro ideal; Flash acceptable.
-- WRITER: Creative prose, chapter drafting. Flash 2.x is sufficient — do NOT use paid Pro.
-- ARTIST: Visual prompts for cover art. Cheapest capable Flash model.
+- LOGIC: Planning, JSON adherence, plot consistency, AND chapter quality evaluation. Best free/exp Pro is ideal; free Flash preview acceptable if no free Pro exists.
+- WRITER: Creative prose, chapter drafting and refinement. Best available free Flash or free Pro variant. Never use a paid stable Pro.
+- ARTIST: Visual prompts for cover art. Cheapest capable Flash model (free preferred).
 - PRO_REWRITE: Emergency full-chapter rewrite (rare, ~1-2x per book). Best free/exp Pro available.
-  If no free Pro exists, use best Flash — do not use paid Pro even here.
+  If no free Pro exists, use best free Flash preview — do not use paid models here.

 OUTPUT_FORMAT (JSON only, no markdown):
 {{
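These selection rules are applied by the bootstrapper model via the prompt, not by code, but the ordering they describe (free beats paid, Pro beats Flash, newer generation beats older) can be expressed as a sort key. A hypothetical sketch; `model_rank` is not part of the project:

```python
def model_rank(name: str) -> tuple:
    """Sort key mirroring the prompt's tiebreak order:
    free tier first, then Pro over Flash, then newer generation."""
    free = any(tag in name for tag in ("exp", "beta", "preview"))
    pro = "pro" in name
    gen = 0.0
    for g in ("2.5", "2.0", "1.5"):
        if g in name:
            gen = float(g)
            break
    # Tuples compare element by element, so free > paid dominates,
    # then Pro > Flash, then generation. Use max() to pick the best.
    return (free, pro, gen)

best = max(
    ["gemini-1.5-flash", "gemini-2.5-flash-preview", "gemini-2.0-pro-exp"],
    key=model_rank,
)
```

Under this ordering a free 2.0 Pro outranks a free 2.5 Flash preview, which matches the prompt's "a free Pro beats a paid Flash" framing; whether Pro should always outrank a newer-generation Flash is the prompt author's call, not something this sketch decides.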
@@ -66,4 +66,4 @@ LENGTH_DEFINITIONS = {
 }

 # --- SYSTEM ---
-VERSION = "2.9"
+VERSION = "3.0"
@@ -327,7 +327,7 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
         utils.log("WRITER", f"⚠️ Failed Ch {chap['chapter_number']}: {e}")
         return f"## Chapter {chap['chapter_number']} Failed\n\nError: {e}"

-    max_attempts = 2
+    max_attempts = 3
     SCORE_AUTO_ACCEPT = 8
     SCORE_PASSING = 7
     SCORE_REWRITE_THRESHOLD = 6
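The three thresholds visible in this hunk drive the evaluation loop. A sketch of the implied decision logic; `decide` is a hypothetical helper (the real control flow lives inline in `write_chapter`, and the exact use of `SCORE_PASSING` is an assumption):

```python
def decide(score: int, attempt: int, max_attempts: int = 3) -> str:
    """Map a chapter's quality score to the next action."""
    SCORE_AUTO_ACCEPT = 8        # stop immediately, chapter is good
    SCORE_PASSING = 7            # acceptable, but refine if attempts remain
    SCORE_REWRITE_THRESHOLD = 6  # at or below: needs a full rewrite pass
    if score >= SCORE_AUTO_ACCEPT:
        return "accept"
    if attempt >= max_attempts:
        return "accept_best"     # out of attempts, keep the best draft
    if score <= SCORE_REWRITE_THRESHOLD:
        return "rewrite"
    return "refine"              # score == SCORE_PASSING
```

With `max_attempts` raised from 2 to 3, a chapter scoring 7 now gets up to two refinement passes before the best draft is accepted.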
@@ -338,7 +338,7 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,

     for attempt in range(1, max_attempts + 1):
         utils.log("WRITER", f" -> Evaluating Ch {chap['chapter_number']} (Attempt {attempt}/{max_attempts})...")
-        score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_writer, folder, series_context=series_block.strip())
+        score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_logic, folder, series_context=series_block.strip())

         past_critiques.append(f"Attempt {attempt}: {critique}")