feat: Improve book quality — stronger evaluator, more refinement attempts, quality-first model selection
- Fix: chapter quality evaluation now uses model_logic (free Pro) instead of model_writer (Flash). The model that wrote the chapter was also scoring it, causing circular, lenient grading.
- Increase max_attempts in write_chapter from 2 to 3 for more refinement passes per chapter.
- Update auto model selection prompt (ai/setup.py) to prioritize quality over budget framing: free/preview/exp models preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5), not just cost. Writer role now allowed to use best free Flash/Pro preview — not restricted to basic Flash only.
- Bump version to 3.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -115,7 +115,7 @@ Open `http://localhost:5000`.
 - **Dynamic Pacing:** Monitors story progress during writing and inserts bridge chapters to slow a rushing plot or removes redundant ones detected mid-stream — without restarting.
 - **Series Continuity:** When generating Book 2+, carries forward character visual tracking, established relationships, plot threads, and a cumulative "Story So Far" summary.
 - **Persona Refinement Loop:** Every 5 chapters, analyzes actual written text to refine the author persona model, maintaining stylistic consistency throughout the book.
-- **Consistency Checker (`editor.py`):** Scores chapters on 8 rubrics (engagement, voice, sensory detail, scene execution, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized").
+- **Consistency Checker (`editor.py`):** Scores chapters on 13 rubrics (engagement, voice, sensory detail, scene execution, dialogue, pacing, staging, prose dynamics, clarity, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized"). Chapter evaluation now uses the Logic model (free Pro) rather than the Writer model, ensuring stricter and more accurate scoring.
 - **Dynamic Character Injection (`writer.py`):** Only injects characters explicitly named in the chapter's `scene_beats` plus the POV character into the writer prompt. Eliminates token waste from unused characters and reduces hallucinated appearances.
 - **Smart Context Tail (`writer.py`):** Extracts the final ~1,000 tokens of the previous chapter (the actual ending) rather than blindly truncating from the front. Ensures the hand-off point — where characters are standing and what was last said — is always preserved.
 - **Stateful Scene Tracking (`bible_tracker.py`):** After each chapter, the tracker records each character's `current_location`, `time_of_day`, and `held_items` in addition to appearance and events. This scene state is injected into subsequent chapter prompts so the writer knows exactly where characters are, what time it is, and what they're carrying.
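The "Smart Context Tail" behavior described above can be sketched as follows. This is a minimal illustration, not the code in `writer.py`: the ~4-characters-per-token heuristic, the function name, and the paragraph-snapping are all assumptions.

```python
def context_tail(prev_chapter: str, tail_tokens: int = 1000) -> str:
    """Return roughly the last `tail_tokens` tokens of the previous chapter.

    Assumes ~4 characters per token (a rough heuristic, not a real
    tokenizer), and snaps forward to a paragraph boundary so the hand-off
    point — where characters stand and what was last said — stays intact.
    """
    approx_chars = tail_tokens * 4
    if len(prev_chapter) <= approx_chars:
        return prev_chapter  # short chapter: keep it whole
    tail = prev_chapter[-approx_chars:]
    # Avoid starting mid-sentence: cut at the first paragraph break if any.
    cut = tail.find("\n\n")
    return tail[cut + 2:] if cut != -1 else tail
```

The point of the design is that truncating from the *front* preserves the ending, unlike naive head-first truncation which would keep the opening and discard the hand-off.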
@@ -130,7 +130,7 @@ Open `http://localhost:5000`.
 
 ### AI Infrastructure (`ai/`)
 
 - **Resilient Model Wrapper:** Wraps every Gemini API call with up to 3 retries and exponential backoff, handles quota errors and rate limits, and can switch to an alternative model mid-stream.
-- **Auto Model Selection:** On startup, a bootstrapper model queries the Gemini API and selects the optimal models for Logic, Writer, Artist, and Image roles. Selection is cached for 24 hours.
+- **Auto Model Selection:** On startup, a bootstrapper model queries the Gemini API and selects the optimal models for Logic, Writer, Artist, and Image roles. Selection is cached for 24 hours. The selection algorithm now prioritizes quality — free/preview/exp models are preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5) rather than by cost alone.
 - **Vertex AI Support:** If `GCP_PROJECT` is set and OAuth credentials are present, initializes Vertex AI automatically for Imagen image generation.
 - **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.
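The retry behavior the wrapper provides can be sketched like this; the function name, the generic exception handling, and the injectable `sleep` hook are illustrative assumptions, not the actual wrapper in `ai/`:

```python
import time

def call_with_backoff(fn, retries=3, base_delay=1.0, fallback=None, sleep=time.sleep):
    """Call `fn` with up to `retries` attempts and exponential backoff.

    After the final failure, switch to `fallback` (an alternative model
    call) if one is provided; otherwise re-raise the last error.
    """
    last_err = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as e:  # real code would catch quota/rate-limit errors specifically
            last_err = e
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
    if fallback is not None:
        return fallback()
    raise last_err
```

Exponential backoff is the standard response to rate-limit errors: each retry doubles the wait, giving the quota window time to reset before the mid-stream model switch is attempted.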
ai/setup.py (26 changed lines)
@@ -76,14 +76,14 @@ def select_best_models(force_refresh=False):
 
 prompt = f"""
 ROLE: AI Model Architect
 TASK: Select the optimal Gemini models for a book-writing application.
-PRIMARY OBJECTIVE: Keep total book generation cost under $2.00. Quality is secondary to this budget.
+PRIMARY OBJECTIVE: Maximize book quality. Free/preview/exp models are $0.00 — use the BEST quality free model available for every role. Only fall back to paid Flash when no free alternative exists, and only if it fits within the budget cap.
 
 AVAILABLE_MODELS:
 {json.dumps(compatible)}
 
 PRICING_CONTEXT (USD per 1M tokens — use these to calculate actual book cost):
 - FREE TIER: Any model with 'exp', 'beta', or 'preview' in name = $0.00. Always prefer these.
-  e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE.
+  e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE, gemini-2.5-flash-preview = FREE.
 - gemini-2.5-flash / gemini-2.5-flash-preview: ~$0.075 Input / $0.30 Output.
 - gemini-2.0-flash: ~$0.10 Input / $0.40 Output.
 - gemini-1.5-flash: ~$0.075 Input / $0.30 Output.
@@ -92,9 +92,9 @@ def select_best_models(force_refresh=False):
 
 BOOK TOKEN BUDGET (30-chapter novel — use this to calculate real cost before deciding):
 Logic role total: ~265,000 input tokens + ~55,000 output tokens
-  (planning, state tracking, consistency checks, director treatments per chapter)
+  (planning, state tracking, consistency checks, director treatments, chapter evaluation per chapter)
 Writer role total: ~450,000 input tokens + ~135,000 output tokens
-  (drafting, evaluation, refinement per chapter — 2 passes max)
+  (drafting, refinement per chapter — 3 passes max)
 Artist role total: ~30,000 input tokens + ~8,000 output tokens
   (cover art prompt design, cover layout, blurb, image quality evaluation — text calls only)
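The budget check the prompt asks the bootstrapper to perform is simple per-role arithmetic. As a worked example using the figures quoted above (Writer on gemini-2.5-flash at ~$0.075 in / $0.30 out per 1M tokens); the helper function is illustrative, not part of `ai/setup.py`:

```python
def role_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost in USD for one role, given per-1M-token pricing."""
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# Writer role on gemini-2.5-flash: 450k input + 135k output tokens.
writer = role_cost(450_000, 135_000, 0.075, 0.30)
print(f"${writer:.5f}")  # → $0.07425 — well under the $1.85 cap
```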
@@ -107,19 +107,23 @@ def select_best_models(force_refresh=False):
 (leaving $0.15 headroom for Imagen cover generation, total book target: $2.00).
 
 SELECTION RULES (apply in order):
-1. FREE FIRST: If a free/exp model exists (any tier, any quality), pick it for Logic. Cost = $0.
-2. FLASH FOR WRITER: Flash is sufficient for fiction prose. Never pick a paid Pro for Writer.
+1. FREE/PREVIEW ALWAYS WINS: Always pick the highest-quality free/exp/preview model for each role.
+   Free models cost $0 regardless of tier — a free Pro beats a paid Flash every time.
+2. QUALITY FOR WRITER: The Writer role produces all fiction prose. Prefer the best free Flash or
+   free Pro variant available. If no free model exists for Writer, use the cheapest paid Flash
+   that keeps the total budget under $1.85. Never use a paid stable Pro for Writer.
 3. CALCULATE: For non-free models, compute the actual book cost using the token budget above.
    Reject any combination that exceeds $2.00 total.
-4. QUALITY TIEBREAK: Among models with similar cost, prefer newer generation (2.x > 1.5).
+4. QUALITY TIEBREAK: Among models with identical cost (e.g. both free), prefer the highest
+   generation and capability: Pro > Flash, 2.5 > 2.0 > 1.5, stable > exp only if cost equal.
 5. NO THINKING MODELS: Too slow and expensive for any role.
 
 ROLES:
-- LOGIC: Planning, JSON adherence, plot consistency. Free/exp Pro ideal; Flash acceptable.
-- WRITER: Creative prose, chapter drafting. Flash 2.x is sufficient — do NOT use paid Pro.
-- ARTIST: Visual prompts for cover art. Cheapest capable Flash model.
+- LOGIC: Planning, JSON adherence, plot consistency, AND chapter quality evaluation. Best free/exp Pro is ideal; free Flash preview acceptable if no free Pro exists.
+- WRITER: Creative prose, chapter drafting and refinement. Best available free Flash or free Pro variant. Never use a paid stable Pro.
+- ARTIST: Visual prompts for cover art. Cheapest capable Flash model (free preferred).
 - PRO_REWRITE: Emergency full-chapter rewrite (rare, ~1-2x per book). Best free/exp Pro available.
-  If no free Pro exists, use best Flash — do not use paid Pro even here.
+  If no free Pro exists, use best free Flash preview — do not use paid models here.
 
 OUTPUT_FORMAT (JSON only, no markdown):
 {{
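The new selection rules amount to a lexicographic ordering: free beats paid, then Pro beats Flash, then newer generation beats older. A minimal sketch of that ordering follows; the name-matching heuristics are assumptions, since in the real system this judgment is delegated to the bootstrapper model via the prompt above:

```python
def quality_rank(model_name: str) -> tuple:
    """Sort key implementing: free first, then Pro > Flash, then 2.5 > 2.0 > 1.5."""
    name = model_name.lower()
    is_free = any(tag in name for tag in ("exp", "beta", "preview"))
    is_pro = "pro" in name
    gen = 0.0
    for g in ("2.5", "2.0", "1.5"):
        if g in name:
            gen = float(g)
            break
    # Tuples compare left to right, so free status dominates, then tier, then generation.
    return (is_free, is_pro, gen)

candidates = ["gemini-2.0-pro-exp", "gemini-2.5-flash-preview", "gemini-2.5-pro-preview"]
print(max(candidates, key=quality_rank))  # → gemini-2.5-pro-preview
```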
@@ -66,4 +66,4 @@ LENGTH_DEFINITIONS = {
 }
 
 # --- SYSTEM ---
-VERSION = "2.9"
+VERSION = "3.0"
@@ -327,7 +327,7 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
 utils.log("WRITER", f"⚠️ Failed Ch {chap['chapter_number']}: {e}")
 return f"## Chapter {chap['chapter_number']} Failed\n\nError: {e}"
 
-max_attempts = 2
+max_attempts = 3
 SCORE_AUTO_ACCEPT = 8
 SCORE_PASSING = 7
 SCORE_REWRITE_THRESHOLD = 6
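The attempt cap and score thresholds above drive an evaluate/refine loop. A minimal sketch of that control flow, assuming stand-in `evaluate` and `refine` callables in place of the real `evaluate_chapter_quality` and refinement calls:

```python
SCORE_AUTO_ACCEPT = 8

def refine_chapter(text, evaluate, refine, max_attempts=3):
    """Score the draft each pass; accept early at >= SCORE_AUTO_ACCEPT,
    otherwise refine with the critique, keeping the final draft regardless."""
    score = 0
    for attempt in range(1, max_attempts + 1):
        score, critique = evaluate(text)
        if score >= SCORE_AUTO_ACCEPT:
            return text, score             # strong draft: accept immediately
        if attempt < max_attempts:
            text = refine(text, critique)  # one more pass, guided by the critique
    return text, score
```

Raising `max_attempts` from 2 to 3 buys one extra refinement pass for chapters that score below the auto-accept bar.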
@@ -338,7 +338,7 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
 
 for attempt in range(1, max_attempts + 1):
     utils.log("WRITER", f" -> Evaluating Ch {chap['chapter_number']} (Attempt {attempt}/{max_attempts})...")
-    score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_writer, folder, series_context=series_block.strip())
+    score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_logic, folder, series_context=series_block.strip())
 
     past_critiques.append(f"Attempt {attempt}: {critique}")