feat: Improve book quality — stronger evaluator, more refinement attempts, quality-first model selection

- Fix: chapter quality evaluation now uses model_logic (free Pro) instead of model_writer (Flash). Previously the model that wrote the chapter also scored it, causing circular, lenient grading.
- Increase max_attempts in write_chapter from 2 to 3 for more refinement passes per chapter.
- Update auto model selection prompt (ai/setup.py) to prioritize quality over budget framing: free/preview/exp models are preferred by capability (Pro > Flash, 2.5 > 2.0 > 1.5), not just cost. The Writer role may now use the best free Flash/Pro preview and is no longer restricted to basic Flash.
- Bump version to 3.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
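
The evaluator/writer split described in the message can be pictured with a minimal sketch. This is not the project's write_chapter; the function name `write_chapter_sketch`, the three callables, and the `pass_score` threshold are hypothetical stand-ins, and the assumption that one "attempt" means one evaluation is mine. The only details taken from the commit are that drafting and refinement belong to the writer model, scoring belongs to the logic model, and max_attempts is 3.

```python
from typing import Callable, Tuple


def write_chapter_sketch(
    draft_fn: Callable[[], str],                      # stands in for a model_writer call
    evaluate_fn: Callable[[str], Tuple[int, str]],    # stands in for a model_logic call
    refine_fn: Callable[[str, str], str],             # stands in for a model_writer call
    max_attempts: int = 3,                            # value from this commit
    pass_score: int = 7,                              # hypothetical threshold
) -> Tuple[str, int]:
    """Draft once, then evaluate and refine until the score passes or attempts run out."""
    text = draft_fn()
    score = 0
    for attempt in range(1, max_attempts + 1):
        # The evaluator is a different model from the writer, so the draft
        # is never graded by the model that produced it.
        score, feedback = evaluate_fn(text)
        if score >= pass_score or attempt == max_attempts:
            break
        text = refine_fn(text, feedback)
    return text, score


if __name__ == "__main__":
    # Stub callables: every refinement adds a "+" and raises the score by one.
    result = write_chapter_sketch(
        draft_fn=lambda: "v1",
        evaluate_fn=lambda t: (5 + t.count("+"), "tighten pacing"),
        refine_fn=lambda t, fb: t + "+",
    )
    print(result)
```

With these stubs, a limit of 2 attempts stops at a score of 6, while a limit of 3 allows one more refinement and reaches the hypothetical pass threshold.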
2026-02-22 21:28:49 -05:00
parent f740174257
commit 6684ec2bf5
4 changed files with 20 additions and 16 deletions
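
The prompt in the diff below asks the model to compute the book cost from a per-role token budget and per-1M-token prices. As a rough illustration of that arithmetic, here is a small sketch. The token counts, prices, free-model markers, and the $1.85 text budget are copied from the prompt text; the function and constant names are hypothetical and do not come from ai/setup.py, and substring matching on the model name is only an assumption about how the "free" rule might be applied.

```python
# Token budget per role for a 30-chapter novel: (input tokens, output tokens).
TOKEN_BUDGET = {
    "logic": (265_000, 55_000),
    "writer": (450_000, 135_000),
    "artist": (30_000, 8_000),
}

# Approximate USD per 1M tokens: (input price, output price).
PRICES = {
    "gemini-2.5-flash": (0.075, 0.30),
    "gemini-2.0-flash": (0.10, 0.40),
    "gemini-1.5-flash": (0.075, 0.30),
}

# The prompt treats any model with one of these markers in its name as $0.00.
FREE_MARKERS = ("exp", "beta", "preview")

# $2.00 total target minus $0.15 headroom for Imagen cover generation.
TEXT_BUDGET_USD = 1.85


def is_free(model: str) -> bool:
    """Return True if the model name carries a free-tier marker."""
    return any(marker in model for marker in FREE_MARKERS)


def role_cost(model: str, role: str) -> float:
    """Estimated USD cost of running one role on one model for a whole book."""
    if is_free(model):
        return 0.0
    price_in, price_out = PRICES[model]
    tokens_in, tokens_out = TOKEN_BUDGET[role]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def book_cost(assignment: dict) -> float:
    """Sum the role costs for a {role: model} assignment."""
    return sum(role_cost(model, role) for role, model in assignment.items())


if __name__ == "__main__":
    all_paid = {
        "logic": "gemini-2.5-flash",
        "writer": "gemini-2.0-flash",
        "artist": "gemini-2.5-flash",
    }
    total = book_cost(all_paid)
    print(f"${total:.4f}", "within budget" if total <= TEXT_BUDGET_USD else "over budget")
```

Under these figures even an all-paid Flash assignment comes to roughly $0.14, far below the $1.85 text budget, which is consistent with the commit's shift from budget-first to quality-first selection.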
@@ -76,14 +76,14 @@ def select_best_models(force_refresh=False):
        prompt = f"""
        ROLE: AI Model Architect
        TASK: Select the optimal Gemini models for a book-writing application.
-        PRIMARY OBJECTIVE: Keep total book generation cost under $2.00. Quality is secondary to this budget.
+        PRIMARY OBJECTIVE: Maximize book quality. Free/preview/exp models are $0.00 — use the BEST quality free model available for every role. Only fall back to paid Flash when no free alternative exists, and only if it fits within the budget cap.

        AVAILABLE_MODELS:
        {json.dumps(compatible)}

        PRICING_CONTEXT (USD per 1M tokens — use these to calculate actual book cost):
        - FREE TIER: Any model with 'exp', 'beta', or 'preview' in name = $0.00. Always prefer these.
-          e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE.
+          e.g. gemini-2.0-pro-exp = FREE, gemini-2.5-pro-preview = FREE, gemini-2.5-flash-preview = FREE.
        - gemini-2.5-flash / gemini-2.5-flash-preview: ~$0.075 Input / $0.30 Output.
        - gemini-2.0-flash: ~$0.10 Input / $0.40 Output.
        - gemini-1.5-flash: ~$0.075 Input / $0.30 Output.
@@ -92,9 +92,9 @@ def select_best_models(force_refresh=False):

        BOOK TOKEN BUDGET (30-chapter novel — use this to calculate real cost before deciding):
        Logic role total:  ~265,000 input tokens + ~55,000 output tokens
-          (planning, state tracking, consistency checks, director treatments per chapter)
+          (planning, state tracking, consistency checks, director treatments, chapter evaluation per chapter)
        Writer role total: ~450,000 input tokens + ~135,000 output tokens
-          (drafting, evaluation, refinement per chapter — 2 passes max)
+          (drafting, refinement per chapter — 3 passes max)
        Artist role total: ~30,000 input tokens + ~8,000 output tokens
          (cover art prompt design, cover layout, blurb, image quality evaluation — text calls only)

@@ -107,19 +107,23 @@ def select_best_models(force_refresh=False):
          (leaving $0.15 headroom for Imagen cover generation, total book target: $2.00).

        SELECTION RULES (apply in order):
-        1. FREE FIRST: If a free/exp model exists (any tier, any quality), pick it for Logic. Cost = $0.
-        2. FLASH FOR WRITER: Flash is sufficient for fiction prose. Never pick a paid Pro for Writer.
+        1. FREE/PREVIEW ALWAYS WINS: Always pick the highest-quality free/exp/preview model for each role.
+           Free models cost $0 regardless of tier — a free Pro beats a paid Flash every time.
+        2. QUALITY FOR WRITER: The Writer role produces all fiction prose. Prefer the best free Flash or
+           free Pro variant available. If no free model exists for Writer, use the cheapest paid Flash
+           that keeps the total budget under $1.85. Never use a paid stable Pro for Writer.
        3. CALCULATE: For non-free models, compute the actual book cost using the token budget above.
           Reject any combination that exceeds $2.00 total.
-        4. QUALITY TIEBREAK: Among models with similar cost, prefer newer generation (2.x > 1.5).
+        4. QUALITY TIEBREAK: Among models with identical cost (e.g. both free), prefer the highest
+           generation and capability: Pro > Flash, 2.5 > 2.0 > 1.5, stable > exp only if cost equal.
        5. NO THINKING MODELS: Too slow and expensive for any role.

        ROLES:
-        - LOGIC: Planning, JSON adherence, plot consistency. Free/exp Pro ideal; Flash acceptable.
-        - WRITER: Creative prose, chapter drafting. Flash 2.x is sufficient — do NOT use paid Pro.
-        - ARTIST: Visual prompts for cover art. Cheapest capable Flash model.
+        - LOGIC: Planning, JSON adherence, plot consistency, AND chapter quality evaluation. Best free/exp Pro is ideal; free Flash preview acceptable if no free Pro exists.
+        - WRITER: Creative prose, chapter drafting and refinement. Best available free Flash or free Pro variant. Never use a paid stable Pro.
+        - ARTIST: Visual prompts for cover art. Cheapest capable Flash model (free preferred).
        - PRO_REWRITE: Emergency full-chapter rewrite (rare, ~1-2x per book). Best free/exp Pro available.
-          If no free Pro exists, use best Flash — do not use paid Pro even here.
+          If no free Pro exists, use best free Flash preview — do not use paid models here.

        OUTPUT_FORMAT (JSON only, no markdown):
        {{