feat: Add evaluation report pipeline for prompt tuning feedback

Adds a full per-chapter evaluation logging system that captures every score,
critique, and quality decision made during writing, then renders a
self-contained HTML report shareable with critics or prompt engineers.

story/eval_logger.py (new file):
- append_eval_entry(folder, entry): writes per-chapter eval data to
  eval_log.json in the book folder (called from write_chapter() at every
  return point).
- generate_html_report(folder, bp): reads eval_log.json and produces a
  self-contained HTML file (no external deps) with:
  • Summary cards (avg score, auto-accepted, rewrites, below-threshold)
  • Score timeline bar chart (one bar per chapter, colour-coded)
  • Score distribution histogram
  • Chapter breakdown table with expand-on-click critique details
    (attempt number, score, decision badge, full critique text)
  • Critique pattern frequency table (keyword mining across all critiques)
  • Auto-generated prompt tuning observations (systemic issues, POV
    character weak spots, pacing type analysis, climax vs. early chapter
    comparison)

story/writer.py:
- Imports time and eval_logger.
- Initialises _eval_entry dict (chapter metadata + polish flags + thresholds)
  after all threshold variables are set.
- Records each evaluation attempt's score, critique (truncated to 700 chars),
  and decision (auto_accepted / full_rewrite / refinement / accepted /
  below_threshold / eval_error / refinement_failed) before every return.

web/routes/run.py:
- Imports story_eval_logger.
- New route GET /project/<run_id>/eval_report/<book_folder>: loads
  eval_log.json, calls generate_html_report(), returns the HTML as a
  downloadable attachment named eval_report_<title>.html. Returns a
  user-friendly "not yet available" page if no log exists.

templates/run_details.html:
- Adds "Eval Report" (btn-outline-info) button next to "Check Consistency"
  in each book's artifact section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
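
The commit message describes append_eval_entry(folder, entry) as appending per-chapter data to eval_log.json, but story/eval_logger.py itself is not shown here. The following is a minimal sketch of how such an append could work, not the author's implementation. It assumes the log is a JSON list, and the return value (the entry count) is added purely for illustration.

```python
import json
import os


def append_eval_entry(folder, entry):
    """Append one chapter's evaluation record to <folder>/eval_log.json.

    Hypothetical sketch: assumes the log file holds a JSON list of dicts.
    Returns the number of entries now in the log (illustrative only).
    """
    log_path = os.path.join(folder, "eval_log.json")
    entries = []
    if os.path.exists(log_path):
        try:
            with open(log_path, "r", encoding="utf-8") as f:
                entries = json.load(f)
        except (json.JSONDecodeError, OSError):
            entries = []  # Unreadable log: start over rather than crash
    if not isinstance(entries, list):
        entries = []  # Unexpected shape: treat as empty
    entries.append(entry)
    with open(log_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    return len(entries)
```

Rewriting the whole file on each call keeps the log valid JSON at all times, which matters if the report route reads it while a book is still being written.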
feat: Improve revision pipeline quality — 6 targeted enhancements (v3.1)
2026-02-24 08:03:32 -05:00 · 2026-02-24 07:51:31 -05:00 · 2026-02-22 22:45:54 -05:00 · 2026-02-22 22:31:22 -05:00 · 2026-02-22 22:24:27 -05:00
14 changed files with 1263 additions and 332 deletions
--- a/README.md
+++ b/README.md
@@ -104,6 +104,15 @@ Open `http://localhost:5000`.
 - **Admin Panel:** Manage all users, view spend, and perform factory resets at `/admin`.
 - **Per-User API Keys:** Each user can supply their own Gemini API key; costs are tracked per account.

+### Cost-Effective by Design
+
+This engine is designed to produce high-quality fiction at the lowest possible cost, through several architectural optimizations:
+
+*   **Tiered AI Models**: The system uses cheaper, faster models (like Gemini Pro) for structural and analytical tasks—planning the plot, scoring chapter quality, and ensuring consistency. The more powerful and expensive creative models are reserved for the actual writing process.
+*   **Intelligent Context Management**: To minimize the number of tokens sent to the AI, the system includes only the data each request needs. For example, when writing a chapter, it injects data only for the characters in the current scene rather than the entire cast.
+*   **Adaptive Workflows**: The engine avoids unnecessary work. If a user provides a detailed outline for a chapter, the system skips the AI step that would normally expand on a basic idea, saving both time and money. It also adjusts its quality standards based on the chapter's importance, spending more effort on a climactic scene than on a simple transition.
+*   **Caching**: The system caches the results of deterministic AI tasks. If it needs to perform the same analysis twice, it reuses the original result instead of making a new API call.
+
 ### CLI Wizard (`cli/`)
 - **Interactive Setup:** Menu-driven interface (via Rich) for creating projects, managing personas, and defining characters and plot beats.
 - **Smart Resume:** Detects in-progress runs via lock files and prompts to resume.
@@ -118,8 +127,13 @@ Open `http://localhost:5000`.
 - **Persona Cache:** The author persona (including writing sample files) is loaded once at the start of the writing phase and reused for every chapter, eliminating redundant file I/O. The cache is refreshed whenever the persona is refined.
 - **Outline Validation Gate (`planner.py`):** Before the writing phase begins, a Logic-model pass checks the chapter plan for missing required beats, character continuity issues, pacing imbalances, and POV logic errors. Issues are logged as warnings so the writer can review them before generation begins.
 - **Adaptive Scoring Thresholds (`writer.py`):** Quality passing thresholds scale with chapter position — setup chapters use a lower bar (6.5) to avoid over-spending refinement tokens on early exposition, while climax chapters use a stricter bar (7.5) to ensure the most important scenes receive maximum effort.
+- **Adaptive Refinement Attempts (`writer.py`):** Climax and resolution chapters (position ≥ 75% through the book) receive up to 3 refinement attempts; earlier chapters keep 2. This concentrates quality effort on the scenes readers remember most.
+- **Stricter Polish Pass (`writer.py`):** The filter-word threshold for skipping the two-pass polish has been tightened from 1-per-83-words to 1-per-125-words, so more borderline drafts are cleaned before evaluation.
 - **Smart Beat Expansion Skip (`writer.py`):** If a chapter's scene beats are already detailed (>100 words total), the Director's Treatment expansion step is skipped, saving ~5K tokens per chapter.
- **Consistency Checker (`editor.py`):** Scores chapters on 13 rubrics (engagement, voice, sensory detail, scene execution, dialogue, pacing, staging, prose dynamics, clarity, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized"). Chapter evaluation now uses the Logic model (free Pro) rather than the Writer model, ensuring stricter and more accurate scoring.
+- **Consistency Checker (`editor.py`):** Scores chapters on 13 rubrics (engagement, voice, sensory detail, scene execution, dialogue, pacing, staging, prose dynamics, clarity, etc.) and flags AI-isms ("tapestry", "palpable tension") and weak filter verbs ("felt", "realized"). Chapter evaluation now uses head+tail sampling (`keep_head=True`), ensuring the evaluator sees the chapter opening (hooks, sensory anchoring) as well as the ending; long chapters no longer receive scores based only on their tail.
+- **Rewrite Model Upgrade (`editor.py`):** Manual chapter rewrites and user-triggered edits now use `model_writer` (the creative writing model) instead of `model_logic`, producing significantly better prose quality on rewritten content.
+- **Improved Consistency Sampling (`editor.py`):** The mid-generation consistency analysis now samples head + middle + tail of each chapter (instead of head + tail only), giving the continuity LLM a complete picture of each chapter's events for more accurate contradiction detection.
+- **Larger Persona Validation Sample (`style_persona.py`):** The persona validation test passage has been increased from 200 words to 400 words, giving the scorer enough material to reliably assess sentence rhythm, filter-word habits, and deep POV quality before accepting a persona.
 - **Dynamic Character Injection (`writer.py`):** Only injects characters explicitly named in the chapter's `scene_beats` plus the POV character into the writer prompt. Eliminates token waste from unused characters and reduces hallucinated appearances.
 - **Smart Context Tail (`writer.py`):** Extracts the final ~1,000 tokens of the previous chapter (the actual ending) rather than blindly truncating from the front. Ensures the hand-off point — where characters are standing and what was last said — is always preserved.
 - **Stateful Scene Tracking (`bible_tracker.py`):** After each chapter, the tracker records each character's `current_location`, `time_of_day`, and `held_items` in addition to appearance and events. This scene state is injected into subsequent chapter prompts so the writer knows exactly where characters are, what time it is, and what they're carrying.
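
The "Adaptive Refinement Attempts" bullet above gives a concrete rule: chapters at or past 75% of the book get up to 3 refinement attempts, and earlier chapters get 2. A minimal sketch of that rule follows. The function name and the way position is computed (chapter number divided by total chapters) are assumptions, since the corresponding `writer.py` code is not part of this diff.

```python
def max_refinement_attempts(chapter_number, total_chapters):
    """Return the refinement budget for a chapter.

    Hypothetical helper: position is assumed to be chapter_number / total_chapters.
    Chapters at or beyond 75% of the book get 3 attempts, earlier ones get 2.
    """
    position = chapter_number / total_chapters if total_chapters else 0.0
    return 3 if position >= 0.75 else 2
```

For a 20-chapter book under this assumption, chapter 14 would get 2 attempts and chapter 15 would get 3.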
--- a/cli/engine.py
+++ b/cli/engine.py
@@ -57,6 +57,8 @@ def process_book(bp, folder, context="", resume=False, interactive=False):
                    candidate_persona = style_persona.create_initial_persona(bp, folder)
                    is_valid, p_score = style_persona.validate_persona(bp, candidate_persona, folder)
                    if is_valid or persona_attempt == max_persona_attempts:
+                        if not is_valid:
+                            utils.log("SYSTEM", f"  ⚠️ Persona accepted after {max_persona_attempts} attempts despite low score ({p_score}/10). Voice drift risk elevated.")
                        bp['book_metadata']['author_details'] = candidate_persona
                        break
                    utils.log("SYSTEM", f"  -> Persona attempt {persona_attempt}/{max_persona_attempts} scored {p_score}/10. Regenerating...")
@@ -217,7 +219,8 @@ def process_book(bp, folder, context="", resume=False, interactive=False):

            # Refine Persona to match the actual output (every 5 chapters)
            if (i == 0 or i % 5 == 0) and txt:
-                bp['book_metadata']['author_details'] = style_persona.refine_persona(bp, txt, folder)
+                pov_char = ch.get('pov_character')
+                bp['book_metadata']['author_details'] = style_persona.refine_persona(bp, txt, folder, pov_character=pov_char)
                with open(bp_path, "w") as f: json.dump(bp, f, indent=2)
                cached_persona = build_persona_info(bp)  # Rebuild cache with updated bio

@@ -268,18 +271,25 @@ def process_book(bp, folder, context="", resume=False, interactive=False):
            with open(chars_track_path, "w") as f: json.dump(tracking['characters'], f, indent=2)
            with open(warn_track_path, "w") as f: json.dump(tracking.get('content_warnings', []), f, indent=2)

-            # Update Lore Index (Item 8: RAG-Lite)
+            # Update Lore Index (Item 8: RAG-Lite) — every 3 chapters (lore is stable after ch 1-3)
+            if i == 0 or i % 3 == 0:
                tracking['lore'] = bible_tracker.update_lore_index(folder, txt, tracking.get('lore', {}))
                with open(lore_track_path, "w") as f: json.dump(tracking['lore'], f, indent=2)

+            # Persist dynamic tracking changes back to the bible (Step 1: Bible-Tracking Merge)
+            bp = bible_tracker.merge_tracking_to_bible(bp, tracking)
+            with open(bp_path, "w") as f: json.dump(bp, f, indent=2)
+
            # Update Structured Story State (Item 9: Thread Tracking)
            current_story_state = story_state.update_story_state(txt, ch['chapter_number'], current_story_state, folder)

            # Exp 5: Mid-gen Consistency Snapshot (every 10 chapters)
+            # Sample: first 2 + last 8 chapters to keep token cost bounded regardless of book length
            if len(ms) > 0 and len(ms) % 10 == 0:
                utils.log("EDITOR", f"--- Mid-gen consistency check after chapter {ch['chapter_number']} ({len(ms)} written) ---")
                try:
-                    consistency = story_editor.analyze_consistency(bp, ms, folder)
+                    ms_sample = (ms[:2] + ms[-8:]) if len(ms) > 10 else ms
+                    consistency = story_editor.analyze_consistency(bp, ms_sample, folder)
                    issues = consistency.get('issues', [])
                    if issues:
                        for issue in issues:
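
The mid-generation sampling in the hunk above (`ms[:2] + ms[-8:]`) can be tried in isolation. This sketch wraps the same slice in a hypothetical helper, `sample_for_consistency`, and uses plain integers as stand-ins for chapter records. The real `ms` entries are presumably richer objects.

```python
def sample_for_consistency(ms, head=2, tail=8):
    """Keep the first `head` and last `tail` chapters once the manuscript
    is longer than head + tail; otherwise return every chapter.

    Hypothetical wrapper around the slice used in the diff.
    """
    if len(ms) > head + tail:
        return ms[:head] + ms[-tail:]
    return list(ms)


chapters = list(range(1, 21))           # stand-in for 20 written chapters
sample = sample_for_consistency(chapters)
```

Because the check only fires when `len(ms) % 10 == 0`, the first check (10 chapters) still sees the whole manuscript. From 20 chapters onward the sample stays at 10 chapters, so the token cost of each check stops growing with book length.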
--- a/core/config.py
+++ b/core/config.py
@@ -66,4 +66,4 @@ LENGTH_DEFINITIONS = {
 }

 # --- SYSTEM ---
-VERSION = "3.0"
+VERSION = "3.1"
--- a/core/utils.py
+++ b/core/utils.py
@@ -23,18 +23,27 @@ PRICING_CACHE = {}
 # --- Token Estimation & Truncation Utilities ---

 def estimate_tokens(text):
-    """Estimate token count using a 4-chars-per-token heuristic (no external libs required)."""
+    """Estimate token count using a 3.5-chars-per-token heuristic (more accurate than /4)."""
    if not text:
        return 0
-    return max(1, len(text) // 4)
+    return max(1, int(len(text) / 3.5))

-def truncate_to_tokens(text, max_tokens):
-    """Truncate text to approximately max_tokens, keeping the most recent (tail) content."""
+def truncate_to_tokens(text, max_tokens, keep_head=False):
+    """Truncate text to approximately max_tokens.
+
+    keep_head=False (default): keep the most recent (tail) content — good for 'story so far'.
+    keep_head=True: keep the head (1/3 of the budget) + "[...]" + the tail (2/3); good for
+                    context that needs both the opening framing and the most recent events.
+    """
    if not text:
        return text
-    max_chars = max_tokens * 4
+    max_chars = int(max_tokens * 3.5)
    if len(text) <= max_chars:
        return text
+    if keep_head:
+        head_chars = max_chars // 3
+        tail_chars = max_chars - head_chars
+        return text[:head_chars] + "\n[...]\n" + text[-tail_chars:]
    return text[-max_chars:]

 # --- In-Memory AI Response Cache ---
@@ -126,7 +135,14 @@ def log(phase, msg):
        except: pass

 def load_json(path):
-    return json.load(open(path, 'r')) if os.path.exists(path) else None
+    if not os.path.exists(path):
+        return None
+    try:
+        with open(path, 'r', encoding='utf-8', errors='replace') as f:
+            return json.load(f)
+    except (json.JSONDecodeError, OSError, ValueError) as e:
+        log("SYSTEM", f"⚠️ Failed to load JSON from {path}: {e}")
+        return None

 def create_default_personas():
    # Persona data is now stored in the Persona DB table; ensure the directory exists for sample files.
@@ -153,11 +169,13 @@ def log_image_attempt(folder, img_type, prompt, filename, status, error=None, sc
    data = []
    if os.path.exists(log_path):
        try:
-            with open(log_path, 'r') as f: data = json.load(f)
-        except:
-            pass
+            with open(log_path, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+        except (json.JSONDecodeError, OSError):
+            data = []  # Corrupted log — start fresh rather than crash
    data.append(entry)
-    with open(log_path, 'w') as f: json.dump(data, f, indent=2)
+    with open(log_path, 'w', encoding='utf-8') as f:
+        json.dump(data, f, indent=2)

 def get_run_folder(base_name):
    if not os.path.exists(base_name): os.makedirs(base_name)
@@ -218,9 +236,10 @@ def log_usage(folder, model_label, usage_metadata=None, image_count=0):

    if usage_metadata:
        try:
-            input_tokens = usage_metadata.prompt_token_count
-            output_tokens = usage_metadata.candidates_token_count
-        except: pass
+            input_tokens = usage_metadata.prompt_token_count or 0
+            output_tokens = usage_metadata.candidates_token_count or 0
+        except AttributeError:
+            pass  # usage_metadata shape varies by model; tokens stay 0

    cost = calculate_cost(model_label, input_tokens, output_tokens, image_count)

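
To see what `keep_head=True` changes in practice, the new `truncate_to_tokens` from the hunk above is reproduced here and run on a toy string. The sample text and the variable names below the function are illustrative only.

```python
def truncate_to_tokens(text, max_tokens, keep_head=False):
    """Truncate text to roughly max_tokens using a 3.5-chars-per-token estimate."""
    if not text:
        return text
    max_chars = int(max_tokens * 3.5)
    if len(text) <= max_chars:
        return text
    if keep_head:
        head_chars = max_chars // 3           # one third of the budget for the opening
        tail_chars = max_chars - head_chars   # the rest for the ending
        return text[:head_chars] + "\n[...]\n" + text[-tail_chars:]
    return text[-max_chars:]


# Illustrative input: a 215-character "chapter" with a recognisable start and end.
chapter = "OPENING " + "x" * 200 + " ENDING"
tail_only = truncate_to_tokens(chapter, 10)                      # budget: 35 chars
head_and_tail = truncate_to_tokens(chapter, 10, keep_head=True)
```

With `keep_head=True` the result keeps both "OPENING" and "ENDING", while the default keeps only the ending. The head+tail result is 7 characters over the character budget because the `"\n[...]\n"` marker is not counted against it.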
--- a/marketing/blurb.py
+++ b/marketing/blurb.py
@@ -44,8 +44,24 @@ def generate_blurb(bp, folder):
    try:
        response = ai_models.model_writer.generate_content(prompt)
        utils.log_usage(folder, ai_models.model_writer.name, response.usage_metadata)
-        blurb = response.text
-        with open(os.path.join(folder, "blurb.txt"), "w") as f: f.write(blurb)
-        with open(os.path.join(folder, "back_cover.txt"), "w") as f: f.write(blurb)
-    except:
-        utils.log("MARKETING", "Failed to generate blurb.")
+        blurb = response.text.strip()
+
+        # Trim to 220 words if model overshot the 150-200 word target
+        words = blurb.split()
+        if len(words) > 220:
+            blurb = " ".join(words[:220])
+            # End at the last sentence boundary within those 220 words
+            for end_ch in ['.', '!', '?']:
+                last_sent = blurb.rfind(end_ch)
+                if last_sent > len(blurb) // 2:
+                    blurb = blurb[:last_sent + 1]
+                    break
+            utils.log("MARKETING", f"  -> Blurb trimmed to {len(blurb.split())} words.")
+
+        with open(os.path.join(folder, "blurb.txt"), "w", encoding='utf-8') as f:
+            f.write(blurb)
+        with open(os.path.join(folder, "back_cover.txt"), "w", encoding='utf-8') as f:
+            f.write(blurb)
+        utils.log("MARKETING", f"  -> Blurb: {len(blurb.split())} words.")
+    except Exception as e:
+        utils.log("MARKETING", f"Failed to generate blurb: {e}")
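
One detail of the trimming loop above is worth noting: it stops at the first of `.`, `!`, `?` whose last occurrence lies past the midpoint, so a later `!` can be dropped if a `.` already qualifies. The sketch below is a possible variant, not the committed code, that takes the latest of the three boundaries instead. `trim_blurb` and its `max_words` parameter are hypothetical names.

```python
def trim_blurb(blurb, max_words=220):
    """Cut a blurb to max_words, then back up to the latest sentence end.

    Hypothetical variant of the diff's trimming logic: it considers '.', '!'
    and '?' together and keeps the latest one past the midpoint.
    """
    words = blurb.split()
    if len(words) <= max_words:
        return blurb.strip()
    trimmed = " ".join(words[:max_words])
    last_sent = max(trimmed.rfind(ch) for ch in ".!?")
    if last_sent > len(trimmed) // 2:
        trimmed = trimmed[:last_sent + 1]
    return trimmed
```

On the input `"a b c d e f g. h i! j k l m n o"` with `max_words=12`, this variant ends at `i!`, whereas the loop in the diff would end at `g.`.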
--- a/marketing/cover.py
+++ b/marketing/cover.py
@@ -14,27 +14,187 @@ try:
 except ImportError:
    HAS_PIL = False

+# Score gates (mirrors chapter writing pipeline thresholds)
+ART_SCORE_AUTO_ACCEPT = 8    # Stop retrying — image is excellent
+ART_SCORE_PASSING     = 7    # Acceptable; keep as best candidate
+LAYOUT_SCORE_PASSING  = 7    # Accept layout and stop retrying

-def evaluate_image_quality(image_path, prompt, model, folder=None):
-    if not HAS_PIL: return None, "PIL not installed"
+
+# ---------------------------------------------------------------------------
+# Evaluation helpers
+# ---------------------------------------------------------------------------
+
+def evaluate_cover_art(image_path, genre, title, model, folder=None):
+    """Score generated cover art against a professional book-cover rubric.
+
+    Returns (score: int | None, critique: str).
+    Auto-fail conditions:
+      - Any visible text/watermarks → score capped at 4
+      - Blurry or deformed anatomy → deduct 2 points
+    """
+    if not HAS_PIL:
+        return None, "PIL not installed"
    try:
        img = Image.open(image_path)
-        response = model.generate_content([f"""
-        ROLE: Art Critic
-        TASK: Analyze generated image against prompt.
-        PROMPT: '{prompt}'
-        OUTPUT_FORMAT (JSON): {{ "score": int (1-10), "reason": "string" }}
-        """, img])
-        model_name = getattr(model, 'name', "logic-pro")
-        if folder: utils.log_usage(folder, model_name, response.usage_metadata)
-        data = json.loads(utils.clean_json(response.text))
-        return data.get('score'), data.get('reason')
-    except Exception as e: return None, str(e)
+        prompt = f"""
+        ROLE: Professional Book Cover Art Critic
+        TASK: Score this AI-generated cover art for a {genre} novel titled '{title}'.

+        SCORING RUBRIC (1-10):
+        1. VISUAL IMPACT: Is the image immediately arresting? Does it demand attention on a shelf?
+        2. GENRE FIT: Does the visual style, mood, and colour palette unmistakably signal {genre}?
+        3. COMPOSITION: Is there a clear focal point? Are the top or bottom thirds usable for title/author text overlay?
+        4. TECHNICAL QUALITY: Sharp, detailed, free of deformities, blurring, or AI artefacts?
+        5. CLEAN IMAGE: Absolutely NO text, letters, numbers, watermarks, logos, or UI elements?
+
+        SCORING SCALE:
+        - 9-10: Masterclass cover art, ready for a major publisher
+        - 7-8:  Professional quality, genre-appropriate, minor flaws only
+        - 5-6:  Usable but generic or has one significant flaw
+        - 1-4:  Unusable — major artefacts, wrong genre, deformed figures, or visible text
+
+        AUTO-FAIL RULES (apply before scoring):
+        - If ANY text, letters, watermarks or UI elements are visible → score CANNOT exceed 4. State this explicitly.
+        - If figures have deformed anatomy or blurring → deduct 2 from your final score.
+
+        OUTPUT_FORMAT (JSON): {{"score": int, "critique": "Specific issues citing what to fix in the next attempt.", "actionable": "One concrete change to the image prompt that would improve the next attempt."}}
+        """
+        response = model.generate_content([prompt, img])
+        model_name = getattr(model, 'name', "logic")
+        if folder:
+            utils.log_usage(folder, model_name, response.usage_metadata)
+        data = json.loads(utils.clean_json(response.text))
+        score = data.get('score')
+        critique = data.get('critique', '')
+        if data.get('actionable'):
+            critique += f" FIX: {data['actionable']}"
+        return score, critique
+    except Exception as e:
+        return None, str(e)
+
+
+def evaluate_cover_layout(image_path, title, author, genre, font_name, model, folder=None):
+    """Score the finished cover (art + text overlay) as a professional book cover.
+
+    Returns (score: int | None, critique: str).
+    """
+    if not HAS_PIL:
+        return None, "PIL not installed"
+    try:
+        img = Image.open(image_path)
+        prompt = f"""
+        ROLE: Graphic Design Critic
+        TASK: Score this finished book cover for '{title}' by {author} ({genre}).
+
+        SCORING RUBRIC (1-10):
+        1. LEGIBILITY: Is the title instantly readable? High contrast against the background?
+        2. TYPOGRAPHY: Does the font '{font_name}' suit the {genre} genre? Is sizing proportional?
+        3. PLACEMENT: Is the title placed where it doesn't obscure the focal point? Is the author name readable?
+        4. PROFESSIONAL POLISH: Does this look like a published, commercially-viable cover?
+        5. GENRE SIGNAL: At a glance, does the whole cover (art + text) correctly signal {genre}?
+
+        SCORING SCALE:
+        - 9-10: Indistinguishable from a professional published cover
+        - 7-8:  Strong cover, minor refinement would help
+        - 5-6:  Passable but text placement or contrast needs work
+        - 1-4:  Unusable — unreadable text, clashing colours, or amateurish layout
+
+        AUTO-FAIL: If the title text is illegible (low contrast, obscured, or missing) → score CANNOT exceed 4.
+
+        OUTPUT_FORMAT (JSON): {{"score": int, "critique": "Specific layout issues.", "actionable": "One change to position, colour, or font size that would fix the worst problem."}}
+        """
+        response = model.generate_content([prompt, img])
+        model_name = getattr(model, 'name', "logic")
+        if folder:
+            utils.log_usage(folder, model_name, response.usage_metadata)
+        data = json.loads(utils.clean_json(response.text))
+        score = data.get('score')
+        critique = data.get('critique', '')
+        if data.get('actionable'):
+            critique += f" FIX: {data['actionable']}"
+        return score, critique
+    except Exception as e:
+        return None, str(e)
+
+
+# ---------------------------------------------------------------------------
+# Art prompt pre-validation
+# ---------------------------------------------------------------------------
+
+def validate_art_prompt(art_prompt, meta, model, folder=None):
+    """Pre-validate and improve the image generation prompt before calling Imagen.
+
+    Checks for: accidental text instructions, vague focal point, missing composition
+    guidance, and genre mismatch. Returns improved prompt or original on failure.
+    """
+    genre = meta.get('genre', 'Fiction')
+    title = meta.get('title', 'Untitled')
+
+    check_prompt = f"""
+    ROLE: Art Director
+    TASK: Review and improve this image generation prompt for a {genre} book cover titled '{title}'.
+
+    CURRENT_PROMPT:
+    {art_prompt}
+
+    CHECK FOR AND FIX:
+    1. Any instruction to render text, letters, or the title? → Remove it (text is overlaid separately).
+    2. Is there a specific, memorable FOCAL POINT described? → Add one if missing.
+    3. Does the colour palette and style match {genre} conventions? → Correct if off.
+    4. Is RULE OF THIRDS composition mentioned (space at top/bottom for title overlay)? → Add if missing.
+    5. Does it end with "No text, no letters, no watermarks"? → Ensure this is present.
+
+    Return the improved prompt under 200 words.
+
+    OUTPUT_FORMAT (JSON): {{"improved_prompt": "..."}}
+    """
+    try:
+        resp = model.generate_content(check_prompt)
+        if folder:
+            utils.log_usage(folder, model.name, resp.usage_metadata)
+        data = json.loads(utils.clean_json(resp.text))
+        improved = data.get('improved_prompt', '').strip()
+        if improved and len(improved) > 50:
+            utils.log("MARKETING", "  -> Art prompt validated and improved.")
+            return improved
+    except Exception as e:
+        utils.log("MARKETING", f"  -> Art prompt validation failed: {e}. Using original.")
+    return art_prompt
+
+
+# ---------------------------------------------------------------------------
+# Visual context helper
+# ---------------------------------------------------------------------------
+
+def _build_visual_context(bp, tracking):
+    """Extract structured visual context: protagonist, antagonist, key themes."""
+    lines = []
+    chars = bp.get('characters', [])
+    protagonist = next((c for c in chars if 'protagonist' in c.get('role', '').lower()), None)
+    if protagonist:
+        lines.append(f"PROTAGONIST: {protagonist.get('name')} — {protagonist.get('description', '')[:200]}")
+    antagonist = next((c for c in chars if 'antagonist' in c.get('role', '').lower()), None)
+    if antagonist:
+        lines.append(f"ANTAGONIST: {antagonist.get('name')} — {antagonist.get('description', '')[:150]}")
+    if tracking and tracking.get('characters'):
+        for name, data in list(tracking['characters'].items())[:2]:
+            desc = ', '.join(data.get('descriptors', []))[:120]
+            if desc:
+                lines.append(f"CHARACTER VISUAL ({name}): {desc}")
+    if tracking and tracking.get('events'):
+        recent = [e for e in tracking['events'][-3:] if isinstance(e, str)]
+        if recent:
+            lines.append(f"KEY THEMES/EVENTS: {'; '.join(recent)[:200]}")
+    return "\n".join(lines) if lines else ""
+
+
+# ---------------------------------------------------------------------------
+# Main entry point
+# ---------------------------------------------------------------------------

 def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
    if not HAS_PIL:
-        utils.log("MARKETING", "Pillow not installed. Skipping image cover.")
+        utils.log("MARKETING", "Pillow not installed. Skipping cover.")
        return

    utils.log("MARKETING", "Generating cover...")
@@ -45,13 +205,7 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
    if orientation == "Landscape": ar = "4:3"
    elif orientation == "Square": ar = "1:1"

-    visual_context = ""
-    if tracking:
-        visual_context = "IMPORTANT VISUAL CONTEXT:\n"
-        if 'events' in tracking:
-            visual_context += f"Key Events/Themes: {json.dumps(tracking['events'][-5:])}\n"
-        if 'characters' in tracking:
-            visual_context += f"Character Appearances: {json.dumps(tracking['characters'])}\n"
+    visual_context = _build_visual_context(bp, tracking)

    regenerate_image = True
    design_instruction = ""
@@ -60,18 +214,15 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
        regenerate_image = False

    if feedback and feedback.strip():
-        utils.log("MARKETING", f"Analyzing feedback: '{feedback}'...")
+        utils.log("MARKETING", f"Analysing feedback: '{feedback}'...")
        analysis_prompt = f"""
        ROLE: Design Assistant
-        TASK: Analyze user feedback on cover.
-
+        TASK: Analyse user feedback on a book cover.
        FEEDBACK: "{feedback}"
-
        DECISION:
-        1. Keep the current background image but change text/layout/color (REGENERATE_LAYOUT).
-        2. Create a completely new background image (REGENERATE_IMAGE).
-
-        OUTPUT_FORMAT (JSON): {{ "action": "REGENERATE_LAYOUT" or "REGENERATE_IMAGE", "instruction": "Specific instruction for Art Director" }}
+        1. Keep the background image; change only text/layout/colour → REGENERATE_LAYOUT
+        2. Create a completely new background image → REGENERATE_IMAGE
+        OUTPUT_FORMAT (JSON): {{"action": "REGENERATE_LAYOUT" or "REGENERATE_IMAGE", "instruction": "Specific instruction for the Art Director."}}
        """
        try:
            resp = ai_models.model_logic.generate_content(analysis_prompt)
@@ -79,9 +230,9 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
            decision = json.loads(utils.clean_json(resp.text))
            if decision.get('action') == 'REGENERATE_LAYOUT':
                regenerate_image = False
-                utils.log("MARKETING", "Feedback indicates keeping image. Regenerating layout only.")
+                utils.log("MARKETING", "Feedback: keeping image, regenerating layout only.")
            design_instruction = decision.get('instruction', feedback)
-        except:
+        except Exception:
            utils.log("MARKETING", "Feedback analysis failed. Defaulting to full regeneration.")

    genre = meta.get('genre', 'Fiction')
@@ -92,11 +243,11 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
        'romance':           'warm, painterly, soft-focus illustration',
        'fantasy':           'epic digital painting, rich colours, mythic scale',
        'science fiction':   'sharp digital art, cool palette, futuristic',
-        'horror': 'unsettling, dark atmospheric painting, desaturated',
-        'historical fiction': 'classical oil painting style, period-accurate',
+        'horror':            'unsettling dark atmospheric painting, desaturated',
+        'historical fiction':'classical oil painting style, period-accurate',
        'young adult':       'vibrant illustrated style, bold colours',
    }
-    suggested_style = genre_style_map.get(genre.lower(), 'professional digital illustration or photography')
+    suggested_style = genre_style_map.get(genre.lower(), 'professional digital illustration')

    design_prompt = f"""
    ROLE: Art Director
@@ -108,186 +259,228 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
    - TONE: {tone}
    - SUGGESTED_VISUAL_STYLE: {suggested_style}

-    VISUAL_CONTEXT (characters and key themes from the story):
-    {visual_context if visual_context else "Use genre conventions."}
+    VISUAL_CONTEXT (characters and themes from the finished story — use these):
+    {visual_context if visual_context else "Use strong genre conventions."}

    USER_FEEDBACK: {feedback if feedback else "None"}
    DESIGN_INSTRUCTION: {design_instruction if design_instruction else "Create a compelling, genre-appropriate cover."}

    COVER_ART_RULES:
-    - The art_prompt must produce an image with NO text, no letters, no numbers, no watermarks, no UI elements, no logos.
-    - Describe a clear FOCAL POINT (e.g. the protagonist, a dramatic scene, a symbolic object).
-    - Use RULE OF THIRDS composition — leave visual space at top and/or bottom for the title and author text to be overlaid.
-    - Describe LIGHTING that reinforces the tone (e.g. "harsh neon backlight" for thriller, "golden hour" for romance).
-    - Describe the COLOUR PALETTE explicitly (e.g. "deep crimson and shadow-black", "soft rose gold and cream").
-    - Characters must match their descriptions from VISUAL_CONTEXT if present.
+    - The art_prompt MUST produce an image with ABSOLUTELY NO text, letters, numbers, watermarks, UI elements, or logos. Text is overlaid separately.
+    - Describe a specific, memorable FOCAL POINT (e.g. protagonist mid-action, a symbolic object, a dramatic landscape).
+    - Use RULE OF THIRDS composition — preserve visual space at top AND bottom for title/author text overlay.
+    - Describe LIGHTING that reinforces the tone (e.g. "harsh neon backlight", "golden hour", "cold winter dawn").
+    - Specify the COLOUR PALETTE explicitly (e.g. "deep crimson and shadow-black", "soft rose gold and ivory cream").
+    - If characters are described in VISUAL_CONTEXT, their appearance MUST match those descriptions exactly.
+    - End the art_prompt with: "No text, no letters, no watermarks, no UI elements. {suggested_style} quality, 8k detail."

-    OUTPUT_FORMAT (JSON only, no markdown):
+    OUTPUT_FORMAT (JSON only, no markdown wrapper):
    {{
-        "font_name": "Name of a Google Font suited to the genre (e.g. Cinzel for fantasy, Oswald for thriller, Playfair Display for romance)",
-        "primary_color": "#HexCode (dominant background/cover colour)",
+        "font_name": "One Google Font suited to {genre} (e.g. Cinzel for fantasy, Oswald for thriller, Playfair Display for romance)",
+        "primary_color": "#HexCode",
        "text_color": "#HexCode (high contrast against primary_color)",
-        "art_prompt": "Detailed {suggested_style} image generation prompt. Begin with the style. Describe composition, focal point, lighting, colour palette, and any characters. End with: No text, no letters, no watermarks, photorealistic/painted quality, 8k detail."
+        "art_prompt": "Detailed image generation prompt. Style → Focal point → Composition → Lighting → Colour palette → Characters (if any). End with the NO TEXT clause."
    }}
    """
    try:
        response = ai_models.model_artist.generate_content(design_prompt)
        utils.log_usage(folder, ai_models.model_artist.name, response.usage_metadata)
        design = json.loads(utils.clean_json(response.text))
+    except Exception as e:
+        utils.log("MARKETING", f"Cover design failed: {e}")
+        return

    bg_color = design.get('primary_color', '#252570')
-
    art_prompt = design.get('art_prompt', f"Cover art for {meta.get('title')}")
+    font_name = design.get('font_name') or 'Playfair Display'
+
+    # Pre-validate and improve the art prompt before handing to Imagen
+    art_prompt = validate_art_prompt(art_prompt, meta, ai_models.model_logic, folder)
    with open(os.path.join(folder, "cover_art_prompt.txt"), "w") as f:
        f.write(art_prompt)

    img = None
    width, height = 600, 900

-        best_img_score = 0
-        best_img_path = None
+    # -----------------------------------------------------------------------
+    # Phase 1: Art generation loop (evaluate → critique → refine → retry)
+    # -----------------------------------------------------------------------
+    best_art_score = 0
+    best_art_path = None
+    current_art_prompt = art_prompt
+    MAX_ART_ATTEMPTS = 3

-        MAX_IMG_ATTEMPTS = 3
    if regenerate_image:
-            for i in range(1, MAX_IMG_ATTEMPTS + 1):
-                utils.log("MARKETING", f"Generating cover art (Attempt {i}/{MAX_IMG_ATTEMPTS})...")
-                try:
-                    if not ai_models.model_image: raise ImportError("No Image Generation Model available.")
+        for attempt in range(1, MAX_ART_ATTEMPTS + 1):
+            utils.log("MARKETING", f"Generating cover art (Attempt {attempt}/{MAX_ART_ATTEMPTS})...")
+            attempt_path = os.path.join(folder, f"cover_art_attempt_{attempt}.png")
+            gen_status = "success"

-                    status = "success"
            try:
-                        result = ai_models.model_image.generate_images(prompt=art_prompt, number_of_images=1, aspect_ratio=ar)
-                    except Exception as e:
-                        err_lower = str(e).lower()
+                if not ai_models.model_image:
+                    raise ImportError("No image generation model available.")
+
+                try:
+                    result = ai_models.model_image.generate_images(
+                        prompt=current_art_prompt, number_of_images=1, aspect_ratio=ar)
+                except Exception as img_err:
+                    err_lower = str(img_err).lower()
                    if ai_models.HAS_VERTEX and ("resource" in err_lower or "quota" in err_lower):
                        try:
                            utils.log("MARKETING", "⚠️ Imagen 3 failed. Trying Imagen 3 Fast...")
-                                fb_model = ai_models.VertexImageModel.from_pretrained("imagen-3.0-fast-generate-001")
-                                result = fb_model.generate_images(prompt=art_prompt, number_of_images=1, aspect_ratio=ar)
-                                status = "success_fast"
+                            fb = ai_models.VertexImageModel.from_pretrained("imagen-3.0-fast-generate-001")
+                            result = fb.generate_images(prompt=current_art_prompt, number_of_images=1, aspect_ratio=ar)
+                            gen_status = "success_fast"
                        except Exception:
                            utils.log("MARKETING", "⚠️ Imagen 3 Fast failed. Trying Imagen 2...")
-                                fb_model = ai_models.VertexImageModel.from_pretrained("imagegeneration@006")
-                                result = fb_model.generate_images(prompt=art_prompt, number_of_images=1, aspect_ratio=ar)
-                                status = "success_fallback"
+                            fb = ai_models.VertexImageModel.from_pretrained("imagegeneration@006")
+                            result = fb.generate_images(prompt=current_art_prompt, number_of_images=1, aspect_ratio=ar)
+                            gen_status = "success_fallback"
                    else:
-                            raise e
+                        raise img_err

-                    attempt_path = os.path.join(folder, f"cover_art_attempt_{i}.png")
                result.images[0].save(attempt_path)
                utils.log_usage(folder, "imagen", image_count=1)

-                    cover_eval_criteria = (
-                        f"Book cover art for a {genre} novel titled '{meta.get('title')}'.\n\n"
-                        f"Evaluate STRICTLY as a professional book cover on these criteria:\n"
-                        f"1. VISUAL IMPACT: Is the image immediately arresting and compelling?\n"
-                        f"2. GENRE FIT: Does the visual style, mood, and palette match {genre}?\n"
-                        f"3. COMPOSITION: Is there a clear focal point? Are top/bottom areas usable for title/author text?\n"
-                        f"4. QUALITY: Is the image sharp, detailed, and free of deformities or blurring?\n"
-                        f"5. CLEAN IMAGE: Are there absolutely NO text, watermarks, letters, or UI artifacts?\n"
-                        f"Score 1-10. Deduct 3 points if any text/watermarks are visible. "
-                        f"Deduct 2 if the image is blurry or has deformed anatomy."
-                    )
-                    score, critique = evaluate_image_quality(attempt_path, cover_eval_criteria, ai_models.model_writer, folder)
-                    if score is None: score = 0
-
-                    utils.log("MARKETING", f"  -> Image Score: {score}/10. Critique: {critique}")
-                    utils.log_image_attempt(folder, "cover", art_prompt, f"cover_art_{i}.png", status, score=score, critique=critique)
+                score, critique = evaluate_cover_art(
+                    attempt_path, genre, meta.get('title', ''), ai_models.model_logic, folder)
+                if score is None:
+                    score = 0
+                utils.log("MARKETING", f"  -> Art Score: {score}/10. Critique: {critique}")
+                utils.log_image_attempt(folder, "cover", current_art_prompt,
+                                        f"cover_art_attempt_{attempt}.png", gen_status,
+                                        score=score, critique=critique)

                if interactive:
                    try:
                        if os.name == 'nt': os.startfile(attempt_path)
                        elif sys.platform == 'darwin': subprocess.call(('open', attempt_path))
                        else: subprocess.call(('xdg-open', attempt_path))
-                        except: pass
-
+                    except Exception:
+                        pass
                    from rich.prompt import Confirm
-                        if Confirm.ask(f"Accept cover attempt {i} (Score: {score})?", default=True):
-                            best_img_path = attempt_path
+                    if Confirm.ask(f"Accept cover art attempt {attempt} (score {score})?", default=True):
+                        best_art_path = attempt_path
+                        best_art_score = score
                        break
                    else:
-                            utils.log("MARKETING", "User rejected cover. Retrying...")
+                        utils.log("MARKETING", "User rejected art. Regenerating...")
                        continue

-                    if score >= 5 and score > best_img_score:
-                        best_img_score = score
-                        best_img_path = attempt_path
-                    elif best_img_path is None and score > 0:
-                        best_img_score = score
-                        best_img_path = attempt_path
+                # Track best image — prefer passing threshold; keep first usable as fallback
+                if score >= ART_SCORE_PASSING and score > best_art_score:
+                    best_art_score = score
+                    best_art_path = attempt_path
+                elif best_art_path is None and score > 0:
+                    best_art_score = score
+                    best_art_path = attempt_path

-                    if score >= 9:
-                        utils.log("MARKETING", "  -> High quality image accepted.")
+                if score >= ART_SCORE_AUTO_ACCEPT:
+                    utils.log("MARKETING", "  -> High-quality art accepted early.")
                    break

-                    prompt_additions = []
-                    critique_lower = critique.lower() if critique else ""
-                    if "scar" in critique_lower or "deform" in critique_lower:
-                        prompt_additions.append("perfect anatomy, no deformities")
-                    if "blur" in critique_lower or "blurry" in critique_lower:
-                        prompt_additions.append("sharp focus, highly detailed")
-                    if "text" in critique_lower or "letter" in critique_lower:
-                        prompt_additions.append("no text, no letters, no watermarks")
-                    if prompt_additions:
-                        art_prompt += f". ({', '.join(prompt_additions)})"
+                # Critique-driven prompt refinement for next attempt
+                if attempt < MAX_ART_ATTEMPTS and critique:
+                    refine_req = f"""
+                    ROLE: Art Director
+                    TASK: Rewrite the image prompt to fix the critique below. Keep under 200 words.
+
+                    CRITIQUE: {critique}
+                    ORIGINAL_PROMPT: {current_art_prompt}
+
+                    RULES:
+                    - Preserve genre style, focal point, and colour palette unless explicitly criticised.
+                    - If text/watermarks were visible: reinforce "absolutely no text, no letters, no watermarks."
+                    - If anatomy was deformed: add "perfect anatomy, professional figure illustration."
+                    - If blurry: add "tack-sharp focus, highly detailed."
+
+                    OUTPUT_FORMAT (JSON): {{"improved_prompt": "..."}}
+                    """
+                    try:
+                        rr = ai_models.model_logic.generate_content(refine_req)
+                        utils.log_usage(folder, ai_models.model_logic.name, rr.usage_metadata)
+                        rd = json.loads(utils.clean_json(rr.text))
+                        improved = rd.get('improved_prompt', '').strip()
+                        if improved and len(improved) > 50:
+                            current_art_prompt = improved
+                            utils.log("MARKETING", "  -> Art prompt refined for next attempt.")
+                    except Exception as refine_err:
+                        utils.log("MARKETING", f"  -> Art prompt refinement failed: {refine_err}")

            except Exception as e:
-                    utils.log("MARKETING", f"Image generation failed: {e}")
-                    if "quota" in str(e).lower(): break
+                utils.log("MARKETING", f"Image generation attempt {attempt} failed: {e}")
+                if "quota" in str(e).lower():
+                    break

-            if best_img_path and os.path.exists(best_img_path):
+        if best_art_path and os.path.exists(best_art_path):
            final_art_path = os.path.join(folder, "cover_art.png")
-                if best_img_path != final_art_path:
-                    shutil.copy(best_img_path, final_art_path)
+            if best_art_path != final_art_path:
+                shutil.copy(best_art_path, final_art_path)
            img = Image.open(final_art_path).resize((width, height)).convert("RGB")
+            utils.log("MARKETING", f"  -> Best art: {best_art_score}/10.")
        else:
-                utils.log("MARKETING", "Falling back to solid color cover.")
+            utils.log("MARKETING", "⚠️ No usable art generated. Falling back to solid colour cover.")
            img = Image.new('RGB', (width, height), color=bg_color)
            utils.log_image_attempt(folder, "cover", art_prompt, "cover.png", "fallback_solid")
    else:
        final_art_path = os.path.join(folder, "cover_art.png")
        if os.path.exists(final_art_path):
-                utils.log("MARKETING", "Using existing cover art (Layout update only).")
+            utils.log("MARKETING", "Using existing cover art (layout update only).")
            img = Image.open(final_art_path).resize((width, height)).convert("RGB")
        else:
-                utils.log("MARKETING", "Existing art not found. Forcing regeneration.")
+            utils.log("MARKETING", "Existing art not found. Using solid colour fallback.")
            img = Image.new('RGB', (width, height), color=bg_color)

-        font_path = download_font(design.get('font_name') or 'Arial')
+    if img is None:
+        utils.log("MARKETING", "Cover generation aborted — no image available.")
+        return

+    font_path = download_font(font_name)
+
+    # -----------------------------------------------------------------------
+    # Phase 2: Text layout loop (evaluate → critique → adjust → retry)
+    # -----------------------------------------------------------------------
    best_layout_score = 0
    best_layout_path = None

    base_layout_prompt = f"""
    ROLE: Graphic Designer
-            TASK: Determine text layout coordinates for a 600x900 cover.
+    TASK: Determine precise text layout coordinates for a 600×900 book cover image.

-            METADATA:
+    BOOK:
    - TITLE: {meta.get('title')}
-            - AUTHOR: {meta.get('author')}
-            - GENRE: {meta.get('genre')}
+    - AUTHOR: {meta.get('author', 'Unknown')}
+    - GENRE: {genre}
+    - FONT: {font_name}
+    - TEXT_COLOR: {design.get('text_color', '#FFFFFF')}

-            CONSTRAINT: Do NOT place text over faces.
+    PLACEMENT RULES:
+    - Title in top third OR bottom third (not centre — that obscures the focal art).
+    - Author name in the opposite zone, or just below the title.
+    - Font sizes: title ~60-80px, author ~28-36px for a 600px-wide canvas.
+    - Do NOT place text over faces or the primary focal point.
+    - Coordinates are the CENTER of the text block (x=300 is horizontal centre).
+
+    {f"USER FEEDBACK: {feedback}. Adjust placement/colour accordingly." if feedback else ""}

    OUTPUT_FORMAT (JSON):
    {{
-                "title": {{ "x": Int, "y": Int, "font_size": Int, "font_name": "String", "color": "#Hex" }},
-                "author": {{ "x": Int, "y": Int, "font_size": Int, "font_name": "String", "color": "#Hex" }}
+        "title":  {{"x": Int, "y": Int, "font_size": Int, "font_name": "{font_name}", "color": "#Hex"}},
+        "author": {{"x": Int, "y": Int, "font_size": Int, "font_name": "{font_name}", "color": "#Hex"}}
    }}
    """

-        if feedback:
-            base_layout_prompt += f"\nUSER FEEDBACK: {feedback}\nAdjust layout/colors accordingly."
-
    layout_prompt = base_layout_prompt
+    MAX_LAYOUT_ATTEMPTS = 5

-        for attempt in range(1, 6):
-            utils.log("MARKETING", f"Designing text layout (Attempt {attempt}/5)...")
+    for attempt in range(1, MAX_LAYOUT_ATTEMPTS + 1):
+        utils.log("MARKETING", f"Designing text layout (Attempt {attempt}/{MAX_LAYOUT_ATTEMPTS})...")
        try:
-                response = ai_models.model_writer.generate_content([layout_prompt, img])
-                utils.log_usage(folder, ai_models.model_writer.name, response.usage_metadata)
-                layout = json.loads(utils.clean_json(response.text))
-                if isinstance(layout, list): layout = layout[0] if layout else {}
+            resp = ai_models.model_writer.generate_content([layout_prompt, img])
+            utils.log_usage(folder, ai_models.model_writer.name, resp.usage_metadata)
+            layout = json.loads(utils.clean_json(resp.text))
+            if isinstance(layout, list):
+                layout = layout[0] if layout else {}
        except Exception as e:
            utils.log("MARKETING", f"Layout generation failed: {e}")
            continue
@@ -297,37 +490,34 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):

        def draw_element(key, text_override=None):
            elem = layout.get(key)
-                if not elem: return
-                if isinstance(elem, list): elem = elem[0] if elem else {}
+            if not elem:
+                return
+            if isinstance(elem, list):
+                elem = elem[0] if elem else {}
            text = text_override if text_override else elem.get('text')
-                if not text: return
-
-                f_name = elem.get('font_name') or 'Arial'
-                f_path = download_font(f_name)
+            if not text:
+                return
+            f_name = elem.get('font_name') or font_name
+            f_p = download_font(f_name)
            try:
-                    if f_path: font = ImageFont.truetype(f_path, elem.get('font_size', 40))
-                    else: raise IOError("Font not found")
-                except: font = ImageFont.load_default()
-
+                fnt = ImageFont.truetype(f_p, elem.get('font_size', 40)) if f_p else ImageFont.load_default()
+            except Exception:
+                fnt = ImageFont.load_default()
            x, y = elem.get('x', 300), elem.get('y', 450)
-                color = elem.get('color') or '#FFFFFF'
-
-                avg_char_w = font.getlength("A")
-                wrap_w = int(550 / avg_char_w) if avg_char_w > 0 else 20
+            color = elem.get('color') or design.get('text_color', '#FFFFFF')
+            avg_w = fnt.getlength("A")
+            wrap_w = int(550 / avg_w) if avg_w > 0 else 20
            lines = textwrap.wrap(text, width=wrap_w)
-
            line_heights = []
-                for l in lines:
-                    bbox = draw.textbbox((0, 0), l, font=font)
+            for ln in lines:
+                bbox = draw.textbbox((0, 0), ln, font=fnt)
                line_heights.append(bbox[3] - bbox[1] + 10)
-
            total_h = sum(line_heights)
            current_y = y - (total_h // 2)
-
-                for idx, line in enumerate(lines):
-                    bbox = draw.textbbox((0, 0), line, font=font)
+            for idx, ln in enumerate(lines):
+                bbox = draw.textbbox((0, 0), ln, font=fnt)
                lx = x - ((bbox[2] - bbox[0]) / 2)
-                    draw.text((lx, current_y), line, font=font, fill=color)
+                draw.text((lx, current_y), ln, font=fnt, fill=color)
                current_y += line_heights[idx]

        draw_element('title', meta.get('title'))
@@ -336,30 +526,29 @@ def generate_cover(bp, folder, tracking=None, feedback=None, interactive=False):
        attempt_path = os.path.join(folder, f"cover_layout_attempt_{attempt}.png")
        img_copy.save(attempt_path)

-            eval_prompt = f"""
-            Analyze the text layout for the book title '{meta.get('title')}'.
-            CHECKLIST:
-            1. Is the text legible against the background?
-            2. Is the contrast sufficient?
-            3. Does it look professional?
-            """
-            score, critique = evaluate_image_quality(attempt_path, eval_prompt, ai_models.model_writer, folder)
-            if score is None: score = 0
-
+        score, critique = evaluate_cover_layout(
+            attempt_path, meta.get('title', ''), meta.get('author', ''), genre, font_name,
+            ai_models.model_writer, folder
+        )
+        if score is None:
+            score = 0
        utils.log("MARKETING", f"  -> Layout Score: {score}/10. Critique: {critique}")

        if score > best_layout_score:
            best_layout_score = score
            best_layout_path = attempt_path

-            if score == 10:
-                utils.log("MARKETING", "  -> Perfect layout accepted.")
+        if score >= LAYOUT_SCORE_PASSING:
+            utils.log("MARKETING", f"  -> Layout accepted (score {score} ≥ {LAYOUT_SCORE_PASSING}).")
            break

-            layout_prompt = base_layout_prompt + f"\nCRITIQUE OF PREVIOUS ATTEMPT: {critique}\nAdjust position/color to fix this."
+        if attempt < MAX_LAYOUT_ATTEMPTS:
+            layout_prompt = (base_layout_prompt
+                             + f"\n\nCRITIQUE OF ATTEMPT {attempt}: {critique}\n"
+                             + "Adjust coordinates, font_size, or color to fix these issues exactly.")

    if best_layout_path:
        shutil.copy(best_layout_path, os.path.join(folder, "cover.png"))
-
-    except Exception as e:
-        utils.log("MARKETING", f"Cover generation failed: {e}")
+        utils.log("MARKETING", f"Cover saved. Best layout score: {best_layout_score}/10.")
+    else:
+        utils.log("MARKETING", "⚠️ No layout produced. Cover not saved.")
--- a/marketing/fonts.py
+++ b/marketing/fonts.py
@@ -42,14 +42,20 @@ def download_font(font_name):
        base_url = f"https://github.com/google/fonts/raw/main/{license_type}/{clean_name}"
        for pattern in patterns:
            try:
-                r = requests.get(f"{base_url}/{pattern}", headers=headers, timeout=5)
+                r = requests.get(f"{base_url}/{pattern}", headers=headers, timeout=6)
                if r.status_code == 200 and len(r.content) > 1000:
-                    with open(font_path, 'wb') as f: f.write(r.content)
+                    with open(font_path, 'wb') as f:
+                        f.write(r.content)
                    utils.log("ASSETS", f"✅ Downloaded {font_name} to {font_path}")
                    return font_path
-            except Exception: continue
+            except requests.exceptions.Timeout:
+                utils.log("ASSETS", f"  Font download timeout for {font_name} ({pattern}). Trying next...")
+                continue
+            except Exception:
+                continue

    if clean_name != "roboto":
-        utils.log("ASSETS", f"⚠️ Font '{font_name}' not found. Falling back to Roboto.")
+        utils.log("ASSETS", f"⚠️ Font '{font_name}' not found on Google Fonts. Falling back to Roboto.")
        return download_font("Roboto")
+    utils.log("ASSETS", "⚠️ Roboto fallback also failed. PIL will use built-in default font.")
    return None
--- a/story/bible_tracker.py
+++ b/story/bible_tracker.py
@@ -19,7 +19,11 @@ def merge_selected_changes(original, draft, selected_keys):
                original['project_metadata'][field] = draft['project_metadata'][field]

        elif parts[0] == 'char' and len(parts) >= 2:
+            try:
                idx = int(parts[1])
+            except (ValueError, IndexError):
+                utils.log("SYSTEM", f"⚠️ Skipping malformed bible merge key: '{key}'")
+                continue
            if idx < len(draft['characters']):
                if idx < len(original['characters']):
                    original['characters'][idx] = draft['characters'][idx]
@@ -27,7 +31,11 @@ def merge_selected_changes(original, draft, selected_keys):
                    original['characters'].append(draft['characters'][idx])

        elif parts[0] == 'book' and len(parts) >= 2:
+            try:
                book_num = int(parts[1])
+            except (ValueError, IndexError):
+                utils.log("SYSTEM", f"⚠️ Skipping malformed bible merge key: '{key}'")
+                continue
            orig_book = next((b for b in original['books'] if b['book_number'] == book_num), None)
            draft_book = next((b for b in draft['books'] if b['book_number'] == book_num), None)

@@ -42,7 +50,11 @@ def merge_selected_changes(original, draft, selected_keys):
                    orig_book['manual_instruction'] = draft_book['manual_instruction']

                elif len(parts) == 4 and parts[2] == 'beat':
+                    try:
                        beat_idx = int(parts[3])
+                    except (ValueError, IndexError):
+                        utils.log("SYSTEM", f"⚠️ Skipping malformed beat merge key: '{key}'")
+                        continue
                    if beat_idx < len(draft_book['plot_beats']):
                        while len(orig_book['plot_beats']) <= beat_idx:
                            orig_book['plot_beats'].append("")
@@ -129,6 +141,30 @@ def update_lore_index(folder, chapter_text, current_lore):
        return current_lore


+def merge_tracking_to_bible(bible, tracking):
+    """Merge dynamic tracking state back into the bible dict.
+
+    Makes bible.json the single persistent source of truth by updating
+    character data and lore from the in-memory tracking object.
+    Returns the modified bible dict.
+    """
+    for name, data in tracking.get('characters', {}).items():
+        matched = False
+        for char in bible.get('characters', []):
+            if char.get('name') == name:
+                char.update(data)
+                matched = True
+                break
+        if not matched:
+            utils.log("TRACKER", f"  -> Character '{name}' in tracking not found in bible. Skipping.")
+
+    if 'lore' not in bible:
+        bible['lore'] = {}
+    bible['lore'].update(tracking.get('lore', {}))
+
+    return bible
+
+
 def harvest_metadata(bp, folder, full_manuscript):
    utils.log("HARVESTER", "Scanning for new characters...")
    full_text = "\n".join([c.get('content', '') for c in full_manuscript])[:500000]
@@ -153,7 +189,8 @@ def harvest_metadata(bp, folder, full_manuscript):
            if valid_chars:
                utils.log("HARVESTER", f"Found {len(valid_chars)} new chars.")
                bp['characters'].extend(valid_chars)
-    except: pass
+    except Exception as e:
+        utils.log("HARVESTER", f"⚠️ Metadata harvest failed: {e}")
    return bp


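Reviewer note on `merge_tracking_to_bible` (story/bible_tracker.py, above): a minimal sketch of how the new function behaves on sample data. The body mirrors the hunk, but `utils.log` is replaced by a hypothetical `skipped` list so the snippet runs standalone, and the character names and lore values are invented for illustration.

```python
skipped = []  # hypothetical stand-in for utils.log("TRACKER", ...)


def merge_tracking_to_bible(bible, tracking):
    """Copy tracked character state and lore into the bible dict (in place)."""
    for name, data in tracking.get('characters', {}).items():
        for char in bible.get('characters', []):
            if char.get('name') == name:
                char.update(data)  # tracked fields overwrite bible fields
                break
        else:
            # No bible character with this name: record it and move on.
            skipped.append(name)

    bible.setdefault('lore', {}).update(tracking.get('lore', {}))
    return bible


# Sample data (made up for this sketch).
bible = {"characters": [{"name": "Ana", "status": "alive"}]}
tracking = {
    "characters": {"Ana": {"status": "injured"}, "Bo": {"status": "new"}},
    "lore": {"city": "Port Vale"},
}
merged = merge_tracking_to_bible(bible, tracking)
```

Two things are worth noting. The merge mutates and returns the same `bible` object, so callers should not rely on the original being preserved. And because `char.update(data)` copies every tracked key, a `name` key inside the tracking data would rename the bible character.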
--- a/story/editor.py
+++ b/story/editor.py
@@ -67,7 +67,7 @@ def evaluate_chapter_quality(text, chapter_title, genre, model, folder, series_c
    }}
    """
    try:
-        response = model.generate_content([prompt, utils.truncate_to_tokens(text, 7500)])
+        response = model.generate_content([prompt, utils.truncate_to_tokens(text, 7500, keep_head=True)])
        model_name = getattr(model, 'name', ai_models.logic_model_name)
        utils.log_usage(folder, model_name, response.usage_metadata)
        data = json.loads(utils.clean_json(response.text))
@@ -129,7 +129,13 @@ def analyze_consistency(bp, manuscript, folder):
    chapter_summaries = []
    for ch in manuscript:
        text = ch.get('content', '')
-        excerpt = text[:1000] + "\n...\n" + text[-1000:] if len(text) > 2000 else text
+        if len(text) > 3000:
+            mid = len(text) // 2
+            excerpt = text[:800] + "\n...\n" + text[mid - 200:mid + 200] + "\n...\n" + text[-800:]
+        elif len(text) > 1600:
+            excerpt = text[:800] + "\n...\n" + text[-800:]
+        else:
+            excerpt = text
        chapter_summaries.append(f"Ch {ch.get('num')}: {excerpt}")

    context = "\n".join(chapter_summaries)
@@ -236,8 +242,8 @@ def rewrite_chapter_content(bp, manuscript, chapter_num, instruction, folder):
    """

    try:
-        response = ai_models.model_logic.generate_content(prompt)
-        utils.log_usage(folder, ai_models.model_logic.name, response.usage_metadata)
+        response = ai_models.model_writer.generate_content(prompt)
+        utils.log_usage(folder, ai_models.model_writer.name, response.usage_metadata)
        try:
            data = json.loads(utils.clean_json(response.text))
            return data.get('content'), data.get('summary')
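Reviewer note on the excerpt sampling in `analyze_consistency` (story/editor.py, above): a minimal sketch of the three-tier head/middle/tail excerpt as a standalone helper. The name `sample_excerpt` and its parameters are hypothetical; the thresholds (3000 and 1600 characters) and slice sizes (800 and 200) are taken from the hunk.

```python
SEP = "\n...\n"  # separator used between excerpt slices in the hunk


def sample_excerpt(text, edge=800, mid_half=200):
    """Return head + middle + tail for long text, head + tail for medium, else all."""
    if len(text) > 3000:
        mid = len(text) // 2
        middle = text[mid - mid_half:mid + mid_half]
        return text[:edge] + SEP + middle + SEP + text[-edge:]
    if len(text) > 2 * edge:
        return text[:edge] + SEP + text[-edge:]
    return text


short = sample_excerpt("a" * 1000)   # returned unchanged
medium = sample_excerpt("b" * 2000)  # head and tail only
long_ = sample_excerpt("c" * 4000)   # head, middle window, and tail
```

Texts between 1601 and 3000 characters get no middle window, so a mid-chapter continuity error in a chapter of that length is still invisible to the consistency check.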
--- a/story/eval_logger.py
+++ b/story/eval_logger.py
@@ -0,0 +1,473 @@
+"""eval_logger.py — Per-chapter evaluation log and HTML report generator.
+
+Writes a structured eval_log.json to the book folder during writing, then
+generates a self-contained HTML report that can be downloaded and shared with
+critics / prompt engineers to analyse quality patterns across a run.
+"""
+
+import json
+import os
+import time
+from core import utils
+
+
+# ---------------------------------------------------------------------------
+# Log writer
+# ---------------------------------------------------------------------------
+
+def append_eval_entry(folder, entry):
+    """Append one chapter's evaluation record to eval_log.json.
+
+    Called from story/writer.py at every return point in write_chapter().
+    Each entry captures the chapter metadata, polish decision, per-attempt
+    scores/critiques/decisions, and the final accepted score.
+    """
+    log_path = os.path.join(folder, "eval_log.json")
+    data = []
+    if os.path.exists(log_path):
+        try:
+            with open(log_path, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+            if not isinstance(data, list):
+                data = []
+        except Exception:
+            data = []
+    data.append(entry)
+    try:
+        with open(log_path, 'w', encoding='utf-8') as f:
+            json.dump(data, f, indent=2)
+    except Exception as e:
+        utils.log("EVAL", f"Failed to write eval log: {e}")
+
+
+# ---------------------------------------------------------------------------
+# Report generation
+# ---------------------------------------------------------------------------
+
+def generate_html_report(folder, bp=None):
+    """Generate a self-contained HTML evaluation report from eval_log.json.
+
+    Returns the HTML string, or None if no log file exists / is empty.
+    """
+    log_path = os.path.join(folder, "eval_log.json")
+    if not os.path.exists(log_path):
+        return None
+    try:
+        with open(log_path, 'r', encoding='utf-8') as f:
+            chapters = json.load(f)
+    except Exception:
+        return None
+
+    if not isinstance(chapters, list) or not chapters:
+        return None
+
+    title, genre = "Unknown Book", "Fiction"
+    if bp:
+        meta = bp.get('book_metadata', {})
+        title = meta.get('title', title)
+        genre = meta.get('genre', genre)
+
+    # --- Summary stats ---
+    scores = [c.get('final_score', 0) for c in chapters if isinstance(c.get('final_score'), (int, float)) and c.get('final_score', 0) > 0]
+    avg_score = round(sum(scores) / len(scores), 2) if scores else 0
+    total = len(chapters)
+    auto_accepted   = sum(1 for c in chapters if c.get('final_decision') == 'auto_accepted')
+    multi_attempt   = sum(1 for c in chapters if len(c.get('attempts', [])) > 1)
+    full_rewrites   = sum(1 for c in chapters for a in c.get('attempts', []) if a.get('decision') == 'full_rewrite')
+    below_threshold = sum(1 for c in chapters if c.get('final_decision') == 'below_threshold')
+    polish_applied  = sum(1 for c in chapters if c.get('polish_applied'))
+
+    score_dist = {i: 0 for i in range(1, 11)}
+    for c in chapters:
+        s = c.get('final_score', 0)
+        if isinstance(s, (int, float)) and 1 <= s <= 10:
+            score_dist[int(s)] += 1  # floor fractional scores: 7.5 counts in the 7 bucket
+
+    patterns = _mine_critique_patterns(chapters, total)
+    report_date = time.strftime('%Y-%m-%d %H:%M')
+    return _build_html(title, genre, report_date, chapters, avg_score, total,
+                       auto_accepted, multi_attempt, full_rewrites, below_threshold,
+                       polish_applied, score_dist, patterns)
+
+
+# ---------------------------------------------------------------------------
+# Pattern mining
+# ---------------------------------------------------------------------------
+
+def _mine_critique_patterns(chapters, total):
+    pattern_keywords = {
+        "Filter words (felt/saw/noticed)":    ["filter word", "filter", "felt ", "noticed ", "realized ", "saw the", "heard the"],
+        "Summary mode / telling":             ["summary mode", "summariz", "telling", "show don't tell", "show, don't tell", "instead of dramatiz"],
+        "Emotion labeling":                   ["emotion label", "told the reader", "labeling", "labelling", "she felt", "he felt", "was nervous", "was angry", "was sad"],
+        "Deep POV issues":                    ["deep pov", "deep point of view", "distant narration", "remove the reader", "external narration"],
+        "Pacing problems":                    ["pacing", "rushing", "too fast", "too slow", "dragging", "sagging", "abrupt"],
+        "Dialogue too on-the-nose":           ["on-the-nose", "on the nose", "subtext", "exposition dump", "characters explain"],
+        "Weak chapter hook / ending":         ["hook", "cliffhanger", "cut off abruptly", "anticlimax", "ending falls flat", "no tension"],
+        "Passive voice / weak verbs":         ["passive voice", "was [v", "were [v", "weak verb", "adverb"],
+        "AI-isms / clichés":                  ["ai-ism", "cliché", "tapestry", "palpable", "testament", "azure", "cerulean", "bustling"],
+        "Voice / tone inconsistency":         ["voice", "tone inconsist", "persona", "shift in tone", "register"],
+        "Missing sensory / atmosphere":       ["sensory", "grounding", "atmosphere", "immersiv", "white room"],
+    }
+    counts = {}
+    for pattern, keywords in pattern_keywords.items():
+        matching = []
+        for c in chapters:
+            critique_blob = " ".join(
+                a.get('critique', '').lower()
+                for a in c.get('attempts', [])
+            )
+            if any(kw.lower() in critique_blob for kw in keywords):
+                matching.append(c.get('chapter_num', '?'))
+        counts[pattern] = {'count': len(matching), 'chapters': matching}
+    return dict(sorted(counts.items(), key=lambda x: x[1]['count'], reverse=True))
+
+
+# ---------------------------------------------------------------------------
+# HTML builder
+# ---------------------------------------------------------------------------
+
+def _score_color(s):
+    try:
+        s = float(s)
+    except (TypeError, ValueError):
+        return '#6c757d'
+    if s >= 8:  return '#28a745'
+    if s >= 7:  return '#20c997'
+    if s >= 6:  return '#ffc107'
+    return '#dc3545'
+
+
+def _decision_badge(d):
+    MAP = {
+        'auto_accepted':          ('⚡ Auto-Accept',    '#28a745'),
+        'accepted':               ('✓ Accepted',        '#17a2b8'),
+        'accepted_at_max':        ('✓ Accepted',        '#17a2b8'),
+        'below_threshold':        ('⚠ Below Threshold', '#dc3545'),
+        'below_threshold_accepted': ('⚠ Below Threshold', '#dc3545'),
+        'full_rewrite':           ('🔄 Full Rewrite',   '#6f42c1'),
+        'full_rewrite_failed':    ('🔄✗ Rewrite Failed','#6f42c1'),
+        'refinement':             ('✏ Refined',         '#fd7e14'),
+        'refinement_failed':      ('✏✗ Refine Failed',  '#fd7e14'),
+        'eval_error':             ('⚠ Eval Error',      '#6c757d'),
+    }
+    label, color = MAP.get(d, (d or '?', '#6c757d'))
+    return f'<span style="background:{color};color:white;padding:2px 8px;border-radius:4px;font-size:0.78em">{label}</span>'
+
+
+def _safe_int_fmt(v):
+    try:
+        return f"{int(v):,}"
+    except (TypeError, ValueError):
+        return str(v) if v else '?'
+
+
+def _build_html(title, genre, report_date, chapters, avg_score, total,
+                auto_accepted, multi_attempt, full_rewrites, below_threshold,
+                polish_applied, score_dist, patterns):
+
+    avg_color = _score_color(avg_score)
+
+    # --- Score timeline ---
+    MAX_BAR = 260
+    timeline_rows = ''
+    for c in chapters:
+        s = c.get('final_score', 0)
+        color = _score_color(s)
+        width = max(2, int((s / 10) * MAX_BAR)) if s else 2
+        ch_num = c.get('chapter_num', '?')
+        ch_title = str(c.get('title', ''))[:35].replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
+        timeline_rows += (
+            f'<div style="display:flex;align-items:center;margin-bottom:4px;font-size:0.8em">'
+            f'<div style="width:45px;text-align:right;margin-right:8px;color:#888;flex-shrink:0">Ch {ch_num}</div>'
+            f'<div style="background:{color};height:16px;width:{width}px;border-radius:2px;flex-shrink:0"></div>'
+            f'<div style="margin-left:8px;color:#555">{s}/10 &mdash; {ch_title}</div>'
+            f'</div>'
+        )
+
+    # --- Score distribution ---
+    max_dist = max(score_dist.values()) if any(score_dist.values()) else 1
+    dist_rows = ''
+    for sv in range(10, 0, -1):
+        count = score_dist.get(sv, 0)
+        w = max(2, int((count / max_dist) * 200)) if count else 0
+        color = _score_color(sv)
+        dist_rows += (
+            f'<div style="display:flex;align-items:center;margin-bottom:4px;font-size:0.85em">'
+            f'<div style="width:28px;text-align:right;margin-right:8px;font-weight:bold;color:{color}">{sv}</div>'
+            f'<div style="background:{color};height:15px;width:{w}px;border-radius:2px;opacity:0.85"></div>'
+            f'<div style="margin-left:8px;color:#666">{count} ch{"apters" if count != 1 else "apter"}</div>'
+            f'</div>'
+        )
+
+    # --- Chapter rows ---
+    chapter_rows = ''
+    for c in chapters:
+        cid = c.get('chapter_num', 0)
+        ch_title  = str(c.get('title', '')).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
+        pov       = str(c.get('pov_character') or '—')
+        pace      = str(c.get('pacing') or '—')
+        target_w  = _safe_int_fmt(c.get('target_words'))
+        actual_w  = _safe_int_fmt(c.get('actual_words'))
+        pos       = c.get('chapter_position')
+        pos_pct   = f"{int(pos * 100)}%" if pos is not None else '—'
+        threshold = c.get('score_threshold', '?')
+        fw_dens   = c.get('filter_word_density', 0)
+        polish    = '✓' if c.get('polish_applied') else '✗'
+        polish_c  = '#28a745' if c.get('polish_applied') else '#aaa'
+        fs        = c.get('final_score', 0)
+        fd        = c.get('final_decision', '')
+        attempts  = c.get('attempts', [])
+        n_att     = len(attempts)
+        fs_color  = _score_color(fs)
+        fd_badge  = _decision_badge(fd)
+
+        # Attempt details: <div> blocks inside one cell (a <tr> nested in a <tr> is invalid HTML)
+        att_rows = ''
+        for att in attempts:
+            an    = att.get('n', '?')
+            ascr  = att.get('score', '?')
+            adec  = att.get('decision', '')
+            acrit = str(att.get('critique', 'No critique.')).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
+            ac    = _score_color(ascr)
+            abadge = _decision_badge(adec)
+            att_rows += (
+                f'<div style="background:#f6f8fa;border-bottom:1px solid #e8eaed">'
+                f'<div style="padding:12px 16px 12px 56px">'
+                f'<div style="margin-bottom:6px"><strong>Attempt {an}:</strong>'
+                f'<span style="font-size:1.1em;font-weight:bold;color:{ac};margin:0 8px">{ascr}/10</span>'
+                f'{abadge}</div>'
+                f'<div style="font-size:0.83em;color:#444;line-height:1.55;white-space:pre-wrap;'
+                f'background:#fff;padding:10px 12px;border-left:3px solid {ac};border-radius:2px;'
+                f'max-height:300px;overflow-y:auto">{acrit}</div>'
+                f'</div></div>'
+            )
+
+        chapter_rows += (
+            f'<tr class="chrow" onclick="toggle({cid})" style="cursor:pointer">'
+            f'<td style="font-weight:700;text-align:center">{cid}</td>'
+            f'<td>{ch_title}</td>'
+            f'<td style="color:#666;font-size:0.85em">{pov}</td>'
+            f'<td style="color:#666;font-size:0.85em">{pace}</td>'
+            f'<td style="text-align:right">{actual_w} <span style="color:#aaa">/{target_w}</span></td>'
+            f'<td style="text-align:center;color:#888">{pos_pct}</td>'
+            f'<td style="text-align:center">{threshold}</td>'
+            f'<td style="text-align:center;color:{polish_c}">{polish} <span style="color:#aaa;font-size:0.8em">{fw_dens:.3f}</span></td>'
+            f'<td style="text-align:center;font-weight:700;font-size:1.1em;color:{fs_color}">{fs}</td>'
+            f'<td style="text-align:center;color:#888">{n_att}&times;</td>'
+            f'<td>{fd_badge}</td>'
+            f'</tr>'
+            f'<tr id="d{cid}" class="detrow"><td colspan="11" style="padding:0;border-bottom:0">{att_rows}</td></tr>'
+        )
+
+    # --- Critique patterns ---
+    pat_rows = ''
+    for pattern, data in patterns.items():
+        count = data['count']
+        if count == 0:
+            continue
+        pct = int(count / total * 100) if total else 0
+        sev_color = '#dc3545' if pct >= 50 else '#fd7e14' if pct >= 30 else '#17a2b8'
+        chlist = ', '.join(f'Ch {x}' for x in data['chapters'][:10])
+        if len(data['chapters']) > 10:
+            chlist += f' (+{len(data["chapters"]) - 10} more)'
+        pat_rows += (
+            f'<tr>'
+            f'<td><strong>{pattern}</strong></td>'
+            f'<td style="text-align:center;color:{sev_color};font-weight:700">{count}/{total} ({pct}%)</td>'
+            f'<td style="color:#666;font-size:0.83em">{chlist}</td>'
+            f'</tr>'
+        )
+    if not pat_rows:
+        pat_rows = '<tr><td colspan="3" style="color:#666;text-align:center;padding:12px">No significant patterns detected.</td></tr>'
+
+    # --- Prompt tuning notes ---
+    notes = _generate_prompt_notes(chapters, avg_score, total, full_rewrites, below_threshold, patterns)
+    notes_html = ''.join(f'<li style="margin-bottom:8px;line-height:1.55">{n}</li>' for n in notes)
+
+    return f'''<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<title>Eval Report &mdash; {title}</title>
+<style>
+*{{box-sizing:border-box;margin:0;padding:0}}
+body{{font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,sans-serif;background:#f0f2f5;color:#333;padding:20px}}
+.wrap{{max-width:1280px;margin:0 auto}}
+header{{background:#1a1d23;color:#fff;padding:22px 28px;border-radius:10px;margin-bottom:22px}}
+header h1{{font-size:0.9em;color:#8b92a1;margin-bottom:4px;font-weight:500}}
+header h2{{font-size:1.9em;font-weight:700;margin-bottom:6px}}
+header p{{color:#8b92a1;font-size:0.88em}}
+.cards{{display:grid;grid-template-columns:repeat(auto-fit,minmax(130px,1fr));gap:12px;margin-bottom:20px}}
+.card{{background:#fff;border-radius:8px;padding:16px;text-align:center;box-shadow:0 1px 3px rgba(0,0,0,.08)}}
+.card .val{{font-size:2em;font-weight:700}}
+.card .lbl{{font-size:0.75em;color:#888;margin-top:4px;line-height:1.3}}
+.two-col{{display:grid;grid-template-columns:1fr 1fr;gap:16px;margin-bottom:16px}}
+section{{background:#fff;border-radius:8px;padding:20px;margin-bottom:16px;box-shadow:0 1px 3px rgba(0,0,0,.08)}}
+section h3{{font-size:1em;font-weight:700;border-bottom:2px solid #f0f0f0;padding-bottom:8px;margin-bottom:14px}}
+table{{width:100%;border-collapse:collapse;font-size:0.86em}}
+th{{background:#f7f8fa;padding:8px 10px;text-align:left;font-weight:600;color:#555;border-bottom:2px solid #e0e4ea;white-space:nowrap}}
+td{{padding:8px 10px;border-bottom:1px solid #f0f0f0;vertical-align:middle}}
+.chrow:hover{{background:#f7f8fa}}
+.detrow{{display:none}}
+.legend{{display:flex;gap:14px;flex-wrap:wrap;font-size:0.78em;color:#777;margin-bottom:10px}}
+.dot{{display:inline-block;width:11px;height:11px;border-radius:50%;vertical-align:middle;margin-right:3px}}
+ul.notes{{padding-left:20px}}
+@media(max-width:768px){{.two-col{{grid-template-columns:1fr}}}}
+</style>
+</head>
+<body>
+<div class="wrap">
+
+<header>
+  <h1>BookApp &mdash; Evaluation Report</h1>
+  <h2>{title}</h2>
+  <p>Genre: {genre}&nbsp;&nbsp;|&nbsp;&nbsp;Generated: {report_date}&nbsp;&nbsp;|&nbsp;&nbsp;{total} chapter{"s" if total != 1 else ""}</p>
+</header>
+
+<div class="cards">
+  <div class="card"><div class="val" style="color:{avg_color}">{avg_score}</div><div class="lbl">Avg Score /10</div></div>
+  <div class="card"><div class="val" style="color:#28a745">{auto_accepted}</div><div class="lbl">Auto-Accepted (8+)</div></div>
+  <div class="card"><div class="val" style="color:#17a2b8">{multi_attempt}</div><div class="lbl">Multi-Attempt</div></div>
+  <div class="card"><div class="val" style="color:#6f42c1">{full_rewrites}</div><div class="lbl">Full Rewrites</div></div>
+  <div class="card"><div class="val" style="color:#dc3545">{below_threshold}</div><div class="lbl">Below Threshold</div></div>
+  <div class="card"><div class="val" style="color:#fd7e14">{polish_applied}</div><div class="lbl">Polish Passes</div></div>
+</div>
+
+<div class="two-col">
+<section>
+  <h3>&#128202; Score Timeline</h3>
+  <div class="legend">
+    <span><span class="dot" style="background:#28a745"></span>8&ndash;10 Great</span>
+    <span><span class="dot" style="background:#20c997"></span>7&ndash;7.9 Good</span>
+    <span><span class="dot" style="background:#ffc107"></span>6&ndash;6.9 Passable</span>
+    <span><span class="dot" style="background:#dc3545"></span>&lt;6 Fail</span>
+  </div>
+  <div style="overflow-y:auto;max-height:420px;padding-right:4px">{timeline_rows}</div>
+</section>
+<section>
+  <h3>&#128200; Score Distribution</h3>
+  <div style="margin-top:8px">{dist_rows}</div>
+</section>
+</div>
+
+<section>
+  <h3>&#128203; Chapter Breakdown &nbsp;<small style="font-weight:400;color:#888">(click any row to expand critiques)</small></h3>
+  <div style="overflow-x:auto">
+  <table>
+    <thead><tr>
+      <th>#</th><th>Title</th><th>POV</th><th>Pacing</th>
+      <th style="text-align:right">Words</th>
+      <th style="text-align:center">Pos%</th>
+      <th style="text-align:center">Threshold</th>
+      <th style="text-align:center">Polish&nbsp;/&nbsp;FW</th>
+      <th style="text-align:center">Score</th>
+      <th style="text-align:center">Att.</th>
+      <th>Decision</th>
+    </tr></thead>
+    <tbody>{chapter_rows}</tbody>
+  </table>
+  </div>
+</section>
+
+<section>
+  <h3>&#128269; Critique Patterns &nbsp;<small style="font-weight:400;color:#888">Keyword frequency across all evaluation critiques &mdash; high % = prompt gap</small></h3>
+  <table>
+    <thead><tr><th>Issue Pattern</th><th style="text-align:center">Frequency</th><th>Affected Chapters</th></tr></thead>
+    <tbody>{pat_rows}</tbody>
+  </table>
+</section>
+
+<section>
+  <h3>&#128161; Prompt Tuning Observations</h3>
+  <ul class="notes">{notes_html}</ul>
+</section>
+
+</div>
+<script>
+function toggle(id){{
+  var r=document.getElementById('d'+id);
+  if(r) r.style.display=(r.style.display==='none'||r.style.display==='')?'table-row':'none';
+}}
+document.querySelectorAll('.detrow').forEach(function(r){{r.style.display='none';}});
+</script>
+</body>
+</html>'''
+
+
+# ---------------------------------------------------------------------------
+# Auto-observations for prompt tuning
+# ---------------------------------------------------------------------------
+
+def _generate_prompt_notes(chapters, avg_score, total, full_rewrites, below_threshold, patterns):
+    notes = []
+
+    # Overall score
+    if avg_score >= 8:
+        notes.append(f"&#9989; <strong>High average score ({avg_score}/10).</strong> The generation pipeline is performing well. Focus on the few outlier chapters below the threshold.")
+    elif avg_score >= 7:
+        notes.append(f"&#10003; <strong>Solid average score ({avg_score}/10).</strong> Minor prompt reinforcement should push this above 8. Focus on the most common critique pattern.")
+    elif avg_score >= 6:
+        notes.append(f"&#9888; <strong>Average score of {avg_score}/10 is below target.</strong> Strengthen the draft prompt's Deep POV mandate and filter-word removal rules.")
+    else:
+        notes.append(f"&#128680; <strong>Low average score ({avg_score}/10).</strong> The core writing prompt needs significant work &mdash; review the Deep POV mandate, genre mandates, and consider adding concrete negative examples.")
+
+    # Full rewrite rate
+    if total > 0:
+        rw_pct = int(full_rewrites / total * 100)
+        if rw_pct > 30:
+            notes.append(f"&#128260; <strong>High full-rewrite rate ({rw_pct}%, {full_rewrites} triggers).</strong> The initial draft prompt produces too many sub-6 drafts. Add stronger examples or tighten the DEEP_POV_MANDATE and PROSE_RULES sections.")
+        elif rw_pct > 15:
+            notes.append(f"&#8617; <strong>Moderate full-rewrite rate ({rw_pct}%, {full_rewrites} triggers).</strong> The draft quality could be improved. Check the genre mandates for the types of chapters that rewrite most often.")
+
+        # Below threshold
+        if below_threshold > 0:
+            bt_pct = int(below_threshold / total * 100)
+            notes.append(f"&#9888; <strong>{below_threshold} chapter{'s' if below_threshold != 1 else ''} ({bt_pct}%) finished below the quality threshold.</strong> Inspect the individual critiques to see if these cluster by POV, pacing, or story position.")
+
+    # Top critique patterns
+    for pattern, data in list(patterns.items())[:5]:
+        pct = int(data['count'] / total * 100) if total else 0
+        if pct >= 50:
+            notes.append(f"&#128308; <strong>'{pattern}' appears in {pct}% of critiques.</strong> This is systemic &mdash; the current prompt does not prevent it. Add an explicit enforcement instruction with a concrete example of the wrong pattern and the correct alternative.")
+        elif pct >= 30:
+            notes.append(f"&#128993; <strong>'{pattern}' mentioned in {pct}% of critiques.</strong> Consider reinforcing the relevant prompt instruction with a stronger negative example.")
+
+    # Climax vs. early chapter comparison
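+    high_scores = [c['final_score'] for c in chapters if isinstance(c.get('chapter_position'), (int, float)) and c['chapter_position'] >= 0.75 and isinstance(c.get('final_score'), (int, float)) and c['final_score'] > 0]
+    low_scores  = [c['final_score'] for c in chapters if isinstance(c.get('chapter_position'), (int, float)) and c['chapter_position'] < 0.25 and isinstance(c.get('final_score'), (int, float)) and c['final_score'] > 0]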
+    if high_scores and low_scores:
+        avg_climax = round(sum(high_scores) / len(high_scores), 1)
+        avg_early  = round(sum(low_scores)  / len(low_scores),  1)
+        if avg_climax < avg_early - 0.5:
+            notes.append(f"&#128197; <strong>Climax chapters average {avg_climax}/10 vs early chapters {avg_early}/10.</strong> The high-stakes scenes underperform. Strengthen the genre mandates for climax pacing and consider adding specific instructions for emotional payoff.")
+        elif avg_climax > avg_early + 0.5:
+            notes.append(f"&#128197; <strong>Climax chapters outperform early chapters ({avg_climax} vs {avg_early}).</strong> Good &mdash; the adaptive threshold and extra attempts are concentrating quality where it matters.")
+
+    # POV character analysis
+    pov_scores = {}
+    for c in chapters:
+        pov = c.get('pov_character') or 'Unknown'
+        s = c.get('final_score', 0)
+        if s > 0:
+            pov_scores.setdefault(pov, []).append(s)
+    for pov, sc in sorted(pov_scores.items(), key=lambda x: sum(x[1]) / len(x[1])):
+        if len(sc) >= 2 and sum(sc) / len(sc) < 6.5:
+            avg_pov = round(sum(sc) / len(sc), 1)
+            notes.append(f"&#128100; <strong>POV '{pov}' averages {avg_pov}/10.</strong> Consider adding or strengthening a character voice profile for this character, or refining the persona bio to match how this POV character should speak and think.")
+
+    # Pacing analysis
+    pace_scores = {}
+    for c in chapters:
+        pace = c.get('pacing') or 'Standard'
+        s = c.get('final_score', 0)
+        if s > 0:
+            pace_scores.setdefault(pace, []).append(s)
+    for pace, sc in pace_scores.items():
+        if len(sc) >= 3 and sum(sc) / len(sc) < 6.5:
+            avg_p = round(sum(sc) / len(sc), 1)
+            notes.append(f"&#9193; <strong>'{pace}' pacing chapters average {avg_p}/10.</strong> The writing model struggles with this rhythm. Revisit the PACING_GUIDE instructions for '{pace}' chapters &mdash; they may need more concrete direction.")
+
+    if not notes:
+        notes.append("No significant patterns detected. Review the individual chapter critiques for targeted improvements.")
+    return notes
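
Review note on the critique-pattern table in eval_logger.py: the mining step is a plain substring search over each chapter's concatenated critiques, so a chapter counts once per pattern no matter how many attempts mention it. The standalone sketch below reproduces that idea on made-up data. `mine_patterns`, `PATTERN_KEYWORDS`, and `sample_log` are hypothetical names for illustration only, the keyword table is a trimmed-down stand-in for the real one, and the `None` guard on critiques is an assumption that the committed code does not make.

```python
# Hypothetical, trimmed-down keyword table (illustration only).
PATTERN_KEYWORDS = {
    "Pacing problems": ["pacing", "too fast", "too slow"],
    "Emotion labeling": ["emotion label", "she felt", "he felt"],
}


def mine_patterns(chapters, pattern_keywords=PATTERN_KEYWORDS):
    """Count chapters whose combined critiques contain any keyword of a pattern."""
    counts = {}
    for pattern, keywords in pattern_keywords.items():
        matching = []
        for chapter in chapters:
            # Join every attempt's critique into one lowercase blob per chapter.
            # Assumption: a missing or None critique is treated as empty text.
            blob = " ".join(
                str(attempt.get("critique") or "").lower()
                for attempt in chapter.get("attempts", [])
            )
            if any(keyword in blob for keyword in keywords):
                matching.append(chapter.get("chapter_num", "?"))
        counts[pattern] = {"count": len(matching), "chapters": matching}
    # Most frequent pattern first, matching the order of the report table.
    return dict(sorted(counts.items(), key=lambda kv: kv[1]["count"], reverse=True))


# Made-up log entries shaped like eval_log.json records.
sample_log = [
    {"chapter_num": 1, "attempts": [{"critique": "Pacing drags; 'she felt sad' is emotion labeling."}]},
    {"chapter_num": 2, "attempts": [{"critique": "Strong hook, but the middle is too slow."}]},
    {"chapter_num": 3, "attempts": [{"critique": None}]},
]

result = mine_patterns(sample_log)
for name, info in result.items():
    print(name, info["count"], info["chapters"])
```

Because matching is by substring, broad keywords can also fire on praise (for example a critique that compliments the pacing), so the percentages are probably best read as leads for manual review rather than as measurements.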
--- a/story/style_persona.py
+++ b/story/style_persona.py
@@ -121,7 +121,7 @@ def validate_persona(bp, persona_details, folder):

    sample_prompt = f"""
    ROLE: Fiction Writer
-    TASK: Write a 200-word opening scene that perfectly demonstrates this author's voice.
+    TASK: Write a 400-word opening scene that perfectly demonstrates this author's voice.

    AUTHOR_PERSONA:
    Name: {name}
@@ -131,7 +131,7 @@ def validate_persona(bp, persona_details, folder):
    TONE: {tone}

    RULES:
-    - Exactly ~200 words of prose (no chapter header, no commentary)
+    - Approximately 400 words of prose (no chapter header, no commentary)
    - Must reflect the persona's stated sentence structure, vocabulary, and voice
    - Show, don't tell — no filter words (felt, saw, heard, realized, noticed)
    - Deep POV: immerse the reader in a character's immediate experience
@@ -184,11 +184,42 @@ def validate_persona(bp, persona_details, folder):
        return True, 7


-def refine_persona(bp, text, folder):
+def refine_persona(bp, text, folder, pov_character=None):
    utils.log("SYSTEM", "Refining Author Persona based on recent chapters...")
    ad = bp.get('book_metadata', {}).get('author_details', {})
-    current_bio = ad.get('bio', 'Standard style.')

+    # If a POV character is given and has a voice_profile, refine that instead
+    if pov_character:
+        for char in bp.get('characters', []):
+            if char.get('name') == pov_character and char.get('voice_profile'):
+                vp = char['voice_profile']
+                current_bio = vp.get('bio', 'Standard style.')
+                prompt = f"""
+    ROLE: Literary Stylist
+    TASK: Refine a POV character's voice profile based on the text sample.
+
+    INPUT_DATA:
+    - TEXT_SAMPLE: {text[:3000]}
+    - CHARACTER: {pov_character}
+    - CURRENT_VOICE_BIO: {current_bio}
+
+    GOAL: Ensure future chapters for this POV character sound exactly like the sample. Highlight quirks, patterns, vocabulary specific to this character's perspective.
+
+    OUTPUT_FORMAT (JSON): {{ "bio": "Updated voice bio..." }}
+    """
+                try:
+                    response = ai_models.model_logic.generate_content(prompt)
+                    utils.log_usage(folder, ai_models.model_logic.name, response.usage_metadata)
+                    new_bio = json.loads(utils.clean_json(response.text)).get('bio')
+                    if new_bio:
+                        char['voice_profile']['bio'] = new_bio
+                        utils.log("SYSTEM", f"  -> Voice profile bio updated for '{pov_character}'.")
+                except Exception as e:
+                    utils.log("SYSTEM", f"  -> Voice profile refinement failed for '{pov_character}': {e}")
+                return ad  # Return author_details unchanged
+
+    # Default: refine the main author persona bio
+    current_bio = ad.get('bio', 'Standard style.')
    prompt = f"""
    ROLE: Literary Stylist
    TASK: Refine Author Bio based on text sample.
--- a/story/writer.py
+++ b/story/writer.py
@@ -1,9 +1,11 @@
 import json
 import os
+import time
 from core import config, utils
 from ai import models as ai_models
 from story.style_persona import get_style_guidelines
 from story.editor import evaluate_chapter_quality
+from story import eval_logger


 def get_genre_instructions(genre):
@@ -168,8 +170,19 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,

    pov_char = chap.get('pov_character', '')

-    # Use pre-loaded persona if provided (avoids re-reading sample files every chapter)
-    if prebuilt_persona is not None:
+    # Check for character-specific voice profile (Step 2: Character Voice Profiles)
+    character_voice = None
+    if pov_char:
+        for char in bp.get('characters', []):
+            if char.get('name') == pov_char and char.get('voice_profile'):
+                vp = char['voice_profile']
+                character_voice = f"Style/Bio: {vp.get('bio', '')}\nKeywords: {', '.join(vp.get('keywords', []))}"
+                utils.log("WRITER", f"  -> Using voice profile for POV character: {pov_char}")
+                break
+
+    if character_voice:
+        persona_info = character_voice
+    elif prebuilt_persona is not None:
        persona_info = prebuilt_persona
    else:
        persona_info = build_persona_info(bp) or "Standard, balanced writing style."
@@ -362,12 +375,18 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
        utils.log("WRITER", f"⚠️ Failed Ch {chap['chapter_number']}: {e}")
        return f"## Chapter {chap['chapter_number']} Failed\n\nError: {e}"

-    # Exp 7: Two-Pass Drafting — Polish the rough draft with the logic (Pro) model
-    # before evaluation. Produces cleaner prose with fewer rewrite cycles.
-    if current_text:
-        utils.log("WRITER", f"  -> Two-pass polish (Pro model)...")
-        guidelines = get_style_guidelines()
-        fw_list = '", "'.join(guidelines['filter_words'])
+    # Exp 7: Two-Pass Drafting — Polish rough draft with the logic (Pro) model before evaluation.
+    # Skip when local filter-word heuristic shows draft is already clean (saves ~8K tokens/chapter).
+    _guidelines_for_polish = get_style_guidelines()
+    _fw_set = {w.lower() for w in _guidelines_for_polish['filter_words']}
+    _draft_word_list = current_text.lower().split() if current_text else []
+    _fw_hit_count = sum(1 for w in _draft_word_list if w.strip('.,;:!?"\'()') in _fw_set)  # strip punctuation so "felt," still counts
+    _fw_density = _fw_hit_count / max(len(_draft_word_list), 1)
+    _skip_polish = _fw_density < 0.008  # < ~1 filter word per 125 words → draft already clean
+
+    if current_text and not _skip_polish:
+        utils.log("WRITER", f"  -> Two-pass polish (Pro model, FW density {_fw_density:.3f})...")
+        fw_list = '", "'.join(_guidelines_for_polish['filter_words'])
        polish_prompt = f"""
        ROLE: Senior Fiction Editor
        TASK: Polish this rough draft into publication-ready prose.
@@ -379,6 +398,9 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
        TARGET_WORDS: ~{est_words}
        BEATS (must all be covered): {json.dumps(chap.get('beats', []))}

+        CONTINUITY (maintain seamless flow from previous chapter):
+        {prev_context_block if prev_context_block else "First chapter — no prior context."}
+
        POLISH_CHECKLIST:
        1. FILTER_REMOVAL: Remove all filter words [{fw_list}] — rewrite each to show the sensation directly.
        2. DEEP_POV: Ensure the reader is inside the POV character's experience at all times — no external narration.
@@ -404,8 +426,14 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
                current_text = polished
        except Exception as e:
            utils.log("WRITER", f"  -> Polish pass failed: {e}. Proceeding with raw draft.")
+    elif current_text:
+        utils.log("WRITER", f"  -> Draft clean (FW density {_fw_density:.3f}). Skipping polish pass.")

-    # Reduced from 3 → 2 attempts since polish pass already refines prose before evaluation
+    # Adaptive attempts: climax/resolution chapters (position >= 0.75) get 3 passes;
+    # earlier chapters keep 2 (polish pass already refines prose before evaluation).
+    if chapter_position is not None and chapter_position >= 0.75:
+        max_attempts = 3
+    else:
        max_attempts = 2
    SCORE_AUTO_ACCEPT = 8
    # Adaptive passing threshold: lenient for early setup chapters, strict for climax/resolution.
@@ -417,6 +445,25 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
        SCORE_PASSING = 7
    SCORE_REWRITE_THRESHOLD = 6

+    # Evaluation log entry — written to eval_log.json for the HTML report.
+    _eval_entry = {
+        "ts":                  time.strftime('%Y-%m-%d %H:%M:%S'),
+        "chapter_num":         chap['chapter_number'],
+        "title":               chap.get('title', ''),
+        "pov_character":       chap.get('pov_character', ''),
+        "pacing":              pacing,
+        "target_words":        est_words,
+        "actual_words":        draft_words,
+        "chapter_position":    chapter_position,
+        "score_threshold":     SCORE_PASSING,
+        "score_auto_accept":   SCORE_AUTO_ACCEPT,
+        "polish_applied":      bool(current_text and not _skip_polish),
+        "filter_word_density": round(_fw_density, 4),
+        "attempts":            [],
+        "final_score":         0,
+        "final_decision":      "unknown",
+    }
+
    best_score = 0
    best_text = current_text
    past_critiques = []
@@ -426,16 +473,27 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
        score, critique = evaluate_chapter_quality(current_text, chap['title'], meta.get('genre', 'Fiction'), ai_models.model_logic, folder, series_context=series_block.strip())

        past_critiques.append(f"Attempt {attempt}: {critique}")
+        _att = {"n": attempt, "score": score, "critique": critique[:700], "decision": None}

        if "Evaluation error" in critique:
            utils.log("WRITER", f"     ⚠️ {critique}. Keeping current draft.")
            if best_score == 0: best_text = current_text
+            _att["decision"] = "eval_error"
+            _eval_entry["attempts"].append(_att)
+            _eval_entry["final_score"] = best_score
+            _eval_entry["final_decision"] = "eval_error"
+            eval_logger.append_eval_entry(folder, _eval_entry)
            break

        utils.log("WRITER", f"     Score: {score}/10. Critique: {critique}")

        if score >= SCORE_AUTO_ACCEPT:
            utils.log("WRITER", "     🌟 Auto-Accept threshold met.")
+            _att["decision"] = "auto_accepted"
+            _eval_entry["attempts"].append(_att)
+            _eval_entry["final_score"] = score
+            _eval_entry["final_decision"] = "auto_accepted"
+            eval_logger.append_eval_entry(folder, _eval_entry)
            return current_text

        if score > best_score:
@@ -445,9 +503,19 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
        if attempt == max_attempts:
            if best_score >= SCORE_PASSING:
                utils.log("WRITER", f"     ✅ Max attempts reached. Accepting best score ({best_score}).")
+                _att["decision"] = "accepted"
+                _eval_entry["attempts"].append(_att)
+                _eval_entry["final_score"] = best_score
+                _eval_entry["final_decision"] = "accepted"
+                eval_logger.append_eval_entry(folder, _eval_entry)
                return best_text
            else:
                utils.log("WRITER", f"     ⚠️ Quality low ({best_score}/{SCORE_PASSING}) but max attempts reached. Proceeding.")
+                _att["decision"] = "below_threshold"
+                _eval_entry["attempts"].append(_att)
+                _eval_entry["final_score"] = best_score
+                _eval_entry["final_decision"] = "below_threshold"
+                eval_logger.append_eval_entry(folder, _eval_entry)
                return best_text

        if score < SCORE_REWRITE_THRESHOLD:
@@ -469,10 +537,17 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
                utils.log_usage(folder, ai_models.model_logic.name, resp_rewrite.usage_metadata)
                current_text = resp_rewrite.text
                ai_models.model_logic.update(ai_models.logic_model_name)
+                _att["decision"] = "full_rewrite"
+                _eval_entry["attempts"].append(_att)
                continue
            except Exception as e:
                ai_models.model_logic.update(ai_models.logic_model_name)
                utils.log("WRITER", f"Full rewrite failed: {e}. Falling back to refinement.")
+                _att["decision"] = "full_rewrite_failed"
+                # fall through to refinement; decision will be overwritten below
+
+        else:
+            _att["decision"] = "refinement"

        utils.log("WRITER", f"  -> Refining Ch {chap['chapter_number']} based on feedback...")

@@ -527,8 +602,21 @@ def write_chapter(chap, bp, folder, prev_sum, tracking=None, prev_content=None,
            resp_refine = ai_models.model_writer.generate_content(refine_prompt)
            utils.log_usage(folder, ai_models.model_writer.name, resp_refine.usage_metadata)
            current_text = resp_refine.text
+            if _att["decision"] == "full_rewrite_failed":
+                _att["decision"] = "refinement"  # rewrite failed, fell back to refinement
+            _eval_entry["attempts"].append(_att)
        except Exception as e:
            utils.log("WRITER", f"Refinement failed: {e}")
+            _att["decision"] = "refinement_failed"
+            _eval_entry["attempts"].append(_att)
+            _eval_entry["final_score"] = best_score
+            _eval_entry["final_decision"] = "refinement_failed"
+            eval_logger.append_eval_entry(folder, _eval_entry)
            return best_text

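+    # Defensive fallback. The eval_error break above already logs its own entry and sets
+    # final_decision, so this fires only if the loop exits without any branch recording an outcome.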
+    if _eval_entry["final_decision"] == "unknown":
+        _eval_entry["final_score"] = best_score
+        _eval_entry["final_decision"] = "best_available"
+        eval_logger.append_eval_entry(folder, _eval_entry)
    return best_text
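
The same bookkeeping block (set the attempt's decision, append the attempt, set `final_score` and `final_decision`, call `append_eval_entry`) repeats at every exit point in the hunks above. A minimal sketch of how it could be folded into one helper follows. `finalize_eval` and `append_eval_entry_stub` are hypothetical names, and the stub only guesses at what `story/eval_logger.py` does (append to a JSON list in `eval_log.json`), since that file is not part of this excerpt.

```python
import json
import os


def append_eval_entry_stub(folder, entry):
    """Hypothetical stand-in for eval_logger.append_eval_entry.

    Assumes eval_log.json holds a JSON list with one object per chapter.
    """
    path = os.path.join(folder, "eval_log.json")
    log = []
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            log = json.load(fh)
    log.append(entry)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(log, fh, indent=2)


def finalize_eval(folder, eval_entry, attempt, decision, final_score,
                  writer=append_eval_entry_stub):
    """Record the last attempt and the chapter-level outcome, then persist."""
    attempt["decision"] = decision
    eval_entry["attempts"].append(attempt)
    eval_entry["final_score"] = final_score
    eval_entry["final_decision"] = decision
    writer(folder, eval_entry)
    return eval_entry
```

With a helper like this, the auto-accept branch would shrink to `finalize_eval(folder, _eval_entry, _att, "auto_accepted", score)` followed by `return current_text`, and a missed field at one exit point becomes harder to introduce.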
--- a/templates/run_details.html
+++ b/templates/run_details.html
@@ -208,6 +208,9 @@
                        <a href="{{ url_for('run.check_consistency', run_id=run.id, book_folder=book.folder) }}" class="btn btn-outline-warning ms-2">
                            <i class="fas fa-search me-2"></i>Check Consistency
                        </a>
+                        <a href="{{ url_for('run.eval_report', run_id=run.id, book_folder=book.folder) }}" class="btn btn-outline-info ms-2" title="Download evaluation report (scores, critiques, prompt tuning notes)">
+                            <i class="fas fa-chart-bar me-2"></i>Eval Report
+                        </a>
                        <button class="btn btn-warning ms-2" data-bs-toggle="modal" data-bs-target="#reviseBookModal{{ loop.index }}" title="Regenerate this book with changes, keeping others.">
                            <i class="fas fa-pencil-alt me-2"></i>Revise
                        </button>
--- a/web/routes/run.py
+++ b/web/routes/run.py
@@ -10,7 +10,7 @@ from core import utils
 from ai import models as ai_models
 from ai import setup as ai_setup
 from story import editor as story_editor
-from story import bible_tracker, style_persona
+from story import bible_tracker, style_persona, eval_logger as story_eval_logger
 from export import exporter
 from web.tasks import huey, regenerate_artifacts_task, rewrite_chapter_task

@@ -434,6 +434,45 @@ def delete_run(id):
    return redirect(url_for('project.view_project', id=project_id))


+@run_bp.route('/project/<int:run_id>/eval_report/<string:book_folder>')
+@login_required
+def eval_report(run_id, book_folder):
+    """Generate and download the self-contained HTML evaluation report."""
+    run = db.session.get(Run, run_id) or Run.query.get_or_404(run_id)
+    if run.project.user_id != current_user.id:
+        return "Unauthorized", 403
+
+    if not book_folder or "/" in book_folder or "\\" in book_folder or ".." in book_folder:
+        return "Invalid book folder", 400
+
+    run_dir  = os.path.join(run.project.folder_path, "runs", f"run_{run.id}")
+    book_path = os.path.join(run_dir, book_folder)
+
+    bp = utils.load_json(os.path.join(book_path, "final_blueprint.json")) or \
+         utils.load_json(os.path.join(book_path, "blueprint_initial.json"))
+
+    html = story_eval_logger.generate_html_report(book_path, bp)
+    if not html:
+        return (
+            "<html><body style='font-family:sans-serif;padding:40px'>"
+            "<h2>No evaluation data yet.</h2>"
+            "<p>The evaluation report is generated during the writing phase. "
+            "Start a generation run and the report will be available once chapters have been evaluated.</p>"
+            "</body></html>"
+        ), 200
+
+    from flask import Response
+    safe_title = utils.sanitize_filename(
+        (bp or {}).get('book_metadata', {}).get('title', book_folder) or book_folder
+    )[:40]
+    filename = f"eval_report_{safe_title}.html"
+    return Response(
+        html,
+        mimetype='text/html',
+        headers={'Content-Disposition': f'attachment; filename="{filename}"'}
+    )
+
+
@run_bp.route('/run/<int:id>/download_bible')
@login_required
 def download_bible(id):
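
The `eval_report` route above rejects `book_folder` values containing path separators or `..` and caps the sanitized title at 40 characters before building the download filename. The sketch below isolates those two steps so they can be checked without Flask. Both function names are hypothetical, and the regex is only an assumed approximation of `utils.sanitize_filename`, whose real behaviour is not shown in this diff.

```python
import re


def is_safe_folder_name(name):
    """Mirror the route's check: reject empty names, path separators, and '..'."""
    return bool(name) and "/" not in name and "\\" not in name and ".." not in name


def build_report_filename(title, fallback, max_len=40):
    """Build eval_report_<title>.html, falling back to the folder name.

    The character whitelist here is an assumption, not the project's sanitizer.
    """
    raw = title or fallback
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return f"eval_report_{safe[:max_len]}.html"
```

For example, `build_report_filename("My Book: Part 1", "Book_1")` yields `eval_report_My_Book_Part_1.html`, and a missing title falls back to the folder name.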
Author	SHA1	Message	Date
Mike Wichers	f869700070	feat: Add evaluation report pipeline for prompt tuning feedback Adds a full per-chapter evaluation logging system that captures every score, critique, and quality decision made during writing, then renders a self-contained HTML report shareable with critics or prompt engineers. New file — story/eval_logger.py: - append_eval_entry(folder, entry): writes per-chapter eval data to eval_log.json in the book folder (called from write_chapter() at every return point). - generate_html_report(folder, bp): reads eval_log.json and produces a self-contained HTML file (no external deps) with: • Summary cards (avg score, auto-accepted, rewrites, below-threshold) • Score timeline bar chart (one bar per chapter, colour-coded) • Score distribution histogram • Chapter breakdown table with expand-on-click critique details (attempt number, score, decision badge, full critique text) • Critique pattern frequency table (keyword mining across all critiques) • Auto-generated prompt tuning observations (systemic issues, POV character weak spots, pacing type analysis, climax vs. early chapter comparison) story/writer.py: - Imports time and eval_logger. - Initialises _eval_entry dict (chapter metadata + polish flags + thresholds) after all threshold variables are set. - Records each evaluation attempt's score, critique (truncated to 700 chars), and decision (auto_accepted / full_rewrite / refinement / accepted / below_threshold / eval_error / refinement_failed) before every return. web/routes/run.py: - Imports story_eval_logger. - New route GET /project/<run_id>/eval_report/<book_folder>: loads eval_log.json, calls generate_html_report(), returns the HTML as a downloadable attachment named eval_report_<title>.html. Returns a user-friendly "not yet available" page if no log exists. templates/run_details.html: - Adds "Eval Report" (btn-outline-info) button next to "Check Consistency" in each book's artifact section. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 08:03:32 -05:00
Mike Wichers	d2c65f010a	feat: Improve revision pipeline quality — 6 targeted enhancements (v3.1) 1. editor.py — Fix rewrite_chapter_content to use model_writer (was model_logic). Chapter rewrites now use the creative writing model, not the cheaper analysis model. 2. editor.py — evaluate_chapter_quality now uses keep_head=True so the evaluator sees the chapter opening (engagement hook, sensory anchoring) as well as the ending; long chapters no longer scored on tail only. 3. editor.py — Consistency analysis sampling upgraded to head+middle+tail (was head+tail), giving the LLM a complete view of each chapter's events. 4. writer.py — max_attempts is now adaptive: climax/resolution chapters (position >= 0.75) receive 3 refinement attempts; others keep 2. 5. writer.py — Polish-skip threshold tightened from 0.012 to 0.008 (1 filter word per 125 words vs. 1 per 83 words), so more borderline drafts are cleaned. 6. style_persona.py — Persona validation sample increased from 200 to 400 words for more reliable voice quality assessment. Version bumped: 3.0 → 3.1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 07:51:31 -05:00
Mike Wichers	dc39930da4	feat: Implement ai_blueprint.md Steps 1 & 2 — bible-tracking merge and character voice profiles Step 1 (Bible-Tracking Merge): - Added merge_tracking_to_bible() to story/bible_tracker.py — merges character tracking state and lore back into bible dict after each chapter, making blueprint_initial.json the single persistent source of truth. - Integrated in cli/engine.py after each chapter's update_tracking + update_lore_index calls so the persisted bible is always up-to-date. Step 2 (Character-Specific Voice Profiles): - story/writer.py: write_chapter now checks bp['characters'] for a voice_profile on the POV character before falling back to the prebuilt_persona cache. - story/style_persona.py: refine_persona() accepts pov_character=None; when a POV character with a voice_profile is supplied it refines that profile's bio instead of the global author_details bio. - cli/engine.py: refine_persona call now passes ch.get('pov_character') so per-chapter persona refinement targets the correct voice. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 22:45:54 -05:00
Mike Wichers	ff5093a5f9	fix: Pipeline hardening — error handling, token efficiency, and robustness core/utils.py: - estimate_tokens: improved heuristic 4 chars/token → 3.5 chars/token (more accurate) - truncate_to_tokens: added keep_head=True mode for head+tail truncation (better context retention for story summaries that need both opening and recent content) - load_json: explicit exception handling (json.JSONDecodeError, OSError) with log instead of silent returns; added utf-8 encoding with error replacement - log_image_attempt: replaced bare except with (json.JSONDecodeError, OSError); added utf-8 encoding to output write - log_usage: replaced bare except with AttributeError for token count extraction story/bible_tracker.py: - merge_selected_changes: wrapped all int() key casts (char idx, book num, beat idx) in try/except with meaningful log warning instead of crashing on malformed keys - harvest_metadata: replaced bare except:pass with except Exception as e + log message cli/engine.py: - Persona validation: added warning when all 3 attempts fail and substandard persona is accepted — flags elevated voice-drift risk for the run - Lore index updates: throttled from every chapter to every 3 chapters; lore is stable after the first few chapters (~10% token saving per book) - Mid-gen consistency check: now samples first 2 + last 8 chapters instead of passing full manuscript — caps token cost regardless of book length story/writer.py: - Two-pass polish: added local filter-word density check (no API call); skips the Pro polish if density < 1 per 83 words — saves ~8K tokens on already-clean drafts - Polish prompt: added prev_context_block for continuity — polished chapter now maintains seamless flow from the previous chapter's ending marketing/fonts.py: - Separated requests.exceptions.Timeout with specific log message vs generic failure - Added explicit log message when Roboto fallback also fails (returns None) marketing/blurb.py: - Added word count trim: blurbs > 220 words trimmed to last sentence within 220 words - Changed bare except to except Exception as e with log message - Added utf-8 encoding to file writes; logs final word count Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 22:31:22 -05:00
Mike Wichers	3a42d1a339	feat: Rebuild cover pipeline with full evaluate→critique→refine→retry quality gates Major changes to marketing/cover.py: - Split evaluate_image_quality() into two purpose-built functions: * evaluate_cover_art(): 5-rubric scoring (visual impact, genre fit, composition, quality, clean image) with auto-fail for visible text (score capped at 4) and deductions for deformed anatomy * evaluate_cover_layout(): 5-rubric scoring (legibility, typography, placement, professional polish, genre signal) with auto-fail for illegible title (capped at 4) - Added validate_art_prompt(): pre-validates the Imagen prompt before generation — strips accidental text instructions, ensures focal point + rule-of-thirds + genre fit - Added _build_visual_context(): extracts protagonist/antagonist descriptions and key themes from tracking data into structured visual context for the art director prompt - Score thresholds raised to match chapter pipeline: ART_PASSING=7, ART_AUTO_ACCEPT=8, LAYOUT_PASSING=7 (was: art>=5 or >0, layout breaks only at ==10) - Critique-driven art prompt refinement between attempts: full LLM rewrite of the Imagen prompt using the evaluator's actionable feedback (not just keyword appending) - Layout loop now breaks early at score>=7 (was: only at ==10, so never) - Design prompt strengthened with explicit character/visual context and NO TEXT clause Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 22:24:27 -05:00
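
The v3.1 commit message in the history above describes two numeric rules in `writer.py`: chapters at position >= 0.75 get 3 refinement attempts (others 2), and the polish pass is skipped when filter-word density is below 0.008. A rough sketch of both rules follows. The function names, the way position is computed (chapter number divided by total chapters), and the `FILTER_WORDS` set are all assumptions made for illustration; the actual code is not shown in this excerpt.

```python
# Hypothetical filter-word list; the project's real list is not shown here.
FILTER_WORDS = {"felt", "saw", "heard", "noticed", "realized"}


def max_attempts_for(chapter_number, total_chapters):
    """Give climax/resolution chapters (last quarter of the book) one extra attempt."""
    position = chapter_number / total_chapters  # assumed definition of "position"
    return 3 if position >= 0.75 else 2


def should_skip_polish(text, threshold=0.008):
    """Skip the polish pass when filter words are rarer than the threshold.

    0.008 corresponds to one filter word per 125 words.
    """
    words = text.lower().split()
    if not words:
        return True  # nothing to polish
    hits = sum(1 for w in words if w.strip(".,;:!?\"'") in FILTER_WORDS)
    return hits / len(words) < threshold
```

Under these assumptions, chapter 15 of 20 sits exactly at 0.75 and receives 3 attempts, while a three-word sentence containing one filter word is far above the threshold and would be polished.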