Auto-commit: v2.12 — Fix frontend stuck on Initializing/Waiting for logs

- web/tasks.py: db_log_callback now writes non-OperationalError exceptions to data/app.log for visibility
- web/tasks.py: generate_book_task restructured with try...finally to guarantee final status update; run can never be left in 'running' state if worker crashes
- templates/project.html: added .catch() to fetchLog() with console.error + polling resume on failure; added manual Refresh button to status bar
- templates/run_details.html: improved .catch() in updateLog() with descriptive message + 5s retry; added manual Refresh button to status bar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 18:40:28 -05:00
parent 87f24d2bd8
commit 4e39e18dfe
4 changed files with 94 additions and 191 deletions
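For readers skimming the commit message, the first bullet (retry on database locks, but record every other exception in a log file) can be sketched in isolation as below. This is a minimal illustration, not the project's code: the function name `write_log_entry`, the `log_entry` table, its columns, and the `fallback_log` parameter are all hypothetical.

```python
import sqlite3
import time
from contextlib import closing


def write_log_entry(db_path, run_id, phase, msg, fallback_log, retries=5):
    """Insert one log row; retry on lock errors, record anything else in a file.

    The table and column names here are assumptions made for illustration.
    """
    for _ in range(retries):
        try:
            # closing() releases the connection; the inner `conn` context
            # commits on success and rolls back on error.
            with closing(sqlite3.connect(db_path, timeout=30)) as conn, conn:
                conn.execute(
                    "INSERT INTO log_entry (run_id, phase, message) VALUES (?, ?, ?)",
                    (run_id, phase, msg),
                )
            return True
        except sqlite3.OperationalError:
            # Typically "database is locked": back off briefly, then retry.
            time.sleep(0.1)
        except Exception as exc:
            # Any other failure is written to a file so it is not lost.
            try:
                with open(fallback_log, "a", encoding="utf-8") as f:
                    f.write(f"[log ERROR run={run_id}] {type(exc).__name__}: {exc}\n")
            except OSError:
                pass  # Logging must never crash the worker.
            return False
    return False
```

The function returns `False` both when retries are exhausted and when an unexpected exception occurs, so a caller can decide whether to keep going without log persistence.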
@@ -1,175 +1,52 @@
-# AI Context Optimization Blueprint (v2.11)
+# AI Context Optimization Blueprint (v2.12 — implemented 2026-02-21)
 This blueprint outlines architectural improvements for how AI context is managed during the writing process. The goal is to provide the AI (Claude/Gemini) with **better, highly-targeted context upfront**, which will dramatically improve first-draft quality and reduce the reliance on expensive, time-consuming quality checks and rewrites (currently up to 5 attempts).
-## 0. Model Selection & Review (New Step)
+## Bug: Frontend Stuck on "Initializing/Waiting for logs" ✅ FIXED in v2.12
-**Current Process:**
+**Symptom:**
-Model selection logic exists in `ai/setup.py` (which determines optimal models based on API queries and fallbacks to defaults like `gemini-2.0-flash`), and the models are instantiated in `ai/models.py`. The active selection is cached in `data/model_cache.json` and viewed via `templates/system_status.html`.
+A book generation task is started from the web UI. Docker logs show the process is running correctly and making progress through different phases (ARCHITECT, WRITER, etc.). However, the frontend UI remains stuck on the "Initializing..." or "Waiting for logs..." state and never shows the live log feed or progress updates.
-**Actionable Review Steps:**
+**Root Cause Analysis:**
-Every time a change is made to this blueprint or related files, the following steps must be completed to review the models, update the version, and ensure changes are saved properly:
+This is a state-synchronization issue between the backend task worker and the frontend UI. The UI polls a backend endpoint (`/run/<run_id>/status`) to get the latest status and log content. When this bug occurs, one of several underlying systems has failed:
-1.  **Check the System Status UI**: Navigate to `/system/status` in the web application. This UI displays the "AI Model Selection" and "All Models Ranked".
+1.  **Log Persistence:** The backend worker fails to write its log messages to the shared database (`LogEntry` table) or fallback file (`web_console.log`).
-2.  **Verify Cache (`data/model_cache.json`)**: Check this file to see the currently cached models for the roles (`logic`, `writer`, `artist`).
+2.  **Database Connection/Session:** The web server process that responds to the `/run/.../status` poll has a stale view of the database or cannot connect.
-3.  **Review Selection Logic (`ai/setup.py`)**: Examine `select_best_models()` to understand the criteria and prompt used for model selection (e.g., favoring `gemini-2.x` over `1.5`, using Flash for speed and Pro for complex reasoning).
+3.  **Frontend Polling:** The javascript on the page stops polling for updates.
 4.  **Force Refresh**: Use the "Refresh & Optimize" button in the System Status UI or call `ai.init_models(force=True)` to force a re-evaluation of available models from the Google API and update the cache.
 5.  **Update Version & Commit**: Ensure the `ai_blueprint.md` version is bumped and a git commit is made reflecting the changes.
-## 1. Context Trimming & Relevance Filtering (The "Less is More" Approach)
+Based on previous fixes (v2.9, v2.11), this is a recurring problem with multiple potential failure points.
-**Current Problem:**
+### Areas to Investigate & Verify (Piece-by-Piece)
 `story/writer.py` injects the *entire* list of characters (`chars_for_writer`) into the prompt for every chapter. As the book grows, this wastes tokens, dilutes the AI's attention, and causes hallucinations where random characters appear in scenes they don't belong in.
-**Solution:**
+This section provides a checklist for a developer to debug this specific issue.
 - **Dynamic Character Injection:** ✅ Only inject characters who are explicitly mentioned in the chapter's `scene_beats`, plus the POV character. *(Implemented v1.5.0)*
 - **RAG for Lore/Locations:** ✅ Lightweight retrieval system implemented — chapter beats tagged with `locations`/`key_items`, lore index built via `update_lore_index` in `bible_tracker.py`, only relevant entries injected per chapter. *(Implemented v2.5 — see Section 8)*
-## 2. Structured "Story So Far" (State Management)
+#### 1. Backend Worker -> Database Communication (`web/tasks.py`)
-**Current Problem:**
+*   **`db_log_callback` Function:** This is the primary function responsible for writing log entries to the database.
-`prev_sum` is likely a growing narrative blob. `prev_content` is truncated blindly to 2000 tokens, which might chop off the actual ending of the previous chapter (the most important part for continuity).
+    *   **Action:** Verify that the `try...except` block is not catching and silencing critical errors. The `except OperationalError` is expected for handling database locks, but other exceptions should be logged loudly.
    *   **Action:** Ensure the database session is correctly handled within the callback. Is it possible the session becomes invalid over the long duration of the task?
 *   **`generate_book_task` Final Status Update:**
    *   **Action:** At the very end of the `generate_book_task`, there are final `run.status = 'completed'` and `db.session.commit()` calls. Verify these are wrapped in a `finally` block to guarantee they execute even if the main task logic fails. If the task fails, the status should be explicitly set to `'failed'`.
-**Solution:**
+#### 2. Web Server -> Database Communication (`web/routes/run.py`)
 - **Smart Truncation:** ✅ Instead of truncating `prev_content` blindly, take the *last* 1000 tokens of the previous chapter, ensuring the immediate hand-off (where characters are standing, what they just said) is perfectly preserved. *(Implemented v1.5.0 via `utils.truncate_to_tokens` tail logic)*
 - **Thread Tracking:** ✅ `Story So Far` refactored into structured `story_state.json` via `story/state.py` — `active_threads`, `immediate_handoff` (3 sentences), and `resolved_threads`; injected as structured prompt context in `engine.py`, replacing the raw summary blob. *(Implemented v2.5 — see Section 9)*
-## 3. Pre-Flight Scene Expansion (Fixing it before writing)
+*   **`/run/<run_id>/status` Endpoint:** This is the endpoint the frontend polls.
    *   **Action:** Review the logic for fetching `LogEntry` records. The v2.11 fix added `db.session.expire_all()` to prevent stale reads. Confirm this is still in place and effective.
    *   **Action:** Examine the fallback logic. If the database query returns no logs, it tries to read from `web_console.log`. Is this file path correctly constructed? Does the worker have permissions to write to it? Is the web server looking in the right place (`runs/run_<id>/...`)?
-**Current Problem:**
+#### 3. Frontend -> Web Server Communication (`templates/project.html` & `templates/run_details.html`)
 The system relies heavily on `evaluate_chapter_quality` to catch bad pacing, missing beats, or "tell not show" errors. This causes loops of rewriting.
-**Solution:**
+*   **Javascript Polling Logic:**
- **Beat Expansion Step:** ✅ Before sending the prompt to the `model_writer`, use an inexpensive, fast model to expand the `scene_beats` into a "Director's Treatment." This treatment explicitly outlines the sensory details, emotional shifts, and entry/exit staging for the chapter. *(Implemented v2.0 — `expand_beats_to_treatment` in `story/writer.py`)*
+    *   **Action:** Check the `fetchLog()` or `updateLog()` javascript function. Is there any condition that would cause `setTimeout` to stop being called? (e.g., an unexpected javascript error in the response parsing).
    *   **Action:** Add more robust error handling to the `fetch()` call's `.catch()` block. Log errors to the browser's developer console so they are visible.
    *   **Action:** Verify that the `initialStatus` logic correctly identifies when a page is loaded for an already-running task and starts polling immediately.
-## 4. Enhanced Bible Tracker (Stateful World)
+### Proposed Fixes & Implementation Plan
-**Current Problem:**
+1.  **Strengthen `db_log_callback`:**
-`bible_tracker.py` updates character clothing, descriptors, and speech styles, but does not track location states, time of day, or inventory/items.
+    *   In `web/tasks.py`, modify the `db_log_callback` to explicitly log any non-`OperationalError` exceptions to the main application log file (`data/app.log`) before breaking the loop. This will give us visibility into why it might be failing silently.
-
+2.  **Guarantee Final Status Update:**
-**Solution:**
+    *   In `web/tasks.py`, wrap the main logic of `generate_book_task` in a `try...finally` block. The `finally` block will be responsible for setting the final run status (`completed` or `failed`) and committing the change. This ensures the run is never left in a `running` state if the worker crashes.
- ✅ Expanded `update_tracking` to include `current_location`, `time_of_day`, and `held_items`. *(Implemented v1.5.0)*
+3.  **Improve Frontend Error Visibility:**
- ✅ This explicit "Scene State" is passed to the writer prompt so the AI doesn't have to guess if it's day or night, or if a character is still holding a specific artifact from two chapters ago. *(Implemented v1.5.0)*
+    *   In the javascript of `templates/project.html` and `templates/run_details.html`, add `console.error("Polling failed:", err);` to the `.catch()` block of the status-polling `fetch()` call. This makes it immediately obvious if the frontend is experiencing network or parsing errors.
-
+4.  **Add a "Force Refresh" Button (UI Enhancement):**
-## 5. UI/UX: Asynchronous Model Optimization (Refresh & Optimize)
+    *   Add a small "Refresh Status" button next to the status message in the UI. This button will manually trigger the `fetchLog()`/`updateLog()` function, providing a manual override if the automatic polling fails for any reason.
 **Current Problem:**
 Clicking "Refresh & Optimize" in `templates/system_status.html` submits a form that blocks the UI and results in a full page refresh. This creates a clunky, blocking experience.
 **Solution:**
 - ✅ **Frontend (`templates/system_status.html`):** Converted the `<form>` submission into an asynchronous AJAX `fetch()` call with a spinner and disabled button state during processing. *(Implemented v2.2)*
 - ✅ **Backend (`web/routes/admin.py`):** Updated the `optimize_models` route to detect AJAX requests and return a JSON status response instead of performing a hard redirect. *(Implemented v2.2)*
 ## 6. Eliminating AI-Isms and Enforcing Genre Authenticity (v2.3)
 **Current Problem:**
 Despite the existing `style_guidelines.json` and basic prompts, the AI writing often falls back on predictable phrases ("testament to," "shiver down spine," "a sense of") and lacks true human-like voice, especially failing to deeply adapt to specific genre conventions.
 **Solution & Implementation Plan:**
 1. ✅ **Genre-Specific Instructions:** `story/writer.py` now calls `get_genre_instructions(genre)` to inject genre-tailored mandates (Thriller, Romance, Fantasy, Sci-Fi, Horror, Historical, General Fiction) into every draft prompt. *(Implemented v2.3)*
 2. ✅ **Deep POV Mandate:** The draft prompt in `story/writer.py` includes a `DEEP_POV_MANDATE` block that explicitly bans summary mode and all filter words, with concrete rewrite examples. *(Implemented v2.3)*
 3. ✅ **Prose Filter Enhancements:** The default `ai_isms` list in `story/style_persona.py` expanded from 12 to 33+ banned phrases. *(Implemented v2.3)*
 4. ✅ **Enforce Show, Don't Tell via Evaluation:** `story/editor.py` `evaluate_chapter_quality` now includes a `DEEP_POV_ENFORCEMENT` block with automatic fail conditions for filter word density and summary mode. *(Implemented v2.3)*
 ## 7. Regular Maintenance of AI-Isms (Continuous Improvement) — v2.4
 **Current Problem:**
 AI models evolve, and new overused phrases regularly emerge. The static list in `data/style_guidelines.json` will become outdated. The `refresh_style_guidelines()` function already exists in `story/style_persona.py` but has no UI or scheduled trigger.
 **Solution & Implementation Plan:**
 1. ✅ **Admin UI Trigger:** Added "Refresh Style Rules" button to `templates/system_status.html` using the same async AJAX spinner pattern as "Refresh & Optimize". *(Implemented v2.4)*
 2. ✅ **Backend Route:** Added `/admin/refresh-style-guidelines` route in `web/routes/admin.py` that calls `style_persona.refresh_style_guidelines(model_logic)` and returns JSON status with counts. *(Implemented v2.4)*
 3. ✅ **Logging:** Route logs the updated counts to `data/app.log` via `utils.log`. *(Implemented v2.4)*
 ## 8. Lore & Location Context Retrieval (RAG-Lite) — v2.5
 **Current Problem:**
 The remaining half of Section 1 — `prev_sum` and the `style_block` carry all world-building as a monolithic blob. Locations, artifacts, and lore details not relevant to the current chapter waste tokens and dilute the AI's focus, causing it to hallucinate setting details or ignore established world rules.
 **Solution & Implementation Plan:**
 1. ✅ **Tag Beats with Locations/Items:** Chapter schema supports optional `locations` and `key_items` arrays. `story/writer.py` reads these from the chapter dict. *(Implemented v2.5)*
 2. ✅ **Lore Index in Bible:** Added `update_lore_index(folder, chapter_text, current_lore)` to `story/bible_tracker.py`. Index is stored in `tracking_lore.json` and loaded into `tracking['lore']`. *(Implemented v2.5)*
 3. ✅ **Retrieval in `write_chapter`:** `story/writer.py` matches chapter `locations`/`key_items` against the lore index and injects a `LORE_CONTEXT` block into the prompt. *(Implemented v2.5)*
 4. ✅ **Fallback:** If chapter has no `locations`/`key_items` or lore index is empty, `lore_block` is empty and behaviour is unchanged. *(Implemented v2.5)*
 5. ✅ **Engine Wiring:** `cli/engine.py` loads `tracking_lore.json` on resume, calls `update_lore_index` after each chapter, and saves to `tracking_lore.json`. *(Implemented v2.5)*
 ## 9. Structured "Story So Far" — Thread Tracking — v2.5
 **Current Problem:**
 The remaining half of Section 2 — `prev_sum` is a growing unstructured narrative blob. As chapters accumulate, the AI receives an ever-longer wall of prose-summary as context, which dilutes attention, buries the most important recent state, and causes continuity drift.
 **Solution & Implementation Plan:**
 1. ✅ **Structured Summary Schema:** New `story/state.py` module. After each chapter, `update_story_state()` uses `model_logic` to extract and save `story_state.json` with `active_threads`, `immediate_handoff` (exactly 3 sentences), and `resolved_threads`. *(Implemented v2.5)*
 2. ✅ **Prompt Injection:** `cli/engine.py` calls `story_state.format_for_prompt(current_story_state, chapter_beats)` before each `write_chapter` call. The formatted string replaces `prev_sum` as the context. Falls back to the raw `summary` blob if no structured state exists yet. *(Implemented v2.5)*
 3. ✅ **State Update Step:** `cli/engine.py` calls `story_state.update_story_state()` after each chapter is written and accepted, saving `story_state.json` in the book folder. *(Implemented v2.5)*
 4. ✅ **Continuity Guard:** `format_for_prompt()` always places `IMMEDIATE STORY HANDOFF` first, followed by `ACTIVE PLOT THREADS`. Resolved threads are only included if referenced in the next chapter's beats. *(Implemented v2.5)*
 ## 10. Consistency Report Quick Fix (v2.6)
 **Current Problem:**
 The `templates/consistency_report.html` page displays issues found in the manuscript but does not provide a direct action to fix them. It only suggests using the "Read & Edit" or "Modify & Re-run" features.
 **Solution & Implementation Plan:**
 1. ✅ **Frontend Action:** Added "Redo Book" form to `templates/consistency_report.html` footer with a text input for the revision instruction and a confirmation prompt on submit. *(Implemented v2.6)*
 2. ✅ **Backend Route:** Added `/project/<run_id>/revise_book/<book_folder>` route in `web/routes/run.py`. Route creates a new `Run` record and queues `generate_book_task` with the user's instruction as `feedback` and `source_run_id` pointing to the original run. The existing bible refinement logic in `generate_book_task` applies the instruction to the bible before regenerating. *(Implemented v2.6)*
 ## 11. Series Continuity & Book Number Awareness (v2.7)
 **Current Problem:**
 The system generates books for a series, but the prompts in `story/planner.py` (specifically `enrich` and `plan_structure`) and the writing prompts do not explicitly pass the `series_metadata` (such as `is_series`, `series_title`, `book_number`, and `total_books`) to the LLM. The AI doesn't know if it's generating Book 1, Book 2, or Book 3, leading to inconsistent pacing and continuity across a series.
 **Solution & Implementation Plan:**
 1. ✅ **Planner Prompts Update:** Modified `enrich()` and `plan_structure()` in `story/planner.py` to extract `bp.get('series_metadata', {})` and inject a `SERIES_CONTEXT` block — "This is Book X of Y in the Z series" with position-aware guidance (Book 1 = establish, middle books = escalate, final book = resolve) — into the prompt when `is_series` is true. *(Implemented v2.7)*
 2. ✅ **Writer Prompts Update:** `story/writer.py` `write_chapter()` builds and injects the same `SERIES_CONTEXT` block into the chapter writing prompt and passes it as `series_context` to `evaluate_chapter_quality()` in `story/editor.py`. `editor.py` `evaluate_chapter_quality()` now accepts an optional `series_context` parameter and injects it into the evaluation METADATA so the editor scores arcs relative to the book's position in the series. *(Implemented v2.7)*
 ## 12. Infrastructure & UI Bug Fixes (v2.8)
 **Problems Found & Fixed:**
 ### A. API Timeout Hangs (Spinning Logs)
 The Gemini SDK had no timeout configured on any network call, causing threads to hang indefinitely:
 - `ai/models.py` `generate_content()` had no timeout → runs spun forever on API errors.
 - `ai/setup.py` all three `genai.list_models()` calls had no timeout → model init could hang.
 - `ai/models.py` retry handler called `init_models(force=True)` — a second network call during an existing failure, cascading the hang.
 **Fixes Applied:**
 1. ✅ `ai/models.py`: Added `_GENERATION_TIMEOUT = 180` class variable; all `generate_content()` calls now merge `request_options={"timeout": 180}`. Removed `init_models(force=True)` from retry handler. *(Implemented v2.8)*
 2. ✅ `ai/setup.py`: Added `_LIST_MODELS_TIMEOUT = {"timeout": 30}` passed to all three `genai.list_models()` call sites (`get_optimal_model`, `select_best_models`, `init_models`). *(Implemented v2.8)*
 ### B. Huey Consumer Never Started (Tasks Queued But Never Executed)
 `web/app.py` started the Huey background consumer inside `if __name__ == "__main__":`, which only runs when the script is executed directly. Under `flask run`, gunicorn, or any WSGI runner the block is never reached — tasks were queued in `queue.db` but never processed.
 3. ✅ `web/app.py`: Moved Huey consumer start to module level with a Werkzeug reloader guard (`WERKZEUG_RUN_MAIN`) and a `FLASK_TESTING` guard to prevent duplicate/test-time consumers. Consumer runs as a daemon thread. *(Implemented v2.8)*
 ### C. "Create New Book" Showing Nothing
 Three bugs combined to produce a blank page or silent failure when creating a new project:
 4. ✅ `templates/project_setup.html`: `{{ s.tropes|join(', ') }}` and `{{ s.formatting_rules|join(', ') }}` raised Jinja2 `UndefinedError` when AI analysis failed and the fallback dict lacked those keys → 500 blank page. Fixed to `{{ (s.tropes or [])|join(', ') }}`. *(Implemented v2.8)*
 5. ✅ `web/routes/project.py` (`project_setup_wizard`): When `model_logic` was `None`, the route silently redirected to the dashboard with a flash the user missed. Now renders the setup form with a complete default suggestions dict (all fields populated, lists as `[]`) and a visible `"warning"` flash so the user can fill in details manually. *(Implemented v2.8)*
 6. ✅ `web/routes/project.py` (`create_project_final`): `planner.enrich()` was called with the full project bible dict. `enrich()` reads `bp.get('manual_instruction')` from the top level (got `'A generic story'` fallback — the real concept was in `bible['books'][0]['manual_instruction']`), and wrote enriched data into a new `book_metadata` key instead of the bible's `books[0]`. Fixed to build a proper per-book blueprint, call enrich, and merge `characters`, `plot_beats`, and `structure_prompt` back into the correct bible locations. *(Implemented v2.8)*
 ### D. "Waiting for logs" / "Preparing environment" Background Task Hangs
 The UI gets stuck indefinitely because the background Huey worker thread hangs before emitting the first "Starting Job" log, or fails to connect to the database.
 **Places that impact this and their fixes:**
 1. ✅ **OAuth Browser Prompt in Background Thread**: `ai/setup.py` — Added `import threading`; the OAuth block now checks `threading.current_thread() is not threading.main_thread()`. If running headlessly, `run_local_server` is skipped, `creds` is set to `None`, and a clear warning is logged. Vertex AI falls back to ADC. Token is only written if `creds` is not `None`. *(Implemented v2.9)*
 2. ✅ **SQLite Database Locking Timeout**: `web/tasks.py` — All `sqlite3.connect()` calls now use `timeout=30, check_same_thread=False`. The initial status-update `OperationalError` is caught and logged via `utils.log` so it appears in the log file rather than silently disappearing. *(Implemented v2.9)*
 3. ✅ **Missing Initial Log File Creation**: `web/tasks.py` `generate_book_task` — The `initial_log` path is now `open(…, 'a')`-touched immediately after construction and before `utils.set_log_file()`, guaranteeing the file exists for UI polling even if the worker crashes on the very next line. *(Implemented v2.9)*
 ## Summary of Actionable Changes for Implementation Mode:
 1. ✅ Modify `writer.py` to filter `chars_for_writer` based on characters named in `beats`. *(Implemented in v1.5.0)*
 2. ✅ Modify `writer.py` `prev_content` logic to extract the *tail* of the chapter, not a blind slice. *(Implemented in v1.5.0 via `utils.truncate_to_tokens` tail logic)*
 3. ✅ Update `bible_tracker.py` to track time of day and location states. *(Implemented in v1.5.0)*
 4. ✅ Add a pre-processing function to expand chapter beats into staging directions before generating the prose draft. *(Implemented in v2.0 — `expand_beats_to_treatment` in `story/writer.py`)*
 5. ✅ **(v2.2)** Update "Refresh & Optimize" action in UI to be an async fetch call with a processing flag instead of a full page reload, and update `admin.py` to handle JSON responses.
 6. ✅ **(v2.3)** Updated writing prompts and evaluation rubrics across `story/writer.py`, `story/editor.py`, and `story/style_persona.py` to aggressively filter AI-isms, enforce Deep POV via a non-negotiable mandate, add genre-specific writing instructions, and fail chapters that rely on "telling" rather than "showing" via filter-word density checks in the evaluator.
 7. ✅ **(v2.4)** Add "Refresh Style Rules" button to `system_status.html` and `/admin/refresh-style-guidelines` route in `admin.py`. *(Implemented v2.4)*
 8. ✅ **(v2.5)** Lore & Location RAG-Lite: `update_lore_index` in `bible_tracker.py`, `tracking_lore.json`, lore retrieval in `writer.py`, wired in `engine.py`. *(Implemented v2.5)*
 9. ✅ **(v2.5)** Structured Story State (Thread Tracking): new `story/state.py`, `story_state.json`, structured prompt context replacing raw summary blob in `engine.py`. *(Implemented v2.5)*
 10. ✅ **(v2.6)** "Redo Book" form in `consistency_report.html` + `revise_book` route in `run.py` that creates a new run with the instruction applied as bible feedback. *(Implemented v2.6)*
 11. ✅ **(v2.7)** Series Continuity Fix: `series_metadata` (is_series, series_title, book_number, total_books) injected as `SERIES_CONTEXT` into `story/planner.py` (`enrich`, `plan_structure`), `story/writer.py` (`write_chapter`), and `story/editor.py` (`evaluate_chapter_quality`) prompts with position-aware guidance per book number. *(Implemented v2.7)*
 12. ✅ **(v2.8)** Infrastructure & UI Bug Fixes: API timeouts (180s generation, 30s list_models) in `ai/models.py` + `ai/setup.py`; Huey consumer moved to module level with reloader guard in `web/app.py`; Jinja2 `UndefinedError` fix for `tropes`/`formatting_rules` in `project_setup.html`; `project_setup_wizard` now renders form instead of silent redirect when models fail; `create_project_final` `enrich()` call fixed to use correct per-book blueprint structure. *(Implemented v2.8)*
 13. ✅ **(v2.9)** Background Task Hang Fixes: OAuth headless guard in `ai/setup.py` (skips `run_local_server` in non-main threads, logs warning, falls back to ADC); SQLite `timeout=30, check_same_thread=False` on all connections in `web/tasks.py`; initial log file touched immediately in `generate_book_task` so UI polling never sees an empty/missing file. *(Implemented v2.9)*
 14. ✅ **(v2.10)** Huey Consumer Startup Fix: `Consumer.__init__()` in Huey 2.6.0 does NOT accept a `loglevel` keyword argument — the previous call `Consumer(huey, workers=1, worker_type='thread', loglevel=20)` raised `TypeError` on every app start, silently killing the consumer. All tasks stayed `queued` forever, causing the "Preparing environment / Waiting for logs" hang. Fixed by removing `loglevel=20`; Huey logging now configured via `logging.basicConfig`. Consumer startup errors now written to `data/consumer_error.log` for diagnosis. Also removed emoji characters from `print()` calls in `core/config.py` that caused `UnicodeEncodeError` on Windows `cp1252` terminals. Updated `VERSION` to `2.9` in `config.py`. *(Implemented v2.10)*
 15. ✅ **(v2.11)** Live UI Log Feed Fix: The web UI status bar and console log were not updating even though the task was executing (ARCHITECT phase visible in Docker logs). Two root causes: (1) `db_log_callback` in `web/tasks.py` used a bare `except: break` that silently swallowed any non-OperationalError insertion failure — fixed to print `[db_log_callback ERROR]` to stdout with exception type and message. Also changed `datetime.utcnow()` → `datetime.utcnow().isoformat()` to ensure clean string storage. (2) `run_status` in `web/routes/run.py` only read `LogEntry` via SQLAlchemy ORM (potentially stale session) and its file fallback had no error visibility — fixed by: adding `db.session.expire_all()` at request start to force fresh DB reads; adding a raw sqlite3 bypass query that runs if ORM returns no rows; wrapping the file fallback in try/except that prints errors to stdout; adding a secondary check for `runs/run_{id}/web_console.log` (created after engine starts); encoding `utf-8, errors='replace'` on all file opens. *(Implemented v2.11)*
@@ -107,6 +107,9 @@
                    <div class="d-flex align-items-center mb-2">
                        <div class="spinner-border text-primary spinner-border-sm me-2" role="status"></div>
                        <strong class="text-primary" id="statusPhase">Initializing...</strong>
                        <button type="button" class="btn btn-sm btn-outline-secondary ms-auto py-0" onclick="fetchLog()" title="Manually refresh status">
                            <i class="fas fa-sync-alt"></i> Refresh
                        </button>
                    </div>
                    <h5 class="card-title mb-3" id="statusMessage">Preparing environment...</h5>
                    <div class="progress" style="height: 20px;">
@@ -610,6 +613,11 @@
                        window.location.reload();
                    }
                }
            })
            .catch(err => {
                console.error("Polling failed:", err);
                // Resume polling so the UI doesn't silently stop updating
                if (!activeInterval) activeInterval = setInterval(fetchLog, 2000);
            });
    }
@@ -100,9 +100,14 @@
 <!-- Status Bar -->
 <div class="card shadow-sm mb-4">
    <div class="card-body">
-        <div class="d-flex justify-content-between mb-2">
+        <div class="d-flex justify-content-between align-items-center mb-2">
            <span class="fw-bold" id="status-text">Status: {{ run.status|title }}</span>
-            <span class="text-muted" id="run-duration">{{ run.duration() }}</span>
+            <div>
                <span class="text-muted me-2" id="run-duration">{{ run.duration() }}</span>
                <button type="button" class="btn btn-sm btn-outline-secondary py-0" onclick="updateLog()" title="Manually refresh status">
                    <i class="fas fa-sync-alt"></i> Refresh
                </button>
            </div>
        </div>
        <div class="progress" style="height: 20px;">
            <div id="status-bar" class="progress-bar {% if run.status == 'running' %}progress-bar-striped progress-bar-animated{% elif run.status == 'failed' %}bg-danger{% else %}bg-success{% endif %}" 
@@ -440,7 +445,10 @@
                    }
                }
            })
-            .catch(err => console.error(err));
+            .catch(err => {
                console.error("Polling failed:", err);
                setTimeout(updateLog, 5000);
            });
    }
    // Start polling
@@ -29,6 +29,14 @@ def db_log_callback(db_path, run_id, phase, msg):
            time.sleep(0.1)
        except Exception as _e:
            print(f"[db_log_callback ERROR run={run_id}] {type(_e).__name__}: {_e}", flush=True, file=_sys.stdout)
            try:
                import os as _os
                from core import config as _cfg
                _app_log = _os.path.join(_cfg.DATA_DIR, "app.log")
                with open(_app_log, 'a', encoding='utf-8') as _f:
                    _f.write(f"[db_log_callback ERROR run={run_id}] {type(_e).__name__}: {_e}\n")
            except Exception:
                pass
            break
 def db_progress_callback(db_path, run_id, percent):
@@ -92,6 +100,10 @@ def generate_book_task(run_id, project_path, bible_path, allow_copy=True, feedba
    utils.log("SYSTEM", f"Starting Job #{run_id}")
    status = "failed"  # Default to failed; overwritten to "completed" only on clean success
    total_cost = 0.0
    final_log_path = initial_log
    try:
        # 1.1 Handle Feedback / Modification (Re-run logic)
        if feedback and source_run_id:
@@ -190,38 +202,36 @@ def generate_book_task(run_id, project_path, bible_path, allow_copy=True, feedba
        _task_log(f"ERROR: Job failed — {type(e).__name__}: {e}")
        _task_log(_tb.format_exc())
        utils.log("ERROR", f"Job Failed: {e}")
-        status = "failed"
+        # status remains "failed" (set before try block)
-    # 3. Calculate Cost & Cleanup
+    finally:
-    run_dir = os.path.join(project_path, "runs", f"run_{run_id}")
+        # 3. Calculate Cost & Cleanup — guaranteed to run even if worker crashes
        run_dir = os.path.join(project_path, "runs", f"run_{run_id}")
-    total_cost = 0.0
+        if os.path.exists(run_dir):
-    final_log_path = initial_log
+            final_log_path = os.path.join(run_dir, "web_console.log")
            if os.path.exists(initial_log):
                try:
                    os.rename(initial_log, final_log_path)
                except OSError:
                    shutil.copy2(initial_log, final_log_path)
                    os.remove(initial_log)
-    if os.path.exists(run_dir):
+            for item in os.listdir(run_dir):
-        final_log_path = os.path.join(run_dir, "web_console.log")
+                item_path = os.path.join(run_dir, item)
-        if os.path.exists(initial_log):
+                if os.path.isdir(item_path) and item.startswith("Book_"):
-            try:
+                    usage_path = os.path.join(item_path, "usage_log.json")
-                os.rename(initial_log, final_log_path)
+                    if os.path.exists(usage_path):
-            except OSError:
+                        data = utils.load_json(usage_path)
-                shutil.copy2(initial_log, final_log_path)
+                        total_cost += data.get('totals', {}).get('est_cost_usd', 0.0)
                os.remove(initial_log)
-        for item in os.listdir(run_dir):
+        # 4. Update Database with Final Status — run is never left in 'running' state
-            item_path = os.path.join(run_dir, item)
+        try:
-            if os.path.isdir(item_path) and item.startswith("Book_"):
+            with sqlite3.connect(db_path, timeout=30, check_same_thread=False) as conn:
-                usage_path = os.path.join(item_path, "usage_log.json")
+                conn.execute("UPDATE run SET status = ?, cost = ?, end_time = ?, log_file = ?, progress = 100 WHERE id = ?",
-                if os.path.exists(usage_path):
+                             (status, total_cost, datetime.utcnow(), final_log_path, run_id))
-                    data = utils.load_json(usage_path)
+        except Exception as e:
-                    total_cost += data.get('totals', {}).get('est_cost_usd', 0.0)
+            print(f"Failed to update run status in DB: {e}")
    # 4. Update Database with Final Status
    try:
        with sqlite3.connect(db_path, timeout=30, check_same_thread=False) as conn:
            conn.execute("UPDATE run SET status = ?, cost = ?, end_time = ?, log_file = ?, progress = 100 WHERE id = ?",
                         (status, total_cost, datetime.utcnow(), final_log_path, run_id))
    except Exception as e:
        print(f"Failed to update run status in DB: {e}")
    _task_log(f"Task finished. status={status} cost=${total_cost:.4f}")
    return {"run_id": run_id, "status": status, "cost": total_cost, "final_log": final_log_path}
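The `try...finally` restructuring in the last hunk can be hard to follow in diff form. A stripped-down sketch of the same pattern is shown here, assuming a `run` table with `id`, `status`, and `end_time` columns as in the `UPDATE` statement above. The wrapper name `run_with_final_status` and the `work` callable are hypothetical.

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timezone


def run_with_final_status(db_path, run_id, work):
    """Run `work()` and always record a terminal status for the run."""
    status = "failed"  # Default; overwritten only after a clean finish.
    try:
        work()
        status = "completed"
    except Exception as exc:
        # status stays "failed"; the error is reported, not re-raised.
        print(f"Job failed: {type(exc).__name__}: {exc}")
    finally:
        # Executes on success and on any exception raised by work().
        try:
            with closing(sqlite3.connect(db_path, timeout=30)) as conn, conn:
                conn.execute(
                    "UPDATE run SET status = ?, end_time = ? WHERE id = ?",
                    (status, datetime.now(timezone.utc).isoformat(), run_id),
                )
        except Exception as exc:
            print(f"Failed to update run status in DB: {exc}")
    return status
```

One caveat on the pattern itself: a `finally` block runs when an exception propagates, but not when the worker process is killed outright (for example by `SIGKILL` or a container stop), so a run could still remain in `running` in that case.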