Mike Wichers 4e39e18dfe Auto-commit: v2.12 — Fix frontend stuck on Initializing/Waiting for logs
- web/tasks.py: db_log_callback now writes non-OperationalError exceptions to data/app.log for visibility
- web/tasks.py: generate_book_task restructured with try...finally to guarantee final status update — run can never be left in 'running' state if worker crashes
- templates/project.html: added .catch() to fetchLog() with console.error + polling resume on failure; added manual Refresh button to status bar
- templates/run_details.html: improved .catch() in updateLog() with descriptive message + 5s retry; added manual Refresh button to status bar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 18:40:28 -05:00


# AI Context Optimization Blueprint (v2.12 — implemented 2026-02-21)
This blueprint outlines architectural improvements to how AI context is managed during the writing process. The goal is to give the AI (Claude/Gemini) **better, highly targeted context up front**, which should improve first-draft quality and reduce reliance on expensive, time-consuming quality checks and rewrites (currently up to 5 attempts).
## Bug: Frontend Stuck on "Initializing/Waiting for logs" ✅ FIXED in v2.12
**Symptom:**
A book generation task is started from the web UI. Docker logs show the process is running correctly and making progress through different phases (ARCHITECT, WRITER, etc.). However, the frontend UI remains stuck on the "Initializing..." or "Waiting for logs..." state and never shows the live log feed or progress updates.
**Root Cause Analysis:**
This is a state-synchronization issue between the backend task worker and the frontend UI. The UI polls a backend endpoint (`/run/<run_id>/status`) to get the latest status and log content. When this bug occurs, one of several underlying systems has failed:
1. **Log Persistence:** The backend worker fails to write its log messages to the shared database (`LogEntry` table) or fallback file (`web_console.log`).
2. **Database Connection/Session:** The web server process that responds to the `/run/.../status` poll has a stale view of the database or cannot connect.
3. **Frontend Polling:** The JavaScript on the page stops polling for updates.

Previous fixes (v2.9, v2.11) targeted this same symptom, so it is a recurring problem with multiple potential failure points.
### Areas to Investigate & Verify (Piece-by-Piece)
This section provides a checklist for a developer to debug this specific issue.
#### 1. Backend Worker -> Database Communication (`web/tasks.py`)
* **`db_log_callback` Function:** This is the primary function responsible for writing log entries to the database.
    * **Action:** Verify that the `try...except` block is not catching and silencing critical errors. The `except OperationalError` is expected for handling database locks, but other exceptions should be logged loudly.
    * **Action:** Ensure the database session is correctly handled within the callback. Could the session become invalid over the long duration of the task?
* **`generate_book_task` Final Status Update:**
    * **Action:** At the very end of `generate_book_task`, there are final `run.status = 'completed'` and `db.session.commit()` calls. Verify these are wrapped in a `finally` block so they execute even if the main task logic fails. If the task fails, the status should be explicitly set to `'failed'`.
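
A minimal sketch of the "retry on lock, report everything else" behavior described above, assuming the callback uses a small retry loop with backoff. It uses the standard-library `sqlite3` module in place of the app's SQLAlchemy session and `LogEntry` model, so `make_db_log_callback`, the `log_entry` table, and its columns are hypothetical; `sqlite3.OperationalError` stands in for SQLAlchemy's `OperationalError`.

```python
import logging
import sqlite3
import time

# In the app this logger would write to data/app.log (e.g. via a
# logging.FileHandler); that wiring is assumed and not shown here.
app_logger = logging.getLogger("bookapp")


def make_db_log_callback(conn, run_id, retries=3, delay=0.1):
    """Build a hypothetical log callback bound to one run.

    sqlite3 stands in for the SQLAlchemy session; the log_entry table and
    its columns are illustrative, not the real LogEntry schema.
    """

    def db_log_callback(phase, message):
        for attempt in range(retries):
            try:
                conn.execute(
                    "INSERT INTO log_entry (run_id, phase, message) VALUES (?, ?, ?)",
                    (run_id, phase, message),
                )
                conn.commit()
                return True
            except sqlite3.OperationalError:
                # Expected under lock contention ("database is locked"):
                # discard the failed transaction (session.rollback() in
                # SQLAlchemy), back off a little longer, then retry.
                conn.rollback()
                time.sleep(delay * (attempt + 1))
            except Exception:
                # Unexpected failure: record the traceback so it is not
                # silenced, then stop retrying.
                conn.rollback()
                app_logger.exception("db_log_callback failed for run %s", run_id)
                break
        return False

    return db_log_callback
```

Because the callback returns `False` instead of raising, a logging failure does not abort book generation itself; the logged traceback is what surfaces the problem.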
#### 2. Web Server -> Database Communication (`web/routes/run.py`)
* **`/run/<run_id>/status` Endpoint:** This is the endpoint the frontend polls.
    * **Action:** Review the logic for fetching `LogEntry` records. The v2.11 fix added `db.session.expire_all()` to prevent stale reads. Confirm this is still in place and effective.
    * **Action:** Examine the fallback logic. If the database query returns no logs, it tries to read from `web_console.log`. Is this file path correctly constructed? Does the worker have permissions to write to it? Is the web server looking in the right place (`runs/run_<id>/...`)?
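
A sketch of the read path, again with `sqlite3` standing in for the SQLAlchemy query; `read_run_logs`, the `log_entry` table, and the `runs_root` parameter are hypothetical, and the file name inside `runs/run_<id>/` is assumed to be `web_console.log`. In the real endpoint, `db.session.expire_all()` would run before the query. The idea shown is that the fallback path is built from one configurable root, so the worker that writes `web_console.log` and the web server that reads it resolve the same location.

```python
import sqlite3
from pathlib import Path


def read_run_logs(conn, run_id, runs_root="runs"):
    """Return log text for a run: database first, then the fallback file."""
    rows = conn.execute(
        "SELECT message FROM log_entry WHERE run_id = ? ORDER BY rowid",
        (run_id,),
    ).fetchall()
    if rows:
        return "\n".join(row[0] for row in rows)

    # Fallback: <runs_root>/run_<id>/web_console.log. Worker and web server
    # should pass the same runs_root so they agree on this path.
    log_path = Path(runs_root) / f"run_{run_id}" / "web_console.log"
    try:
        return log_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Nothing written yet: the UI keeps showing "Waiting for logs...".
        return ""
```

Passing an absolute `runs_root`, rather than relying on each process's working directory, is one way to rule out the "looking in the wrong place" failure mode.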
#### 3. Frontend -> Web Server Communication (`templates/project.html` & `templates/run_details.html`)
* **JavaScript Polling Logic:**
    * **Action:** Check the `fetchLog()` or `updateLog()` JavaScript function. Could any condition stop `setTimeout` from being called again (for example, an unexpected JavaScript error while parsing the response)?
    * **Action:** Add more robust error handling to the `.catch()` block of the `fetch()` call, logging errors to the browser's developer console so they are visible.
    * **Action:** Verify that the `initialStatus` logic correctly detects when a page is loaded for an already-running task and starts polling immediately.
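
The polling code itself is JavaScript in the two templates; to keep the examples in this document in one language, the sketch below expresses the same scheduling rule in Python. `poll_status`, `fetch_status`, and the 2 s normal interval are illustrative assumptions (the 5 s error retry matches the v2.12 change to `updateLog()`). The rule: every outcome, including a network error or a malformed response, leads either to a terminal status or to another scheduled poll, so one bad response cannot silently end the loop.

```python
import logging
import time

log = logging.getLogger("poller")
TERMINAL_STATUSES = {"completed", "failed"}


def poll_status(fetch_status, interval=2.0, error_interval=5.0,
                max_polls=None, sleep=time.sleep):
    """Poll until the run reaches a terminal status (or max_polls is hit).

    Errors are reported and followed by another attempt, mirroring a JS
    .catch() that calls console.error and then schedules setTimeout again.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            # Network errors raise in fetch_status(); a malformed payload
            # raises KeyError on the lookup.
            status = fetch_status()["status"]
        except Exception as err:
            log.error("Polling failed: %s", err)
            sleep(error_interval)  # back off, but keep polling
            continue
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
    return None
```

Injecting `sleep` keeps the loop testable; a manual Refresh button corresponds to calling `fetch_status()` once, outside the timer.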
### Proposed Fixes & Implementation Plan
1. **Strengthen `db_log_callback`:**
   * In `web/tasks.py`, modify `db_log_callback` to explicitly log any non-`OperationalError` exception to the main application log file (`data/app.log`) before breaking out of the loop. This gives visibility into why it might be failing silently.
2. **Guarantee Final Status Update:**
   * In `web/tasks.py`, wrap the main logic of `generate_book_task` in a `try...finally` block. The `finally` block sets the final run status (`completed` or `failed`) and commits the change, so the run is never left in the `running` state if the worker crashes.
3. **Improve Frontend Error Visibility:**
   * In the JavaScript of `templates/project.html` and `templates/run_details.html`, add `console.error("Polling failed:", err);` to the `.catch()` block of the status-polling `fetch()` call. This makes network or parsing errors on the frontend immediately obvious.
4. **Add a "Force Refresh" Button (UI Enhancement):**
   * Add a small "Refresh Status" button next to the status message in the UI. The button manually triggers `fetchLog()`/`updateLog()`, providing an override if automatic polling fails for any reason.
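
A minimal sketch of fix 2, assuming the task can be reduced to a run object plus callables for the work and the commit; `Run`, `do_work`, and `commit` are hypothetical stand-ins for the real model, the ARCHITECT/WRITER pipeline, and `db.session.commit()`. The `finally` block runs on success, on ordinary exceptions, and on `BaseException`s such as `KeyboardInterrupt` or `SystemExit` that bypass `except Exception`, so the run ends as `completed` or `failed`.

```python
import logging

log = logging.getLogger("bookapp")


class Run:
    """Hypothetical stand-in for the Run model; only `status` matters here."""

    def __init__(self):
        self.status = "running"


def generate_book_task(run, do_work, commit):
    """Run the pipeline and always leave `run` in a terminal state."""
    succeeded = False
    try:
        do_work(run)  # the ARCHITECT, WRITER, ... phases would run here
        succeeded = True
    except Exception:
        log.exception("generate_book_task failed")
        raise  # let the task runner see the failure too
    finally:
        # Runs on success, on exceptions, and on interrupts alike.
        run.status = "completed" if succeeded else "failed"
        try:
            commit()
        except Exception:
            # Do not let a failed commit mask the original error.
            log.exception("Could not persist final run status")
```

One limit worth knowing: if the worker process is killed outright (for example by `SIGKILL` or the OOM killer), no Python code runs, so `finally` cannot update the status in that case.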