- web/tasks.py: db_log_callback now writes non-OperationalError exceptions to data/app.log for visibility
- web/tasks.py: generate_book_task restructured with try...finally to guarantee final status update; run can never be left in 'running' state if worker crashes
- templates/project.html: added .catch() to fetchLog() with console.error + polling resume on failure; added manual Refresh button to status bar
- templates/run_details.html: improved .catch() in updateLog() with descriptive message + 5s retry; added manual Refresh button to status bar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI Context Optimization Blueprint (v2.12 — implemented 2026-02-21)
This blueprint outlines architectural improvements for how AI context is managed during the writing process. The goal is to provide the AI (Claude/Gemini) with better, highly-targeted context upfront, which will dramatically improve first-draft quality and reduce the reliance on expensive, time-consuming quality checks and rewrites (currently up to 5 attempts).
Bug: Frontend Stuck on "Initializing/Waiting for logs" ✅ FIXED in v2.12
Symptom: A book generation task is started from the web UI. Docker logs show the process is running correctly and making progress through different phases (ARCHITECT, WRITER, etc.). However, the frontend UI remains stuck on the "Initializing..." or "Waiting for logs..." state and never shows the live log feed or progress updates.
Root Cause Analysis:
This is a state-synchronization issue between the backend task worker and the frontend UI. The UI polls a backend endpoint (/run/<run_id>/status) to get the latest status and log content. When this bug occurs, one of several underlying systems has failed:
- Log Persistence: The backend worker fails to write its log messages to the shared database (LogEntry table) or fallback file (web_console.log).
- Database Connection/Session: The web server process that responds to the /run/.../status poll has a stale view of the database or cannot connect.
- Frontend Polling: The JavaScript on the page stops polling for updates.
Based on previous fixes (v2.9, v2.11), this is a recurring problem with multiple potential failure points.
Areas to Investigate & Verify (Piece-by-Piece)
This section provides a checklist for a developer to debug this specific issue.
1. Backend Worker -> Database Communication (web/tasks.py)
- db_log_callback Function: This is the primary function responsible for writing log entries to the database.
  - Action: Verify that the try...except block is not catching and silencing critical errors. The except OperationalError is expected for handling database locks, but other exceptions should be logged loudly.
  - Action: Ensure the database session is correctly handled within the callback. Is it possible the session becomes invalid over the long duration of the task?
- generate_book_task Final Status Update:
  - Action: At the very end of generate_book_task, there are final run.status = 'completed' and db.session.commit() calls. Verify these are wrapped in a finally block to guarantee they execute even if the main task logic fails. If the task fails, the status should be explicitly set to 'failed'.
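The first check above can be sketched as follows. This is a minimal illustration and not the code in web/tasks.py: `make_db_log_callback`, `write_entry`, the retry count, and the logger name are hypothetical, and `sqlite3.OperationalError` stands in for whichever OperationalError class the project actually catches.

```python
import logging
import sqlite3
import time

# Hypothetical logger name; the blueprint sends these messages to data/app.log.
app_logger = logging.getLogger("app")


def make_db_log_callback(write_entry, retries=3, delay=0.0):
    """Wrap write_entry, a hypothetical function that persists one log message."""

    def db_log_callback(message):
        for _ in range(retries):
            try:
                write_entry(message)
                return True
            except sqlite3.OperationalError:
                # Expected under lock contention: wait briefly, then retry.
                time.sleep(delay)
            except Exception:
                # Unexpected failure: record the traceback, then stop retrying.
                app_logger.exception("db_log_callback failed for %r", message)
                break
        return False

    return db_log_callback
```

Returning a boolean is a design choice of this sketch. It lets the caller notice that a message was dropped without the exception reaching the long-running task.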
2. Web Server -> Database Communication (web/routes/run.py)
- /run/<run_id>/status Endpoint: This is the endpoint the frontend polls.
  - Action: Review the logic for fetching LogEntry records. The v2.11 fix added db.session.expire_all() to prevent stale reads. Confirm this is still in place and effective.
  - Action: Examine the fallback logic. If the database query returns no logs, it tries to read from web_console.log. Is this file path correctly constructed? Does the worker have permissions to write to it? Is the web server looking in the right place (runs/run_<id>/...)?
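One way to make the fallback questions above answerable from the status response itself is to report which source the log text came from. The sketch below is an assumption about how this could look, not the logic in web/routes/run.py: `load_run_log` and its return labels are hypothetical, and the caller supplies the fallback path because its exact construction is not specified here.

```python
from pathlib import Path


def load_run_log(db_lines, fallback_path):
    """Return (source, text) for a status response.

    db_lines: log messages already fetched from the database (may be empty).
    fallback_path: location of the worker's fallback log file, built by the caller.
    """
    if db_lines:
        return "database", "\n".join(db_lines)

    path = Path(fallback_path)
    try:
        return "file", path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Include the resolved path so a misconstructed location is visible.
        return "missing", f"fallback log not found: {path.resolve()}"
    except PermissionError:
        return "denied", f"fallback log not readable: {path.resolve()}"
```

With a source label in the response, a stuck "Waiting for logs..." page can be traced to an empty database, a wrong path, or a permissions problem without reading the server logs.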
3. Frontend -> Web Server Communication (templates/project.html & templates/run_details.html)
- JavaScript Polling Logic:
  - Action: Check the fetchLog() or updateLog() JavaScript function. Is there any condition that would cause setTimeout to stop being called (e.g., an unexpected JavaScript error in the response parsing)?
  - Action: Add more robust error handling to the fetch() call's .catch() block. Log errors to the browser's developer console so they are visible.
  - Action: Verify that the initialStatus logic correctly identifies when a page is loaded for an already-running task and starts polling immediately.
Proposed Fixes & Implementation Plan
- Strengthen db_log_callback:
  - In web/tasks.py, modify db_log_callback to explicitly log any non-OperationalError exceptions to the main application log file (data/app.log) before breaking the loop. This will give us visibility into why it might be failing silently.
- Guarantee Final Status Update:
  - In web/tasks.py, wrap the main logic of generate_book_task in a try...finally block. The finally block will be responsible for setting the final run status (completed or failed) and committing the change. This ensures the run is never left in a running state if the worker crashes.
- Improve Frontend Error Visibility:
  - In the JavaScript of templates/project.html and templates/run_details.html, add console.error("Polling failed:", err); to the .catch() block of the status-polling fetch() call. This makes it immediately obvious if the frontend is experiencing network or parsing errors.
- Add a "Force Refresh" Button (UI Enhancement):
  - Add a small "Refresh Status" button next to the status message in the UI. This button will manually trigger the fetchLog() / updateLog() function, providing a manual override if the automatic polling fails for any reason.
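The try...finally pattern proposed for generate_book_task can be sketched in isolation. The helper below is illustrative only: `run_with_final_status`, `work`, and `commit` are hypothetical names, and `run` is any object with a `status` attribute rather than the project's actual model.

```python
def run_with_final_status(run, work, commit):
    """Run work() and always persist a terminal status on run.

    run: object with a mutable status attribute (stand-in for the run record).
    work: callable containing the main task logic.
    commit: callable that persists the change, e.g. a session commit.
    """
    final_status = "failed"  # pessimistic default, overwritten only on success
    try:
        work()
        final_status = "completed"
    finally:
        # Executes on success and on any exception raised by work().
        run.status = final_status
        commit()
```

The exception from `work()` still propagates after the status is committed, so the caller can log it. Note that `finally` only runs when the interpreter unwinds normally. A worker process killed by the operating system would bypass it, so that case would need a separate safeguard.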