
Mike Wichers 4e39e18dfe Auto-commit: v2.12 — Fix frontend stuck on Initializing/Waiting for logs

- web/tasks.py: db_log_callback now writes non-OperationalError exceptions to data/app.log for visibility
- web/tasks.py: generate_book_task restructured with try...finally to guarantee final status update — run can never be left in 'running' state if worker crashes
- templates/project.html: added .catch() to fetchLog() with console.error + polling resume on failure; added manual Refresh button to status bar
- templates/run_details.html: improved .catch() in updateLog() with descriptive message + 5s retry; added manual Refresh button to status bar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-21 18:40:28 -05:00

5.2 KiB


AI Context Optimization Blueprint (v2.12 — implemented 2026-02-21)

This blueprint outlines architectural improvements to how AI context is managed during the writing process. The goal is to give the AI (Claude/Gemini) better, more targeted context upfront, which should improve first-draft quality and reduce reliance on expensive, time-consuming quality checks and rewrites (currently up to 5 attempts).

Bug: Frontend Stuck on "Initializing/Waiting for logs" ✅ FIXED in v2.12

Symptom: A book generation task is started from the web UI. Docker logs show the process is running correctly and making progress through different phases (ARCHITECT, WRITER, etc.). However, the frontend UI remains stuck on the "Initializing..." or "Waiting for logs..." state and never shows the live log feed or progress updates.

Root Cause Analysis: This is a state-synchronization issue between the backend task worker and the frontend UI. The UI polls a backend endpoint (/run/<run_id>/status) to get the latest status and log content. When this bug occurs, one of several underlying systems has failed:

Log Persistence: The backend worker fails to write its log messages to the shared database (LogEntry table) or fallback file (web_console.log).
Database Connection/Session: The web server process that responds to the /run/.../status poll has a stale view of the database or cannot connect.
Frontend Polling: The JavaScript on the page stops polling for updates.

Based on previous fixes (v2.9, v2.11), this is a recurring problem with multiple potential failure points.

Areas to Investigate & Verify (Piece-by-Piece)

This section provides a checklist for a developer to debug this specific issue.

1. Backend Worker -> Database Communication (`web/tasks.py`)

db_log_callback Function: This is the primary function responsible for writing log entries to the database.
- Action: Verify that the try...except block is not catching and silencing critical errors. The except OperationalError is expected for handling database locks, but other exceptions should be logged loudly.
- Action: Ensure the database session is correctly handled within the callback. Is it possible the session becomes invalid over the long duration of the task?
generate_book_task Final Status Update:
- Action: generate_book_task ends with the final run.status = 'completed' and db.session.commit() calls. Verify these are wrapped in a finally block so they execute even if the main task logic fails. If the task fails, the status should be explicitly set to 'failed'.
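
The db_log_callback checks above can be illustrated with a minimal sketch. It is not the project's actual code: write_log_with_retry, write_fn, the retry count, and the logger name "app" are hypothetical, and sqlite3.OperationalError stands in for whichever OperationalError the real callback catches.

```python
import logging
import sqlite3
import time

# Assumption: in the real app this logger would be configured to write to data/app.log.
app_logger = logging.getLogger("app")


def write_log_with_retry(write_fn, message, retries=3, delay=0.0):
    """Try to persist one log message.

    write_fn is a hypothetical callable that performs the insert and commit.
    Returns True on success, False if the message could not be written.
    """
    for _ in range(retries):
        try:
            write_fn(message)
            return True
        except sqlite3.OperationalError:
            # Expected under lock contention: wait briefly and retry.
            time.sleep(delay)
        except Exception:
            # Anything else is unexpected: record the traceback, then stop retrying.
            app_logger.exception("db_log_callback failed for message %r", message)
            return False
    app_logger.error("db_log_callback gave up after %d locked attempts", retries)
    return False
```

The point of the two except branches is that lock errors stay quiet and retryable, while every other exception leaves a traceback in the application log instead of disappearing.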

2. Web Server -> Database Communication (`web/routes/run.py`)

/run/<run_id>/status Endpoint: This is the endpoint the frontend polls.
- Action: Review the logic for fetching LogEntry records. The v2.11 fix added db.session.expire_all() to prevent stale reads. Confirm this is still in place and effective.
- Action: Examine the fallback logic. If the database query returns no logs, it tries to read from web_console.log. Is this file path correctly constructed? Does the worker have permissions to write to it? Is the web server looking in the right place (runs/run_<id>/...)?
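
The fallback-file questions above can be answered quickly from a shell in each container. The helper below is a hypothetical debugging aid, not part of the codebase; it takes the full path to web_console.log as an argument because the exact location under runs/run_<id>/... is not spelled out here.

```python
import os
from pathlib import Path


def diagnose_log_file(log_path):
    """Report whether a fallback log file is where the caller expects it.

    Run this once in the worker and once in the web server with the path each
    process constructs, then compare the results.
    """
    path = Path(log_path).resolve()
    exists = path.is_file()
    return {
        "resolved_path": str(path),
        "exists": exists,
        "readable": exists and os.access(path, os.R_OK),
        # The worker needs write access to the parent directory to create the file.
        "parent_writable": path.parent.is_dir() and os.access(path.parent, os.W_OK),
        "size_bytes": path.stat().st_size if exists else 0,
    }
```

If the two processes report different resolved_path values, the path construction is the likely culprit. If the paths match but exists or readable differ, look at volume mounts and file permissions.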

3. Frontend -> Web Server Communication (`templates/project.html` & `templates/run_details.html`)

JavaScript Polling Logic:
- Action: Check the fetchLog() or updateLog() JavaScript function. Is there any condition that would cause setTimeout to stop being called (for example, an unexpected JavaScript error while parsing the response)?
- Action: Add more robust error handling to the fetch() call's .catch() block. Log errors to the browser's developer console so they are visible.
- Action: Verify that the initialStatus logic correctly identifies when a page is loaded for an already-running task and starts polling immediately.

Proposed Fixes & Implementation Plan

Strengthen db_log_callback:
- In web/tasks.py, modify the db_log_callback to explicitly log any non-OperationalError exceptions to the main application log file (data/app.log) before breaking the loop. This will give us visibility into why it might be failing silently.
Guarantee Final Status Update:
- In web/tasks.py, wrap the main logic of generate_book_task in a try...finally block. The finally block will be responsible for setting the final run status (completed or failed) and committing the change. This ensures the run is not left in a running state when the task logic raises an exception. A hard process kill (for example SIGKILL) would still bypass the finally block.
Improve Frontend Error Visibility:
- In the JavaScript of templates/project.html and templates/run_details.html, add console.error("Polling failed:", err); to the .catch() block of the status-polling fetch() call. This makes it immediately obvious if the frontend is experiencing network or parsing errors.
Add a "Force Refresh" Button (UI Enhancement):
- Add a small "Refresh Status" button next to the status message in the UI. This button will manually trigger the fetchLog()/updateLog() function, providing a manual override if the automatic polling fails for any reason.
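
The "Guarantee Final Status Update" item in the plan above follows a common pattern, sketched below under stated assumptions. Run is a stand-in for the ORM model, and run_with_guaranteed_status, work, and commit are hypothetical names; in the real task, commit would presumably be db.session.commit.

```python
class Run:
    """Stand-in for the ORM model; only the status field matters here."""

    def __init__(self):
        self.status = "running"


def run_with_guaranteed_status(run, work, commit):
    """Execute work() and always leave run.status as 'completed' or 'failed'."""
    final_status = "failed"  # pessimistic default, overwritten only on success
    try:
        work()
        final_status = "completed"
    finally:
        # Runs whether work() returned or raised; the exception still propagates.
        run.status = final_status
        commit()
```

Starting from "failed" and flipping to "completed" only after work() returns means that no code path through the function can exit with the status still set to "running".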
