- Remove ai_blueprint.md from git tracking (already gitignored)
- web/app.py: Unify startup reset — all non-terminal states (running,
queued, interrupted) are reset to 'failed' with per-job logging
- web/routes/project.py: Add active_runs list to view_project() context
- templates/project.html: Add Active Jobs card showing all running/queued
jobs with status badge, start time, progress bar, and View Details link;
Generate button and Stop buttons now driven by active_runs list
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- web/db.py: Add last_heartbeat column to Run model
- core/utils.py: Add set_heartbeat_callback() and send_heartbeat()
- web/tasks.py: Add _robust_update_run_status() with 5-retry exponential backoff;
add db_heartbeat_callback(); remove all bare except:pass on DB status updates;
set start_time + last_heartbeat when marking run as 'running'
- web/app.py: Add last_heartbeat column migration; add _stale_job_watcher()
background thread (checks every 5 min, 15-min heartbeat threshold, 2-hr start_time threshold)
- cli/engine.py: Add phase-level logging banners and try/except wrappers in
process_book(); add utils.send_heartbeat() after each chapter save;
add start/finish logging in run_generation()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Backend (web/routes/run.py): Extended /run/<id>/status JSON response with
server_timestamp, db_log_count, and latest_log_timestamp so clients can
detect whether the DB is being written to independently of the log text.
- Frontend (templates/run_details.html):
• Added Live Status Panel above the System Log card, showing:
- Polling state badge (Initializing / Requesting / Waiting Ns / Error / Idle)
- Last Successful Update timestamp (HH:MM:SS, updated every successful poll)
- DB diagnostics (log count + latest log timestamp from server response)
- Last Error message displayed inline when a poll fails
- Force Refresh button to immediately trigger a new poll
• Refactored JS polling loop: countdown timer with clearCountdown/
startWaitCountdown helpers, forceRefresh() clears pending timers before
re-polling, explicit pollTimer/countdownInterval tracking.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- web/tasks.py: db_log_callback now writes non-OperationalError exceptions to data/app.log for visibility
- web/tasks.py: generate_book_task restructured with try...finally to guarantee final status update — run can never be left in 'running' state if worker crashes
- templates/project.html: added .catch() to fetchLog() with console.error + polling resume on failure; added manual Refresh button to status bar
- templates/run_details.html: improved .catch() in updateLog() with descriptive message + 5s retry; added manual Refresh button to status bar
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- web/tasks.py: db_log_callback bare `except: break` replaced with
explicit `except Exception as _e: print(...)` so insertion failures
are visible in Docker logs. Also fixed datetime.utcnow() → .isoformat()
for clean string storage in SQLite.
Same fix applied to db_progress_callback.
- web/routes/run.py (run_status): added db.session.expire_all() to
force fresh reads; raw sqlite3 bypass query when ORM returns no rows;
file fallback wrapped in try/except with stdout error reporting;
secondary check for web_console.log inside the run directory;
utf-8 encoding on all file opens.
- ai_blueprint.md: bumped to v2.11, documented root causes and fixes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docker-compose.yml:
- Add PYTHONIOENCODING=utf-8 env var (guarantees UTF-8 stdout in all
Python environments, including Docker slim images on ARM).
- Add logging driver section: json-file, max-size 10m, max-file 5.
Without this the json-file log on a Raspberry Pi SD card grows
unbounded and eventually kills the container or fills the disk.
web/requirements_web.txt:
- Pin huey==2.6.0 so a future pip upgrade cannot silently change the
Consumer() API and re-introduce the loglevel= TypeError that caused
all tasks to stay queued forever.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- web/app.py: Startup banner to docker logs (Python version, platform,
Huey version, DB paths). All print() calls now flush=True so Docker
captures them immediately. Emoji-free for robust stdout encoding.
Startup now detects orphaned queued runs (queue empty but DB queued)
and resets them to 'failed' so the UI does not stay stuck on reload.
Huey logging configured at INFO level so task pick-up/completion
appears in `docker logs`. Consumer skip reason logged explicitly.
- web/tasks.py: generate_book_task now emits [TASK run=N] lines to
stdout (docker logs) at pick-up, log-file creation, DB status update,
and on error (with full traceback) so failures are always visible.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: Consumer(huey, workers=1, worker_type='thread', loglevel=20)
raised TypeError on every app start because Huey 2.6.0 does not accept
a `loglevel` keyword argument. The exception was silently caught and only
printed to stdout, so the consumer never ran and all tasks stayed 'queued'
forever — causing the 'Preparing environment / Waiting for logs' hang.
Fixes:
- web/app.py: Remove invalid `loglevel=20` from Consumer(); configure
Huey logging via logging.basicConfig(WARNING) instead. Add persistent
error logging to data/consumer_error.log for future diagnosis.
- core/config.py: Replace emoji print() calls with ASCII-safe equivalents
to prevent UnicodeEncodeError on Windows cp1252 terminals at import time.
- core/config.py: Update VERSION to 2.9 (was stale at 1.5.0).
- ai_blueprint.md: Bump to v2.10, document root cause and fixes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ai/setup.py: Added threading import; OAuth block now detects background/headless
threads and skips run_local_server to prevent indefinite blocking. Logs a clear
warning and falls back to ADC for Vertex AI. Token file only written when creds
are not None.
- web/tasks.py: All sqlite3.connect() calls now use timeout=30, check_same_thread=False.
OperationalError on the initial status update is caught and logged via utils.log.
generate_book_task now touches initial_log immediately so the UI polling endpoint
always finds an existing file even if the worker crashes on the next line.
- ai_blueprint.md: Bumped to v2.9; Section 12.D sub-items 1-3 marked ✅; item 13
added to summary.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. templates/project_setup.html: s.tropes|join and s.formatting_rules|join
raised Jinja2 UndefinedError when AI failed and fallback dict lacked those
keys → 500 blank page. Fixed with (s.tropes or [])|join(', ').
2. web/routes/project.py (project_setup_wizard): Removed silent redirect-to-
dashboard when model_logic is None. Now renders the setup form with a
complete default suggestions dict (all fields present, lists as []) plus a
clear warning flash so the user can fill it in manually.
3. web/routes/project.py (create_project_final): planner.enrich() was called
with the full bible dict — enrich() reads manual_instruction from the top
level (got 'A generic story' fallback) and wrote results into book_metadata
instead of the bible's books[0]. Fixed to build a proper per-book blueprint,
call enrich, and merge characters/plot_beats back into the correct locations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root causes of indefinite spinning during book create/generate:
1. ai/models.py — ResilientModel.generate_content() had no timeout: a
stalled Gemini API call would block the thread forever. Now injects
request_options={"timeout": 180} into every call. Also removed the
dangerous init_models(force=True) call inside the retry handler, which
was making a second network call during an existing API failure.
2. ai/setup.py — genai.list_models() calls in get_optimal_model(),
select_best_models(), and init_models() had no timeout. Added
request_options={"timeout": 30} to all three calls so model init
fails fast rather than hanging indefinitely.
3. web/app.py — Huey task consumer only started inside
`if __name__ == "__main__":`, meaning tasks queued via flask run,
gunicorn, or other WSGI runners were never executed (status stuck at
"queued" forever). Moved consumer start to module level with a
WERKZEUG_RUN_MAIN guard to prevent double-start under the reloader.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removed `from huey.contrib.mini import MiniHuey` which caused
`ModuleNotFoundError: No module named 'gevent'` on startup. MiniHuey
was never used; the app correctly uses SqliteHuey via `web.tasks`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
web/app.py was hardcoded to port 7070, causing Docker port forwarding
(5000:5000) and the Dockerfile HEALTHCHECK to fail. Changed to port 5000
to match docker-compose.yml and Dockerfile configuration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>