Auto-commit: v2.14 — Stuck job robustness (heartbeat, retry, stale watcher, granular logging)

- web/db.py: Add last_heartbeat column to Run model
- core/utils.py: Add set_heartbeat_callback() and send_heartbeat()
- web/tasks.py: Add _robust_update_run_status() with 5-retry exponential backoff;
  add db_heartbeat_callback(); remove all bare except:pass on DB status updates;
  set start_time + last_heartbeat when marking run as 'running'
- web/app.py: Add last_heartbeat column migration; add _stale_job_watcher()
  background thread (checks every 5 min, 15-min heartbeat threshold, 2-hr start_time threshold)
- cli/engine.py: Add phase-level logging banners and try/except wrappers in
  process_book(); add utils.send_heartbeat() after each chapter save;
  add start/finish logging in run_generation()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
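
The retry behaviour described for _robust_update_run_status() can be sketched roughly as follows. This is a minimal illustration, not the code from web/tasks.py: the helper name retry_with_backoff, the 0.5-second base delay, and the injectable sleep parameter are all assumptions made for the example. Only the 5-attempt limit and the exponential backoff come from the commit message.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the delay doubles after each failed attempt
    (base_delay, 2*base_delay, 4*base_delay, ...).
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # assumption: real code would catch only DB errors
            last_error = exc
            # Do not sleep after the final attempt; just fall through and raise.
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Passing sleep as a parameter keeps the sketch testable without real waiting. Re-raising the last error, instead of swallowing it, matches the commit's stated goal of removing bare except:pass around status updates.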
2026-02-21 19:00:29 -05:00
parent 97efd51fd5
commit 81340a18ea
6 changed files with 275 additions and 122 deletions
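
The staleness rule used by _stale_job_watcher() might look something like the sketch below. The two thresholds (15 minutes since the last heartbeat, 2 hours since start_time) are taken from the commit message. How the two are combined is an assumption: here the start_time threshold is applied only when no heartbeat has been recorded. The function name is_stale and its signature are hypothetical.

```python
from datetime import datetime, timedelta

HEARTBEAT_THRESHOLD = timedelta(minutes=15)  # from the commit message
START_THRESHOLD = timedelta(hours=2)         # from the commit message


def is_stale(status, start_time, last_heartbeat, now):
    """Return True if a run looks stuck (hypothetical helper).

    Assumption: a run with a recorded heartbeat is judged by the heartbeat
    alone; the start_time threshold is a fallback for runs without one.
    """
    if status != "running":
        return False
    if last_heartbeat is not None:
        return now - last_heartbeat > HEARTBEAT_THRESHOLD
    if start_time is not None:
        return now - start_time > START_THRESHOLD
    # Assumption: with neither timestamp there is nothing to judge by.
    return False
```

A background thread could call such a check for every running row once per 5-minute cycle and mark the stale ones as failed. Taking now as an argument keeps the rule a pure function that is easy to test.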


@@ -63,11 +63,19 @@ def set_log_callback(callback):
def set_progress_callback(callback):
    _log_context.progress_callback = callback

def set_heartbeat_callback(callback):
    _log_context.heartbeat_callback = callback

def update_progress(percent):
    if getattr(_log_context, 'progress_callback', None):
        try: _log_context.progress_callback(percent)
        except: pass

def send_heartbeat():
    if getattr(_log_context, 'heartbeat_callback', None):
        try: _log_context.heartbeat_callback()
        except: pass

def clean_json(text):
    text = text.replace("```json", "").replace("```", "").strip()
    start_obj = text.find('{')