Blueprint v2.2 review: update README, force model refresh

- Updated README to document async Refresh & Optimize feature (v2.2)
- Ran `init_models(force=True)`: cache refreshed with live API results (see the sketch after this list)
  - Logic: gemini-2.5-pro
  - Writer: gemini-2.5-flash
  - Artist: gemini-2.5-flash-image
  - Image:  imagen-3.0-generate-001 (Vertex AI)
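A minimal sketch of what a forced refresh like this typically does, assuming a JSON file cache and a `fetch_available_models` helper (both are illustrative assumptions, not this project's actual implementation):

```python
# Illustrative sketch only; the real init_models lives in this repo.
import json
from pathlib import Path

CACHE_FILE = Path("model_cache.json")  # hypothetical cache location

def fetch_available_models() -> dict:
    # Stand-in for the live API lookup; values mirror this commit's result.
    return {
        "logic": "gemini-2.5-pro",
        "writer": "gemini-2.5-flash",
        "artist": "gemini-2.5-flash-image",
        "image": "imagen-3.0-generate-001",  # served via Vertex AI
    }

def init_models(force: bool = False) -> dict:
    # force=True bypasses the cache-hit path, so the role-to-model map
    # reflects a live lookup rather than stale cache contents.
    if not force and CACHE_FILE.exists():
        return json.loads(CACHE_FILE.read_text())
    models = fetch_available_models()
    CACHE_FILE.write_text(json.dumps(models, indent=2))
    return models
```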

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
commit b37c503da4
parent a08af59164
Date: 2026-02-21 01:10:07 -05:00


@@ -131,7 +131,7 @@ Open `http://localhost:5000`.
 - **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.
 ### AI Context Optimization (`core/utils.py`)
-- **Token Estimation:** `estimate_tokens(text)` provides a fast character-based token count approximation (`len(text) / 4`) without requiring external tokenizer libraries.
+- **System Status Model Optimization (`templates/system_status.html`, `web/routes/admin.py`):** Refreshing models operates via an async fetch request, preventing page freezes during the re-evaluation of available models.
 - **Context Truncation:** `truncate_to_tokens(text, max_tokens)` enforces hard caps on large context variables — previous chapter text, story summaries, and character data — before they are injected into prompts, preventing token overflows on large manuscripts.
 - **AI Response Cache:** An in-memory cache (`_AI_CACHE`) keyed by MD5 hash of inputs prevents redundant API calls for deterministic tasks such as persona analysis. Results are reused for identical inputs within the same session.
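
A minimal sketch of the payload guardrail described in the first bullet; the `len(text) / 4` heuristic and the 30,000-token threshold come from the README, while the logger name and the `check_payload` wrapper are assumptions:

```python
import logging

logger = logging.getLogger("generation")  # assumed logger name
MAX_PAYLOAD_TOKENS = 30_000  # threshold documented in the README

def estimate_tokens(text: str) -> int:
    # Fast character-based approximation: roughly 4 characters per token,
    # with no external tokenizer dependency.
    return len(text) // 4

def check_payload(prompt: str) -> None:
    # Warn, but do not block, when a prompt exceeds the guardrail, so
    # runaway context injection is surfaced immediately.
    tokens = estimate_tokens(prompt)
    if tokens > MAX_PAYLOAD_TOKENS:
        logger.warning("Prompt payload ~%d tokens exceeds the %d-token cap",
                       tokens, MAX_PAYLOAD_TOKENS)
```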
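The new System Status bullet implies a server endpoint in `web/routes/admin.py` that `templates/system_status.html` calls with `fetch()`; a hedged sketch, assuming a Flask blueprint, a made-up route path, and an assumed import location for `init_models`:

```python
from flask import Blueprint, jsonify

from core.utils import init_models  # assumed import path

admin_bp = Blueprint("admin", __name__)  # assumed blueprint name

@admin_bp.post("/admin/models/refresh")  # hypothetical endpoint
def refresh_models():
    # system_status.html calls this asynchronously and swaps the JSON
    # into the page, so the browser never blocks on a full reload while
    # available models are re-evaluated.
    return jsonify(init_models(force=True))
```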
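And the two `core/utils.py` mechanisms kept as context in the hunk, under the same caveat: `truncate_to_tokens`, `_AI_CACHE`, and the MD5 keying come from the text, while the `cached_ai_call` wrapper and its signature are hypothetical:

```python
import hashlib

_AI_CACHE: dict[str, str] = {}  # in-memory; lives for the session

def truncate_to_tokens(text: str, max_tokens: int) -> str:
    # Inverse of the 4-characters-per-token estimate: hard-cap the
    # character count before the text is injected into a prompt.
    return text[: max_tokens * 4]

def cached_ai_call(prompt: str, call_api) -> str:
    # Key deterministic tasks (e.g. persona analysis) by an MD5 of the
    # inputs so identical inputs reuse the earlier result.
    key = hashlib.md5(prompt.encode("utf-8")).hexdigest()
    if key not in _AI_CACHE:
        _AI_CACHE[key] = call_api(prompt)
    return _AI_CACHE[key]
```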