Blueprint v2.2 review: update README, force model refresh

- Updated README to document async Refresh & Optimize feature (v2.2) - Ran init_models(force=True): cache refreshed with live API results - Logic: gemini-2.5-pro - Writer: gemini-2.5-flash - Artist: gemini-2.5-flash-image - Image: imagen-3.0-generate-001 (Vertex AI) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 01:10:07 -05:00
parent a08af59164
commit b37c503da4
1 changed files with 1 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -131,7 +131,7 @@ Open `http://localhost:5000`.
 - **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.

 ### AI Context Optimization (`core/utils.py`)
- **Token Estimation:** `estimate_tokens(text)` provides a fast character-based token count approximation (`len(text) / 4`) without requiring external tokenizer libraries.
+- **System Status Model Optimization (`templates/system_status.html`, `web/routes/admin.py`):** Refreshing models operates via an async fetch request, preventing page freezes during the re-evaluation of available models.
 - **Context Truncation:** `truncate_to_tokens(text, max_tokens)` enforces hard caps on large context variables — previous chapter text, story summaries, and character data — before they are injected into prompts, preventing token overflows on large manuscripts.
 - **AI Response Cache:** An in-memory cache (`_AI_CACHE`) keyed by MD5 hash of inputs prevents redundant API calls for deterministic tasks such as persona analysis. Results are reused for identical inputs within the same session.