Blueprint v2.2 review: update README, force model refresh
- Updated README to document the async Refresh & Optimize feature (v2.2)
- Ran init_models(force=True): cache refreshed with live API results
  - Logic: gemini-2.5-pro
  - Writer: gemini-2.5-flash
  - Artist: gemini-2.5-flash-image
  - Image: imagen-3.0-generate-001 (Vertex AI)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -131,7 +131,7 @@ Open `http://localhost:5000`.
- **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.
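The guardrail could look roughly like the sketch below. It assumes the same four-characters-per-token estimate described under AI Context Optimization and uses floor division. The names `check_payload` and `MAX_PAYLOAD_TOKENS` are hypothetical and not taken from the project's code. The check only logs; it never blocks the call, matching the "warn, don't fail" behaviour described above.

```python
import logging

logger = logging.getLogger("payload_guardrail")

# Threshold from the README; the constant name itself is hypothetical.
MAX_PAYLOAD_TOKENS = 30_000


def estimate_tokens(text: str) -> int:
    """Rough token count: about four characters per token, rounded down."""
    return len(text) // 4


def check_payload(prompt: str) -> bool:
    """Log a warning when the prompt exceeds the guardrail.

    Returns True if the payload is within budget. The caller is never
    blocked; the warning only surfaces runaway context injection.
    """
    tokens = estimate_tokens(prompt)
    if tokens > MAX_PAYLOAD_TOKENS:
        logger.warning(
            "Prompt payload of ~%d tokens exceeds the %d-token guardrail",
            tokens,
            MAX_PAYLOAD_TOKENS,
        )
        return False
    return True
```

A generation wrapper would call `check_payload(prompt)` immediately before dispatching the request to the model API.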
### AI Context Optimization (`core/utils.py`)
- **Token Estimation:** `estimate_tokens(text)` provides a fast character-based token count approximation (`len(text) / 4`) without requiring external tokenizer libraries.
- **System Status Model Optimization (`templates/system_status.html`, `web/routes/admin.py`):** Model refresh runs as an asynchronous fetch request, so the System Status page stays responsive while the list of available models is re-evaluated.
- **Context Truncation:** `truncate_to_tokens(text, max_tokens)` enforces hard caps on large context variables — previous chapter text, story summaries, and character data — before they are injected into prompts, preventing token overflows on large manuscripts.
- **AI Response Cache:** An in-memory cache (`_AI_CACHE`) keyed by an MD5 hash of the inputs prevents redundant API calls for deterministic tasks such as persona analysis. Results are reused for identical inputs within the same session.
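`estimate_tokens` and `truncate_to_tokens` share the same heuristic, so a minimal sketch of both fits in a few lines. It assumes a flat ratio of four characters per token with floor division; the real helpers in `core/utils.py` may round differently. The `keep_tail` parameter is a hypothetical addition, not part of the documented signature. It reflects that, for previous-chapter context, the end of the text is often the more useful part to keep.

```python
CHARS_PER_TOKEN = 4  # character-based heuristic: len(text) / 4


def estimate_tokens(text: str) -> int:
    """Approximate token count as len(text) / 4, rounded down."""
    return len(text) // CHARS_PER_TOKEN


def truncate_to_tokens(text: str, max_tokens: int, keep_tail: bool = False) -> str:
    """Hard-cap text so that estimate_tokens(result) <= max_tokens.

    keep_tail (hypothetical) keeps the end of the text instead of the start.
    """
    if max_tokens <= 0:
        return ""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text  # already within budget, returned unchanged
    return text[-max_chars:] if keep_tail else text[:max_chars]
```

For example, `truncate_to_tokens(previous_chapter, 2000, keep_tail=True)` would keep roughly the last 8,000 characters of a chapter before prompt injection. Because the cap is applied in characters, no tokenizer dependency is needed. The trade-off is that the real token count can drift from the estimate for non-English text or code.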