diff --git a/README.md b/README.md
index a6e2305..e1362b4 100644
--- a/README.md
+++ b/README.md
@@ -131,7 +131,7 @@ Open `http://localhost:5000`.
 - **Payload Guardrails:** Every generation call estimates the prompt token count before dispatch. If the payload exceeds 30,000 tokens, a warning is logged so runaway context injection is surfaced immediately.
 
 ### AI Context Optimization (`core/utils.py`)
-- **Token Estimation:** `estimate_tokens(text)` provides a fast character-based token count approximation (`len(text) / 4`) without requiring external tokenizer libraries.
+- **System Status Model Optimization (`templates/system_status.html`, `web/routes/admin.py`):** Refreshing models operates via an async fetch request, preventing page freezes during the re-evaluation of available models.
 - **Context Truncation:** `truncate_to_tokens(text, max_tokens)` enforces hard caps on large context variables — previous chapter text, story summaries, and character data — before they are injected into prompts, preventing token overflows on large manuscripts.
 - **AI Response Cache:** An in-memory cache (`_AI_CACHE`) keyed by MD5 hash of inputs prevents redundant API calls for deterministic tasks such as persona analysis. Results are reused for identical inputs within the same session.