When Astra Docs Chat cannot find relevant chunks in Astra DB, the model may still answer from general knowledge: confident, plausible, and sometimes wrong for Astra-specific APIs. The parent post flagged tightening this with a stricter “docs only” prompt. This post covers practical guardrails at the prompt, retrieval, flow, and UX layers.
Context: Building Astra Docs Chat
Related: Langflow chat flow · Streaming chat UI
Status: planned for v2; not deployed in current production chat.
Try v1: Astra Docs Chat
What goes wrong today
Typical failure modes:
- Empty retrieval: question outside the doc corpus (e.g. “What is Kubernetes?”)
- Weak retrieval: top-k chunks are tangential; model fills gaps from training data
- Stale corpus: docs changed, vectors lag, and the model improvises (see the re-ingest post)
v1 accepts empty and weak retrieval as a trade-off for simplicity. Guardrails reduce harm without requiring perfect retrieval.
The chat flow’s Prompt component uses a permissive template: “Given the context above, answer the question as best as possible.” When {context} is empty or irrelevant, “as best as possible” invites general knowledge.
Layer 1: Prompt refusal
In the Langflow Prompt component, replace or augment instructions:
You answer questions about DataStax Astra DB Serverless using ONLY the context below.
If the context is empty or does not contain enough information to answer, respond with exactly:
"I couldn't find this in the Astra DB Serverless documentation I have indexed."
Do not guess API names, limits, default values, or URLs.
Do not answer from general knowledge about databases or other products.
Short and repetitive beats clever: models drift under streaming pressure.
Test in Langflow Playground with {context} manually cleared before you trust production behaviour.
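One way to make that Playground check repeatable is to compare the model's output against the fixed refusal string. The sketch below is an illustration only: `TEMPLATE`, `build_prompt`, and `is_refusal` are hypothetical names, not Langflow APIs, and the template simply mirrors the instructions above.

```python
import re

# Fixed refusal sentence from the prompt instructions above.
REFUSAL = "I couldn't find this in the Astra DB Serverless documentation I have indexed."

# Hypothetical template; in Langflow this text lives in the Prompt component.
TEMPLATE = (
    "You answer questions about DataStax Astra DB Serverless using ONLY the context below.\n"
    "If the context is empty or does not contain enough information to answer, "
    'respond with exactly:\n"{refusal}"\n\n'
    "Context:\n{context}\n\nQuestion: {question}\n"
)


def build_prompt(context: str, question: str) -> str:
    """Fill the template; an empty context is passed through as an empty block."""
    return TEMPLATE.format(
        refusal=REFUSAL, context=context.strip(), question=question.strip()
    )


def _normalise(text: str) -> str:
    """Straighten curly apostrophes, drop surrounding quotes, collapse whitespace, lower-case."""
    text = text.replace("\u2019", "'").strip().strip('"')
    return re.sub(r"\s+", " ", text).lower()


def is_refusal(answer: str) -> bool:
    """True when the answer is the fixed refusal, ignoring quoting and spacing."""
    return _normalise(answer) == _normalise(REFUSAL)
```

With `{context}` cleared, paste the model's reply into `is_refusal(...)`. Anything other than `True` means the prompt still lets general knowledge through.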
Layer 2: Retrieval threshold
The chat AstraDB component defaults to similarity search, 4 results, score threshold 0 in v1. That means weak matches still reach the prompt.
If your Langflow / AstraDB component exposes scores or distance:
- Log scores for 20 known-good questions and 20 off-topic questions
- Define a minimum similarity threshold
- Branch in the flow: below threshold → skip LLM, return fixed refusal string (Template or ChatOutput)
This avoids paying for an LLM call that will hallucinate anyway.
Exact thresholds are corpus-specific. Questions about exact CLI flags may need lower thresholds than conceptual “what is a collection?” questions.
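Once the two sets of scores are logged, picking a starting cut-off can be automated. The sketch below is a minimal example that assumes scores are similarities (higher is better; invert the comparison if your component reports distance). `pick_threshold` is a hypothetical helper and the numbers in the usage note are made up for illustration.

```python
def pick_threshold(good: list[float], off_topic: list[float]) -> float:
    """Return the cut-off that misclassifies the fewest logged questions.

    A question is accepted when its top score is >= the cut-off.
    Ties go to the lowest cut-off, which favours fewer false refusals.
    """
    if not good and not off_topic:
        raise ValueError("need at least one logged score")

    # Only logged scores can change the error count, so try each one as a cut-off.
    candidates = sorted(set(good) | set(off_topic))
    best_cut = candidates[0]
    best_errors = len(good) + len(off_topic)

    for cut in candidates:
        false_refusals = sum(score < cut for score in good)
        false_accepts = sum(score >= cut for score in off_topic)
        errors = false_refusals + false_accepts
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut
```

For example, with invented scores `good = [0.82, 0.79, 0.91, 0.74, 0.68]` and `off_topic = [0.41, 0.55, 0.62, 0.70, 0.38]`, the function returns `0.68`: one off-topic question (0.70) still gets through, and no known-good question is refused. Treat the result as a starting point and re-check it per question type.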
Hybrid search (see the Astra vector store post) can help symbol-heavy queries before you tune thresholds.
Layer 3: Empty-context detection in the flow
Prefer flow-level refusal over proxy hacks:
AstraDB search
  → Condition: context length > N OR top score > T
      true  → Prompt → DeepSeek → ChatOutput
      false → Template ("I couldn't find this...") → ChatOutput
Langflow’s conditional routing varies by version. The idea stays the same: do not call the LLM when there is nothing to ground on.
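The branching decision can be sketched outside Langflow as a plain function. This is one reading of the condition, not the flow's actual implementation: it uses the top score when the component exposes one and falls back to context length otherwise. `Chunk`, `route`, and the default values for N (`min_chars`) and T (`min_score`) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    score: Optional[float] = None  # None when the component does not expose scores


def route(chunks: list[Chunk], min_chars: int = 200, min_score: float = 0.7) -> str:
    """Return "llm" when there is something to ground on, else "refuse".

    min_chars and min_score are placeholder values; tune them per corpus.
    """
    if not chunks:
        return "refuse"

    # Prefer the similarity score when at least one chunk carries it.
    scores = [c.score for c in chunks if c.score is not None]
    if scores:
        return "llm" if max(scores) > min_score else "refuse"

    # No scores available: fall back to how much context text was retrieved.
    context_chars = sum(len(c.text.strip()) for c in chunks)
    return "llm" if context_chars > min_chars else "refuse"
```

A `"refuse"` result corresponds to the false branch: send the fixed refusal string to ChatOutput and skip the model call entirely.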
Optional belt-and-braces: if Langflow exposes retrieval metadata in the end event, the Pages Function could replace the body when chunk count is zero. That couples you to response shape; prefer flow-level refusal when possible.
Layer 4: UX copy
Refusal should look intentional, not broken:
I couldn’t find this in the indexed Astra DB Serverless docs. Try rephrasing, or search the official documentation.
This is better than a generic error bubble or a confident but wrong API example. The copy renders in the Hugo chat bubbles described in building a streaming chat UI.
In astra-chat.js, refusals can use the normal assistant bubble styling (not astra-chat-bubble--error) so users do not think the service failed.
Testing matrix
| Input | Expected |
|---|---|
| “How do I create a collection?” | Normal grounded answer |
| “Explain quantum chromodynamics” | Refusal |
| “Astra DB vs Cassandra on Mars” | Refusal or narrow “not in docs” |
| Misspelled but valid topic (“PCU grup”) | Answer or ask to rephrase: document behaviour |
| Question about a feature added after last ingest | Refusal or stale partial answer: note re-ingest gap |
Log failures during beta, and adjust the prompt before tuning thresholds. Prompt changes only require redeploying the Langflow flow, not the Hugo site.
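The matrix can be run as a small script so regressions show up after each prompt change. This is a sketch under assumptions: `ask` is a hypothetical callable that sends a question to the chat flow and returns the reply text, and a reply counts as a refusal only if it starts with the fixed refusal wording. The post-ingest row is left out because it needs a question specific to your corpus.

```python
from typing import Callable

# Opening words shared by the refusal strings used in this post.
REFUSAL_PREFIX = "I couldn't find this"

# (question, acceptable outcomes), taken from the matrix above.
CASES = [
    ("How do I create a collection?", {"answer"}),
    ("Explain quantum chromodynamics", {"refusal"}),
    ("Astra DB vs Cassandra on Mars", {"refusal"}),
    ("PCU grup", {"answer", "refusal"}),  # either is fine; document which one you get
]


def classify(reply: str) -> str:
    """Label a reply as "refusal" or "answer" by its opening words."""
    normalised = reply.replace("\u2019", "'").strip().lower()
    return "refusal" if normalised.startswith(REFUSAL_PREFIX.lower()) else "answer"


def run_matrix(ask: Callable[[str], str], cases=CASES) -> list[tuple[str, str]]:
    """Return (question, observed outcome) for every case that missed its expectation."""
    failures = []
    for question, allowed in cases:
        outcome = classify(ask(question))
        if outcome not in allowed:
            failures.append((question, outcome))
    return failures
```

An empty list from `run_matrix(ask)` means every row behaved as expected. Grounded answers still need a human read, since this check only separates refusals from non-refusals.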
Interaction with citations
Guardrails without source links still help. If you add citations later, implement refusal before the model can invent URLs.
Order of operations I would use:
- Prompt refusal (cheap)
- Retrieval threshold / empty-context branch (reliable)
- Structured citations (trust)
- Re-ingest automation (freshness)
What changes on the public site
When shipped:
- Langflow flow update only for layers 1-3 (no Hugo redeploy required if refusal text comes through the normal stream)
- Optional copy tweak in welcome message: “Answers use indexed docs only”
- Update parent post link from “sensible next step” to “how I added docs-only guardrails”
Proxy and UI stay the same unless you add a distinct refusal SSE event type (usually unnecessary).
Next in the series
- Re-ingesting a RAG doc corpus when upstream docs change: stale corpus is a guardrail failure mode
- Building a streaming chat UI in Hugo: where refusal copy appears in the browser
Series index: Building Astra Docs Chat
Open Astra Docs Chat and try an off-topic question in v1: notice when the answer sounds general. That is what this post is meant to fix.