The Langflow DataStax RAG template ships with an OpenAI chat node. For Astra Docs Chat , I replaced it with DeepSeek for streaming answers: cheaper at volume, acceptable on technical Q&A over retrieved docs.
Context: Building Astra Docs Chat · Langflow chat flow · Astra DB vector store
Embeddings stayed on OpenAI text-embedding-3-small: only the generation step changed. See chunking and embedding post
for why those stay paired.
Try it: Astra Docs Chat
What changed in the flow ¶
| Component | v1 template | This project |
|---|---|---|
| Language model | OpenAI (e.g. gpt-4o-mini) | DeepSeek deepseek-v4-flash |
| Streaming | optional | enabled (stream: true) |
| Temperature | template default | 0.1 (slightly factual bias) |
| Embeddings | OpenAI | unchanged |
| Astra DB | unchanged | unchanged |
DeepSeek API key stored as Langflow global variable DEEPSEEK_API_KEY: same pattern as OPENAI_API_KEY and Astra tokens.
The chat endpoint name stays datastax-astra-chat. The Pages Function is model-agnostic: it forwards input_value and transforms the stream (proxy post
).
How to swap in Langflow ¶
- Open the DataStax Astra Docs RAG flow in Langflow
- Remove or bypass the template OpenAI Language Model node (
LanguageModelComponent-cAjdOin the exported template) - Add DeepSeek Language Model component (or generic OpenAI-compatible node pointed at DeepSeek base URL)
- Set model to
deepseek-v4-flash(ordeepseek-chatif you prefer quality over speed) - Enable stream
- Connect: Prompt output → DeepSeek input → ChatOutput
- Store API key in global variables, not in the node field that exports with JSON
Smoke test in Playground before touching the public site:
curl -s --compressed \
-H "x-api-key: $LANGFLOW_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input_value":"What is a collection in Astra DB?","output_type":"chat","input_type":"chat","session_id":"smoke-test"}' \
"$LANGFLOW_URL/api/v1/run/datastax-astra-chat?stream=false"
Compare answer grounding to the OpenAI node on the same retrieved context.
Why swap ¶
- Cost: long doc-style answers add up on OpenAI chat pricing; DeepSeek flash-tier models are cheaper for a public, unauthenticated chat
- Streaming: required for the Hugo UI cursor UX (streaming chat UI post ); both providers support it
- Quality: on “explain this Astra feature from context” tasks, quality has been acceptable for a personal reference tool
I did not run a formal benchmark: spot-checks against known doc passages after the swap. Edge-case API details still need human verification against the live docs.
Streaming and the proxy ¶
Langflow emits NDJSON token events when streaming works. Some model paths deliver the full message only on the end event.
The Pages Function tracks whether any tokens arrived; if not, it extracts text from event.data.result.message and sends one SSE chunk (proxy code
). Test both paths after a model swap.
The Hugo UI re-parses markdown on every chunk with marked. Watch code fences and numbered lists during streaming.
What to watch for ¶
- Instruction following: refusal behaviour when context is empty (guardrails )
- Markdown formatting: code fences and lists in streamed output
- Latency: flash models vary; 60s proxy timeout covers slow runs
- API compatibility: DeepSeek uses OpenAI-compatible chat completions; Langflow’s component handles base URL + model name
If quality drops on a class of questions (dense REST tables, version-specific defaults), test one model change in Playground before production.
Reverting ¶
Swap the Language Model component back to OpenAI in Langflow, point global variable at OpenAI key, redeploy nothing on Hugo if endpoint name unchanged. Ingest embeddings unaffected.
Cost vs quality trade-off (informal) ¶
| Model tier | Typical use |
|---|---|
deepseek-v4-flash |
Public chat, fast answers, lower cost |
deepseek-chat |
Higher quality, still cheaper than flagship OpenAI for many prompts |
| OpenAI gpt-4o-mini | Revert path if DeepSeek drifts on Astra-specific details |
Generation cost dominates per-message spend; embedding cost dominates refresh (re-ingest post ).
Next in the series ¶
- Self-hosting Langflow behind a public static site : where DeepSeek keys live alongside other provider secrets
- Docs-only guardrails : model behaviour when retrieval is empty
- Proxying Langflow from Cloudflare Pages Functions : streams DeepSeek output to the browser
Series index: Building Astra Docs Chat
Open Astra Docs Chat : compare feel and detail to asking the same questions in Langflow Playground with each model if you are evaluating.