Swapping DeepSeek in for OpenAI chat in a Langflow RAG flow

The Langflow DataStax RAG template ships with an OpenAI chat node. For Astra Docs Chat , I replaced it with DeepSeek for streaming answers: cheaper at volume, acceptable on technical Q&A over retrieved docs.

Context: Building Astra Docs Chat · Langflow chat flow · Astra DB vector store

Embeddings stayed on OpenAI text-embedding-3-small: only the generation step changed. See chunking and embedding post for why those stay paired.

Try it: Astra Docs Chat

What changed in the flow ¶

Component	v1 template	This project
Language model	OpenAI (e.g. gpt-4o-mini)	DeepSeek `deepseek-v4-flash`
Streaming	optional	enabled (`stream: true`)
Temperature	template default	0.1 (slightly factual bias)
Embeddings	OpenAI	unchanged
Astra DB	unchanged	unchanged

Langflow DeepSeek language model node configured with deepseek-v4-flash, streaming enabled, temperature 0.1, and DEEPSEEK_API_KEY global variable connected to Chat Output

DeepSeek API key stored as Langflow global variable DEEPSEEK_API_KEY: same pattern as OPENAI_API_KEY and Astra tokens.

The chat endpoint name stays datastax-astra-chat. The Pages Function is model-agnostic: it forwards input_value and transforms the stream (proxy post ).

How to swap in Langflow ¶

Open the DataStax Astra Docs RAG flow in Langflow
Remove or bypass the template OpenAI Language Model node (LanguageModelComponent-cAjdO in the exported template)
Add DeepSeek Language Model component (or generic OpenAI-compatible node pointed at DeepSeek base URL)
Set model to deepseek-v4-flash (or deepseek-chat if you prefer quality over speed)
Enable stream
Connect: Prompt output → DeepSeek input → ChatOutput
Store API key in global variables, not in the node field that exports with JSON

Smoke test in Playground before touching the public site:

curl -s --compressed \
  -H "x-api-key: $LANGFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_value":"What is a collection in Astra DB?","output_type":"chat","input_type":"chat","session_id":"smoke-test"}' \
  "$LANGFLOW_URL/api/v1/run/datastax-astra-chat?stream=false"

Compare answer grounding to the OpenAI node on the same retrieved context.

Why swap ¶

Cost: long doc-style answers add up on OpenAI chat pricing; DeepSeek flash-tier models are cheaper for a public, unauthenticated chat
Streaming: required for the Hugo UI cursor UX (streaming chat UI post ); both providers support it
Quality: on “explain this Astra feature from context” tasks, quality has been acceptable for a personal reference tool

I did not run a formal benchmark: spot-checks against known doc passages after the swap. Edge-case API details still need human verification against the live docs.

Streaming and the proxy ¶

Langflow emits NDJSON token events when streaming works. Some model paths deliver the full message only on the end event.

The Pages Function tracks whether any tokens arrived; if not, it extracts text from event.data.result.message and sends one SSE chunk (proxy code ). Test both paths after a model swap.

The Hugo UI re-parses markdown on every chunk with marked. Watch code fences and numbered lists during streaming.

What to watch for ¶

Instruction following: refusal behaviour when context is empty (guardrails )
Markdown formatting: code fences and lists in streamed output
Latency: flash models vary; 60s proxy timeout covers slow runs
API compatibility: DeepSeek uses OpenAI-compatible chat completions; Langflow’s component handles base URL + model name

If quality drops on a class of questions (dense REST tables, version-specific defaults), test one model change in Playground before production.

Reverting ¶

Swap the Language Model component back to OpenAI in Langflow, point global variable at OpenAI key, redeploy nothing on Hugo if endpoint name unchanged. Ingest embeddings unaffected.

Cost vs quality trade-off (informal) ¶

Model tier	Typical use
`deepseek-v4-flash`	Public chat, fast answers, lower cost
`deepseek-chat`	Higher quality, still cheaper than flagship OpenAI for many prompts
OpenAI gpt-4o-mini	Revert path if DeepSeek drifts on Astra-specific details

Generation cost dominates per-message spend; embedding cost dominates refresh (re-ingest post ).

Next in the series ¶

Self-hosting Langflow behind a public static site : where DeepSeek keys live alongside other provider secrets
Docs-only guardrails : model behaviour when retrieval is empty
Proxying Langflow from Cloudflare Pages Functions : streams DeepSeek output to the browser

Series index: Building Astra Docs Chat

Open Astra Docs Chat : compare feel and detail to asking the same questions in Langflow Playground with each model if you are evaluating.