The overview of Astra Docs Chat names Langflow once and moves on: ingest graph, chat graph, done. This post opens the hood: component chains, two published endpoints, where API keys live, and why orchestration stayed in Langflow instead of a Cloudflare Worker.
Start here if you missed the big picture: Building Astra Docs Chat
Related: Batch ingest · Proxy
Two endpoints, one product
| Endpoint | Purpose | Called by |
|---|---|---|
| datastax-astra-ingest | File → vectors in Astra DB | Local batch ingest script (batch ingest post) |
| datastax-astra-chat | Question → retrieve → LLM answer | Pages Function /api/astra-chat (proxy post) |
Splitting ingest and chat keeps the public surface minimal. Visitors never trigger file upload or document embedding; they only reach the retrieval + generation path.
The main RAG template flow (DataStax Astra Docs RAG) originally combined both paths. I later cloned the ingest chain into a dedicated flow so batch jobs hit a narrow endpoint without touching chat configuration.
Ingest flow
File-ingest
→ SplitText-ingest
→ OpenAI Embeddings (text-embedding-3-small)
→ AstraDB-ingest (collection: datastax_astra_docs)
File: path supplied per run via API tweak. The batch script uploads markdown to /api/v2/files/, then sets File-ingest.path to the returned server path.
SplitText: breaks each page into chunks before embedding. Default settings on the cloned template are chunk size 1000, overlap 200, separator newline. That is not sacred: retrieval quality depends on it. See chunking technical docs for RAG.
OpenAI Embeddings: text-embedding-3-small for the indexed corpus. Chat uses DeepSeek for generation; embeddings stayed on OpenAI because the template already worked and query-time vectors must match ingest-time vectors.
AstraDB: Langflow’s DataStax bundle component. Writes into collection datastax_astra_docs in default_keyspace. Credentials come from Langflow global variables, not pasted into exported flow JSON.
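To make the SplitText defaults above concrete (chunk size 1000, overlap 200, newline separator), here is a minimal sketch of separator-based chunking with overlap. It is not Langflow's implementation, and the component's internals may differ; the function name `split_text` and the greedy packing strategy are assumptions for illustration only.

```python
def split_text(text, chunk_size=1000, overlap=200, separator="\n"):
    """Greedy separator-based chunking with overlap (illustrative only)."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        if current and len(separator.join(current + [piece])) > chunk_size:
            chunks.append(separator.join(current))
            # Carry trailing pieces forward while they fit the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > overlap:
                    break
                carried.insert(0, prev)
            # Drop carried pieces that would push the next chunk over the limit.
            while carried and len(separator.join(carried + [piece])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Ten 40-character lines, split with small limits so the overlap is visible.
page = "\n".join(f"line{i:02d}" + "x" * 34 for i in range(10))
chunks = split_text(page, chunk_size=100, overlap=45)
```

With these toy limits each chunk holds two lines, and the last line of one chunk reappears as the first line of the next. A single piece longer than the chunk size is kept whole in this sketch rather than cut mid-line.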
The graph above is the ingest half of the stack: file in, chunks out, vectors written to Astra DB. The batch script sets the file path per run via API tweak; you configure split size and collection name once in Langflow. (The screenshot shows an earlier template with text-embedding-ada-002; production uses text-embedding-3-small, described in the chunking post.)
On success, Langflow logs a message like “Adding N documents to the Vector Store”. The batch script treats that string as the pass condition.
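The per-run tweak and the pass condition can be sketched as two small helpers. This is a hedged illustration, not the actual batch script: the names `build_ingest_payload` and `documents_added` are hypothetical, the payload is only the tweak fragment of the request body, and how the script obtains the log text is assumed (here it is simply passed in as a string).

```python
import re

FILE_COMPONENT_ID = "File-ingest"  # component name from the ingest chain


def build_ingest_payload(server_path):
    """Tweak fragment for the ingest run call; only the File path changes per run."""
    return {
        # Assumption: the File component accepts a single path string here;
        # some Langflow versions may expect a list of paths instead.
        "tweaks": {FILE_COMPONENT_ID: {"path": server_path}},
    }


# Pass condition from the post: "Adding N documents to the Vector Store".
SUCCESS_PATTERN = re.compile(r"Adding (\d+) documents to the Vector Store")


def documents_added(log_text):
    """Return N if the success message appears in the log text, else None."""
    match = SUCCESS_PATTERN.search(log_text)
    return int(match.group(1)) if match else None
```

Returning the document count rather than a bare boolean lets a batch run also flag pages that matched the message but added zero documents.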
Chat flow
ChatInput
→ AstraDB (similarity search, top 4)
→ Parser (context from retrieved chunks)
→ Prompt
→ DeepSeek (deepseek-v4-flash, stream: true)
→ ChatOutput
ChatInput: receives input_value, session_id, and the chat input/output types (input_type, output_type) from the run API (what the Pages Function forwards).
AstraDB search: same collection as ingest, same embedding model at query time. Template defaults: Vector Search, Similarity, 4 results, score threshold 0 (no minimum in v1).
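As a rough mental model of "Similarity, 4 results, score threshold 0", the sketch below ranks stored vectors against a query vector and keeps the best four. Astra DB does this server-side with its own index and the collection's configured metric; the cosine metric, the mapping of scores into [0, 1], and the names `cosine_score` and `top_k` are all assumptions of this example.

```python
import math


def cosine_score(a, b):
    """Cosine similarity mapped to [0, 1], so a threshold of 0 filters nothing."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = dot / norm if norm else 0.0
    return (1.0 + cos) / 2.0


def top_k(query_vec, docs, k=4, score_threshold=0.0):
    """docs is a list of (text, vector). Returns up to k (score, text), best first."""
    scored = [(cosine_score(query_vec, vec), text) for text, vec in docs]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:k]


docs = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
    ("d", [-1.0, 0.0]),
    ("e", [0.9, 0.1]),
]
hits = top_k([1.0, 0.0], docs)
```

With a threshold of 0, the weakest match still makes the cut whenever fewer than four better ones exist, which is why a score floor matters once you want docs-only answers.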
Prompt: combines retrieved doc text with instructions to answer as a helpful documentation assistant. The template shape is roughly:
{context}
---
Given the context above, answer the question as best as possible.
Question: {question}
Answer:
v1 does not hard-refuse when retrieval is empty, so the model may answer from general knowledge. The docs-only guardrails post covers the planned tightening.
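A small sketch of what the Parser and Prompt steps amount to, using the template shape above. The function name `build_prompt` and the blank-line separator between chunks are assumptions; Langflow's components may join text differently.

```python
PROMPT_TEMPLATE = (
    "{context}\n"
    "---\n"
    "Given the context above, answer the question as best as possible.\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(chunks, question):
    """Join retrieved chunk texts into the context slot and fill the template."""
    context = "\n\n".join(chunks)  # assumption: chunks separated by blank lines
    return PROMPT_TEMPLATE.format(context=context, question=question)


grounded = build_prompt(["Chunk one.", "Chunk two."], "What is a collection?")
ungrounded = build_prompt([], "What is a collection?")
```

When retrieval returns nothing, the context slot is simply empty and the rest of the prompt is unchanged, so nothing in the text tells the model to refuse.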
DeepSeek: replaced the template’s OpenAI chat node for cost and streaming on technical Q&A. Details in swapping DeepSeek for OpenAI chat in Langflow.
ChatOutput: streams tokens back through Langflow’s NDJSON stream, which the Pages Function converts to SSE for the browser.
Global variables for secrets
Flow components reference placeholders like OPENAI_API_KEY, Astra application tokens, and DEEPSEEK_API_KEY via Langflow global variables: set in the Langflow UI or server env, not in component value fields that export with the graph.
That matters when you:
- Version flows as JSON
- Clone flows between environments
- Avoid leaking keys in screenshots or git commits
The Pages Function only holds LANGFLOW_API_KEY + LANGFLOW_URL. Langflow itself holds everything the graph needs to talk to OpenAI, Astra, and DeepSeek. The secrets are layered: compromising the Pages secrets does not automatically expose OpenAI billing unless the Langflow key also allows arbitrary flow edits.
For where Langflow runs, see self-hosting Langflow behind a public static site.
Why not implement RAG in a Worker?
Cloudflare Workers can call vector APIs and LLMs directly. I skipped reimplementing retrieve-then-prompt because:
- Langflow already had the graph: File/split/embed/store and chat/retrieve/LLM were working in the DataStax bundle template
- Ingest tweaks are awkward in raw code: per-file File component paths map naturally to Langflow’s tweak API
- Iteration speed: prompt and component changes happen in Langflow Playground without redeploying the Hugo site
The Worker/Pages layer stays thin: validate, proxy, transform stream. Intelligence and retrieval orchestration stay where the visual graph is.
Trade-off: you operate a private Langflow instance. Acceptable for a personal site; a team might prefer managed retrieval APIs or Cloudflare Vectorize with a single Worker.
Streaming contract
Chat runs with ?stream=true. Langflow emits NDJSON lines:
- token events with partial text: {"event":"token","data":{"chunk":"Hello"}}
- end event with final result metadata
The proxy normalises that to SSE for Hugo’s vanilla JS client:
data: {"chunk":"Hello"}
data: [DONE]
If your flow uses a non-streaming model path, the proxy’s end-event fallback still delivers one chunk. See the proxy post for code.
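The contract above can be sketched as a small generator that turns NDJSON lines into SSE frames. This is an illustration in Python, not the Pages Function itself. The names `ndjson_to_sse`, `sse_frame`, and `final_text` are hypothetical, and the nested path used to read the answer from the end event is an assumption: inspect a real end event from your Langflow version before relying on it.

```python
import json


def sse_frame(payload):
    """Serialise one SSE data frame with compact JSON."""
    return "data: " + json.dumps(payload, separators=(",", ":")) + "\n\n"


def final_text(end_data):
    """Read the full answer from the end event (assumed, illustrative path)."""
    try:
        return end_data["result"]["outputs"][0]["outputs"][0]["results"]["message"]["text"]
    except (KeyError, IndexError, TypeError):
        return ""


def ndjson_to_sse(lines):
    """Yield SSE frames for an iterable of NDJSON lines."""
    sent_tokens = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than break the stream
        if not isinstance(event, dict):
            continue
        data = event.get("data")
        if not isinstance(data, dict):
            data = {}
        if event.get("event") == "token" and data.get("chunk"):
            sent_tokens = True
            yield sse_frame({"chunk": data["chunk"]})
        elif event.get("event") == "end" and not sent_tokens:
            # Fallback for a non-streaming path: deliver the answer as one chunk.
            text = final_text(data)
            if text:
                yield sse_frame({"chunk": text})
    yield "data: [DONE]\n\n"
```

Tracking whether any token was sent keeps the fallback from duplicating the answer when the model did stream.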
Astra DB in this stack
Using Astra for a DataStax-documentation chat is intentional: same product family as the corpus, hybrid search available if you extend the flow, Langflow bundle support out of the box. A focused post on collection setup and search modes: using Astra DB as the RAG vector store.
Verify the flows before going public
- Playground: ask “What are PCU groups?” and confirm retrieved content matches known doc wording
- curl the chat endpoint with x-api-key from a trusted machine (not from browser devtools on a public page):
curl -s --compressed \
-H "x-api-key: $LANGFLOW_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input_value":"What is a collection in Astra DB?","output_type":"chat","input_type":"chat","session_id":"smoke-test"}' \
"$LANGFLOW_URL/api/v1/run/datastax-astra-chat?stream=false"
- Pages preview: proxy + UI on a Cloudflare preview URL
- Production: confirm the Network tab shows only same-origin /api/astra-chat
Without x-api-key, Langflow returns 403. That is why the browser never calls it directly.
Next in the series
- Building a streaming chat UI in Hugo: front end that consumes the SSE stream
- Docs-only guardrails when RAG retrieval finds nothing: tightening answers when retrieval is empty
- Batch-ingesting markdown through Langflow: the script that calls the ingest endpoint
Series index: Building Astra Docs Chat
Open Astra Docs Chat and compare answers to Langflow Playground output for the same question.