Langflow RAG over Astra DB: ingest and chat flows

The overview of Astra Docs Chat names Langflow once and moves on: ingest graph, chat graph, done. This post looks under the hood: component chains, two published endpoints, where API keys live, and why orchestration stayed in Langflow instead of a Cloudflare Worker.

Start here if you missed the big picture: Building Astra Docs Chat

Related: Batch ingest · Proxy

Two endpoints, one product

Endpoint	Purpose	Called by
`datastax-astra-ingest`	File → vectors in Astra DB	Local batch ingest script (batch ingest post)
`datastax-astra-chat`	Question → retrieve → LLM answer	Pages Function `/api/astra-chat` (proxy post)

Splitting ingest and chat keeps the public surface minimal. Visitors can never trigger file uploads or writes to the vector store; they only reach the retrieval + generation path (which embeds the question, never the corpus).

The main RAG template flow (DataStax Astra Docs RAG) originally combined both paths. I later cloned the ingest chain into a dedicated flow so batch jobs hit a narrow endpoint without touching chat configuration.

Ingest flow

```text
File-ingest
  → SplitText-ingest
  → OpenAI Embeddings (text-embedding-3-small)
  → AstraDB-ingest (collection: datastax_astra_docs)
```

File: path supplied per run via API tweak. The batch script uploads markdown to /api/v2/files/, then sets File-ingest.path to the returned server path.
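A minimal sketch of the run body for one file, assuming the upload response from /api/v2/files/ exposes the stored file’s server path as a `path` field and that tweaks are keyed by component ID. `UploadResult`, `buildIngestRunBody`, and the example path are hypothetical; some Langflow versions may expect `path` as a list of paths rather than a single string, so check what your File component accepts.

```typescript
// Hypothetical shape of the JSON returned by POST /api/v2/files/.
// Assumption: only the `path` field matters for the ingest run.
interface UploadResult {
  path: string;
}

// Body for POST /api/v1/run/datastax-astra-ingest. Tweaks are keyed by the
// component ID in the flow ("File-ingest"), then by field name.
interface IngestRunBody {
  tweaks: Record<string, Record<string, unknown>>;
}

function buildIngestRunBody(upload: UploadResult): IngestRunBody {
  if (!upload.path) {
    throw new Error("upload response did not include a server path");
  }
  return {
    tweaks: {
      // Point the File component at the file uploaded for this run.
      "File-ingest": { path: upload.path },
    },
  };
}

console.log(JSON.stringify(buildIngestRunBody({ path: "flow-files/page-001.md" })));
// {"tweaks":{"File-ingest":{"path":"flow-files/page-001.md"}}}
```

Keeping the tweak builder separate from the HTTP calls means the same function serves every file in a batch; only the uploaded path changes per run.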

SplitText: breaks each page into chunks before embedding. Default settings on the cloned template are chunk size 1000, overlap 200, separator newline. That is not sacred: retrieval quality depends on it. See chunking technical docs for RAG.
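To make chunk size and overlap concrete, here is a simplified separator-based splitter: split on newlines, then greedily pack lines into chunks of at most `chunkSize` characters, carrying trailing lines worth at most `chunkOverlap` characters into the next chunk. It is an illustrative approximation, not Langflow’s SplitText code; `splitText` is a hypothetical name, and a single line longer than `chunkSize` stays whole here instead of being split further.

```typescript
// Simplified separator-based splitter, for illustration only.
function splitText(
  text: string,
  chunkSize = 1000,
  chunkOverlap = 200,
  separator = "\n",
): string[] {
  const pieces = text.split(separator).filter((p) => p.trim() !== "");
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLen = 0; // always equals current.join(separator).length

  // Length of the current chunk if `piece` were appended to it.
  const lengthWith = (piece: string): number =>
    currentLen + (current.length > 0 ? separator.length : 0) + piece.length;

  for (const piece of pieces) {
    if (current.length > 0 && lengthWith(piece) > chunkSize) {
      chunks.push(current.join(separator));
      // Carry trailing pieces into the next chunk, up to chunkOverlap chars,
      // but never so many that the carried text plus `piece` exceeds chunkSize.
      const budget = Math.min(chunkOverlap, chunkSize - separator.length - piece.length);
      const carried: string[] = [];
      let carriedLen = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        const add = current[i].length + (carried.length > 0 ? separator.length : 0);
        if (carriedLen + add > budget) break;
        carried.unshift(current[i]);
        carriedLen += add;
      }
      current = carried;
      currentLen = carriedLen;
    }
    currentLen = lengthWith(piece);
    current.push(piece);
  }
  if (current.length > 0) chunks.push(current.join(separator));
  return chunks;
}

console.log(JSON.stringify(splitText("aaaa\nbbbb\ncccc", 9, 4)));
// ["aaaa\nbbbb","bbbb\ncccc"]
```

With these toy settings each boundary repeats one line, which is what overlap buys at retrieval time: text near a chunk edge still appears alongside some of its neighbouring context.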

OpenAI Embeddings: text-embedding-3-small for the indexed corpus. Chat uses DeepSeek for generation; embeddings stayed on OpenAI because the template already worked and query-time vectors must match ingest-time vectors.
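Because query-time and ingest-time vectors must come from the same model, any code that embeds queries outside Langflow could carry a cheap guard like the sketch below. `EmbeddingConfig`, `INGEST_EMBEDDING`, and `checkQueryEmbedding` are hypothetical, and the 1536 figure assumes text-embedding-3-small at its default size.

```typescript
// Which embedding model produced a vector, and its dimensionality.
interface EmbeddingConfig {
  model: string;
  dimensions: number;
}

// Assumption: the collection was filled with text-embedding-3-small at its
// default 1536 dimensions.
const INGEST_EMBEDDING: EmbeddingConfig = {
  model: "text-embedding-3-small",
  dimensions: 1536,
};

// Throws if a query vector would be compared against incompatible stored vectors.
function checkQueryEmbedding(query: EmbeddingConfig, vector: number[]): void {
  // The model check matters on its own: text-embedding-ada-002 also returns
  // 1536-dimensional vectors, so a dimension check alone would not catch it.
  if (query.model !== INGEST_EMBEDDING.model) {
    throw new Error(`query model ${query.model} != ingest model ${INGEST_EMBEDDING.model}`);
  }
  if (vector.length !== INGEST_EMBEDDING.dimensions) {
    throw new Error(`query vector has ${vector.length} dims, expected ${INGEST_EMBEDDING.dimensions}`);
  }
}
```

The model comparison is the important half: vectors from different models are not comparable even when their lengths match.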

AstraDB: Langflow’s DataStax bundle component. Writes into collection datastax_astra_docs in default_keyspace. Credentials come from Langflow global variables, not pasted into exported flow JSON.

Langflow ingest flow: Read File, Split Text (chunk size 1000, overlap 200), OpenAI Embeddings, and Astra DB wired into collection datastax_astra_docs

The graph above is the ingest half of the stack: file in, chunks out, vectors written to Astra DB. The batch script sets the file path per run via API tweak; you configure split size and collection name once in Langflow. (The screenshot shows an earlier template with text-embedding-ada-002; production uses text-embedding-3-small, described in the chunking post.)

On success, Langflow logs a message like “Adding N documents to the Vector Store”. The batch script treats that string as the pass condition.
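A minimal sketch of that pass condition, assuming the script has the run output or log text as a string. The regex mirrors the message quoted above; the exact wording could change between Langflow versions, so treat the pattern as an assumption. `ingestedDocumentCount` is a hypothetical helper.

```typescript
// Mirrors the success message quoted above (assumption: wording is stable).
const SUCCESS_PATTERN = /Adding (\d+) documents to the Vector Store/;

// Returns the number of documents written, or null if the message is absent.
function ingestedDocumentCount(runOutput: string): number | null {
  const match = SUCCESS_PATTERN.exec(runOutput);
  return match ? Number(match[1]) : null;
}

console.log(ingestedDocumentCount("... Adding 12 documents to the Vector Store ...")); // 12
console.log(ingestedDocumentCount("Error: collection not found")); // null
```

Returning the count rather than a boolean lets a stricter script also treat `Adding 0 documents` as a failure, since an empty file would otherwise pass.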

Chat flow

```text
ChatInput
  → AstraDB (similarity search, top 4)
  → Parser (context from retrieved chunks)
  → Prompt
  → DeepSeek (deepseek-v4-flash, stream: true)
  → ChatOutput
```

ChatInput: receives input_value and session_id from the run API, with input_type and output_type both set to chat (exactly what the Pages Function forwards).
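A minimal sketch of the request the Pages Function might build, using the same body fields and header as the curl smoke test in the verification section. `buildChatRun` is a hypothetical helper; returning a plain `{ url, init }` pair instead of calling `fetch` directly just keeps it testable.

```typescript
// Body fields ChatInput receives from the run API.
interface ChatRunBody {
  input_value: string;
  session_id: string;
  input_type: "chat";
  output_type: "chat";
}

interface ChatRunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// langflowUrl and apiKey would come from LANGFLOW_URL and LANGFLOW_API_KEY.
function buildChatRun(
  langflowUrl: string,
  apiKey: string,
  question: string,
  sessionId: string,
  stream = true,
): ChatRunRequest {
  const base = langflowUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const body: ChatRunBody = {
    input_value: question,
    session_id: sessionId,
    input_type: "chat",
    output_type: "chat",
  };
  return {
    url: `${base}/api/v1/run/datastax-astra-chat?stream=${stream}`,
    init: {
      method: "POST",
      headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

Inside the function handler, the pair would be passed straight to `fetch(url, init)`.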

AstraDB search: same collection as ingest, same embedding model at query time. Template defaults: Vector Search, Similarity, 4 results, score threshold 0 (no minimum in v1).
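To show what “Similarity, 4 results, score threshold 0” means, here is a brute-force in-memory version of the search. It is a sketch, not Astra’s implementation: it assumes cosine similarity rescaled to [0, 1], so a threshold of 0 filters nothing, matching the template default. I believe Astra reports cosine scores on a similar 0 to 1 scale, but verify that before relying on a non-zero threshold. `StoredChunk`, `similarity`, and `topK` are hypothetical names.

```typescript
interface StoredChunk {
  text: string;
  vector: number[];
}

// Cosine similarity rescaled from [-1, 1] to [0, 1]. Zero vectors score 0.
function similarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return (1 + dot / Math.sqrt(normA * normB)) / 2;
}

// Brute-force top-k: score every chunk, drop those below the threshold,
// keep the k best. A vector store typically uses an approximate index instead.
function topK(query: number[], chunks: StoredChunk[], k = 4, threshold = 0): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: similarity(query, chunk.vector) }))
    .filter((scored) => scored.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

With threshold 0 the search always returns up to four chunks, however weak the match, which is the behaviour the guardrails discussion below builds on.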

Prompt: combines retrieved doc text with instructions to answer as a helpful documentation assistant. The template shape is roughly:

```text
{context}

---

Given the context above, answer the question as best as possible.

Question: {question}

Answer:
```
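A minimal sketch of what Parser plus Prompt do with the template above: join the retrieved chunk texts into `{context}` and substitute the visitor’s question. `buildPrompt` is hypothetical, and the blank-line join is an assumption about how the Parser concatenates chunks.

```typescript
// Template text as shown above; {context} and {question} are the variables.
const TEMPLATE = `{context}

---

Given the context above, answer the question as best as possible.

Question: {question}

Answer:`;

function buildPrompt(chunkTexts: string[], question: string): string {
  const context = chunkTexts.join("\n\n");
  // One pass over the template only, so a retrieved chunk that happens to
  // contain "{question}" is not substituted a second time.
  return TEMPLATE.replace(/\{(context|question)\}/g, (_match: string, name: string) =>
    name === "context" ? context : question,
  );
}
```

With zero retrieved chunks, `{context}` is simply empty and the model still receives the question, which is why v1 can fall back to general knowledge.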

v1 does not hard-refuse when retrieval is empty, so the model may answer from general knowledge. The docs-only guardrails post covers the planned tightening.

DeepSeek: replaced the template’s OpenAI chat node for cost and streaming on technical Q&A. Details in swapping DeepSeek for OpenAI chat in Langflow.

ChatOutput: streams tokens back through Langflow’s NDJSON stream, which the Pages Function converts to SSE for the browser.

Global variables for secrets

Flow components reference secrets such as OPENAI_API_KEY, DEEPSEEK_API_KEY, and the Astra application token through Langflow global variables. Those are set in the Langflow UI or server environment, not typed into component value fields that get exported with the graph.

That matters when you:

Version flows as JSON
Clone flows between environments
Avoid leaking keys in screenshots or git commits
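Global variables remove the main leak path, but a quick scan of exported flow JSON before committing is cheap insurance. A minimal sketch, assuming (as far as I know) that OpenAI and DeepSeek keys start with `sk-` and Astra application tokens with `AstraCS:`; `SECRET_PATTERNS` and `findSecretLikeValues` are hypothetical, and the patterns are heuristics rather than a complete secret detector.

```typescript
// Prefixes of raw credentials worth flagging (assumption; extend as needed).
const SECRET_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{16,}/,
  /AstraCS:[A-Za-z0-9:_-]{16,}/,
];

// Scan exported flow JSON text and return redacted hits, never full values.
function findSecretLikeValues(flowJson: string): string[] {
  const hits: string[] = [];
  for (const pattern of SECRET_PATTERNS) {
    const re = new RegExp(pattern.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(flowJson)) !== null) {
      hits.push(`${match[0].slice(0, 8)}...`);
    }
  }
  return hits;
}

console.log(findSecretLikeValues('{"value":"sk-abcdefghijklmnopqrstuvwxyz"}')); // [ 'sk-abcde...' ]
console.log(findSecretLikeValues('{"value":"OPENAI_API_KEY"}')); // []
```

A pre-commit hook could read each exported flow file, call the function, and fail the commit on any hit. A variable name like OPENAI_API_KEY passes, which is exactly the state global variables should leave the JSON in.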

The Pages Function only holds LANGFLOW_API_KEY and LANGFLOW_URL. Langflow itself holds everything the graph needs to talk to OpenAI, Astra, and DeepSeek. Secrets are layered: a leak of the Pages secrets does not directly reveal the OpenAI, Astra, or DeepSeek keys. It would still let someone run flows at your expense and, if the Langflow key permits flow edits, rewire them.

For where Langflow runs, see self-hosting Langflow behind a public static site.

Why not implement RAG in a Worker?

Cloudflare Workers can call vector APIs and LLMs directly. I skipped reimplementing retrieve-then-prompt because:

Langflow already had the graph: File/split/embed/store and chat/retrieve/LLM were working in the DataStax bundle template
Ingest tweaks are awkward in raw code: per-file File component paths map naturally to Langflow’s tweak API
Iteration speed: prompt and component changes happen in Langflow Playground without redeploying the Hugo site

The Worker/Pages layer stays thin: validate, proxy, transform stream. Intelligence and retrieval orchestration stay where the visual graph is.
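The “validate” step can stay small. A minimal sketch, assuming the browser posts JSON with `question` and `sessionId` fields; those field names, the 2000-character cap, and the session ID pattern are all hypothetical choices, not the proxy’s actual contract.

```typescript
// Hypothetical limits; tune for your own site.
const MAX_QUESTION_CHARS = 2000;
const SESSION_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

type Validated =
  | { ok: true; question: string; sessionId: string }
  | { ok: false; error: string };

// Reject anything that is not a small, well-formed chat request before
// spending a Langflow run on it.
function validateChatRequest(body: unknown): Validated {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { question, sessionId } = body as Record<string, unknown>;
  if (typeof question !== "string" || question.trim() === "") {
    return { ok: false, error: "question is required" };
  }
  if (question.length > MAX_QUESTION_CHARS) {
    return { ok: false, error: "question is too long" };
  }
  if (typeof sessionId !== "string" || !SESSION_ID_PATTERN.test(sessionId)) {
    return { ok: false, error: "invalid sessionId" };
  }
  return { ok: true, question: question.trim(), sessionId };
}
```

Rejecting bad input here is cheaper than letting it reach Langflow, where every accepted request costs an embedding call and an LLM call.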

Trade-off: you operate a private Langflow instance. Acceptable for a personal site; a team might prefer managed retrieval APIs or Cloudflare Vectorize with a single Worker.

Streaming contract

Chat runs with ?stream=true. Langflow emits NDJSON lines:

token events with partial text: {"event":"token","data":{"chunk":"Hello"}}
end event with final result metadata

The proxy normalises that to SSE for Hugo’s vanilla JS client:

```text
data: {"chunk":"Hello"}

data: [DONE]
```

If your flow uses a non-streaming model path, the proxy’s end-event fallback still delivers one chunk. See the proxy post for code.
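A minimal sketch of that normalisation, not the proxy’s actual code. It assumes one JSON event per line (blank lines between events are skipped), token events shaped as shown above, and an end event whose `data.result` follows the non-streaming run response shape (`outputs[0].outputs[0].results.message.text`); that last path is an assumption to check against your Langflow version. `NdjsonToSse`, `frame`, and `finalText` are hypothetical names.

```typescript
// Converts Langflow's line-delimited JSON events into SSE frames.
class NdjsonToSse {
  private buffer = "";
  private sawToken = false;

  // Feed raw upstream text; returns complete SSE frames.
  push(text: string): string[] {
    this.buffer += text;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    const frames: string[] = [];
    for (const line of lines) frames.push(...this.handleLine(line));
    return frames;
  }

  // Call once upstream closes, to process a final unterminated line.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = "";
    return this.handleLine(rest);
  }

  private handleLine(line: string): string[] {
    const trimmed = line.trim();
    if (trimmed === "") return [];
    let parsed: any;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      return []; // skip malformed lines
    }
    if (typeof parsed !== "object" || parsed === null) return [];
    if (parsed.event === "token" && typeof parsed.data?.chunk === "string") {
      this.sawToken = true;
      return [frame({ chunk: parsed.data.chunk })];
    }
    if (parsed.event === "end") {
      const frames: string[] = [];
      // Fallback: no tokens streamed, so send the final text as one chunk.
      const text = finalText(parsed.data);
      if (!this.sawToken && text) frames.push(frame({ chunk: text }));
      frames.push("data: [DONE]\n\n");
      return frames;
    }
    return []; // other event types are ignored
  }
}

function frame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Assumed location of the final message text inside the end event.
function finalText(data: any): string | undefined {
  const text = data?.result?.outputs?.[0]?.outputs?.[0]?.results?.message?.text;
  return typeof text === "string" ? text : undefined;
}
```

In a Pages Function, each decoded chunk of the upstream body would go through `push`, the returned frames would be encoded and enqueued into the response stream, and `flush` would run when upstream ends. Buffering partial lines matters because network chunks do not align with event boundaries.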

Astra DB in this stack

Using Astra for a DataStax-documentation chat is intentional: same product family as the corpus, hybrid search available if you extend the flow, Langflow bundle support out of the box. A focused post on collection setup and search modes: using Astra DB as the RAG vector store.

Verify the flows before going public

Playground: ask “What are PCU groups?” and confirm retrieved content matches known doc wording
curl the chat endpoint with x-api-key from a trusted machine (not from browser devtools on a public page):

```shell
curl -s --compressed \
  -H "x-api-key: $LANGFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_value":"What is a collection in Astra DB?","output_type":"chat","input_type":"chat","session_id":"smoke-test"}' \
  "$LANGFLOW_URL/api/v1/run/datastax-astra-chat?stream=false"
```

Pages preview: proxy + UI on a Cloudflare preview URL
Production: confirm Network tab shows only same-origin /api/astra-chat

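Without x-api-key, Langflow returns 403. Putting the key in browser code would expose it to every visitor, which is why the browser never calls Langflow directly and only talks to the same-origin proxy.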

Next in the series

Building a streaming chat UI in Hugo: front end that consumes the SSE stream
Docs-only guardrails when RAG retrieval finds nothing: tightening answers when retrieval is empty
Batch-ingesting markdown through Langflow: the script that calls the ingest endpoint

Series index: Building Astra Docs Chat

Open Astra Docs Chat and compare answers to Langflow Playground output for the same question.