Technical reference for the AI-powered question answering endpoint.

POST /v1/query

Get AI-generated answers to questions about your documents. Combines semantic search with GPT-4.1-mini generation.

Authentication

  • API Key or Frontend Token
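
Both credential types are sent in the Authorization header. A minimal sketch of the two header sets, assuming frontend tokens (created via POST /v1/tokens/create) use the same Bearer scheme as API keys:

typescript
// Server-side: authenticate with your secret API key
const apiKey = 'YOUR_API_KEY';
const serverHeaders = {
  Authorization: `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
};

// Browser-side: use a dataset-scoped frontend token instead,
// so the secret API key never ships to the client
const frontendToken = 'YOUR_FRONTEND_TOKEN';
const browserHeaders = {
  Authorization: `Bearer ${frontendToken}`,
  'Content-Type': 'application/json',
};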

Content-Type

http
Content-Type: application/json

Request Body

json
{ "datasetId": "string", "question": "string", "filters": [], "stream": boolean }

Parameters

| Parameter | Type    | Required | Default | Description                                      |
|-----------|---------|----------|---------|--------------------------------------------------|
| datasetId | string  | Yes*     | -       | Dataset to query (*optional with frontend token) |
| question  | string  | Yes      | -       | Question or prompt for AI                        |
| filters   | array   | No       | []      | Qdrant filters for metadata-based filtering      |
| stream    | boolean | No       | true    | Enable streaming responses (SSE)                 |

Non-Streaming Mode

Request

bash
curl -X POST https://api.easyrag.com/v1/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetId": "my-dataset",
    "question": "What is the refund policy?",
    "stream": false
  }'

Response (200)

json
{ "success": true, "data": { "result": "Based on the documentation, our refund policy allows returns within 30 days of purchase for any reason. You can request a refund by contacting support@company.com with your order number. Processing typically takes 5-7 business days.", "sources": [ { "pageContent": "Refunds are available within 30 days of purchase...", "metadata": { "fileId": "f7a3b2c1", "originalName": "refund-policy.pdf" } } ] } }

Response Fields

| Field        | Type    | Description                       |
|--------------|---------|-----------------------------------|
| success      | boolean | Always true on success            |
| data         | object  | Query result object               |
| data.result  | string  | AI-generated answer               |
| data.sources | array   | Source chunks used for the answer |

Note: The exact structure of data depends on EmbedJS's query() method.


Streaming Mode (Default)

Request

bash
curl -X POST https://api.easyrag.com/v1/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetId": "my-dataset",
    "question": "What is the refund policy?",
    "stream": true
  }'

Response Headers

http
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Response Format (SSE)

Server-Sent Events with JSON payloads:

data: {"delta":"Based on"}

data: {"delta":" the documentation,"}

data: {"delta":" our refund"}

data: {"delta":" policy allows"}

data: {"done":true}

Event Types

Text Delta

json
{ "delta": "string" }

Completion

json
{ "done": true }

Error

json
{ "error": "string" }

Example Stream

data: {"delta":"Based on the documentation, "}

data: {"delta":"our refund policy allows "}

data: {"delta":"returns within 30 days "}

data: {"delta":"of purchase for any reason. "}

data: {"delta":"You can request a refund "}

data: {"delta":"by contacting support@company.com "}

data: {"delta":"with your order number."}

data: {"done":true}

Filtering

Apply metadata filters to restrict which documents are considered.

Filter Structure

json
{ "filters": [ { "key": "metadata_field", "match": { "value": "exact_value" } } ] }

Example with Filters

json
{ "datasetId": "company-docs", "question": "What is the vacation policy?", "filters": [ { "key": "department", "match": { "value": "HR" } }, { "key": "year", "match": { "value": 2024 } } ] }

Request Examples

JavaScript (Streaming)

javascript
async function streamQuery(datasetId, question) {
  // Assumes `apiKey` is defined in the enclosing scope
  const response = await fetch('https://api.easyrag.com/v1/query', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ datasetId, question, stream: true })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Buffer partial lines: an SSE event can be split across chunks
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any incomplete trailing line

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.delta) {
          console.log(data.delta);
        } else if (data.done) {
          console.log('Stream complete');
        } else if (data.error) {
          console.error('Error:', data.error);
        }
      }
    }
  }
}

JavaScript (Non-Streaming)

javascript
const response = await fetch('https://api.easyrag.com/v1/query', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    datasetId: 'my-dataset',
    question: 'What are the key features?',
    stream: false
  })
});

const { data } = await response.json();
console.log('Answer:', data.result);

Python (Non-Streaming)

python
import requests

response = requests.post(
    'https://api.easyrag.com/v1/query',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    json={
        'datasetId': 'my-dataset',
        'question': 'What are the key features?',
        'stream': False
    }
)

data = response.json()
answer = data['data']['result']

Python (Streaming)

python
import requests
import json

response = requests.post(
    'https://api.easyrag.com/v1/query',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    json={
        'datasetId': 'my-dataset',
        'question': 'What are the key features?',
        'stream': True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data = json.loads(line_str[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)
            elif 'done' in data:
                print('\nComplete')
            elif 'error' in data:
                print(f'\nError: {data["error"]}')

Error Responses

400 Bad Request

Missing required fields

json
{ "error": "datasetId and question are required" }

Dataset mismatch

json
{ "error": "datasetId mismatch between token and request" }

401 Unauthorized

json
{ "error": "Missing API key or token" }

402 Payment Required

json
{ "error": "INSUFFICIENT_CREDITS", "message": "You are out of credits. Please top up to continue.", "details": { "required": 1, "available": 0 } }

500 Internal Server Error

json
{ "error": "Internal error" }

Stream Error

During streaming, errors are sent as events:

data: {"error":"Streaming error on server"}

Technical Details

Processing Pipeline

  1. Search Phase
    • Question converted to an embedding
    • Top 5-10 relevant chunks retrieved
    • Chunks ranked by similarity score

  2. Context Building
    • Chunks formatted with metadata
    • Source information included
    • Context sent to the LLM

  3. Generation Phase
    • GPT-4.1-mini generates the answer
    • Response streamed or returned complete
    • Answer is based only on the provided context

System Prompt

You are a RAG assistant. Answer ONLY based on the provided context. 
If the context is not enough, say you don't know and do NOT hallucinate.

Context Format

### Document 1
Score: 0.892
Source: user-manual.pdf

[chunk content]

---

### Document 2
Score: 0.856
Source: faq.pdf

[chunk content]

---

User question:
What is the refund policy?
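
If you rebuild this context yourself, for example from /v1/search results fed to your own LLM, a formatter sketch; buildContext is an illustrative name and the server's exact implementation may differ:

typescript
interface Chunk {
  pageContent: string;
  score: number;
  source: string;
}

// Mirrors the documented context layout above
function buildContext(chunks: Chunk[], question: string): string {
  const blocks = chunks.map(
    (chunk, i) =>
      `### Document ${i + 1}\nScore: ${chunk.score.toFixed(3)}\n` +
      `Source: ${chunk.source}\n\n${chunk.pageContent}\n\n---`
  );
  return `${blocks.join('\n\n')}\n\nUser question:\n${question}`;
}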

Models Used

  • Embeddings: OpenAI text-embedding-3-small (1536 dimensions)
  • LLM: GPT-4.1-mini
  • Streaming: OpenAI Responses API

Billing

  • Cost: 0.1 credit per query (1 unit)
  • Same cost: Both streaming and non-streaming
  • Includes: Retrieval + LLM generation
  • Charged: Before processing

| Operation   | Cost       |
|-------------|------------|
| 1 query     | 0.1 credit |
| 10 queries  | 1 credit   |
| 100 queries | 10 credits |
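
Cost scales linearly with query count, so pre-flight estimates are simple. A minimal sketch:

typescript
// 0.1 credit per query, streaming or non-streaming alike
const CREDITS_PER_QUERY = 0.1;

function creditsNeeded(queryCount: number): number {
  return queryCount * CREDITS_PER_QUERY;
}

console.log(creditsNeeded(250)); // 25 credits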

Rate Limits

  • Limit: 1000 requests/minute per customer
  • Shared: With search endpoint
  • Concurrent: No specific limit on concurrent streams
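
For batch jobs, the simplest way to respect the shared 1000 requests/minute ceiling is client-side pacing. A minimal sketch (the API's over-limit behavior isn't documented here, so this avoids hitting the limit rather than reacting to it):

typescript
// Space requests so a batch stays under 1000/minute across query + search
const MIN_INTERVAL_MS = 60_000 / 1000; // 60ms between requests

async function paced<T>(jobs: Array<() => Promise<T>>): Promise<T[]> {
  const results: T[] = [];
  for (const job of jobs) {
    results.push(await job());
    await new Promise((resolve) => setTimeout(resolve, MIN_INTERVAL_MS));
  }
  return results;
}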

Streaming vs Non-Streaming

| Feature        | Streaming                | Non-Streaming                 |
|----------------|--------------------------|-------------------------------|
| UX             | Real-time (word-by-word) | Wait for complete response    |
| Implementation | More complex (SSE)       | Simple (single JSON response) |
| Use Case       | Chat interfaces          | API integrations, batch       |
| Cost           | 0.1 credit               | 0.1 credit                    |
| Response Time  | Starts immediately       | 2-5s average                  |

Use streaming when:

  • Building chat interfaces
  • User experience is critical
  • Long responses expected

Use non-streaming when:

  • API integrations
  • Batch processing
  • Real-time updates aren't needed

Comparison: Query vs Search

| Feature   | /v1/query    | /v1/search |
|-----------|--------------|------------|
| Returns   | AI answer    | Raw chunks |
| LLM       | GPT-4.1-mini | None       |
| Streaming | Yes          | No         |
| Sources   | Referenced   | Included   |
| Cost      | 0.1 credit   | 0.1 credit |
| Speed     | 2-5s         | ~500ms     |

Use /v1/query when:

  • Want ready-to-use answers
  • Building standard chat
  • Trust GPT-4.1-mini generation

Use /v1/search when:

  • Need raw chunks
  • Building custom UI
  • Using own LLM

Best Practices

1. Use Streaming for Chat

javascript
// ✅ Good: Streaming for chat UI
const answer = await streamQuery(datasetId, question);

// ❌ Bad: Non-streaming for chat (poor UX)
const { data } = await query(datasetId, question, { stream: false });

2. Handle Empty Context

javascript
const { data } = await query(datasetId, question, { stream: false });

if (data.result.includes("don't have enough information")) {
  console.log('No relevant documents found');
}

3. Apply Filters Server-Side

javascript
// ✅ Good: Backend controls filters
app.post('/api/query', authenticateUser, async (req, res) => {
  const answer = await easyragQuery({
    datasetId: 'shared-docs',
    question: req.body.question,
    filters: [
      { key: 'userId', match: { value: req.user.id } }
    ]
  });
  res.json(answer);
});

4. Show Loading States

javascript
const [loading, setLoading] = useState(false);

const handleSubmit = async () => {
  setLoading(true);
  try {
    await streamQuery(datasetId, question);
  } finally {
    setLoading(false);
  }
};

TypeScript Definition

typescript
interface QueryRequest {
  datasetId: string;
  question: string;
  filters?: Filter[];
  stream?: boolean;
}

interface Filter {
  key: string;
  match: {
    value: string | number | boolean;
  };
}

// Non-streaming
interface QueryResponse {
  success: true;
  data: {
    result: string;
    sources: Array<{
      pageContent: string;
      metadata: Record<string, any>;
    }>;
  };
}

// Streaming
interface StreamDelta { delta: string; }
interface StreamDone { done: true; }
interface StreamError { error: string; }

type StreamEvent = StreamDelta | StreamDone | StreamError;
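
A small typed wrapper built on these definitions; queryNonStreaming is an illustrative name, not an SDK export:

typescript
async function queryNonStreaming(
  apiKey: string,
  body: QueryRequest
): Promise<QueryResponse> {
  const response = await fetch('https://api.easyrag.com/v1/query', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // Force non-streaming so the response is a single JSON object
    body: JSON.stringify({ ...body, stream: false }),
  });
  if (!response.ok) {
    throw new Error(`Query failed with status ${response.status}`);
  }
  return (await response.json()) as QueryResponse;
}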

Grounded Answers

The AI is instructed to:

DO:

  • Answer based only on provided documents
  • Cite specific information from context
  • Admit when information is insufficient

DON'T:

  • Hallucinate information
  • Make assumptions beyond context
  • Provide answers not supported by documents

Example Responses

Sufficient context:

"Based on the user manual, to reset your password:
1. Click 'Forgot Password' on the login page
2. Enter your email address
3. Check your inbox for the reset link
The link expires after 24 hours."

Insufficient context:

"I don't have enough information in the provided documents 
to answer that question about server configuration."

Notes

  • Queries are charged before processing
  • Empty datasets return "no relevant context" to the LLM
  • Frontend tokens are dataset-scoped for security
  • System prompt prevents hallucinations
  • Context includes source metadata
  • Streaming uses Server-Sent Events (SSE)
  • Minimum SSE chunk size: 30 characters
  • Non-streaming waits for complete response
  • Filters applied before search phase
  • No conversation memory between queries; each request is stateless

Related Endpoints

  • POST /v1/search - Semantic search without LLM
  • POST /v1/files/upload - Upload queryable files
  • GET /v1/files - List indexed files
  • POST /v1/tokens/create - Generate frontend tokens