Technical reference for the AI-powered question answering endpoint.

POST /v1/query

Get AI-generated answers to questions about your documents. Combines semantic search with GPT-4.1-mini generation.

Authentication

  • API Key or Frontend Token
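
Both credential types are sent in the Authorization header. A minimal sketch of the two header sets, assuming frontend tokens (created via POST /v1/tokens/create) use the same Bearer scheme as API keys:

typescript
// Server-side: authenticate with your secret API key
const apiKey = 'YOUR_API_KEY';
const serverHeaders = {
  Authorization: `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
};

// Browser-side: use a dataset-scoped frontend token instead,
// so the secret API key never ships to the client
const frontendToken = 'YOUR_FRONTEND_TOKEN';
const browserHeaders = {
  Authorization: `Bearer ${frontendToken}`,
  'Content-Type': 'application/json',
};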

Content-Type

http
Content-Type: application/json

Request Body

json
{ "datasetId": "string", "question": "string", "filters": [], "stream": boolean }

Parameters

| Parameter | Type    | Required | Default | Description                                      |
|-----------|---------|----------|---------|--------------------------------------------------|
| datasetId | string  | Yes*     | -       | Dataset to query (*optional with frontend token) |
| question  | string  | Yes      | -       | Question or prompt for AI                        |
| filters   | array   | No       | []      | Qdrant filters for metadata-based filtering      |
| stream    | boolean | No       | true    | Enable streaming responses (SSE)                 |

Non-Streaming Mode

Request

bash
curl -X POST https://api.easyrag.com/v1/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetId": "my-dataset",
    "question": "What is the refund policy?",
    "stream": false
  }'

Response (200)

json
{ "success": true, "data": { "result": "Based on the documentation, our refund policy allows returns within 30 days of purchase for any reason. You can request a refund by contacting support@company.com with your order number. Processing typically takes 5-7 business days.", "sources": [ { "pageContent": "Refunds are available within 30 days of purchase...", "metadata": { "fileId": "f7a3b2c1", "originalName": "refund-policy.pdf" } } ] } }

Response Fields

| Field        | Type    | Description                       |
|--------------|---------|-----------------------------------|
| success      | boolean | Always true on success            |
| data         | object  | Query result object               |
| data.result  | string  | AI-generated answer               |
| data.sources | array   | Source chunks used for the answer |

Note: The exact structure of data depends on EmbedJS's query() method.


Streaming Mode (Default)

Request

bash
curl -X POST https://api.easyrag.com/v1/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetId": "my-dataset",
    "question": "What is the refund policy?",
    "stream": true
  }'

Response Headers

http
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Response Format (SSE)

Server-Sent Events with JSON payloads:

data: {"delta":"Based on"}

data: {"delta":" the documentation,"}

data: {"delta":" our refund"}

data: {"delta":" policy allows"}

data: {"done":true}

Event Types

Text Delta

json
{ "delta": "string" }

Completion

json
{ "done": true }

Error

json
{ "error": "string" }

Example Stream

data: {"delta":"Based on the documentation, "}

data: {"delta":"our refund policy allows "}

data: {"delta":"returns within 30 days "}

data: {"delta":"of purchase for any reason. "}

data: {"delta":"You can request a refund "}

data: {"delta":"by contacting support@company.com "}

data: {"delta":"with your order number."}

data: {"done":true}

Filtering

Apply metadata filters to restrict which documents are considered.

Filter Structure

json
{ "filters": [ { "key": "metadata_field", "match": { "value": "exact_value" } } ] }

Example with Filters

json
{ "datasetId": "company-docs", "question": "What is the vacation policy?", "filters": [ { "key": "department", "match": { "value": "HR" } }, { "key": "year", "match": { "value": 2024 } } ] }

Request Examples

JavaScript (Streaming)

javascript
async function streamQuery(datasetId, question) {
  // Assumes `apiKey` is defined in the enclosing scope
  const response = await fetch('https://api.easyrag.com/v1/query', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ datasetId, question, stream: true })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Buffer partial lines: an SSE event can be split across chunks
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any incomplete trailing line

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.delta) {
          console.log(data.delta);
        } else if (data.done) {
          console.log('Stream complete');
        } else if (data.error) {
          console.error('Error:', data.error);
        }
      }
    }
  }
}

JavaScript (Non-Streaming)

javascript
const response = await fetch('https://api.easyrag.com/v1/query', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    datasetId: 'my-dataset',
    question: 'What are the key features?',
    stream: false
  })
});

const { data } = await response.json();
console.log('Answer:', data.result);

Python (Non-Streaming)

python
import requests

response = requests.post(
    'https://api.easyrag.com/v1/query',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    json={
        'datasetId': 'my-dataset',
        'question': 'What are the key features?',
        'stream': False
    }
)

data = response.json()
answer = data['data']['result']

Python (Streaming)

python
import requests
import json

response = requests.post(
    'https://api.easyrag.com/v1/query',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    json={
        'datasetId': 'my-dataset',
        'question': 'What are the key features?',
        'stream': True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data = json.loads(line_str[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)
            elif 'done' in data:
                print('\nComplete')
            elif 'error' in data:
                print(f'\nError: {data["error"]}')

Error Responses

400 Bad Request

Missing required fields

json
{ "error": "datasetId and question are required" }

Dataset mismatch

json
{ "error": "datasetId mismatch between token and request" }

401 Unauthorized

json
{ "error": "Missing API key or token" }

402 Payment Required

json
{ "error": "INSUFFICIENT_CREDITS", "message": "You are out of credits. Please top up to continue.", "details": { "required": 1, "available": 0 } }

500 Internal Server Error

json
{ "error": "Internal error" }

Stream Error

During streaming, errors are sent as events:

data: {"error":"Streaming error on server"}

Technical Details

Processing Pipeline

  1. Search Phase
    • Question converted to an embedding
    • Top 5-10 relevant chunks retrieved
    • Chunks ranked by similarity score

  2. Context Building
    • Chunks formatted with metadata
    • Source information included
    • Context sent to the LLM

  3. Generation Phase
    • GPT-4.1-mini generates the answer
    • Response streamed or returned complete
    • Answer is based only on the provided context

System Prompt

You are a RAG assistant. Answer ONLY based on the provided context. 
If the context is not enough, say you don't know and do NOT hallucinate.

Context Format

### Document 1
Score: 0.892
Source: user-manual.pdf

[chunk content]

---

### Document 2
Score: 0.856
Source: faq.pdf

[chunk content]

---

User question:
What is the refund policy?
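
If you rebuild this context yourself, for example from /v1/search results fed to your own LLM, a formatter sketch; buildContext is an illustrative name and the server's exact implementation may differ:

typescript
interface Chunk {
  pageContent: string;
  score: number;
  source: string;
}

// Mirrors the documented context layout above
function buildContext(chunks: Chunk[], question: string): string {
  const blocks = chunks.map(
    (chunk, i) =>
      `### Document ${i + 1}\nScore: ${chunk.score.toFixed(3)}\n` +
      `Source: ${chunk.source}\n\n${chunk.pageContent}\n\n---`
  );
  return `${blocks.join('\n\n')}\n\nUser question:\n${question}`;
}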

Models Used

  • Embeddings: OpenAI text-embedding-3-small (1536 dimensions)
  • LLM: GPT-4.1-mini
  • Streaming: OpenAI Responses API

Billing

  • Cost: 0.1 credit per query (1 unit)
  • Same cost: Both streaming and non-streaming
  • Includes: Retrieval + LLM generation
  • Charged: Before processing

| Operation   | Cost       |
|-------------|------------|
| 1 query     | 0.1 credit |
| 10 queries  | 1 credit   |
| 100 queries | 10 credits |
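
Cost scales linearly with query count, so pre-flight estimates are simple. A minimal sketch:

typescript
// 0.1 credit per query, streaming or non-streaming alike
const CREDITS_PER_QUERY = 0.1;

function creditsNeeded(queryCount: number): number {
  return queryCount * CREDITS_PER_QUERY;
}

console.log(creditsNeeded(250)); // 25 credits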

Rate Limits

  • Limit: 1000 requests/minute per customer
  • Shared: With search endpoint
  • Concurrent: No specific limit on concurrent streams
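
For batch jobs, the simplest way to respect the shared 1000 requests/minute ceiling is client-side pacing. A minimal sketch (the API's over-limit behavior isn't documented here, so this avoids hitting the limit rather than reacting to it):

typescript
// Space requests so a batch stays under 1000/minute across query + search
const MIN_INTERVAL_MS = 60_000 / 1000; // 60ms between requests

async function paced<T>(jobs: Array<() => Promise<T>>): Promise<T[]> {
  const results: T[] = [];
  for (const job of jobs) {
    results.push(await job());
    await new Promise((resolve) => setTimeout(resolve, MIN_INTERVAL_MS));
  }
  return results;
}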

Streaming vs Non-Streaming

| Feature        | Streaming                | Non-Streaming                 |
|----------------|--------------------------|-------------------------------|
| UX             | Real-time (word-by-word) | Wait for complete response    |
| Implementation | More complex (SSE)       | Simple (single JSON response) |
| Use Case       | Chat interfaces          | API integrations, batch       |
| Cost           | 0.1 credit               | 0.1 credit                    |
| Response Time  | Starts immediately       | 2-5s average                  |

Use streaming when:

  • Building chat interfaces
  • User experience is critical
  • Long responses expected

Use non-streaming when:

  • API integrations
  • Batch processing
  • Real-time updates aren't needed

Comparison: Query vs Search

| Feature   | /v1/query    | /v1/search |
|-----------|--------------|------------|
| Returns   | AI answer    | Raw chunks |
| LLM       | GPT-4.1-mini | None       |
| Streaming | Yes          | No         |
| Sources   | Referenced   | Included   |
| Cost      | 0.1 credit   | 0.1 credit |
| Speed     | 2-5s         | ~500ms     |

Use /v1/query when:

  • Want ready-to-use answers
  • Building standard chat
  • Trust GPT-4.1-mini generation

Use /v1/search when:

  • Need raw chunks
  • Building custom UI
  • Using own LLM

Best Practices

1. Use Streaming for Chat

javascript
// ✅ Good: Streaming for chat UI
const answer = await streamQuery(datasetId, question);

// ❌ Bad: Non-streaming for chat (poor UX)
const { data } = await query(datasetId, question, { stream: false });

2. Handle Empty Context

javascript
const { data } = await query(datasetId, question, { stream: false });

if (data.result.includes("don't have enough information")) {
  console.log('No relevant documents found');
}

3. Apply Filters Server-Side

javascript
// ✅ Good: Backend controls filters
app.post('/api/query', authenticateUser, async (req, res) => {
  const answer = await easyragQuery({
    datasetId: 'shared-docs',
    question: req.body.question,
    filters: [
      { key: 'userId', match: { value: req.user.id } }
    ]
  });
  res.json(answer);
});

4. Show Loading States

javascript
const [loading, setLoading] = useState(false);

const handleSubmit = async () => {
  setLoading(true);
  try {
    await streamQuery(datasetId, question);
  } finally {
    setLoading(false);
  }
};

TypeScript Definition

typescript
interface QueryRequest {
  datasetId: string;
  question: string;
  filters?: Filter[];
  stream?: boolean;
}

interface Filter {
  key: string;
  match: {
    value: string | number | boolean;
  };
}

// Non-streaming
interface QueryResponse {
  success: true;
  data: {
    result: string;
    sources: Array<{
      pageContent: string;
      metadata: Record<string, any>;
    }>;
  };
}

// Streaming
interface StreamDelta { delta: string; }
interface StreamDone { done: true; }
interface StreamError { error: string; }

type StreamEvent = StreamDelta | StreamDone | StreamError;
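
A small typed wrapper built on these definitions; queryNonStreaming is an illustrative name, not an SDK export:

typescript
async function queryNonStreaming(
  apiKey: string,
  body: QueryRequest
): Promise<QueryResponse> {
  const response = await fetch('https://api.easyrag.com/v1/query', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // Force non-streaming so the response is a single JSON object
    body: JSON.stringify({ ...body, stream: false }),
  });
  if (!response.ok) {
    throw new Error(`Query failed with status ${response.status}`);
  }
  return (await response.json()) as QueryResponse;
}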

Grounded Answers

The AI is instructed to:

DO:

  • Answer based only on provided documents
  • Cite specific information from context
  • Admit when information is insufficient

DON'T:

  • Hallucinate information
  • Make assumptions beyond context
  • Provide answers not supported by documents

Example Responses

Sufficient context:

"Based on the user manual, to reset your password:
1. Click 'Forgot Password' on the login page
2. Enter your email address
3. Check your inbox for the reset link
The link expires after 24 hours."

Insufficient context:

"I don't have enough information in the provided documents 
to answer that question about server configuration."

Notes

  • Queries are charged before processing
  • Empty datasets return "no relevant context" to the LLM
  • Frontend tokens are dataset-scoped for security
  • System prompt prevents hallucinations
  • Context includes source metadata
  • Streaming uses Server-Sent Events (SSE)
  • Minimum SSE chunk size: 30 characters
  • Non-streaming waits for complete response
  • Filters applied before search phase
  • No conversation memory between queries; each request is stateless

Related Endpoints

  • POST /v1/search - Semantic search without LLM
  • POST /v1/files/upload - Upload queryable files
  • GET /v1/files - List indexed files
  • POST /v1/tokens/create - Generate frontend tokens