Building Multi-Tenant RAG Applications

Best practices for building secure, scalable multi-tenant RAG applications.

Vlad Racoare
Advanced, Architecture, Multi-Tenancy

Multi-tenancy is crucial for SaaS applications, but it presents unique challenges when building RAG systems. In this guide, we'll explore best practices for building secure, scalable multi-tenant RAG applications.

What is Multi-Tenancy in RAG?

Multi-tenancy means serving multiple customers (tenants) from a single application instance while keeping their data completely isolated. In RAG systems, this includes:

  • Documents: Each tenant's files are separate
  • Vector embeddings: Tenant data must not leak across boundaries
  • Search results: Users only see their own data
  • Metadata: Custom fields per tenant

Architecture Patterns

Pattern 1: Shared Collection with Filtering

The most cost-effective approach for most applications.

```javascript
// Store all tenants in one vector collection
const vectorDb = new QdrantDb({
  collection: 'shared-collection',
  customerId: 'customer_123',
  datasetId: 'dataset_456'
});

// Filter is applied automatically
const results = await vectorDb.search(query);
```

Pros:

  • Simple to manage
  • Cost-effective (1 collection for all tenants)
  • Scales to 100,000+ tenants
  • Lower memory overhead

Cons:

  • All customers share resources
  • One slow query can affect others
  • Some customers may perceive filter-based isolation as weaker than physical separation

When to use: Start-ups to mid-scale SaaS (< 10,000 customers)
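
What "applied automatically" could look like inside a wrapper such as `QdrantDb` is sketched below. This is an assumption about how such a wrapper might work, not its actual implementation, and `buildTenantFilter` is a hypothetical helper. The filter shape follows the `must` / `key` / `match` condition format used elsewhere in this post.

```javascript
// Hypothetical helper: builds the filter a tenant-scoped wrapper could attach
// to every search. Not taken from any real QdrantDb source.
function buildTenantFilter({ customerId, datasetId }) {
  if (typeof customerId !== 'string' || customerId.length === 0) {
    // Fail closed: an unscoped query would search every tenant's vectors.
    throw new Error('customerId is required');
  }
  const must = [{ key: 'customerId', match: { value: customerId } }];
  if (datasetId) {
    must.push({ key: 'datasetId', match: { value: datasetId } });
  }
  return { must };
}

const tenantFilter = buildTenantFilter({
  customerId: 'customer_123',
  datasetId: 'dataset_456'
});
console.log(JSON.stringify(tenantFilter, null, 2));
```

Throwing when `customerId` is missing is the important design choice here: a shared collection should never be queried without a tenant condition.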

Pattern 2: Per-Customer Collections

Stronger isolation: each customer gets a dedicated vector collection.

```javascript
// Each customer gets their own collection
const collectionName = `customer_${customerId}`;
const vectorDb = new QdrantDb({ collection: collectionName });
```

Pros:

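  • Stronger logical isolation (separate index and storage per customer)
  • Easy to identify resource usage
  • Can move customers independently
  • Fewer noisy-neighbor issues (collections on the same cluster still share hardware)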

Cons:

  • Management complexity
  • Higher costs (5-10x more)
  • Collection limits (Qdrant: on the order of 1,000 collections per cluster is a practical ceiling)
  • Wasted resources for small customers

When to use: Enterprise-focused, < 100 customers, high-value accounts
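
Because the customer ID becomes part of a collection name, it is worth validating before interpolating it. The sketch below is one way to do that; `collectionNameFor` and the allowed character set are assumptions made for illustration, so adjust the pattern to whatever your ID format actually is.

```javascript
// Assumed ID format: letters, digits, underscore, hyphen, at most 64 characters.
const CUSTOMER_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// Hypothetical helper: derives the per-customer collection name and rejects
// IDs that do not fit the expected format.
function collectionNameFor(customerId) {
  if (typeof customerId !== 'string' || !CUSTOMER_ID_PATTERN.test(customerId)) {
    throw new Error(`Invalid customer ID: ${String(customerId)}`);
  }
  return `customer_${customerId}`;
}

console.log(collectionNameFor('abc_123'));
```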

Pattern 3: Hybrid (Recommended at Scale)

Combine both approaches for optimal cost and performance.

```javascript
async function getCollectionStrategy(customerId) {
  const customer = await getCustomer(customerId);

  // Enterprise: dedicated collection
  if (customer.tier === 'enterprise' || customer.vectorCount > 1000000) {
    return { collection: `customer_${customerId}`, useFilters: false };
  }

  // Everyone else: shared with filters
  return { collection: 'shared-collection', useFilters: true };
}
```

Pros:

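  • Combines the low cost of shared collections with isolation where it matters
  • The majority of customers (for example, 95%) stay in the cost-effective shared collection
  • The few enterprise customers (for example, 5%) get dedicated collections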
  • Scales to 100,000+ total customers

Cons:

  • More complex logic
  • Need migration path
  • Two codepaths to maintain

When to use: Scaling SaaS with enterprise tier (10,000+ customers)
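
The strategy object only helps if the search path honors `useFilters`. A minimal sketch of that step follows; `buildSearchArgs` is a hypothetical helper, and the parameter names `vector`, `limit`, and `filter` are assumptions about the underlying search call.

```javascript
// Hypothetical helper: turns a strategy (as returned by getCollectionStrategy)
// into the collection name and parameters for a vector search.
function buildSearchArgs(strategy, customerId, vector, limit = 5) {
  const params = { vector, limit };
  if (strategy.useFilters) {
    // Shared collection: the tenant condition is mandatory.
    params.filter = {
      must: [{ key: 'customerId', match: { value: customerId } }]
    };
  }
  return { collection: strategy.collection, params };
}

const sharedArgs = buildSearchArgs(
  { collection: 'shared-collection', useFilters: true },
  'customer_123',
  [0.1, 0.2, 0.3]
);
console.log(sharedArgs.collection, JSON.stringify(sharedArgs.params.filter));
```

Keeping this in one function means the two codepaths differ in a single place, which makes the hybrid pattern easier to test.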

Security Best Practices

1. Always Filter by Customer ID

Never trust a customer ID supplied in the request. Always derive it from the authenticated session or token.

```javascript
// ❌ INSECURE - user controls customerId
app.post('/search', async (req, res) => {
  const { customerId, query } = req.body; // User could spoof this!
  const results = await search(customerId, query);
  res.json(results);
});

// ✅ SECURE - customerId from auth
app.post('/search', authenticate, async (req, res) => {
  const customerId = req.user.id; // From verified token/session
  const { query } = req.body;
  const results = await search(customerId, query);
  res.json(results);
});
```

2. Server-Side Filtering

Always apply tenant filters on the server. Never rely on filters sent by the client to enforce isolation.

```javascript
// Server enforces tenant isolation
export function makeVectorDb({ customerId, datasetId }) {
  return new QdrantDb({
    collection: 'shared-collection',
    // These are baked into the DB instance
    customerId, // From auth
    datasetId // From auth, or from the request after an ownership check
  });
}

// All queries are automatically scoped
const vectorDb = makeVectorDb({ customerId, datasetId });
const results = await vectorDb.search(query);
// Qdrant only searches this customer's vectors
```
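
When clients are allowed to send their own filters (for example on `department`), the server can still own the tenant conditions. The sketch below shows one way to combine the two; `scopeFilters` and the reserved key list are assumptions for illustration. Because every condition ends up in `must`, client conditions can only narrow the result set, never widen it.

```javascript
// Payload keys that only the server may constrain (assumed field names).
const RESERVED_KEYS = new Set(['customerId', 'datasetId']);

// Hypothetical helper: merges server-owned tenant conditions with flat
// client-supplied conditions of the form { key, match }.
function scopeFilters(tenant, clientFilters = []) {
  if (!tenant || !tenant.customerId || !tenant.datasetId) {
    throw new Error('tenant scope is required');
  }
  // Keep only flat conditions on non-reserved keys; anything else is dropped.
  const safeClientFilters = clientFilters.filter(
    (condition) =>
      condition &&
      typeof condition.key === 'string' &&
      !RESERVED_KEYS.has(condition.key)
  );
  return {
    must: [
      { key: 'customerId', match: { value: tenant.customerId } },
      { key: 'datasetId', match: { value: tenant.datasetId } },
      ...safeClientFilters
    ]
  };
}

const scoped = scopeFilters(
  { customerId: 'customer_123', datasetId: 'dataset_456' },
  [
    { key: 'department', match: { value: 'legal' } },
    { key: 'customerId', match: { value: 'someone_else' } } // dropped
  ]
);
console.log(scoped.must.map((condition) => condition.key));
```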

3. Dataset-Level Access Control

Allow customers to segment their own data.

```javascript
// Upload with metadata
await client.upload('dataset-123', file, {
  metadata: {
    'document.pdf': {
      userId: 'user_456',
      department: 'legal',
      confidential: true
    }
  }
});

// Search with filters (non-confidential legal documents only)
const results = await client.search('dataset-123', query, {
  filters: [
    { key: 'department', match: { value: 'legal' } },
    { key: 'confidential', match: { value: false } }
  ]
});
```

4. Validate Dataset Ownership

Ensure users can only access their own datasets.

```javascript
app.post('/search', authenticate, async (req, res) => {
  const customerId = req.user.id;
  const { datasetId, query } = req.body;

  // Verify ownership (a missing dataset is treated the same as a foreign one)
  const dataset = await getDataset(datasetId);
  if (!dataset || dataset.customerId !== customerId) {
    return res.status(403).json({ error: 'Access denied' });
  }

  // Now safe to search
  const results = await search(customerId, datasetId, query);
  res.json(results);
});
```
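
If several routes accept a `datasetId`, the ownership check can be pulled into reusable middleware. The following is a sketch under assumptions: `requireDatasetOwnership` is a hypothetical name, `getDataset` is injected so the check can be tested without a database, and `req.user` and `req.body` are assumed to be populated by earlier middleware.

```javascript
// Hypothetical middleware factory for Express-style (req, res, next) handlers.
function requireDatasetOwnership(getDataset) {
  return async function checkOwnership(req, res, next) {
    try {
      const dataset = await getDataset(req.body.datasetId);
      if (!dataset || dataset.customerId !== req.user.id) {
        return res.status(403).json({ error: 'Access denied' });
      }
      req.dataset = dataset; // Make the verified dataset available downstream
      return next();
    } catch (err) {
      return next(err); // Forward lookup failures to the error handler
    }
  };
}
```

A route would then be declared as `app.post('/search', authenticate, requireDatasetOwnership(getDataset), handler)`, keeping the handler itself free of access checks.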

Performance Optimization

1. Index Payload Fields

Create indexes for all fields you filter on.

```javascript
// Qdrant needs indexes for efficient filtering
await qdrantClient.createPayloadIndex(collection, {
  field_name: 'customerId',
  field_schema: 'keyword'
});

await qdrantClient.createPayloadIndex(collection, {
  field_name: 'datasetId',
  field_schema: 'keyword'
});
```

Without indexes, filters are slow (full collection scan).

2. Monitor Per-Tenant Usage

Track usage to identify problem customers.

```javascript
const queryTime = endTime - startTime; // milliseconds

// Log query metadata
await logQuery({
  customerId,
  datasetId,
  queryTime,
  resultsCount: results.length,
  timestamp: new Date()
});

// Alert on heavy usage
if (queryTime > 5000) {
  await alertSlowQuery(customerId, datasetId);
}
```
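
A single slow query is noisy; a per-tenant percentile is usually a better signal. The sketch below keeps latencies in memory and computes a nearest-rank percentile per customer. `TenantLatencyTracker` is a hypothetical class for illustration; a production setup would more likely export these numbers to a metrics system.

```javascript
// Hypothetical in-memory tracker of query latencies per customer.
class TenantLatencyTracker {
  constructor() {
    this.samples = new Map(); // customerId -> array of durations in ms
  }

  record(customerId, durationMs) {
    if (!this.samples.has(customerId)) this.samples.set(customerId, []);
    this.samples.get(customerId).push(durationMs);
  }

  // Nearest-rank percentile (p in 0..100) for one tenant, or null if no data.
  percentile(customerId, p) {
    const sorted = [...(this.samples.get(customerId) ?? [])].sort((a, b) => a - b);
    if (sorted.length === 0) return null;
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
  }
}

const tracker = new TenantLatencyTracker();
[10, 20, 30, 400].forEach((ms) => tracker.record('customer_123', ms));
console.log(tracker.percentile('customer_123', 50), tracker.percentile('customer_123', 95));
```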

3. Rate Limiting Per Tenant

Prevent one customer from overwhelming your system.

```javascript
import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: async (req) => {
    const customer = await getCustomer(req.user.id);
    return customer.tier === 'enterprise' ? 1000 : 100;
  },
  keyGenerator: (req) => req.user.id // Per customer
});

// Relies on req.user, so authentication must run before this middleware
app.use('/api', limiter);
```
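
To make the mechanism concrete, here is a dependency-free sketch of a fixed-window limiter keyed by customer. It illustrates the idea only and is not how `express-rate-limit` is implemented; `createTenantLimiter` and its options are hypothetical, and the clock is injected so the behavior can be checked without waiting.

```javascript
// Hypothetical fixed-window limiter: each customer gets `limitFor(customer)`
// requests per window of `windowMs` milliseconds.
function createTenantLimiter({ windowMs, limitFor, now = Date.now }) {
  const windows = new Map(); // customerId -> { start, count }

  return function allow(customer) {
    const t = now();
    let current = windows.get(customer.id);
    if (!current || t - current.start >= windowMs) {
      current = { start: t, count: 0 }; // Start a fresh window
      windows.set(customer.id, current);
    }
    if (current.count >= limitFor(customer)) return false;
    current.count += 1;
    return true;
  };
}

let fakeClock = 0;
const allow = createTenantLimiter({
  windowMs: 60 * 1000,
  limitFor: (customer) => (customer.tier === 'enterprise' ? 3 : 2),
  now: () => fakeClock
});
const freeCustomer = { id: 'a', tier: 'free' };
console.log(allow(freeCustomer), allow(freeCustomer), allow(freeCustomer));
```

This in-memory version only works for a single process. With several API instances, the counters need to live in a shared store.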

Data Isolation Patterns

Pattern 1: Separate Datasets

Customers manage multiple isolated datasets.

```javascript
// Customer has multiple datasets
const datasets = [
  'customer-123-public',
  'customer-123-private',
  'customer-123-archived'
];

// Each dataset is fully isolated
await client.upload('customer-123-private', sensitiveFile);
await client.search('customer-123-private', query);
```

Use case: Different projects, departments, or security levels

Pattern 2: User-Level Isolation

Sub-tenant isolation within a customer.

```javascript
// Upload with user metadata
await client.upload('customer-dataset', file, {
  metadata: {
    'file.pdf': {
      userId: 'user_789',
      sharedWith: ['user_790', 'user_791']
    }
  }
});

// Search only user's documents
const results = await client.search('customer-dataset', query, {
  filters: [
    { key: 'userId', match: { value: currentUserId } }
  ]
});
```

Use case: Multi-user SaaS apps, collaborative platforms
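
The search above returns only documents the user owns, so files listed in `sharedWith` are not found by the people they were shared with. One way to cover both cases is an OR condition, sketched here in Qdrant's raw filter format. `buildUserVisibilityFilter` is a hypothetical helper, and whether the `filters` option of the client used in this post accepts a nested `should` clause is an assumption to verify. The sketch also relies on Qdrant treating a `match` on an array field as satisfied when any element matches, which is worth confirming for the version you run.

```javascript
// Hypothetical helper: a document is visible if the user owns it OR the
// user's ID appears in its sharedWith array.
function buildUserVisibilityFilter(userId) {
  if (typeof userId !== 'string' || userId.length === 0) {
    throw new Error('userId is required');
  }
  return {
    should: [
      { key: 'userId', match: { value: userId } },
      { key: 'sharedWith', match: { value: userId } }
    ]
  };
}

const visibility = buildUserVisibilityFilter('user_790');
console.log(JSON.stringify(visibility));
```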

Pattern 3: Hierarchical Isolation

Nested tenant structures.

```javascript
// Organization → Team → User hierarchy
await client.upload('shared', file, {
  metadata: {
    'file.pdf': {
      orgId: 'org_123',
      teamId: 'team_456',
      userId: 'user_789'
    }
  }
});

// Filter at any level
const orgResults = await client.search('shared', query, {
  filters: [
    { key: 'orgId', match: { value: 'org_123' } }
  ]
});

const teamResults = await client.search('shared', query, {
  filters: [
    { key: 'orgId', match: { value: 'org_123' } },
    { key: 'teamId', match: { value: 'team_456' } }
  ]
});
```

Use case: Enterprise apps with complex org structures
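
Writing the filter arrays by hand for each level gets repetitive. A small builder can derive them from a scope object; `buildScopeFilters` is a hypothetical helper, and making `orgId` mandatory is a design assumption so that no query can span organizations.

```javascript
// Hierarchy levels in order, matching the metadata keys used above.
const SCOPE_LEVELS = ['orgId', 'teamId', 'userId'];

// Hypothetical helper: orgId is required, deeper levels are optional.
function buildScopeFilters(scope) {
  if (!scope || !scope.orgId) {
    throw new Error('orgId is required');
  }
  return SCOPE_LEVELS
    .filter((key) => scope[key] !== undefined && scope[key] !== null)
    .map((key) => ({ key, match: { value: scope[key] } }));
}

const teamScope = buildScopeFilters({ orgId: 'org_123', teamId: 'team_456' });
console.log(JSON.stringify(teamScope));
```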

Billing and Usage Tracking

Track Credits Per Customer

```javascript
// Charge for operations
async function chargeCredits({ customerId, operation, amount }) {
  const customer = await getCustomer(customerId);

  if (customer.credits < amount) {
    throw new Error('INSUFFICIENT_CREDITS');
  }

  // Spread the existing usage so counters for other operations are preserved.
  // This read-then-write is not atomic: concurrent requests need a
  // transaction or an atomic decrement in the data store.
  const usage = customer.usage ?? {};
  await updateCustomer(customerId, {
    credits: customer.credits - amount,
    usage: { ...usage, [operation]: (usage[operation] || 0) + 1 }
  });
}

// Apply to operations
await chargeCredits({
  customerId,
  operation: 'upload',
  amount: 1 // 1 credit per file
});

await chargeCredits({
  customerId,
  operation: 'query',
  amount: 0.1 // 0.1 credits per query
});
```
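
Fractional prices such as 0.1 credits cannot be represented exactly as floating-point numbers, so repeated charges can accumulate rounding error in the balance. A common workaround is to store balances as integers in a smaller unit. The sketch below uses milli-credits; `CreditLedger` is a hypothetical in-memory stand-in for whatever store holds customer balances.

```javascript
const MILLIS_PER_CREDIT = 1000;

// Convert a credit amount (possibly fractional) to integer milli-credits.
const toMillis = (credits) => Math.round(credits * MILLIS_PER_CREDIT);

// Hypothetical in-memory ledger that keeps balances as integers.
class CreditLedger {
  constructor() {
    this.balances = new Map(); // customerId -> balance in milli-credits
  }

  deposit(customerId, credits) {
    const balance = this.balances.get(customerId) ?? 0;
    this.balances.set(customerId, balance + toMillis(credits));
  }

  charge(customerId, credits) {
    const cost = toMillis(credits);
    const balance = this.balances.get(customerId) ?? 0;
    if (balance < cost) throw new Error('INSUFFICIENT_CREDITS');
    this.balances.set(customerId, balance - cost);
  }

  // Balance expressed in credits, for display.
  balance(customerId) {
    return (this.balances.get(customerId) ?? 0) / MILLIS_PER_CREDIT;
  }
}

const ledger = new CreditLedger();
ledger.deposit('customer_123', 1);
for (let i = 0; i < 10; i += 1) ledger.charge('customer_123', 0.1);
console.log(ledger.balance('customer_123'));
```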

Usage Analytics

```javascript
// Track vector count per customer
async function getCustomerUsage(customerId) {
  const vectorCount = await vectorDb.count({
    filter: {
      must: [{ key: 'customerId', match: { value: customerId } }]
    }
  });

  const fileCount = await db.collection('files')
    .where('customerId', '==', customerId)
    .count();

  return { vectorCount, fileCount };
}
```

Testing Multi-Tenancy

Test Data Isolation

```javascript
describe('Multi-tenancy', () => {
  it("should not return other customers' data", async () => {
    // Upload for customer A
    await client.upload('customer-a-dataset', fileA, {
      metadata: { 'file.pdf': { customerId: 'customer-a' } }
    });

    // Upload for customer B
    await client.upload('customer-b-dataset', fileB, {
      metadata: { 'file.pdf': { customerId: 'customer-b' } }
    });

    // Search as customer A
    const clientA = new EasyRAG(customerAToken);
    const resultsA = await clientA.search('customer-a-dataset', 'test');

    // Should only see customer A's data
    expect(resultsA.data.every(r =>
      r.metadata.customerId === 'customer-a'
    )).toBe(true);
  });
});
```

Load Testing Per Tenant

```javascript
// Simulate multiple tenants
async function loadTest() {
  const tenants = Array.from({ length: 100 }, (_, i) => ({
    id: `customer-${i}`,
    token: generateToken(`customer-${i}`)
  }));

  // Concurrent queries from all tenants
  await Promise.all(
    tenants.map(tenant =>
      fetch('/api/search', {
        method: 'POST', // fetch defaults to GET, which cannot carry a body
        headers: {
          Authorization: `Bearer ${tenant.token}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ query: 'test' })
      })
    )
  );
}
```

Migration Strategy

Moving from Single to Multi-Tenant

```javascript
// 1. Add tenant metadata to existing vectors
async function backfillTenantMetadata() {
  const files = await db.collection('files').get();

  for (const file of files) {
    await vectorDb.updatePayload({
      points: file.vectorIds,
      payload: {
        customerId: file.customerId,
        datasetId: file.datasetId
      }
    });
  }
}

// 2. Update queries to use filters

// Before
// const results = await vectorDb.search(query);

// After
const results = await vectorDb.search(query, {
  filter: {
    must: [
      { key: 'customerId', match: { value: customerId } }
    ]
  }
});
```

Moving Customers Between Collections

```javascript
async function migrateCustomer(customerId) {
  const sourceCollection = 'shared-collection';
  // Same naming scheme as the per-customer collections in Pattern 2
  const targetCollection = `customer_${customerId}`;

  // 1. Create new collection
  await createCollection(targetCollection);

  // 2. Copy vectors
  const vectors = await getCustomerVectors(customerId, sourceCollection);
  await insertVectors(targetCollection, vectors);

  // 3. Update references
  await updateCustomerMetadata(customerId, {
    collection: targetCollection
  });

  // 4. Delete from source (after verification)
  await deleteCustomerVectors(customerId, sourceCollection);
}
```
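
The "after verification" in step 4 deserves an explicit check before anything is deleted. A minimal sketch, assuming each vector object carries an `id` field: `verifyMigration` is a hypothetical helper that compares the IDs read from the source with those read back from the target.

```javascript
// Hypothetical guard: reports which source vectors are missing in the target.
function verifyMigration(sourceVectors, targetVectors) {
  const targetIds = new Set(targetVectors.map((vector) => vector.id));
  const missing = sourceVectors
    .filter((vector) => !targetIds.has(vector.id))
    .map((vector) => vector.id);
  return { ok: missing.length === 0, missing };
}

const check = verifyMigration(
  [{ id: 1 }, { id: 2 }, { id: 3 }],
  [{ id: 1 }, { id: 3 }]
);
console.log(check);
```

In `migrateCustomer`, the delete in step 4 would only run when `ok` is true. Otherwise the customer reference should be pointed back at the shared collection and the copy retried.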

Conclusion

Multi-tenancy in RAG systems requires careful consideration of:

  1. Architecture: Choose the right pattern for your scale
  2. Security: Server-side filtering, auth-based isolation
  3. Performance: Proper indexing, monitoring, rate limiting
  4. Testing: Verify data isolation, load test per tenant
  5. Economics: Balance cost vs. isolation needs

Start simple with shared collections and filters. Evolve to hybrid as you scale. Most SaaS companies never need per-customer collections.

Questions? Reach out at support@easyrag.com