Building Multi-Tenant RAG Applications

Multi-tenancy is crucial for SaaS applications, but it presents unique challenges when building retrieval-augmented generation (RAG) systems. In this guide, we'll explore best practices for building secure, scalable multi-tenant RAG applications.

What is Multi-Tenancy in RAG?

Multi-tenancy means serving multiple customers (tenants) from a single application instance while keeping their data completely isolated. In RAG systems, this includes:

Documents: Each tenant's files are separate
Vector embeddings: Tenant data must not leak across boundaries
Search results: Users only see their own data
Metadata: Custom fields per tenant

Architecture Patterns

Pattern 1: Shared Collection with Filtering

The most cost-effective approach for most applications.

javascript
// Store all tenants in one vector collection
const vectorDb = new QdrantDb({
  collection: 'shared-collection',
  customerId: 'customer_123',
  datasetId: 'dataset_456'
});

// Filter is applied automatically
const results = await vectorDb.search(query);

Pros:

Simple to manage
Cost-effective (1 collection for all tenants)
Scales to 100,000+ tenants
Lower memory overhead

Cons:

All customers share resources
One slow query can affect others
Perceived isolation concerns

When to use: Start-ups to mid-scale SaaS (< 10,000 customers)
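
The automatic filter in the snippet above can be pictured as a small helper that turns the tenant identifiers into a Qdrant-style `must` filter. This is a minimal sketch: `buildTenantFilter` is a hypothetical name, not part of the `QdrantDb` wrapper, and the payload keys `customerId` and `datasetId` are assumed to be the ones stored with each vector.

```javascript
// Hypothetical helper: builds the filter a shared-collection wrapper
// might attach to every query. Payload key names are assumptions.
function buildTenantFilter({ customerId, datasetId }) {
  if (!customerId) {
    // Fail closed: an unscoped query would search every tenant's vectors.
    throw new Error('customerId is required for shared-collection queries');
  }
  const must = [{ key: 'customerId', match: { value: customerId } }];
  if (datasetId) {
    must.push({ key: 'datasetId', match: { value: datasetId } });
  }
  return { must };
}
```

Throwing on a missing `customerId` is a deliberate choice here: an error is easier to notice than a query that silently searches across tenants.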

Pattern 2: Per-Customer Collections

Complete isolation with dedicated vector collections per customer.

javascript
// Each customer gets their own collection
const collectionName = `customer_${customerId}`;
const vectorDb = new QdrantDb({
  collection: collectionName
});

Pros:

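Strong logical isolation (a dedicated collection, though not dedicated hardware unless collections are placed on separate nodes)
Easy to identify resource usage
Can move customers independently
Fewer noisy-neighbor issues at the collection level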

Cons:

Management complexity
Higher costs (5-10x more)
Collection limits (Qdrant: ~1000)
Wasted resources for small customers

When to use: Enterprise-focused, < 100 customers, high-value accounts
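
Because the collection name is derived from the customer ID, it is worth validating the ID before interpolating it. The sketch below assumes IDs are short alphanumeric strings; `collectionNameFor` and the 64-character limit are illustrative choices, not rules taken from Qdrant.

```javascript
// Hypothetical helper: derives a per-customer collection name and rejects
// IDs containing unexpected characters. The allowed pattern is an assumption.
function collectionNameFor(customerId) {
  const valid =
    typeof customerId === 'string' && /^[A-Za-z0-9_-]{1,64}$/.test(customerId);
  if (!valid) {
    throw new Error(`Invalid customer ID: ${String(customerId)}`);
  }
  return `customer_${customerId}`;
}
```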

Pattern 3: Hybrid (Recommended at Scale)

Combine both approaches for optimal cost and performance.

javascript
async function getCollectionStrategy(customerId) {
  const customer = await getCustomer(customerId);
  
  // Enterprise: dedicated collection
  if (customer.tier === 'enterprise' || customer.vectorCount > 1000000) {
    return {
      collection: `customer_${customerId}`,
      useFilters: false
    };
  }
  
  // Everyone else: shared with filters
  return {
    collection: 'shared-collection',
    useFilters: true
  };
}

Pros:

Best of both worlds
Most customers (around 95% in this example) stay in the cost-effective shared collection
The remaining enterprise customers (around 5%) get dedicated isolation
Scales to 100,000+ total customers

Cons:

More complex logic
Need migration path
Two codepaths to maintain

When to use: Scaling SaaS with enterprise tier (10,000+ customers)
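
One way to keep the two codepaths close together is to turn the strategy object into search parameters in a single place. The following is a sketch under assumptions: `buildSearchParams` is a hypothetical helper, and it assumes that dataset filtering still applies inside a dedicated collection.

```javascript
// Hypothetical helper: converts the result of getCollectionStrategy()
// into the collection name and filter to use for a search.
function buildSearchParams(strategy, { customerId, datasetId }) {
  const must = [];
  if (strategy.useFilters) {
    // Shared collection: the tenant filter is mandatory.
    must.push({ key: 'customerId', match: { value: customerId } });
  }
  if (datasetId) {
    // Assumption: datasets are still separated by payload in both modes.
    must.push({ key: 'datasetId', match: { value: datasetId } });
  }
  return {
    collection: strategy.collection,
    filter: must.length > 0 ? { must } : undefined
  };
}
```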

Security Best Practices

1. Always Filter by Customer ID

Never trust a customer ID supplied in the request body. Always derive it from the authenticated session or token.

javascript
// ❌ INSECURE - user controls customerId
app.post('/search', async (req, res) => {
  const { customerId, query } = req.body; // User could spoof this!
  const results = await search(customerId, query);
  res.json(results);
});

// ✅ SECURE - customerId from auth
app.post('/search', authenticate, async (req, res) => {
  const customerId = req.user.id; // From verified token/session
  const { query } = req.body;
  const results = await search(customerId, query);
  res.json(results);
});

2. Server-Side Filtering

Always apply tenant filters on the server, and never trust filters supplied by the client.

javascript
// Server enforces tenant isolation
export function makeVectorDb({ customerId, datasetId }) {
  return new QdrantDb({
    collection: 'shared-collection',
    // These are baked into the DB instance
    customerId,  // From auth
    datasetId    // From auth or request
  });
}

// All queries are automatically scoped
const vectorDb = makeVectorDb({ customerId, datasetId });
const results = await vectorDb.search(query);
// Qdrant only searches this customer's vectors
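
If clients are allowed to send their own filters, the server still has to make sure those filters cannot widen the tenant scope. Here is a minimal sketch of that idea; `scopeFilters` is a hypothetical helper, and the `{ key, match }` condition shape is assumed to match the filters used elsewhere in this guide.

```javascript
// Payload keys that only the server may set.
const TENANT_KEYS = new Set(['customerId', 'datasetId']);

// Hypothetical helper: combines server-enforced tenant conditions with
// client-supplied filters, dropping any client condition on a tenant key.
function scopeFilters(tenant, clientFilters = []) {
  const safeClientFilters = clientFilters.filter(
    (f) => f && typeof f.key === 'string' && !TENANT_KEYS.has(f.key)
  );
  return {
    must: [
      { key: 'customerId', match: { value: tenant.customerId } },
      { key: 'datasetId', match: { value: tenant.datasetId } },
      ...safeClientFilters
    ]
  };
}
```

Dropping the offending condition keeps the request working; rejecting the whole request with a 400 would be an equally reasonable policy.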

3. Dataset-Level Access Control

Let customers segment their own data by attaching metadata to each document and filtering on it at query time.

javascript
// Upload with metadata
await client.upload('dataset-123', file, {
  metadata: {
    'document.pdf': {
      userId: 'user_456',
      department: 'legal',
      confidential: true
    }
  }
});

// Search with filters
const results = await client.search('dataset-123', query, {
  filters: [
    { key: 'department', match: { value: 'legal' } },
    { key: 'confidential', match: { value: false } }
  ]
});

4. Validate Dataset Ownership

Ensure users can only access their own datasets.

javascript
app.post('/search', authenticate, async (req, res) => {
  const customerId = req.user.id;
  const { datasetId, query } = req.body;
  
  // Verify ownership (also reject datasets that do not exist)
  const dataset = await getDataset(datasetId);
  if (!dataset || dataset.customerId !== customerId) {
    return res.status(403).json({ error: 'Access denied' });
  }
  
  // Now safe to search
  const results = await search(customerId, datasetId, query);
  res.json(results);
});

Performance Optimization

1. Index Payload Fields

Create indexes for all fields you filter on.

javascript
// Qdrant needs indexes for efficient filtering
await qdrantClient.createPayloadIndex(collection, {
  field_name: 'customerId',
  field_schema: 'keyword'
});

await qdrantClient.createPayloadIndex(collection, {
  field_name: 'datasetId',
  field_schema: 'keyword'
});

Without indexes, filters are slow (full collection scan).
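
Index creation can be wrapped in a small setup routine that runs when a collection is provisioned. The sketch below is an assumption-laden example: `ensureTenantIndexes` is a hypothetical helper, and the `is_tenant` flag on the keyword index is an option described in Qdrant's multitenancy documentation for recent server versions, so check that your version supports it before relying on it.

```javascript
// Hypothetical setup routine: creates the payload indexes used for
// tenant filtering. `qdrantClient` is assumed to expose
// createPayloadIndex(collection, options) as in the snippet above.
async function ensureTenantIndexes(qdrantClient, collection) {
  const indexes = [
    // is_tenant marks the main partitioning key (verify version support).
    { field_name: 'customerId', field_schema: { type: 'keyword', is_tenant: true } },
    { field_name: 'datasetId', field_schema: 'keyword' }
  ];
  for (const index of indexes) {
    await qdrantClient.createPayloadIndex(collection, index);
  }
  return indexes.map((index) => index.field_name);
}
```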

2. Monitor Per-Tenant Usage

Track usage to identify problem customers.

javascript
// Log query metadata
const queryTime = endTime - startTime; // milliseconds
await logQuery({
  customerId,
  datasetId,
  queryTime,
  resultsCount: results.length,
  timestamp: new Date()
});

// Alert on slow queries (over 5 seconds)
if (queryTime > 5000) {
  await alertSlowQuery(customerId, datasetId);
}
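
A single slow query is noisy, so it often helps to aggregate the logged timings per customer before alerting. This is a sketch, assuming log entries carry the `customerId` and `queryTime` fields logged above; `summarizeLatency` is a hypothetical helper that uses the nearest-rank method for the 95th percentile.

```javascript
// Hypothetical helper: groups logged query times by customer and
// reports count, p95 and max for each one.
function summarizeLatency(queryLogs) {
  const byCustomer = new Map();
  for (const { customerId, queryTime } of queryLogs) {
    if (!byCustomer.has(customerId)) byCustomer.set(customerId, []);
    byCustomer.get(customerId).push(queryTime);
  }

  const summary = {};
  for (const [customerId, times] of byCustomer) {
    times.sort((a, b) => a - b);
    // Nearest rank: smallest sample with at least 95% of samples at or below it.
    const rank = Math.ceil((95 * times.length) / 100);
    summary[customerId] = {
      count: times.length,
      p95: times[rank - 1],
      max: times[times.length - 1]
    };
  }
  return summary;
}
```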

3. Rate Limiting Per Tenant

Prevent one customer from overwhelming your system.

javascript
import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: async (req) => {
    const customer = await getCustomer(req.user.id);
    return customer.tier === 'enterprise' ? 1000 : 100;
  },
  keyGenerator: (req) => req.user.id // Per customer
});

app.use('/api', authenticate, limiter); // authenticate must run first so req.user is set
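
To make the behavior concrete, here is a simplified fixed-window counter keyed by customer. It is only an illustration of the idea, not how `express-rate-limit` is implemented internally; `createTenantLimiter`, `limitFor`, and the injectable `now` clock are all hypothetical names.

```javascript
// Hypothetical in-memory limiter: one fixed window per customer.
function createTenantLimiter({ windowMs, limitFor, now = Date.now }) {
  const windows = new Map(); // customerId -> { start, count }

  return function allow(customerId, tier) {
    const currentTime = now();
    let win = windows.get(customerId);
    if (!win || currentTime - win.start >= windowMs) {
      // Start a fresh window for this customer.
      win = { start: currentTime, count: 0 };
      windows.set(customerId, win);
    }
    if (win.count >= limitFor(tier)) return false; // over the limit
    win.count += 1;
    return true;
  };
}
```

State held in process memory is not shared between server instances, so a deployment with several instances would need a shared store for the counters.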

Data Isolation Patterns

Pattern 1: Separate Datasets

Customers manage multiple isolated datasets.

javascript
// Customer has multiple datasets
const datasets = [
  'customer-123-public',
  'customer-123-private',
  'customer-123-archived'
];

// Each dataset is fully isolated
await client.upload('customer-123-private', sensitiveFile);
await client.search('customer-123-private', query);

Use case: Different projects, departments, or security levels

Pattern 2: User-Level Isolation

Sub-tenant isolation within a customer.

javascript
// Upload with user metadata
await client.upload('customer-dataset', file, {
  metadata: {
    'file.pdf': { 
      userId: 'user_789',
      sharedWith: ['user_790', 'user_791']
    }
  }
});

// Search only user's documents
const results = await client.search('customer-dataset', query, {
  filters: [
    { key: 'userId', match: { value: currentUserId } }
  ]
});

Use case: Multi-user SaaS apps, collaborative platforms
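
The search above only matches documents the user owns. To also return documents listed in `sharedWith`, the condition can be widened to "owner or shared with". The sketch below uses a Qdrant-style nested `should` clause, where a match on an array field succeeds if any element equals the value; whether `client.search` accepts nested conditions like this is an assumption, and `buildUserAccessCondition` is a hypothetical helper.

```javascript
// Hypothetical helper: a condition satisfied when the user owns the
// document or appears in its sharedWith list.
function buildUserAccessCondition(userId) {
  if (!userId) {
    throw new Error('userId is required');
  }
  return {
    should: [
      { key: 'userId', match: { value: userId } },
      { key: 'sharedWith', match: { value: userId } }
    ]
  };
}

// Usage sketch: nest the condition inside `must`, next to the tenant
// condition, so that both have to hold.
const exampleFilter = {
  must: [
    { key: 'customerId', match: { value: 'customer_123' } },
    buildUserAccessCondition('user_790')
  ]
};
```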

Pattern 3: Hierarchical Isolation

Nested tenant structures.

javascript
// Organization → Team → User hierarchy
await client.upload('shared', file, {
  metadata: {
    'file.pdf': {
      orgId: 'org_123',
      teamId: 'team_456',
      userId: 'user_789'
    }
  }
});

// Filter at any level
const orgResults = await client.search('shared', query, {
  filters: [
    { key: 'orgId', match: { value: 'org_123' } }
  ]
});

const teamResults = await client.search('shared', query, {
  filters: [
    { key: 'orgId', match: { value: 'org_123' } },
    { key: 'teamId', match: { value: 'team_456' } }
  ]
});

Use case: Enterprise apps with complex org structures
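
The per-level filter arrays above can be generated from a single scope object. This sketch assumes the hierarchy is strictly organization, then team, then user, and that a lower level is only meaningful together with its parents; `buildScopeFilters` is a hypothetical helper.

```javascript
// Hierarchy levels from broadest to narrowest.
const LEVELS = ['orgId', 'teamId', 'userId'];

// Hypothetical helper: builds the `filters` array for a scope such as
// { orgId, teamId }. Rejects scopes that skip a parent level.
function buildScopeFilters(scope) {
  const filters = [];
  let gapSeen = false;
  for (const key of LEVELS) {
    const value = scope[key];
    if (value === undefined || value === null) {
      gapSeen = true;
      continue;
    }
    if (gapSeen) {
      throw new Error(`${key} was given without its parent level`);
    }
    filters.push({ key, match: { value } });
  }
  if (filters.length === 0) {
    throw new Error('orgId is required');
  }
  return filters;
}
```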

Billing and Usage Tracking

Track Credits Per Customer

javascript
// Charge for operations
async function chargeCredits({ customerId, operation, amount }) {
  const customer = await getCustomer(customerId);
  
  if (customer.credits < amount) {
    throw new Error('INSUFFICIENT_CREDITS');
  }
  
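  // Note: this read-then-write is not atomic, so concurrent requests can overspend.
  // Use a transaction or an atomic decrement if your datastore supports one.
  await updateCustomer(customerId, {
    credits: customer.credits - amount,
    usage: {
      ...customer.usage, // keep counters for other operations
      [operation]: ((customer.usage && customer.usage[operation]) || 0) + 1
    }
  });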
}

// Apply to operations
await chargeCredits({
  customerId,
  operation: 'upload',
  amount: 1 // 1 credit per file
});

await chargeCredits({
  customerId,
  operation: 'query',
  amount: 0.1 // 0.1 credits per query
});
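
Fractional charges such as 0.1 credits accumulate rounding error when balances are stored as floating-point numbers. One common way around this is to store balances as integers in a smaller unit. The sketch below uses thousandths of a credit; the unit and the helper names `toMillicredits` and `debit` are illustrative assumptions.

```javascript
// Balances are stored as integer millicredits (1 credit = 1000 millicredits).
const MILLICREDITS_PER_CREDIT = 1000;

// Converts a credit amount such as 0.1 into an integer number of millicredits.
function toMillicredits(credits) {
  return Math.round(credits * MILLICREDITS_PER_CREDIT);
}

// Returns the new balance, or throws if the balance is too low.
function debit(balanceMillicredits, costMillicredits) {
  if (!Number.isInteger(balanceMillicredits) || !Number.isInteger(costMillicredits)) {
    throw new TypeError('amounts must be integer millicredits');
  }
  if (costMillicredits < 0) {
    throw new RangeError('cost must not be negative');
  }
  if (balanceMillicredits < costMillicredits) {
    throw new Error('INSUFFICIENT_CREDITS');
  }
  return balanceMillicredits - costMillicredits;
}
```

With integer units, ten queries at 0.1 credits each consume exactly one credit.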

Usage Analytics

javascript
// Track vector count per customer
async function getCustomerUsage(customerId) {
  const vectorCount = await vectorDb.count({
    filter: {
      must: [{ key: 'customerId', match: { value: customerId } }]
    }
  });
  
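  // Assumes a Firestore-style API, where count() returns an aggregate query
  const fileSnapshot = await db.collection('files')
    .where('customerId', '==', customerId)
    .count()
    .get();
  const fileCount = fileSnapshot.data().count;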
    
  return { vectorCount, fileCount };
}

Testing Multi-Tenancy

Test Data Isolation

javascript
describe('Multi-tenancy', () => {
  it('should not return other customers data', async () => {
    // Upload for customer A
    await client.upload('customer-a-dataset', fileA, {
      metadata: { 'file.pdf': { customerId: 'customer-a' } }
    });
    
    // Upload for customer B
    await client.upload('customer-b-dataset', fileB, {
      metadata: { 'file.pdf': { customerId: 'customer-b' } }
    });
    
    // Search as customer A
    const clientA = new EasyRAG(customerAToken);
    const resultsA = await clientA.search('customer-a-dataset', 'test');
    
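    // Guard against a vacuous pass: every() is true for an empty array
    expect(resultsA.data.length).toBeGreaterThan(0);

    // Should only see customer A's data
    expect(resultsA.data.every(r => 
      r.metadata.customerId === 'customer-a'
    )).toBe(true);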
  });
});

Load Testing Per Tenant

javascript
// Simulate multiple tenants
async function loadTest() {
  const tenants = Array.from({ length: 100 }, (_, i) => ({
    id: `customer-${i}`,
    token: generateToken(`customer-${i}`)
  }));
  
  // Concurrent queries from all tenants
  await Promise.all(
    tenants.map(tenant => 
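      fetch('/api/search', {
        method: 'POST', // fetch defaults to GET, which cannot carry a body
        headers: {
          Authorization: `Bearer ${tenant.token}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ query: 'test' })
      })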
    )
  );
}

Migration Strategy

Moving from Single to Multi-Tenant

javascript
// 1. Add tenant metadata to existing vectors
async function backfillTenantMetadata() {
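  // Assumes a Firestore-style API: get() returns a snapshot whose docs hold the records
  const snapshot = await db.collection('files').get();
  
  for (const doc of snapshot.docs) {
    const file = doc.data();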
    await vectorDb.updatePayload({
      points: file.vectorIds,
      payload: {
        customerId: file.customerId,
        datasetId: file.datasetId
      }
    });
  }
}

// 2. Update queries to use filters
// Before (unscoped):
//   const results = await vectorDb.search(query);

// After (scoped to the tenant):
const results = await vectorDb.search(query, {
  filter: {
    must: [
      { key: 'customerId', match: { value: customerId } }
    ]
  }
});
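
For files with many vectors, the backfill in step 1 is gentler on the database when the payload update is sent in batches. This is a sketch under assumptions: `chunk` and `updatePayloadInBatches` are hypothetical helpers, the batch size of 500 is arbitrary, and `vectorDb.updatePayload` is assumed to accept the same arguments as in the snippet above.

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: applies the tenant payload to one file's vectors,
// one batch at a time. Returns the number of batches sent.
async function updatePayloadInBatches(vectorDb, file, batchSize = 500) {
  const batches = chunk(file.vectorIds, batchSize);
  for (const points of batches) {
    await vectorDb.updatePayload({
      points,
      payload: { customerId: file.customerId, datasetId: file.datasetId }
    });
  }
  return batches.length;
}
```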

Moving Customers Between Collections

javascript
async function migrateCustomer(customerId) {
  const sourceCollection = 'shared-collection';
  const targetCollection = `customer_${customerId}`; // same naming as the per-customer pattern
  
  // 1. Create new collection
  await createCollection(targetCollection);
  
  // 2. Copy vectors
  const vectors = await getCustomerVectors(customerId, sourceCollection);
  await insertVectors(targetCollection, vectors);
  
  // 3. Update references
  await updateCustomerMetadata(customerId, {
    collection: targetCollection
  });
  
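  // 4. Verify the copy, then delete from source
  // countVectors is a hypothetical helper returning the number of points in a collection
  const copied = await countVectors(targetCollection);
  if (copied !== vectors.length) {
    throw new Error(`Migration of ${customerId} incomplete: ${copied}/${vectors.length} vectors copied`);
  }
  await deleteCustomerVectors(customerId, sourceCollection);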
}

Conclusion

Multi-tenancy in RAG systems requires careful consideration of:

Architecture: Choose the right pattern for your scale
Security: Server-side filtering, auth-based isolation
Performance: Proper indexing, monitoring, rate limiting
Testing: Verify data isolation, load test per tenant
Economics: Balance cost vs. isolation needs

Start simple with shared collections and filters. Evolve to hybrid as you scale. Most SaaS companies never need per-customer collections.

Resources

Questions? Reach out at support@easyrag.com