vendure-data-hub-plugin

Production Setup

Best practices for deploying Data Hub in production.

Pre-Deployment Checklist

Configuration

Security

Infrastructure

Environment Variables

Use environment variables for all sensitive configuration:

# Database connections
ERP_DB_HOST=db.production.internal
ERP_DB_USER=vendure_reader
ERP_DB_PASSWORD=secure-password

# API keys
SUPPLIER_API_KEY=sk_live_...
GOOGLE_MERCHANT_API_KEY=...

# AWS credentials (for S3)
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
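
Validating these variables at boot catches misconfiguration before the first pipeline run fails with an opaque connection error. A minimal sketch, assuming the variables are loaded by your process manager or a tool such as dotenv; `requireEnv` is a hypothetical helper, not part of the plugin:

```typescript
// Hypothetical helper: fail fast at startup when a required variable is
// missing, rather than failing later mid-pipeline.
function requireEnv(name: string): string {
    const value = process.env[name];
    if (value === undefined || value === '') {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// Usage near the top of vendure-config.ts:
// const erpDbPassword = requireEnv('ERP_DB_PASSWORD');
```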

Plugin Configuration

DataHubPlugin.init({
    enabled: true,
    debug: false,
    retentionDaysRuns: 30,
    retentionDaysErrors: 90,

    secrets: [
        { code: 'supplier-api', provider: 'ENV', value: 'SUPPLIER_API_KEY' },
        { code: 'erp-db-password', provider: 'ENV', value: 'ERP_DB_PASSWORD' },
    ],

    connections: [
        {
            code: 'erp-db',
            type: 'postgres',
            name: 'ERP Database',
            settings: {
                host: '${ERP_DB_HOST}',
                port: 5432,
                database: 'erp',
                username: '${ERP_DB_USER}',
                password: '${ERP_DB_PASSWORD}',
                ssl: true,
                poolSize: 5,
            },
        },
    ],
})
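
The `${VAR}` placeholders in the connection settings follow the usual environment-interpolation convention. In case you need to resolve such placeholders yourself (for example, when building settings objects outside the plugin), a minimal, hypothetical resolver illustrating the convention:

```typescript
// Hypothetical: expand ${VAR} references in a settings string from the
// environment. Throws on unresolved placeholders so typos surface at boot.
function resolvePlaceholders(
    value: string,
    env: Record<string, string | undefined> = process.env,
): string {
    return value.replace(/\$\{([A-Z0-9_]+)\}/g, (match: string, name: string) => {
        const resolved = env[name];
        if (resolved === undefined) {
            throw new Error(`Unresolved placeholder: ${match}`);
        }
        return resolved;
    });
}
```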

Job Queue Setup

Single Server

For smaller deployments, the default configuration works:

jobQueueOptions: {
    activeQueues: ['default', 'data-hub.run', 'data-hub.schedule'],
}

Multiple Workers

For high-volume processing, run dedicated workers:

// Main server - handles API requests
jobQueueOptions: {
    activeQueues: ['default'],
}

// Worker process - handles data hub jobs
jobQueueOptions: {
    activeQueues: ['data-hub.run', 'data-hub.schedule'],
}

Worker Script

// worker.ts
import { bootstrapWorker } from '@vendure/core';
import config from './vendure-config';

bootstrapWorker({
    ...config,
    jobQueueOptions: {
        activeQueues: ['data-hub.run', 'data-hub.schedule'],
        pollInterval: 1000,
    },
})
    .then(worker => worker.startJobQueue())
    .catch(err => {
        console.error('Worker failed to start:', err);
        process.exit(1);
    });

Database Considerations

Connection Pooling

Limit connection pool size to prevent exhausting database connections:

connections: [
    {
        code: 'external-db',
        type: 'postgres',
        settings: {
            poolSize: 5,  // Limit concurrent connections
        },
    },
]
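
Remember that every server and worker process opens its own pool, so the effective total against the external database is processes × poolSize. A quick sanity check with illustrative numbers:

```typescript
// Each Vendure process (server or worker) gets its own connection pool.
const instances = 3;     // e.g. 2 servers + 1 dedicated worker
const poolSize = 5;      // per-process pool, as configured above
const totalConnections = instances * poolSize;  // 15

// Compare against the database's limit (PostgreSQL defaults to
// max_connections = 100, shared with every other client of that database).
const maxConnections = 100;
console.log(totalConnections <= maxConnections); // prints: true
```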

Read Replicas

For read-heavy operations, configure read replicas:

connections: [
    {
        code: 'erp-db-read',
        type: 'postgres',
        settings: {
            host: '${ERP_DB_READ_HOST}',  // Read replica
        },
    },
]

Logging

Log Persistence Level

Set the minimum level to persist:

mutation {
    updateDataHubSettings(input: {
        logPersistenceLevel: "info"  # debug, info, warn, error
    }) {
        logPersistenceLevel
    }
}

Log Aggregation

Send logs to external systems:

// Custom logger (example) -- Vendure accepts a custom logger via the
// `logger` option in VendureConfig; it must implement VendureLogger.
import { VendureLogger } from '@vendure/core';

class CustomLogger implements VendureLogger {
    error(message: string, context?: string, trace?: string) { this.send('error', message, context); }
    warn(message: string, context?: string) { this.send('warn', message, context); }
    info(message: string, context?: string) { this.send('info', message, context); }
    verbose(message: string, context?: string) { this.send('verbose', message, context); }
    debug(message: string, context?: string) { this.send('debug', message, context); }

    private send(level: string, message: string, context?: string) {
        // Send to CloudWatch, Datadog, etc. -- `externalLogger` is a placeholder.
        externalLogger.log({ level, message, context });
    }
}

Monitoring

Key Metrics

Monitor these metrics:

Metric                  Description             Alert threshold
Pipeline success rate   % of successful runs    < 95%
Average run duration    Execution time          > baseline + 50%
Record error rate       % of failed records     > 5%
Queue depth             Pending jobs            > 100
Worker health           Active workers          < expected
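
The thresholds above can feed whatever alerting system you use. A small sketch (all names are illustrative, not part of the plugin) that turns raw counters into alert messages:

```typescript
// Illustrative metric snapshot; populate it from your monitoring source.
interface RunStats {
    totalRuns: number;
    failedRuns: number;
    totalRecords: number;
    failedRecords: number;
    queueDepth: number;
}

// Apply the alert thresholds from the table above.
function evaluateAlerts(stats: RunStats): string[] {
    const alerts: string[] = [];
    const successRate =
        stats.totalRuns === 0 ? 1 : (stats.totalRuns - stats.failedRuns) / stats.totalRuns;
    if (successRate < 0.95) {
        alerts.push(`pipeline success rate ${(successRate * 100).toFixed(1)}% < 95%`);
    }
    const errorRate =
        stats.totalRecords === 0 ? 0 : stats.failedRecords / stats.totalRecords;
    if (errorRate > 0.05) {
        alerts.push(`record error rate ${(errorRate * 100).toFixed(1)}% > 5%`);
    }
    if (stats.queueDepth > 100) {
        alerts.push(`queue depth ${stats.queueDepth} > 100`);
    }
    return alerts;
}
```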

Health Checks

Add health check endpoints:

// Check Data Hub status (checkDataHubHealth is an application-defined helper)
app.get('/health/data-hub', async (req, res) => {
    try {
        const isHealthy = await checkDataHubHealth();
        res.status(isHealthy ? 200 : 503).json({ healthy: isHealthy });
    } catch (err) {
        res.status(503).json({ healthy: false });
    }
});
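
`checkDataHubHealth` is not provided by the plugin. One hypothetical implementation marks the system unhealthy when the queue is backed up or no run has completed recently; the two data-access callbacks are placeholders for however you query run state:

```typescript
// Hypothetical health check: healthy when the job queue is not backed up
// and the most recent run (if any) completed within the last hour.
async function checkDataHubHealth(
    getQueueDepth: () => Promise<number>,
    getLastRunCompletedAt: () => Promise<Date | null>,
): Promise<boolean> {
    const [depth, lastRun] = await Promise.all([getQueueDepth(), getLastRunCompletedAt()]);
    if (depth > 100) return false;           // matches the queue-depth threshold above
    if (lastRun === null) return true;       // no runs yet is not an error
    return Date.now() - lastRun.getTime() < 60 * 60 * 1000;
}
```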

Alerting

Set up alerts for:

  - Pipeline run failures (success rate below 95%)
  - Record error rate above 5%
  - Queue depth above 100 pending jobs
  - Run duration exceeding its baseline by more than 50%
  - Workers dropping below the expected count

Backup and Recovery

Data to Backup

  - The Vendure database: UI-created pipelines, run history, and Data Hub settings
  - Secret values held outside the database (environment files, secret manager)
  - Connection configuration and its documentation

Recovery Procedures

  1. Code-first pipelines: Automatically restored from code
  2. UI-created pipelines: Restore from database backup
  3. Secrets: Recreate from secure storage
  4. Connections: Recreate from documentation

Scaling

Horizontal Scaling

Data Hub supports running multiple instances with automatic coordination:

Distributed Locking:

# Option 1: Redis (recommended for production)
DATAHUB_REDIS_URL=redis://redis.production.internal:6379

# Option 2: Force PostgreSQL (no additional infrastructure)
DATAHUB_LOCK_BACKEND=postgres

What’s Protected:

Deployment Architecture:

                    ┌─────────────────┐
                    │  Load Balancer  │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Vendure 1   │    │   Vendure 2   │    │   Vendure 3   │
│ + Data Hub    │    │ + Data Hub    │    │ + Data Hub    │
└───────┬───────┘    └───────┬───────┘    └───────┬───────┘
        │                    │                    │
        └────────────────────┼────────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   PostgreSQL  │    │     Redis     │    │  Message Queue│
│   (required)  │    │   (optional)  │    │   (optional)  │
└───────────────┘    └───────────────┘    └───────────────┘

Without Redis:

  - Distributed locking runs over PostgreSQL (DATAHUB_LOCK_BACKEND=postgres), so no additional infrastructure is needed.

With Redis:

  - Locking is coordinated through Redis (DATAHUB_REDIS_URL); this is the recommended production setup.

Additional Scaling Tips

Vertical Scaling

Rate Limiting

Protect external APIs:

.extract('api-call', {
    throughput: {
        rateLimitRps: 10,  // Max 10 requests per second
    },
})
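
Conceptually, `rateLimitRps` behaves like a token bucket: requests consume tokens that refill at the configured rate, allowing short bursts while holding the average to the limit. A minimal sketch of that idea (not the plugin's actual implementation):

```typescript
// Minimal token bucket: allows up to `rps` calls per second, with bursts
// capped at `rps` tokens. Timestamps are in milliseconds.
class TokenBucket {
    private tokens: number;
    private lastRefill: number;

    constructor(private rps: number, now: number = Date.now()) {
        this.tokens = rps;
        this.lastRefill = now;
    }

    tryAcquire(now: number = Date.now()): boolean {
        const elapsedSeconds = (now - this.lastRefill) / 1000;
        this.tokens = Math.min(this.rps, this.tokens + elapsedSeconds * this.rps);
        this.lastRefill = now;
        if (this.tokens >= 1) {
            this.tokens -= 1;
            return true;
        }
        return false;
    }
}
```

A caller would invoke `tryAcquire()` before each outbound request and back off (or queue the request) when it returns false.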

Security Best Practices

  1. Secrets: Always use environment variables in production
  2. Connections: Use SSL/TLS for database connections
  3. Webhooks: Enable signature verification
  4. Permissions: Follow principle of least privilege
  5. Logging: Never log sensitive data
  6. Network: Restrict access to internal APIs