sdi/sdi-saas-architecture-blueprint.md
austindebest d62468adf9 Initial commit: SDI SaaS Platform foundation
- Complete monorepo structure with pnpm workspaces
- Prisma database schema with 20+ entities
- NestJS API with 9 core modules
- BullMQ orchestration worker
- AWS and Azure provider adapters
- Docker Compose infrastructure
- Complete documentation
2026-04-20 00:00:59 +01:00

SDI SaaS Architecture Blueprint

Overview

This document outlines a production-oriented architecture blueprint for building a software-defined interconnection (SDI) SaaS platform similar in product direction to Console Connect. Console Connect presents itself as a software-defined interconnection platform that enables enterprises to provision and manage private connections between clouds, data centres, applications, and partners through a portal and APIs.[cite:11][cite:16] MEF's Lifecycle Service Orchestration (LSO) framework is intended to standardize automation across service ordering, inventory, billing, and multi-provider orchestration, making it a strong reference model for an interconnection platform intended to federate with partners and carriers.[cite:21][cite:24][cite:36]

The recommended product approach is a multi-tenant SaaS control plane with a customer portal, admin portal, orchestration engine, provider/cloud adapters, billing subsystem, and standards-aligned API layer. A TypeScript-first implementation matches the user's existing strengths in Node.js, TypeScript, Prisma, Docker, BullMQ, Vue.js, and production deployment on Ubuntu and Kubernetes.[cite:1][cite:2][cite:3][cite:4][cite:5][cite:7][cite:8]

Product Scope

An SDI platform of this type acts as a digital control plane for ordering, provisioning, modifying, monitoring, and billing private connectivity services. Public material describing Console Connect emphasizes software-defined interconnection, private connectivity, self-service provisioning, and API-driven automation rather than reliance on slow manual provisioning.[cite:11][cite:14][cite:17][cite:18]

The initial commercial service catalog should focus on a limited set of high-value product types:

  • Cloud-to-data-centre private interconnect.
  • Multi-cloud connectivity between AWS and Azure.
  • Partner-to-partner private interconnection.
  • On-demand bandwidth changes for supported services.
  • Service inventory, usage, billing, and lifecycle management.

Architecture Principles

The architecture should follow a few hard rules from the beginning:

  • Keep a canonical internal domain model independent of any single provider or standards body.
  • Treat provisioning as an asynchronous workflow, not a request-response transaction.
  • Separate the orchestration core from provider-specific adapter code.
  • Persist every service-state transition for auditability and recovery.
  • Expose APIs as a core product capability, not as a later add-on.[cite:12][cite:15][cite:18]
  • Align external B2B APIs with MEF LSO concepts where possible so federation with partners is easier later.[cite:24][cite:27][cite:36]

System Landscape

The platform should be organized into the following top-level systems:

| System | Purpose |
| --- | --- |
| Customer portal | Order services, manage inventory, monitor status, billing, teams |
| Admin portal | Provider onboarding, pricing, manual intervention, audits, NOC tooling |
| Public API | Customer automation, API keys, webhooks, partner integrations |
| Core domain API | Tenants, catalog, orders, services, billing, audit |
| Orchestration engine | Long-running workflows, retries, rollback, dependency sequencing |
| Provider adapters | AWS, Azure, carrier, IX, and data-centre integration |
| Event backbone | Async job processing and event fan-out |
| Observability stack | Logs, metrics, traces, alerts, SLOs |

The core implementation should use a TypeScript-first stack so that the portal, APIs, shared contracts, and workflow logic live in one strongly typed ecosystem. That aligns well with the user's known experience in TypeScript, Node.js, Prisma, Vue.js, Docker, Kubernetes, and BullMQ.[cite:2][cite:3][cite:4][cite:5][cite:7]

| Layer | Recommended stack |
| --- | --- |
| Frontend | Vue 3, Nuxt 3, TypeScript, Tailwind CSS, Pinia, TanStack Query |
| API backend | NestJS or Fastify with TypeScript |
| Data | PostgreSQL |
| ORM | Prisma[cite:3] |
| Queue and jobs | Redis + BullMQ[cite:7] |
| Search/log analytics | OpenSearch |
| Object storage | S3-compatible storage or MinIO |
| Realtime | SSE first, WebSockets where needed |
| Auth | Keycloak, Auth0, or Ory |
| Infra | Docker, Kubernetes, Helm, Terraform[cite:5][cite:8] |
| Observability | Prometheus, Grafana, Loki, OpenTelemetry |
| High-performance adapters | Go for selected components when concurrency or low-level control demands it |

Repository and Module Structure

A production-friendly repository model should separate applications from shared packages:

apps/
  customer-portal/
  admin-portal/
  api/
  worker/
  realtime-gateway/
packages/
  domain-core/
  shared-types/
  auth-sdk/
  billing-engine/
  event-contracts/
  adapter-aws/
  adapter-azure/
  adapter-mef-partner/
  adapter-carrier-x/
infra/
  terraform/
  helm/
  k8s/

Inside the main API, use bounded modules rather than one giant service layer:

  • auth
  • tenants
  • users
  • roles
  • catalog
  • endpoints
  • quotes
  • orders
  • services
  • provisioning
  • inventory
  • billing
  • audit
  • notifications
  • webhooks
  • providerAccounts
  • incidents

Domain Model

The internal canonical model should normalize the language across clouds, carriers, exchanges, and partners. This prevents the entire system from becoming tightly coupled to AWS Direct Connect, Azure ExpressRoute, or any one MEF payload shape.

Core entities

  • Tenant
  • User
  • Role
  • Provider
  • ProviderAccount
  • Endpoint
  • ProductOffering
  • Quote
  • Order
  • Service
  • ProvisioningTask
  • InventoryRecord
  • UsageRecord
  • Invoice
  • ApiKey
  • WebhookEndpoint
  • AuditEvent
  • Incident

Example service order type

export type ServiceOrderStatus =
  | 'draft'
  | 'submitted'
  | 'validating'
  | 'quoted'
  | 'approved'
  | 'queued'
  | 'provisioning'
  | 'active'
  | 'failed'
  | 'suspended'
  | 'terminated';

export interface ServiceOrder {
  id: string;
  tenantId: string;
  productOfferingId: string;
  providerId: string;
  sourceEndpointId: string;
  targetEndpointId: string;
  bandwidthMbps: number;
  status: ServiceOrderStatus;
  externalReference?: string;
  createdAt: Date;
  updatedAt: Date;
}
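
The status union above implies a state machine, so a small transition guard can keep invalid jumps (for example `terminated` back to `active`) out of the database. The following is a minimal sketch: the transition table is an assumption made for illustration, not something this blueprint prescribes, and it should be adjusted to the real lifecycle rules.

```typescript
// Status union repeated from the example above so the sketch is self-contained.
type ServiceOrderStatus =
  | 'draft' | 'submitted' | 'validating' | 'quoted' | 'approved'
  | 'queued' | 'provisioning' | 'active' | 'failed' | 'suspended' | 'terminated';

// ASSUMPTION: illustrative transition table; the real rules are a product decision.
const ALLOWED_TRANSITIONS: Record<ServiceOrderStatus, readonly ServiceOrderStatus[]> = {
  draft: ['submitted'],
  submitted: ['validating'],
  validating: ['quoted', 'provisioning', 'failed'],
  quoted: ['approved', 'failed'],
  approved: ['queued'],
  queued: ['validating', 'provisioning', 'failed'],
  provisioning: ['active', 'failed'],
  active: ['suspended', 'terminated'],
  suspended: ['active', 'terminated'],
  failed: ['queued', 'terminated'],
  terminated: [],
};

// True when moving from `from` to `to` is allowed by the table.
function canTransition(from: ServiceOrderStatus, to: ServiceOrderStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Throws instead of silently persisting an invalid transition.
function assertTransition(from: ServiceOrderStatus, to: ServiceOrderStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid order status transition: ${from} -> ${to}`);
  }
}
```

Calling `assertTransition` inside the repository method that updates status would keep the rule in one place rather than scattered across workers and controllers.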

MEF LSO Mapping

MEF's LSO standards are useful as the interoperability layer for B2B and partner automation. MEF materials describe API domains around service qualification, quoting, ordering, inventory, billing, and multi-provider automation under the LSO framework.[cite:24][cite:36][cite:39]

| Internal module | MEF-aligned domain | Purpose |
| --- | --- | --- |
| endpoints / serviceability | Address validation, service qualification | Check whether a service can be delivered |
| quotes | Quote management | Generate price and commercial terms |
| orders | Product ordering | Accept and track service orders |
| inventory | Product inventory | Return active services and asset state |
| incidents | Trouble ticketing | Lifecycle of faults and support cases |
| billing | Billing management | Charges, invoices, reconciliation |
| partner gateway | Sonata (inter-provider) / Cantata (customer-facing) style business APIs | Federation with partners and providers |
| internal resource automation | Presto-like orchestration patterns | Domain-level provisioning inside the provider environment |

The clean implementation pattern is to keep the internal canonical objects stable, then map them to MEF-compliant payloads in a translation layer. That lets the SaaS expose MEF-shaped APIs externally without forcing the whole internal system into external standard payloads.[cite:24][cite:33][cite:36]
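
A translation layer of this kind can be sketched as a pure mapping function. The partner payload shape below is hypothetical: its field names are invented for illustration and are not taken from any MEF schema, so a real implementation would map to the published LSO payload definitions instead.

```typescript
// Subset of the internal canonical order, repeated here to keep the sketch self-contained.
interface ServiceOrder {
  id: string;
  productOfferingId: string;
  sourceEndpointId: string;
  targetEndpointId: string;
  bandwidthMbps: number;
}

// HYPOTHETICAL partner-facing shape. Field names are illustrative only,
// not the actual MEF LSO Sonata schema.
interface PartnerOrderPayload {
  externalId: string;
  productOfferingRef: string;
  items: Array<{
    action: 'add';
    aEndRef: string;
    zEndRef: string;
    bandwidth: { amount: number; units: 'MBPS' };
  }>;
}

// Pure function: internal model in, external payload out. No I/O, easy to test.
function toPartnerOrderPayload(order: ServiceOrder): PartnerOrderPayload {
  return {
    externalId: order.id,
    productOfferingRef: order.productOfferingId,
    items: [
      {
        action: 'add',
        aEndRef: order.sourceEndpointId,
        zEndRef: order.targetEndpointId,
        bandwidth: { amount: order.bandwidthMbps, units: 'MBPS' },
      },
    ],
  };
}
```

Keeping the mapper free of side effects means a change in the external standard touches only this layer and its tests, while the internal `ServiceOrder` stays untouched.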

API Design

The platform should expose three API families:

  1. Customer API for self-service automation.
  2. Partner API for inter-provider federation and standards alignment.
  3. Internal service APIs for adapters, orchestration, billing, and observability.

Example external endpoints

POST   /v1/quotes
GET    /v1/quotes/:id
POST   /v1/orders
GET    /v1/orders/:id
POST   /v1/orders/:id/cancel
GET    /v1/services
GET    /v1/services/:id
POST   /v1/services/:id/modify
POST   /v1/services/:id/suspend
POST   /v1/services/:id/terminate
GET    /v1/inventory
GET    /v1/billing/invoices
POST   /v1/webhooks/test

Webhook events

quote.ready
order.accepted
order.rejected
service.provisioning.started
service.provider.pending
service.active
service.failed
service.modified
service.suspended
service.terminated
invoice.generated
incident.created

Provisioning Architecture

Provisioning must be implemented as a stateful orchestration flow. AWS Direct Connect and Azure ExpressRoute are both private-connectivity services managed through provider tooling and automation interfaces, and each has external dependencies, location constraints, routing details, and lifecycle operations that make asynchronous orchestration necessary.[cite:0][cite:1]

Provisioning flow

  1. Customer submits intent through the portal or API.
  2. Core API validates payload, tenant permissions, and serviceability.
  3. Order record is created in PostgreSQL.
  4. An event is emitted to the orchestration queue.
  5. Orchestrator resolves dependency graph and selects provider adapter.
  6. Adapter calls downstream cloud or partner APIs.
  7. Status is tracked through callbacks or polling.
  8. State transitions are persisted.
  9. Realtime gateway streams updates to the portal.
  10. Billing metering starts after activation.

Example adapter contract

export interface ProviderAdapter {
  validate(payload: ServiceIntent): Promise<ValidationResult>;
  quote(payload: ServiceIntent): Promise<QuoteResult>;
  provision(payload: ProvisionRequest): Promise<ProvisionResponse>;
  getStatus(externalId: string): Promise<ServiceStatus>;
  modify(payload: ModifyRequest): Promise<ModifyResponse>;
  suspend(externalId: string): Promise<ActionResult>;
  terminate(externalId: string): Promise<ActionResult>;
  syncInventory?(): Promise<void>;
}
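
The orchestrator needs some way to look up the right adapter for a provider. A minimal registry sketch follows; the class name and its error messages are assumptions, and the adapter interface is trimmed to a single method to keep the example short.

```typescript
// Trimmed stand-in for the adapter contract above; only one member is needed here.
interface ProviderAdapter {
  getStatus(externalId: string): Promise<string>;
}

// HYPOTHETICAL registry: maps a provider ID to its adapter instance.
class AdapterRegistry {
  private readonly adapters = new Map<string, ProviderAdapter>();

  // Registers an adapter once; double registration is treated as a wiring bug.
  register(providerId: string, adapter: ProviderAdapter): void {
    if (this.adapters.has(providerId)) {
      throw new Error(`Adapter already registered for provider ${providerId}`);
    }
    this.adapters.set(providerId, adapter);
  }

  // Fails loudly rather than returning undefined, so a missing adapter
  // surfaces as a clear error instead of a null dereference later.
  get(providerId: string): ProviderAdapter {
    const adapter = this.adapters.get(providerId);
    if (!adapter) {
      throw new Error(`No adapter registered for provider ${providerId}`);
    }
    return adapter;
  }
}
```

Registering adapters at worker start-up keeps provider-specific imports out of the orchestration core, which only ever sees the neutral interface.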

Example orchestration logic

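// Sketch only: orderRepo, adapterRegistry, billing, audit, and toServiceIntent
// are application services assumed to be defined elsewhere.
async function provisionOrder(orderId: string): Promise<void> {
  const order = await orderRepo.getById(orderId);
  await orderRepo.updateStatus(orderId, 'validating');

  try {
    // Resolve the provider-specific adapter behind the neutral contract.
    const adapter = adapterRegistry.get(order.providerId);
    const validation = await adapter.validate(toServiceIntent(order));

    if (!validation.ok) {
      await orderRepo.updateStatus(orderId, 'failed');
      await audit.log(orderId, 'validation_failed', validation.errors);
      return;
    }

    await orderRepo.updateStatus(orderId, 'provisioning');

    const result = await adapter.provision({
      sourceEndpointId: order.sourceEndpointId,
      targetEndpointId: order.targetEndpointId,
      bandwidthMbps: order.bandwidthMbps,
    });

    if (!result.success) {
      await orderRepo.updateStatus(orderId, 'failed');
      await audit.log(orderId, 'provision_failed', result.error);
      return;
    }

    // Simplified: assumes the provider confirms activation synchronously.
    // With asynchronous providers, the steps below would instead run from
    // the status callback or poller (step 7 of the provisioning flow).
    await orderRepo.updateExternalReference(orderId, result.externalServiceId);
    await orderRepo.updateStatus(orderId, 'active');
    await billing.activateMetering(orderId);
    await audit.log(orderId, 'service_active', result);
  } catch (error) {
    // A thrown adapter or infrastructure error must not leave the order stuck mid-flight.
    await orderRepo.updateStatus(orderId, 'failed');
    await audit.log(orderId, 'provision_error', String(error));
    throw error; // rethrow so the queue can apply its retry policy
  }
}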

AWS Adapter Design

AWS Direct Connect is presented by AWS as a private connectivity service with global availability and deployment options such as dedicated and hosted connections, and AWS publishes management through console, CLI, and API tooling.[cite:1] The AWS adapter should therefore encapsulate AWS-specific service qualification, connection creation, lifecycle changes, and status retrieval while exposing a provider-neutral interface to the orchestration engine.

AWS adapter responsibilities

  • Maintain metadata for supported Direct Connect locations and regions.
  • Validate feasible source and target combinations.
  • Create or manage Direct Connect-related service components through AWS APIs.
  • Persist AWS external identifiers and status codes.
  • Store BGP and routing-related metadata where relevant.
  • Support bandwidth changes, suspend or terminate operations, and inventory synchronization.

AWS flow

  1. Resolve available interconnection location.
  2. Validate tenant entitlement and provider account mapping.
  3. Generate quote using price book and optional AWS-linked commercial rules.
  4. Submit provisioning call through the adapter.
  5. Persist external identifiers and poll or subscribe for status changes.
  6. Mark service active and start metering when all required conditions are met.
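
Steps 5 and 6 depend on translating AWS-specific status values into the platform's canonical states. The sketch below maps Direct Connect connection states, as I understand them from the AWS API reference, onto a hypothetical canonical status set; both the canonical names and the chosen mapping are assumptions to verify against current AWS documentation.

```typescript
// Connection states as listed in the AWS Direct Connect API reference (verify before use).
type AwsConnectionState =
  | 'ordering' | 'requested' | 'pending' | 'available'
  | 'down' | 'deleting' | 'deleted' | 'rejected' | 'unknown';

// HYPOTHETICAL canonical statuses for one connectivity leg.
type CanonicalLegStatus =
  | 'provisioning' | 'active' | 'degraded' | 'terminating'
  | 'terminated' | 'failed' | 'unknown';

// ASSUMPTION: this mapping is a design choice, not AWS guidance.
const AWS_STATE_MAP: Record<AwsConnectionState, CanonicalLegStatus> = {
  ordering: 'provisioning',
  requested: 'provisioning',
  pending: 'provisioning',
  available: 'active',
  down: 'degraded',
  deleting: 'terminating',
  deleted: 'terminated',
  rejected: 'failed',
  unknown: 'unknown',
};

// Accepts the raw string from the provider so unexpected values degrade to 'unknown'
// instead of throwing inside a poller.
function mapAwsConnectionState(raw: string): CanonicalLegStatus {
  return Object.prototype.hasOwnProperty.call(AWS_STATE_MAP, raw)
    ? AWS_STATE_MAP[raw as AwsConnectionState]
    : 'unknown';
}
```

Persisting the raw AWS state next to the canonical one keeps the audit trail faithful to what the provider actually reported.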

Azure Adapter Design

Microsoft documents Azure ExpressRoute as private connectivity into Microsoft cloud services through a connectivity provider, exchange, or direct model, with BGP-based routing, redundant connections, and multiple automation paths including portal, PowerShell, CLI, ARM, Terraform, and Bicep.[cite:0] The Azure adapter should mirror the same provider-neutral contract used by AWS while capturing Azure-specific concepts such as ExpressRoute circuit metadata, peering state, redundancy, and regional capability.

Azure adapter responsibilities

  • Maintain ExpressRoute locations, supported providers, SKUs, and bandwidth options.
  • Validate location and provider compatibility.
  • Create or update circuits and peering-related metadata through Azure automation interfaces.
  • Store circuit IDs, provisioning state, and change history.
  • Support modify, suspend, terminate, and inventory-sync operations.

Azure flow

  1. Validate source site and target Azure region.
  2. Resolve supported ExpressRoute location and provider path.
  3. Generate quote and order summary.
  4. Provision through adapter calls.
  5. Persist circuit references and route state.
  6. Move service to active only after dependency checks complete.
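
Step 6 can be expressed as an explicit readiness check that returns the reasons a circuit is not yet usable. In this sketch the provider-side state names follow Azure's documented ExpressRoute circuit states as I understand them, while the snapshot shape, the peering flag, and the BGP session count are hypothetical fields the adapter would need to populate.

```typescript
// Provider-side provisioning states per Azure ExpressRoute documentation (verify before use).
type AzureProviderState = 'NotProvisioned' | 'Provisioning' | 'Provisioned' | 'Deprovisioning';

// HYPOTHETICAL snapshot the adapter assembles from Azure API responses.
interface AzureCircuitSnapshot {
  circuitId: string;
  serviceProviderProvisioningState: AzureProviderState;
  privatePeeringConfigured: boolean; // assumption: derived from the circuit's peerings
  bgpSessionsUp: number;             // assumption: derived from routing status
}

interface ReadinessResult {
  ready: boolean;
  blockers: string[]; // human-readable reasons, useful for NOC tooling and audit logs
}

// A leg is ready only when every dependency check passes.
function checkAzureLegReadiness(
  snapshot: AzureCircuitSnapshot,
  requiredBgpSessions: number = 2, // assumption: one session per redundant connection
): ReadinessResult {
  const blockers: string[] = [];
  if (snapshot.serviceProviderProvisioningState !== 'Provisioned') {
    blockers.push(`provider state is ${snapshot.serviceProviderProvisioningState}`);
  }
  if (!snapshot.privatePeeringConfigured) {
    blockers.push('private peering is not configured');
  }
  if (snapshot.bgpSessionsUp < requiredBgpSessions) {
    blockers.push(`only ${snapshot.bgpSessionsUp} of ${requiredBgpSessions} BGP sessions are up`);
  }
  return { ready: blockers.length === 0, blockers };
}
```

Returning the blockers rather than a bare boolean lets the portal show why an order is still pending without anyone reading adapter logs.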

Multi-Cloud Connectivity Pattern

A realistic AWS-to-Azure service is usually implemented through an intermediary fabric, exchange, or provider edge, rather than by directly linking two cloud-native constructs in isolation. Megaport's guidance explains connecting AWS Direct Connect and Azure ExpressRoute through a data-centre or interconnection hub where routing can be exchanged and BGP established across the private path.[cite:22][cite:25]

In the SaaS, that should be modeled as a composite service consisting of multiple linked sub-services:

  • AWS-side connectivity leg.
  • Exchange or partner-fabric leg.
  • Azure-side connectivity leg.
  • Composite service object exposed to the customer.

This matters because a failure, billing event, or lifecycle change can hit one leg without affecting the others in the same way. The orchestration engine therefore needs dependency-aware workflows and partial-failure handling.
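
One way to make partial failure visible is to derive the composite status from its legs instead of storing it independently. The sketch below uses a hypothetical status vocabulary and precedence order; both are assumptions to adapt to the real product rules.

```typescript
// HYPOTHETICAL status vocabulary for a single leg and for the composite.
type LegStatus = 'provisioning' | 'active' | 'failed' | 'suspended' | 'terminated';
type CompositeStatus = LegStatus | 'degraded';

interface ServiceLeg {
  name: string; // e.g. 'aws', 'fabric', 'azure'
  status: LegStatus;
}

// ASSUMPTION: precedence order is a design choice.
// - all legs active      -> active
// - all legs terminated  -> terminated
// - any failed leg       -> degraded if something still works, otherwise failed
// - otherwise the least-ready state wins
function deriveCompositeStatus(legs: readonly ServiceLeg[]): CompositeStatus {
  if (legs.length === 0) {
    throw new Error('A composite service needs at least one leg');
  }
  const statuses = legs.map((leg) => leg.status);
  const all = (s: LegStatus) => statuses.every((x) => x === s);
  const any = (s: LegStatus) => statuses.some((x) => x === s);

  if (all('active')) return 'active';
  if (all('terminated')) return 'terminated';
  if (any('failed')) return any('active') ? 'degraded' : 'failed';
  if (any('provisioning')) return 'provisioning';
  if (any('suspended')) return 'suspended';
  return 'degraded'; // mixed active/terminated legs
}
```

For example, an active AWS leg, a failed fabric leg, and an active Azure leg yield `degraded`, which is the signal the customer and the NOC both need.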

Realtime Backend Design

The realtime backend should use event-driven components around a durable transactional core. The user already uses BullMQ and TypeScript background job patterns, which makes Redis-backed orchestration a practical early-stage choice.[cite:7]

  • PostgreSQL for source-of-truth transactional data.
  • Redis for queues, locks, and transient workflow state.
  • BullMQ workers for orchestration.
  • SSE for customer-facing live state changes.
  • WebSockets only where bidirectional traffic is needed.
  • OpenTelemetry traces across API, worker, and adapter boundaries.
  • Grafana, Prometheus, and Loki for operations visibility.

Event model

order.created
order.validated
quote.generated
order.approved
provisioning.started
provider.request.sent
provider.pending
provider.completed
service.active
service.failed
billing.metering.started
inventory.sync.completed
incident.opened

Scaling strategy

  • Scale API pods horizontally behind an ingress controller.
  • Scale workers independently from the API.
  • Isolate slow or noisy providers into dedicated adapter deployments.
  • Use idempotency keys for retries and duplicate-callback protection.
  • Introduce Kafka or NATS later if event volume significantly exceeds the practical comfort zone of Redis-backed orchestration.
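
The idempotency-key point can be illustrated with a small duplicate guard. This sketch keeps keys in memory purely for demonstration; in the stack described here the same check would more plausibly live in Redis as an atomic set-if-absent with an expiry. The class name, key format, and TTL are assumptions.

```typescript
// HYPOTHETICAL in-memory guard; a shared store would be needed across worker pods.
class DuplicateGuard {
  private readonly ttlMs: number;
  private readonly expiresAt = new Map<string, number>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time a key is claimed inside the TTL window,
  // and false for any duplicate that arrives before the key expires.
  claim(key: string, nowMs: number = Date.now()): boolean {
    const expiry = this.expiresAt.get(key);
    if (expiry !== undefined && expiry > nowMs) {
      return false;
    }
    this.expiresAt.set(key, nowMs + this.ttlMs);
    return true;
  }
}

// ASSUMPTION: a provider callback is uniquely identified by these three parts.
function callbackKey(providerId: string, externalId: string, eventType: string): string {
  return `${providerId}:${externalId}:${eventType}`;
}
```

A callback handler would call `claim(callbackKey(...))` first and return early on `false`, so a provider that delivers the same notification twice cannot activate metering twice.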

Database Schema Skeleton

A relational schema should center on strong referential integrity and auditability.

create table tenants (
  id uuid primary key,
  name text not null,
  created_at timestamptz not null default now()
);

create table providers (
  id uuid primary key,
  name text not null,
  type text not null,
  created_at timestamptz not null default now()
);

create table endpoints (
  id uuid primary key,
  provider_id uuid references providers(id),
  kind text not null,
  region text,
  metro text,
  metadata jsonb not null default '{}'::jsonb
);

create table service_orders (
  id uuid primary key,
  tenant_id uuid references tenants(id),
  provider_id uuid references providers(id),
  source_endpoint_id uuid references endpoints(id),
  target_endpoint_id uuid references endpoints(id),
  status text not null,
  bandwidth_mbps integer not null,
  external_reference text,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

create table services (
  id uuid primary key,
  order_id uuid references service_orders(id),
  tenant_id uuid references tenants(id),
  status text not null,
  activated_at timestamptz,
  terminated_at timestamptz
);

create table audit_events (
  id uuid primary key,
  aggregate_type text not null,
  aggregate_id uuid not null,
  event_type text not null,
  payload jsonb not null,
  created_at timestamptz not null default now()
);
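
To connect the `audit_events` table to the rule that every state transition is persisted, the sketch below builds one audit row per status change. The row type mirrors the columns above; the `aggregate_type` value, the `event_type` name, and the payload shape are assumptions.

```typescript
import { randomUUID } from 'node:crypto';

// Mirrors the audit_events columns defined above.
interface AuditEventRow {
  id: string;
  aggregate_type: string;
  aggregate_id: string;
  event_type: string;
  payload: Record<string, unknown>;
  created_at: Date;
}

// Builds the row for one order status change. Writing it in the same database
// transaction as the status update keeps the audit trail and the state in step.
function buildStatusChangeAudit(
  orderId: string,
  fromStatus: string,
  toStatus: string,
  actor: string,
  now: Date = new Date(),
): AuditEventRow {
  return {
    id: randomUUID(),
    aggregate_type: 'service_order',     // ASSUMPTION: naming convention
    aggregate_id: orderId,
    event_type: 'order.status_changed',  // ASSUMPTION: event name
    payload: { from: fromStatus, to: toStatus, actor },
    created_at: now,
  };
}
```

Because the payload records both the previous and the new status, the full lifecycle of an order can be replayed from this table alone during recovery or a dispute.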

Security Model

Security should be designed for enterprise customers from the start:

  • Multi-tenant data isolation.
  • Strong RBAC with least-privilege roles.
  • SSO and MFA for enterprise tenants.
  • API keys with scopes and rotation.
  • Signed webhooks.
  • End-to-end audit trails.
  • Secret storage outside application code.
  • Encryption in transit and at rest.
  • Per-tenant rate limiting and anomaly detection.
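
Of the items above, signed webhooks are the easiest to get subtly wrong. A minimal sketch using HMAC-SHA256 from Node's built-in `crypto` module follows; the signed-string format (`timestamp.body`), the hex encoding, and the five-minute tolerance are assumptions, not a scheme defined by this blueprint.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Signs the timestamp together with the raw body so a captured request
// cannot be replayed later with a fresh timestamp.
function signWebhook(secret: string, timestampSec: number, rawBody: string): string {
  return createHmac('sha256', secret).update(`${timestampSec}.${rawBody}`).digest('hex');
}

// Verifies a received signature. Rejects stale timestamps and compares
// signatures in constant time to avoid leaking information through timing.
function verifyWebhook(
  secret: string,
  timestampSec: number,
  rawBody: string,
  signatureHex: string,
  nowSec: number,
  toleranceSec: number = 300, // ASSUMPTION: five-minute replay window
): boolean {
  if (Math.abs(nowSec - timestampSec) > toleranceSec) {
    return false;
  }
  const expected = Buffer.from(signWebhook(secret, timestampSec, rawBody), 'hex');
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length) {
    return false;
  }
  return timingSafeEqual(expected, received);
}
```

The receiver must verify against the raw request bytes, not a re-serialized JSON object, because any change in key order or whitespace changes the digest.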

Because the product controls private connectivity, service lifecycle, and billing, its security posture must be closer to that of enterprise infrastructure software than to that of a lightweight self-serve SaaS.

Billing and Commercial Engine

The billing engine should support:

  • One-time provisioning fees.
  • Recurring monthly port or service fees.
  • Usage-based bandwidth charges.
  • Regional price books.
  • Contract discounts and credits.
  • Taxes and invoice generation.
  • Reconciliation against provider-side usage records.

MEF's billing-related API work reinforces the value of having a distinct billing domain rather than embedding pricing logic deep inside the provisioning code.[cite:24][cite:36]
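
As a worked example of keeping pricing logic in its own domain, the sketch below computes invoice lines for one service from a price-book entry. All field names and figures are hypothetical; amounts are held in integer cents to avoid floating-point drift, and the recurring fee is pro-rated by active days.

```typescript
// HYPOTHETICAL price-book entry for one product offering in one region.
interface PriceBookEntry {
  setupFeeCents: number;
  monthlyFeeCents: number;
  usageCentsPerGb: number;
}

interface BillingPeriodInput {
  firstInvoice: boolean; // one-time fee applies only on the first invoice
  activeDays: number;    // days the service was active in the period
  daysInPeriod: number;
  usageGb: number;
}

interface InvoiceLine {
  description: string;
  amountCents: number;
}

function computeInvoiceLines(price: PriceBookEntry, input: BillingPeriodInput): InvoiceLine[] {
  if (input.daysInPeriod <= 0 || input.activeDays < 0 || input.activeDays > input.daysInPeriod) {
    throw new Error('activeDays must be between 0 and daysInPeriod');
  }
  const lines: InvoiceLine[] = [];
  if (input.firstInvoice && price.setupFeeCents > 0) {
    lines.push({ description: 'One-time provisioning fee', amountCents: price.setupFeeCents });
  }
  // Pro-rate the recurring fee by the share of the period the service was active.
  const recurring = Math.round((price.monthlyFeeCents * input.activeDays) / input.daysInPeriod);
  lines.push({ description: 'Recurring service fee (pro-rated)', amountCents: recurring });
  const usage = Math.round(price.usageCentsPerGb * input.usageGb);
  if (usage > 0) {
    lines.push({ description: 'Usage-based bandwidth charge', amountCents: usage });
  }
  return lines;
}

function totalCents(lines: readonly InvoiceLine[]): number {
  return lines.reduce((sum, line) => sum + line.amountCents, 0);
}
```

With a $500 setup fee, a $1,200 monthly fee, $0.02 per GB, 15 active days out of 30, and 1,000 GB of usage, the lines come to $500 + $600 + $20, for a total of $1,120. Discounts, credits, and taxes would be further line items produced by the same engine, never by provisioning code.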

Deployment Topology

A production launch should use a cloud-native deployment model with staged environments.

Environments

  • Local development with Docker Compose.
  • Shared development cluster.
  • Staging cluster with provider sandbox integrations.
  • Production cluster in one primary region first.
  • Optional secondary region for disaster recovery and later geo-expansion.

Production components

  • Kubernetes for API, worker, realtime gateway, and adapter services.[cite:8]
  • Managed PostgreSQL or HA PostgreSQL.
  • Managed Redis or HA Redis.
  • Object storage for documents, invoices, and exports.
  • Ingress controller with TLS termination and WAF.
  • Prometheus, Grafana, Loki, and tracing backend.

Phased Rollout Plan

Phase 1: MVP

  • Multi-tenant auth and RBAC.
  • Customer portal and admin portal.
  • Catalog of endpoints and product offerings.
  • Quote and order APIs.
  • One AWS adapter.
  • One Azure adapter.
  • Orchestration engine with retries and audit logs.
  • Basic billing and invoices.
  • Realtime order-status updates.

Phase 2: Serious v1

  • Composite multi-cloud services.
  • Provider inventory sync.
  • Customer API keys and webhooks.
  • Incident and support workflows.
  • Enhanced billing and reporting.
  • More provider adapters.
  • Manual intervention queue and NOC tooling.

Phase 3: Federation and standards

  • MEF-aligned partner APIs.
  • Inter-provider automation and external order exchange.
  • Broader inventory and trouble-ticket interoperability.
  • Expanded SLA, compliance, and reporting features.

Phase 4: Global scale

  • Multi-region deployment.
  • Advanced traffic engineering integrations.
  • Stronger commercial routing and partner settlement.
  • Regional data and operational segmentation.

Open-Source Building Blocks

There is no exact open-source clone of Console Connect, but several open-source systems can accelerate a similar build. OpenDaylight is an open-source SDN platform for programmable network control, Faucet is an open-source SDN controller oriented toward production environments, and the MEF LSO Sonata SDK provides useful artifacts for standards-aligned integration work.[cite:29][cite:33][cite:38] ONOS is also commonly evaluated alongside OpenDaylight for service-provider and controller use cases.[cite:23]

| Tool | Role |
| --- | --- |
| OpenDaylight | Southbound SDN control and programmable network integration |
| ONOS | Carrier-style SDN control plane option |
| Faucet | Production-focused open SDN controller |
| MEF LSO Sonata SDK | Standards-oriented API artifacts and examples |
| Terraform | Infrastructure and cloud automation, including Azure-friendly workflows[cite:0] |
| OpenSearch | Search, event analytics, and operational visibility |

Cost Model

Public 2026 cost estimates for SaaS products place complex platforms broadly in the six-figure range, with enterprise-grade systems frequently reaching $500,000 or more depending on integrations and compliance.[cite:31][cite:34][cite:37][cite:40] A global SDI SaaS should therefore be budgeted more like enterprise infrastructure software than a lightweight B2C web app.

| Stage | Build estimate | Notes |
| --- | --- | --- |
| MVP | $80k-$180k | Limited providers, customer portal, admin panel, core orchestration[cite:31][cite:37] |
| Serious v1 | $180k-$500k | Better reliability, billing, AWS and Azure integration, audit, observability[cite:31][cite:34] |
| Global launch | $500k-$1M+ | Multi-region ops, partner federation, NOC tooling, compliance, sales engineering[cite:31][cite:34] |

Operating cost should be budgeted separately for infrastructure, observability, support, security, partner onboarding, and continuous product iteration.[cite:40]

Execution Sequence

A sensible execution sequence for this project is:

  1. Model tenants, providers, endpoints, quotes, orders, services, and audit events.
  2. Build auth, RBAC, tenant isolation, and audit logging.
  3. Build quote and order workflows with a persisted state machine.
  4. Add Redis and BullMQ orchestration.
  5. Implement AWS and Azure adapters behind a common interface.
  6. Add SSE-based realtime order tracking.
  7. Add billing, invoices, and usage metering.
  8. Expose customer API and webhooks.
  9. Add provider inventory sync and support workflows.
  10. Layer in MEF-aligned partner APIs once the internal model is stable.

Final Recommendation

The strongest path is to start as a modular, API-first, TypeScript-based interconnection control plane rather than trying to recreate a full carrier-grade network fabric on day one. Public information about Console Connect, AWS Direct Connect, Azure ExpressRoute, and MEF LSO all point to the same conclusion: the hardest and most defensible part of the product is not the dashboard UI, but the orchestration, standards alignment, adapter design, and operational reliability of real-world service delivery.[cite:11][cite:15][cite:18][cite:0][cite:1][cite:24]