"Cloud-native" is one of the most misused terms in technology. Running a monolith on an EC2 instance is not cloud-native. Putting a legacy application in a container is not cloud-native. Cloud-native is an architectural approach that fully leverages cloud capabilities — elasticity, managed services, distributed systems, and automation.
Here's what that actually means in practice.
The Five Principles
1. Design for Failure
Cloud infrastructure fails. Instances terminate, zones go down, networks partition. Cloud-native architecture assumes failure and designs for it.
In practice:
- No single points of failure. Every component has redundancy.
- Circuit breakers prevent cascade failures when dependencies are unavailable.
- Retry logic with exponential backoff for transient failures.
- Graceful degradation — when a non-critical service fails, the system continues with reduced functionality rather than crashing.
- Chaos engineering — intentionally inject failures to verify resilience.
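Two of the items above, circuit breakers and retries with exponential backoff, can be sketched in a few lines. This is a minimal illustration, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `backoff_delays`, and `retry`, and all the thresholds, are assumptions made for the example. Real services would more likely use a maintained resilience library.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then allows a
    trial call once `reset_after` seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unavailable")
            # Reset window elapsed: fall through and allow a trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def backoff_delays(attempts, base=0.1, cap=5.0):
    """Exponential delays: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def retry(fn, attempts=4, base=0.1, cap=5.0, sleep=time.sleep, rng=random.random):
    """Retry `fn` on failure, sleeping a jittered exponential delay between tries."""
    delays = backoff_delays(attempts - 1, base, cap)
    for i in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not hammer a dependency the breaker has marked unavailable
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[i] * rng())  # jitter: a random fraction of the delay
```

A caller would typically combine the two, for example `retry(lambda: breaker.call(fetch_prices))`, where `fetch_prices` stands in for any call to a remote dependency. The jitter spreads retries out so that many clients recovering at once do not hit the dependency in lockstep.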
2. Decompose by Business Capability
Structure your system around business capabilities, not technical layers. Each service owns a business function end-to-end — its data, its logic, its API.
In practice:
- Services are organised around business domains (orders, payments, inventory), not technical layers (API layer, business logic layer, data layer)
- Each service owns its data store. No shared databases between services.
- Services communicate through well-defined APIs or events, not shared state.
- Team structure mirrors service structure (Conway's Law, applied deliberately).
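As a rough sketch of "each service owns its data" and "communicate through events", the example below wires two services together through an in-process event bus. Everything here is hypothetical and chosen for illustration: `EventBus` stands in for a real message broker, and `OrderPlaced`, `OrderService`, and `InventoryService` are invented names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event published by the orders domain (hypothetical shape)."""
    order_id: str
    sku: str
    quantity: int


class EventBus:
    """In-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


class InventoryService:
    def __init__(self, bus, stock):
        self._stock = dict(stock)  # private store; no other service reads or writes it
        bus.subscribe(OrderPlaced, self._on_order_placed)

    def _on_order_placed(self, event):
        self._stock[event.sku] -= event.quantity

    def available(self, sku):
        return self._stock[sku]


class OrderService:
    def __init__(self, bus):
        self._orders = {}  # private store owned by the orders domain
        self._bus = bus

    def place_order(self, order_id, sku, quantity):
        self._orders[order_id] = (sku, quantity)
        # Announce the fact; the order service does not know who is listening.
        self._bus.publish(OrderPlaced(order_id, sku, quantity))
```

The order service never touches inventory data directly. It only publishes an event, so the inventory service can change its storage or be redeployed without the order service noticing.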
3. Automate Everything
Manual processes are the enemy of reliability and velocity. Cloud-native systems automate deployment, scaling, monitoring, recovery, and governance.
In practice:
- Infrastructure as Code (Terraform, Bicep, Pulumi) — no manual resource creation
- CI/CD pipelines for every service — commit to production in minutes
- Auto-scaling based on demand metrics, not manual capacity planning
- Automated health checks, rollbacks, and incident alerting
- Policy-as-code for governance and compliance
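To make "auto-scaling based on demand metrics" concrete, here is a small sketch of the proportional rule that the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance band (Kubernetes defaults to 10%). The function name, the replica bounds, and the sample numbers are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling: grow or shrink replicas so the per-replica
    metric moves back toward its target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (illustrative defaults).
    return max(min_replicas, min(max_replicas, wanted))
```

For example, with 4 replicas averaging 90% CPU against a 60% target, `desired_replicas(4, 90, 60)` returns 6. At 63% the ratio falls inside the tolerance band and the replica count stays at 4.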
4. Observe Everything
You can't manage what you can't see. Cloud-native systems are instrumented from day one with comprehensive observability.
The three pillars:
- Logs: Structured, centralised, searchable. Every request has a correlation ID.
- Metrics: RED metrics (Rate, Errors, Duration) for every service. USE metrics (Utilisation, Saturation, Errors) for every resource.
- Traces: Distributed tracing across service boundaries. Every request's journey is visible end-to-end.
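The logging and metrics pillars can be sketched with the Python standard library alone. The example below attaches a correlation ID to every log line and records the raw counts behind RED metrics. `JsonFormatter`, `RedMetrics`, and `handle_request` are hypothetical names for this sketch. A real system would more likely rely on an instrumentation library such as OpenTelemetry, which also covers the tracing pillar.

```python
import json
import logging
import time
import uuid
from contextvars import ContextVar

# Holds the correlation ID of the request currently being handled.
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object so logs are structured and searchable."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


class RedMetrics:
    """Raw counts behind RED; a metrics backend would derive the rate from `requests`."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.durations = []

    def observe(self, duration_s, ok):
        self.requests += 1
        if not ok:
            self.errors += 1
        self.durations.append(duration_s)


def handle_request(handler, metrics, log, incoming_id=None, clock=time.perf_counter):
    """Run `handler` with a correlation ID set, recording duration and outcome."""
    # Reuse the caller's ID if one arrived with the request, else mint a new one.
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    start = clock()
    ok = False
    try:
        result = handler()
        ok = True
        return result
    except Exception:
        log.error("request failed")
        raise
    finally:
        metrics.observe(clock() - start, ok)
        log.info("request finished")
        correlation_id.reset(token)
```

Passing the same correlation ID on to downstream calls is what lets a single request be followed across service boundaries in the centralised log store.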
5. Embrace Managed Services
Cloud providers invest billions in operating infrastructure. Use their managed services rather than operating your own.
In practice:
- Managed databases (RDS, Azure SQL, Cloud SQL) instead of self-managed database instances
- Managed Kubernetes (AKS, EKS, GKE) instead of self-managed clusters
- Managed message queues (SQS, Azure Service Bus) instead of self-managed RabbitMQ
- Serverless functions for event processing instead of always-on compute
The trade-off: Managed services reduce operational burden but increase vendor coupling. Accept this trade-off for non-differentiating infrastructure. Maintain portability for core business logic.
When Monolith-First Is Better
Despite everything above, starting with a monolith is often the right choice for early-stage companies.
Choose monolith-first when:
- You have fewer than 5 engineers
- Your domain boundaries are unclear (you're still figuring out the product)
- Time to market is more important than scalability
- You don't have the operational expertise to manage distributed systems
The migration path: Build a well-structured monolith with clear module boundaries. Extract services when a specific module needs to scale independently, deploy independently, or be owned by a separate team.
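One way to keep module boundaries clear inside a monolith is to have modules depend on a narrow interface rather than on each other's internals. The sketch below is an assumption about how that might look, with invented names (`PaymentsApi`, `InProcessPayments`, `Checkout`). It is not a prescribed structure.

```python
from typing import Protocol


class PaymentsApi(Protocol):
    """The only surface other modules may use to reach payments."""

    def charge(self, order_id: str, amount_cents: int) -> bool: ...


class InProcessPayments:
    """Today's implementation: runs inside the monolith with its own private state."""

    def __init__(self) -> None:
        self._ledger: dict[str, int] = {}

    def charge(self, order_id: str, amount_cents: int) -> bool:
        if amount_cents <= 0:
            return False
        self._ledger[order_id] = amount_cents
        return True


class Checkout:
    """Depends on the interface, not on how or where payments runs."""

    def __init__(self, payments: PaymentsApi) -> None:
        self._payments = payments

    def complete(self, order_id: str, amount_cents: int) -> str:
        return "paid" if self._payments.charge(order_id, amount_cents) else "rejected"
```

If payments later needs to scale or deploy independently, a client class that calls the extracted service over the network can implement the same `charge` method, and `Checkout` does not change.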
Kubernetes vs Serverless
| Factor | Kubernetes | Serverless |
|---|---|---|
| Operational complexity | High (even managed) | Low |
| Scaling granularity | Pod-level | Request-level |
| Cold start | None for already-running pods | Yes (roughly 100 ms to 5 s) |
| Long-running processes | Excellent | Limited (timeouts) |
| Cost at scale | Lower (reserved capacity) | Higher (per-request pricing) |
| Portability | High | Low (vendor-specific) |
| Ecosystem | Massive | Growing |
Recommendation: Use serverless for event-driven, sporadic workloads (webhooks, scheduled jobs, file processing). Use Kubernetes for core services with sustained traffic and complex networking requirements. Most cloud-native architectures use both.
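The "cost at scale" row comes down to simple arithmetic: per-request pricing wins at low volume, and a fixed-capacity cluster wins once traffic passes a break-even point. The sketch below shows that calculation. The function names are invented, and the prices in the usage note are placeholders, not real provider rates. It also leaves out the operational cost of running Kubernetes, which the table rates as high.

```python
def break_even_requests(fixed_monthly_cost, cost_per_million_requests):
    """Monthly request volume at which both options cost the same."""
    return fixed_monthly_cost / cost_per_million_requests * 1_000_000


def cheaper_option(monthly_requests, fixed_monthly_cost, cost_per_million_requests):
    """Compare per-request billing against a fixed monthly cluster cost."""
    per_request_total = monthly_requests / 1_000_000 * cost_per_million_requests
    return "serverless" if per_request_total < fixed_monthly_cost else "kubernetes"
```

With placeholder figures of 300 per month for reserved cluster capacity and 2 per million serverless requests, the break-even point is 150 million requests per month. A webhook handler at 10 million requests sits well below that, while a core API at 500 million sits well above it.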
Cloud-Native Databases
The database choice is one of the most impactful architecture decisions:
| Database Type | When to Use | Cloud-Native Options |
|---|---|---|
| Relational (PostgreSQL) | Structured data, transactions, joins | Azure Database for PostgreSQL, RDS, Cloud SQL |
| Document (MongoDB/Cosmos) | Flexible schema, global distribution | Cosmos DB, DocumentDB, Atlas |
| Key-value (Redis) | Caching, sessions, real-time data | Azure Cache for Redis, ElastiCache, MemoryDB |
| Time-series | IoT, monitoring, financial data | Azure Data Explorer, InfluxDB Cloud, Timestream |
| Vector | AI/ML embeddings, semantic search | Azure AI Search, Pinecone, Weaviate |
| Graph | Relationships, knowledge graphs | Cosmos DB (Gremlin), Neptune |
Recommendation: Start with PostgreSQL. It handles 80% of use cases well. Add specialised databases only when PostgreSQL genuinely can't meet the requirements — not because the architecture diagram looks more impressive with more database icons.
Anti-Patterns
Distributed Monolith
Services that can't be deployed independently because they share a database, use synchronous calls everywhere, or have tightly coupled data models. You get the complexity of microservices without the benefits.
Premature Microservices
Splitting into microservices before you understand your domain boundaries. You'll draw the boundaries wrong, and re-drawing them in a distributed system is much harder than in a monolith.
Kubernetes for Everything
Running a simple CRUD API on Kubernetes when a managed serverless function would work perfectly. The operational overhead of Kubernetes is justified only once the system's scale and complexity actually demand it.
Cloud-Native Resume-Driven Development
Choosing technologies because they look good on a CV rather than because they solve the problem. Service mesh, event sourcing, and CQRS are powerful patterns — but most applications don't need them.
Cloud-native architecture is the foundation for scalable, resilient, and fast-moving technology organisations.