Building Resilient Azure Cosmos DB Kingdoms

Room 7 • Wed 13 May • 13:15–14:15 • Cloud & DevOps • Intermediate
Designing a highly available and resilient distributed system is like fortifying a medieval stronghold. Every decision involves trade-offs, every safeguard has a cost, and there is no single “correct” architecture that fits all kingdoms. In this session, we explore how to build resilient, highly available applications using Azure Cosmos DB, drawing together theory and practice across consistency models, CAP and PACELC theorems, multi-region architectures, and client-side resilience patterns such as circuit breakers and request hedging. We’ll examine how these mechanisms interact, and, more importantly, where they conflict. We’ll also address a modern reality: while AI tools are excellent at generating architectures and suggesting “best practices”, they often struggle (just as humans do) with the deeply contextual trade-offs inherent in distributed systems. Choices around consistency, latency, availability, and failure behaviour depend on business semantics, user expectations, and operational risk - factors that cannot be safely abstracted away. This talk equips you with the mental models needed to make these decisions deliberately, rather than outsourcing them blindly to templates or AI agents!

About the speaker

Theodorus Leonardus van Kraay

Theo is passionate about NoSQL and distributed computing. He joined Microsoft in 2017 and has been in the Cosmos DB Engineering team as a Program Manager since 2019. He currently focuses on AI, programmability, and developer experience for Azure Cosmos DB. He has a master's degree in Data Science from Dundee University, and lives in the UK with his wife, two boys, and ragcoon cat.