Cosmos DB
Developer(s) | Microsoft |
---|---|
Initial release | 2017 |
Development status | Active |
Available in | English |
Type | Multi-model database |
Website | www |
Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service "for managing data at planet-scale", launched in May 2017.[1] It builds upon and extends the earlier Azure DocumentDB, which was released in 2014.[2] It is schema-less and generally classified as a NoSQL database.
Dynamically tunable
When using the currently recommended "partitioned collection" type, DocumentDB can be tuned dynamically along three dimensions:
- Throughput. Developers reserve throughput according to the application's varying load. Behind the scenes, DocumentDB scales resources (memory, processor, partitions, replicas, etc.) to deliver the requested throughput while keeping 99th-percentile latency under 10 ms for reads and under 15 ms for writes. Throughput is specified in request units (RUs) per second. The number of RUs consumed by a given operation varies with several factors, but reading a single 1 KB document by its id consumes roughly 1 RU, and inserting, updating, or deleting a 1 KB document consumes roughly 5 RUs. Large queries and stored procedure executions can consume hundreds or thousands of RUs, depending on the complexity of the operations involved.[3]
- Space. Similarly, developers specify how much storage they need. Both space and throughput directly affect how much the user is charged, but either can be scaled up dynamically to handle peak load and scaled down to save costs when load is lighter.
- Consistency. DocumentDB provides four consistency levels: strong, bounded-staleness, session, and eventual. The further to the left in this list, the greater the consistency but the higher the RU cost, which effectively lowers the available throughput for the same RU setting. Session consistency is the default.[4] Even at a weaker consistency level, a set of operations on documents that share a partition key can be executed as an ACID-compliant transaction by performing them inside a stored procedure. The consistency level can also be changed for an individual request using the x-ms-consistency-level request header or the equivalent option in the SDK.
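The rough per-operation costs quoted above allow a back-of-the-envelope throughput estimate. The Python below is a minimal sketch, assuming 1 KB documents and the approximate figures from the text (about 1 RU per point read, about 5 RUs per insert, update, or delete). The READ_MULTIPLIER table, which doubles read cost at the two strongest levels, is an assumption standing in for "higher RU cost" and is not a published rate. The sketch also builds the per-request x-ms-consistency-level header, assuming that a request may only relax, never strengthen, the account's default level; both function names are hypothetical.

```python
import math

# Approximate RU costs for 1 KB documents, taken from the text above.
READ_RU = 1.0   # point read by id
WRITE_RU = 5.0  # insert, update, or delete

# Consistency levels ordered strongest to weakest, spelled as header values.
LEVELS = ["Strong", "BoundedStaleness", "Session", "Eventual"]

# ASSUMPTION: illustrative read-cost multipliers; stronger levels cost more.
READ_MULTIPLIER = {"Strong": 2.0, "BoundedStaleness": 2.0,
                   "Session": 1.0, "Eventual": 1.0}


def estimate_rus_per_second(reads_per_s: float, writes_per_s: float,
                            consistency: str = "Session") -> int:
    """Rough RU/s to reserve for a workload of 1 KB point reads and writes."""
    reads = reads_per_s * READ_RU * READ_MULTIPLIER[consistency]
    writes = writes_per_s * WRITE_RU
    return math.ceil(reads + writes)


def consistency_header(account_default: str, requested: str) -> dict:
    """Build the per-request override header; only equal or weaker levels pass."""
    if LEVELS.index(requested) < LEVELS.index(account_default):
        raise ValueError(f"cannot strengthen {account_default} to {requested}")
    return {"x-ms-consistency-level": requested}


if __name__ == "__main__":
    # 500 reads/s and 100 writes/s.
    print(estimate_rus_per_second(500, 100))            # 1000
    print(estimate_rus_per_second(500, 100, "Strong"))  # 1500
    print(consistency_header("Session", "Eventual"))
```

Rounding up with math.ceil reflects that reserved throughput is a whole number of RUs; in practice headroom above the estimate is needed for peaks.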
Partitioning
DocumentDB added automatic partitioning in 2016 with the introduction of partitioned collections. Behind the scenes, a partitioned collection spans multiple physical partitions, and documents are distributed among them according to a caller-supplied partition key. DocumentDB automatically decides how many partitions to spread the data across, based on the collection's size and throughput needs. When DocumentDB adds (or removes) partitions, the data remains available while it is rebalanced across the new (or remaining) partitions.
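The routing idea can be illustrated with a toy model: hash each partition key into a fixed hash space, give each physical partition a contiguous range of that space, and split a range when its partition grows. This is a minimal sketch under stated assumptions; the MD5-based hash, the 32-bit hash space, the midpoint split rule, and the RangeRouter class are all illustrative and are not DocumentDB's actual internals. What it shows is that documents with the same partition key always land in the same partition, and that a split only moves keys from the range being split.

```python
import bisect
import hashlib

HASH_SPACE = 2**32  # ASSUMPTION: toy 32-bit hash space


def key_hash(partition_key: str) -> int:
    """Map a partition key to a stable integer in [0, HASH_SPACE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class RangeRouter:
    """Hypothetical router: partition i owns [starts[i], starts[i + 1])."""

    def __init__(self, partition_count: int) -> None:
        step = HASH_SPACE // partition_count
        self.starts = [i * step for i in range(partition_count)]

    def partition_for(self, partition_key: str) -> int:
        # Index of the last range whose start is <= the key's hash.
        return bisect.bisect_right(self.starts, key_hash(partition_key)) - 1

    def split(self, index: int) -> None:
        """Split one partition's range at its midpoint, e.g. when it grows too big."""
        start = self.starts[index]
        end = self.starts[index + 1] if index + 1 < len(self.starts) else HASH_SPACE
        self.starts.insert(index + 1, (start + end) // 2)


if __name__ == "__main__":
    router = RangeRouter(partition_count=2)
    keys = [f"customer-{n}" for n in range(1000)]
    before = {k: router.partition_for(k) for k in keys}
    router.split(0)
    after = {k: router.partition_for(k) for k in keys}
    # Keys from old partition 0 stay in 0 or move to the new partition 1;
    # keys from old partition 1 are renumbered to partition 2 but do not move.
    moved = [k for k in keys if before[k] == 0 and after[k] == 1]
    print(len(moved), "keys moved to the new partition")
```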
Before partitioned collections were available, it was common to write custom code to partition data across collections, and some of the DocumentDB SDKs explicitly supported several partitioning schemes. That mode is still available, but it is now recommended only when your needs will not exceed the capacity of one collection or when the built-in partitioning capability does not otherwise meet your needs.
Automatic indexing
By default, every field in each document is automatically indexed, which generally provides good performance without tuning for specific query patterns. These defaults can be modified by setting an indexing policy, which can vary per field.
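An indexing policy is a JSON document attached to a collection. The sketch below builds one in Python, keeping the default of indexing every path ("/*") while excluding a hypothetical large "description" property that is stored but never queried. The indexingMode, automatic, includedPaths, and excludedPaths fields follow the indexing-policy format in Microsoft's documentation; older DocumentDB API versions also listed per-path index kinds, which are omitted here, so treat this as an illustration rather than a complete policy.

```python
import json


def indexing_policy(excluded_properties: list[str]) -> dict:
    """Index every path by default, but skip the given top-level properties."""
    return {
        "indexingMode": "consistent",  # index updated synchronously with writes
        "automatic": True,             # documents are indexed unless a write opts out
        "includedPaths": [{"path": "/*"}],
        # "/name/*" excludes the property and everything nested beneath it.
        "excludedPaths": [{"path": f"/{name}/*"} for name in excluded_properties],
    }


if __name__ == "__main__":
    # "description" is a hypothetical property that is never used in filters.
    print(json.dumps(indexing_policy(["description"]), indent=2))
```

Excluding rarely queried, bulky properties is the usual reason to deviate from the defaults: it reduces index storage and the RU cost of writes, at the price of not being able to filter efficiently on those properties.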
JavaScript
A JavaScript engine is embedded in DocumentDB. This is a natural fit for JSON documents, and it also enables additional functionality:
- Stored procedures. Functions that bundle an arbitrarily complex set of operations and logic into an ACID-compliant transaction. They are isolated from changes made while the stored procedure is executing, and either all write operations succeed or all fail, leaving the database in a consistent state. Stored procedures execute within a single partition, so the caller must provide a partition key when calling into a partitioned collection. Stored procedures can also compensate for missing functionality; for instance, the open-source documentdb-lumenize[5] project makes up for the lack of aggregation capability by implementing an OLAP cube as a stored procedure.
- Triggers. Functions executed before or after specific operations (for example, a document insertion) that can alter or cancel the operation.
- User-defined functions (UDFs). Functions that can be called from SQL queries to extend the query language, compensating for its limited SQL support.
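Because a stored procedure runs inside one partition, the request that executes it has to name that partition. As a sketch, assuming the REST conventions of POSTing to the procedure's resource path with the partition key JSON-encoded in the x-ms-documentdb-partitionkey header and the arguments sent as a JSON array body, the Python below prepares (but does not send) such a request. The helper, database, collection, and procedure names are hypothetical, and authentication headers are omitted.

```python
import json
from urllib.parse import quote


def stored_procedure_request(db: str, coll: str, sproc: str,
                             partition_key, args: list) -> dict:
    """Describe a REST call that executes a stored procedure in one partition."""
    path = f"/dbs/{quote(db)}/colls/{quote(coll)}/sprocs/{quote(sproc)}"
    return {
        "method": "POST",
        "path": path,
        "headers": {
            "Content-Type": "application/json",
            # The partition key value is sent as a one-element JSON array.
            "x-ms-documentdb-partitionkey": json.dumps([partition_key]),
        },
        # Stored procedure arguments are passed positionally as a JSON array.
        "body": json.dumps(args),
    }


if __name__ == "__main__":
    # Hypothetical "bulkImport" procedure in an "orders" collection; every
    # document it writes must share the partition key "customer-42".
    req = stored_procedure_request("shop", "orders", "bulkImport",
                                   "customer-42", [[{"id": "1"}, {"id": "2"}]])
    print(req["path"])                                     # /dbs/shop/colls/orders/sprocs/bulkImport
    print(req["headers"]["x-ms-documentdb-partitionkey"])  # ["customer-42"]
```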
Supported environments
In the following environments, all features (except Direct Mode, which is currently supported only for .NET) are explicitly supported with dedicated SDKs:
Additionally, DocumentDB can be accessed with the following:
- REST API. All features except Direct Mode are supported. You can call this REST API from any language or platform. In fact, the Node.js, Java, and Python SDKs are essentially thin wrappers calling this REST API.
- MongoDB driver-level protocol support. Most features are implemented, with two notable exceptions: 1) the low-level (possibly undocumented) API that allows applications such as Meteor to act as a replica and receive all changes as an event stream, and 2) aggregations.
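Calling the REST API directly from another language mainly requires signing each request. The Python below sketches the master-key scheme as described for the DocumentDB REST API: an HMAC-SHA256 over the lower-cased verb, the lower-cased resource type, the resource link, and the lower-cased RFC 1123 date, keyed with the base64-decoded account key, then URL-encoded into the Authorization header. The key, resource link, and API version in the usage example are placeholders, and the exact string-to-sign should be checked against the current REST reference before relying on it.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def master_key_auth(verb: str, resource_type: str, resource_link: str,
                    date: str, master_key_b64: str) -> str:
    """Return the URL-encoded Authorization header value for one request."""
    payload = (f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n"
               f"{date.lower()}\n\n")
    key = base64.b64decode(master_key_b64)
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # safe="" percent-encodes "=", "&", "+" and "/" in the token.
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")


if __name__ == "__main__":
    # Placeholder key; a real key comes from the Azure portal.
    fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
    date = "Tue, 01 Nov 2016 12:00:00 GMT"  # the same value goes in x-ms-date
    token = master_key_auth("GET", "docs", "dbs/shop/colls/orders/docs/1",
                            date, fake_key)
    headers = {"Authorization": token,
               "x-ms-date": date,
               "x-ms-version": "2016-07-11"}  # placeholder API version
    print(headers)
```

Signing the date ties each token to a short validity window, which is why the same date string must be sent in x-ms-date.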
Querying mechanisms
Several mechanisms for querying are provided:
- SQL-like query language with adjustments to match JSON data types.
- LINQ language integrated queries.
- JavaScript language integrated queries. This is only available from the server-side SDK exposed to stored procedures, triggers, and user-defined functions. It is modeled after the Underscore.js API.
- MongoDB query language (JSON) via the MongoDB driver-level protocol support.
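Over the REST API, a query in the SQL-like language is itself a JSON document containing the query text and its named parameters. The Python below is a minimal sketch, assuming the application/query+json content type and the x-ms-documentdb-isquery and x-ms-documentdb-query-enablecrosspartition headers; the helper name and the "type" and "total" document properties are hypothetical.

```python
import json


def query_request(sql: str, params: dict, cross_partition: bool = False) -> dict:
    """Build the headers and body for a parameterized SQL query over REST."""
    headers = {
        "Content-Type": "application/query+json",
        "x-ms-documentdb-isquery": "True",
    }
    if cross_partition:
        # Needed when the filter does not pin a single partition key.
        headers["x-ms-documentdb-query-enablecrosspartition"] = "True"
    body = {
        "query": sql,
        # Parameters are bound by name, so values are never spliced into SQL.
        "parameters": [{"name": n, "value": v} for n, v in params.items()],
    }
    return {"headers": headers, "body": json.dumps(body)}


if __name__ == "__main__":
    req = query_request(
        "SELECT c.id, c.total FROM c WHERE c.type = @type AND c.total > @min",
        {"@type": "order", "@min": 100},
        cross_partition=True,
    )
    print(req["body"])
```

Using named parameters rather than string formatting avoids injection problems and keeps JSON types (numbers versus strings) intact.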
Other features
Additionally, DocumentDB supports:
- Global distribution.[7] Added in 2016, global distribution lets you scale a DocumentDB instance across regions around the world and define the type of consistency you expect between regions, from strong to eventual. Automatic, transparent failover can also be configured for a given region.
- BLOB storage via a behind-the-scenes integration with Azure Blob Storage. If no Azure Blob Storage instance exists, one is provisioned automatically when the first write to blob storage is issued.
- GeoJSON support for storing and querying geographical information.
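GeoJSON values are stored as ordinary JSON properties. The sketch below, using a hypothetical "location" property and hypothetical helper names, builds a GeoJSON Point (GeoJSON orders coordinates as longitude, then latitude) and a parameterized query using the ST_DISTANCE spatial function, which returns distances in meters. Treat the query text as an illustration to check against the current SQL reference.

```python
def geo_point(longitude: float, latitude: float) -> dict:
    """GeoJSON Point; coordinates are [longitude, latitude], in that order."""
    if not -180.0 <= longitude <= 180.0 or not -90.0 <= latitude <= 90.0:
        raise ValueError("longitude must be in [-180, 180], latitude in [-90, 90]")
    return {"type": "Point", "coordinates": [longitude, latitude]}


def within_radius_query(center: dict, meters: float) -> dict:
    """Parameterized query for documents whose location is within `meters`."""
    return {
        "query": ("SELECT c.id FROM c "
                  "WHERE ST_DISTANCE(c.location, @center) < @radius"),
        "parameters": [
            {"name": "@center", "value": center},
            {"name": "@radius", "value": meters},
        ],
    }


if __name__ == "__main__":
    # A hypothetical store document near Seattle (longitude first).
    store = {"id": "store-1", "location": geo_point(-122.33, 47.61)}
    print(within_radius_query(store["location"], 5000))
```

The range check catches the most common GeoJSON mistake, swapping latitude and longitude, whenever the swapped value falls outside the valid latitude range.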
Reception
Gartner Research positioned Microsoft as a leader in its 2016 Magic Quadrant for Operational Database Management Systems[8] and explicitly called out the unique capabilities of DocumentDB in its write-up.
Real-world use cases
- Personalization
- IoT
- Mobile
- Games
- Artificial Intelligence
- Social network architectures.[9]
- Integrations with identity providers like Auth0.[10]
Criticism and cautions
- Triggers must be explicitly specified on each operation for which they should run. This makes them ineffective as a mechanism for maintaining business-logic consistency unless you can be certain that every operation specifies the correct triggers.
- .NET LINQ language-integrated queries are not fully supported. LINQ support has expanded over time, but developers are often confused when LINQ code that works on other systems fails to work as expected on DocumentDB, as evidenced by the large number of Stack Overflow questions tagged with both azure-documentdb and linq.[11]
- The lack of a fully functioning local version, although a local emulator running on Microsoft Windows for desktop development was added in fall 2016.
- SQL aggregation is limited to the COUNT, SUM, MIN, MAX, and AVG functions, with no support for GROUP BY or the other aggregation features found in many database systems. However, stored procedures can be used to implement aggregation inside the database.
- "Collection" means something different in DocumentDB: it is simply a bucket of documents. Developers tend to equate collections with tables, each holding a single type of document, but this is not recommended with DocumentDB. Instead, developers are encouraged to distinguish document types with a "type" field or by adding an "isTypeA = true" field to all documents of type A, "isTypeB = true" to all documents of type B, and so on. This is especially confusing for developers coming from MongoDB, whose "collection" entity is intended to be used in a very different way.
- The lack of query plan visibility (e.g. "EXPLAIN" keyword in SQL).
- Support only for pure JSON data types. Most notably, DocumentDB has no date-time type, so such data must be stored using the available types, for instance as an ISO-8601 string or an epoch integer. MongoDB, the database to which DocumentDB is most often compared, extended JSON in its BSON binary serialization specification to cover date-time data as well as traditional number types, regular expressions, and Undefined. However, some argue that DocumentDB's choice of pure JSON is actually an advantage, as it is a better fit for JSON-based REST APIs and the JavaScript engine built into the database.
- Vendor lock-in. Because DocumentDB is available only as a PaaS offering on Microsoft Azure and there is currently no API-compatible alternative, a system built on DocumentDB cannot easily be moved away from paying Azure for its use.
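Because there is no native date-time type, applications choose a representation themselves. The Python sketch below shows the two options mentioned above, using hypothetical helper names. The design point it illustrates is that fixed-width UTC ISO-8601 strings compare lexicographically in chronological order, so string range filters and ORDER BY behave like date comparisons, while epoch integers trade readability for compactness and easy arithmetic. The millisecond precision and trailing "Z" are choices made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _as_utc(dt: datetime) -> datetime:
    """Convert an aware datetime to UTC; naive values are rejected as ambiguous."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc)


def to_iso8601(dt: datetime) -> str:
    """Fixed-width UTC string with millisecond precision, e.g. 2016-12-07T09:30:00.000Z."""
    utc = _as_utc(dt)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def to_epoch_ms(dt: datetime) -> int:
    """Whole milliseconds since 1970-01-01T00:00:00Z, using integer arithmetic."""
    return (_as_utc(dt) - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    created = datetime(2016, 12, 7, 9, 30, tzinfo=timezone.utc)
    event = {"id": "evt-1",  # hypothetical document
             "createdIso": to_iso8601(created),       # "2016-12-07T09:30:00.000Z"
             "createdEpochMs": to_epoch_ms(created)}  # 1481103000000
    print(event)

    # Lexicographic order of the strings matches chronological order.
    times = [created,
             datetime(2016, 9, 15, 23, 5, 1, 250000, tzinfo=timezone.utc),
             datetime(2017, 5, 10, 8, 0, tzinfo=timezone(timedelta(hours=-7)))]
    assert sorted(map(to_iso8601, times)) == [to_iso8601(t) for t in sorted(times)]
```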
See also
- Spanner, from Google
- Amazon DynamoDB
References
- "Azure Cosmos DB". Microsoft Azure. Microsoft. Retrieved 9 July 2017.
- CrawCour, Ryan (21 August 2014). "Introducing Azure DocumentDB – Microsoft’s fully managed NoSQL document database service". Retrieved 9 July 2017.
- syamkmsft. "DocumentDB storage and performance". docs.microsoft.com. Retrieved 1 December 2016.
- syamkmsft. "Consistency levels in DocumentDB". docs.microsoft.com. Microsoft. Retrieved 1 December 2016.
- Maccherone, Larry. "Announcing documentdb-lumenize". blog.lumenize.com. Retrieved 11 December 2016.
- "Using Azure DocumentDB and ASP.NET Core for extreme NoSQL performance". auth0.com.
- kiratp. "Distribute data globally with DocumentDB". docs.microsoft.com. Retrieved 11 December 2016.
- "Magic Quadrant for Operational Database Management Systems". www.gartner.com. Retrieved 11 December 2016.
- "A Journey to Social". medium.com.
- "Planet-scale authentication with Auth0 and DocumentDB". auth0.com.
- "Newest 'azure-documentdb' Questions". stackoverflow.com. Retrieved 7 December 2016.