Microsoft 365 boosts usage analytics with Azure Cosmos DB
Office 365 is a flagship service within the Microsoft 365 Enterprise solution, with millions of commercial customers and more than 150 million active commercial users each month. Office 365 provides extensive microsoft closing stores reporting for administrators within each company on how the service is being used including license assignment, product-level usage Support.Microsoft.Com/Help, user-level Microsoft edge virus activity, site activity, group activity, storage consumption, and more. The Microsoft 365 Microsoft Support Phone Number usage analytics team Microsoft Customer Service incrementally adds new reports to cover more Office 365 services.
Previous architecture
The telemetry data needed to generate such reports was collected in a system called usage analytics, that until recently ran on the community version of MongoDB. The image below shows the data flow, with an importer web service used to write log streams collected in Azure Blob storage to MongoDB. An OData web service exposes APIs to extract the stored data for both reporting within the Microsoft 365 admin center and for access through Microsoft Graph. Every day, as part of a full daily refresh, several billion rows of data were added to the system.
Each of the primary geographies served by Office 365 has an independent usage analytics repository, all employing a similar architecture. In each geography, data was stored on two MongoDB clusters Microsoft edge virus, with each cluster consisting of up to 50 virtual machines (VMs) hosted in Azure Virtual Machines and Microsoft Customer Service running MongoDB. The two clusters in each geography functioned in a primary/backup configuration. Data Support.Microsoft.Com/Help was written separately to both clusters and under normal operation, all reads were performed on the primary cluster.
Each cluster was designed for a write-heavy workload. To speed writes, sharding of data across individual cluster nodes was done using a random globally unique identifier (GUID) such as a MongoDB shard key. Every day for a few hours Microsoft Support Phone Number, new data from Azure Blob microsoft closing stores storage was written using a multithreaded importer. Each thread wrote batches of 2,000 records at a time to all cluster nodes and waited for all records to finish before starting on the next batch of 2,000.
Problems and pains
This architecture presented several problems for the Microsoft 365 usage analytics team, ranging from excessive administrative effort and costs to limited performance, reliability, availability, and scalability. Some specific pains included:
- Poor performance. Reads were inefficient and reports sometimes timed out because of the use of a random GUID as a shard key required querying all nodes. In addition, during the few hours each day when new data was imported, with writes and reads hitting the primary cluster node Microsoft edge virus during the same time Microsoft Customer Service, performance was poor. To make matters worse, if anything failed during a batch write, which often happened due to internal database errors, all 2,000 records had to be written again.
- Full-time administration. Maintenance of the MongoDB clusters was manual and time-consuming, requiring human resources to dedicate time towards managing the clusters. This put an unnecessary microsoft closing stores resource constraint on the team Support.Microsoft.Com/Help, which would rather use its bandwidth to bring new reports to market. Plus, bugs in MongoDB 3.2 required all servers to be restarted weekly. And renewing the security certificates on each cluster node within the virtual network had to be completed annually, and required an additional two weeks of effort per cluster. During such routine administrative tasks, if an operation failed on one cluster node, the entire cluster was down until the issue was resolved.
- High costs. Significant costs were incurred to run Microsoft Customer Service the MongoDB backup clusters, which remained idle most of the time. Those costs continued to increase as Office 365 usage grew.
- Limited scalability. Less than three years after MongoDB was initially deployed, the largest repository Microsoft edge virus was almost at maximum capacity. Any spare capacity was forecast to run out within six months as more products and Microsoft Support Phone Number reports were added, with no easy way to scale.
While the team was microsoft closing stores dealing with the architectural limitations of its existing solution, they were looking ahead to a lineup of new, high-scale capabilities that they wanted to enable for customers in the usage analytics space. The team started looking for a new, cost-effective, and low-maintenance solution that would let them move from self-maintained VMs running MongoDB to a fully managed database service.
Geo-distribution on Azure Cosmos DB: The key to an improved architecture
After exploring their options, the team decided to replace MongoDB with Azure cosmos DB, a fully managed globally-distributed, multi-model database service designed for global distribution and virtually unlimited elastic scalability. The first step was to deploy the needed infrastructure.
In contrast to the primary/backup Microsoft Support Phone Number, two-cluster configuration that it had used with MongoDB, the team took advantage of turnkey global distribution of active data in Azure Cosmos microsoft closing stores DB. Using multiple Azure region for data replication provided an easy way to write to any region Microsoft edge virus Microsoft Customer Service, read from any region, and better balance the workload across Support.Microsoft.Com/Help the database instances—all while relying on Azure Cosmos DB to transparently handle active data replication and data consistency.
“True geo-replication had been deemed too hard to do with MongoDB, which is why the previous architecture separately microsoft closing stores wrote data to both the primary and backup clusters,” says Xiaodong Wang, a Software Engineer on the Microsoft 365 usage analytics team. “With Azure Cosmos DB, implementing transparent geo-distribution literally took minutes—just a few mouse clicks.”
The image below shows the internal architecture of the usage analytics system today. Each of the primary geographies served Microsoft Customer Service by Office 365 is served by Cosmos databases geo-replicated across two Azure regions within that geography. Under normal operating conditions, writes are sent to one region within each geography while reads are routed to both Microsoft edge virus. If for some reason a region is prevented from serving reads Microsoft Support Phone Number, those reads are automatically routed to the other region serving that same geography.
Comments
Post a Comment