Initial Post-Mortem: Mapiq Service Unavailable Due to Certificate Expiration
Incident Overview
On September 11, 2024, users were unable to log in to the Mapiq service. The incident began when the certificate for a domain that Mapiq had retired (*.mapiq.net) expired at 00:00 UTC. Because the domain was officially no longer in use, we had intentionally chosen not to renew the certificate. After the expiration, we discovered that two dependencies in our system still relied on the retired domain.
Timeline
Incident started: 2024-09-11 00:00:00 UTC
Workaround implemented: 2024-09-11 06:46:00 UTC
Services back to normal: 2024-09-11 07:01:00 UTC
Impact
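The disruption lasted about seven hours, from 00:00 to 07:01 UTC. During that window, two parts of the system that still relied on the retired domain were affected: the sign-in process, and one service used by already signed-in users.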
Resolution
One affected service was quickly restored by temporarily redirecting traffic to the non-custom Azure domain URL, allowing all signed-in users to regain access.
The second issue, which affected the sign-in process, required additional steps. We resolved this by temporarily recreating the old domain on Azure Front Door, using the managed certificate provided by Azure.
Preventative Measures
This is an initial write-up of the incident. We are reviewing our processes to ensure that unused certificates and domains are fully decommissioned, and to improve monitoring of dependencies on certificates. Ironically, this certificate was the last one in an ongoing project to move all services to fully managed certificates (where expiration and renewal do not require human intervention). While a recurrence of this type of incident is unlikely, Mapiq will apply these lessons to continue improving our services.
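As an illustration of the two checks mentioned above, the following is a minimal sketch and not Mapiq's actual tooling. It assumes configuration is available as plain text, and the function names (days_until_expiry, find_domain_references, fetch_not_after) are hypothetical. The first function reports how many days remain before a certificate's notAfter date; the second lists every configuration line that still mentions a domain scheduled for retirement, including its subdomains.

```python
import re
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Sep 11 00:00:00 2024 GMT". The result is negative
    once the certificate has expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def fetch_not_after(host: str, port: int = 443) -> str:
    """Read notAfter from the certificate a live endpoint presents.

    With the default context the handshake fails if the certificate
    has already expired, so this is only useful as an early warning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def find_domain_references(configs: dict[str, str], retired_domain: str):
    """Return (source, line number, line) for each line that mentions
    the retired domain or any of its subdomains."""
    pattern = re.compile(
        r"(?<![\w-])(?:[\w-]+\.)*" + re.escape(retired_domain) + r"(?![\w-])",
        re.IGNORECASE,
    )
    hits = []
    for source, text in configs.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((source, number, line.strip()))
    return hits
```

Running find_domain_references over exported application settings before letting a certificate lapse would flag any remaining reference to the retired domain, and an empty result is a reasonable precondition for decommissioning. days_until_expiry can feed an alert threshold, for example warning when fewer than 30 days remain; the threshold is an assumption, not a figure from this incident.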