Summary:
Between 13/08/22 at 06:30 AM IST and 09/08/22 at 09:45 AM IST, we experienced dashboard slowness for all customers in the US cluster.
Timeline (IST):
Date of the first occurrence:
13 August, 2022 05:15 AM IST
Impact of the incident on the customer:
Impact: HIGH
Event consumption stopped for all pipelines and all customers.
Root cause:
We have noticed some slow queries and deadlocks on few rows and tables because of which APIs were failing and the cluster went in inoperable state.
Solution:
We have identified and fixed the query/code which caused deadlock.