Affected
Degraded performance from 2:30 AM to 12:54 PM
- Resolved
Symptoms
Users integrating via LTI 1.0, 1.1 or 1.2 into api.practice-labs.com may have noticed an increase in loading times, eventually resulting in an HTTP 503 'Service Unavailable' status code when accessing Practice Labs or Assessments.
What Went Wrong
A package called "RestSharp" was identified as not correctly disposing of used TCP connections, which could lead to socket exhaustion. Eventually the process was no longer able to accept incoming connections.
Who Was Impacted
Any customers or users using LTI 1.0, 1.1 or 1.2 integration methods from their LMS.
Why It Went Wrong
Our monitoring platform reported a large increase in established TCP connections on two separate occasions, which prompted investigations into the cause. However, by then the issue had already occurred and caused the process to crash.
How We Fixed It
We have removed all references to the "RestSharp" package and replaced it with a more robust HTTP client and retry logic to help prevent this issue from occurring again.
- Identified
A potential root cause has been identified within the LTI gateway: a package used in this application, "RestSharp", has been known to consume server resources excessively, leading to instability.
This package has since been removed and we are completing further testing in our Staging environment prior to an expected Production release on January 17th 2024.
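For readers who want a picture of the general pattern behind this kind of fix, the following is a minimal sketch only, written in Python rather than the gateway's own stack. It is not the code we deployed. It assumes a hypothetical client class (LtiGatewayClient), a hypothetical TransientError marker and a hypothetical call_with_retry helper. It shows one long-lived connection being reused and explicitly closed, rather than a new socket per request, combined with retries using capped exponential backoff.

```python
import http.client
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a 503 or a dropped socket)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Run operation(), retrying TransientError with capped, jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The delay cap doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


class LtiGatewayClient:
    """Hypothetical client that reuses one connection and closes it explicitly."""

    def __init__(self, host, timeout=10.0):
        # No socket is opened here; http.client connects on the first request.
        self._conn = http.client.HTTPSConnection(host, timeout=timeout)

    def get_status(self, path):
        def attempt():
            try:
                self._conn.request("GET", path)
                response = self._conn.getresponse()
                response.read()  # drain the body so the connection can be reused
            except (OSError, http.client.HTTPException) as exc:
                # Release the broken socket; the next request reconnects.
                self._conn.close()
                raise TransientError(str(exc)) from exc
            if response.status == 503:
                raise TransientError("upstream returned 503")
            return response.status

        return call_with_retry(attempt)

    def close(self):
        self._conn.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # always release the socket, even on error
```

Used as a context manager (for example, with LtiGatewayClient("lms.example.com") as client, where the host name is a placeholder), the socket is released when the block exits. That explicit disposal is the step whose absence leads to the kind of connection build-up described above.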
- Update
An automated workaround has been implemented to help remediate potential recurrences of this incident. Further investigations into the root cause are continuing and updates will be provided as more information becomes available.
- Investigating
What is wrong
Users of LTI v1.0, 1.1 and 1.2 integrations via api.practice-labs.com may notice an increase in loading times or, in certain instances, see an HTTP 503 'Service Unavailable' error when accessing Practice Labs or Assessments.
Who is impacted
Any customers or users using LTI v1.0, 1.1 or 1.2 integration methods from their LMS.
Why is it wrong
Our monitoring platform has confirmed two occurrences of this issue, overnight on consecutive days, during approximately the following time windows:
Monday 20th November, 02:30 - 07:30 UTC
Tuesday 21st November, 02:00 - 05:00 UTC
How will we fix it
We are currently continuing investigations into this issue and will provide updates as more information becomes available.