ACI Learning - Issues with lab availability on the Practice Labs platform – Incident details

Issues with lab availability on the Practice Labs platform

Resolved
Operational
Started 7 months ago
Lasted 2 days

Affected

Practice Labs

Operational from 7:55 AM to 4:56 PM

www.practice-labs.com

Operational from 7:55 AM to 4:56 PM

Lab Services

Operational from 7:55 AM to 4:56 PM

Intranet.practice-labs.com

Operational from 7:55 AM to 4:56 PM

Lab RDP

Operational from 7:55 AM to 4:56 PM

Azure Labs

Operational from 7:55 AM to 4:56 PM

Updates
  • Resolved
    Resolved

    Please be advised that the incident on Sunday morning (UTC), in which labs were not accessible from either the ATL or CDC locations, was resolved on 2024-02-04 at 11:30 AM UTC.

    What went wrong

    An automated maintenance task overran, causing locks on the production database. The database subsequently failed over and, because of large transaction log growth, we were unable to recover in a timely fashion.

    Who was impacted

    All users were unable to log in to the Practice Labs platform. It is possible that activity from users who were already logged in to the platform is incomplete.

    Why it went wrong

    An automated task was introduced to production that executed a transaction log clean-up and index defragmentation. The size of the database caused the index rebuild and transaction log reduction jobs to overrun, causing locks in the production database.

    How did we fix it

    We began database replication from the last known working backup and built a physical node with twice the resources while the new primary was being built.

    This incident is now closed. We appreciate your patience and understanding during this process.

  • Monitoring
    Monitoring

    Please be advised, the incident where labs are not accessible from either ATL or CDC locations has been resolved.

    This incident has been placed into monitoring until further notice.

  • Identified
    Identified

    We have successfully identified issues affecting our services and are currently working on bringing them back online. Our team is diligently working to resolve the situation, and we appreciate your patience during this time.

    We will keep you updated on our progress and let you know as soon as the services are fully operational again. Thank you for your understanding.

  • Investigating
    Update

    We are continuing to investigate the underlying cause of the access issues affecting the labs in these locations. Our technical team is focused on restoring the Practice Labs databases, which appear to be the source of the problem.

    We recognise that this situation may be causing inconvenience and disruption, but please rest assured that our engineers are diligently working to resolve it as swiftly as possible.

    We will provide regular updates regarding any developments and progress in resolving this issue. Thank you for your patience and understanding.

  • Investigating
    Update

    We are still actively investigating the root cause behind the inability to access the labs from these locations. Our technical team is diligently working to identify and resolve the issue to ensure that normal operations can resume promptly.

    We understand that this situation may be causing inconvenience and disruption. Please rest assured that we are treating this matter with the utmost priority and doing everything within our capacity to address it as swiftly as possible.

    We will continue to keep you updated on any developments and progress made in resolving this issue.

  • Investigating
    Investigating

    Please note that we are aware of and currently investigating an incident where labs are not accessible from either ATL or CDC locations.