ACI Learning - Issues accessing the Practice Labs platform – Incident details

Issues accessing the Practice Labs platform

Resolved
Operational
Started 12 months ago
Lasted 8 days

Affected

Practice Labs

Operational from 12:15 PM to 11:10 AM

www.practice-labs.com

api.practice-labs.com

Operational from 12:15 PM to 11:10 AM

Updates
  • Resolved
    Resolved

    Symptoms

    Users attempting to access the Practice Labs platform were unable to log in.

    What went wrong

    Our database cluster had a spike in memory usage, which caused a failover of the primary node to one of the secondary nodes. During this period the cluster became unresponsive.

    Who was impacted

    All users attempting to log in to the Practice Labs platform.

    Why it went wrong

    Memory exhaustion in our database cluster.

    How did we fix it

    We have upgraded the memory in all 4 nodes in our database cluster. One node is running with less RAM than we have allocated because of a minor hardware fault, which is being addressed by our maintenance provider.

    With the additional RAM, our database cluster is now operating with reduced processing times, in some cases up to 60% faster. We have monitored this closely for 7 days and are now comfortable that we can come out of monitoring.

  • Monitoring
    Update

    We have completed our emergency maintenance to upgrade our CDC database servers as part of the remediation plan from yesterday's outage. We will continue to closely monitor the platform.

  • Monitoring
    Update

    We are performing emergency maintenance at 9am UTC to upgrade our CDC database servers as part of the remediation plan from yesterday's outage. This should not impact users, but we are monitoring closely.

  • Monitoring
    Monitoring

    Our database cluster's primary node automatically failed over to one of its secondary nodes, which restored user access to the platform.

    We are investigating further corrective actions and will continue to monitor. We appreciate your understanding and patience during this incident.

  • Identified
    Identified

    We are currently investigating an issue where users are unable to access the platform, either by logging in or by launching a lab from another platform.

    We apologize for any inconvenience this has caused.

  • Investigating
    Investigating

    We are investigating an issue where users are unable to access the platform, either by logging in or by launching a lab from another platform.