ACI Learning - Practice Labs platform unavailable – Incident details

Practice Labs platform unavailable

Resolved
Operational
Started almost 3 years ago
Lasted 8 days

Affected

Practice Labs

Operational from 9:29 AM to 8:27 AM

www.practice-labs.com

Operational from 9:29 AM to 8:27 AM

Lab Services

Operational from 9:29 AM to 8:27 AM

Intranet.practice-labs.com

Operational from 9:29 AM to 8:27 AM

Lab RDP

Operational from 9:29 AM to 8:27 AM

Azure Labs

Operational from 9:29 AM to 8:27 AM

Updates
  • Resolved
    Resolved

    This incident will now be marked resolved. We have observed continued stability in the platform since Saturday 25th. We will continue to work with our vendors on root cause and their recommendations; any resulting changes will be communicated through the Maintenance Schedule notifications on this status page.

  • Monitoring
    Update

    Daily update:

    The platform remains stable and our hardware vendors are providing feedback on problem resolutions. All solutions received so far are being evaluated and, where possible, tested offline in secondary environments without affecting the production platform.

    Next update due Sept 29.

  • Monitoring
    Update

    Please note that we implemented stability improvements and have been monitoring the platform over the weekend without issue. We are continuing to work with our vendors to isolate root causes for the platform instability. This incident will remain open in a monitoring state with at least daily updates. All services are available and operational.

  • Monitoring
    Monitoring

    We are continuing to work on a fix for this incident. The system has now recovered and we are monitoring our systems closely.

  • Investigating
    Investigating

    We are currently investigating this incident. We are collecting data on the issue and will send another update in 30 minutes.

    Next update expected 20:45 BST 20:15 UTC

  • Monitoring
    Update

    All services are now operational in the platform, including screenshots from the DRT01 data centre. This incident remains in a monitoring state whilst we pursue hardware failures with our vendors. The system will remain under close monitoring and the appropriate teams remain on standby.

  • Monitoring
    Monitoring

    We implemented a fix for the accessibility issues and are currently monitoring the result.

  • Identified
    Update

    We are aware of a repeat of a previous incident affecting core platform access. The engineering team are working to restore access to the platform.

  • Identified
    Identified

    We are aware of issues logging into the platform right now. We apologise for this disruption. Our engineering team are working on this situation.

  • Monitoring
    Update

    This incident remains in a monitoring state and there are no further updates at this time. Technical teams continue to work with vendors to establish root causes. Workarounds remain in place, and separate emergency maintenance notifications will be distributed as and when changes are expected to be made. There is currently no ETA for any restorative changes to the environment.

    Next update expected 1400 UTC

  • Monitoring
    Monitoring

    Services have been restored and the platform is accessible. This incident will be escalated to our vendor for investigation and root cause analysis.

    Taking screenshots from lab devices in the DRT01 data centre still requires switching to the HTML client via the Settings menu (disable the Connect toggle), as per previous incident notes.

  • Identified
    Update

    We are continuing to work on a fix for this incident. We are in the process of restoring services at this time. We are hoping to complete this process within 30 minutes.

    Next update expected at 2:00 AM UK time (02:00 UTC)

  • Identified
    Update

    We are continuing to work on a fix for this incident. Our engineer is currently performing recovery actions.

    Next update is expected at 1:45 UK time (01:45 UTC)

  • Identified
    Update

    The engineer has arrived on-site and we are awaiting a further update from them.

    Next update expected at 1:15 AM UK time (01:15 UTC)

  • Identified
    Update

    We are continuing to work on a fix for this incident. Our engineer should be arriving on-site within the hour to investigate the issue further.

    Next update is expected at 12:45 AM UK time (00:45 UTC)

  • Identified
    Update

    The next update is expected at 12:15 AM UK time (00:15 UTC)

  • Identified
    Identified

    We are continuing to work on a fix for this incident. We have sent a technical engineer on-site to investigate the issue further and collect more data on the incident.

    Next update is expected at 11:40 PM UK time (23:40 UTC)

  • Investigating
    Investigating

    We are aware of a major incident impacting access to the platform. We are convening teams to investigate this issue.

  • Monitoring
    Update

    Our secondary data centre is now operational again. However, we are aware of issues capturing screenshots using the Connect RDP client. If you are allocated the DRT01 data centre (shown in the top-level device menu bar of the lab device), please use the workaround from our previous incident update and switch to the HTML5 client to complete screenshots in the labs.

    This issue remains in a monitoring state as we continue to work with our vendors to resolve our data centre connectivity in full.

  • Monitoring
    Update

    We continue to investigate the platform stability issues. We are aware of connection issues affecting RDP sessions. As a workaround, users facing issues can disable "Connect" and switch to HTML5 from the Settings menu, as documented at https://help.practice-labs.com/practice-lab/settings-tab.

  • Monitoring
    Update

    This incident remains under investigation. We continue to work with our vendors and development teams to identify root causes of open issues affecting the platform.

    Next update due 1400 UTC

  • Monitoring
    Update

    The platform is now in a monitoring state. We are aware of an issue affecting screenshot capability with lab devices serviced from the DRT01 data centre as indicated in the device menu bar. This issue remains under investigation.

    Next update expected Thursday, 0900 UTC.

  • Monitoring
    Monitoring

    We implemented a workaround and Cisco labs are now available again. We are currently investigating an issue with taking screenshots from lab devices sourced from DRT01. This is the final health check item before full service restoration.

  • Identified
    Update

    We are continuing to work on a fix for this incident. We continue to investigate access issues with Cisco-based labs.

  • Identified
    Update

    We continue to work on restoring the secondary data centre lab capacity. There is no further progress or ETA to advise at this time. This continues to directly impact Cisco lab availability.

  • Identified
    Update

    Cisco labs remain unavailable; however, we are continuing to identify and implement a workaround to restore access.

  • Identified
    Update

    We have resumed lab services from our primary data centre. We are in the process of monitoring and validating overall platform health and are aware that Cisco labs are currently unavailable. We are continuing to investigate this.

  • Identified
    Update

    We have reverted services back to our primary data centre and are in the process of recommissioning lab servers. Access to Labs should be available within 30 minutes from this notification.

  • Identified
    Update

    Unfortunately, the situation is the same as in the last update.

    Next update expected at 2:30 UK time (1:30 UTC)

    Once again we sincerely apologise for the disruption this is causing.

  • Identified
    Update

    The platform is still currently unable to establish access to Lab Devices and the team are continuing to work on this.

    Our core data centre is also in the process of restoration so teams are working across both streams to restore services as soon as possible.

    Unfortunately, there is not a notable update to be provided at this time.

    Next update expected at 1:45 UK time (12:45 UTC)

    Once again we sincerely apologise for the disruption this is causing.

  • Identified
    Update

    The platform is still currently unable to establish access to Lab Devices and the team are continuing to work on this.

    Our core data centre is also in the process of restoration so teams are working across both streams to restore services as soon as possible.

    Next update expected at 1:15 UK time (12:15 UTC)

    Once again we sincerely apologise for the disruption this is causing.

  • Identified
    Update

    We are continuing to configure the disaster recovery platform at this time. Unfortunately, we do not have a notable update to provide. Next update expected in 30 minutes on overall status. We sincerely apologise for the disruption caused.

  • Identified
    Update

    We are continuing to configure the disaster recovery platform at this time. Next update expected in 30 minutes on overall status.

  • Identified
    Update

    The Practice Labs disaster recovery site is now accessible and the team are provisioning lab services and RDP servers. Next update expected in 30 minutes. Please note that additional services have been identified as impacted within this incident.

  • Identified
    Identified

    Please note that the Practice Labs platform is currently unavailable. This relates to a previous emergency maintenance window listed on this status page. We are currently in the process of activating our disaster recovery site, which will have a reduced lab title capacity. Please click this incident to see affected service details such as Persistent Labs.

    We will update this incident once the disaster recovery site is operational. The current ETA for platform access is 11:00 PM UK time.

    We are working to restore the Primary Data Centre as quickly as possible to recover full services.