3DEXPERIENCE platform SaaS Public Cloud unavailability February 12th - 1

Summary

Starting at 09h00 UTC on February 12th, 2024, users of the 3DEXPERIENCE platform SaaS Public Cloud experienced service unavailability across all regions. Users encountered service unavailable errors, timeouts, and access errors.

The incident lasted 30 minutes until 09h30 UTC. Following service recovery, certain users still experienced some slowness until 11h30 UTC.

This incident was immediately detected by the Cloud Operations team and managed until resolution.

Symptom

Some users across all regions of the 3DEXPERIENCE platform SaaS Public Cloud were unable to access the service.
 

Causes

As part of a planned configuration change, the service allowing access to Roles & Apps was upgraded to improve performance and scalability and to accommodate the user ramp-up expected in the coming months.

On Monday morning, during user ramp-up, this service behaved unexpectedly:

  • The service became significantly slow, eventually reaching a point where it stopped responding to user requests.
  • Some SQL queries were taking an unusually long time to execute and were stacking up.
  • This behavior was caused by two factors:
    • Non-optimized SQL queries,
    • Missing indexes.
  • The missing indexes and non-optimized queries were not detected during our change management process.
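The way slow queries "stack" can be illustrated with a small sketch. It assumes a simplified model in which requests arrive at a fixed rate and the database completes a fixed number per second. The function name and all figures are hypothetical and chosen only for illustration; they are not measurements from this incident.

```python
def backlog_over_time(arrivals_per_sec, completions_per_sec, seconds):
    """Return the number of queued queries at the end of each second.

    Simplified model: constant arrival and completion rates (an assumption).
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        # The backlog grows by whatever the database could not complete,
        # and it can never drop below zero.
        backlog = max(0, backlog + arrivals_per_sec - completions_per_sec)
        history.append(backlog)
    return history


# Hypothetical rates: the database keeps up in the first case only.
healthy = backlog_over_time(arrivals_per_sec=100, completions_per_sec=120, seconds=5)
degraded = backlog_over_time(arrivals_per_sec=100, completions_per_sec=60, seconds=5)

print(healthy)   # [0, 0, 0, 0, 0]
print(degraded)  # [40, 80, 120, 160, 200]
```

In this model, once queries complete more slowly than they arrive, the backlog grows every second until requests time out, which is consistent with the progression from slowness to unresponsiveness described above.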

For consistency across all 3DEXPERIENCE platform services, the service allowing access to Roles & Apps is a global service designed for very high availability. As a result, when it experiences downtime, all services worldwide may be impacted.

Remediation

The 24x7 operations team, in collaboration with the development teams, responded to this incident in two steps:

  • As an immediate response, we created the missing indexes. After the indexes were created, the service went back online, although some slowness persisted.
  • To fix the remaining slowness, we created and deployed a fix that optimized the impacted queries. After deployment, the service returned to normal.
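The effect of creating a missing index can be sketched as follows. This report does not state the platform's database engine or schema, so the example uses SQLite from the Python standard library as a stand-in, and the table, column, and index names are hypothetical. It compares the query plan before and after the index exists, which is also one way such gaps could be checked before a change is deployed.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption: the real engine differs).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE role_assignment ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, role TEXT)"
)
conn.executemany(
    "INSERT INTO role_assignment (user_id, role) VALUES (?, ?)",
    [(i % 1000, f"role{i % 7}") for i in range(10000)],
)

# Hypothetical lookup of the roles granted to one user.
QUERY = "SELECT role FROM role_assignment WHERE user_id = ?"


def plan(connection, sql, params):
    """Return the detail text of SQLite's query plan for the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


before = plan(conn, QUERY, (42,))  # full table scan: no index on user_id yet
conn.execute("CREATE INDEX idx_role_assignment_user ON role_assignment (user_id)")
after = plan(conn, QUERY, (42,))   # index search on user_id

print("before:", before)
print("after: ", after)
```

Before the index is created, the plan reports a scan of the whole table; afterwards it reports a search using the new index. The exact wording of the plan text varies between SQLite versions.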

During the day, the operations and development teams closely monitored the service to confirm the fix properly addressed the issue.

Prevention

A Root Cause Analysis (RCA) has been initiated to understand why this issue occurred.

In Closing

Finally, we sincerely apologize for the inconvenience this unprecedented event may have caused you.

We know how important the 3DEXPERIENCE platform SaaS is to our users and their businesses. We will make sure to learn from this event in order to maintain our customers’ trust and to continue improving the availability of our online services.
