Posted on 05/31/2024 12:27:33 PM PDT by Red Badger
Earlier this month, Google's cloud platform deleted UniSuper's entire customer account, including some backups.
Why it matters: Fortunately for the $135 billion Australian pension fund's 647,000 members, some of UniSuper's backups on Google Cloud's servers and elsewhere were salvageable, and the fund was able to recover its data, teaching us all a lesson about having multiple redundancies.
What they're saying: This was not a "systemic issue," Google says.
"An inadvertent misconfiguration" during setup left a data field blank, which triggered the system to automatically delete the account.

The big picture: Google is having a rough 2024. In addition to this mishap, the company is reeling from its AI Overviews debut and its disastrous AI-generated image tool launch.
Editor's note: This story has been corrected to reflect that some of UniSuper's backups on Google Cloud's servers were not erased.
Ping!....................
Practicing for wealth confiscation….
A ‘rounding’ error................It’s not ‘around’ any more.............
everything is 1’s and 0’s now.
Oh sorry. My bad. Oops 😬
You guys wanna go to Taco Bell 🔔 today?
🌮🌮🌮🌯🌯🌯
My treat. Got a little extra cash.
FWIW, I find AWS more reliable, mainly because Amazon actually runs their business on it.
Here is the explanation of what happened
https://cloud.google.com/blog/products/infrastructure/details-of-google-cloud-gcve-incident
Google Cloud Customer Support
A Google Cloud incident earlier this month impacted our customer, UniSuper, in Australia. While our first priority was to work with our customer to get them fully operational, soon after the incident started, we publicly acknowledged the incident in a joint statement with the customer.
With our customer’s systems fully up and running, we have completed our internal review. We are sharing information publicly to clarify the nature of the incident and ensure there is an accurate account in the interest of transparency. Google Cloud has taken steps to ensure this particular and isolated incident cannot happen again. The impact was very disappointing and we deeply regret the inconvenience caused to our customer.
Scope of the impact
The impacted technologies and services listed below describe only Google-managed services.
This incident impacted:
One customer in one cloud region.
That customer’s use of one Google Cloud service - Google Cloud VMware Engine (GCVE).
One of the customer’s multiple GCVE Private Clouds (across two zones).
This incident did not impact:
Any other Google Cloud service.
Any other customer using GCVE or any other Google Cloud service.
The customer’s other GCVE Private Clouds, Google Account, Orgs, Folders, or Projects.
The customer’s data backups stored in Google Cloud Storage (GCS) in the same region.
What happened?
TL;DR
During the initial deployment of a Google Cloud VMware Engine (GCVE) Private Cloud for the customer using an internal tool, Google operators inadvertently misconfigured the GCVE service by leaving a parameter blank. This had the unintended and then-unknown consequence of defaulting the customer's GCVE Private Cloud to a fixed term, with automatic deletion at the end of that period. Both the incident trigger and the downstream system behavior have been corrected to ensure that this cannot happen again.
This incident did not impact any Google Cloud service other than this customer’s one GCVE Private Cloud. Other customers were not impacted by this incident.
Diving Deeper:
Deployment using an exception process
In early 2023, Google operators used an internal tool to deploy one of the customer's GCVE Private Clouds to meet specific capacity placement needs. This internal capacity-management tool was deprecated and fully automated in Q4 2023 and is therefore no longer required (i.e., no human intervention is needed).
Blank input parameter led to unintended behavior
Google operators followed internal control protocols. However, one input parameter was left blank when using the internal tool to provision the customer's Private Cloud. As a result of the blank parameter, the system assigned a default fixed one-year term for this parameter, a behavior unknown at the time.
At the end of the system-assigned one-year period, the customer's GCVE Private Cloud was deleted. No customer notification was sent because the deletion was triggered by the parameter left blank by Google operators using the internal tool, not by a customer deletion request. Any customer-initiated deletion would have been preceded by a notification to the customer.
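The failure mode described above, a blank parameter silently falling back to a default that schedules deletion, can be sketched in a few lines. This is a hypothetical illustration, not Google's actual tool; the names `provision_unsafe`, `provision_safe`, and `DEFAULT_TERM_DAYS` are invented for the example:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default: the kind of silent fallback that caused the incident.
DEFAULT_TERM_DAYS = 365

@dataclass
class PrivateCloud:
    name: str
    term_days: int  # resource is auto-deleted after this many days

def provision_unsafe(name: str, term_days: Optional[int] = None) -> PrivateCloud:
    # Dangerous pattern: a blank parameter is quietly replaced by a
    # fixed-term deletion deadline the operator never asked for.
    return PrivateCloud(name, term_days if term_days is not None else DEFAULT_TERM_DAYS)

def provision_safe(name: str, term_days: Optional[int] = None) -> PrivateCloud:
    # Safer pattern: refuse to guess; a missing lifetime must be stated explicitly.
    if term_days is None:
        raise ValueError(f"term_days must be set explicitly for {name!r}")
    return PrivateCloud(name, term_days)
```

The safer variant fails loudly at provisioning time instead of scheduling a deletion a year later, which is essentially the class of fix Google describes in its remediation.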
Recovery
The customer and Google teams worked 24x7 over several days to recover the customer's GCVE Private Cloud, restore the network and security configurations, bring its applications back online, and recover data to return to full operations.
This was assisted by the customer’s robust and resilient architectural approach to managing risk of outage or failure.
Data backups that were stored in Google Cloud Storage in the same region were not impacted by the deletion, and, along with third party backup software, were instrumental in aiding the rapid restoration.
Remediation
Google Cloud has since taken several actions to ensure that this incident cannot occur again, including:
We deprecated the internal tool that triggered this sequence of events. This aspect is now fully automated and controlled by customers via the user interface, even when specific capacity management is required.
We scrubbed the system database and manually reviewed all GCVE Private Clouds to ensure that no other GCVE deployments are at risk.
We corrected the system behavior that sets GCVE Private Clouds for deletion for such deployment workflows.
Conclusions
There has not been an incident of this nature within Google Cloud prior to this instance. It is not a systemic issue.
Google Cloud services have strong safeguards in place with a combination of soft delete, advance notification, and human-in-the-loop, as appropriate.
We have confirmed these safeguards continue to be in place.
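The safeguards named above, soft delete, advance notification, and human-in-the-loop, can be sketched together. This is a minimal illustration of the general pattern, not Google Cloud's implementation; the `Resource` class and 30-day retention window are assumptions made for the example:

```python
import datetime

RETENTION = datetime.timedelta(days=30)  # assumed retention window

class Resource:
    def __init__(self, name: str):
        self.name = name
        self.deleted_at = None
        self.purged = False

    def soft_delete(self, now: datetime.datetime) -> str:
        # Soft delete: mark, don't destroy, and notify the owner in advance.
        self.deleted_at = now
        return f"notified owner: {self.name} will be purged after {RETENTION.days} days"

    def purge(self, now: datetime.datetime, human_approved: bool) -> bool:
        # Human-in-the-loop: irreversible removal requires explicit approval,
        # and the retention window must have fully elapsed.
        if self.deleted_at is None or not human_approved:
            return False
        if now - self.deleted_at < RETENTION:
            return False
        self.purged = True
        return True
```

Under this pattern, a misfired deletion surfaces as a notification during the retention window rather than as unrecoverable data loss.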
Closely partnering with customers is essential to rapid recovery. The customer’s CIO and technical teams deserve praise for the speed and precision with which they executed the 24x7 recovery, working closely with Google Cloud teams.
Resilient and robust risk management with fail safes is essential to rapid recovery in case of unexpected incidents.
Google Cloud continues to have the most resilient and stable cloud infrastructure in the world. Despite this one-time incident, our uptime and resiliency are independently validated to be the best among leading clouds.
Your 1's became 0's.
(Practicing for wealth confiscation….)
heh (laughing, nervously)
Coincidentally, the Australian government reported an unexpected $135 billion surplus. “Crikey,” a spokesman stated, “We don’t know where it came from but we’ll take it.”
It’s a rounding error.
Your files aren’t around any more...................
During my days as a computer engineer, I saw a few things that were pretty bad (fortunately I was never the culprit), but nowhere near this bad.
On the engineering team I was on, we had a person take down the entire ATM network along the eastern coast of the USA, several thousand ATMs in total, at one of the largest banks in America. Fortunately, we recovered from that mistake in about an hour.
We had a guy one time take down about 20,000 voice mail accounts at the corporate office including the CEO of the same bank where the ATMs were taken down.
Finally, at the same bank, we had a guy who, while running a cleanup routine on Active Directory accounts, accidentally deleted thousands of accounts belonging to active users, rendering their network access null and void. That was a major screw-up.
Mostly 0’s..................
(everything is 1’s and 0’s now.)
WHAT could possibly go wrong? It sure ain’t the 1940s anymore.
Revelation 13:16-18
21st Century King James Version
16 And he causeth all, both small and great, rich and poor, free and bond, to receive a mark in their right hand or in their foreheads,
17 that no man might buy or sell, save he that had the mark or the name of the beast or the number of his name.
18 Here is wisdom: Let him that hath understanding count the number of the beast, for it is the number of a man; and his number is six hundred threescore and six.
https://www.biblegateway.com/passage/?search=Revelation+13%3A16-18&version=KJ21
(It’s OK that some people don’t believe me now; because they will. This WILL NOT be avoided)
✝️🙏🛐
(Mostly 0’s..................)
HEY!! Quit looking at me bank 🏧🏧🏧 account!!!
Me kangaroo 🦘🦘🦘 gonna get ya!
Feels good to be a gangsta