Following an outage in service beginning last Friday evening, Dropbox has posted an extensive explanation and (rather less extensive) apology:
On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines. During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS.
A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-replica pairs were impacted which resulted in the site going down.
I first noticed a problem Friday night while trying to work with several files in Editorial. At first I thought I had accidentally deleted them or moved them with an ill-conceived Hazel rule, but after failing to see the files in the actual Dropbox app, I knew something was up. A quick Twitter check confirmed several things:
- a lot of people use Dropbox
- Dropbox was indeed having a service issue
- hackers are always willing to take credit for major website outages
Dropbox did offer a tweet and several updates to their tech blog on the first day of the outage (many users continued having problems through the weekend), and today’s thorough explanation is great. But one thing I’ve found to be almost always true: users of a service never mind receiving frequent, detailed, and apologetic updates in real time on well-known platforms (i.e. more than a single tweet on Twitter).