It's 2024. Why Is Consumer SaaS Backup (Still) Almost Non-Existent?

alt text

Yes, dear person on GitHub, I know what you’re thinking (at least I think I do):

This is a safe refuge from clickbait.

Also: that’s one of the strangest and most melodramatic headlines I’ve seen in the past five minutes.

Also: is that a sloth running a backup job!?

(Why yes, yes it is. Thank you, DALL-E!)

Anyway.

If I’m creating a small repository of my most hot-button thoughts on technology subjects, it would be totally remiss of me to skip the one subject that’s likely to draw yawns even from devoted tech enthusiasts.

That would be… backups (don’t go to sleep!)

There’s something vaguely shameful about caring about backups, I think.

Prepping has the same kind of baggage.

Did I mention that I’m also a prepper?

Backup anoraks are the preppers of the digital world. Sometimes, they’re awful enough to be both kinds of terrible person at the same time. My apartment is littered with both cans of tuna and spare hard drives. Wanna come round?

Among my claims to fame:

I have appeared as a guest (twice!) on the world’s first and (as far as I know) only backup podcast. It’s now called The Backup Wrapup and it’s hosted by the dynamic duo of W. Curtis Preston (AKA “Mr. Backup”) and Prasanna Malaiyandi, whose name is far harder to remember to type correctly (and I’m writing this sans spell-check in markdown!)

I’ll try to make this quick.

It will be rant-like. Okay, it’s just a rant.

Anyway. Let’s begin here:

The Cloud Is Not Backup!

Cloud vendors would like you to think that “the cloud” is synonymous with backup.

Unfortunately, this well-worn aphorism is pretty much on point:

alt text

But the dismissive attitude that many SaaS companies take toward how consumers can use their clouds while still keeping some kind of (managed) backup runs a bit deeper than the hubris of assuming that anything stored on S3 (etc.) is essentially indestructible.

Despite being a backup nerd, I’m not much one for conspiracy theories.

I don’t think there’s any kind of grand sophisticated plot at work here to deceive consumers into thinking that their data is better protected than it is. I think most SaaS providers just know that most consumers would rather not think about stuff like servers and data federacy. So it’s a pretty easy white lie to spin.

Many consumers – including highly technically literate ones – are blissfully unaware that in many cloud computing contracts, vendors explicitly disclaim any responsibility for securing the integrity of your data.

Don’t want to take the word of a random guy on the internet called Daniel who says he was on a backup podcast once or twice? Okay, I don’t blame you for that. But if you want to go digging, you won’t have to go very far to find smoking gun evidence.

Take, for instance, the AWS Shared Responsibility Model which is foundational material in AWS certifications.

AWS divides responsibilities according to whether it’s security of the cloud (AWS’s job) or security in the cloud (the customer’s job).

You might notice - perhaps with surprise - that customer data is actually the first responsibility in blue shading.

Responsibility for customer data includes protecting the integrity of that data, keeping it safe from ransomware propagation, and avoiding doing stupid stuff like accidentally deleting it or throwing your phone in a river and forgetting to write down the 2FA backup codes nobody ever bothers to write down. If you do that, AWS will not be dispatching a rescue team to dredge the river bed. Nor will they be offering you a custom restore from the magical backup of your data you’re certain they said they’d be taking. It’s on you. Your data’s gone.

Stories to reaffirm the point that the cloud isn’t backup actually abound, ranging from the occasional high-profile incident in which a SaaS provider screws up and loses customer data, to more pedestrian incidents such as those I’ve personally witnessed, in which somebody accidentally deletes a shared Google Drive (then empties the bin), erasing hundreds of thousands of dollars’ worth of IP in a few keystrokes (I promise it wasn’t me!).

“But there’s still a way to get that back, right?” No!

“I mean, we pay Google, so there’s gotta be somebody who can help?”

Perhaps if you pay them millions of dollars.

But sorry, Joe.

In the big scheme of things, you’re just one of millions of small fry Workspace customers like me.

Your dedicated account manager is the support desk site. Or, more recently, an AI chatbot.

Read the room, Joe. Nobody cares about your data, Joe. It’s gone. Maybe you should have taken a backup after all!

But Wait, SaaS Does Backup?! RIGHT!?

With justification, you may be hesitant to take these sweeping claims about the derelict nature of consumer SaaS backup or the supposed delicate state of our cloud data at face value.

You may point to tools like Google Takeout as evidence that consumer SaaS tools do, in fact, care about letting their consumers back up their own bits and bytes.

Here again I push back!

At the time of writing, from a pure backup standpoint, Takeout is so primitive that to call it a “backup” utility is arguably not even accurate.

Google Takeout is a kind of functional data export tool intended to mop up some of your data from Google’s sprawling ecosystem. If it were a bona fide backup tool, it would be a full-backup-only one: a highly inefficient and antiquated way to move data from cloud to premises, because every run re-exports everything rather than just what changed. Incremental backup was invented in the 1960s and anyone who can install rsync on a server has immediate access to more advanced backup technology than Takeout. Google develops some cutting-edge technology but Takeout is (at best) an afterthought.
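
To make the full-versus-incremental distinction concrete, here is a rough sketch of what "only copy what changed" looks like. It is an illustration rather than how rsync actually works (rsync compares size and modification time by default and can transfer deltas within files). The function names, the `manifest.json` file, and the `files/` subfolder are all made up for this example, and it deliberately ignores deletions.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_backup(source: Path, dest: Path) -> list[str]:
    """Copy only files that are new or changed since the previous run.

    Returns the relative paths that were copied this time.
    """
    dest.mkdir(parents=True, exist_ok=True)
    manifest_path = dest / "manifest.json"  # hypothetical name, records past digests
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
    else:
        manifest = {}

    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        if manifest.get(rel) == digest:
            continue  # unchanged since the last backup, so skip it
        target = dest / "files" / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        manifest[rel] = digest
        copied.append(rel)

    manifest_path.write_text(json.dumps(manifest, indent=2))
    return copied
```

Run it twice against an unchanged folder and the second run copies nothing. A full export, by contrast, redoes the whole job every single time.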

From a backup standpoint, Takeout is about as useful as an LTO tape after you pulled all the tape out and made confetti out of it.

Enterprise SaaS Backup Is Not Totally Derelict

You may be wondering whether it’s even possible to offer reliable backup solutions to SaaS customers. Isn’t that kind of a contradiction in terms? Not really. I would actually argue that this should be the default state of affairs.

In that world, SaaS companies still do all the heavy lifting of deploying software and take care of all the tedious but necessary things like configuring WAFs. They just also let you pull your data out to managed storage however often you wish. Technically, it’s hard to think of many real impediments.

The proof of the pudding is that SaaS backup does exist. But frequently it’s only developed if it’s determined that the customer is going to care about data enough to ask for or demand it.

Government agencies, national security organizations, and those who have to comply with things like HIPAA are often legally prohibited from storing a bunch of data on a random SaaS tool they hope is competent at storing their data securely. The fact that AWS stood up private cloud infrastructure for the CIA is an optimistic sign that cloud providers are capable of meeting the very highest standards of data governance. The problem, rather, is that they’re massively discriminating in who they roll those standards out to.

In other instances, folks like intelligence agencies and nefarious people like terrorist organisations go low tech, operating legacy hardware and writing CDs in a strange game of cloud-hesitant cat and mouse.

Hizbullah procured pagers in an attempt to avoid Israeli SIGINT intercepts (it ended poorly). Iran apparently took to writing nuclear secrets on CDs. But as Prime Minister Netanyahu showed in what only I probably interpreted through the lens of cybersecurity, even air-gapping your data isn’t going to be enough if somebody really wants to get to it…

All Is Not Entirely Bleak. But In Consumer SaaS, It Pretty Much Is. Sorry.

Progress has been made over the years by innovative tools trying to use API integrations to help consumers mop up some of their data.

The fact that it has usually taken third parties to bring this tech to market, however, speaks volumes about the fact that most SaaS companies simply don’t care.

Lest you cite GDPR export functionalities as another counter-claim, let me line up some arguments against that line of attack too.

GDPR exports are better understood as begrudgingly-enabled features offered because a regulator said they have to be.

Nobody who cares about backups would architect a backup feature like this. A slim minority of tools offer users access to APIs that (shock, horror: yes, it’s possible!) allow users the vaunted ability to, you know, actually grab a copy of their own data and put it somewhere they own or control. The fact that this is sometimes held out as an extraordinary enterprise-tier feature speaks mostly to how surprisingly dismal consumers’ expectations of data ownership in the cloud are.
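
For the curious, "grab a copy of your own data" through an API does not have to be exotic. The sketch below assumes a purely hypothetical paginated API: `fetch_page` stands in for whatever HTTP call a real vendor would need, and the `items` / `next_cursor` response shape is invented for illustration. No real vendor’s API is being described here.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical response shape for one page:
#   {"items": [<record>, ...], "next_cursor": "<token>" or None}
FetchPage = Callable[[Optional[str]], dict]


def export_all(fetch_page: FetchPage, out_file: Path) -> int:
    """Walk every page of a paginated API and write one JSON record per line.

    Returns the number of records written.
    """
    count = 0
    cursor: Optional[str] = None
    with out_file.open("w", encoding="utf-8") as handle:
        while True:
            page = fetch_page(cursor)
            for item in page["items"]:
                handle.write(json.dumps(item) + "\n")
                count += 1
            cursor = page.get("next_cursor")
            if not cursor:
                break  # no further pages to fetch
    return count
```

Passing the fetching function in as an argument keeps the sketch honest: the loop is the easy part, and the only vendor-specific piece is the one call a provider would have to expose.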

Minus that assurance, most of us are left hoping that the fancy text in the footer reassuring us that the SaaS vendor is taking their own backups is really true (ask the same vendor if you can access those backups, and the situation might start to seem less rosy).

Part of the reason self-hosting continues to be a vibrant part of the tech world when so much SaaS abounds (and I think this point is rarely made!) is that many, like the dear author, simply don’t feel confident using a CRM (or a wiki) on storage they cannot control.

Towards a Shared Federacy Model to End (Self-Imposed) Consumer SaaS Feudalism

The prevalent governance model through which SaaS is delivered today could be (perhaps too charitably) described as a kind of digital form of feudalism.

In exchange for carefully metered access to SaaS services from benevolently disposed cloud providers, SaaS customers unwittingly agree to forfeit any control or meaningful access to their own data.

This status quo doesn’t exist because SaaS providers are unable to provision the tools that would allow consumers to co-own the data (setting up an integration with an external database or object storage bucket isn’t hard). It exists because SaaS consumers haven’t questioned the inadequacy of this model, and until they do, the situation will persist.

In just about every vertical, the SaaS market is vibrant - in fact, it’s beyond the point of saturation. While I don’t have million-dollar contracts to dole out, I make it a point to do business with any provider that puts thought into giving consumers reasonable data access.

A Minimum Standard for Consumer SaaS Backup

So what kind of future would I like to see for SaaS backup?

Firstly, even I would concede that not all data needs to be meticulously preserved. But it’s reasonable for consumers to expect that they should retain shared ownership over at least some of their data.

  • Emails held in webmail clients might contain treasured personal correspondence.
  • Customers frequently use tools like calendars, task managers, and cloud document tools.
  • Small businesses use all of these things, plus tools like project management software, wiki software, etc.

The cloud is great and convenient, and I have a list of reasons why I think that, in most instances, self-hosting makes no sense (I’ll offer a couple of them as a preview: cloud computing makes the most efficient use of human and technical resources by aggregating them; open-source software offered for free is, sadly, often unsustainable).

With a smattering of encryption, I’d probably be happy to never self-host applications again.

But until SaaS providers understand that data governance is not a black-and-white paradigm, little will change.

The idea that consumers should have no expectation of direct access to data they own – or that the only direct access they should have is in formats that aren’t portable or useful – is antiquated, and moving on from it should not be the lone cause of those championing data privacy.

Shared data governance isn’t a lofty standard that should only be held out for databases that hold our insurance records. It’s a pretty bare minimum standard that should really be the norm whenever people and technologists come together.

Thank you for sticking to the end.

Now, go back up your stuff.


By: Daniel Rosehill

Creative Commons Attribution 4.0 International License