I mentioned a few weeks back in my obnoxious AWS newsletter that the Amazon Aurora team looked at how to price their new DSQL offering, and promptly gave up entirely. In a remarkably Amazonian display of Customer Obsession, the team reached out to have a conversation with me about it that somehow didn’t open with “now listen here you little shit,” and I learned a lot.
In short: Amazon’s Aurora DSQL is a technical marvel, but its pricing is absolutely baffling. And I mean just that. They’re not gouging customers. It’s not unfair. How they arrived at their pricing makes sense given the product’s development constraints (presumably including things such as “thou shalt not lose us our corporate ass on this service, as we cannot make it up in volume”). It’s just monumentally confusing.
I will explain.
Wherein the unofficial take is better than the official one
Channeling the spirit of AWS blogs from a bygone era, Marc Bowes (Senior Principal Engineer on the Aurora team) ran a small-scale experiment and blogged about it on his personal site. Despite the keynotes, the heavily corporate blog posts, the copious documentation, and the “we are DSQL” rendition performed by the re:Invent House Band, it was this post that made the service “click” for me: it’s effectively Amazon’s second serverless database offering. Y’know, after DynamoDB. Ignore Aurora Serverless, Aurora Serverless 2: Electric Boogaloo, Keyspaces (yes, the DynamoDB shim), and the rest; this is the real deal.
I followed along with exactly what he did, and sure enough! The read DPUs and write DPUs I consumed matched his to five decimal places, and then... wait, the compute DPUs I consumed were roughly 3% less than his?
It was around this point that my eye began twitching uncontrollably.
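The arithmetic from DPUs to dollars is the easy part, which is what makes a mystery 3% gap so maddening. Here’s a minimal sketch in Python assuming a flat per-million-DPU list price; the rate and the DPU counts below are placeholders for illustration, not Marc’s numbers or mine.

```python
# Converting measured DPU consumption into dollars. The per-million-DPU rate
# and the DPU counts are placeholders, not real measurements; plug in your
# region's actual rate and your own measured numbers.

DPU_RATE_PER_MILLION = 8.00  # assumed list price, $ per 1M DPUs

def dpu_cost(read_dpus, write_dpus, compute_dpus):
    total = read_dpus + write_dpus + compute_dpus
    return (total / 1_000_000) * DPU_RATE_PER_MILLION

# Identical reads and writes, compute DPUs off by roughly 3%:
his_run = dpu_cost(read_dpus=1_500_000, write_dpus=600_000, compute_dpus=1_000_000)
my_run = dpu_cost(read_dpus=1_500_000, write_dpus=600_000, compute_dpus=970_000)
print(f"his: ${his_run:.2f}   mine: ${my_run:.2f}")
# his: $24.80   mine: $24.56 -- same experiment, different bill, no obvious reason why
```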
What will it cost me? Who knows!
The problem with the pricing is not that it’s too high. The problem is instead that it’s hard, bordering on impossible, to model.
I honestly don’t have enough data to say for certain whether Aurora DSQL is a cost-efficient way to run a given workload, and neither does anyone else. The only way to find out what a workload costs is to benchmark it on DSQL, and given that it doesn’t support the full set of PostgreSQL features, that’s going to resemble a bit of a migration uplift for any non-trivial workload. Given the pricing dimensions, I could not begin to guess in advance whether the economics will work out. Things that historically didn’t cost anything on RDS (or rather, manifested as rising CPU utilization) now have a direct cost associated with them.
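To make that concrete, here’s a toy comparison of the two billing shapes. Every number in it is made up for illustration: the instance rate, the per-DPU rate, and especially the per-query DPU figure, which is exactly the thing you can’t know without benchmarking.

```python
# Toy contrast between the two billing models. Every rate here is made up
# for illustration; the shape of the math is the point, not the numbers.

RDS_INSTANCE_HOURLY = 0.50        # assumed on-demand instance rate, $/hour
DSQL_RATE_PER_MILLION_DPU = 8.00  # assumed $ per 1M DPUs

def rds_monthly(hours=730):
    # Fixed: the instance costs the same whether it sits at 10% or 90% CPU.
    return hours * RDS_INSTANCE_HOURLY

def dsql_monthly(queries_per_month, dpus_per_query):
    # Usage-based: every query lands on the bill, and dpus_per_query is the
    # part you can't know without benchmarking the actual workload.
    return (queries_per_month * dpus_per_query / 1_000_000) * DSQL_RATE_PER_MILLION_DPU

print(f"RDS, any load that fits on the box:  ${rds_monthly():.2f}")
print(f"DSQL, 50M queries at 2 DPUs apiece:  ${dsql_monthly(50_000_000, 2):.2f}")
print(f"DSQL, 50M queries at 6 DPUs apiece:  ${dsql_monthly(50_000_000, 6):.2f}")
# 365.00 vs 800.00 vs 2400.00 -- the per-query DPU figure swings the bill 3x
```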
Expensive is okay, “I dunno” is not
If a service is very expensive, that may be okay for some use cases, for some customers. What I can’t countenance, and can’t in good faith recommend, is a service where a modeling exercise spits out an end result of “yes, this will cost you some amount of money.” When the spend rises from “science experiment” to something more substantial, companies need to be able to reasonably forecast at least the broad shape of what the bill is likely to be. DSQL makes that an impossibility.
Aurora has seen this before. With a 20¢ charge per million I/O requests, it was very difficult to forecast Aurora costs without running a workload to see what it would look like. This was annoying enough that AWS fixed it by offering a more expensive configuration, Aurora I/O-Optimized, that waives the I/O charge entirely, and customers adopted it in droves. AWS Compute Optimizer now tells you, cluster by cluster, which configuration you should be running. This is great, but the customer shouldn’t even have to make these decisions.
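For a rough sense of why the I/O-Optimized option landed so well, here’s a back-of-the-envelope comparison. The I/O rate, the roughly 30% instance uplift, and the storage premium are assumptions for the sketch rather than quoted prices; check the Aurora pricing page for your region before relying on any of it.

```python
# Illustrative break-even check between Aurora Standard and Aurora I/O-Optimized.
# All rates are assumptions for the sketch; real prices vary by region and
# instance class, so pull them from the pricing page or the Pricing API.

IO_RATE_PER_MILLION = 0.20   # assumed Standard I/O charge, $ per 1M requests
INSTANCE_UPLIFT = 1.30       # assumed instance-hour premium on I/O-Optimized
STORAGE_UPLIFT = 2.25        # assumed storage premium on I/O-Optimized

def standard_bill(instance_cost, storage_cost, io_requests_millions):
    return instance_cost + storage_cost + io_requests_millions * IO_RATE_PER_MILLION

def io_optimized_bill(instance_cost, storage_cost):
    return instance_cost * INSTANCE_UPLIFT + storage_cost * STORAGE_UPLIFT

# A chatty workload: modest instance spend, lots of I/O.
std = standard_bill(instance_cost=500, storage_cost=50, io_requests_millions=5_000)
opt = io_optimized_bill(instance_cost=500, storage_cost=50)
print(f"Standard: ${std:,.2f}   I/O-Optimized: ${opt:,.2f}")
# Standard: $1,550.00   I/O-Optimized: $762.50 -- but you only learn the I/O
# number by running the workload, which was exactly the forecasting problem.
```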
If I can’t give up, AWS can’t either
AWS pricing has officially reached the point where ‘it depends’ isn’t just the answer – it’s the entire pricing documentation. The worst part is that I don’t really have a better pricing model here that satisfies the constraints. I just know that this one is likely to scare the crap out of customers, and hinder near-term adoption of what looks to be a fantastic service.