AT&T and the iPhone 4 Pre-Order Debacle

Yesterday, we’re told, the crush of fanatical fanboys pre-ordering iPhones brought AT&T’s servers to their knees. Apple and AT&T pre-sold 600K iPhones and reportedly processed 13 million eligibility requests during the day, as people tried over and over to get through. Reports surfaced that the crushing load “crippled” AT&T’s internal network and caused security glitches and the exposure of private customer data (again).

We’re supposed to believe that this overwhelming traffic load was unprecedented and brought their systems to a screeching halt.  Well, at least AT&T’s systems — Apple’s systems seemed fine if you weren’t going through the eligibility portion of the check.

Here’s the problem, though — if you run the numbers, and know something about web/database applications, it just doesn’t add up.

13 million database queries sounds like a lot. But let’s assume those queries were concentrated in the first 12 hours of the day yesterday, rather than spread over the full 24-hour cycle. That’s about 1.08 million queries per hour, or roughly 300 queries per second, on average.
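For the curious, here is the same back-of-envelope arithmetic as a quick Python sketch; the 12-hour window is my assumption from above, nothing more.

    # Back-of-envelope arithmetic for the average eligibility-check rate
    eligibility_checks = 13_000_000   # reported eligibility requests
    window_hours = 12                 # assume the traffic landed in the first 12 hours

    per_hour = eligibility_checks / window_hours   # ~1.08 million per hour
    per_second = per_hour / 3600                   # ~300 per second
    print(f"{per_hour:,.0f} queries/hour, {per_second:,.0f} queries/second")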

I don’t know if that sounds like a lot to you, but it’s really not. Here’s a Google query on “mysql queries per second”, just to get a general idea of what people are doing out there. Many of the results range from 2003 through the present, and folks are handling a LOT more than this: with clustering and various attempts to scale out, sites report 10-20K queries per second. Oracle, properly tuned, can do thousands to tens of thousands of transactions (operations that change data, not just read it) per second.

I’m not a database expert, but I’ve worked around and with them for years, and I’ll say that 300 queries per second on average is not something that should cause one of the largest (and oldest, if one considers them the heir of the Bell System) telecom companies in the world to crumple under the load.

But traffic is bursty, not uniformly distributed. So even if they saw periods with 10-50x greater load than average, we’re still in the ballpark for reasonable performance on a pure database query. Note that I’m assuming that eligibility is a fairly simple database query: we gave three items of data (phone number, last 4 of SSN, zip code), which together form a natural compound lookup key, and AT&T is supposed to return some information about eligibility for upgrade: perhaps a date, perhaps a few other bits of info.
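To make that assumption concrete, here is a sketch of the kind of lookup I have in mind. The table, columns, and data are all invented for illustration; I have no knowledge of AT&T’s actual schema.

    # Hypothetical sketch of an eligibility lookup; names and data are made up.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE upgrade_eligibility (
        phone_number TEXT, ssn_last4 TEXT, zip_code TEXT,
        eligible_date TEXT, price_code TEXT,
        PRIMARY KEY (phone_number, ssn_last4, zip_code))""")
    conn.execute("INSERT INTO upgrade_eligibility VALUES (?, ?, ?, ?, ?)",
                 ("3605551212", "1234", "98250", "2010-06-24", "A"))

    row = conn.execute(
        "SELECT eligible_date, price_code FROM upgrade_eligibility"
        " WHERE phone_number = ? AND ssn_last4 = ? AND zip_code = ?",
        ("3605551212", "1234", "98250")).fetchone()
    print(row)   # ('2010-06-24', 'A')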

Let’s be generous and assume that 1K of data is returned per eligibility request (i.e., there’s little concern for efficiency). That’s still only about 300K bytes per second of query results flowing back to Apple from AT&T, or about 2.4Mbps. Again, perhaps bursting to 20-100Mbps for very brief periods of time. In other words, a couple of DS3s or a Fast Ethernet cross-connect would be sufficient to carry the data back and forth. One imagines this shouldn’t strain AT&T’s internal network too much, despite yesterday’s claims.
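The bandwidth math, using the same assumptions (roughly 1K per response and about 300 responses per second on average):

    # Rough bandwidth estimate for the eligibility responses
    bytes_per_reply = 1000        # a generous ~1K per response, as assumed above
    replies_per_second = 300      # the average rate computed earlier

    bits_per_second = bytes_per_reply * replies_per_second * 8
    print(f"{bits_per_second / 1_000_000:.1f} Mbps on average")   # 2.4 Mbps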

Of course, maybe the problem here isn’t database performance or bandwidth, but that AT&T did the eligibility checks as API calls through a large enterprise system, where a single check builds and then tears down many EJBs or other enterprise objects. That might be closer to the real bottleneck. Maybe the system was built to handle tens, but not hundreds or thousands, of requests per second. That’s plausible, but kind of stupid for a large engineering company used to serving millions of subscribers and doing business globally. But I could buy it.

But you’d imagine that they’d have learned something from three previous “major” iPhone releases, and the iPad 3G release, and figured out an easier way to respond quickly to eligibility requests. After all, my eligibility isn’t a rapidly changing variable — I’m eligible on a certain day, and they know what that day is. Which means that the eligibility of every iPhone owner on the planet could easily have been precalculated just before the iPhone 4 launch, and cached. It’s not that much data, frankly. You could have cached a table keyed on a hash of the user’s phone number, last 4 of SSN, and zip (the three things they ask you to enter), mapped to an eligibility “price code”, in a few gigs of memory on all the app servers, and just statically responded to queries for the first 24 hours, if you were worried that your enterprise systems wouldn’t handle “first day” load.
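Here is a minimal sketch of what I mean, assuming a dictionary keyed on a hash of the three values the customer types; the names, hash choice, and data are all hypothetical.

    # Hypothetical sketch: precompute eligibility before launch, key it on a hash
    # of (phone, last-4 SSN, zip), and answer launch-day queries from memory.
    import hashlib

    def cache_key(phone, ssn_last4, zip_code):
        return hashlib.sha256(f"{phone}|{ssn_last4}|{zip_code}".encode()).hexdigest()

    # Built once from the billing system just before launch; tens of millions
    # of entries like this fit comfortably in a few gigabytes of RAM.
    precomputed = {
        cache_key("3605551212", "1234", "98250"): ("2010-06-24", "A"),
    }

    def check_eligibility(phone, ssn_last4, zip_code):
        # Static lookup; nothing touches the enterprise back end on launch day
        return precomputed.get(cache_key(phone, ssn_last4, zip_code))

    print(check_eligibility("3605551212", "1234", "98250"))   # ('2010-06-24', 'A')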

Anyhow, these are just ballpark figures, and I could be wildly wrong about the instantaneous loads experienced, etc. But the general point stands: 13MM eligibility checks and 600K preorders isn’t really a lot of load and traffic. Ask Amazon or eBay what “a lot” of transactions looks like.

Or better yet, AT&T: before the next launch, hire some of those companies’ ex-employees to take a look at your databases and systems. Please.


Raise a toast to Douglas Adams…

This didn’t make Facebook’s status limit even with aggressive editing, but it is dedicated to our political system, with love and consternation.

The major problem — one of the major problems, for there are several — one of the many major problems with governing people is that of whom you get to do it; or rather of who manages to get people to let them do it to them.

To summarize: it is a well known fact that those people who most want to rule people are, ipso facto, those least suited to do it. To summarize the summary: anyone who is capable of getting themselves made President should on no account be allowed to do the job. To summarize the summary of the summary: people are a problem.

Douglas Adams, the pre-eminent social and political philosopher of our times.  Right behind Monty Python.  Then probably Jon Stewart.  With Friedrich Hayek and John Rawls taking a joint and distant fourth.

Happy Towel Day!

Cocktail Party to Benefit the San Juan Island Permanent Farmer’s Market

Last fall, at our annual Harvest Dinner and Auction to benefit the San Juan Island Permanent Farmer’s Market project, I donated two cocktail classes and parties.  The concept was that the purchasers would select an era, and if they chose, dress in period clothing.  This last Sunday, I hosted the second of the parties, and it was a ton of fun.

As part of the festivities, I taught a short class on cocktail making fundamentals — the bare minimum one needs in order to mix any drink recipe found in a book, etc.  When to shake, when to stir.  Why the dilution from ice is critical to making a balanced cocktail.  How the various ingredients “work” to produce a tasty, balanced beverage.   And then I simply mixed good drinks for the rest of the evening, with food catered by Market Chef in Friday Harbor.

Each person attending also got a booklet which covered the basics of cocktail making, and a bit of cocktail history, in addition to the evening’s menu of cocktails (with short recipes). I focused on the history of “martini-like” cocktails, beginning with combinations of Old Tom gin and Italian vermouth in the mid-1800s (e.g., the Martinez), down through the transition to dry gin and dry vermouth, to the martini as we recognize it today. Most of the information, of course, is derived from online sources and the incomparable book by David Wondrich, but it’s fun to have a nice summary.

I wanted to post the menu, for folks who were interested.  And, of course, to pique the interest of others who might want a similar party and class.  It goes to benefit a terrific cause — a permanent, year-round home for the farmer’s market on San Juan Island.  Whether you live up here or not, consider supporting the cause!


Doctorow v. Johnson: iWhatevers versus Open Platforms and the Future of Computing

This last weekend the first iPads shipped to early adopters in the general public, including me. Like many of us in the technology business, I’ve kept a weather eye on the first impressions of many folks on the web, and friends in the industry. Most of these reactions are the stuff of geek discussion, and not terribly enlightening either about the device and its potential future uses or about the direction in which our industry is moving.

But one exchange is worth analysis and our attention, whatever the details of the device and our first impressions. Cory Doctorow, open-source freedom fighter extraordinaire and speculative fiction author, published a widely discussed, negative essay concerning the very idea of the iPad. By now, you’ve probably read it, or seen the link. If you haven’t, you should.

Cory’s essential points are two (with apologies if I’m missing something serious). First, that open platforms (think Linux, Android, FreeBSD, etc.) are structurally designed to foster innovation at minimal entry cost, with minimal friction for the innovator, and with minimal interference between the innovator and the eventual consumer of those innovations. Second, Doctorow argues that the justification everyone is citing for the closed system — “making computers easy for mainstream users” — is insulting to mainstream users.

Joel Johnson responds that Doctorow’s principal arguments miss the point: that openness and innovation are not causally linked to the extent that open-source and Linux advocates claim, and that innovation will thrive on “nearly closed” platforms like the iPad and iPhone.

An iTunes irritation…

I’m watching TV almost exclusively from the Internet nowadays, and mostly by subscribing on iTunes and watching in HD from my AppleTV. This works incredibly well, once you have the season downloaded and ready to play.

The downloading process exposes some seriously irritating bugs and/or design flaws in iTunes, however. I live at the northern edge of civilization on an island (well, my Canadian friends would say the southern edge, and after reading coverage of the Tea Party Convention I’m inclined to agree…) and I have “difficult” internet connectivity. This is no fault of my local ISP, who do an amazing job considering where I live.

But I often encounter TCP resets in long downloads over the Motorola Canopy point-to-point wireless I use, and iTunes handles them badly. Despite having typed my Store password to begin the download, upon resumption iTunes will ask me again. And again. And again. Possibly once for every stream that needs to be resumed, though it doesn’t seem to be as well patterned as that. The application hasn’t restarted, I haven’t logged out, and it’s the same hardware underneath, so why can’t the application cache the Store password used to initiate a given set of downloads for the duration? Perhaps only asking me to retype it if the application closes and restarts?
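I obviously have no idea how iTunes is structured internally, but the behavior I’d expect looks something like this toy sketch: prompt once, keep the credential for the life of the process, and reuse it whenever a stalled download resumes. Everything here is hypothetical.

    # Toy sketch of the behavior I'd expect from a download manager:
    # prompt for the password once per run, then reuse it on every resume.
    import getpass

    class DownloadSession:
        def __init__(self):
            self._password = None   # cached only for the lifetime of the process

        def password(self):
            if self._password is None:          # ask at most once per run
                self._password = getpass.getpass("Store password: ")
            return self._password

        def resume(self, item):
            # After a TCP reset, reuse the cached credential instead of re-prompting
            credential = self.password()
            print(f"resuming {item} ({len(credential)}-character credential reused)")

    session = DownloadSession()
    session.resume("episode-01.m4v")   # prompts this one time
    session.resume("episode-02.m4v")   # silently reuses the cached password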

This seems trivial, but if it happens frequently, and you’re not sitting in front of the computer to type your password whenever needed, downloading a season of episodes can literally take days. Three thus far, in fact, for a show I’m subscribing to at the moment. With 29 more items to go. Basically, it’s going to take a week of retyping my iTunes Store password to get the entire season down, given my internet connection (which is normally pretty decent for browsing and other purposes).

Doesn’t anybody in Cupertino test this type of use case?

Do I still use that piece of software?

Spending a few days bedridden with some nasty viral thing is giving me the unusual chance to spend time with my main laptop, but without the pressure to actually accomplish something (that would require lucidity and the ability to focus for more than a couple of minutes). A few minutes ago, I noticed an icon in my menu bar and wondered, “do I still need that piece of software?” Heck, what does it do?

Of course I recognized the name, and that I’d been a user since their beta release, and I remembered renewing my license again this year, but what I couldn’t immediately remember was whether that software was still an integral part of keeping my information current, sync’d, backed up, etc. Basically, is it necessary, or is it cruft?

That’s a general problem these days, and arguably it’s a worse problem on the Mac platform than on Windows, though of course it exists there as well. It’s more of a problem on the Mac because Microsoft builds more of this stuff into Windows itself and its major desktop/server suites, while Apple leaves more of it to the ISV community.

And as I noted in a previous post, good Mac software can be had for twenty, forty, or sixty bucks. So people, especially professionals and developers, have a tendency to buy new apps just to see if they’re a bit better than the previous generation. I’ve done that with notetaking software, outliners, todo list managers, and a bewildering variety of synchronization, backup, and storage apps and utilities.

All of which means that my laptop consistently has more than one “appendix” running — part of the system but functionally useless because it’s not being used.

And all of which contributes to complexity and difficulty in troubleshooting. When my contacts database is suddenly empty, or has three or four copies of every contact (both of which seem to happen to me), which link in the synchronization chain is responsible? Is it the sync from Address Book to Google Contacts? Plaxo syncing with Address Book?

Ultimately, to manage all this complexity, we’re going to need to be able to map the information flow between applications, so that I can ask that question and get an answer. Today, I have to sit down, check each app’s preferences and configuration, make a list of where things are flowing, and rebuild the picture every time something goes wrong.
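What I’m imagining is something as simple as a map of which application writes to which data store, so the question “who can touch my contacts?” has a mechanical answer. The sketch below is purely illustrative, with app names taken from my own setup.

    # Purely illustrative: record each sync link as a directed edge, then ask
    # which applications read from or write to a given data store.
    sync_links = [
        ("Address Book", "Google Contacts"),
        ("Plaxo", "Address Book"),
        ("Plaxo", "Google Contacts"),
    ]

    def who_touches(store):
        sources = {src for src, dst in sync_links if dst == store}
        targets = {dst for src, dst in sync_links if src == store}
        return sorted(sources | targets)

    print(who_touches("Address Book"))   # ['Google Contacts', 'Plaxo']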

In complex systems, just as much vital information is contained in the links between things as in the things themselves…