Category Technology

Facebook, Google+, and the Crafting of the Global Social Network

I was one of the “lucky” ones: a friend (and ex-coworker) works for Google, so I got an early invite to Google Plus, their attempt to take on Facebook head-on (i.e., after Facebook has achieved dominance, as opposed to the early Orkut days).

Google+ is oddly Facebook-like. This makes sense, given that FB is well-used by people of all ages in many countries. The design and interface are battle-tested (if also trivially and endlessly changeable). But there’s a key difference, and one that started me thinking about the real business that Facebook is in.

That difference is, of course, the prominence of “Circles” in Google+, and the near-absence of features in Facebook for segmenting and targeting your communications. Sure, one can create friend groups in Facebook, and then make status updates for just a friend group, but I’ll bet a lot of you either didn’t know that, or had never used it. Heck, I’ve never used it despite my expressed desire on Facebook for just such a feature. It’s nearly invisible on Facebook.

It’s central and prominent on Google+. Google wants us to *limit* and control, for ourselves, to whom we target our words and images. Twitter almost insists upon the opposite, that we speak boldly into the ether, and whoever is listening will hear, whether we know the person or not.

I’d bet that at Facebook, any feature which restricts the *volume* or *velocity* of messages that flow within the Facebook global social network is verboten, or anathema. But at the same time, Facebook positions itself as providing control and “privacy,” despite numerous well-publicized privacy issues.

Twitter largely self-organizes as a social network. Facebook, on the other hand, is *crafting* the global social network. It encourages us to accept the illusion of privacy in order to get us to friend more people, post more status updates, and expose more of our opinions and information than we would otherwise be willing to. We should not, as a result, study the Facebook social network as if it were a reflection of our real-life social networks, because the two networks are different both in topology and in weighting.

What Google+ is trying to do, and how that intent will translate into reality once it’s fully up and running, I have no idea. It is, perhaps, not entirely clear to Google themselves, since they seem to start with goals and ideas, and let data and experiment drive them toward an ultimate plan and implementation. In fact, I’ll bet the social network scientists and researchers at Google have studied the Facebook social network and its dynamics better than anybody else except Facebook’s social network scientists, and know a good deal about what makes it tick and what makes it sick.

But it’s safe to say that they’ve made a couple of bets. One is that Google is willing to accept a slightly lower velocity and average quantity of messages in the system. This is inevitable because people will be more restrictive about who receives their status updates and messages if the means for doing so is prominent and core to the system’s operation. How large this effect will be is open to question, but the underlying inequality in rates is pretty much built in. They would make this bet if the increased loyalty they get from customers yields a better upside.

Second, they’re betting that running a more organic and self-structured social network will yield better growth than a manipulated and engineered social network. Here, I’d bet that Google analyzed growth rates from various kinds of node-addition processes, and found that Facebook is oversaturating its degree distribution and eventually will lose the desirable “near-scale-free” network properties (for propagation), and will tend toward a distribution with too many degree correlations to propagate information efficiently. That’s a complete conjecture on my part, but it’s backed by some solid science on the nature of information transfer on various network topologies.

So Google+ is starting out in a seemingly interesting direction: offering more well-integrated control over how and to whom we communicate, but with a familiar feel and design. The real question now is, will enough people come and play, so that we can figure out how well it works, what Google is *really* doing, and whether that’s good or bad for individuals.

A belated Towel Day perspective

This year, on Towel Day, I was busy, putting together a fundraising dinner for the UW Anthropology Department and the UW Student Farm.  So I didn’t really write anything, as I have in years past.  But not for lack of something to say.  I’m not sure what it is, exactly, about “Towel Day,” the semi-bogus holiday celebrated by fans of Douglas Adams each year, but it seems to bring out the “long view” in me, visions of civilizations rising and falling.  You’d think such thoughts would be triggered by someone more profound…by a rereading of Edward Gibbon or at least Barbara Tuchman, or even Carl Sagan reflecting on the immensity in which our parochial concerns are lost.

Nope.  Douglas Adams does it every time.  It’s the Golgafrinchans, at the end of The Restaurant at the End of the Universe.

Because, of course, they’re us.  They’re our bumbling, over-specialized, incapable of making a living for themselves, useless skills aplenty, useful skills thin on the ground, selves.

And, as an archaeologist and social scientist, the Golgafrinchans always remind me of how fragile our civilization is.  I am a social scientist, and I read a good bit of contemporary social science, of course, but in my work I analyze phenomena at a much longer time scale.  I study societies and social groups as they come and go, are born by fission from some other group of people, flourish, perhaps give rise to social “offspring,” and eventually go extinct.  And what is more emblematic of social extinction than Adams’s portrayal of the Golgafrinchan Ark “B”, carrying the non-essential members of society off to form a new world….

The Golgafrinchans occupy a place in my personal “wax museum of humanity” right next to Danny Hillis’s Long Now Foundation, and their 10,000 year clock.  Although the 24 hour news cycle and the buzz of tweets and instant information would have you believe otherwise, it is over much longer time scales that we can evaluate the success, and equitability, and sustainability of the various ways we humans have, of being human.  Our battles might be fought in days or years or lifetimes, but it is only our descendants that can truly “keep score” and decide how well we did.

The Long Now clock is designed to transcend us as a civilization, and as one of the ways we can communicate some of what we’ve learned with our far-future descendants.  It is designed not to require folks to be close enough to us in time and culture that they can read our writings, or comprehend our ideas, but to draw upon principles that are presumably deeper — not necessarily built into the laws of physics, mind you — but comprehensible to beings who are descended from our kind of minds, our kind of bodies.

Combine the perspective of an anthropologist studying the slow coming and going of societies, and the perspective of a software and systems engineer, and I think you get a sub-genre of futurism and speculation:  what it takes to “recover” the good bits of a civilization, after a collapse or other disaster.  Or simply the slow erosion of deep time.

I think of this problem in algorithmic terms.  If you wanted to maximize the chances of being able to recreate us, down the road after we’ve lost our knowledge, lost this particular set of scientific/democratic values, what is the “minimal instruction set”?

In short, what is the “boot loader” for an open, democratic society combining expressive freedom and respect for scientific discovery?

This is the closest I can come up with, and I do not claim that it’s a deterministic algorithm.  In other words, starting here, you are not guaranteed to replicate the aspects of our civilization we value.  It’s clearly stochastic, and there’s clearly a lot of noise.  Which means only that I’m giving an “initial condition” and transition probabilities for processes which are in the “basin of attraction” of the product we’re looking for, and that if you follow such rules, “more often than not,” you’d end up with something we’d recognize as an open society.  Assuming you either replicate the experiment a lot (i.e., send LOTS of Golgafrinchans to LOTS of uninhabited worlds), or wait for the experiment to repeat itself over and over (i.e., deep time).

But here’s the algorithm (and I don’t claim full originality here):

  1. Pay attention and observe patterns in the world around you, keeping an open mind.
  2. Bang the rocks together, so to speak, and make things.  Especially new things.
  3. Understand how competition and cooperation work, and why each is necessary.
  4. Study those who are different, with an open mind.
  5. Pass on what you learn, without too much prejudice.

Put this algorithm on an endless loop, and you have something approximating the progressive parts of the last several thousand years of Western Civilization.   Ignore a couple of key clauses, and you have a much wider array of outcomes.  Not all good, and some downright scary.   Do it just like this, and you might, if you’re lucky, end up with an open, tolerant, prosperous, enlightened democracy.

That’s it.  That’s what it takes.  The Golgafrinchans managed it, apparently…and so did we.  But it was a narrow victory, and the question is whether we can manage to keep it up…..

Happy Towel Day!

AT&T and the iPhone 4 Pre-Order Debacle

Yesterday, we’re told, the crush of fanatical fanboys pre-ordering iPhones brought AT&T’s servers to their knees.  Apple and AT&T pre-sold 600K iPhones, and we’re told they processed 13 million eligibility requests during the day, as people tried over and over to get through.  Random reports surfaced about how the crushing load “crippled” AT&T’s internal network, and caused security glitches and the exposure of private customer data (again).

We’re supposed to believe that this overwhelming traffic load was unprecedented and brought their systems to a screeching halt.  Well, at least AT&T’s systems — Apple’s systems seemed fine if you weren’t going through the eligibility portion of the check.

Here’s the problem, though — if you run the numbers, and know something about web/database applications, it just doesn’t add up.

13 million database queries sounds like a lot.  But let’s say that all of these queries largely happened in the first 12 hours of the day yesterday, instead of spreading them out over the full 24 hour cycle.  That’s 1.08MM queries per hour, or 300 queries per second, on average.
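That arithmetic is easy to check.  Here is a quick sketch in Python, using only the two figures quoted above; the 12-hour window is my own assumption, not anything AT&T reported.

```python
# Back-of-envelope check of the average query rate.
# TOTAL_CHECKS is the reported figure; WINDOW_HOURS is an assumption
# (all traffic squeezed into the first 12 hours of the day).
TOTAL_CHECKS = 13_000_000
WINDOW_HOURS = 12


def average_rates(total_checks: int, window_hours: float) -> tuple[float, float]:
    """Return (queries per hour, queries per second) for a uniform load."""
    per_hour = total_checks / window_hours
    per_second = per_hour / 3600
    return per_hour, per_second


per_hour, per_second = average_rates(TOTAL_CHECKS, WINDOW_HOURS)
print(f"{per_hour / 1e6:.2f}MM queries per hour")
print(f"{per_second:.0f} queries per second")
```

Spreading the same load over the full 24 hours would simply halve both numbers.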

I don’t know if it sounds like a lot to you, but it’s really not.  Here’s a Google query on “mysql queries per second” just to get a general idea of what people are doing out there.  Many of the results range from 2003 through the present, and folks are doing a LOT more than this.  With clustering and various attempts to scale out, folks are doing 10-20K per second.  Oracle, properly tuned, can do thousands to tens of thousands of transactions (operations that change data, not just read it) per second.

I’m not a database expert, but I’ve worked around and with them for years, and I’ll say that 300 queries per second on average is not something that should cause one of the largest (and oldest, if one considers them the heir of the Bell System) telecom companies in the world to crumple under the load.

But traffic is bursty, not uniformly distributed.  So even if they saw periods with 10-50x greater load than average, we’re still in the ballpark for reasonable performance on a pure database query.  Note that I’m assuming that eligibility is a somewhat simple database query; we gave three items of data which obviously form a compound primary key, and AT&T is supposed to return some information about eligibility for upgrade:  perhaps date, perhaps a few other bits of info.

Let’s be generous and assume that 1K of data per eligibility request is returned (i.e., there’s little concern for efficiency).  That’s still only about 300K bytes per second of query results flowing back to Apple from AT&T, or about 2.4Mbps.  Again, perhaps bursting to 20-100Mbps for very brief periods of time.  In other words, a couple of DS3s or a fast ethernet cross-connect are sufficient to carry the data back and forth.  One imagines this shouldn’t strain AT&T’s internal network too much, despite random claims yesterday.
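The same kind of sanity check works for bandwidth.  This sketch assumes the 300 queries per second average, treats “1K” as 1,000 bytes, and applies the 10-50x burst range mentioned earlier; all three are this post's guesses rather than measured values.

```python
# Rough bandwidth estimate for eligibility responses.
# All inputs are assumptions taken from the estimates in this post.
AVG_QPS = 300               # average queries per second
BYTES_PER_RESPONSE = 1000   # generous "1K" per response, taken as 1000 bytes
BURST_FACTORS = (10, 50)    # assumed burst multipliers over the average


def mbps(qps: float, bytes_per_response: int) -> float:
    """Convert a query rate and response size into megabits per second."""
    return qps * bytes_per_response * 8 / 1_000_000


average = mbps(AVG_QPS, BYTES_PER_RESPONSE)
bursts = [mbps(AVG_QPS * factor, BYTES_PER_RESPONSE) for factor in BURST_FACTORS]

print(f"average: {average:.1f} Mbps")
print("burst range: " + " to ".join(f"{b:.0f}" for b in bursts) + " Mbps")
```

Multiplying straight through puts the burst range a little above the round numbers I quoted, but it is the same ballpark, and still nothing that should trouble a carrier's internal network.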

Of course, maybe the problem here isn’t database performance or bandwidth, but that AT&T did the eligibility checks as API calls through a large enterprise system where a single check builds and then tears down many EJBs or other enterprise objects. This might be closer to the truth for a performance bottleneck here.  Maybe the system was built to handle tens, but not hundreds or thousands, of requests per second.  That’s plausible, but kind of stupid for a large engineering company used to having millions of subscribers and doing business globally.  But I could buy it.

But you’d imagine that they’d have learned something from three previous “major” iPhone releases, and the iPad 3G release, and figured out an easier way to quickly respond to eligibility requests.  After all, my eligibility isn’t a rapidly changing variable: I’m eligible on a certain day, and they know what that day is.  Which means that the eligibility of every iPhone owner on the planet could have been precalculated easily just before the iPhone 4 launch, and cached.  It’s not that much data, frankly.  You could have cached a table keyed on a hash of the user’s phone number, last 4 SSN digits, and zip (the keys they ask you to enter), with an eligibility “price code” as the value, in a few gigs of memory on all the app servers, and just statically responded to queries for the first 24 hours, if you were worried that your enterprise systems wouldn’t handle “first day” load.
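To make the caching idea concrete, here is a minimal sketch of what I have in mind.  Everything in it is hypothetical: the function names, the record layout, the sample customers and the price codes are invented for illustration, and I have no knowledge of how AT&T's systems are actually built.  A real deployment would also want a keyed hash rather than plain SHA-256, since phone numbers and zip codes are easy to guess.

```python
import hashlib


def cache_key(phone: str, ssn_last4: str, zip_code: str) -> str:
    """Hash the three fields the customer enters into a single lookup key."""
    raw = f"{phone}|{ssn_last4}|{zip_code}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def build_cache(records):
    """Precompute key -> price code once, shortly before launch day."""
    return {
        cache_key(r["phone"], r["ssn_last4"], r["zip"]): r["price_code"]
        for r in records
    }


def check_eligibility(cache, phone, ssn_last4, zip_code):
    """Answer from memory; None means 'not cached, ask the real system'."""
    return cache.get(cache_key(phone, ssn_last4, zip_code))


# Hypothetical sample data, standing in for an export from the billing system.
records = [
    {"phone": "2065550100", "ssn_last4": "1234", "zip": "98101",
     "price_code": "UPGRADE_FULL"},
    {"phone": "2065550101", "ssn_last4": "5678", "zip": "98250",
     "price_code": "UPGRADE_EARLY"},
]

cache = build_cache(records)
print(check_eligibility(cache, "2065550100", "1234", "98101"))
print(check_eligibility(cache, "2065550199", "0000", "00000"))
```

The point of the design is that launch-day requests become a dictionary lookup on the app server, and only cache misses ever reach the enterprise back end.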

Anyhow, these are just ballpark figures, and they could be wildly wrong about the instantaneous loads experienced, etc.  But the general point is, 13MM eligibility checks and 600K preorders isn’t really a lot of load and traffic.  Ask Amazon or eBay what “a lot” of transactions looks like.

Or better yet, AT&T, before the next launch, hire some of your ex-employees to take a look at your databases and systems.  Please.


Doctorow v. Johnson: iWhatevers versus Open Platforms and the Future of Computing

This last weekend the first iPads shipped to early adopters in the general public, including me. Like many of us in the technology business, I’ve kept a weather eye on the first impressions of many folks on the web, and friends in the industry. Most of these reactions are the stuff of geek discussion, and not terribly enlightening either about the device and its potential future uses, or the direction in which our industry is moving.

But one exchange is worth analysis and our attention, whatever the details of the device and our first impressions. Cory Doctorow, open-source freedom fighter extraordinaire and speculative fiction author, published a widely discussed, negative essay concerning the very idea of the iPad. By now, you’ve probably read it, or seen the link. If you haven’t, you should.

Cory’s essential points are two (with apologies if I’m missing something serious). First, that open platforms (think Linux, Android, FreeBSD, etc) are structurally designed to foster innovation at minimal entry cost, and with minimum friction to the innovator, and minimal interference between the innovator and the eventual consumer of those innovations. Second, Doctorow argues that the justification everyone is citing for the closed system — “making computers easy for mainstream users” — is insulting to mainstream users.

Joel Johnson responds that Doctorow’s principal arguments miss the point. In particular, that openness and innovation are not causally linked to the extent that open-source and Linux advocates claim. That innovation will thrive on the “nearly closed” platforms like the iPad and iPhone.

An iTunes irritation…

I’m watching TV almost exclusively from the Internet nowadays, and mostly by subscribing on iTunes and watching in HD from my AppleTV. This works incredibly well, once you have the season downloaded and ready to play.

The downloading process exposes some seriously irritating bugs and/or design flaws in iTunes, however. I live at the northern edge of civilization on an island (well, my Canadian friends would say the southern edge, and after reading coverage of the Tea Party Convention I’m inclined to agree…) and I have “difficult” internet connectivity. This is no fault of my local ISP, who do an amazing job considering where I live.

But I often encounter TCP resets in long downloads given the Motorola Canopy point-to-point wireless I use, and iTunes behaves really badly when that happens. Despite having typed my Store password to begin the download, upon resumption, iTunes will ask me again. And again. And again. Possibly once for every stream that needs to be resumed, but it doesn’t seem to be as well patterned as that. The application hasn’t restarted, I haven’t logged out, and it’s the same hardware underneath. Why can’t the application cache the Store password used to initiate a given set of downloads for the duration? Perhaps only asking me to retype if the application closes and restarts?

This seems trivial, but if it happens frequently, and you’re not sitting in front of the computer to type your password whenever needed, downloading a season of episodes can literally take days. Three thus far, in fact, for a show I’m subscribing to at the moment. With 29 more items to go. Basically, it’s going to take a week of retyping my iTunes Store password to get the entire season down, given my internet connection (which is normally pretty decent for browsing and other purposes).

Doesn’t anybody in Cupertino test this type of use case?

Do I still use that piece of software?

Spending a few days bedridden with some nasty viral thing is giving me the unusual chance to spend time with my main laptop, but without the pressure to actually accomplish something (that would require lucidity and the ability to focus for more than a couple of minutes). A few minutes ago, I noticed an icon in my menu bar, and wondered “do I still need that piece of software?” Heck, what does it do?

Of course I recognized the name, and that I’d been a user since their beta release, and I remembered renewing my license again this year, but what I couldn’t immediately remember was whether that software was still an integral part of keeping my information current, sync’d, backed up, etc. Basically, is it necessary, or is it cruft?

That’s a general problem these days, and arguably it’s a worse problem on the Mac platform than on Windows, though of course it exists there as well. It’s worse on the Mac because Microsoft tries to build more of this stuff into Windows itself and its major desktop/server suites, while Apple leaves more of it to the ISV community.

And as I noted in a previous post, good Mac software can be had for twenty, forty or sixty bucks. So people, especially professionals and developers, have a tendency to buy new apps just to see if they’re a bit better than the previous generation. I’ve done that with notetaking software, outliners, todo list management, and a bewildering variety of synchronization, backup, and storage apps and utilities.

All of which means that my laptop consistently has more than one “appendix” running — part of the system but functionally useless because it’s not being used.

And all of which contributes to complexity and difficulty in troubleshooting. When my contacts database suddenly is empty, or has three or four copies of every contact (both of which seem to happen to me), which link in the synchronization chain is responsible? Is it syncing Address Book to Google Contacts? Plaxo syncing with Address Book?

Ultimately, to manage all this complexity, we’re going to need to be able to map the information flow between applications, so that we can ask the question and get an answer. Today, I have to sit down and check each app’s preferences and configuration, and sort of make a list of where things are flowing, and rebuild the picture every time something goes wrong.
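Such a map could start out as nothing fancier than a directed graph. The sketch below is purely illustrative: the sync links are hand-entered guesses (the “Phone” node is invented), and no such tool exists on my laptop. It answers one question: which apps can eventually push data into a given store?

```python
from collections import deque

# Hypothetical sync links, written as source -> destinations.
# A real map would be built from each app's preferences, not typed by hand.
SYNC_LINKS = {
    "Address Book": ["Google Contacts", "Plaxo"],
    "Google Contacts": ["Address Book"],
    "Plaxo": ["Address Book"],
    "Phone": ["Address Book"],
}


def upstream_of(target, links):
    """Return every app whose data can eventually flow into `target`."""
    # Reverse the edges so we can walk from the target back to its sources.
    reverse = {}
    for src, dests in links.items():
        for dest in dests:
            reverse.setdefault(dest, set()).add(src)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for src in reverse.get(node, ()):
            if src != target and src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


# Suspects to check when Address Book fills up with duplicates.
print(sorted(upstream_of("Address Book", SYNC_LINKS)))
```

Even this toy version makes the two-way links visible: anything that both reads from and writes to Address Book is a candidate for the duplicate-contact problem.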

In complex systems, just as much vital information is contained in the links between things, as in the things themselves…