Friday, February 24, 2012

Cloud Drives New Thinking About Data Architecture

It doesn't take much experience in the cloud space to realize it is subject to the same limitations as grid computing, and for the same reasons. One of the biggest limitations is data: the size of data sets, the bandwidth and cost required to move them, and the latency introduced to process them. I have repeatedly heard Alistair Croll quote another cloud visionary that "next to the cost of moving data, everything else is free." It's simple, and it's true, yet few people adopting cloud seem aware of how important it is to architect for this reality.
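
To make that quote concrete, here is a back-of-envelope sketch; the data set size, link speed, and efficiency figure are illustrative assumptions, not measurements:

```python
# Back-of-envelope sketch: how long does it take to move a data set?
# All numbers here are illustrative assumptions, not measurements.

def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to move dataset_tb terabytes over a link_gbps link,
    assuming only `efficiency` of theoretical throughput is achieved."""
    bits = dataset_tb * 1e12 * 8                 # terabytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 3600

# Moving 100 TB over a dedicated 1 Gbps link at 70% efficiency:
print(f"{transfer_hours(100, 1):.0f} hours")     # ~317 hours, roughly 13 days
```

Two weeks to move one data set is the kind of number that makes "everything else is free" feel less like a quip and more like an engineering constraint.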

As Geva Perry points out, the reality of cloud is that, like most technologies, it enters a company through the everyday knowledge worker and then wends its way upward as it delivers value, until finally the CxOs become aware of its success, typically right before they engage in top-down initiatives to bring the technology into the company. Most of these early adopters are software developers and systems administrators, both of whom are eternally on the lookout for better solutions that make their lives easier. Neither takes a data-centric focus, which results in sub-optimal solutions. And in cloud, where the return can be so high, a sub-optimal solution still looks great, masking its inherent shortcomings.

As I've explained to many who confuse cloud with mainframes using the centralization argument, it's important to realize there is a huge difference. Mainframes were about physical centralization. And if we proved nothing else, we proved physical centralization is a bad thing from a disaster recovery point of view, a cost of operations point of view, a response time point of view, and several others I won't detail. Cloud gives us the best of both worlds: logical centralization within a physically dispersed reality.

New models require new architectures. Since the fundamental value element of any computing system is the data being processed, it's natural to optimize the architecture for data. In the cloud, the optimal data architecture takes advantage of geographic diversity, leveraging virtualization concepts to logically manage the data in a traditional centralized model. The reality is that no matter how much storage you can put in one location, you'll never have enough. And even if you did, you'd never have enough buildings to house it. And if you did, you'd run out of bandwidth to move it around and make use of it. We are in a data-driven age where we're better at collecting data than using it, and combined with mobile technologies, in which every electronic device suddenly becomes a sensor of some type, we're starting to sink under the weight. Telcos had it right with their local exchanges (central offices), the distribution points that linked the home to the global network.
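
To illustrate what "logically centralized, physically dispersed" can look like, here is a minimal sketch; the region names and the hash-based placement rule are assumptions made purely for illustration, not a prescription:

```python
# A minimal sketch of logical centralization over physical dispersion:
# one logical namespace routes each key to the region where its data lives.
# Region names and the placement rule are illustrative assumptions.

import hashlib

REGIONS = ["us-east", "eu-west", "ap-south"]     # physically dispersed stores

def home_region(key: str) -> str:
    """Deterministically map a logical key to its physical home."""
    digest = hashlib.sha256(key.encode()).digest()
    return REGIONS[digest[0] % len(REGIONS)]

class LogicalStore:
    """Callers see one store; the data stays where it was placed."""
    def __init__(self):
        self.stores = {r: {} for r in REGIONS}   # stand-ins for remote stores

    def put(self, key: str, value: bytes) -> None:
        self.stores[home_region(key)][key] = value

    def get(self, key: str) -> bytes:
        return self.stores[home_region(key)][key]

store = LogicalStore()
store.put("meter/12345/2012-02", b"...readings...")
print(home_region("meter/12345/2012-02"))        # the one region holding it
```

The caller works against a single logical namespace; only the routing layer knows the data never left its region.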

In the smart grid arena I helped one of the largest utilities re-think their smart grid strategy. The original design called for bringing all the consumption data back from the smart meters to the data center. The primary challenge on the table was how to move the data: wireless technology? Broadband over the wire? I turned the tables by refocusing the discussion on the fundamental assumptions. Why did the data need to be transferred to the data center at all? The answer: the business requirement was to be able to tell every user the accumulated cost of the power they had consumed to date, at any given point in time. Digging deeper, we found the current mainframe could not meet this requirement, being able to process only 200,000 bills each day; hardly the real-time answer the business wanted.

So I asked if we could flip the architecture, taking a distributed control model from my powertrain engineering days with GM. I argued it would make more sense to accumulate the data at the smart meter or neighborhood, calculate the billing at that level, and have the data center pull the detailed data only when needed. Through research we found the "when needed" use cases were limited to billing disputes, service issues, and the occasional audit. Since only 1% of customers called each month, even if 100% of those calls required the customer rep to reach down into the smart meter for the detailed data, it was certainly easier and less of a load than bringing 100% of the data back to the data center. Frankly, it was hard to argue against a distributed model, which became the standard and has slowly replaced the original centralized model of smart grid touted for years as the answer.
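
A minimal sketch of that flipped architecture might look like the following; the class, the method names, and the flat rate are hypothetical, purely to show where the data lives and where the question gets answered:

```python
# Sketch of the flipped architecture: the meter (or neighborhood concentrator)
# accumulates readings and answers the billing question locally; the data
# center pulls detail only on demand. Names and the flat rate are hypothetical.

class SmartMeter:
    def __init__(self, meter_id: str, rate_per_kwh: float):
        self.meter_id = meter_id
        self.rate = rate_per_kwh
        self.readings = []          # detailed data stays at the edge
        self.total_kwh = 0.0

    def record(self, timestamp: str, kwh: float) -> None:
        self.readings.append((timestamp, kwh))
        self.total_kwh += kwh       # running accumulation, no central round-trip

    def cost_to_date(self) -> float:
        """The business requirement: accumulated cost at any point in time."""
        return self.total_kwh * self.rate

    def fetch_detail(self):
        """Called rarely: billing disputes, service issues, audits."""
        return list(self.readings)

meter = SmartMeter("12345", rate_per_kwh=0.12)
meter.record("2012-02-24T08:00", 1.4)
meter.record("2012-02-24T09:00", 2.1)
print(f"${meter.cost_to_date():.2f}")   # answered at the edge: $0.42
```

The detailed readings never move unless `fetch_detail` is called, which is exactly the 1%-of-customers case rather than the 100%-of-data case.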

I have advocated the same distributed architecture for mobile providers (accumulate usage on the mobile phone, execute the billing algorithm there, and, if the provider needs the detail, download it when needed). I have advocated a more generic version for healthcare payers, retailers, and the supply chain, touting the advantage of storing data where it is collected.

The data management tools are in their infancy, but there is significant work going on around the world on the subject. Consider that five years ago your options within the database world were limited to a cluster. Now we have sharding, high-performance filesystems, and highly scalable SQL databases, plus a whole new class of data management from the MapReduce/Hadoop world.
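
For readers new to that last class of tools, here is a toy, single-machine sketch of the map-reduce pattern itself; frameworks like Hadoop distribute these same two phases across many machines, and the meter-reading data below is invented for illustration:

```python
# Toy sketch of the map-reduce pattern: map emits (key, value) pairs,
# reduce folds the values for each key. Input data is invented.

from collections import defaultdict

def map_phase(records):
    for meter_id, kwh in records:        # emit (key, value) pairs
        yield meter_id, kwh

def reduce_phase(pairs):
    totals = defaultdict(float)
    for key, value in pairs:             # fold the values per key
        totals[key] += value
    return dict(totals)

readings = [("meter-1", 1.4), ("meter-2", 0.9), ("meter-1", 2.1)]
print(reduce_phase(map_phase(readings))) # {'meter-1': 3.5, 'meter-2': 0.9}
```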

Embrace it or fight it, the reality doesn't change. Data growth is exploding. Storage densities are plateauing. Is it better to learn how to hold your breath, tread water, or swim?

Tuesday, February 14, 2012

Yet Another Barrier to Public Clouds - Hacktivism

Public cloud providers like Amazon, Google, Rackspace and Microsoft are struggling to be relevant to the enterprise, and to the Fortune 500 in particular. At a recent conference, when a keynote speaker asked if people felt confident enough in public cloud storage to put their data in the public cloud, I was the only person who raised my hand (and that only because of bepublicloud). However, sitting through the keynote by a founding member of the Cloud Security Alliance brought me to the realization that there is another side to security that will block the adoption of public cloud even once all the security issues are addressed and confidence in secure public cloud storage surges.

One of the fundamentals of public cloud is that it uses the Internet for connectivity; even the VPN solutions ride over the Internet. Connectivity is a limited resource, and with the thin margins in public cloud, bandwidth is heavily scrutinized, monitored, and protected. Similarly, enterprises labor continuously to optimize network architecture and minimize the size of their pipes to the Internet. Enter hacktivism and its favorite tool of disruption, the distributed denial of service (DDoS) attack.

A DDoS attack is basically a flood of requests hitting a targeted range of Internet addresses, seeking to overwhelm the system's ability to respond. Small attacks take down a server, medium attacks take down a site, and large attacks saturate the network and take down an entire company. Essentially, so much garbage is being thrown down the drain that the system blocks up and nothing can get through, in or out.
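
The arithmetic behind saturation is brutally simple. A back-of-envelope sketch, with all numbers being illustrative assumptions:

```python
# Back-of-envelope sketch of why saturation attacks work: aggregate attacker
# bandwidth versus the victim's pipe. All numbers are illustrative assumptions.

def pipe_saturated(bots: int, bot_mbps: float, pipe_gbps: float) -> bool:
    attack_gbps = bots * bot_mbps / 1000
    return attack_gbps >= pipe_gbps

# 10,000 compromised hosts pushing just 1 Mbps each fill a 10 Gbps pipe:
print(pipe_saturated(10_000, 1.0, 10.0))   # True: legitimate traffic can't get in
```

Note the asymmetry: the attacker needs only modest bandwidth per host, while the victim has deliberately minimized its pipes for cost reasons.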

Imagine a bank, hospital, or any other company that begins to use public cloud for enterprise solutions. To the hacktivists, it would be the same as inviting their disruptive methods into the data center. A DDoS attack could essentially take the company off-line, unable to complete any transaction involving the public cloud: no more access to systems, data, records, or images. I expect this is an issue already faced by salesforce.com and other SaaS providers, who become targets because of who their customers are rather than as a result of their own actions. It would certainly make a prospect want to know in advance who else uses the service, a concern well beyond that of shared hardware and co-mingled databases.

I'm sure there are ways to architect around this; however, those will likely increase cost and complexity, the opposite of the direction enterprises are heading. Of course, adding this issue to the litany of security concerns in the end only serves to decrease confidence in the public cloud.

Ouch!