
Gmail’s recent outage has once again brought the lack of trust in cloud computing to the forefront. Webster’s dictionary defines trust as the “assured reliance on the character, ability, strength, or truth of someone or something” or “one in which confidence is placed”.
The main reasons for the lack of confidence in cloud computing can be summed up in the following questions people typically have:
- Security – How can I trust that my data will be secure when it is not in my physical control?
- Reliability – Can I truly rely on the cloud to be there when I need it? What if it goes down? The track records of Google Apps, Amazon EC2, Microsoft Azure and the rest do not inspire confidence.
- Performance – Will the cloud be able to adequately service my specific performance needs? Doesn’t the cloud affect the latency for my applications?
- Lock-in/Portability – How confident can I be that I will be able to move my applications and data around seamlessly between providers?
Until the industry is able to address all of these issues satisfactorily, the adoption of clouds will grow only marginally. The key is to address all of them; addressing a subset will simply not suffice. After all, what good is a high-performance cloud infrastructure that is insecure, or a secure cloud that is unreliable or proprietary?
So how far is the current state of the art in cloud computing from cloud computing that can be trusted? In the figure below, I have tried to list all the critical requirements for enabling trust in cloud computing. I have also color coded each of them to indicate the industry's current progress towards achieving these requirements. Green -> What has already been achieved; Yellow -> Work in progress; Orange -> What is yet to be achieved. Below the figure, I have some additional thoughts on what needs to be done to address current deficiencies for each of these requirements.

Scalability
One of the basic requirements for cloud computing is the ability of the infrastructure to scale. After all, if you aren’t able to adequately service incremental demand, then trust in clouds is a non-starter. Today, Amazon, Google, Microsoft and many others have clearly demonstrated the ability to scale compute infrastructure for cloud computing. So the industry is already well on the way with respect to this requirement for trusted clouds.
Cost
Cost, or rather value, is absolutely a factor in enabling trust in clouds. Affordability drives adoption, and perceived value influences whether customers rely on the cloud over the longer term. There is no absolute low-cost threshold to aim for. The reality is that different customers will have different requirements, and what they are willing to pay will depend on the value (SLAs, QoS, features etc.) they receive. The key is that customers not only have these choices but that the offerings reflect the customer expectation of lower costs derived from economies of scale. For example, if cloud providers were to start charging more as their clouds grow (to cover increasing management costs), then customers are going to be skeptical of the paradigm.
The economies of scale in large public clouds are already driving costs down, but there are still numerous avenues for improvement. To understand those areas of improvement, let’s first look at what drives costs today. The cost of cloud computing infrastructure is a function of a number of factors including:
1. Hardware costs
2. Software costs (license and maintenance)
3. Installation services and administration services costs
4. Cost due to inefficiencies resulting from:
   - Under-utilization
   - Infrastructure complexity
   - Shelf-ware, both hardware and software ($120K of software sitting on the shelf is equivalent to the cost of a person-year)
   - Human latency
Costs for #1 through #3 are driven largely by vendor economies of scale and market forces, i.e. the larger the market and the greater the competition, the lower the prices. So let us look more closely at #4, the costs due to inefficiencies. It is well understood that existing compute resources are not being utilized effectively, and this is a major contributor to high data center costs in terms of power, cooling, floor space, software and administration. Virtualization is helping address some of the inefficiencies on the server side, but still not all servers are virtualized, whether for technical reasons (e.g. I/O contention, performance), software reasons (design or licensing hurdles) or, as in the case of private clouds, even organizational reasons (inter-departmental accounting, chargebacks and politics). Beyond servers, we also have to deal with inefficiencies in storage. Thin provisioning and deduplication are just a couple of solutions here, but we still lack, for example, a good approach to properly reclaim and consolidate capacity when it is no longer used. Clearly there is a lot more room for efficiency gains through the optimization of utilization.
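To put a rough number on under-utilization, here is a minimal Python sketch. The function name and the dollar and utilization figures are my own illustrative assumptions, not measurements; the only idea it encodes is that whatever fraction of capacity sits idle is a fraction of spend that buys nothing.

```python
def idle_capacity_cost(annual_cost: float, utilization: float) -> float:
    """Return the part of annual spend that pays for capacity sitting idle.

    annual_cost: total yearly cost of the resource pool (hardware, power,
                 cooling, floor space, software, administration).
    utilization: average fraction of capacity actually used, from 0.0 to 1.0.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return annual_cost * (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    pool_cost = 1_000_000.0
    for utilization in (0.15, 0.60):
        wasted = idle_capacity_cost(pool_cost, utilization)
        print(f"utilization {utilization:.0%}: idle spend ${wasted:,.0f}")
```

The model is deliberately crude: it treats every cost as scaling with capacity, which will not hold for every line item, but it shows why utilization is the first lever to pull.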
Another area of inefficiency is the management complexity that has been introduced into the infrastructure within our datacenters. The figure below highlights the silos and the duplicated management functionality within those silos that need to be consolidated in order to address inefficiencies due to redundancy. Customers are paying for management functionality that is unnecessarily being duplicated all over.

Finally, there is the cost due to “human latency.” Without cloud infrastructure that can be automatically and optimally reconfigured in response to changing demand patterns, we are at the mercy of human administrators and experts. To accurately account for the cost of humans, we must factor in not only what it costs to pay them but also the cost of the time it takes them to recognize and react to an event requiring action.
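One way to make the human latency argument concrete is the small Python sketch below. It is only a sketch under my own assumptions: the `Incident` fields, the hourly wage and the per-hour business impact are all hypothetical, and a real model would need measured values for each.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Incident:
    detect_hours: float     # time until a human recognizes the event
    react_hours: float      # hands-on time until the system is reconfigured
    impact_per_hour: float  # business cost per hour while the event is unresolved


def human_latency_cost(incidents: Iterable[Incident], hourly_wage: float) -> float:
    """Wages for hands-on time plus the cost of waiting on a human."""
    total = 0.0
    for incident in incidents:
        elapsed = incident.detect_hours + incident.react_hours
        wages = incident.react_hours * hourly_wage
        waiting = elapsed * incident.impact_per_hour
        total += wages + waiting
    return total


if __name__ == "__main__":
    # Hypothetical incident: 30 minutes to notice, 90 minutes to fix.
    incidents = [Incident(detect_hours=0.5, react_hours=1.5, impact_per_hour=1000.0)]
    print(human_latency_cost(incidents, hourly_wage=100.0))
```

With numbers like these the waiting term dominates the wage term, which is the point: the larger cost of relying on humans is the time spent recognizing and reacting, not the salary.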
Reliability, Availability and Performance
While the prevalent cloud computing platforms of the day have demonstrated their ability to scale, reliability, availability and performance are sorely lacking. This is amply evident in the numerous stories on cloud service outages that routinely make the tech press these days. Reliability, availability and performance are fundamental to establishing trust. What this means is that we need application QoS for the cloud.
Application QoS can be defined as the ability of the cloud to satisfactorily service numerous applications with different latency tolerance requirements by monitoring the applications in real time and then dynamically allocating or assigning compute resources based on business priorities. Latency tolerance may vary depending on the type of application and the application’s business priority. For example, latency tolerance for a video streaming application may be lower than that for an email server. However, the email server may be deemed more business critical than the video streaming application. The point here is that, ultimately, latency tolerance should be specified by the application, and that should dictate how all available compute resources are assigned dynamically to ensure application QoS.
Today’s cloud computing infrastructure is not there yet. We do have a number of companies that recognize this as an opportunity, such as RightScale, Elastra, 3Tera and numerous others. What these companies do is employ server virtualization technologies, which abstract the server software and operating system from the server hardware, as the basis for enabling dynamic resource allocation. In essence, more server resources can be dynamically allocated to an application as demand changes. This, however, does not address the need for application QoS. The key point to note here is that the dynamic resourcing in offerings from the above-mentioned companies is limited to “server resources”.
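The priority-driven assignment described above can be sketched in a few lines of Python. This is a minimal illustration and not how any of the named vendors work: the `App` type, the meaning of `demand` (capacity units an application currently needs to stay within its latency tolerance) and the strict-priority policy are all assumptions of mine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class App:
    name: str
    priority: int  # lower number = more business critical
    demand: float  # capacity units needed right now to meet latency tolerance


def allocate(apps: List[App], capacity: float) -> Dict[str, float]:
    """Grant capacity in order of business priority until it runs out."""
    allocation: Dict[str, float] = {}
    remaining = capacity
    for app in sorted(apps, key=lambda a: a.priority):
        granted = min(app.demand, remaining)
        allocation[app.name] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    # Hypothetical workload: email is more business critical than video.
    apps = [App("video", priority=2, demand=70), App("email", priority=1, demand=50)]
    print(allocate(apps, capacity=100))
```

In this example the email server is fully served first and the video application receives whatever is left, even though video is the more latency-sensitive of the two. A production scheduler would rerun this continuously from real-time monitoring data and would likely use a softer policy than strict priority.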
To truly enable QoS for applications in the trusted cloud, all resources, not just server resources, must be dynamically assignable to match application requirements. Why? Because every compute resource contributes to the overall latency for an application. Storage and network performance, in addition to server performance, help determine an application’s overall performance. Consequently, the ability to dynamically provision storage IOPS and bandwidth in response to changing application needs is also required to deploy truly dynamic cloud infrastructure. This is essentially extending to storage what can be done today for servers using virtualization. This capability simply does not exist in present-day storage systems and will be required for cloud computing infrastructure that can be trusted. I’ll have more detail on this topic in future posts.
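To illustrate why server-only resourcing falls short, the Python sketch below treats end-to-end latency as the sum of what each tier contributes and reports which tier contributes most. The additive model, the tier names and the millisecond figures are simplifying assumptions for illustration only.

```python
from typing import Dict, Tuple


def latency_report(latency_ms: Dict[str, float], tolerance_ms: float) -> Tuple[float, bool, str]:
    """Return (total latency, whether it meets the tolerance, largest contributor)."""
    if not latency_ms:
        raise ValueError("at least one tier is required")
    total = sum(latency_ms.values())
    worst_tier = max(latency_ms, key=latency_ms.get)
    return total, total <= tolerance_ms, worst_tier


if __name__ == "__main__":
    # Hypothetical per-tier latencies for one application request.
    tiers = {"server": 5.0, "network": 10.0, "storage": 40.0}
    total, ok, worst = latency_report(tiers, tolerance_ms=30.0)
    print(f"total={total} ms, within tolerance={ok}, largest contributor={worst}")
```

In a case like this one, adding server resources cannot bring the application within its tolerance, because the storage tier alone exceeds the budget. Only provisioning more storage performance would help.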
Security
There are two key elements to security in the context of trust in clouds. These are:
- End-to-end security of the data path between application and compute resources
- Security of the data that resides in the cloud
Earlier on we talked about the need for dynamic infrastructure that is able to assign compute resources to applications in real-time based on changing demand patterns. This dynamic infrastructure also needs to ensure security of that dynamically assigned connection end-to-end. Currently, we are in the early stages of this with Amazon’s recently introduced VPC (Virtual Private Cloud) demonstrating how to securely bridge traffic between a private datacenter and a public cloud using an encrypted VPN connection. All this does is securely transport data between a private datacenter and a public cloud infrastructure. The data on the public cloud infrastructure is still shared and there is nothing in Amazon VPC that secures the data once it is in the cloud. What is required is the ability to enable security for dynamic connections between the applications and compute resources. Finally, fine grained access control and encryption-based security must be implemented to lock down access to data that resides in the cloud.
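As a rough sketch of what fine-grained access control means in practice, the Python below grants access per principal, per resource and per action, and denies everything not explicitly granted. The ACL layout and all names are hypothetical; it does not model any provider's actual mechanism, and it leaves out encryption entirely.

```python
from typing import Dict, Set, Tuple

# Maps (principal, resource) to the set of actions that are explicitly allowed.
Acl = Dict[Tuple[str, str], Set[str]]


def is_allowed(acl: Acl, principal: str, resource: str, action: str) -> bool:
    """Deny by default: only explicitly granted actions are permitted."""
    return action in acl.get((principal, resource), set())


if __name__ == "__main__":
    # Hypothetical grants for illustration.
    acl: Acl = {
        ("alice", "reports/q1"): {"read"},
        ("backup-service", "reports/q1"): {"read", "copy"},
    }
    print(is_allowed(acl, "alice", "reports/q1", "read"))   # granted explicitly
    print(is_allowed(acl, "alice", "reports/q1", "write"))  # never granted
    print(is_allowed(acl, "bob", "reports/q1", "read"))     # unknown principal
```

The deny-by-default choice matters on shared infrastructure: a missing entry should mean no access rather than inherited access.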
Global Interoperability
In order to trust the clouds, customers will need confidence that the investment they made in getting applications to deploy with a particular cloud infrastructure provider is not locked in. Accomplishing this will require the industry to create and adopt open standards for global interoperability to ensure portability of applications, data and management paradigms. This is similar to what happened during the evolution of the telecom and internet industries. In the telecom world, the ITU ultimately helped establish worldwide standards that enabled seamless interconnection and interoperability between disparate communications systems. In the internet world, the IETF laid the foundation for interoperability by coordinating the creation of standards between customers, operators and vendors. A similar organization needs to drive the open standards effort in the cloud computing space. The CCIF (Cloud Computing Interoperability Forum) is an early example of such an effort. However, it is still at the stage of trying to standardize taxonomy and create a common architectural framework. We are still some way from global interoperability.
In summary, it all boils down to assuring application QoS by enabling the end-to-end visibility, fine-grained control and security requirements for each of the cloud infrastructure stakeholders while also offering choice. Specifically:
- Service Providers require end-to-end visibility and control to optimally manage resource utilization
- Service developers require that adequate resources will be assigned for applications they develop along with the ability to pick and switch between service providers without fear of vendor lock-in
- End users desire the ability to stipulate the SLAs they require, along with visibility, control and flexibility (choice) to manage to those SLAs
In my next post I will propose a reference architecture that could provide the basis for enabling the above.



