[LADIS 2009] Technical Session #2 – Applications and Services
First Talk: Are Clouds Ready for Large Distributed Applications? by Kay Sripanidkulchai
This talk focused on how enterprise applications can be moved to cloud computing settings. The first issue is deployment: it is more complex than just booting up VMs because of data and functionality dependencies. The second issue is availability: enterprise applications are heavily engineered to maximize uptime. According to a published study, current cloud services can expect up to 5 hours of downtime per year, whereas enterprise customers really expect about 1 hour of downtime per year. So how can this gap be bridged? The third issue is problem resolution.
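To put those two downtime figures in perspective, here is a quick back-of-the-envelope conversion to availability percentages (my own arithmetic on the numbers above, not a calculation from the talk):

    # Convert yearly downtime into an availability percentage.
    HOURS_PER_YEAR = 365 * 24  # 8760

    def availability(downtime_hours_per_year):
        return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)

    print(availability(5))  # cloud today: ~99.94%
    print(availability(1))  # enterprise expectation: ~99.99% ("four nines")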
Bridging the availability gap: ideas include 1) implementing scaling architectures in the cloud, 2) developing APIs that let multiple clouds interact with each other so that failover techniques can be built, and 3) live VM migration to mask failures.
As for problem resolution: categorizing the issues raised about EC2 on the EC2 discussion boards, 10% of topics are feature requests and 56% are user how-to questions. Of the reported problems, 25% are cloud errors, 64% user errors, and 11% unknown errors. One important thing enterprise customers want is to know, when something is not running correctly, whether the issue lies with the cloud platform, the VM, faulty hardware, or something else. Techniques and tools have to be developed in that regard.
Second Talk: Cloudifying Source Code Repositories: How Much Does it Cost? by Michael Siegenthaler
Cloud computing used to be the domain of large companies with the resources to build and maintain their own datacenters. Now it is accessible to people outside these companies at low cost.
Why move source control to the cloud? Resilient storage, no physical server to administer, and the ability to scale to large communities. The system uses SVN, which is very popular, stores data on S3 (which raises a problem with eventual consistency), and uses Yahoo's ZooKeeper (a coordination service) as a lock service. How do you measure the cost of SVN on S3? Measure the cost in terms of the diff files and the stored files. A back-of-the-envelope cost analysis shows it is inexpensive even for large projects such as Debian and KDE. A trend to notice is that code repositories are getting larger, but the price of storing a GB is decreasing over time.
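As a rough illustration of that back-of-the-envelope style of analysis (the per-GB and per-request prices, repository size, and commit volume below are placeholder assumptions, not figures from the talk):

    # Rough monthly cost estimate for hosting an SVN repository on S3.
    # All prices are illustrative assumptions; plug in current S3 pricing.
    PRICE_PER_GB_MONTH = 0.15      # assumed storage price, $/GB-month
    PRICE_PER_1K_REQUESTS = 0.01   # assumed request price, $/1000 requests

    def monthly_cost(repo_size_gb, commits_per_month, requests_per_commit=10):
        storage = repo_size_gb * PRICE_PER_GB_MONTH
        requests = commits_per_month * requests_per_commit / 1000.0 * PRICE_PER_1K_REQUESTS
        return storage + requests

    # e.g. a hypothetical 40 GB repository with ~50,000 commits per month
    print("$%.2f / month" % monthly_cost(40, 50000))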
Architecture: client machines talk to front-end servers on EC2 and storage is on S3. The front end need not be on EC2; the cloud is there mainly for storage. A problem with a naive implementation is that eventual consistency in S3 means multiple revision numbers could be issued for conflicting updates, so locking is required. The commit process essentially has a hook that acquires a lock from ZooKeeper and looks up the most recent revision number. The most recent revision is retrieved from S3 (retrying if it is not yet visible due to eventual consistency), the commit is made, the lock is released, and the revision counter in ZooKeeper is incremented.
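A minimal sketch of that commit path, assuming the kazoo ZooKeeper client and boto3 for S3 (both postdate the paper, and the bucket name, znode paths, and key layout are hypothetical; this is not the authors' implementation):

    import time
    import boto3
    from kazoo.client import KazooClient

    s3 = boto3.client("s3")
    zk = KazooClient(hosts="zk1:2181")
    zk.start()

    BUCKET = "svn-repo-bucket"  # hypothetical bucket name

    def commit(diff_bytes):
        # Serialize committers through a ZooKeeper lock so that only one
        # revision number can be issued at a time.
        with zk.Lock("/svn/commit-lock"):
            head = int(zk.get("/svn/head-revision")[0].decode())
            # Fetch the latest revision from S3; retry until it is visible,
            # since S3 was only eventually consistent at the time.
            while True:
                try:
                    s3.get_object(Bucket=BUCKET, Key="rev/%d" % head)
                    break
                except s3.exceptions.NoSuchKey:
                    time.sleep(0.1)
            new_rev = head + 1
            s3.put_object(Bucket=BUCKET, Key="rev/%d" % new_rev, Body=diff_bytes)
            # Bump the revision counter in ZooKeeper only after the write succeeds.
            zk.set("/svn/head-revision", str(new_rev).encode())
            return new_rev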
Performance evaluation: for usage patterns, the Apache foundation has one repository for 74 projects with an average of 1.10 commits per minute and a maximum of 7 per minute, while the Debian community has 506 repositories with 1.12 commits per minute in aggregate and a maximum of 6. These were used as experiment traces. The results showed that as more EC2 front-end servers are added, performance does not degrade due to lock contention, and this held across differing numbers of clients.
Third Talk: Cloud9: A Software Testing Service by Stefan Bucur
There is a need to facilitate automated testing of programs, and cloud computing can make it perform better. Testing frameworks should provide autonomy (no human intervention), usability, and performance. Cloud9 (http://cloud9.epfl.ch/) is a web service for software testing that runs on the cloud.
Symbolic execution: when testing a function, instead of feeding it concrete input values, give it an input abstraction (say, a symbolic value λ), and whenever control flow branches (such as at an if statement), create a subtree of execution. One idea is to send each of these subtrees to a separate machine and test all possible execution paths at once. A naive approach has many problems; for example, trees can expand exponentially, so incrementally acquiring new resources to run them can be problematic. One solution is to pre-allocate all needed machines. There are many challenges in parallel symbolic execution in the cloud, such as dynamically load-balancing the trees among workers and transferring state, along with other problems such as picking the right strategy portfolios.
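A toy sketch of the execution-tree idea (pure Python, not Cloud9's actual KLEE-based engine; the example function and labels are made up): each branch splits the tree, every leaf is one path with its accumulated path condition, and in a parallel setting each subtree could be handed to a different worker.

    # Toy model of an execution tree: internal nodes are branch conditions
    # over a symbolic input x, leaves are execution paths.
    class Branch:
        def __init__(self, cond, true_side, false_side):
            self.cond = cond              # textual branch condition
            self.true_side = true_side
            self.false_side = false_side

    class Leaf:
        def __init__(self, label):
            self.label = label

    # Execution tree of a toy function:
    #   if x > 10: (if x > 100: path A else: path B) else: path C
    tree = Branch("x > 10",
                  Branch("x > 100", Leaf("A"), Leaf("B")),
                  Leaf("C"))

    def explore(node, path_condition):
        # Enumerate every execution path together with its path condition.
        if isinstance(node, Leaf):
            yield path_condition, node.label
            return
        yield from explore(node.true_side, path_condition + [node.cond])
        yield from explore(node.false_side, path_condition + ["not(" + node.cond + ")"])

    for cond, label in explore(tree, []):
        # A real engine asks a constraint solver for concrete inputs that
        # satisfy 'cond'; here we just print the path conditions.
        print(label, "when", " and ".join(cond))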
Preliminary results show that parallel symbolic execution in the cloud can yield better-than-linear improvement over conventional methods and KLEE.