[LADIS 2009] Technical Session #4 – Monitoring and Repair
Second Talk: A case for the accountable cloud by Andreas Haeberlen
Three cloud stories .. a threatening cloud, a promising cloud, and a nice cloud 🙂
The problem with current clouds is that the user does not know what the cloud service provider is doing with the customer’s code and data. Also, from the cloud service provider’s perspective, the operator does not know what is the code that they are running for customers supposed to do.
Alice is the customer running a service on the cloud owned and operated by Bob.
A solution: what if we had an oracle that Alice and Bob could ask about cloud problems? We want completeness, (if something is faulty, we will know) accuracy (no false positives), verifiability (the oracle can prove its diagnoses is correct).
Idea: make clud accountable to alice+bob. Cloud records its actions in a tamper-evident log, alice and bob can audit, use log to construct evidence that a fault does or does not exist.
Discussion: 1) Isn’t this too pessimistic? bob isn’t malicious ..maybe, but bob can get hacked, or things can just go wrong. 2) shouldn’t bob use fault tolerance instead? yes whenever we can, but masking faults is never perfect, we still need to check. 3) why would a provider want to deploy this? this feature will be attractive to prospective customers, and helpful for support. 4) Are these the right guarantees? completeness (no false negatives), could be relaxed with probabilistic completeness; verifiability could be relaxed only provide some evidence; accuracy (no false positives) can not be relaxed because we need to have confidence when we rule out problems.
A call to action: cloud accountability should do; deliverable provable guarantees, work for most cloud apps, require no changes to application code, cover a wide spectrum of properties, low overhead.
Work in Progress: Accountable Virtual Machines (AVM), goal: provide accountability for arbitrary unmodified software. cloud records enough data to enable deterministic replay, alice can replay log with a known-good copy of the software, can audit any part of the original execution.
Conclusion: current cloud designs carry risks for both customers and providers (mainly because of split administration problem). Proposed solution: accountable cloud. Lots of research opportunities.
Third Talk: Learning from the Past for Resolving Dilemmas of Asynchrony by Paul Ezhilchelvan
In an asynchronous model you can not bound message delivery time or even message processing time by a machine. However, in a probabilistic synchronous model, we can bound times within a certain probability via proactive measurements. The new central hypothesis of the new model is that most of the time, performance of the past is indicative of the performance of the near future (i.e. delay in the past is the indicative of delay in the future).
Design steps include doing proactive measurements, using them to establish synchrony bounds, and assign time bounds based on that, try that and see how it works and enable exceptions.
On-going work: development of exceptions (to deal with exceptional cases when mistakes are detected). Open environments are asynchronous, use crash signals for notification of extreme unexpected behavior.