Tom Tantillo

tantillo (at)
Department of Computer Science
Johns Hopkins University

207 Malone Hall
3400 N. Charles Street
Baltimore, MD 21218

About Me

I'm currently a fifth-year Ph.D. student in the Computer Science Department at Johns Hopkins University. I'm a member of the Distributed Systems and Networks lab, working with Prof. Yair Amir. My main research interests are in fault-tolerant and intrusion-tolerant networks and distributed systems, but I also enjoy high-performance computing and computer security. I received my Bachelor's degree in Computer Engineering from Johns Hopkins in 2010, and my Master's degree in Computer Science from Johns Hopkins in 2013.

I'm also a co-creator of the Spines overlay messaging toolkit.

Press release about our work on the first practical intrusion-tolerant network deployed on a global scale (06/28/2016). I presented this work at the end of June at the ICDCS 2016 Conference in Nara, Japan!

Current Research

Critical applications, such as monitoring and control of global clouds and management of critical infrastructure (e.g., the power grid), are becoming more connected to the Internet for cost-effectiveness and scalability reasons, but this leaves them vulnerable to attack. Most systems today are not designed to withstand sophisticated attacks; an attacker who is able to compromise a single machine typically gains the power to take down the entire system. Currently, we are working to develop intrusion-tolerant systems that continue to work correctly even when parts of the system become compromised. (Funding: DARPA MRC)

Intrusion-Tolerant Open-Source SCADA

As vital components of critical infrastructure, SCADA systems must continue to operate correctly and at their expected level of performance at all times. However, current SCADA systems are vulnerable to intrusions, and even a single compromise can have catastrophic consequences. We are developing and implementing the first open-source intrusion-tolerant SCADA system that operates correctly and at its required level of performance even while some components are compromised. The system is designed to survive over a long system lifetime (e.g., decades), and protects against compromises at both the network and SCADA-system levels, all while guaranteeing message delivery within stringent latency requirements.

DSN 2015 Student Forum: Toward Survivable Intrusion-Tolerant Open-Source SCADA

Intrusion-Tolerant Networks

Accurate and timely monitoring and control are paramount for the correct operation of a cloud. While some of the reliability and availability requirements of clouds have been addressed, a large gap in constructing resilient clouds is their vulnerability to intrusions. Since cloud infrastructure nodes are geographically distributed in data centers and administrators are remote from these nodes, administrators can find themselves blind to problems and unable to resolve issues if their messaging system fails during attacks. Therefore, cloud administrators are faced with a classic chicken-and-egg problem: cloud monitoring and control must work at some level at all times, even while under attack, in order for administrators to react and resolve problems. Using overlay networks and standard cryptographic authentication, we developed and implemented a spectrum of dissemination protocols in the Spines overlay messaging toolkit to deliver messages in the presence of Byzantine nodes. In an extreme case, messages are guaranteed to be delivered as long as even a single path of correct nodes exists between source and destination.

ICDCS 2016: Practical Intrusion-Tolerant Networks
LADIS 2012: Intrusion-Tolerant Cloud Monitoring and Control
My Master's Thesis: Intrusion-Tolerant Cloud Monitoring
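
The single-correct-path guarantee described above can be illustrated with a toy model. The sketch below is not the Spines implementation: it assumes a hypothetical five-node overlay (`LINKS`), models compromised nodes as simply dropping every message, and assumes authentication already prevents them from forging traffic. Under those assumptions, flooding reaches the destination exactly when some path of correct nodes connects it to the source.

```python
from collections import deque


def flood_delivers(links, source, dest, compromised):
    """Return True if a flood started at `source` reaches `dest`.

    Assumption for illustration: compromised nodes drop all traffic,
    so the flood propagates only through correct nodes.
    """
    if source in compromised or dest in compromised:
        return False
    # Build an undirected adjacency map from the list of overlay links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == dest:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen and nxt not in compromised:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical overlay with two node-disjoint routes from A to E:
# A-B-E and A-C-D-E.
LINKS = [("A", "B"), ("B", "E"), ("A", "C"), ("C", "D"), ("D", "E")]

print(flood_delivers(LINKS, "A", "E", {"B"}))       # True: A-C-D-E is still correct
print(flood_delivers(LINKS, "A", "E", {"B", "D"}))  # False: no correct path remains
```

Flooding is only the most conservative point on such a spectrum: it pays in bandwidth for the strongest delivery guarantee, which is why a range of protocols is useful in practice.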

Intrusion-Tolerant Consistent State

State-of-the-art Byzantine fault-tolerant replication systems (e.g., Prime) provide safety, liveness, and performance guarantees, even while under attack. However, if more than a given threshold of replicas are compromised during the system lifetime, the replicated state can become inconsistent. In this work, we strengthen state-of-the-art Byzantine fault-tolerant replication systems using proactive recovery and software diversity. This supports the threshold assumptions and limits the power of an adversary, since the adversary must compromise more than the required number of replicas within a vulnerability window of time. In addition, we seek to support applications with large state (e.g., 1 terabyte) and long system lifetimes (e.g., 30 years).

SRDS 2014: Towards a Practical Survivable Intrusion Tolerant Replication System

Previous Research

Diversity Assignment Problem

In many modern clouds, participating cloud nodes are homogeneous due to the cost benefits and ease of management. However, with essentially the same software running on each node, the ability to compromise one node usually translates into the ability to take over the entire system. Diversifying the attack surface of cloud nodes can help limit the power of an adversary. However, given only a limited number of diverse variants, the question is how best to assign the variants to maximize resiliency. We introduce this Diversity Assignment Problem (an NP-hard problem) and provide an optimal solution using Mixed Integer Programming. We evaluate our solution on both a global cloud provider's network connectivity graph and random graphs. More information and extensions of this work can be found in the paper and technical reports below.

DSN 2013: Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes
Technical Report - 2013: Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes
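
To give a feel for the Diversity Assignment Problem, here is a minimal sketch on a tiny graph. It is not the formulation from the paper: it swaps Mixed Integer Programming for exhaustive search (feasible only for a handful of nodes), and it assumes one plausible objective for illustration, namely that an adversary compromises every node running a single variant and that resiliency is the size of the largest component of surviving nodes in the worst case. The node names, variant names, and function names are all hypothetical.

```python
from itertools import product


def largest_component(nodes, links):
    """Size of the largest connected component induced by `nodes`."""
    nodes = set(nodes)
    adj = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    best, seen = 0, set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search to measure this component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            n = stack.pop()
            size += 1
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        best = max(best, size)
    return best


def worst_case_score(assignment, links, variants):
    """Smallest surviving component over every single-variant compromise."""
    return min(
        largest_component([n for n, v in assignment.items() if v != bad], links)
        for bad in variants
    )


def best_assignment(nodes, links, variants):
    """Try every assignment of variants to nodes; keep the most resilient."""
    best, best_score = None, -1
    for choice in product(variants, repeat=len(nodes)):
        assignment = dict(zip(nodes, choice))
        score = worst_case_score(assignment, links, variants)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Hypothetical four-node ring with two available variants.
NODES = ["A", "B", "C", "D"]
RING = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
assignment, score = best_assignment(NODES, RING, ["x", "y"])
print(assignment, score)
```

On this ring, a homogeneous deployment scores 0 under the assumed objective, alternating variants around the ring scores 1, and grouping neighbors on the same variant scores 2, which shows that placement matters and not just the number of variants. The search space grows as the number of variants raised to the number of nodes, which is why an optimization formulation is needed for realistic graphs.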

Remote Telesurgery

This project aimed to develop extremely low-latency and reliable communication for use in telerobotic remote surgery. In the target system, an expert robotic surgeon would perform surgery over long geographical distances, requiring a low-latency and reliable two-way communication channel for robot commands and high-definition stereoscopic video. Overlay networks were used to cope with the everyday problems experienced by Internet routing and to meet the necessary latency and availability constraints. In addition, the da Vinci was used as the surgical system. Project page.

Spiny Android

The goal of this project was to develop software to run on Android phones that would share connectivity (Wi-Fi, Bluetooth, etc.) between nearby phones running the same software. In the end, we ported Spines directly onto the Android phone's hard disk (no virtualized application). The resulting proof-of-concept resembled a mesh network. Project page.

Publications


Structured Overlay Networks for a New Generation of Internet Services. A. Babay, C. Danilov, J. Lane, M. Miskin-Amir, D. Obenshain, J. Schultz, J. Stanton, T. Tantillo, and Y. Amir. Invited to IEEE International Conference on Distributed Computing Systems (ICDCS), Vision track, Atlanta, GA, June 2017. [PDF]

Applications of Secure Location Sensing in Healthcare. P. Martin, M. Rushanan, T. Tantillo, C. Lehmann and A. Rubin. Association for Computing Machinery Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), October 2016, pp. 58-67. [PDF]

Practical Intrusion-Tolerant Networks. D. Obenshain, T. Tantillo, A. Babay, J. Schultz, A. Newell, M. Hoque, Y. Amir, C. Nita-Rotaru. IEEE International Conference on Distributed Computing Systems (ICDCS), June 2016, pp. 45-56. [PDF]

On Choosing Server- or Client-Side Solutions for BFT. M. Platania, D. Obenshain, T. Tantillo, Y. Amir, N. Suri. ACM Computing Surveys (ACM CSUR), 48(4), Article 61, May 2016. [PDF]

Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes. A. Newell, D. Obenshain, T. Tantillo, C. Nita-Rotaru, and Y. Amir. IEEE Transactions on Dependable and Secure Computing (TDSC), 12(6), pp. 602-614, November 2015. [PDF]

Towards a Practical Survivable Intrusion Tolerant Replication System. M. Platania, D. Obenshain, T. Tantillo, R. Sharma, Y. Amir. In the Proceedings of the IEEE International Symposium on Reliable Distributed Systems (SRDS14), Nara, Japan, October 2014, pp. 242-252. [PDF]

Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes. A. Newell, D. Obenshain, T. Tantillo, C. Nita-Rotaru, and Y. Amir. In the Proceedings of the IEEE International Conference on Dependable Systems and Networks (DSN13), Budapest, June 2013. [PDF]

Objective assessment in residency-based training for transoral robotic surgery. M. Curry, A. Malpani, R. Li, T. Tantillo, A. Jog, R. Blanco, P. K. Ha, J. Califano, R. Kumar, and J. Richmon. The Laryngoscope 122, no. 10 (2012): 2184-2192. [PDF]

Intrusion-Tolerant Cloud Monitoring and Control. D. Obenshain, T. Tantillo, A. Newell, C. Nita-Rotaru, and Y. Amir. In Proceedings of the 2012 Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2012), Madeira, Portugal, July 2012. Invited paper. [PDF]

Patents


Systems and Methods for Cloud-Based Control and Data Acquisition with Abstract State. Y. Amir, A. Babay, and T. Tantillo. United States Provisional Patent Application No. 62/451,341, filed January 2017.

Network-Attack-Resilient Intrusion-Tolerant SCADA Architecture. Y. Amir, A. Babay, and T. Tantillo. United States Provisional Patent Application No. 62/353,256, filed June 2016.

Awards


  • 2013 - Outstanding Teaching Award - JHU Computer Science - recognizing a student who has demonstrated outstanding effort and skill in assisting with the teaching of courses
  • 2010 - John Boswell Whitehead Award - recognizing outstanding achievements in electrical engineering by an undergraduate student

Courses Taken @ JHU

Distributed Systems, Advanced Distributed Systems and Networks, Parallel Programming, Algorithms, Randomized Algorithms, Security and Privacy in Computing, Advanced Topics in Computer Security, Compilers and Interpreters, Database Systems, Declarative Methods, Computer Graphics, Selected Topics in Streaming Algorithms, Selected Topics in Systems Research

Teaching Assistant

  • Advanced Distributed Systems (Spring 2013, Spring 2015, Spring 2017)
  • Distributed Systems (Fall 2012, Fall 2016)
  • Intermediate Programming (Fall 2011, Spring 2012)
  • Parallel Programming (Spring 2017)

Miscellaneous

  • 2010-2017 - Computer Science Happy Hour Czar - responsible for organizing weekly get-togethers for staff, faculty, and students.