Written by: Karl Bielefeldt - @softwarewhys
Published: 8 August 2016
I had the privilege of attending DockerCon 16 in beautiful Seattle, Washington. To understand what I hoped to get out of DockerCon, it helps to know some of the history behind how we use Docker at ADTRAN. Docker is an integral part of the Firefly platform that Nathan introduced in a previous blog post, but so far it has also been a hidden part. As one of the keynote speakers at DockerCon said, "Customers don't care about containers." They care about applications, and that's what ADTRAN delivers, in the form of virtual machines. Encapsulated within those virtual machines are fully dockerized applications.
When I was walking around the DockerCon floor, the first question I was often asked was, "Are you a developer or are you in ops?" The question threw me for a loop the first few times I answered, because I mostly consider myself a developer, but from a Docker point of view I also consider myself a sort of "ops by proxy" for our customers. Let me explain what I mean by that.
Customers interact with our applications using two tools. One we call cluster-admin, which is basically nothing more than a set of curated Ansible playbooks for administering a cluster of our virtual machines. The other is node-admin, which is a Python application we wrote that handles all the orchestration for administering a single node. The Ansible playbooks do some of the typical things like installing RPMs, but mostly they make calls into node-admin.
node-admin encapsulates everything needed to start one node's services into a node-admin start command. It:
- Pulls the appropriate images specified in a file that was pushed by an Ansible playbook.
- Sets up the appropriate DNS information in a dnsmasq container.
- Makes sure containers are not already running for a service.
- Adds application-specific environment variables.
- Sets up logging.
- Sets up volumes.
- Sets user and group appropriately so we don't need to run containers as root.
- Starts the containers.
- Waits for them to come up successfully.
- Sets up datacenter redundancy.
All that from a simple node-admin start. A whole lot of testing goes into making sure it runs smoothly and the customer doesn't have to know the details. We are not the operators who are physically present when this is deployed, but it is our tech support that gets called if something goes wrong. This code we write is our ops by proxy.
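The start sequence listed above can be sketched roughly as follows. This is a minimal illustration, not node-admin's actual code: `ServiceSpec`, `FakeRuntime`, and `start_node` are hypothetical names, the runtime is an in-memory stand-in rather than a real Docker client, and only a subset of the steps (pull, skip services that are already running, refuse root, start, wait for health) is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceSpec:
    """Hypothetical description of one service on a node."""
    name: str
    image: str
    environment: dict = field(default_factory=dict)
    volumes: dict = field(default_factory=dict)
    user: str = "1000:1000"  # uid:gid, non-root by default


class FakeRuntime:
    """In-memory stand-in for a container runtime, so the sketch runs anywhere."""

    def __init__(self):
        self.images = set()
        self.running = {}

    def pull(self, image):
        self.images.add(image)

    def is_running(self, name):
        return name in self.running

    def start(self, spec):
        self.running[spec.name] = spec

    def is_healthy(self, name):
        return name in self.running


def start_node(runtime, specs, max_checks=5):
    """Start every service that is not already running; return the names started."""
    started = []
    for spec in specs:
        runtime.pull(spec.image)
        if runtime.is_running(spec.name):
            continue  # never start a second copy of a service
        if spec.user.split(":")[0] == "0":
            raise ValueError(f"{spec.name} must not run as root")
        runtime.start(spec)
        started.append(spec.name)
    # Wait for everything we started to report healthy.
    for name in started:
        if not any(runtime.is_healthy(name) for _ in range(max_checks)):
            raise RuntimeError(f"{name} did not come up")
    return started
```

Calling `start_node` twice with the same specs starts the services the first time and starts nothing the second time, which is the idempotent behavior an operator wants from a single start command.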
node-admin interfaces with Docker using the excellent docker-py library, but we found that to be at too low an abstraction level for many of our tasks. To mitigate that issue, we created a library we call dockhand that wraps some of the more common tasks and stabilizes the API. For example, the syntax for creating a container changed somewhat in Docker API version 1.17, and again in version 1.19. Since we had wrapped that function, we were able to support newer Docker engines while remaining backward compatible. That makes our support job, both internal and external, a lot easier.
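To illustrate the idea of stabilizing an API behind a wrapper, here is a minimal sketch of version-based dispatch. It is not dockhand's code: `ContainerCreator` and its builder methods are hypothetical, and the payload shapes are invented placeholders rather than the real differences between Docker API versions. Only the version thresholds (1.17 and 1.19) come from the text above.

```python
def parse_api_version(version):
    """Turn a version string such as '1.19' into a comparable tuple (1, 19)."""
    return tuple(int(part) for part in version.split("."))


class ContainerCreator:
    """Hypothetical facade: one stable call, several version-specific builders."""

    def __init__(self, api_version):
        self.api_version = parse_api_version(api_version)
        # Ordered newest first; each entry is (minimum version, builder).
        self._builders = [
            ((1, 19), self._build_v119),
            ((1, 17), self._build_v117),
            ((0, 0), self._build_legacy),
        ]

    def build_request(self, image, name, env=None):
        """Return the request payload suited to the connected engine."""
        env = env or {}
        for minimum, builder in self._builders:
            if self.api_version >= minimum:
                return builder(image, name, env)
        raise ValueError("unsupported API version")

    # The payloads below are placeholders, not real Docker API request bodies.
    @staticmethod
    def _build_legacy(image, name, env):
        return {"style": "legacy", "image": image, "name": name, "env": env}

    @staticmethod
    def _build_v117(image, name, env):
        return {"style": "v1.17", "image": image, "name": name, "env": env}

    @staticmethod
    def _build_v119(image, name, env):
        return {"style": "v1.19", "image": image, "name": name, "env": env}
```

Callers always use `build_request`, so supporting a new engine means adding one builder instead of touching every call site. Comparing versions as integer tuples matters here: as plain strings, "1.9" would sort after "1.17".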
You might look at all the tasks node-admin does and wonder why we didn't build on top of an existing orchestration solution. The answer is that we adopted Docker fairly early. When we started working on Firefly, there weren't solutions available that met our needs. When those solutions started becoming available, the work to retrofit them cost more than the benefits. It's only recently, as we've started to scale our applications horizontally, that we've seen the costs of maintaining our custom solution start to outweigh the benefits.
We also wrote an orchestration tool for developers we call Kaylee, after the resourceful mechanic of the Serenity. A kaylee dev start does most of the same things a node-admin start does, but tailored for an application developer experience. This allows us to onboard new developers very quickly without a great deal of up-front familiarity with Docker.
ADTRAN also uses Docker in our test automation to create reproducible and isolated environments within our continuous delivery pipeline. Before Docker, our DevOps infrastructure team would often get requests along the lines of "please upgrade obscure library to extremely-specific-version." While this can be managed for a small number of environments, it did not scale to a global organization. As more of these requirements arose, some of which were mutually exclusive, the number of environments grew out of control. With Docker, the developers of the tests now have complete control of the test environment and no longer need to make these requests. This has also allowed the DevOps infrastructure team to focus more on providing a larger pool of common, standard host machines that can then be specialized as needed by each project. Using Docker in our continuous delivery pipeline has developers thinking of the infrastructure as part of their deliverable, which is helping to expand the DevOps mindset throughout ADTRAN.
ADTRAN was historically a hardware company. One of our earliest uses of Docker was simulating large numbers of DSLAM devices, so we could test our management software at scale. This sort of scaling would not be possible with a resource-heavy virtual machine, and would not be available to day-to-day developers with real hardware. Look for a full blog post on this in the future.
The Firefly platform team is also responsible for most of the deployment and clustering aspects of the system. We use Docker to containerize each component of the platform. We have found that this brings a lot of consistency and determinism to our development pipeline. A local Firefly instance created by Kaylee, a test deployment in our CI pipeline, and a large-scale clustered deployment in production are all essentially the same thing: a bunch of orchestrated Docker containers. We have developed some really cool tools in-house to do Docker orchestration, but the ecosystem around Docker is rapidly evolving, so we are also keeping a close eye on many projects, including Kubernetes, Swarm, Rancher, and others. For us, that means trying things out for ourselves, so someone on the team is constantly saying "hey, come look what I just did!" It's an exciting place to be.
Now that you have an abbreviated history of ADTRAN's Docker use, I can tell you about my goals for DockerCon. First, ADTRAN's customers want to have confidence in the security of the cloud core products we build using Docker. I wanted to learn how Docker and the Docker ecosystem could help our customers have that confidence. Second, I mentioned how our custom orchestration has become more difficult to maintain as we are scaling horizontally. I wanted to learn how we can leverage existing solutions to help with that. This will allow us to focus our efforts on the "ADTRAN special sauce" that adds unique value for our customers. Third, the wild card. I knew there were people using Docker in ways we hadn't considered, but that could be useful to ADTRAN and by extension, our customers. Just by being in proximity to so many other Docker users, I hoped to discover those use cases. DockerCon did not disappoint on any of those points.
First, regarding security. It was obvious that ADTRAN is not the only company focused on Docker security. That topic pervaded the conference, and it's clear Docker has made great strides in this area in just the last year. One eye-opening slide from the keynote read:
"The most security conscious organizations on the planet are now adopting Docker not in spite of security concerns, but to address their security concerns."
This was not just empty rhetoric, either. The CTO of ADP was one of the speakers singing Docker's praises on security. Check out his talk here. They have 630,000 B2B clients in over 100 countries. They handle 5 million logins per day. They manage health care for more people than healthcare.gov, have 55 million Social Security numbers on file, and move $1.8 trillion through the ACH system every year. They are considered critical infrastructure by the U.S. government. And they use Docker to keep that information secure and to meet all the associated regulatory requirements. That's a powerful endorsement.
The technical details of security they presented broke down into two main themes. First, they emphasized the concept of a secure software supply chain. This involves both scanning for known vulnerabilities and cryptographic signing of Docker images at every stage of the software life cycle. This lets you be confident the image a developer signed off on is the same one QA signed off on, and is the same one actually running uncorrupted in production months later. In addition, these certifications expire. If a new security issue is found at any stage in the pipeline, but you haven't updated your Docker images, your execution environment will notify you.
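The idea of a sign-off that is bound to exact image content and that expires can be sketched as below. This is a conceptual illustration only, not how Docker implements image signing: real image signing uses public-key signatures, while this sketch uses a shared-secret HMAC purely to stay within the Python standard library. The function names, keys, and integer timestamps are all assumptions made for the example.

```python
import hashlib
import hmac


def image_digest(image_bytes):
    """Content digest: any change to the image changes this value."""
    return hashlib.sha256(image_bytes).hexdigest()


def sign(stage_key, digest, expires_at):
    """Sign the digest together with its expiry, so neither can be altered."""
    message = f"{digest}|{expires_at}".encode()
    return hmac.new(stage_key, message, hashlib.sha256).hexdigest()


def verify(stage_key, image_bytes, expires_at, signature, now):
    """Accept only an unchanged image with a valid, unexpired signature."""
    if now >= expires_at:
        return False  # sign-off has expired; the image must be re-certified
    expected = sign(stage_key, image_digest(image_bytes), expires_at)
    return hmac.compare_digest(expected, signature)
```

In this sketch, QA would call `sign` with its own key after testing an image, and the production environment would call `verify` before running it. A tampered image, a wrong key, or an expired sign-off all cause `verify` to return `False`.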
The second theme of security also ties in with orchestration. Docker 1.12 aims to do for orchestration what Docker did for containers, and it makes security easy. Docker 1.12 seamlessly encrypts the links between nodes and automatically and frequently rotates the certificates used for encryption. This is a very difficult problem to get right. Trust us, we have worked on it at ADTRAN. It's an even more difficult problem to make it look easy. Docker 1.12 does both. Continuing on with orchestration, the huge news is that Docker 1.12 does it natively and will add a Distributed Application Bundle (DAB) file that lets you distribute an application as a whole, rather than as individual containers. This will definitely make it easier to manage and track applications.
Now for the wild cards. I work on ADTRAN's cloud core products now, but I wrote embedded firmware for our network elements for several years, so I was pleasantly surprised to see a company called resin.io with an impressive solution for using Docker on embedded devices. They gave a demo where they upgraded the firmware on a quadcopter in mid-flight, saying you could see it when it occurred. I expected the indication to be the quadcopter dipping a foot or two, which would have been impressive in its own right, but it was completely seamless. The only indication was two blips on a real-time graph. I don't know if their product is a good fit for ADTRAN's particular use cases, but it shows the potential of what Docker can do in an embedded environment.
The other wild card was in the area of resiliency testing. We have been working on multi-datacenter solutions. We want to test under similar conditions to geographically distributed datacenters without actually having to run our tests on geographically distributed datacenters. I was able to find a few different solutions for this, from polished commercial apps, to a lightweight and easily hackable open source project demonstrated in the closing keynote.
In my ideal vision, one day we will deliver a DAB file to our customers and they will be able to seamlessly deploy it on their orchestrator of choice, be it Kubernetes, Mesos, Rancher, vanilla Docker, or whatever new option comes along. Until then, we will continue to provide the best ops by proxy experience we can. I'm looking forward to seeing what we can do with these new Docker capabilities over the next year.