Docker: What Do You Think You’re Trying To Do?
Docker has become a huge buzzword in the software development industry over the past few years. People are flocking to this new(ish) technology. It’s a keyword that pops up on a ton of resumes, and for good reason! It’s a really cool piece of technology, and it has a ton of great applications for managing development environments, continuous integration testing, and deploying and scaling your production environments.
But you’re wasting your time, and you probably shouldn’t use it.
“You give terrible advice, Thwiv”, you’re probably saying. “Why do you love Vagrant so much?” Let me get this out of the way: I’ve been working in Docker for several years now, and I really do love it. It solves a ton of problems you come across developing software in different environments, and the speed of deployment compared to full VMs makes onboarding team members to projects a 15-minute task, rather than the 1-2 days it seems to take otherwise. I love it enough that I subscribe to /r/docker and try to answer as many questions as I can about setup and deployment of containers. It’s become my way of giving back.
But, while doing that, I’ve come across a lot of people who really need to take a hard look at what Docker is, what it’s for, and how it works before diving in. It’s probably not the solution to all your problems, and there are some real growing pains that come with getting it up and running. Let’s go over some of the questions you should be asking before you start writing your first Dockerfile.
What Do I Want?
This is a question a surprising number of people trying to use Docker don’t seem to know the answer to. They might know vaguely, “I want Docker to be my development environment”, or “I want Docker so I can scale”, but what do those things mean? And how can Docker help you do them?
A Docker container is, at its core, a process running on your computer. That seems dumb and obvious, but it’s pretty important to remember. The container part makes the process believe that it is isolated, meaning it acts like it is the only process running on your computer. What that process is is up to you; it’s defined in your Dockerfile. With a little ingenuity, you can have the process spin up other processes to run. Docker also lets the container ship the userland of many different Linux distributions, so that the process has access to the tools available on those systems. But there’s the catch: it only looks like a separate operating system. The container still shares the host’s kernel. Docker is not a Virtual Machine. It can run on one, but it isn’t one in itself, and you shouldn’t think of it that way.
This fact can make the initial setup of Docker a lot more complex than setting up a Virtual Machine. You need to have a good map of what processes your system needs to run, which ones need containers and which are hosted on another system, and how all the containers talk to each other, before you can even think about building out a replacement system. Without it, you’re just going in blind. You can’t just ssh into your container and install a new library when you need one. (Actually, you can, and I can show you how in a later post, but it’s really not recommended.) And once you have all this, you need to ask yourself another question:
Why Do I Want Docker To Do That?
You should really consider what you’re trying to get out of Docker. Let’s go back to those two things I made up, and try to answer the “Why?”
I want Docker to be my development environment
This is the most common use of Docker, I’ve found. But why do you want that? What benefit do you get from a Docker-based development environment? I can tell you what I get from it, and why I like working in it, but I’m not you. Our pain points are not the same. What do you value in your development environment?
Many developers (and companies) want the development environment to match the production environment as closely as possible. Is this important to you? Because running in a container is not the same as running in your production environment (unless you’re running Docker in production, and if you’re doing that, why are you reading this?). To do things “the Docker way”, your processes will generally all act as if they are running on their own servers, communicating over networks. Running PHP and NGINX? Those should be separate containers, connected to each other over the network. That is different from the vast majority of common server setups.
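If “separate containers talking over a network” sounds abstract, here is a rough sketch of the kind of process map I mean, written as plain Python data. Everything in it is an assumption for illustration: the service names, the `db` service, and the `talks_to` and `containerized` fields are hypothetical and not something Docker itself defines. It only checks that every connection points at a service you have actually listed, and works out an order in which things could start.

```python
from graphlib import TopologicalSorter

# Hypothetical service map: each service lists the services it talks to.
# The names and fields here are illustrative assumptions, not Docker concepts.
SERVICES = {
    "nginx": {"talks_to": ["php"], "containerized": True},
    "php": {"talks_to": ["db"], "containerized": True},
    "db": {"talks_to": [], "containerized": False},  # e.g. hosted elsewhere
}


def missing_dependencies(services):
    """Return (service, dependency) pairs that point at undefined services."""
    return [
        (name, dep)
        for name, spec in services.items()
        for dep in spec["talks_to"]
        if dep not in services
    ]


def startup_order(services):
    """Order services so each one comes after everything it talks to."""
    graph = {name: set(spec["talks_to"]) for name, spec in services.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print("undefined dependencies:", missing_dependencies(SERVICES))
    print("startup order:", startup_order(SERVICES))
```

If you can’t fill in a table like this for your own system, that is a good sign you are not ready to write a Dockerfile yet.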
The things you get from Docker as a development environment are more about consistency between developers than parity with production. Docker makes it much easier to define your environment and dependencies as source code. One well-written docker-compose file and some clever configuration included in a Git repository can have a new developer up and running in 15 minutes. It won’t be the same as your production environment, but it will be consistent among all the developers, and, so long as you’re not relying on processor-specific compiler settings, you can reasonably assume that tests written by one developer will pass on any other developer’s machine. That’s neat! But is it worth it?
It really depends on your team, your projects, and your needs. I wouldn’t recommend Docker just because it’s the cool new thing.
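For a sense of what “environment as source code” looks like, here is a minimal sketch that builds a two-service Compose definition as a Python dictionary and prints it as JSON. Compose files are normally written in YAML; since JSON is valid YAML, output like this should be usable with something like `docker compose -f compose.json up`, but treat that as an assumption to check against your Compose version. The service names, image tags, port mapping, and `./src` path are all placeholders I made up.

```python
import json


def build_compose():
    """Return a hypothetical two-service Compose definition as plain data."""
    return {
        "services": {
            # Web server container; forwards host port 8080 to port 80 inside.
            "web": {
                "image": "nginx:stable",
                "ports": ["8080:80"],
                "depends_on": ["app"],
            },
            # Separate PHP container; the source path is a placeholder.
            "app": {
                "image": "php:8.3-fpm",
                "volumes": ["./src:/var/www/html"],
            },
        }
    }


def render(compose):
    """Serialize the definition as JSON, which is also valid YAML."""
    return json.dumps(compose, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_compose()))
```

The point is not the tooling. It is that the whole environment fits in one small, reviewable file that lives in the repository next to the code.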
I want Docker so I can scale!
Okay, cool! Docker is good at that. We talked about it a little in the last section, but as you build separate containers for each process, you’ll start to realize that these containers can be duplicated in order to scale your project up by orders of magnitude. You’re building separate containers, right?
It turns out this concept scares a lot of people. I’ve answered a lot of questions about mimicking a server in a Docker container, because that’s what everyone knows. They want to know how to build a container with Nginx, PHP-FPM, and MySQL installed. In fact, I’ve built one of these monstrosities myself. Groups want to move to Docker, but they don’t want to break out of that Virtual Machine mindset. But if you really want to use containers to their fullest abilities, you have to. Say it with me: Docker is Not a Virtual Machine.
You get nothing in scalability if you just copy your VM setup into a Docker container. In fact, all you did was add overhead to your VM. The way you get to scale is by breaking those processes down into their atomic pieces, and then duplicating the bottleneck containers to add capacity. If you’re serving a lot of static content, duplicate your Nginx container; if PHP can’t handle the load, spin up 3 more PHP containers. There are plenty of systems for managing this kind of scaling, but if everything is in the same container you’d be better off just spinning up some more AWS EC2 instances.
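The arithmetic behind “duplicate the bottleneck” is simple enough to sketch. This assumes you have measured the load on each service and roughly how much one container can handle; the numbers below are invented for illustration, and real orchestration systems make this decision with far more signal than one division.

```python
import math


def replicas_needed(requests_per_second, capacity_per_container):
    """Smallest container count that covers the load, never fewer than one."""
    if capacity_per_container <= 0:
        raise ValueError("capacity_per_container must be positive")
    return max(1, math.ceil(requests_per_second / capacity_per_container))


def scaling_plan(load, capacity):
    """Map each service name to the number of containers it needs."""
    return {name: replicas_needed(load[name], capacity[name]) for name in load}


if __name__ == "__main__":
    # Hypothetical measurements, in requests per second.
    load = {"nginx": 900, "php": 350}
    capacity = {"nginx": 1000, "php": 100}
    print(scaling_plan(load, capacity))
```

With these made-up numbers, Nginx stays at one container and PHP goes to four, which is the “spin up 3 more PHP containers” case. With Compose, a count like that could be applied with something like `docker compose up --scale php=4`. None of this works if Nginx and PHP live in the same container, because then you can only scale both at once.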
So What Should I Do?
I’m not trying to tell you to not use Docker. The future is definitely bright for containerization in general, and Docker is a huge part of that. All I’m saying is that before you start, you should have clearly defined goals. I think that might be a mantra to live by for all things, not just Docker. But without goals, you’re just going to end up frustrated with what could have been, and it’s going to be harder for people like me to help you. Also, the other point, obviously, is that Docker is not a Virtual Machine.
…been trying to convince coworkers of this perspective for months… awful lot of people walking around with “engineer” in their title who are more like fashionistas.