The last two articles in our series dealt with the potential of environment variables and some hands-on examples. In this article, we talk with DevOps Engineer Antione Rougeot about the challenges of managing environment variables and he shares some best practices from his experience.
Humanitec: Before we talk about environment variables, perhaps you could introduce yourself briefly and tell us something about your background.
Antione Rougeot: I help enterprises close the gap between developers and products. Fascinated by computers, I worked as a developer and software engineer for 6 years and evolved to be a DevOps engineer. I started working as a freelancer last year helping multiple clients dockerize their applications. Thanks to this, their applications become independent from the servers they are deployed to, and they are free to run on any Docker-compliant infrastructure.
Why do you typically deal with environment variables?
A key concept I followed while coding applications as a software engineer was to make classes loosely coupled. This means having an application composed of many independent pieces of code, which gives your code a certain level of cleanliness and maintainability. Then, the idea is to instantiate these classes by passing parameter values and ensure they work together in a highly cohesive way. The same concept applies to software infrastructure. We create many independent Docker containers that we connect together using environment variables. This is also known as a microservices architecture.
To give a concrete example, here is the definition of two modules connected using an environment variable. This definition is made for docker-compose, a tool used on a development machine to start containers and test that they work well together.
<p> CODE: https://gist.github.com/tony-engineering/9c52611dc33aeaf65df6c2becd4ed2cb.js </p>
As you can see, the code is pretty straightforward: the frontend and backend are built using the Dockerfiles present in their respective paths. Both modules are accessible on the network using the specified ports. The frontend should be started after the backend. The frontend communicates with the backend using the endpoint value BACKEND_ENDPOINT.
What is great about this setup is that you don't need to rebuild the frontend module to point it at a new backend endpoint value.
When this setup is deployed to production, the only change is switching the BACKEND_ENDPOINT value from localhost:2000 to the backend's domain name, like https://backend.endpoint.domain.org. Each module is now independent (loosely coupled) and also well connected using an environment variable (highly cohesive).
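As a rough illustration of why no rebuild is needed, here is a minimal Python sketch of a frontend helper that reads BACKEND_ENDPOINT when it runs rather than when it is built. The function name `backend_url` and the `health` path are hypothetical and not taken from the interview's example project.

```python
import os


def backend_url(path, env=os.environ):
    """Build a backend URL from BACKEND_ENDPOINT, read at call time.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching os.environ.
    """
    endpoint = env["BACKEND_ENDPOINT"].rstrip("/")
    return f"{endpoint}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Same code, two environments: only the variable's value differs.
    print(backend_url("health", {"BACKEND_ENDPOINT": "localhost:2000"}))
    print(backend_url("health", {"BACKEND_ENDPOINT": "https://backend.endpoint.domain.org"}))
```

Because the value is looked up from the environment, the same container image can be started with a different endpoint in each stage.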
What difficulties can you encounter when setting up environment variables?
To quote Wikipedia: “DevOps is a set of practices that combines software development (Dev) and information-technology operations (Ops)”. Environment variables are part of this "set of practices".
What is challenging is to take an existing application and transform it in a way that is compatible with these practices, including setting parameters with environment variables. Even when you are building an application from scratch, if you don’t keep in mind that it will run in a Docker container, adapting it later becomes more complicated, since it impacts the whole application.
Once you have successfully made these changes to the application, you hit the next step: making the same setup work on developers' machines and in production.
Imagine you have the following setup, which is very common.
The application reads sensitive configuration data, like api_key, from a plain text file.
This file is not included in source control for security reasons; it’s passed around manually. When a new developer arrives, they ask colleagues for this file so they can start coding and testing the application.
In production, the file is copy-pasted to the remote server and it stays there. The problem is that, by doing that, the production application is tightly coupled to the server it’s running on. To improve things, you choose to move to a Docker-based setup. Good choice. :)
After refactoring, your application doesn’t read the value of "api_key" from the file anymore, but from the API_KEY environment variable. At this point, you can deploy it to a Docker-compliant infrastructure. The value of API_KEY is securely set in the platform you are using to spin up containers, and if you add a new stage it’s present by default, which eliminates the need for copy-pasting something on a remote server and makes the deployment fully automated!
The final step is to decide how to set API_KEY on developers' machines. There are multiple solutions:
- you ask each developer to set the value in their environment before launching the application
- you add some logic at the application's initialization to use the API key environment variable value if it exists, otherwise, fall back to the plain configuration file
Great! Everything is now working both in production and on developers' machines. All environments can run the same container, but with different parameters.
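The second option, falling back to the plain configuration file when the environment variable is absent, could be sketched as follows. This is only an illustration under assumptions: the file path `config/secrets.txt` and the `api_key=value` line format are hypothetical, since the interview does not describe the actual file.

```python
import os
from pathlib import Path


def load_api_key(env=os.environ, config_path="config/secrets.txt"):
    """Return the API key, preferring the API_KEY environment variable.

    Falls back to a plain text file (hypothetical format: one
    `name=value` pair per line) when the variable is unset or empty.
    """
    value = env.get("API_KEY")
    if value:
        return value

    path = Path(config_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            key, sep, val = line.partition("=")
            if sep and key.strip() == "api_key":
                return val.strip()

    raise RuntimeError(
        f"API_KEY is not set and no api_key entry was found in {path}"
    )
```

In production only the environment variable is used; on a developer's machine the file keeps working until everyone has switched to setting the variable.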
Can you give a real-world example of a difficulty you encountered?
As you know, things don’t always go as well as expected. The biggest challenge comes when you realize that an external dependency was not designed to be compatible with environment variables. This happened to me recently when trying to dockerize a Ruby on Rails application that was running in a production environment with an engine called Passenger. This engine works well in the non-Docker world when you define configurations in plain text files, but it turns out it isn’t able to read environment variables by default.
After investigation, I understood that the source of the problem is that this engine was a sub-process of Nginx, and as stated in the documentation:
"By default, Nginx removes all environment variables inherited from its parent process".
Of course, I’m not the first person to try to dockerize a Rails application running Passenger, so after further investigation, I saw that Passenger had added a directive, "passenger_app_env", that enables you to hard-code environment variable values. The value of the directive cannot be set dynamically, so I ended up using a hacky workaround where I transformed config files into templates and replaced values with the envsubst tool.
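The templating workaround can be approximated in a few lines of Python. This is not the script used in the project, only a sketch of the same idea using the standard library's `string.Template` in place of envsubst. The template content and the RAILS_ENV variable are hypothetical. Only the listed names are substituted, so Nginx's own `$variables` are left untouched.

```python
import os
from string import Template


def render_config(template_text, names, env=os.environ):
    """Replace $NAME placeholders for the given names only.

    Other `$...` tokens (for example Nginx variables such as $host)
    are left as they are, similar to passing an explicit variable
    list to envsubst.
    """
    missing = [name for name in names if name not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    values = {name: env[name] for name in names}
    return Template(template_text).safe_substitute(values)


if __name__ == "__main__":
    # Hypothetical template fragment; RAILS_ENV is an assumed variable name.
    template = "passenger_app_env $RAILS_ENV;\nproxy_set_header Host $host;"
    print(render_config(template, ["RAILS_ENV"], {"RAILS_ENV": "production"}))
```

The rendered file would then be written out before the server starts, which is exactly the extra moving part that made the workaround feel hacky.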
It was clearly time to reconsider the choice of using Passenger + Nginx to run the application.
The possible solutions that I identified were the following:
- try the Apache + Passenger alternative,
- try another engine like Unicorn, and
- use the standard Puma server that was already used on developers' machines.
The third solution made sense because it let us drop a layer of complexity and also move toward another DevOps practice: keeping development environments the same as production. What seems to be a quick change can sometimes turn out to be a complex task that requires reconsidering components of the application's architecture.
At which point do teams most often struggle when it comes to environment variables?
In my early days as a developer, I always said: "I’m allergic to configuration".
Software often has a large config folder, or its configuration is even spread across multiple places. Configuration is, in general, a dark place. Developers don't need to change it often, so nobody knows exactly what is inside, apart from the architect who already left the project.
When you want to dockerize an application, you have to dig, identify and extract all values that are environment-specific. These changes often come with a fear of "breaking everything". Indeed, changing configuration is not like coding on the backend or frontend. In some situations, it’s difficult to validate that what you just changed is correct.
Tasks such as refactoring configuration commonly get low priority. The work won't be visible at all on the product side, and teams prioritize new functionality that brings noticeable results. Not taking care of configuration can lead to a total loss of control over it! Furthermore, it requires a high-level view of the system, which can be difficult to achieve. To sum up, multiple factors can create struggles when it comes to environment variables: fear of breaking things, loss of control, and the need for a high-level view.
What would be your top 3 tips on how to avoid these struggles?
First, minimize the use of default values. If you identify a parameter that should be set as an environment variable, think twice before setting a default value.
This can be dangerous and produce unexpected behavior, or worse: false positives.
Example: you are using BACKEND_ENDPOINT to tell your frontend how to communicate with the backend.
You have 2 environments: development and production. For development, the value should be https://dev.myapi.org,
and in production: https://prod.myapi.org.
In the initialization of your app you do something like:
<p> CODE: https://gist.github.com/tony-engineering/d4ae32adc6d07e2c5a92dc6fbee05860.js </p>
If you forget to set the value of BACKEND_ENDPOINT for the container you deployed in production, what could happen?
You’ll end up with the production frontend communicating with the development backend, and you may not notice it at all!
It would be better to have the app throw an error such as "Error: BACKEND_ENDPOINT is not defined."
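A fail-fast check of this kind might look like the following in Python. The helper name `require_env` is hypothetical, and the sketch is independent of the language used in the example above.

```python
import os


def require_env(name, env=os.environ):
    """Return the value of a required variable, or fail loudly.

    No default is provided on purpose: a missing or empty value stops
    the application at startup instead of silently pointing it at the
    wrong environment.
    """
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Error: {name} is not defined.")
    return value


if __name__ == "__main__":
    # Raises RuntimeError at startup if BACKEND_ENDPOINT is not set.
    BACKEND_ENDPOINT = require_env("BACKEND_ENDPOINT")
    print(f"Using backend at {BACKEND_ENDPOINT}")
```

With this in place, a production container deployed without the variable crashes immediately, which is far easier to notice than a frontend quietly talking to the development backend.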
Second, keep your configuration clean. Delete dead lines of code, and be nice to developers by leaving comments if something is obscure. :)
Third, maintain an architecture diagram. Use something like asciiflow.com to draw a simple diagram of your application's components, and add it to your source control.
This will help people to understand the dependencies of your application.
Since I discovered this tool, I have been a fan of it!
- Everyone can edit the diagram, since it doesn't require having the source file or specific software
- Diagrams are quick to draw
- You can add it to your source control and track changes
How can Humanitec help from your perspective? What are the main benefits?
First, dynamic environment variables: I mentioned above that for the example app, in production, the only change made is switching the BACKEND_ENDPOINT value from localhost:2000 to its domain name, like https://backend.endpoint.domain.org.
In the past, I always had to worry about the value of a variable like BACKEND_ENDPOINT. If the domain name changed, I then had to carry that change over to the deployment configuration. Since the frontend and backend are always deployed together, it would be great to be able to tell a system: "Deploy the backend, then when it’s ready to accept connections, put the value of the current endpoint in BACKEND_ENDPOINT, then you can start the frontend". Humanitec provides a convenient feature for this.
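To make the sequence in that quote concrete, here is a generic Python sketch of the "wait until ready, then inject the endpoint" idea. It does not represent Humanitec's implementation or API. The function names, the TCP-based readiness check, and the timeouts are all assumptions made for illustration.

```python
import socket
import time


def wait_until_ready(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Backend not accepting connections yet; retry after a pause.
            time.sleep(interval)
    return False


def frontend_environment(backend_endpoint, base_env):
    """Return a copy of base_env with BACKEND_ENDPOINT set for the frontend."""
    env = dict(base_env)
    env["BACKEND_ENDPOINT"] = backend_endpoint
    return env
```

A deployment script would call `wait_until_ready` for the freshly deployed backend and only then start the frontend with the environment returned by `frontend_environment`.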
Second, managed services: You can create a database directly in Humanitec and connect your application to it dynamically with the feature described above.
Third, a great UI and an easy way to roll back to a previous deployment. I spoke above about the fear of breaking something. Developers are humans after all. :)
Even with the cleanest microservices architecture, you can end up having weird bugs. When multiple modules are connected together, it can happen that a colleague pushes a problematic change for a module you are relying on, but you are not aware of it.
With Humanitec, you can identify deployments for all connected modules with a few clicks and also roll back the entire group of modules to a previous state. This makes it a developer-friendly workplace.