IaC and code repositories
#HalfThoughtThought - unpolished ideas that are not necessarily correct, but at least have a clear chain of thought and target vision.
Context
Over the last year, I have been thinking a lot about the best way to structure and organize the code of my projects. There are so many layers to this topic, of course. For today I would like to zoom out and focus on what I call deployable units as a whole - apps, services, components, you name it - that are deployed in the infrastructure and receive incoming or internal requests flowing through the system.
If a team practices “you build it, you run it”, everything runs in the cloud, and modern IaC tools (say, Ansible, Pulumi, or Terraform) are in play, then a question arises:
Where should the “infrastructure” code go?
There are at least two options:
Keep all the infrastructure and configuration code inside the application’s source code repository. Everything is in one place, the repository is fully autonomous, there is no unnecessary cognitive load, and everything is very predictable. In theory, at least.
A dedicated repository for IaC. Because, well, when there are dozens of deployable units, especially ones configured slightly differently from one environment to another, having this spread across multiple locations becomes an issue, and the team would really benefit from a birds-eye view of the whole infrastructure. But that convenience comes with a cost: a fundamental change to an app requires modifications in multiple places - the app codebase itself, and then its infrastructure. On top of that, those modifications need to be rolled out to the target environment in sync with each other. A hypothetical layout for both options is sketched below.
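Purely as an illustration - every repository, service, and directory name here is made up:

```
# Option 1: infrastructure code lives next to the application
app-service/
  src/               application code
  infrastructure/    IaC definitions for this service only
  pipeline.yml       builds, tests, and deploys both together

# Option 2: a dedicated IaC repository
app-service/         application code only
infrastructure/
  app-service/       IaC definitions for app-service
  another-service/   IaC definitions for another service
  shared/            queues, buckets, databases used by several services
```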
In the “you build it, you run it” world, you should lean towards the first option. This helps with the team’s autonomy. But very soon, one more question pops up:
What about shared infrastructure?
Some apps or services will share common resources: queues, object storage (S3 buckets), or even databases. Which source code repository should those definitions go to? Suddenly, the central IaC repository starts to make sense. But, again, that approach leans towards drawing a line between software and infrastructure - software development teams on one side, infrastructure teams on the other. That’s not what we want.
Thought
What if we view our application components (deployable units) as a graph? Deployable units (APIs, web apps, databases, message brokers, you name it) are the nodes of this graph. But instead of edges representing dependencies between them (what communicates with what), let the edges represent ownership in terms of lifecycle: a service is connected to a database by an edge only if that database is part of the service and makes no sense on its own. If a service listens to a queue because that queue is its public contract, there is an edge too.
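To make the ownership graph tangible, here is a minimal sketch in plain TypeScript. The unit names and the grouping helper are hypothetical, not part of any tool:

```typescript
// Ownership graph of deployable units. An edge means "the lifecycle of the
// owned unit is tied to the owner", not "the owner talks to it".
type DeployableUnit = {
  name: string;
  owns: string[]; // units that make no sense without this one
};

// Hypothetical system, used only for illustration.
const units: DeployableUnit[] = [
  { name: "web-app", owns: ["cdn"] },
  { name: "api-service", owns: ["orders-db", "commands-queue"] },
  { name: "cdn", owns: [] },
  { name: "orders-db", owns: [] },
  { name: "commands-queue", owns: [] },
];

// Each connected component of the ownership graph is a candidate for a single
// repository holding both application and infrastructure code.
function repositoryCandidates(all: DeployableUnit[]): string[][] {
  const neighbours = new Map<string, Set<string>>();
  const ensure = (name: string) => {
    if (!neighbours.has(name)) neighbours.set(name, new Set());
    return neighbours.get(name)!;
  };
  for (const unit of all) {
    ensure(unit.name);
    for (const owned of unit.owns) {
      ensure(unit.name).add(owned);
      ensure(owned).add(unit.name);
    }
  }

  const visited = new Set<string>();
  const groups: string[][] = [];
  for (const start of neighbours.keys()) {
    if (visited.has(start)) continue;
    const group: string[] = [];
    const stack = [start];
    while (stack.length > 0) {
      const current = stack.pop()!;
      if (visited.has(current)) continue;
      visited.add(current);
      group.push(current);
      neighbours.get(current)!.forEach((n) => stack.push(n));
    }
    groups.push(group);
  }
  return groups;
}

// Logs two groups: { web-app, cdn } and { api-service, orders-db, commands-queue }.
console.log(repositoryCandidates(units));
```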
Example 1
Simple scenario. There is a web app (HTML/JS/CSS) that is bundled as a set of static files and served through a CDN. The CDN won’t be needed if there is no web app, so in this regard the web app “owns” the CDN resources: their lifecycles are tied. Thus, both can safely be placed in one source code repository.
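As a sketch only, this is roughly what the colocated infrastructure could look like with Pulumi and AWS. The resource names are made up and the configuration is trimmed to the essentials; a real setup would also handle access policies, caching, and certificates:

```typescript
import * as aws from "@pulumi/aws";

// Bucket holding the bundled static files (HTML/JS/CSS) of the web app.
const siteBucket = new aws.s3.Bucket("web-app-site", {
  website: { indexDocument: "index.html" },
});

// CDN in front of the bucket. It has no reason to exist without the web app,
// so its definition lives in the web app's repository.
const cdn = new aws.cloudfront.Distribution("web-app-cdn", {
  enabled: true,
  origins: [{
    originId: siteBucket.arn,
    domainName: siteBucket.websiteEndpoint,
    customOriginConfig: {
      originProtocolPolicy: "http-only",
      httpPort: 80,
      httpsPort: 443,
      originSslProtocols: ["TLSv1.2"],
    },
  }],
  defaultRootObject: "index.html",
  defaultCacheBehavior: {
    targetOriginId: siteBucket.arn,
    viewerProtocolPolicy: "redirect-to-https",
    allowedMethods: ["GET", "HEAD"],
    cachedMethods: ["GET", "HEAD"],
    forwardedValues: {
      queryString: false,
      cookies: { forward: "none" },
    },
  },
  restrictions: { geoRestriction: { restrictionType: "none" } },
  viewerCertificate: { cloudfrontDefaultCertificate: true },
});

export const cdnUrl = cdn.domainName;
```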
Example 2
A slightly more interesting example, with two services: App Service and Another App Service. The MySQL database and the Redis instance follow the same pattern as the previous example - each is owned by a single service - so that part is straightforward.
The shared RabbitMQ instance is where things get interesting. Both services communicate with it through a pub/sub mechanism, so it is not immediately clear where the RabbitMQ instance definitions should go.
Let’s assume that Another App Service receives asynchronous commands from its peer services, rather than subscribing to events emitted by App Service.
If Another App Service gets decommissioned, keeping RabbitMQ running will make little sense: none of the commands will be executed, as no one will be listening for them. In this regard, the message broker is part of the Another App Service component and is owned by it. Thus, the definitions (configuration and source code) of those two components can be put into the same source code repository.
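Concretely, the broker definition would then live in Another App Service’s repository, next to the code that consumes the commands. A minimal sketch, assuming Pulumi with Amazon MQ as the managed RabbitMQ; the names, engine version, and instance type are assumptions, not recommendations:

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const config = new pulumi.Config();

// RabbitMQ broker owned by Another App Service: if the service goes away,
// so does the broker, because nothing else would consume the commands.
const commandsBroker = new aws.mq.Broker("another-app-commands", {
  brokerName: "another-app-commands",
  engineType: "RabbitMQ",
  engineVersion: "3.11.20",         // assumption: pick a version Amazon MQ actually offers
  hostInstanceType: "mq.t3.micro",  // assumption: smallest instance, for the sketch only
  users: [{
    username: "another-app-service",
    password: config.requireSecret("brokerPassword"),
  }],
});

// Peer services only need the endpoints to publish commands; they do not own the broker.
export const brokerEndpoints = commandsBroker.instances.apply(
  (instances) => instances.map((i) => i.endpoints),
);
```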
To be refined…