Git, Distributed Version Control
As I’ve mentioned before, I currently use Git as the repository for all this blog’s content. There, I store both the configuration files and the content itself (text and images), from which Hugo generates the functional website. Beyond this specific use case, I believe that knowing Git is fundamental for IT professionals, so I set out to write this introduction to the tool.
What is Git?
Git is a Distributed Version Control System (DVCS), originally created by Linus Torvalds (the creator of Linux) in 2005.
In essence, Git is a tool that tracks changes made to files over time. It allows developers to revert to previous versions of a project, branch development to work on new features without affecting the main code, and merge those changes back in a controlled manner when appropriate.
Key Points:
- Version Control: This means it records every modification in the code, allowing you to know who changed what, and when.
- Distributed: This is the most important feature. Unlike centralized systems, every developer has a complete copy of the entire project history on their local machine. This means you can work offline and the project does not rely on a single central server (which increases resilience and security).
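To make the "distributed" point concrete, here is a minimal sketch that builds a small repository, clones it, and then deletes the original to simulate losing the central server. The directory names (`origin`, `clone`), the file, and the demo identity are all hypothetical, chosen only for this illustration.

```shell
set -eu
work="$(mktemp -d)"                      # throwaway directory for the demo

# Create a repository that plays the role of the "central server"
git init --quiet "$work/origin"
git -C "$work/origin" config user.name "Demo User"
git -C "$work/origin" config user.email "demo@example.com"

echo "first version" > "$work/origin/README"
git -C "$work/origin" add README
git -C "$work/origin" commit --quiet -m "First commit"

echo "second version, longer" > "$work/origin/README"
git -C "$work/origin" commit --quiet -am "Second commit"

# A collaborator clones it and receives the complete history
git clone --quiet "$work/origin" "$work/clone"

# Simulate losing the central server
rm -rf "$work/origin"

# The clone still holds every commit and works without the origin
git -C "$work/clone" log --oneline
count="$(git -C "$work/clone" rev-list --count HEAD)"
echo "Commits available in the clone: $count"
```

The clone here is local only to keep the example self-contained; the same holds when the origin is a remote server.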
How Does Git Work?
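Git does not store its data as a list of per-file changes accumulated over time. Instead, each commit records a snapshot of all tracked files at that moment; a file that has not changed is not stored again, the new snapshot simply references the content already saved.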
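The snapshot idea can be checked from the command line. The following is a small sketch, assuming a throwaway repository with two hypothetical files (`a.txt` and `b.txt`): only one of them changes between commits, yet both commits list both files, and the unchanged file points to the same stored object.

```shell
set -eu
repo="$(mktemp -d)"                      # throwaway repository for the demo
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit --quiet -m "First snapshot"

echo "one, edited" > a.txt               # only a.txt changes
git add a.txt
git commit --quiet -m "Second snapshot"

# Each commit references a tree that lists every tracked file
echo "Files in the first snapshot:"
git ls-tree HEAD~1
echo "Files in the second snapshot:"
git ls-tree HEAD

# b.txt did not change, so both snapshots reference the same object
blob_old="$(git rev-parse HEAD~1:b.txt)"
blob_new="$(git rev-parse HEAD:b.txt)"
echo "b.txt in first snapshot:  $blob_old"
echo "b.txt in second snapshot: $blob_new"
```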
1. The Three-Tree Model (or Three States)
Git manages the project across three main states or logical “trees”:
| State | Name (in Git) | Description |
|---|---|---|
| 1. Working Directory | Working Directory | The files you have on your machine and are currently modifying. These files are either untracked (not monitored) or modified (tracked, but not saved). |
| 2. Staging Area | Staging Area (Index) | An intermediate cache. Here, you place the specific changes from the Working Directory that you want to include in your next commit. |
| 3. Local Repository | Git Directory (Repository) | This is the Git database, containing the history of all changes (commits) in the project. This is where snapshots are permanently stored. |
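The three states in the table can be observed with `git status --short`. This is a minimal sketch in a throwaway repository; the file names are hypothetical. In the short format, the first column describes the Staging Area and the second the Working Directory, so ` M` means modified but not staged, `A ` means newly staged, and `??` means untracked.

```shell
set -eu
repo="$(mktemp -d)"                      # throwaway repository for the demo
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# One file already saved in the Local Repository
echo "v1" > tracked.txt
git add tracked.txt
git commit --quiet -m "Add tracked.txt"

echo "version two" > tracked.txt         # modified in the Working Directory only
echo "new" > staged.txt
git add staged.txt                       # placed in the Staging Area
echo "tmp" > untracked.txt               # never added: untracked

status="$(git status --short)"
printf '%s\n' "$status"
```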
2. The Commit (Snapshot)
A commit is the fundamental action. It is a snapshot or checkpoint of your project at a specific moment.
- To create a `commit`, you must first add the modified files from the Working Directory to the Staging Area (command `git add`).
- Then, you create the `commit` (command `git commit`), which records the staged content in the Local Repository.
- Every `commit` has a unique SHA-1 hash that identifies it, a descriptive message, and a pointer to the immediately preceding `commit` (its “parent”; a merge commit has more than one).
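The steps above can be sketched as follows, in a throwaway repository with a hypothetical `notes.txt` file. After two commits, the latest one reports its own hash, the hash of its parent, and its message.

```shell
set -eu
repo="$(mktemp -d)"                      # throwaway repository for the demo
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: Working Directory -> Staging Area -> Local Repository
echo "hello" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
first="$(git rev-parse HEAD)"            # hash of the first commit

# Second commit, whose parent is the first one
echo "more" >> notes.txt
git add notes.txt
git commit --quiet -m "Extend notes"

# Show the hash, parent hash, and message of the latest commit
git log -1 --format='commit  %H%nparent  %P%nmessage %s'
parent="$(git log -1 --format=%P)"
```

The value printed as `parent` matches the hash saved in `first`, which is how Git chains commits into a history.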
3. Branches
Branches are lightweight, movable pointers to a commit. They allow developers to:
- Create a separate line of development from the main code (`main` or `master`).
- Work on a new feature or bug fix without the risk of breaking the stable version.
- Merge that branch back into the main line once the work is ready.
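A typical branch cycle looks roughly like this. It is only a sketch: the branch name `feature` and the file names are hypothetical, and the main branch is explicitly named `main` so the example does not depend on the local Git default.

```shell
set -eu
repo="$(mktemp -d)"                      # throwaway repository for the demo
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable" > app.txt
git add app.txt
git commit --quiet -m "Initial stable version"

# Create a branch and work on it
git checkout --quiet -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit --quiet -m "Add feature"

# Back on main, the feature work is not visible yet
git checkout --quiet main
before="$(ls)"
echo "Files on main before the merge: $before"

# Merge the branch back into the main line
git merge --quiet --no-edit feature
echo "Files on main after the merge:"
ls
```

Because `main` did not receive new commits in the meantime, Git resolves this merge by simply moving the `main` pointer forward to the commit of `feature`.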
Why is Git Important?
Beyond its use in software development, which I won’t focus on here, I believe Git offers several benefits for systems administrators and specialists in other IT fields:
- Audit and Traceability: In security and compliance, traceability is vital. Git keeps a tamper-evident record of changes: every commit is identified by a hash of its content and history, so rewriting a past commit changes its hash. If a vulnerability or incorrect configuration is introduced, you can find which `commit` introduced it, who authored it, and when, which facilitates auditing and rollback.
- Infrastructure as Code (IaC): The configuration for tools like Terraform, Ansible, or CloudFormation is plain text that is commonly versioned in Git. This is crucial for DevSecOps and for securely and reproducibly managing cloud environments.
- Collaboration and Resilience: It allows distributed teams to collaborate on the same code safely, without silently overwriting each other’s work. Being distributed, if a central server fails (e.g., GitLab, which we will discuss in another post), the complete history still exists on the local machines of those collaborating.
- Workflow (CI/CD): Git is the starting point of most Continuous Integration and Continuous Deployment (CI/CD) pipelines: a push or merge triggers the build, tests, and deployment, which helps ensure that only reviewed and tested code reaches production. For example, in the case of this blog: I use a project on GitLab as a code repository, and then deploy the functional site on Cloudflare Pages through a CI/CD flow.
Therefore, Git can add value to tasks as diverse as managing Infrastructure as Code, configuration management on servers, container orchestration/Kubernetes, or simply documentation management (personally, I have used it for security policies and other documents).
These uses demonstrate that Git is as much an architecture and knowledge management tool as it is a programming tool. It allows applying the principles of traceability and auditing, essential in cybersecurity, to all aspects of a system.
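As an illustration of the audit and rollback point, here is a small sketch with a made-up configuration file. The file name `sshd_config`, its content, and the commit messages are assumptions for the example, not taken from a real system. A risky change is committed, located in the history, and then undone with `git revert`, which adds a new commit instead of erasing the old one.

```shell
set -eu
repo="$(mktemp -d)"                      # throwaway repository for the demo
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Baseline configuration
printf 'PermitRootLogin no\n' > sshd_config
git add sshd_config
git commit --quiet -m "Baseline SSH config"

# A risky change slips in
printf 'PermitRootLogin yes\n' > sshd_config
git commit --quiet -am "Temporarily allow root login"
bad="$(git rev-parse HEAD)"              # hash of the offending commit

# Audit: who changed the file, when, and with which message
git log --date=short --format='%h %an %ad %s' -- sshd_config

# Rollback: undo the change with a new commit, keeping the full history
git revert --no-edit "$bad" > /dev/null
echo "Current content: $(cat sshd_config)"
git log --format='%h %s'
```

The history ends up with three commits: the baseline, the risky change, and its revert, so the record of what happened stays intact.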
I plan to write something about GitLab soon, one of the Git servers I usually use for personal and work projects. And once I have more time, I will document the complete deployment of this blog with CI/CD using Hugo + GitLab + Cloudflare Pages here.