Git, Distributed Version Control

As I’ve mentioned before, I currently use Git as the repository for all of this blog’s content. It holds both the configuration files and the content itself (text and images), from which Hugo generates the working website. Beyond this specific use case, I believe that knowing Git is fundamental for IT professionals, so I set out to write this introduction to the tool.

What is Git?

Git is a Distributed Version Control System (DVCS), originally created by Linus Torvalds (the creator of Linux) in 2005.

In essence, Git is a tool that tracks changes made to files over time. It allows developers to revert to previous versions of a project, branch development to work on new features without affecting the main code, and merge those changes back in a controlled manner when appropriate.

Key Points:

Version Control: This means it records every modification in the code, allowing you to know who changed what, and when.
Distributed: This is the most important feature. Unlike centralized systems, every developer has a complete copy of the entire project history on their local machine. This means you can work offline and the project does not rely on a single central server (which increases resilience and security).

How Does Git Work?

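Git does not store information as a list of per-file changes accumulated over time. Instead, it stores a series of snapshots of the project’s file tree: each commit records what every tracked file looked like at that moment, and files that did not change are referenced from the previous snapshot rather than stored again.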
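To make the snapshot idea concrete, here is a minimal Python sketch. It is not how Git is implemented: the in-memory `object_store`, the function names, and the file names are all hypothetical, and it hashes the raw content directly, whereas real Git adds a small header before hashing. It only illustrates why storing full snapshots does not mean storing every file again.

```python
import hashlib
from typing import Dict

# Hypothetical in-memory object store: content digest -> file content.
object_store: Dict[str, bytes] = {}


def store_blob(content: bytes) -> str:
    """Store content under its SHA-1 digest and return the digest."""
    digest = hashlib.sha1(content).hexdigest()
    object_store[digest] = content  # identical content lands on the same key
    return digest


def take_snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record the whole project as a mapping of path -> content digest."""
    return {path: store_blob(content) for path, content in files.items()}


# Version 1 of a small project.
snapshot_1 = take_snapshot({
    "README.md": b"hello\n",
    "config.toml": b"title = 'blog'\n",
})

# Version 2: only README.md changes.
snapshot_2 = take_snapshot({
    "README.md": b"hello, world\n",
    "config.toml": b"title = 'blog'\n",
})

# Both snapshots list every file, but the unchanged file shares one stored object.
print(snapshot_1["config.toml"] == snapshot_2["config.toml"])  # True
print(len(object_store))  # 3: two README versions plus one config file
```

Each snapshot is complete on its own, yet the store only grows by the content that actually changed.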

1. The Three-Tree Model (or Three States)

Git manages the project across three main states or logical “trees”:

State	Name (in Git)	Description
1. Working Directory	Working Directory	The files you have on your machine and are currently editing. A file here is either untracked (not yet monitored by Git), unmodified (identical to the last commit), or modified (tracked, with changes not yet staged).
2. Staging Area	Staging Area (Index)	An intermediate cache. Here, you place the specific changes from the Working Directory that you want to include in your next commit.
3. Local Repository	Git Directory (Repository)	This is the Git database, containing the history of all changes (commits) in the project. This is where snapshots are permanently stored.
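The three states can be modeled with three plain Python structures. This is a simplified sketch under my own assumptions: the variable names, file names, and the `status` helper are hypothetical, and real Git tracks far more detail. It does mirror one real behavior: the index holds the content planned for the next commit, and a commit is a copy of the index.

```python
# Working directory: the files as they currently exist on disk.
working_dir = {"index.md": "draft v2", "notes.txt": "todo"}

# Staging area (index): the content that will make up the next commit.
index = {"index.md": "draft v1"}

# Repository: the list of committed snapshots, oldest first.
commits = [{"index.md": "draft v1"}]


def add(path):
    """Copy the current content of a file into the index (like `git add`)."""
    index[path] = working_dir[path]


def commit():
    """Record the index as a new snapshot (like `git commit`)."""
    commits.append(dict(index))


def status(path):
    """Classify a file by comparing working directory, index, and last commit."""
    last_commit = commits[-1]
    if path not in index:
        return "untracked"
    if working_dir[path] != index[path]:
        return "modified"
    if index[path] != last_commit.get(path):
        return "staged"
    return "unmodified"


states = [status("index.md")]      # edited on disk, not yet staged
add("index.md")
states.append(status("index.md"))  # staged, waiting for the next commit
commit()
states.append(status("index.md"))  # matches the new snapshot
print(states)                      # ['modified', 'staged', 'unmodified']
print(status("notes.txt"))         # untracked: it was never added
```

Note that `notes.txt` never reaches the repository, because only what is in the index becomes part of a commit.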

2. The `Commit` (Snapshot)

A commit is the fundamental action. It is a snapshot or checkpoint of your project at a specific moment.

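To create a commit, you must first stage your changes, that is, copy the current content of the modified files from the Working Directory into the Staging Area (command `git add`).
Then, you create the commit (command `git commit`), which records everything in the Staging Area as a new snapshot in the Local Repository. The files themselves stay in your Working Directory.
Every commit is identified by a hash (SHA-1 by default) computed from its content, and it stores an author, a timestamp, a descriptive message, and a pointer to the immediately preceding commit (its “parent”). The first commit of a repository has no parent, and a merge commit has two or more.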

3. Branches

Branches are lightweight, movable pointers to a commit. They allow developers to:

Create a separate line of development from the main code (main or master).
Work on a new feature or bug fix without the risk of breaking the stable version.
Merge the branch back into the main line once the work is ready.
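Since a branch is only a pointer, creating one costs almost nothing. The following Python sketch models that idea with two dictionaries. The commit IDs (`c1`, `c2`, ...), the branch name `feature`, and the helper functions are hypothetical, and the merge shown is only the simplest case, a fast-forward. When both branches have new commits, real Git creates a merge commit with two parents instead.

```python
# Hypothetical commit graph: commit id -> parent id (None for the first commit).
parents = {"c1": None, "c2": "c1"}

# Branches are just names pointing at a commit.
branches = {"main": "c2"}
current = {"branch": "main"}  # which branch new commits are added to


def create_branch(name):
    """A new branch starts at the commit the current branch points to."""
    branches[name] = branches[current["branch"]]


def switch(name):
    current["branch"] = name


def commit(new_id):
    """Add a commit on top of the current branch and move its pointer forward."""
    parents[new_id] = branches[current["branch"]]
    branches[current["branch"]] = new_id


def is_ancestor(older, newer):
    """Walk the parent pointers from `newer` back until `older` is found."""
    node = newer
    while node is not None:
        if node == older:
            return True
        node = parents[node]
    return False


def fast_forward(target, source):
    """Move `target` up to `source` if `target` has no commits of its own."""
    if is_ancestor(branches[target], branches[source]):
        branches[target] = branches[source]
        return True
    return False


create_branch("feature")
switch("feature")
commit("c3")
commit("c4")
main_before_merge = branches["main"]        # still c2: main was never touched

merged = fast_forward("main", "feature")
print(main_before_merge, branches["main"])  # c2 c4
```

While work happened on `feature`, the `main` pointer did not move, which is why the stable version stays untouched until the merge.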

Why is Git Important?

I won’t focus here on how software developers use Git. Beyond that use, I believe it offers several benefits for systems administrators and specialists in other IT disciplines:

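Audit and Traceability: In security and compliance, traceability is vital. Git provides a tamper-evident record of all changes: each commit’s hash depends on its content and on its parent, so rewriting history changes the hashes and is detectable. If a vulnerability or incorrect configuration is introduced, you can identify which commit introduced it, who authored it, and when (for example with `git log`, `git blame`, or `git bisect`), which facilitates auditing and rollback.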
Infrastructure as Code (IaC): Tools like Terraform, Ansible, or CloudFormation describe infrastructure in plain-text files, and those files are typically versioned in Git. This is crucial for DevSecOps and for securely and reproducibly managing cloud environments.
Collaboration and Resilience: It allows distributed teams to collaborate on the same code safely: conflicting changes surface as merge conflicts that must be resolved explicitly, instead of silently overwriting each other’s work. Being distributed, if a central server fails (e.g., GitLab, which we will discuss in another post), the complete history still exists in every full clone on the collaborators’ machines.
Workflow (CI/CD): Git is the starting point of most Continuous Integration and Continuous Deployment (CI/CD) pipelines: a push or merge triggers the build, tests, and deployment, and combined with code review and protected branches this helps ensure that only reviewed and tested code reaches production. For example, in the case of this blog: I use a project on GitLab as a code repository, and then deploy the functional site on Cloudflare Pages through a CI/CD flow.

Therefore, Git can add value to tasks as diverse as managing Infrastructure as Code, configuration management on servers, container orchestration/Kubernetes, or simply documentation management (personally, I have used it for security policies and other documents).

These uses show that Git is as much an architecture and knowledge management tool as it is a programming tool. It lets you apply the principles of traceability and auditing, essential in cybersecurity, to all aspects of a system.

I plan to write soon about GitLab, one of the Git servers I usually use for personal and work projects. And once I have more time, I will document here the complete deployment of this blog with CI/CD using Hugo + GitLab + Cloudflare Pages.