I got tired of not knowing what was happening.
Not in a paranoid way. In the way where you are using S3 or some managed blob store and a request fails and you have no idea why. The dashboard says healthy. The logs say nothing. You just sit there refreshing, waiting for the magic to fix itself.
I am not a cloud engineer. I am just someone who kept using these tools with a vague idea of how they worked underneath. Docker, Kubernetes, distributed storage — I knew the concepts, but vaguely. Like I could explain it at a high level but could not point to the actual moving parts. That gap bothered me. So I built a tiny version myself to close it.
I do not know if anyone will ever read this. Honestly that is fine. I will read it when I need to recall how any of this works. Consider it a note to myself that got a bit too long.
The Topology
One gateway node called lb01 faces the outside world. Behind it are two storage nodes sitting on a private internal bridge. No public interface. The only way in is through lb01, but importantly, the nodes can talk to each other over this private bridge to coordinate requests.
I set this up with Vagrant. Just a config file that spins up VMs locally.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/jammy64"

  config.vm.define "lb01" do |lb|
    lb.vm.hostname = "lb01"
    lb.vm.network "private_network", ip: "192.168.56.10" # public-facing
    lb.vm.network "private_network", ip: "192.168.57.10" # internal backend
  end

  ["storage01", "storage02"].each_with_index do |name, i|
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: "192.168.57.#{11+i}" # internal only
    end
  end
end
This is called a dual-homed setup. A machine with two network interfaces, one facing each direction. The security win is almost embarrassingly simple: you cannot attack a machine you cannot reach. You harden lb01 and you are done.
How the Data Gets to the Right Place
My first routing attempt was embarrassing. Round-robin. Node 1, node 2, node 1, node 2. Fine for stateless web servers, useless for storage: when the download request comes in later, nothing tells you which node actually holds the file.
There is a simple API layer to upload files, but I am not going to talk about it here. It is too simple to be interesting, it just takes a file and writes it to disk. The real problem is figuring out which disk it should go to.
The fix is a consistent hash ring. The concept is simple. You place your nodes on an imaginary circle. Every key gets hashed to a point on that same circle. Whichever node is closest clockwise from that point owns the key. That is it. Same key, same node, every single time.
My version is maybe 60 lines of Go. Real databases like Cassandra and DynamoDB use the same idea at a much bigger scale. That part fascinated me.
package hashing

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "sort"
)

type HashRing struct {
    Nodes    []uint32          // sorted hashes of every virtual node
    NodeMap  map[uint32]string // hash -> real node IP
    Replicas int               // virtual nodes per real node
}

func NewHashRing(replicas int) *HashRing {
    return &HashRing{
        NodeMap:  make(map[uint32]string),
        Replicas: replicas,
    }
}

// hashFn maps a key to a point on the circle: the first 4 bytes of its SHA-256.
func (r *HashRing) hashFn(key string) uint32 {
    h := sha256.Sum256([]byte(key))
    return binary.BigEndian.Uint32(h[:4])
}

// AddNode places Replicas virtual points on the ring for one real node,
// which smooths the key distribution across nodes.
func (r *HashRing) AddNode(nodeIP string) {
    for i := 0; i < r.Replicas; i++ {
        vNodeName := fmt.Sprintf("%s#%d", nodeIP, i)
        hash := r.hashFn(vNodeName)
        r.Nodes = append(r.Nodes, hash)
        r.NodeMap[hash] = nodeIP
    }
    sort.Slice(r.Nodes, func(i, j int) bool {
        return r.Nodes[i] < r.Nodes[j]
    })
}

// GetNode finds the first virtual node clockwise from the key's hash.
func (r *HashRing) GetNode(key string) string {
    if len(r.Nodes) == 0 {
        return ""
    }
    hash := r.hashFn(key)
    idx := sort.Search(len(r.Nodes), func(i int) bool {
        return r.Nodes[i] >= hash
    })
    if idx == len(r.Nodes) {
        idx = 0 // wrap around the circle
    }
    return r.NodeMap[r.Nodes[idx]]
}
Lookups are O(log N) thanks to the binary search. And when you add a node, only the keys between it and its neighbor on the ring move. Naive hash-mod reshuffles almost everything when you scale. That matters when traffic is high and you cannot afford the misses.
Each storage node boots with the ring already initialized. When a request lands on the wrong node, it just forwards.
func (s *StorageServer) HandleUpload(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Query().Get("key")
    targetNode := s.ring.GetNode(key)
    // Forwarded requests arrive tagged; store them directly instead of
    // routing again, so a request can never bounce between nodes.
    isInternal := r.Header.Get("X-Internal-Request") != ""
    if targetNode == s.nodeIP || isInternal {
        s.saveLocally(key, r.Body)
    } else {
        s.forwardToNode(w, targetNode, key, r)
    }
}
Upload hits storage01. The ring says this key belongs to storage02. Because they share the same internal bridge, storage01 can forward the request directly, setting an X-Internal-Request header so storage02 knows not to route it again. This is why the internal bridge connectivity is crucial: the nodes are not isolated silos, they are a cooperative cluster.
The Battle Scars
First attempt at keeping the server running was obvious. SSH in, start the process, throw an &, disconnect.
ssh storage01 "./storage-server &"
Dead. Shell exits, children die. This is not a bug. It is how Unix process groups work. I just never had to care before because deployment platforms always handled it for me.
What I needed was systemd-run.
systemd-run --unit=storage-server \
  --same-dir \
  --setenv=NODE_IP=192.168.57.11 \
  ./storage-server
Survives disconnects. systemctl status storage-server tells you what is going on. Turns out this is just what every deployment tool does under the hood.
The other thing that bites you is the health check gap. lb01 boots, immediately starts routing, storage nodes are still warming up. You spend an hour convinced your ring logic is broken when it is just a race condition.
func (rr *RoundRobin) StartHealthChecks(interval, timeout time.Duration) {
    checkServerHealth := func() {
        for _, server := range rr.servers {
            alive := isServerAlive(server, timeout)
            rr.mu.Lock()
            rr.alive[server] = alive
            rr.mu.Unlock()
        }
    }
    checkServerHealth() // run once before accepting traffic
    ticker := time.NewTicker(interval)
    for range ticker.C {
        checkServerHealth()
    }
}
That first checkServerHealth() before the ticker means routing waits until at least one check has passed. Every load balancer you have ever used does this. I just never thought about it until I had to write it myself.
What Is Next
This whole thing is maybe 600 lines of Go across a handful of files. It handles uploads, downloads, routes keys correctly, and survives a node restart. It is genuinely not that complicated once you see it laid out.
But every piece of it made me understand something I was just taking for granted before. The dual-homed network made me understand why cloud VPCs are designed the way they are. The hash ring made me understand why Cassandra can add a node without downtime. The systemd-run thing made me understand what Kubernetes is actually doing when it manages process lifecycles.
The next layer I want to pull apart is containers. Not Docker. The actual Linux primitives Docker is built on. Namespaces for isolation so each workload gets its own view of the network and filesystem. Cgroups for resource limits so one bad process cannot eat the whole machine.
The Vagrantfile already has compute01 and compute02 stubbed out. Week 2.
I want to understand what a container actually is. Not the abstraction. The syscalls underneath it.
Because that is the whole point of this little project. Not to build something production-ready. Not to compete with anything. Just to pull the curtain back far enough that the magic stops being magic.
Turns out the magic is just Linux. And Linux is learnable.