Instead of hosting all of my Docker containers on one VM, I’ve decided to split the workload across two VMs: one for containers that provide system-facing functionality and another for those that provide user-facing functionality, system-apps and user-apps respectively. Let’s look only at system-facing apps first.
Identifying the problems we want to solve
It’s difficult to list every problem we’re looking to solve, but here’s a start:
- As initially identified, I want domains for all of my internal services
  - DNS (copying /etc/hosts files isn’t going to be sufficient)
  - Port mapping so I don’t have to reach domains like
- I want to know when any of my systems are having problems
  - Timeseries database for metrics (monitoring data points over time)
  - Fulltext index for centralized event logging (monitoring aggregated events)
  - Alerting for both metrics and events
- I want to cache downloads wherever possible
- Backups (worth noting, but already covered via Borgmatic pushes into FreeNAS)
I have no doubt there are other things we’ll want to do, but it’s probably best to feel the friction of a problem before we go out to solve it; otherwise we’ll make more work for ourselves than necessary and complicate things beyond reason.
Selecting solutions for the problems
Looking into the above, I see n1trux’s awesome-sysadmin list does a great job of cataloguing a variety of tools, at least enough to start digging.
The tools I settle on are as follows:
- DNS: Pi-hole (also does ad-blocking)
- Port mapping: Traefik (also does HTTPS-wrapping)
- Timeseries metrics: Prometheus fed by cAdvisor, node-exporter, the Docker daemon, and Healthchecks, visualized by Grafana
- Centralized logging: Graylog
- Alerting: Alertmanager, Graylog, Healthchecks
- Caching: Registry, devpi-server
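To make the metrics half of that list more concrete, here is a minimal Python sketch that builds a Prometheus configuration with one scrape job per exporter and points alerting at Alertmanager. The hostnames (`cadvisor`, `node-exporter`, `dockerhost`, `alertmanager`) are hypothetical container names on a shared Docker network. 8080, 9100 and 9093 are the usual default ports for cAdvisor, node-exporter and Alertmanager, while 9323 assumes the Docker daemon’s `metrics-addr` is set to that port in `daemon.json`. Healthchecks is left out because its scrape path depends on your project’s details.

```python
import json

# Hypothetical container hostnames on a shared Docker network. 8080 and 9100 are
# the usual cAdvisor and node-exporter defaults; 9323 is only where the Docker
# daemon listens if daemon.json sets "metrics-addr" to that port.
SCRAPE_TARGETS = {
    "cadvisor": ["cadvisor:8080"],
    "node-exporter": ["node-exporter:9100"],
    "docker-daemon": ["dockerhost:9323"],
}


def build_prometheus_config(targets, alertmanager="alertmanager:9093", interval="30s"):
    """Return a prometheus.yml-shaped dict: one static scrape job per exporter,
    plus the Alertmanager that Prometheus forwards firing alerts to."""
    return {
        "global": {"scrape_interval": interval},
        "alerting": {
            "alertmanagers": [{"static_configs": [{"targets": [alertmanager]}]}]
        },
        "scrape_configs": [
            {"job_name": job, "static_configs": [{"targets": hosts}]}
            for job, hosts in sorted(targets.items())
        ],
    }


if __name__ == "__main__":
    # YAML parsers generally accept JSON, so this output can be saved as prometheus.yml.
    print(json.dumps(build_prometheus_config(SCRAPE_TARGETS), indent=2))
```

Generating the file from a single dict keeps the target list in one place, which makes it cheap to add or drop an exporter later.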
Each one of these seems to have decent community support, and each one can be replaced if it shows itself to be more trouble than it’s worth. The real problems begin when one solution we want to keep depends on another we want to jettison. So let’s keep the interdependencies to a minimum where possible.
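One cheap way to keep an eye on those interdependencies is to write them down and ask what breaks before removing anything. The sketch below is a plain breadth-first walk over reversed "depends on" edges. The edge list is a hypothetical example loosely based on the stack above, not an inventory of a real setup.

```python
from collections import deque

# Hypothetical "service -> services it depends on" edges; adjust to match
# what your compose files actually wire together.
DEPENDS_ON = {
    "grafana": {"prometheus"},
    "prometheus": {"cadvisor", "node-exporter", "alertmanager"},
    "graylog": {"mongodb", "elasticsearch"},
    "traefik": set(),
    "pihole": set(),
}


def broken_by_removing(service, depends_on):
    """Return every service that directly or transitively depends on `service`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    affected, queue = set(), deque([service])
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in affected:  # the visited check also guards against cycles
                affected.add(parent)
                queue.append(parent)
    affected.discard(service)  # a cycle could lead back to the removed service itself
    return affected
```

For example, `broken_by_removing("cadvisor", DEPENDS_ON)` returns `{"prometheus", "grafana"}`: dropping the exporter starves Prometheus, which in turn leaves Grafana with nothing to show.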
A gripe about Traefik’s documentation
I found Traefik’s documentation super-frustrating.
- Searching often leads to old, incompatible versions of the documentation. Django, by contrast, prominently flags when you’re reading documentation for an old release. Applying that universally across old documentation would be helpful, especially when the latest release invalidates most of it.
- Some configuration options can live in either the static or the dynamic configuration, while others may only live in the dynamic configuration. This was a real pain until I recognized the requirement.
- Getting an instance of Traefik’s admin API up and running with basic auth credentials, while also redirecting all incoming HTTP traffic to HTTPS, was a struggle.
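For reference, here is a minimal sketch of the setup described in the last two points, written as Python that emits Traefik’s static and dynamic configuration as JSON (which YAML parsers generally accept). It assumes Traefik v2.2 or later, where entrypoint-level `http.redirections` exists. As I read the docs, entrypoints, the API switch and providers belong in the static configuration, while routers and middlewares such as `basicAuth` must be dynamic. The hostname `traefik.home.lan`, the file path and the credentials are hypothetical. The password is hashed in htpasswd’s `{SHA}` format because Traefik accepts it and Python’s standard library can produce it; BCrypt (for example via `htpasswd -B`) is the stronger choice.

```python
import base64
import hashlib
import json


def sha1_htpasswd(user, password):
    """Build an htpasswd-style "{SHA}" entry, one of the hash formats
    (MD5, SHA1, BCrypt) Traefik's basicAuth middleware accepts."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"


def traefik_configs(dashboard_host, users):
    """Return (static, dynamic) dicts: HTTP->HTTPS redirection and the API
    switch are static; the dashboard router and its auth middleware are dynamic."""
    static = {
        "entryPoints": {
            "web": {
                "address": ":80",
                # Redirect every plain-HTTP request to the websecure entrypoint.
                "http": {"redirections": {"entryPoint": {"to": "websecure", "scheme": "https"}}},
            },
            "websecure": {"address": ":443"},
        },
        "api": {"dashboard": True},
        # Hypothetical path: the file provider loads the dynamic half from here.
        "providers": {"file": {"filename": "/etc/traefik/dynamic.yml"}},
    }
    dynamic = {
        "http": {
            "routers": {
                "dashboard": {
                    "rule": f"Host(`{dashboard_host}`)",
                    "entryPoints": ["websecure"],
                    "service": "api@internal",  # Traefik's built-in API/dashboard service
                    "middlewares": ["dashboard-auth"],
                    "tls": {},  # no certificate given, so Traefik's default cert is used
                }
            },
            "middlewares": {"dashboard-auth": {"basicAuth": {"users": users}}},
        }
    }
    return static, dynamic


if __name__ == "__main__":
    static, dynamic = traefik_configs("traefik.home.lan", [sha1_htpasswd("admin", "change-me")])
    print(json.dumps(static, indent=2))
    print(json.dumps(dynamic, indent=2))
```

Keeping both halves side by side makes it easier to see which keys Traefik expects where; a `{SHA}` hash also has no `$` characters, so it needs no `$$` escaping if you move it into docker-compose labels.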
Gripes aside, I really like what Traefik does. I just wish Stack Overflow questions clearly denoted which version the questions and answers apply to.
Putting experimentation on hold
I have a few other images I’d like to muck around with:
- Watchtower for keeping Docker images up to date
- Gotify for scripting Android alerts
- FreeIPA, OpenLDAP, and Keycloak for identity/session management
But none of these are sufficient pain points to warrant higher priority in my task queue. Again, wait to feel the friction of a problem before spending time solving it, especially when time is at a premium.