FreeNAS to Prometheus

Looking for a bridge

One of my pending issues is to ensure that all hosts have backups (done), send events into Graylog (done), and ship metrics into Prometheus (FreeNAS and UDM outstanding). Neither supports Prometheus out of the box, so I’ve decided to tackle FreeNAS tonight.

A week or two back I’d gone hunting for solutions. Searching for freenas prometheus turns up dead-end community questions and a few abandoned GitHub repos attempting to write a node exporter for FreeNAS. Not great.

But if FreeNAS supports Graphite out of the box, maybe there’s a tool out there that ingests Graphite’s output and exposes it as Prometheus metrics. Searching graphite to prometheus reveals Prometheus' Graphite Exporter, which “accepts metrics via the Graphite protocol and exports them as Prometheus metrics”. Sounds about right!
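
At its core, the Graphite plaintext protocol is just newline-delimited "path value timestamp" lines over TCP or UDP, so bridging it is conceptually simple. A sample line (with a made-up metric path) looks like this:

servers.freenas.load.load.shortterm 0.42 1589420000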

Running the container

I pull the Docker image and try the snippet provided at the bottom of the Docker Hub page, but something looks odd.

docker run -d -p 9108:9108 -p 9109:9109 -p 9109/udp:9109/udp
        -v $PWD/graphite_mapping.conf:/tmp/graphite_mapping.conf \
        prom/graphite-exporter -graphite.mapping-config=/tmp/graphite_mapping.conf

For one, that’s not the right Docker syntax for publishing UDP ports. Further, there’s a backslash missing at the end of the first line, and current versions of the exporter expect a double-dash flag. Adjusting, I think they intended something closer to this:

docker run -d -p 9108:9108 -p 9109:9109 -p 9109:9109/udp \
        -v ${PWD}/graphite_mapping.conf:/tmp/graphite_mapping.conf \
        prom/graphite-exporter --graphite.mapping-config=/tmp/graphite_mapping.conf

This matches the current documentation on its GitHub page, so I’ll just assume the Docker Hub page hasn’t been kept up to date.
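
Worth noting: the mapping config is optional. Without one, the exporter just sanitizes Graphite paths into metric names (which is what I end up with later). If I wanted properly labelled metrics instead, a mapping file could look something like this hypothetical, untested sketch:

mappings:
- match: 'servers\.(.*)\.aggregation-cpu-average\.cpu-(.*)'
  match_type: regex
  name: "freenas_cpu_average"
  labels:
    # $1 and $2 refer to the regex capture groups above
    host: "$1"
    mode: "$2"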

Testing the container

The documentation provides the following test:

echo "test_tcp 1234 $(date +%s)" | nc localhost 9109
echo "test_udp 1234 $(date +%s)" | nc -u -w1 localhost 9109

But the TCP test hangs and the UDP test times out. Out of curiosity, I try telnet localhost 9109 and find I can connect without issue, so I presume this is either a netcat variant issue or another documentation bug. I go skimming through all reported issues in the GitHub project hoping one might reveal something, but no luck. Searching for telnet works but nc doesn't turns up a few (1, 2, 3) forum threads that provide some insight, but nothing conclusive. Considering the metrics endpoint on port 9108 is responsive, I’ll press on assuming it’s just another documentation issue and will revisit this if I see further signs of concern.
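
One way to sidestep the netcat ambiguity entirely is to check whether the test sample actually shows up on the exporter’s own metrics endpoint (assuming the default web port of 9108 from the run command above):

# Depending on the netcat variant, -q1 (or -N) may be needed to close the connection once stdin hits EOF
echo "test_tcp 1234 $(date +%s)" | nc -q1 localhost 9109

# Either way, the metrics page confirms whether the sample arrived
curl -s http://localhost:9108/metrics | grep test_tcp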

Hooking in FreeNAS

In FreeNAS, I set the “Remote Graphite Server Hostname” to system-apps' IP (I’ll hook up Traefik later once everything is flowing through) and note the field name implies I shouldn’t provide the port. Perhaps 9109 is the standard Graphite port? This Reddit post from 2018 suggests FreeNAS won’t work with a supplied port, but I notice they also refer to port 2003, not 9109. Curious.

Let’s confirm that FreeNAS is actually sending out data:

freenas# tcpdump dst port 9109
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
188 packets received by filter
0 packets dropped by kernel

Nothing. Let’s try port 2003 just out of curiosity:

freenas# tcpdump dst port 2003
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:47:43.745483 IP freenas.home.lan.64824 > pihole.home.lan.cfingerd: Flags [P.], seq 2969036737:2969038098, ack 4099801591, win 1026, options [nop,nop,TS val 1901950204 ecr 2514409512], length 1361
20:47:43.748299 IP freenas.home.lan.64824 > pihole.home.lan.cfingerd: Flags [.], seq 1361:2809, ack 1, win 1026, options [nop,nop,TS val 1901950207 ecr 2514409512], length 1448
20:47:43.748971 IP freenas.home.lan.64824 > pihole.home.lan.cfingerd: Flags [.], seq 2809:4257, ack 1, win 1026, options [nop,nop,TS val 1901950208 ecr 2514409512], length 1448
...

Okay, so Graphite Exporter’s documentation assumes a non-standard port? Maybe there’s a good reason I’m not seeing, though I doubt it’s to avoid a conflict with cfingerd, which is what tcpdump’s outdated port lookup thinks owns 2003. Either way, FreeNAS is clearly sending to the standard Graphite port, so I adjust the docker-compose.yml to listen on 2003.

services:
    graphite-exporter:
        ports:
-            - "9109:9109/tcp"
-            - "9109:9109/udp"
+            - "2003:2003/tcp"
+            - "2003:2003/udp"
+        command: --graphite.listen-address=":2003"
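
For context, the whole service definition ends up looking roughly like this (the restart policy here is my own habit, and I’m skipping the mapping config for now, which is why the metric names below come through raw):

services:
    graphite-exporter:
        image: prom/graphite-exporter
        restart: unless-stopped
        command: --graphite.listen-address=":2003"
        ports:
            - "9108:9108/tcp"   # metrics page for Prometheus to scrape
            - "2003:2003/tcp"   # Graphite plaintext from FreeNAS
            - "2003:2003/udp"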

And now I find data flowing through as expected:

...
# HELP servers_freenas_home_lan_aggregation_cpu_average_cpu_idle Graphite metric servers_freenas_home_lan_aggregation_cpu_average_cpu_idle
# TYPE servers_freenas_home_lan_aggregation_cpu_average_cpu_idle gauge
servers_freenas_home_lan_aggregation_cpu_average_cpu_idle 126.335114
# HELP servers_freenas_home_lan_aggregation_cpu_average_cpu_interrupt Graphite metric servers_freenas_home_lan_aggregation_cpu_average_cpu_interrupt
# TYPE servers_freenas_home_lan_aggregation_cpu_average_cpu_interrupt gauge
servers_freenas_home_lan_aggregation_cpu_average_cpu_interrupt 0
# HELP servers_freenas_home_lan_aggregation_cpu_average_cpu_nice Graphite metric servers_freenas_home_lan_aggregation_cpu_average_cpu_nice
# TYPE servers_freenas_home_lan_aggregation_cpu_average_cpu_nice gauge
servers_freenas_home_lan_aggregation_cpu_average_cpu_nice 0
# HELP servers_freenas_home_lan_aggregation_cpu_average_cpu_system Graphite metric servers_freenas_home_lan_aggregation_cpu_average_cpu_system
# TYPE servers_freenas_home_lan_aggregation_cpu_average_cpu_system gauge
servers_freenas_home_lan_aggregation_cpu_average_cpu_system 0.200511
...
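
The last piece is pointing Prometheus at the exporter’s web port. A minimal scrape config, assuming system-apps resolves and port 9108 is still published, looks something like this:

scrape_configs:
    - job_name: freenas
      static_configs:
          - targets: ["system-apps:9108"]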

Some closing thoughts

This experience, along with recent help requests about some of my old Python libraries, has me considering giving documentation a higher priority in the development effort. We write code to scratch an itch, but unless we document it well enough it’s just going to be a source of frustration for others. I get why someone might give up halfway through: this task wasn’t overly difficult, but it was far from smooth.

That being said, all of the above took maybe 30 minutes? Typing it out to be readable took roughly 60. I can sympathize with the desire to just get things done with minimal notes. Maybe developers need to better engage with technical writers? Maybe we need QA for documentation? Just spitballing.

Lastly, some things that would have been nice:

- A Docker Hub page that matches the current README (correct port mappings, no missing backslash)
- Test commands in the documentation that work across netcat variants
- The exporter defaulting to, or at least mentioning, the standard Graphite port (2003) that FreeNAS sends to

Edit from the future

Setting up the UniFi Poller was, by comparison, an afterthought: create a read-only user in the UniFi controller, provide the domain/username/password as docker-compose environment variables, and that’s it.
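
For reference, a minimal sketch of that service, assuming the golift/unifi-poller image, its UP_* environment variables, and its default exporter port of 9130 (the controller URL and credentials here are placeholders):

services:
    unifi-poller:
        image: golift/unifi-poller
        restart: unless-stopped
        ports:
            - "9130:9130"   # unifi-poller's Prometheus endpoint
        environment:
            # Placeholder controller URL and read-only credentials
            - UP_UNIFI_DEFAULT_URL=https://unifi.home.lan:443
            - UP_UNIFI_DEFAULT_USER=readonly
            - UP_UNIFI_DEFAULT_PASS=changeme
            # Only the Prometheus output is needed, so disable InfluxDB
            - UP_INFLUXDB_DISABLE=true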