/sys/devices/system/node/node*/numastat and decided to add these numbers to a collector for OpenTSDB. But whenever I add a collector that reads metrics from /proc or /sys, I always need to go read the Linux kernel's source code, because most metrics tend to be misleading and under-documented (when they're documented at all). In this case, if you RTFM, you'll get this:
Numa policy hit/miss statistics

/sys/devices/system/node/node*/numastat

All units are pages. Hugepages have separate counters.

numa_hit         A process wanted to allocate memory from this node, and succeeded.
numa_miss        A process wanted to allocate memory from another node, but ended up with memory from this node.
numa_foreign     A process wanted to allocate on this node, but ended up with memory from another one.
local_node       A process ran on this node and got memory from it.
other_node       A process ran on this node and got memory from another node.
interleave_hit   Interleaving wanted to allocate from this node and succeeded.

I was very confused about the last one (interleave_hit), about the exact difference between the second (numa_miss) and the third (numa_foreign), and about the difference between the first three metrics and the next two.
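Whatever the semantics, the file format itself is trivial to scrape for a collector: each line is a counter name followed by a value in pages. Here's a minimal parsing sketch (my own code, not part of any existing collector):

```python
import glob

def parse_numastat(text):
    # Each line of numastat is "counter_name value"; values are in
    # pages (huge pages are accounted separately, as the docs say).
    return {name: int(value)
            for name, value in (line.split() for line in text.splitlines())}

def read_all_nodes():
    # Read every node's counters, keyed by node number.
    stats = {}
    for path in glob.glob("/sys/devices/system/node/node*/numastat"):
        node = int(path.rsplit("/", 2)[-2][len("node"):])
        with open(path) as f:
            stats[node] = parse_numastat(f.read())
    return stats
```

Parsing is the easy part, though; the hard part is knowing what to make of the six numbers.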
After RTFSC, the relevant part of the code appeared to be in mm/vmstat.c:

void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags)
{
        if (z->zone_pgdat == preferred_zone->zone_pgdat) {
                __inc_zone_state(z, NUMA_HIT);
        } else {
                __inc_zone_state(z, NUMA_MISS);
                __inc_zone_state(preferred_zone, NUMA_FOREIGN);
        }
        if (z->node == ((flags & __GFP_OTHER_NODE) ?
                        preferred_zone->node : numa_node_id()))
                __inc_zone_state(z, NUMA_LOCAL);
        else
                __inc_zone_state(z, NUMA_OTHER);
}
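To make the accounting concrete, here's a toy model of that logic (my own sketch, not kernel code) for a hypothetical two-node machine, walking through the case where a process's local node is out of free pages:

```python
from collections import Counter

# One counter set per NUMA node on a hypothetical two-node machine.
counters = {0: Counter(), 1: Counter()}

def zone_statistics(preferred, actual, running_on):
    # Mirrors zone_statistics(): hit/miss/foreign compare the preferred
    # node with the node the page actually came from; local/other
    # compare the allocating node with the node the process runs on.
    if actual == preferred:
        counters[actual]["numa_hit"] += 1
    else:
        counters[actual]["numa_miss"] += 1
        counters[preferred]["numa_foreign"] += 1
    if actual == running_on:
        counters[actual]["local_node"] += 1
    else:
        counters[actual]["other_node"] += 1

# A process running on node 0 wants memory from node 0, but node 0 is
# out of free pages, so the page actually comes from node 1:
zone_statistics(preferred=0, actual=1, running_on=0)
```

After this one allocation, node 0 records a numa_foreign while node 1 records a numa_miss and an other_node: the miss is charged to the node that supplied the page, and the foreign to the node that couldn't.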
So here's what it all really means:

numa_hit: Number of pages allocated from the node the process wanted.
numa_miss: Number of pages allocated from this node, although the process preferred another node.
numa_foreign: Number of pages allocated from another node, although the process preferred this node.
local_node: Number of pages allocated from this node while the process was running on it locally.
other_node: Number of pages allocated from this node while the process was running remotely (on another node).
interleave_hit: Number of pages allocated successfully with the interleave strategy.
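One sanity check falls out of the kernel code above: each allocation on a node bumps exactly one of numa_hit/numa_miss and exactly one of local_node/other_node, so the two pairs should sum to the same total on any node. Assuming a per-node dict of counters like the one numastat gives you, the check is:

```python
def hit_miss_consistent(stats):
    # numa_hit + numa_miss and local_node + other_node both count
    # every page allocated on this node, just sliced differently.
    return (stats["numa_hit"] + stats["numa_miss"]
            == stats["local_node"] + stats["other_node"])
```

If that ever fails on a live box, you're most likely comparing counters sampled at different times rather than looking at a kernel bug.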
I was originally confused about numa_foreign, but this metric can actually be useful to see what happens when a node runs out of free pages. If a process attempts to get a page from its local node, but this node is out of free pages, then that node's numa_foreign will be incremented (indicating that the node ran out of memory) and another node will accommodate the process's request, incrementing its own numa_miss. So in order to know which nodes are "lending memory" to the out-of-memory node, you need to look at numa_miss. A high numa_miss for a particular node indicates that this node's memory is comparatively under-utilized, since it frequently accommodates memory allocation requests that failed on other nodes, whereas a high numa_foreign means the node is frequently running out of free pages itself.
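To turn that into something actionable for the collector, you can pair each node's numa_foreign (pages it wanted but couldn't provide) with its numa_miss (pages it provided on behalf of other nodes). A hypothetical helper, assuming the {node: {counter: value}} layout of the numastat files:

```python
def lending_report(stats):
    # "borrowed": pages this node wanted but another node had to
    #             supply (its numa_foreign went up).
    # "lent":     pages this node supplied for allocations that
    #             preferred another node (its numa_miss went up).
    return {node: {"borrowed": c["numa_foreign"], "lent": c["numa_miss"]}
            for node, c in stats.items()}

# Example: node 0 keeps running out of memory, node 1 keeps bailing it out.
sample = {
    0: {"numa_foreign": 5000, "numa_miss": 10},
    1: {"numa_foreign": 10, "numa_miss": 5000},
}
```

A node whose "borrowed" number keeps climbing is under memory pressure; the nodes whose "lent" numbers climb in step are the ones covering for it.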