Monday, March 14, 2011

The "Out of socket memory" error

I recently did some work on some of our frontend machines (on which we run Varnish) at StumbleUpon and decided to track down some of the errors the Linux kernel was regularly throwing in kern.log such as:
Feb 25 08:23:42 foo kernel: [3077014.450011] Out of socket memory
Before we get started, let me tell you that you should NOT listen to any blog or forum post without doing your homework, especially when the post recommends that you tune virtually every TCP-related knob in the kernel. These people don't know what they're doing and most probably don't understand much about TCP/IP. Most importantly, their voodoo won't help you fix your problem and might actually make it worse.

Dive in the Linux kernel


To understand what's going on, the best thing to do is to go read the kernel's code. Unfortunately, the kernel's error messages and counters are often imprecise, confusing, or even misleading. But they're important. And reading the kernel's code isn't nearly as hard as people say.

The "Out of socket memory" error


The only match for "Out of socket memory" in the kernel's code (as of v2.6.38) is in net/ipv4/tcp_timer.c:
static int tcp_out_of_resources(struct sock *sk, int do_reset)
{
        struct tcp_sock *tp = tcp_sk(sk);
        int shift = 0;

        /* If peer does not open window for long time, or did not transmit
         * anything for long time, penalize it. */
        if ((s32)(tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset)
                shift++;

        /* If some dubious ICMP arrived, penalize even more. */
        if (sk->sk_err_soft)
                shift++;

        if (tcp_too_many_orphans(sk, shift)) {
                if (net_ratelimit())
                        printk(KERN_INFO "Out of socket memory\n");
So the question is: when does tcp_too_many_orphans return true? Let's take a look in include/net/tcp.h:
static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
{
        struct percpu_counter *ocp = sk->sk_prot->orphan_count;
        int orphans = percpu_counter_read_positive(ocp);

        if (orphans << shift > sysctl_tcp_max_orphans) {
                orphans = percpu_counter_sum_positive(ocp);
                if (orphans << shift > sysctl_tcp_max_orphans)
                        return true;
        }

        if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
            atomic_long_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
                return true;
        return false;
}
So there are two conditions that can trigger this "Out of socket memory" error:
  1. There are "too many" orphan sockets (most common).
  2. The socket already has the minimum amount of memory and we can't give it more because TCP is already using more than its limit.
In order to remedy your problem, you need to figure out which case you fall into. The vast majority of people (especially those dealing with frontend servers like Varnish) fall into case 1.

Are you running out of TCP memory?


Ruling out case 2 is easy. All you need is to see how much memory your kernel is configured to give to TCP vs. how much is actually being used. If you're close to the limit (uncommon), then you're in case 2. Otherwise (most common) you're in case 1. The kernel keeps track of the memory allocated to TCP in multiples of pages, not in bytes. This is a first bit of confusion that a lot of people run into, because some settings are in bytes and others are in pages (and most of the time 1 page = 4096 bytes).

Rule out case 2: find how much memory the kernel is willing to give to TCP:
$ cat /proc/sys/net/ipv4/tcp_mem
3093984 4125312 6187968
The values are in number of pages. They get automatically sized at boot time (values above are for a machine with 32GB of RAM). They mean:
  1. When TCP uses less than 3093984 pages (11.8GB), the kernel will consider it below the "low threshold" and won't bother TCP about its memory consumption.
  2. When TCP uses more than 4125312 pages (15.7GB), enter the "memory pressure" mode.
  3. The maximum number of pages the kernel is willing to give to TCP is 6187968 (23.6GB). When we go above this, we'll start seeing the "Out of socket memory" error and Bad Things will happen.
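You can double-check those page-to-GiB conversions yourself. Here's a small sketch; the 4096-byte page size is an assumption (confirm it with `getconf PAGESIZE`), and it uses the sample values above rather than reading the live file:

```shell
# Convert the tcp_mem page counts to GiB.  Assumes 4096-byte pages
# (check with `getconf PAGESIZE`).  Sample values from above; on a
# live machine, replace the echo with: cat /proc/sys/net/ipv4/tcp_mem
echo "3093984 4125312 6187968" |
  awk -v ps=4096 '{ for (i = 1; i <= NF; i++)
                      printf "%.1f GiB%s", $i * ps / (1024 ^ 3), ((i < NF) ? "  " : "\n") }'
```

This prints the low threshold, pressure threshold, and hard limit in GiB, matching the figures quoted above.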
Now let's find how much of that memory TCP actually uses.
$ cat /proc/net/sockstat
sockets: used 14565
TCP: inuse 35938 orphan 21564 tw 70529 alloc 35942 mem 1894
UDP: inuse 11 mem 3
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
The last value on the second line (mem 1894) is the number of pages allocated to TCP. In this case we can see that 1894 is way below 6187968, so there's no way we can possibly be running out of TCP memory. So in this case, the "Out of socket memory" error was caused by the number of orphan sockets.
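Picking that `mem` field out by eye is error-prone, so here's a quick sketch that extracts it with awk and compares it to the hard limit. It hard-codes the sample sockstat line and tcp_mem[2] value from this post; on a live system you'd read /proc/net/sockstat and /proc/sys/net/ipv4/tcp_mem instead (as shown in the comments):

```shell
# Pull the "mem" field (pages currently allocated to TCP) out of the
# sockstat TCP line and compare it against tcp_mem[2], the hard limit.
# Sample values from this post; on a live system use:
#   awk '/^TCP:/ { print $NF }' /proc/net/sockstat
#   awk '{ print $3 }' /proc/sys/net/ipv4/tcp_mem
tcp_line="TCP: inuse 35938 orphan 21564 tw 70529 alloc 35942 mem 1894"
tcp_pages=$(echo "$tcp_line" | awk '{ print $NF }')
tcp_limit=6187968
echo "TCP uses $tcp_pages of $tcp_limit pages"
[ "$tcp_pages" -lt "$tcp_limit" ] && echo "case 2 ruled out: nowhere near the TCP memory limit"
```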

Do you have "too many" orphan sockets?


First of all: what's an orphan socket? It's simply a socket that isn't associated with a file descriptor. For instance, after you close() a socket, you no longer hold a file descriptor to reference it, but it still exists because the kernel has to keep it around a bit longer until TCP is done with it. Because orphan sockets aren't very useful to applications (applications can't interact with them), the kernel tries to limit the amount of memory consumed by orphans, and it does so by limiting the number of orphans that stick around. If you're running a frontend web server (or an HTTP load balancer), then you'll most likely have a sizeable number of orphans, and that's perfectly normal.

In order to find the limit on the number of orphan sockets, simply do:
$ cat /proc/sys/net/ipv4/tcp_max_orphans
65536
Here we see the default value, which is 64k. In order to find the number of orphan sockets in the system, look again in sockstat:
$ cat /proc/net/sockstat
sockets: used 14565
TCP: inuse 35938 orphan 21564 tw 70529 alloc 35942 mem 1894
[...]
So in this case we have 21564 orphans. That doesn't seem very close to 65536... Yet, if you look once more at the code above that prints the warning, you'll see that there is this shift variable, which has a value between 0 and 2, and that the check tests if (orphans << shift > sysctl_tcp_max_orphans). What this means is that in certain cases the kernel decides to penalize some sockets more, and it does so by multiplying the number of orphans by 2x or 4x to artificially inflate the "score" of the "bad socket" being penalized. The problem is that, due to the way this is implemented, you can see a worrisome "Out of socket memory" error when in fact you're still 4x below the limit and you just had a couple of "bad sockets" (which happens frequently when you have an Internet-facing service).

So, unfortunately, you may need to raise the maximum number of orphan sockets even when you're still 2x or 4x below the threshold. What value is reasonable for you depends on your situation. Observe how the count of orphans in /proc/net/sockstat changes when your server is at peak traffic, multiply that value by 4, round it up a bit to get a nice round number, and set it. You can set it by echoing the new value into /proc/sys/net/ipv4/tcp_max_orphans, and don't forget to also update the value of net.ipv4.tcp_max_orphans in /etc/sysctl.conf so that your change persists across reboots.
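The worst-case penalty is easy to reproduce from the shell. This sketch hard-codes the sample numbers from this post (21564 orphans, the default limit of 65536); on a live machine you'd pull them from /proc/net/sockstat and /proc/sys/net/ipv4/tcp_max_orphans:

```shell
# Reproduce the kernel's worst-case check: orphans << 2 > tcp_max_orphans.
# Sample values from this post; on a live system read them with:
#   awk '/^TCP:/ { print $4 }' /proc/net/sockstat
#   cat /proc/sys/net/ipv4/tcp_max_orphans
orphans=21564
max_orphans=65536
if [ $((orphans << 2)) -gt "$max_orphans" ]; then
  echo "at risk: $orphans orphans x4 = $((orphans << 2)) exceeds $max_orphans"
else
  echo "safe: $orphans orphans x4 = $((orphans << 2)) stays under $max_orphans"
fi
```

Sure enough, with these values the worst-case check fires even though the raw orphan count is well under the limit, which is exactly why the error showed up on this machine.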

That's all you need to get rid of these "Out of socket memory" errors, most of which are "false alarms" due to the shift variable in the implementation.
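In practice the fix boils down to two commands (run as root). The value 262144 below is purely illustrative, roughly 4x a hypothetical peak of ~65k orphans; pick your own based on the observations described above:

```shell
# Raise the orphan limit now (takes effect immediately)...
echo 262144 > /proc/sys/net/ipv4/tcp_max_orphans
# ...and persist it across reboots.
echo 'net.ipv4.tcp_max_orphans = 262144' >> /etc/sysctl.conf
```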

10 comments:

Alfred Armstrong said...

Hi, I saw this error on a server over the weekend so I started monitoring the orphans count from sockstat via cacti to try to understand what was happening.

The error recurred but rather than a gradually climbing line what I saw was a near vertical ascent from the low hundreds to 30K in a fraction of a second.

This suggests to me that we have some sort of software issue, most likely with php-fastcgi, but I don't have an immediate clue how to diagnose it or find a solution.

Any thoughts?

tsuna said...

If I were you I'd look at other metrics such as number of sockets, number of connections opened, whether the box is sending SYN cookies (which are enabled by default in a lot of kernels these days), the packet rate, and such.

PS: all these metrics and many more are exposed by OpenTSDB (http://opentsdb.net) with a data point every 15 seconds (by default). Give it a try and I promise you'll never wanna use Cacti again :)

Alex said...

Hi,

I seem to have a problem with no orphans but some limit - but following your analysis I have 100216 sockets used which is still far less than 4607424; yet I am still getting "Out of socket memory" errors:

[root@frontend2 log]# cat /proc/net/sockstat | grep orphan
TCP: inuse 134376 orphan 0 tw 87 alloc 150249 mem 100217
[root@frontend2 log]#
[root@frontend2 log]#
[root@frontend2 log]# cat /proc/sys/net/ipv4/tcp_mem
2303712 3071616 4607424
[root@frontend2 log]# cat /proc/net/sockstat
sockets: used 150458
TCP: inuse 134376 orphan 1 tw 69 alloc 150250 mem 100217
UDP: inuse 12 mem 3
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
[root@frontend2 log]# cat /proc/sys/net/ipv4/tcp_max_orphans
65536
[root@frontend2 log]# cat /proc/net/sockstat
sockets: used 150458
TCP: inuse 134375 orphan 1 tw 76 alloc 150251 mem 100216
UDP: inuse 12 mem 3
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

tsuna said...

What's your kernel version?

Alex said...

tsuna - 2.6.32-131.0.15.el6.x86_64 (RHEL6 stock)

I'm building openTSDB now to get some graphs - it happened a few minutes ago and again on both counts it looked like I had loads of spare capacity!

Jose Octanio said...

Hi there.

I'm having the "Out of socket memory" error in one of my servers.

I raised the orphans from 62144 to 1062144 but still getting the error.


$ cat /proc/net/sockstat
sockets: used 2298
TCP: inuse 7 orphan 0 tw 239 alloc 522 mem 66

66 its a very low memory use.

$ cat /proc/sys/net/ipv4/tcp_mem
3071488 3075584 3079680

Should I raise the tcp_mem limit also?

$ cat /proc/version
Linux version 2.6.32-5-openvz-amd64

Sorry for my english.
You have a very nice blog, thanks.

ryran said...

Thanks so much for this write-up Tsuna! I didn't know anything about all this and it was perfect to help get me started -- turned out the system I was troubleshooting had tcp_mem's tunables set to insanely-low values (max pages equaled 64 MiB) by someone else in the past. Awesome.

Anonymous said...

Thanks also for this posting!!! For the first time it seems that I've understood some kernel topic more than thru any other posting in the internet.

s7v7nislands said...

thank you for your writing.

I dug into the newer kernel; the code has changed, and it's clearer than before.

linux kernel 4.0 net/ipv4/tcp.c

bool tcp_check_oom(struct sock *sk, int shift)
{
        bool too_many_orphans, out_of_socket_memory;

        too_many_orphans = tcp_too_many_orphans(sk, shift);
        out_of_socket_memory = tcp_out_of_memory(sk);

        if (too_many_orphans)
                net_info_ratelimited("too many orphaned sockets\n");
        if (out_of_socket_memory)
                net_info_ratelimited("out of memory -- consider tuning tcp_mem\n");
        return too_many_orphans || out_of_socket_memory;
}

Anonymous said...

We are seeing our system running out of TCP memory. Below are some outputs.

cat /proc/net/sockstat

sockets: used 5133
TCP: inuse 5412 orphan 1484 tw 6 alloc 6497 mem 3146876

cat /proc/sys/net/ipv4/tcp_mem
1538640 20515524 3077280

Ip:
30660428 total packets received
1715 with invalid addresses
0 forwarded
0 incoming packets discarded
30658701 incoming packets delivered
36188395 requests sent out
2 outgoing packets dropped
294 dropped because of missing route

Tcp:
194813 active connections openings
374832 passive connection openings
693 failed connection attempts
65244 connection resets received
3625 connections established
29077689 segments received
33778102 segments send out
758687 segments retransmited
7060 bad segments received.
73069 resets sent

Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 9100 0 108643153 0 0 343 152239352 0 0 0 BMPRU
eth1 9100 0 152069557 0 0 374 108196061 0 0 0 BMPRU
eth2 9100 0 0 0 0 0 0 0 0 0 BMU
eth3 9100 0 0 0 0 0 0 0 0 0 BMU
eth4 9100 0 0 0 0 0 0 0 0 0 BMU
eth5 1500 0 29360159 0 83761 413 33350035 0 0 0 BMRU
lo 65536 0 1462873 0 0 0 1462873 0 0 0 LRU
v1asm1 9100 0 0 0 0 0 0 0 0 0 BMRU
vasm1 9100 0 0 0 0 0 0 0 0 0 BMPRU
v2asm2 9100 0 0 0 0 0 0 0 0 0 BMRU
vasm2 9100 0 0 0 0 0 0 0 0 0 BMPRU


Any ideas guys?