Message ID | 20200214071233.100682-1-shakeelb@google.com (mailing list archive)
---|---
State | New, archived
Series | memcg: net: do not associate sock with unrelated memcg
Hello, Shakeel!

On Thu, Feb 13, 2020 at 11:12:33PM -0800, Shakeel Butt wrote:
> We are testing network memory accounting in our setup and noticed
> inconsistent network memory usage and often unrelated memcgs network
> usage correlates with testing workload. On further inspection, it seems
> like mem_cgroup_sk_alloc() is broken in irq context specially for
> cgroup v1.

A great catch!

> mem_cgroup_sk_alloc() can be called in irq context and kind
> of assumes that it can only happen from sk_clone_lock() and the source
> sock object has already associated memcg. However in cgroup v1, where
> network memory accounting is opt-in, the source sock can be not
> associated with any memcg and the new cloned sock can get associated
> with unrelated interrupted memcg.
>
> Cgroup v2 can also suffer if the source sock object was created by
> process in the root memcg or if sk_alloc() is called in irq context.

Do you mind sharing a call trace?

Also, shouldn't cgroup_sk_alloc() be changed in a similar way?

Thanks!

Roman
On Fri, Feb 14, 2020 at 1:47 PM Roman Gushchin <guro@fb.com> wrote:
>
> Hello, Shakeel!
>
> On Thu, Feb 13, 2020 at 11:12:33PM -0800, Shakeel Butt wrote:
> > We are testing network memory accounting in our setup and noticed
[...]
> A great catch!
[...]
> Do you mind sharing a call trace?
>

Sure, see below. I added a dump_stack() in mem_cgroup_sk_alloc().

[ 647.255327] CPU: 68 PID: 15859 Comm: ssh Tainted: G O 5.6.0-smp-DEV #1
[ 647.255328] Hardware name: ...
[ 647.255328] Call Trace:
[ 647.255329] <IRQ>
[ 647.255333] dump_stack+0x57/0x75
[ 647.255336] mem_cgroup_sk_alloc+0xe9/0xf0
[ 647.255337] sk_clone_lock+0x2a7/0x420
[ 647.255339] inet_csk_clone_lock+0x1b/0x110
[ 647.255340] tcp_create_openreq_child+0x23/0x3b0
[ 647.255342] tcp_v6_syn_recv_sock+0x88/0x730
[ 647.255343] tcp_check_req+0x429/0x560
[ 647.255345] tcp_v6_rcv+0x72d/0xa40
[ 647.255347] ip6_protocol_deliver_rcu+0xc9/0x400
[ 647.255348] ip6_input+0x44/0xd0
[ 647.255349] ? ip6_protocol_deliver_rcu+0x400/0x400
[ 647.255350] ip6_rcv_finish+0x71/0x80
[ 647.255351] ipv6_rcv+0x5b/0xe0
[ 647.255352] ? ip6_sublist_rcv+0x2e0/0x2e0
[ 647.255354] process_backlog+0x108/0x1e0
[ 647.255355] net_rx_action+0x26b/0x460
[ 647.255357] __do_softirq+0x104/0x2a6
[ 647.255358] do_softirq_own_stack+0x2a/0x40
[ 647.255359] </IRQ>
[ 647.255361] do_softirq.part.19+0x40/0x50
[ 647.255362] __local_bh_enable_ip+0x51/0x60
[ 647.255363] ip6_finish_output2+0x23d/0x520
[ 647.255365] ? ip6table_mangle_hook+0x55/0x160
[ 647.255366] __ip6_finish_output+0xa1/0x100
[ 647.255367] ip6_finish_output+0x30/0xd0
[ 647.255368] ip6_output+0x73/0x120
[ 647.255369] ? __ip6_finish_output+0x100/0x100
[ 647.255370] ip6_xmit+0x2e3/0x600
[ 647.255372] ? ipv6_anycast_cleanup+0x50/0x50
[ 647.255373] ? inet6_csk_route_socket+0x136/0x1e0
[ 647.255374] ? skb_free_head+0x1e/0x30
[ 647.255375] inet6_csk_xmit+0x95/0xf0
[ 647.255377] __tcp_transmit_skb+0x5b4/0xb20
[ 647.255378] __tcp_send_ack.part.60+0xa3/0x110
[ 647.255379] tcp_send_ack+0x1d/0x20
[ 647.255380] tcp_rcv_state_process+0xe64/0xe80
[ 647.255381] ? tcp_v6_connect+0x5d1/0x5f0
[ 647.255383] tcp_v6_do_rcv+0x1b1/0x3f0
[ 647.255384] ? tcp_v6_do_rcv+0x1b1/0x3f0
[ 647.255385] __release_sock+0x7f/0xd0
[ 647.255386] release_sock+0x30/0xa0
[ 647.255388] __inet_stream_connect+0x1c3/0x3b0
[ 647.255390] ? prepare_to_wait+0xb0/0xb0
[ 647.255391] inet_stream_connect+0x3b/0x60
[ 647.255394] __sys_connect+0x101/0x120
[ 647.255395] ? __sys_getsockopt+0x11b/0x140
[ 647.255397] __x64_sys_connect+0x1a/0x20
[ 647.255398] do_syscall_64+0x51/0x200
[ 647.255399] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 647.255401] RIP: 0033:0x7f45464fcd50

> Also, shouldn't cgroup_sk_alloc() be changed in a similar way?
>

I will check cgroup_sk_alloc() too.

Thanks,
Shakeel
On Fri, Feb 14, 2020 at 1:52 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, Feb 14, 2020 at 1:47 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Hello, Shakeel!
[...]
> > Also, shouldn't cgroup_sk_alloc() be changed in a similar way?
> >
>
> I will check cgroup_sk_alloc() too.
>

Yes, cgroup_sk_alloc() should be changed similarly too.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..f500da82bfe8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6697,6 +6697,10 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 		return;
 	}
 
+	/* Do not associate the sock with unrelated interrupted task's memcg. */
+	if (in_interrupt())
+		return;
+
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
 	if (memcg == root_mem_cgroup)
We are testing network memory accounting in our setup and noticed
inconsistent network memory usage, and often an unrelated memcg's network
usage correlates with the testing workload. On further inspection, it seems
like mem_cgroup_sk_alloc() is broken in irq context, especially for
cgroup v1.

mem_cgroup_sk_alloc() can be called in irq context and assumes that this
can only happen from sk_clone_lock(), where the source sock object already
has an associated memcg. However, in cgroup v1, where network memory
accounting is opt-in, the source sock may not be associated with any memcg,
and the newly cloned sock can then get associated with the memcg of an
unrelated interrupted task.

Cgroup v2 can also suffer if the source sock object was created by a
process in the root memcg or if sk_alloc() is called in irq context.

The fix is to simply do nothing in interrupt context.

Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)