memcg: net: do not associate sock with unrelated memcg

Message ID 20200214071233.100682-1-shakeelb@google.com
State New

Commit Message

Shakeel Butt Feb. 14, 2020, 7:12 a.m. UTC
While testing network memory accounting in our setup, we noticed
inconsistent network memory usage, with the network usage of often
unrelated memcgs correlating with the test workload. On further
inspection, it seems mem_cgroup_sk_alloc() is broken in irq context,
especially for cgroup v1.

mem_cgroup_sk_alloc() can be called in irq context and implicitly
assumes that such calls can only come from sk_clone_lock(), where the
source sock object already has an associated memcg. However, in cgroup
v1, where network memory accounting is opt-in, the source sock may not
be associated with any memcg, and the newly cloned sock can then get
associated with the memcg of the unrelated interrupted task.

Cgroup v2 can also suffer if the source sock object was created by a
process in the root memcg, or if sk_alloc() is called in irq context.
The fix is to simply do nothing in interrupt context.

Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Roman Gushchin Feb. 14, 2020, 9:47 p.m. UTC | #1
Hello, Shakeel!

On Thu, Feb 13, 2020 at 11:12:33PM -0800, Shakeel Butt wrote:
> While testing network memory accounting in our setup, we noticed
> inconsistent network memory usage, with the network usage of often
> unrelated memcgs correlating with the test workload. On further
> inspection, it seems mem_cgroup_sk_alloc() is broken in irq context,
> especially for cgroup v1.

A great catch!

> 
> mem_cgroup_sk_alloc() can be called in irq context and implicitly
> assumes that such calls can only come from sk_clone_lock(), where the
> source sock object already has an associated memcg. However, in cgroup
> v1, where network memory accounting is opt-in, the source sock may not
> be associated with any memcg, and the newly cloned sock can then get
> associated with the memcg of the unrelated interrupted task.
> 
> Cgroup v2 can also suffer if the source sock object was created by a
> process in the root memcg, or if sk_alloc() is called in irq context.

Do you mind sharing a call trace?

Also, shouldn't cgroup_sk_alloc() be changed in a similar way?

Thanks!

Roman
Shakeel Butt Feb. 14, 2020, 9:52 p.m. UTC | #2
On Fri, Feb 14, 2020 at 1:47 PM Roman Gushchin <guro@fb.com> wrote:
>
> Hello, Shakeel!
>
> [...]
>
> Do you mind sharing a call trace?
>

Sure, see below. I added a dump_stack() in mem_cgroup_sk_alloc().

[  647.255327] CPU: 68 PID: 15859 Comm: ssh Tainted: G           O 5.6.0-smp-DEV #1
[  647.255328] Hardware name: ...
[  647.255328] Call Trace:
[  647.255329]  <IRQ>
[  647.255333]  dump_stack+0x57/0x75
[  647.255336]  mem_cgroup_sk_alloc+0xe9/0xf0
[  647.255337]  sk_clone_lock+0x2a7/0x420
[  647.255339]  inet_csk_clone_lock+0x1b/0x110
[  647.255340]  tcp_create_openreq_child+0x23/0x3b0
[  647.255342]  tcp_v6_syn_recv_sock+0x88/0x730
[  647.255343]  tcp_check_req+0x429/0x560
[  647.255345]  tcp_v6_rcv+0x72d/0xa40
[  647.255347]  ip6_protocol_deliver_rcu+0xc9/0x400
[  647.255348]  ip6_input+0x44/0xd0
[  647.255349]  ? ip6_protocol_deliver_rcu+0x400/0x400
[  647.255350]  ip6_rcv_finish+0x71/0x80
[  647.255351]  ipv6_rcv+0x5b/0xe0
[  647.255352]  ? ip6_sublist_rcv+0x2e0/0x2e0
[  647.255354]  process_backlog+0x108/0x1e0
[  647.255355]  net_rx_action+0x26b/0x460
[  647.255357]  __do_softirq+0x104/0x2a6
[  647.255358]  do_softirq_own_stack+0x2a/0x40
[  647.255359]  </IRQ>
[  647.255361]  do_softirq.part.19+0x40/0x50
[  647.255362]  __local_bh_enable_ip+0x51/0x60
[  647.255363]  ip6_finish_output2+0x23d/0x520
[  647.255365]  ? ip6table_mangle_hook+0x55/0x160
[  647.255366]  __ip6_finish_output+0xa1/0x100
[  647.255367]  ip6_finish_output+0x30/0xd0
[  647.255368]  ip6_output+0x73/0x120
[  647.255369]  ? __ip6_finish_output+0x100/0x100
[  647.255370]  ip6_xmit+0x2e3/0x600
[  647.255372]  ? ipv6_anycast_cleanup+0x50/0x50
[  647.255373]  ? inet6_csk_route_socket+0x136/0x1e0
[  647.255374]  ? skb_free_head+0x1e/0x30
[  647.255375]  inet6_csk_xmit+0x95/0xf0
[  647.255377]  __tcp_transmit_skb+0x5b4/0xb20
[  647.255378]  __tcp_send_ack.part.60+0xa3/0x110
[  647.255379]  tcp_send_ack+0x1d/0x20
[  647.255380]  tcp_rcv_state_process+0xe64/0xe80
[  647.255381]  ? tcp_v6_connect+0x5d1/0x5f0
[  647.255383]  tcp_v6_do_rcv+0x1b1/0x3f0
[  647.255384]  ? tcp_v6_do_rcv+0x1b1/0x3f0
[  647.255385]  __release_sock+0x7f/0xd0
[  647.255386]  release_sock+0x30/0xa0
[  647.255388]  __inet_stream_connect+0x1c3/0x3b0
[  647.255390]  ? prepare_to_wait+0xb0/0xb0
[  647.255391]  inet_stream_connect+0x3b/0x60
[  647.255394]  __sys_connect+0x101/0x120
[  647.255395]  ? __sys_getsockopt+0x11b/0x140
[  647.255397]  __x64_sys_connect+0x1a/0x20
[  647.255398]  do_syscall_64+0x51/0x200
[  647.255399]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  647.255401] RIP: 0033:0x7f45464fcd50

> Also, shouldn't cgroup_sk_alloc() be changed in a similar way?
>

I will check cgroup_sk_alloc() too.

Thanks,
Shakeel
Shakeel Butt Feb. 14, 2020, 10:09 p.m. UTC | #3
On Fri, Feb 14, 2020 at 1:52 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, Feb 14, 2020 at 1:47 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > [...]
>
> > Also, shouldn't cgroup_sk_alloc() be changed in a similar way?
> >
>
> I will check cgroup_sk_alloc() too.
>

Yes, cgroup_sk_alloc() should be changed similarly too.
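Presumably the analogous change would add the same guard near the top of
cgroup_sk_alloc() (a sketch only, not a tested patch; the exact placement
of the check within the function is an assumption):

```
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
+	/* Do not associate the sock with the unrelated interrupted
+	 * task's cgroup, mirroring the mem_cgroup_sk_alloc() fix. */
+	if (in_interrupt())
+		return;
```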

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..f500da82bfe8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6697,6 +6697,10 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 		return;
 	}
 
+	/* Do not associate the sock with unrelated interrupted task's memcg. */
+	if (in_interrupt())
+		return;
+
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
 	if (memcg == root_mem_cgroup)