From: Karsten Graul
Organization: IBM Deutschland Research & Development GmbH
Date: Wed, 2 Oct 2019 16:50:53 +0200
To: Roman Gushchin, Shakeel Butt
Cc: Vladimir Davydov, David Rientjes, Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, linux-mm@kvack.org, Karsten Graul
Subject: BUG: Crash in __free_slab() using SLAB_TYPESAFE_BY_RCU
Message-Id: <4a5108b4-5a2f-f83c-e6a8-5e0c9074ac69@linux.ibm.com>

net/smc calls proto_register(&smc_proto, 1) with smc_proto.slab_flags = SLAB_TYPESAFE_BY_RCU. Right after the last SMC socket is destroyed, proto_unregister(&smc_proto) is called, which calls kmem_cache_destroy(prot->slab). This results in a kernel crash in __free_slab(). The platform is s390x, and the crash reproduces on kernel 5.4-rc1.

The problem was introduced by commit fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal"). In short, memcg_unlink_cache() clears s->memcg_params.memcg before the RCU-delayed __free_slab() runs, so memcg_uncharge_slab() later reads a NULL memcg.

Below is a call graph, followed by the crash log and a simple patch that works for me. I don't know whether this is the correct way to fix it.

(Please keep me on CC of this thread, because I do not follow the mm mailing list. Thank you.)

kmem_cache_destroy()
  -> shutdown_memcg_caches()
     -> shutdown_cache()
        -> __kmem_cache_shutdown()  (slub.c)
           -> free_partial()
              -> discard_slab()
                 -> free_slab()  -- call to __free_slab() is delayed
                    -> call_rcu(rcu_free_slab)
        -> memcg_unlink_cache()
           -> WRITE_ONCE(s->memcg_params.memcg, NULL);  -- !!!
        -> list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
        -> schedule_work(&slab_caches_to_rcu_destroy_work);
           -> work_fn uses rcu_barrier() to wait for rcu_batch, so work_fn
              is not further involved here...

... rcu grace period ...

rcu_batch()
  ...
  -> rcu_free_slab()  (slub.c)
     -> __free_slab()
        -> uncharge_slab_page()
           -> memcg_uncharge_slab()
              -> memcg = READ_ONCE(s->memcg_params.memcg);  -- !!! memcg == NULL
              -> mem_cgroup_lruvec(memcg)
                 -> mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id);  -- mz == NULL
                 -> lruvec = &mz->lruvec;  -- lruvec == NULL
                 -> lruvec->pgdat = pgdat;  -- *crash*

The crash log:

[  349.361168] Unable to handle kernel pointer dereference in virtual kernel address space
[  349.361210] Failing address: 0000000000000000 TEID: 0000000000000483
[  349.361223] Fault in home space mode while using kernel ASCE.
[  349.361240] AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
[  349.361340] Oops: 0004 ilc:3 [#1] PREEMPT SMP
[  349.361349] Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_de
[  349.361436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
[  349.361445] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
[  349.361450] Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
[  349.361464]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[  349.361470] Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
[  349.361475]            0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
[  349.361481]            0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
[  349.361486]            000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
[  349.361500] Krnl Code: 00000000003cada6: e310a1400004  lg     %r1,320(%r10)
[  349.361500]            00000000003cadac: c0e50046c286  brasl  %r14,ca32b8
[  349.361500]           #00000000003cadb2: a7f4fe36      brc    15,3caa1e
[  349.361500]           >00000000003cadb6: e32060800024  stg    %r2,128(%r6)
[  349.361500]            00000000003cadbc: a7f4fd9e      brc    15,3ca8f8
[  349.361500]            00000000003cadc0: c0e50046790c  brasl  %r14,c99fd8
[  349.361500]            00000000003cadc6: a7f4fe2c      brc    15,3caa1e
[  349.361500]            00000000003cadca: ecb1ffff00d9  aghik  %r11,%r1,-1
[  349.361619] Call Trace:
[  349.361627] ([<00000000003cabcc>] __free_slab+0x49c/0x6b0)
[  349.361634]  [<00000000001f5886>] rcu_core+0x5a6/0x7e0
[  349.361643]  [<0000000000ca2dea>] __do_softirq+0xf2/0x5c0
[  349.361652]  [<0000000000152644>] irq_exit+0x104/0x130
[  349.361659]  [<000000000010d222>] do_IRQ+0x9a/0xf0
[  349.361667]  [<0000000000ca2344>] ext_int_handler+0x130/0x134
[  349.361674]  [<0000000000103648>] enabled_wait+0x58/0x128
[  349.361681] ([<0000000000103634>] enabled_wait+0x44/0x128)
[  349.361688]  [<0000000000103b00>] arch_cpu_idle+0x40/0x58
[  349.361695]  [<0000000000ca0544>] default_idle_call+0x3c/0x68
[  349.361704]  [<000000000018eaa4>] do_idle+0xec/0x1c0
[  349.361748]  [<000000000018ee0e>] cpu_startup_entry+0x36/0x40
[  349.361756]  [<000000000122df34>] arch_call_rest_init+0x5c/0x88
[  349.361761]  [<0000000000000000>] 0x0
[  349.361765] INFO: lockdep is turned off.
[  349.361769] Last Breaking-Event-Address:
[  349.361774]  [<00000000003ca8f4>] __free_slab+0x1c4/0x6b0
[  349.361781] Kernel panic - not syncing: Fatal exception in interrupt

A fix that works for me (RFC):

diff --git a/mm/slab.h b/mm/slab.h
index a62372d0f271..b19a3f940338 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -328,7 +328,7 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 
 	rcu_read_lock();
 	memcg = READ_ONCE(s->memcg_params.memcg);
-	if (likely(!mem_cgroup_is_root(memcg))) {
+	if (likely(memcg && !mem_cgroup_is_root(memcg))) {
 		lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 		mod_lruvec_state(lruvec, cache_vmstat_idx(s), -(1 << order));
 		memcg_kmem_uncharge_memcg(page, order, memcg);