From patchwork Sat Sep 26 04:19:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801087 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AFB1B112E for ; Sat, 26 Sep 2020 04:19:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 740B3207C4 for ; Sat, 26 Sep 2020 04:19:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="Zed3OG/w" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 740B3207C4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9F7FC6B0062; Sat, 26 Sep 2020 00:19:04 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9A82F6B0068; Sat, 26 Sep 2020 00:19:04 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 896E36B006C; Sat, 26 Sep 2020 00:19:04 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0163.hostedemail.com [216.40.44.163]) by kanga.kvack.org (Postfix) with ESMTP id 74C436B0062 for ; Sat, 26 Sep 2020 00:19:04 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 3A0E25825 for ; Sat, 26 Sep 2020 04:19:04 +0000 (UTC) X-FDA: 77303907408.08.boys36_39008522716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 1944E1819E766 for ; Sat, 26 Sep 2020 04:19:04 +0000 (UTC) X-Spam-Summary: 40,2.5,0,9b981d3dab7ae37b,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:41:355:379:800:960:967:973:988:989:1260:1345:1359:1381:1431:1437:1534:1542:1711:1730:1747:1777:1792:2198:2199:2393:2525:2559:2563:2682:2685:2731:2859:2899:2902:2933:2937:2939:2942:2945:2947:2951:2954:2987:3022:3138:3139:3140:3141:3142:3353:3653:3865:3866:3867:3868:3870:3871:3872:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4250:4321:5007:6119:6261:6630:6653:6737:7514:7576:7875:7901:7903:8603:9025:9545:10011:11026:11473:11658:11914:12043:12048:12219:12296:12297:12517:12519:12555:12679:12986:13846:14093:14181:14721:21060:21080:21451:21627:21939:30054:30064,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-64.100.201.201 62.2.0.100;04yrx9tj5i6uaa4an4o18xjgfrk3dycdxn7ohxmm5o68qw9rp4ztoqdqi9rdsod.dqmknyjgb8sht8i4ko7mu88k87xne4idnswcz5sx7156wg1gudrxsty9r19m68c.h-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk, SPF:fp,M X-HE-Tag: boys36_39008522716d X-Filterd-Recvd-Size: 4132 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:03 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 136A02076D; Sat, 26 Sep 2020 04:19:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093942; bh=AJxwQo/L+ID93H6tPKPieFQbSPWKO3Vas2ik0yvPZ+s=; h=Date:From:To:Subject:In-Reply-To:From; b=Zed3OG/wK9qxn9wnbcZ9fmAs2DU7YwezSNB6Y9AsXjQzhtgMbre9rzAVd6MzH7LQd UFtvvFvKJKbQ44+3YsTtarIH+cCOiJ9Ak0aCVJDBRw5pg9eNFrFcJUK5XyPPJXWmos 6v5fAzaCzSye+xx718QJRjHbtkRgmoCJyqRZbB/g= Date: Fri, 25 Sep 2020 21:19:01 -0700 From: Andrew Morton To: akpm@linux-foundation.org, aquini@redhat.com, cmaiolino@redhat.com, david@fromorbit.com, esandeen@redhat.com, hsiangkao@redhat.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, shy828301@gmail.com, stable@vger.kernel.org, torvalds@linux-foundation.org, willy@infradead.org, ying.huang@intel.com Subject: [patch 1/9] mm, THP, swap: fix allocating cluster for swapfile by mistake Message-ID: <20200926041901.P-A_BQZ2V%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Gao Xiang Subject: mm, THP, swap: fix allocating cluster for swapfile by mistake SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS. So, !SWP_FS means non NFS for now, it could be either file backed or device backed. Something similar goes with legacy SWP_FILE. So in order to achieve the goal of the original patch, SWP_BLKDEV should be used instead. FS corruption can be observed with SSD device + XFS + fragmented swapfile due to CONFIG_THP_SWAP=y. I reproduced the issue with the following details: Environment: QEMU + upstream kernel + buildroot + NVMe (2 GB) Kernel config: CONFIG_BLK_DEV_NVME=y CONFIG_THP_SWAP=y Some reproducable steps: mkfs.xfs -f /dev/nvme0n1 mkdir /tmp/mnt mount /dev/nvme0n1 /tmp/mnt bs="32k" sz="1024m" # doesn't matter too much, I also tried 16m xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw mkswap /tmp/mnt/sw swapon /tmp/mnt/sw stress --vm 2 --vm-bytes 600M # doesn't matter too much as well Symptoms: - FS corruption (e.g. checksum failure) - memory corruption at: 0xd2808010 - segfault Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device") Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out") Signed-off-by: Gao Xiang Reviewed-by: "Huang, Ying" Reviewed-by: Yang Shi Acked-by: Rafael Aquini Cc: Matthew Wilcox Cc: Carlos Maiolino Cc: Eric Sandeen Cc: Dave Chinner Cc: Signed-off-by: Andrew Morton --- mm/swapfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/swapfile.c~mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake +++ a/mm/swapfile.c @@ -1078,7 +1078,7 @@ start_over: goto nextsi; } if (size == SWAPFILE_CLUSTER) { - if (!(si->flags & SWP_FS)) + if (si->flags & SWP_BLKDEV) n_ret = swap_alloc_cluster(si, swp_entries); } else n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE, From patchwork Sat Sep 26 04:19:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801089 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D45E8112E for ; Sat, 26 Sep 2020 04:19:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 88D602080A for ; Sat, 26 Sep 2020 04:19:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="TXOz0L5i" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88D602080A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9CF056B0068; Sat, 26 Sep 2020 00:19:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 980446B006C; Sat, 26 Sep 2020 00:19:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8BCA86B006E; Sat, 26 Sep 2020 00:19:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0185.hostedemail.com [216.40.44.185]) by kanga.kvack.org (Postfix) with ESMTP id 766D56B0068 for ; Sat, 26 Sep 2020 00:19:08 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 33D488249980 for ; Sat, 26 Sep 2020 04:19:08 +0000 (UTC) X-FDA: 77303907576.24.toad68_43125a22716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id 134F41A4A0 for ; Sat, 26 Sep 2020 04:19:08 +0000 (UTC) X-Spam-Summary: 1,0,0,945dc0e8ac06a747,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:41:355:379:800:960:967:973:988:989:1260:1345:1359:1381:1431:1437:1534:1543:1711:1730:1747:1777:1792:1801:2198:2199:2393:2525:2559:2563:2682:2685:2859:2895:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4321:4605:5007:6119:6261:6653:6737:7514:7576:8908:9025:9121:9545:9707:10004:11026:11232:11473:11658:11914:12043:12048:12296:12297:12438:12517:12519:12555:12679:12986:13846:14130:14181:14721:21080:21451:21627:21740:21939:30034:30054:30056:30064:30070:30091,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04y8eu6qfz34wbybd9ncqjcj4dmm6ocw9dbo36yn43cn3m14wi77uu5sgaixpot.u4twkyzb97scfxgub8zfg6tydujezfjb5bzeoaumr49jkqmj11htr5oriomrqho.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bul k,SPF:fp X-HE-Tag: toad68_43125a22716d X-Filterd-Recvd-Size: 4852 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:07 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0C22B2078B; Sat, 26 Sep 2020 04:19:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093946; bh=ueHzXzZ8XvTGj1BSpsi5gz3iboaNaBmlhhqS7ErYD0U=; h=Date:From:To:Subject:In-Reply-To:From; b=TXOz0L5ihcGhlMQ4wW1+woCbwN4Qjy/Y8r2Hj2nNxTOHhz6uKWC2dj/vm6CJ9G9dw +xxw8U4k54br1MneI6AFgQ6oVP+KKu6pU17IzOavijvoQc4v39/TBYzWjYnNQdNN11 +1vrzHfJNqHpAT+BXQ66Q3QbFDeTDXCkvHASGrvM= Date: Fri, 25 Sep 2020 21:19:05 -0700 From: Andrew Morton To: akpm@linux-foundation.org, corbet@lwn.net, guro@fb.com, hannes@cmpxchg.org, iamjoonsoo.kim@lge.com, linux-mm@kvack.org, lizefan@huawei.com, mhocko@kernel.org, mm-commits@vger.kernel.org, rdunlap@infradead.org, shakeelb@google.com, songmuchun@bytedance.com, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com Subject: [patch 2/9] mm: memcontrol: fix missing suffix of workingset_restore Message-ID: <20200926041905.JOR3-Yv3y%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: memcontrol: fix missing suffix of workingset_restore We forget to add the suffix to the workingset_restore string, so fix it. And also update the documentation of cgroup-v2.rst. Link: https://lkml.kernel.org/r/20200916100030.71698-1-songmuchun@bytedance.com Fixes: 170b04b7ae49 ("mm/workingset: prepare the workingset detection infrastructure for anon LRU") Signed-off-by: Muchun Song Reviewed-by: Shakeel Butt Cc: Joonsoo Kim Cc: Johannes Weiner Cc: Vlastimil Babka Cc: Tejun Heo Cc: Zefan Li Cc: Jonathan Corbet Cc: Michal Hocko Cc: Vladimir Davydov Cc: Roman Gushchin Cc: Randy Dunlap Signed-off-by: Andrew Morton --- Documentation/admin-guide/cgroup-v2.rst | 25 +++++++++++++++------- mm/memcontrol.c | 4 +-- 2 files changed, 20 insertions(+), 9 deletions(-) --- a/Documentation/admin-guide/cgroup-v2.rst~mm-memcontrol-fix-missing-suffix-of-workingset_restore +++ a/Documentation/admin-guide/cgroup-v2.rst @@ -1324,15 +1324,26 @@ PAGE_SIZE multiple when read back. pgmajfault Number of major page faults incurred - workingset_refault - Number of refaults of previously evicted pages + workingset_refault_anon + Number of refaults of previously evicted anonymous pages. - workingset_activate - Number of refaulted pages that were immediately activated + workingset_refault_file + Number of refaults of previously evicted file pages. - workingset_restore - Number of restored pages which have been detected as an active - workingset before they got reclaimed. + workingset_activate_anon + Number of refaulted anonymous pages that were immediately + activated. + + workingset_activate_file + Number of refaulted file pages that were immediately activated. + + workingset_restore_anon + Number of restored anonymous pages which have been detected as + an active workingset before they got reclaimed. + + workingset_restore_file + Number of restored file pages which have been detected as an + active workingset before they got reclaimed. workingset_nodereclaim Number of times a shadow node has been reclaimed --- a/mm/memcontrol.c~mm-memcontrol-fix-missing-suffix-of-workingset_restore +++ a/mm/memcontrol.c @@ -1538,9 +1538,9 @@ static char *memory_stat_format(struct m memcg_page_state(memcg, WORKINGSET_ACTIVATE_ANON)); seq_buf_printf(&s, "workingset_activate_file %lu\n", memcg_page_state(memcg, WORKINGSET_ACTIVATE_FILE)); - seq_buf_printf(&s, "workingset_restore %lu\n", + seq_buf_printf(&s, "workingset_restore_anon %lu\n", memcg_page_state(memcg, WORKINGSET_RESTORE_ANON)); - seq_buf_printf(&s, "workingset_restore %lu\n", + seq_buf_printf(&s, "workingset_restore_file %lu\n", memcg_page_state(memcg, WORKINGSET_RESTORE_FILE)); seq_buf_printf(&s, "workingset_nodereclaim %lu\n", memcg_page_state(memcg, WORKINGSET_NODERECLAIM)); From patchwork Sat Sep 26 04:19:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801091 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 85DA9112C for ; Sat, 26 Sep 2020 04:19:15 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 29170208FE for ; Sat, 26 Sep 2020 04:19:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="oMbZlQCF" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 29170208FE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 478156B006C; Sat, 26 Sep 2020 00:19:14 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 42AA56B006E; Sat, 26 Sep 2020 00:19:14 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 365A46B0070; Sat, 26 Sep 2020 00:19:14 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 21EFE6B006C for ; Sat, 26 Sep 2020 00:19:14 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id CEC315820 for ; Sat, 26 Sep 2020 04:19:13 +0000 (UTC) X-FDA: 77303907786.16.shoe55_2f029452716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id ABB58101E024D for ; Sat, 26 Sep 2020 04:19:13 +0000 (UTC) X-Spam-Summary: 1,0,0,b5439b7c94f38033,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:1:41:355:379:800:960:967:973:988:989:1260:1345:1359:1381:1431:1437:1605:1730:1747:1777:1792:1801:1981:2194:2199:2393:2525:2559:2564:2636:2682:2685:2693:2859:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3167:3865:3866:3867:3868:3870:3871:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4321:4605:5007:6117:6119:6261:6630:6653:6737:6738:7576:7875:7903:8660:8985:9025:9121:9545:10004:11026:11473:11658:11854:11914:12043:12048:12050:12291:12296:12297:12438:12517:12519:12555:12663:12679:12683:12986:13148:13230:14096:14799:21080:21433:21451:21611:21627:21740:21796:21939:21990:30003:30012:30036:30054:30064:30070:30079,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04y8a4fzt554dyrnwdna5m6z66xdqych5595y8m8z3np6c3ngii53k3fhykrgfb.uf5u5uakhrrsw74r1cmsnyisdghhepi4tmnwwkg1ryxnhwsm841ghcnccz939z6.y-lbl8.mailshell.net-223.238. 255.100, X-HE-Tag: shoe55_2f029452716d X-Filterd-Recvd-Size: 12123 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf27.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:13 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 7FFA8207C4; Sat, 26 Sep 2020 04:19:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093952; bh=ROsfeFYyajx6sHRc4vsd8UqPQB3IsomCwYlUooeeHLE=; h=Date:From:To:Subject:In-Reply-To:From; b=oMbZlQCFKOM+Cd33AndS+plrLvhXBMkEWLCYTd6ddcqRob/LUCS1/U4hQlGQvT5K4 OT+845XS8+fNv0+QqSI+3D8EeWfSethgnLkedb8OFILFbveC7ORU2vTjbD2NI8o+2X XBDEiJfDaWZj82YSILmHBvyRpC2FvOcxrtHUKXWM= Date: Fri, 25 Sep 2020 21:19:10 -0700 From: Andrew Morton To: agordeev@linux.ibm.com, akpm@linux-foundation.org, arnd@arndb.de, aryabinin@virtuozzo.com, benh@kernel.crashing.org, borntraeger@de.ibm.com, bp@alien8.de, catalin.marinas@arm.com, dave.hansen@intel.com, dave.hansen@linux.intel.com, gerald.schaefer@linux.ibm.com, gor@linux.ibm.com, hca@linux.ibm.com, imbrenda@linux.ibm.com, jdike@addtoit.com, jgg@nvidia.com, jhubbard@nvidia.com, linux-mm@kvack.org, linux@armlinux.org.uk, luto@kernel.org, mingo@redhat.com, mm-commits@vger.kernel.org, mpe@ellerman.id.au, paulus@samba.org, peterz@infradead.org, richard@nod.at, rppt@linux.ibm.com, stable@vger.kernel.org, tglx@linutronix.de, torvalds@linux-foundation.org, will@kernel.org Subject: [patch 3/9] mm/gup: fix gup_fast with dynamic page table folding Message-ID: <20200926041910.c0h74uLjj%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Vasily Gorbik Subject: mm/gup: fix gup_fast with dynamic page table folding Currently to make sure that every page table entry is read just once gup_fast walks perform READ_ONCE and pass pXd value down to the next gup_pXd_range function by value e.g.: static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) ... pudp = pud_offset(&p4d, addr); This function passes a reference on that local value copy to pXd_offset, and might get the very same pointer in return. This happens when the level is folded (on most arches), and that pointer should not be iterated. On s390 due to the fact that each task might have different 5,4 or 3-level address translation and hence different levels folded the logic is more complex and non-iteratable pointer to a local copy leads to severe problems. Here is an example of what happens with gup_fast on s390, for a task with 3-levels paging, crossing a 2 GB pud boundary: // addr = 0x1007ffff000, end = 0x10080001000 static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { unsigned long next; pud_t *pudp; // pud_offset returns &p4d itself (a pointer to a value on stack) pudp = pud_offset(&p4d, addr); do { // on second iteratation reading "random" stack value pud_t pud = READ_ONCE(*pudp); // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390 next = pud_addr_end(addr, end); ... } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack return 1; } This happens since s390 moved to common gup code with commit d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust") and commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code"). s390 tried to mimic static level folding by changing pXd_offset primitives to always calculate top level page table offset in pgd_offset and just return the value passed when pXd_offset has to act as folded. What is crucial for gup_fast and what has been overlooked is that PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly. And the latter is not possible with dynamic folding. To fix the issue in addition to pXd values pass original pXdp pointers down to gup_pXd_range functions. And introduce pXd_offset_lockless helpers, which take an additional pXd entry value parameter. This has already been discussed in https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1 Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code") Signed-off-by: Vasily Gorbik Reviewed-by: Gerald Schaefer Reviewed-by: Alexander Gordeev Reviewed-by: Jason Gunthorpe Reviewed-by: Mike Rapoport Reviewed-by: John Hubbard Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Dave Hansen Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Jeff Dike Cc: Richard Weinberger Cc: Dave Hansen Cc: Andy Lutomirski Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Arnd Bergmann Cc: Andrey Ryabinin Cc: Heiko Carstens Cc: Christian Borntraeger Cc: Claudio Imbrenda Cc: [5.2+] Signed-off-by: Andrew Morton --- arch/s390/include/asm/pgtable.h | 42 +++++++++++++++++++++--------- include/linux/pgtable.h | 10 +++++++ mm/gup.c | 18 ++++++------ 3 files changed, 49 insertions(+), 21 deletions(-) --- a/arch/s390/include/asm/pgtable.h~mm-gup-fix-gup_fast-with-dynamic-page-table-folding +++ a/arch/s390/include/asm/pgtable.h @@ -1260,26 +1260,44 @@ static inline pgd_t *pgd_offset_raw(pgd_ #define pgd_offset(mm, address) pgd_offset_raw(READ_ONCE((mm)->pgd), address) -static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address) +static inline p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long address) { - if ((pgd_val(*pgd) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R1) - return (p4d_t *) pgd_deref(*pgd) + p4d_index(address); - return (p4d_t *) pgd; + if ((pgd_val(pgd) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R1) + return (p4d_t *) pgd_deref(pgd) + p4d_index(address); + return (p4d_t *) pgdp; } +#define p4d_offset_lockless p4d_offset_lockless -static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) +static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long address) { - if ((p4d_val(*p4d) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R2) - return (pud_t *) p4d_deref(*p4d) + pud_index(address); - return (pud_t *) p4d; + return p4d_offset_lockless(pgdp, *pgdp, address); +} + +static inline pud_t *pud_offset_lockless(p4d_t *p4dp, p4d_t p4d, unsigned long address) +{ + if ((p4d_val(p4d) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R2) + return (pud_t *) p4d_deref(p4d) + pud_index(address); + return (pud_t *) p4dp; +} +#define pud_offset_lockless pud_offset_lockless + +static inline pud_t *pud_offset(p4d_t *p4dp, unsigned long address) +{ + return pud_offset_lockless(p4dp, *p4dp, address); } #define pud_offset pud_offset -static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) +static inline pmd_t *pmd_offset_lockless(pud_t *pudp, pud_t pud, unsigned long address) +{ + if ((pud_val(pud) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R3) + return (pmd_t *) pud_deref(pud) + pmd_index(address); + return (pmd_t *) pudp; +} +#define pmd_offset_lockless pmd_offset_lockless + +static inline pmd_t *pmd_offset(pud_t *pudp, unsigned long address) { - if ((pud_val(*pud) & _REGION_ENTRY_TYPE_MASK) >= _REGION_ENTRY_TYPE_R3) - return (pmd_t *) pud_deref(*pud) + pmd_index(address); - return (pmd_t *) pud; + return pmd_offset_lockless(pudp, *pudp, address); } #define pmd_offset pmd_offset --- a/include/linux/pgtable.h~mm-gup-fix-gup_fast-with-dynamic-page-table-folding +++ a/include/linux/pgtable.h @@ -1427,6 +1427,16 @@ typedef unsigned int pgtbl_mod_mask; #define mm_pmd_folded(mm) __is_defined(__PAGETABLE_PMD_FOLDED) #endif +#ifndef p4d_offset_lockless +#define p4d_offset_lockless(pgdp, pgd, address) p4d_offset(&(pgd), address) +#endif +#ifndef pud_offset_lockless +#define pud_offset_lockless(p4dp, p4d, address) pud_offset(&(p4d), address) +#endif +#ifndef pmd_offset_lockless +#define pmd_offset_lockless(pudp, pud, address) pmd_offset(&(pud), address) +#endif + /* * p?d_leaf() - true if this entry is a final mapping to a physical address. * This differs from p?d_huge() by the fact that they are always available (if --- a/mm/gup.c~mm-gup-fix-gup_fast-with-dynamic-page-table-folding +++ a/mm/gup.c @@ -2485,13 +2485,13 @@ static int gup_huge_pgd(pgd_t orig, pgd_ return 1; } -static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, +static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { unsigned long next; pmd_t *pmdp; - pmdp = pmd_offset(&pud, addr); + pmdp = pmd_offset_lockless(pudp, pud, addr); do { pmd_t pmd = READ_ONCE(*pmdp); @@ -2528,13 +2528,13 @@ static int gup_pmd_range(pud_t pud, unsi return 1; } -static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, +static int gup_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { unsigned long next; pud_t *pudp; - pudp = pud_offset(&p4d, addr); + pudp = pud_offset_lockless(p4dp, p4d, addr); do { pud_t pud = READ_ONCE(*pudp); @@ -2549,20 +2549,20 @@ static int gup_pud_range(p4d_t p4d, unsi if (!gup_huge_pd(__hugepd(pud_val(pud)), addr, PUD_SHIFT, next, flags, pages, nr)) return 0; - } else if (!gup_pmd_range(pud, addr, next, flags, pages, nr)) + } else if (!gup_pmd_range(pudp, pud, addr, next, flags, pages, nr)) return 0; } while (pudp++, addr = next, addr != end); return 1; } -static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end, +static int gup_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, unsigned long end, unsigned int flags, struct page **pages, int *nr) { unsigned long next; p4d_t *p4dp; - p4dp = p4d_offset(&pgd, addr); + p4dp = p4d_offset_lockless(pgdp, pgd, addr); do { p4d_t p4d = READ_ONCE(*p4dp); @@ -2574,7 +2574,7 @@ static int gup_p4d_range(pgd_t pgd, unsi if (!gup_huge_pd(__hugepd(p4d_val(p4d)), addr, P4D_SHIFT, next, flags, pages, nr)) return 0; - } else if (!gup_pud_range(p4d, addr, next, flags, pages, nr)) + } else if (!gup_pud_range(p4dp, p4d, addr, next, flags, pages, nr)) return 0; } while (p4dp++, addr = next, addr != end); @@ -2602,7 +2602,7 @@ static void gup_pgd_range(unsigned long if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr, PGDIR_SHIFT, next, flags, pages, nr)) return; - } else if (!gup_p4d_range(pgd, addr, next, flags, pages, nr)) + } else if (!gup_p4d_range(pgdp, pgd, addr, next, flags, pages, nr)) return; } while (pgdp++, addr = next, addr != end); } From patchwork Sat Sep 26 04:19:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801093 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AAE67112C for ; Sat, 26 Sep 2020 04:19:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6CD952078B for ; Sat, 26 Sep 2020 04:19:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="v3sh1qqi" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6CD952078B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9A8226B006E; Sat, 26 Sep 2020 00:19:17 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9577C6B0070; Sat, 26 Sep 2020 00:19:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 86D526B0071; Sat, 26 Sep 2020 00:19:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0040.hostedemail.com [216.40.44.40]) by kanga.kvack.org (Postfix) with ESMTP id 756CC6B006E for ; Sat, 26 Sep 2020 00:19:17 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3DB8A180AD801 for ; Sat, 26 Sep 2020 04:19:17 +0000 (UTC) X-FDA: 77303907954.28.fuel02_420e5dd2716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id 201AF6C04 for ; Sat, 26 Sep 2020 04:19:17 +0000 (UTC) X-Spam-Summary: 1,0,0,81308be839c583a9,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:41:355:379:800:960:967:973:988:989:1260:1345:1359:1381:1431:1437:1534:1541:1711:1730:1747:1777:1792:1963:2393:2525:2559:2563:2682:2685:2859:2892:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3352:3865:3866:3868:3871:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4250:4321:5007:6119:6120:6261:6653:7576:7901:7903:8957:9010:9025:9545:10004:11026:11658:11914:12043:12048:12296:12297:12438:12517:12519:12555:12679:12986:13069:13141:13230:13311:13357:13846:14096:14181:14384:14721:21080:21451:21627:21809:21939:21990:30012:30054:30064:30075,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04ygyzcf4p7rhsu7i5ccpd9uc8bjpopf4enf6n8eb3zyefqa8h7rbi4mriay9w8.en75n4rfgea1h7z9xxqjp4qpwjrbwtgmkt5bodtwr85zid831szrkm6ggi3uzq6.a-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF X-HE-Tag: fuel02_420e5dd2716d X-Filterd-Recvd-Size: 2960 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:16 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 666E12080A; Sat, 26 Sep 2020 04:19:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093955; bh=9u43UOtPeXYOn2SAd52yJxsE9rs6wq54p1naXXmefTI=; h=Date:From:To:Subject:In-Reply-To:From; b=v3sh1qqiMSmEFOL/yDNG06CSyJqEShYwTy6vEW13fuZvAWNvqjQO04eNosEmTuEBM TaNCzJXzmirgMw2x0Vh/jY8cmEViyT1xO6gZ/GHbqGerZtP3D9NxW9q+vKv+bUIRNh +7FPzZJSFZK4W5ixn3IC4Yspu2IVzefmiJim80UY= Date: Fri, 25 Sep 2020 21:19:14 -0700 From: Andrew Morton To: akpm@linux-foundation.org, anshuman.khandual@arm.com, daniel.m.jordan@oracle.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, torvalds@linux-foundation.org, ziy@nvidia.com Subject: [patch 4/9] mm/migrate: correct thp migration stats Message-ID: <20200926041914.Hj7iz37ld%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Subject: mm/migrate: correct thp migration stats PageTransHuge returns true for both thp and hugetlb, so thp stats was counting both thp and hugetlb migrations. Exclude hugetlb migration by setting is_thp variable right. Clean up thp handling code too when we are there. Link: https://lkml.kernel.org/r/20200917210413.1462975-1-zi.yan@sent.com Fixes: 1a5bae25e3cf ("mm/vmstat: add events for THP migration without split") Signed-off-by: Zi Yan Reviewed-by: Daniel Jordan Cc: Anshuman Khandual Signed-off-by: Andrew Morton --- mm/migrate.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) --- a/mm/migrate.c~mm-migrate-correct-thp-migration-stats +++ a/mm/migrate.c @@ -1446,7 +1446,7 @@ retry: * Capture required information that might get lost * during migration. */ - is_thp = PageTransHuge(page); + is_thp = PageTransHuge(page) && !PageHuge(page); nr_subpages = thp_nr_pages(page); cond_resched(); @@ -1472,7 +1472,7 @@ retry: * we encounter them after the rest of the list * is processed. */ - if (PageTransHuge(page) && !PageHuge(page)) { + if (is_thp) { lock_page(page); rc = split_huge_page_to_list(page, from); unlock_page(page); @@ -1481,8 +1481,7 @@ retry: nr_thp_split++; goto retry; } - } - if (is_thp) { + nr_thp_failed++; nr_failed += nr_subpages; goto out; From patchwork Sat Sep 26 04:19:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801095 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BBD0112E for ; Sat, 26 Sep 2020 04:19:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F310F2080A for ; Sat, 26 Sep 2020 04:19:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="glp2xNOy" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F310F2080A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2543C6B0070; Sat, 26 Sep 2020 00:19:21 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2057E6B0071; Sat, 26 Sep 2020 00:19:21 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 11A966B0072; Sat, 26 Sep 2020 00:19:21 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0252.hostedemail.com [216.40.44.252]) by kanga.kvack.org (Postfix) with ESMTP id F16D96B0070 for ; Sat, 26 Sep 2020 00:19:20 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id BAAE3180AD801 for ; Sat, 26 Sep 2020 04:19:20 +0000 (UTC) X-FDA: 77303908080.11.power30_070fcce2716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 82E7D180F8B80 for ; Sat, 26 Sep 2020 04:19:20 +0000 (UTC) X-Spam-Summary: 50,0,0,f66b1f6cffed7b68,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:2:41:334:355:368:369:379:800:960:966:967:968:973:988:989:1260:1345:1359:1381:1431:1437:1535:1605:1606:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2525:2553:2568:2628:2682:2685:2859:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3873:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4117:4250:4321:4385:4605:5007:6117:6119:6261:6653:6737:7514:7576:7875:7903:8957:9025:9545:10004:11026:11473:11658:11914:12043:12048:12050:12257:12291:12295:12297:12517:12519:12555:12663:12679:12698:12737:12986:13149:13161:13221:13229:13230:13846:21080:21094:21323:21366:21451:21627:21740:21939:21990:30054:30070:30079:30089:30090:30091,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04yfcdfo8354chuyszhgxhaj4btosockkhq59rbdyo1dy8mcbb15ey8y69r5uof.yt4ak6ryj6wy7kqse8r9z3k6rj7dx7yntgbyr9bp57rwcb km357ncy X-HE-Tag: power30_070fcce2716d X-Filterd-Recvd-Size: 6575 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf15.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:20 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id BEF012078B; Sat, 26 Sep 2020 04:19:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093959; bh=gqjbGqJ1joxLFctcB/Msx4e1CUN0FUkvjKq/zP4B4zM=; h=Date:From:To:Subject:In-Reply-To:From; b=glp2xNOyJhKoRYpO+EPEp1Wz4yZampURWRfsCERI3F0fWG1GW292568BAd9RwJFcE qyvUK+UV60w93FDkCeLJe5SdrdEOh5VJBT+02DORAVykWjps9YDpBjt/+Y+dEIkEkc IZ3qxC8QmFZMNKiM1HY9QXoz6HMjpb2Sr4BTzzZ0= Date: Fri, 25 Sep 2020 21:19:18 -0700 From: Andrew Morton To: akpm@linux-foundation.org, andy.lavr@gmail.com, joe@perches.com, keescook@chromium.org, linux-mm@kvack.org, linux@rasmusvillemoes.dk, masahiroy@kernel.org, mm-commits@vger.kernel.org, natechancellor@gmail.com, ndesaulniers@google.com, nivedita@alum.mit.edu, samitolvanen@google.com, stable@vger.kernel.org, torvalds@linux-foundation.org Subject: [patch 5/9] lib/string.c: implement stpcpy Message-ID: <20200926041918.OfR3GXvLc%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Nick Desaulniers Subject: lib/string.c: implement stpcpy LLVM implemented a recent "libcall optimization" that lowers calls to `sprintf(dest, "%s", str)` where the return value is used to `stpcpy(dest, str) - dest`. This generally avoids the machinery involved in parsing format strings. `stpcpy` is just like `strcpy` except it returns the pointer to the new tail of `dest`. This optimization was introduced into clang-12. Implement this so that we don't observe linkage failures due to missing symbol definitions for `stpcpy`. Similar to last year's fire drill with: commit 5f074f3e192f ("lib/string.c: implement a basic bcmp") The kernel is somewhere between a "freestanding" environment (no full libc) and "hosted" environment (many symbols from libc exist with the same type, function signature, and semantics). As H. Peter Anvin notes, there's not really a great way to inform the compiler that you're targeting a freestanding environment but would like to opt-in to some libcall optimizations (see pr/47280 below), rather than opt-out. Arvind notes, -fno-builtin-* behaves slightly differently between GCC and Clang, and Clang is missing many __builtin_* definitions, which I consider a bug in Clang and am working on fixing. Masahiro summarizes the subtle distinction between compilers justly: To prevent transformation from foo() into bar(), there are two ways in Clang to do that; -fno-builtin-foo, and -fno-builtin-bar. There is only one in GCC; -fno-buitin-foo. (Any difference in that behavior in Clang is likely a bug from a missing __builtin_* definition.) Masahiro also notes: We want to disable optimization from foo() to bar(), but we may still benefit from the optimization from foo() into something else. If GCC implements the same transform, we would run into a problem because it is not -fno-builtin-bar, but -fno-builtin-foo that disables that optimization. In this regard, -fno-builtin-foo would be more future-proof than -fno-built-bar, but -fno-builtin-foo is still potentially overkill. We may want to prevent calls from foo() being optimized into calls to bar(), but we still may want other optimization on calls to foo(). It seems that compilers today don't quite provide the fine grain control over which libcall optimizations pseudo-freestanding environments would prefer. Finally, Kees notes that this interface is unsafe, so we should not encourage its use. As such, I've removed the declaration from any header, but it still needs to be exported to avoid linkage errors in modules. Link: https://lkml.kernel.org/r/20200914161643.938408-1-ndesaulniers@google.com Link: https://bugs.llvm.org/show_bug.cgi?id=47162 Link: https://bugs.llvm.org/show_bug.cgi?id=47280 Link: https://github.com/ClangBuiltLinux/linux/issues/1126 Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html Link: https://reviews.llvm.org/D85963 Reported-by: Sami Tolvanen Suggested-by: Andy Lavr Suggested-by: Arvind Sankar Suggested-by: Joe Perches Suggested-by: Kees Cook Suggested-by: Masahiro Yamada Suggested-by: Rasmus Villemoes Signed-off-by: Nick Desaulniers Tested-by: Nathan Chancellor Cc: Signed-off-by: Andrew Morton --- lib/string.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) --- a/lib/string.c~lib-stringc-implement-stpcpy +++ a/lib/string.c @@ -272,6 +272,30 @@ ssize_t strscpy_pad(char *dest, const ch } EXPORT_SYMBOL(strscpy_pad); +/** + * stpcpy - copy a string from src to dest returning a pointer to the new end + * of dest, including src's %NUL-terminator. May overrun dest. + * @dest: pointer to end of string being copied into. Must be large enough + * to receive copy. + * @src: pointer to the beginning of string being copied from. Must not overlap + * dest. + * + * stpcpy differs from strcpy in a key way: the return value is a pointer + * to the new %NUL-terminating character in @dest. (For strcpy, the return + * value is a pointer to the start of @dest). This interface is considered + * unsafe as it doesn't perform bounds checking of the inputs. As such it's + * not recommended for usage. Instead, its definition is provided in case + * the compiler lowers other libcalls to stpcpy. + */ +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src); +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src) +{ + while ((*dest++ = *src++) != '\0') + /* nothing */; + return --dest; +} +EXPORT_SYMBOL(stpcpy); + #ifndef __HAVE_ARCH_STRCAT /** * strcat - Append one %NUL-terminated string to another From patchwork Sat Sep 26 04:19:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801097 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF915112E for ; Sat, 26 Sep 2020 04:19:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 93204215A4 for ; Sat, 26 Sep 2020 04:19:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="F//Nz6di" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 93204215A4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A6A9E6B0071; Sat, 26 Sep 2020 00:19:23 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A1A346B0072; Sat, 26 Sep 2020 00:19:23 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 909086B0073; Sat, 26 Sep 2020 00:19:23 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0095.hostedemail.com [216.40.44.95]) by kanga.kvack.org (Postfix) with ESMTP id 7E2216B0071 for ; Sat, 26 Sep 2020 00:19:23 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 4690E5820 for ; Sat, 26 Sep 2020 04:19:23 +0000 (UTC) X-FDA: 77303908206.20.cook63_04055ad2716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 26F1C180C07A3 for ; Sat, 26 Sep 2020 04:19:23 +0000 (UTC) X-Spam-Summary: 1,0,0,ac837d8f39d5048c,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:41:334:355:368:369:379:800:960:966:967:973:988:989:1260:1345:1359:1381:1431:1437:1534:1540:1568:1711:1714:1730:1747:1777:1792:2196:2199:2393:2525:2559:2563:2682:2685:2859:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3865:3870:3872:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4321:4385:5007:6261:6653:7576:9025:9545:10004:11658:11914:12043:12048:12297:12517:12519:12555:12679:12986:13069:13221:13229:13311:13357:13846:14181:14384:14721:21080:21451:21627:21939:30054:30089,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04y859qtq13y8wyh3nt3poaqrd4emochhfa3mupc7nqj9istqdxsk51fjya4exm.kqef1wxro6er5fdr6sjwjwy7sx1haob31hgqi7zu69oaz8xxgygw1kba5wag5k5.e-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_S UMMARY:n X-HE-Tag: cook63_04055ad2716d X-Filterd-Recvd-Size: 2260 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf26.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:22 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id CC50A20882; Sat, 26 Sep 2020 04:19:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093962; bh=CZGQ/IPtIHlW48+y+j6behOl57FkpS5nsvTS70xhlrI=; h=Date:From:To:Subject:In-Reply-To:From; b=F//Nz6di/fbaRVR5cs4WfZ0oo1+G6Bz+F8KkJguT8yC/C5Z5rHv2LInDPXTwthxR5 B75tXbMi+w784JReE7nIDWd5oDAMMmSMjPy4/ZB6zZiht5F0jr6O/T5QAZupU/Ar+Y 8HdbRwSZaC3clAo1MnP3JkafH5Bgea3vv6mBXAPo= Date: Fri, 25 Sep 2020 21:19:21 -0700 From: Andrew Morton To: akpm@linux-foundation.org, hulkci@huawei.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, torvalds@linux-foundation.org, yanaijie@huawei.com Subject: [patch 6/9] lib/memregion.c: include memregion.h Message-ID: <20200926041921.kOd2RwU5v%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Jason Yan Subject: lib/memregion.c: include memregion.h This addresses the following sparse warning: lib/memregion.c:8:5: warning: symbol 'memregion_alloc' was not declared. Should it be static? lib/memregion.c:14:6: warning: symbol 'memregion_free' was not declared. Should it be static? Link: https://lkml.kernel.org/r/20200921142852.875312-1-yanaijie@huawei.com Signed-off-by: Jason Yan Reported-by: Hulk Robot Signed-off-by: Andrew Morton --- lib/memregion.c | 1 + 1 file changed, 1 insertion(+) --- a/lib/memregion.c~lib-include-memregionh-in-memregionc +++ a/lib/memregion.c @@ -2,6 +2,7 @@ /* identifiers for device / performance-differentiated memory regions */ #include #include +#include static DEFINE_IDA(memregion_ids); From patchwork Sat Sep 26 04:19:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801099 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3654A112C for ; Sat, 26 Sep 2020 04:19:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EE7D32080A for ; Sat, 26 Sep 2020 04:19:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="ARY3by6I" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE7D32080A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 07F616B0072; Sat, 26 Sep 2020 00:19:27 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0302A6B0073; Sat, 26 Sep 2020 00:19:26 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EADE36B0074; Sat, 26 Sep 2020 00:19:26 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0208.hostedemail.com [216.40.44.208]) by kanga.kvack.org (Postfix) with ESMTP id D54AC6B0072 for ; Sat, 26 Sep 2020 00:19:26 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 964E65821 for ; Sat, 26 Sep 2020 04:19:26 +0000 (UTC) X-FDA: 77303908332.01.spy63_4a03e7f2716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id 85624100520F1 for ; Sat, 26 Sep 2020 04:19:26 +0000 (UTC) X-Spam-Summary: 1,0,0,baec6da270e413e7,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:41:355:379:800:960:967:973:988:989:1260:1345:1359:1381:1431:1437:1534:1541:1711:1730:1747:1777:1792:2393:2525:2553:2559:2563:2682:2685:2859:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3352:3865:3868:3872:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4321:5007:6261:6653:6737:7576:7688:8603:9025:9545:10004:11026:11658:11914:12043:12048:12114:12297:12438:12517:12519:12555:12679:12895:12986:13069:13255:13311:13357:14181:14384:14721:21060:21080:21451:21611:21627:21939:30054:30064:30079:30090,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-64.100.201.201 62.2.0.100;04yrh4xfo3i5xsqmugtz5wwdhpzxroceuk396pfmefg6k8efwnc1bgiiwp8zeuo.gkqwg5djj6mtz5ptfx6gaeptnse1u133wgk3umjc48w46gthxwsgbo1an661qmo.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0 :0:0,LFt X-HE-Tag: spy63_4a03e7f2716d X-Filterd-Recvd-Size: 3373 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:26 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DF92D207C4; Sat, 26 Sep 2020 04:19:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093965; bh=4XYbOkZD/X+maFrZDRd5RAzqJJ9LoJUpJToY3T3h3R0=; h=Date:From:To:Subject:In-Reply-To:From; b=ARY3by6ITqp2ZnsjOsPvXl+rGGDcHv8XbLOpId2hoXAJYDuU33VhrEDACHeGCnZjR 9n91SUl1xCKp3tjXuT8Qf0r7v9lmsrcvsunSEVd4++RIROvmA21hg5jWaBnCAd9vFX cwn6ysHBq1VfNYgqtGTRudh+891wxKG0QhjTznM0= Date: Fri, 25 Sep 2020 21:19:24 -0700 From: Andrew Morton To: akpm@linux-foundation.org, dan.j.wiilliams@intel.com, hch@lst.de, hpa@zytor.com, jack@suse.cz, jmoyer@redhat.com, linux-mm@kvack.org, mawilcox@microsoft.com, mingo@elte.hu, mingo@redhat.com, mm-commits@vger.kernel.org, mpatocka@redhat.com, ross.zwisler@linux.intel.com, stable@vger.kernel.org, tglx@linutronix.de, torvalds@linux-foundation.org, toshi.kani@hpe.com, viro@zeniv.linux.org.uk Subject: [patch 7/9] arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback Message-ID: <20200926041924.JrUcL1D_9%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mikulas Patocka Subject: arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback If we copy less than 8 bytes and if the destination crosses a cache line, __copy_user_flushcache would invalidate only the first cache line. This patch makes it invalidate the second cache line as well. Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations") Signed-off-by: Mikulas Patocka Reviewed-by: Dan Williams Cc: Jan Kara Cc: Jeff Moyer Cc: Ingo Molnar Cc: Christoph Hellwig Cc: Toshi Kani Cc: "H. Peter Anvin" Cc: Al Viro Cc: Thomas Gleixner Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Signed-off-by: Andrew Morton --- arch/x86/lib/usercopy_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/lib/usercopy_64.c~arch-x86-lib-usercopy_64c-fix-__copy_user_flushcache-cache-writeback +++ a/arch/x86/lib/usercopy_64.c @@ -120,7 +120,7 @@ long __copy_user_flushcache(void *dst, c */ if (size < 8) { if (!IS_ALIGNED(dest, 4) || size != 4) - clean_cache_range(dst, 1); + clean_cache_range(dst, size); } else { if (!IS_ALIGNED(dest, 8)) { dest = ALIGN(dest, boot_cpu_data.x86_clflush_size); From patchwork Sat Sep 26 04:19:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801101 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03FD3112C for ; Sat, 26 Sep 2020 04:19:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A87C221527 for ; Sat, 26 Sep 2020 04:19:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="orlT/les" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A87C221527 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C41666B0073; Sat, 26 Sep 2020 00:19:30 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BEF8D6B0074; Sat, 26 Sep 2020 00:19:30 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B2B906B0075; Sat, 26 Sep 2020 00:19:30 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0094.hostedemail.com [216.40.44.94]) by kanga.kvack.org (Postfix) with ESMTP id 9D8DF6B0073 for ; Sat, 26 Sep 2020 00:19:30 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 68FCB181AE864 for ; Sat, 26 Sep 2020 04:19:30 +0000 (UTC) X-FDA: 77303908500.15.rock82_580e68f2716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin15.hostedemail.com (Postfix) with ESMTP id 438A01814B0C1 for ; Sat, 26 Sep 2020 04:19:30 +0000 (UTC) X-Spam-Summary: 1,0,0,50d57a97053ab3a9,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:1:41:69:355:379:800:960:967:973:988:989:1260:1345:1359:1381:1431:1437:1605:1730:1747:1777:1792:2194:2198:2199:2200:2393:2525:2553:2559:2564:2636:2682:2685:2731:2741:2859:2898:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3369:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4043:4250:4321:4605:5007:6119:6238:6261:6653:6737:7576:7875:7901:7903:7904:8603:8660:8957:9025:9545:10004:11026:11232:11473:11658:11914:12043:12048:12296:12297:12438:12517:12519:12555:12679:12740:12895:12986:13148:13161:13229:13230:13548:13870:13972:14096:21080:21433:21451:21611:21627:21796:21939:21990:30003:30045:30054:30056:30064:30070:30090,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04yfznicdaccwhkt5h4z83aqubey5ycrie7h1u7b7549xeesnixu85meo6m58qi.1oj8gc6187gyp3rczrkkpqn79h6cwfrxtn4s5pfgp3t81fyt8ng7yy tpdq4zip X-HE-Tag: rock82_580e68f2716d X-Filterd-Recvd-Size: 11814 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:29 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B8D8F2080A; Sat, 26 Sep 2020 04:19:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093969; bh=BLn0rc/DiGFvCooB+p8a0n/+Mt1r6e369KsEgAtvVKI=; h=Date:From:To:Subject:In-Reply-To:From; b=orlT/les5zNiJvpUPhDU0blR/827VuvNwzokG8zW7ACyysH233PhPc0yKm1oVMLTh pW80MB23KlhAztN/JfCm8xUol+GDHWMR7s/cDde316DJoR7uJNm3zMZiRNfZLP3Rcm MU8mXDXFyL2a6h51kVvxBm0UoD7Wo9gTlHpu39fs= Date: Fri, 25 Sep 2020 21:19:28 -0700 From: Andrew Morton To: akpm@linux-foundation.org, cheloha@linux.ibm.com, david@redhat.com, fenghua.yu@intel.com, gregkh@linuxfoundation.org, ldufour@linux.ibm.com, linux-mm@kvack.org, mhocko@suse.com, mm-commits@vger.kernel.org, nathanl@linux.ibm.com, osalvador@suse.de, rafael@kernel.org, stable@vger.kernel.org, tony.luck@intel.com, torvalds@linux-foundation.org Subject: [patch 8/9] mm: replace memmap_context by meminit_context Message-ID: <20200926041928.9xJHGgkah%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Laurent Dufour Subject: mm: replace memmap_context by meminit_context Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: ------------[ cut here ]------------ kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 NIP: c000000000403f34 LR: c000000000403f2c CTR: 0000000000000000 REGS: c0000004876e3660 TRAP: 0700 Not tainted (5.9.0-rc1+) MSR: 800000000282b033 CR: 24000448 XER: 20040000 CFAR: c000000000846d20 IRQMASK: 0 GPR00: c000000000403f2c c0000004876e38f0 c0000000012f6f00 ffffffffffffffef GPR04: 0000000000000227 c0000004805ae680 0000000000000000 00000004886f0000 GPR08: 0000000000000226 0000000000000003 0000000000000002 fffffffffffffffd GPR12: 0000000088000484 c00000001ec96280 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000004 0000000000000003 GPR20: c00000047814ffe0 c0000007ffff7c08 0000000000000010 c0000000013332c8 GPR24: 0000000000000000 c0000000011f6cc0 0000000000000000 0000000000000000 GPR28: ffffffffffffffef 0000000000000001 0000000150000000 0000000010000000 NIP [c000000000403f34] add_memory_resource+0x244/0x340 LR [c000000000403f2c] add_memory_resource+0x23c/0x340 Call Trace: [c0000004876e38f0] [c000000000403f2c] add_memory_resource+0x23c/0x340 (unreliable) [c0000004876e39c0] [c00000000040408c] __add_memory+0x5c/0xf0 [c0000004876e39f0] [c0000000000e2b94] dlpar_add_lmb+0x1b4/0x500 [c0000004876e3ad0] [c0000000000e3888] dlpar_memory+0x1f8/0xb80 [c0000004876e3b60] [c0000000000dc0d0] handle_dlpar_errorlog+0xc0/0x190 [c0000004876e3bd0] [c0000000000dc398] dlpar_store+0x198/0x4a0 [c0000004876e3c90] [c00000000072e630] kobj_attr_store+0x30/0x50 [c0000004876e3cb0] [c00000000051f954] sysfs_kf_write+0x64/0x90 [c0000004876e3cd0] [c00000000051ee40] kernfs_fop_write+0x1b0/0x290 [c0000004876e3d20] [c000000000438dd8] vfs_write+0xe8/0x290 [c0000004876e3d70] [c0000000004391ac] ksys_write+0xdc/0x130 [c0000004876e3dc0] [c000000000034e40] system_call_exception+0x160/0x270 [c0000004876e3e20] [c00000000000d740] system_call_common+0xf0/0x27c Instruction dump: 48442e35 60000000 0b030000 3cbe0001 7fa3eb78 7bc48402 38a5fffe 7ca5fa14 78a58402 48442db1 60000000 7c7c1b78 <0b030000> 7f23cb78 4bda371d 60000000 ---[ end trace 562fd6c109cd0fb2 ]--- This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are 2 issues here: a. The sysfs memory and node's layouts are broken due to these multiple links b. The link errors in link_mem_sections() should not lead to a system panic. To address a. register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. The patch 3 is addressing the point b. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Laurent Dufour Suggested-by: David Hildenbrand Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Cc: Greg Kroah-Hartman Cc: "Rafael J . Wysocki" Cc: Nathan Lynch Cc: Scott Cheloha Cc: Tony Luck Cc: Fenghua Yu Cc: Signed-off-by: Andrew Morton --- arch/ia64/mm/init.c | 6 +++--- include/linux/mm.h | 2 +- include/linux/mmzone.h | 11 ++++++++--- mm/memory_hotplug.c | 2 +- mm/page_alloc.c | 10 +++++----- 5 files changed, 18 insertions(+), 13 deletions(-) --- a/arch/ia64/mm/init.c~mm-replace-memmap_context-by-meminit_context +++ a/arch/ia64/mm/init.c @@ -538,7 +538,7 @@ virtual_memmap_init(u64 start, u64 end, if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), args->nid, args->zone, page_to_pfn(map_start), - MEMMAP_EARLY, NULL); + MEMINIT_EARLY, NULL); return 0; } @@ -547,8 +547,8 @@ memmap_init (unsigned long size, int nid unsigned long start_pfn) { if (!vmem_map) { - memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, - NULL); + memmap_init_zone(size, nid, zone, start_pfn, + MEMINIT_EARLY, NULL); } else { struct page *start; struct memmap_init_callback_data args; --- a/include/linux/mm.h~mm-replace-memmap_context-by-meminit_context +++ a/include/linux/mm.h @@ -2416,7 +2416,7 @@ extern int __meminit __early_pfn_to_nid( extern void set_dma_reserve(unsigned long new_dma_reserve); extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, - enum memmap_context, struct vmem_altmap *); + enum meminit_context, struct vmem_altmap *); extern void setup_per_zone_wmarks(void); extern int __meminit init_per_zone_wmark_min(void); extern void mem_init(void); --- a/include/linux/mmzone.h~mm-replace-memmap_context-by-meminit_context +++ a/include/linux/mmzone.h @@ -824,10 +824,15 @@ bool zone_watermark_ok(struct zone *z, u unsigned int alloc_flags); bool zone_watermark_ok_safe(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx); -enum memmap_context { - MEMMAP_EARLY, - MEMMAP_HOTPLUG, +/* + * Memory initialization context, use to differentiate memory added by + * the platform statically or via memory hotplug interface. + */ +enum meminit_context { + MEMINIT_EARLY, + MEMINIT_HOTPLUG, }; + extern void init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, unsigned long size); --- a/mm/memory_hotplug.c~mm-replace-memmap_context-by-meminit_context +++ a/mm/memory_hotplug.c @@ -729,7 +729,7 @@ void __ref move_pfn_range_to_zone(struct * are reserved so nobody should be touching them so we should be safe */ memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, - MEMMAP_HOTPLUG, altmap); + MEMINIT_HOTPLUG, altmap); set_zone_contiguous(zone); } --- a/mm/page_alloc.c~mm-replace-memmap_context-by-meminit_context +++ a/mm/page_alloc.c @@ -5975,7 +5975,7 @@ overlap_memmap_init(unsigned long zone, * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn, enum memmap_context context, + unsigned long start_pfn, enum meminit_context context, struct vmem_altmap *altmap) { unsigned long pfn, end_pfn = start_pfn + size; @@ -6007,7 +6007,7 @@ void __meminit memmap_init_zone(unsigned * There can be holes in boot-time mem_map[]s handed to this * function. They do not exist on hotplugged memory. */ - if (context == MEMMAP_EARLY) { + if (context == MEMINIT_EARLY) { if (overlap_memmap_init(zone, &pfn)) continue; if (defer_init(nid, pfn, end_pfn)) @@ -6016,7 +6016,7 @@ void __meminit memmap_init_zone(unsigned page = pfn_to_page(pfn); __init_single_page(page, pfn, zone, nid); - if (context == MEMMAP_HOTPLUG) + if (context == MEMINIT_HOTPLUG) __SetPageReserved(page); /* @@ -6099,7 +6099,7 @@ void __ref memmap_init_zone_device(struc * check here not to call set_pageblock_migratetype() against * pfn out of zone. * - * Please note that MEMMAP_HOTPLUG path doesn't clear memmap + * Please note that MEMINIT_HOTPLUG path doesn't clear memmap * because this is done early in section_activate() */ if (!(pfn & (pageblock_nr_pages - 1))) { @@ -6137,7 +6137,7 @@ void __meminit __weak memmap_init(unsign if (end_pfn > start_pfn) { size = end_pfn - start_pfn; memmap_init_zone(size, nid, zone, start_pfn, - MEMMAP_EARLY, NULL); + MEMINIT_EARLY, NULL); } } } From patchwork Sat Sep 26 04:19:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11801103 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 13336112C for ; Sat, 26 Sep 2020 04:19:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C120E207C4 for ; Sat, 26 Sep 2020 04:19:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="EwEPqfL7" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C120E207C4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C68376B0074; Sat, 26 Sep 2020 00:19:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id BC9786B0075; Sat, 26 Sep 2020 00:19:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AB9718E0001; Sat, 26 Sep 2020 00:19:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id 967756B0074 for ; Sat, 26 Sep 2020 00:19:34 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 46F138249980 for ; Sat, 26 Sep 2020 04:19:34 +0000 (UTC) X-FDA: 77303908668.28.jeans37_29022392716d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id 258126C04 for ; Sat, 26 Sep 2020 04:19:34 +0000 (UTC) X-Spam-Summary: 1,0,0,fd96d58988649e20,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:1:41:69:355:379:800:960:967:968:973:988:989:1260:1345:1359:1381:1431:1437:1605:1730:1747:1777:1792:2198:2199:2393:2525:2553:2559:2563:2637:2682:2685:2859:2898:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4043:4250:4321:4605:5007:6119:6238:6261:6653:6671:6737:7558:7576:7875:7903:8603:8660:9025:9545:9592:10004:11026:11473:11658:11914:12043:12048:12291:12296:12297:12438:12517:12519:12555:12679:12683:12740:12895:12986:13007:13148:13161:13229:13230:13548:13870:13972:21080:21433:21451:21611:21627:21939:21990:30045:30054:30056:30064:30070:30080:30090,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-62.2.0.100 64.100.201.201;04yrqgwsy57pnqgyhwuf54mt37jfayph9xczj3ahkhwax6hik48awwxngrqknr5.i69dgsgsg5iysw9hu3wptrgj81yn7tz4edzfgsgmm3ru8xg5r3qe1xkwt91tthc.n-lbl8.mail shell.ne X-HE-Tag: jeans37_29022392716d X-Filterd-Recvd-Size: 13496 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf11.hostedemail.com (Postfix) with ESMTP for ; Sat, 26 Sep 2020 04:19:33 +0000 (UTC) Received: from localhost.localdomain (c-71-198-47-131.hsd1.ca.comcast.net [71.198.47.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 4DFF320882; Sat, 26 Sep 2020 04:19:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601093972; bh=usSBJeQ8z9EC87empoXgaCPunCczq1s1yFugQTQwcUc=; h=Date:From:To:Subject:In-Reply-To:From; b=EwEPqfL7SdVssscBl3RuNWldsF01lgY1/p/j98DVLSK7VVlcnVUnzp+kE+vrDpIqW 5TAEahOPor2r9JjyvRu+Xs944ZFmXJqpNdQBpr06+J077Lfc6Mqm40M2U5ELliRWgF DpLLNCfO7Jo4HNy+r3HbGvJ/ecjKoSatO2Uu0eAs= Date: Fri, 25 Sep 2020 21:19:31 -0700 From: Andrew Morton To: akpm@linux-foundation.org, cheloha@linux.ibm.com, david@redhat.com, fenghua.yu@intel.com, gregkh@linuxfoundation.org, ldufour@linux.ibm.com, linux-mm@kvack.org, mhocko@suse.com, mm-commits@vger.kernel.org, nathanl@linux.ibm.com, osalvador@suse.de, rafael@kernel.org, stable@vger.kernel.org, tony.luck@intel.com, torvalds@linux-foundation.org Subject: [patch 9/9] mm: don't rely on system state to detect hot-plug operations Message-ID: <20200926041931.4jyPIAVMX%akpm@linux-foundation.org> In-Reply-To: <20200925211725.0fea54be9e9715486efea21f@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Laurent Dufour Subject: mm: don't rely on system state to detect hot-plug operations In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): ------------[ cut here ]------------ kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 NIP: c000000000403f34 LR: c000000000403f2c CTR: 0000000000000000 REGS: c0000004876e3660 TRAP: 0700 Not tainted (5.9.0-rc1+) MSR: 800000000282b033 CR: 24000448 XER: 20040000 CFAR: c000000000846d20 IRQMASK: 0 GPR00: c000000000403f2c c0000004876e38f0 c0000000012f6f00 ffffffffffffffef GPR04: 0000000000000227 c0000004805ae680 0000000000000000 00000004886f0000 GPR08: 0000000000000226 0000000000000003 0000000000000002 fffffffffffffffd GPR12: 0000000088000484 c00000001ec96280 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000004 0000000000000003 GPR20: c00000047814ffe0 c0000007ffff7c08 0000000000000010 c0000000013332c8 GPR24: 0000000000000000 c0000000011f6cc0 0000000000000000 0000000000000000 GPR28: ffffffffffffffef 0000000000000001 0000000150000000 0000000010000000 NIP [c000000000403f34] add_memory_resource+0x244/0x340 LR [c000000000403f2c] add_memory_resource+0x23c/0x340 Call Trace: [c0000004876e38f0] [c000000000403f2c] add_memory_resource+0x23c/0x340 (unreliable) [c0000004876e39c0] [c00000000040408c] __add_memory+0x5c/0xf0 [c0000004876e39f0] [c0000000000e2b94] dlpar_add_lmb+0x1b4/0x500 [c0000004876e3ad0] [c0000000000e3888] dlpar_memory+0x1f8/0xb80 [c0000004876e3b60] [c0000000000dc0d0] handle_dlpar_errorlog+0xc0/0x190 [c0000004876e3bd0] [c0000000000dc398] dlpar_store+0x198/0x4a0 [c0000004876e3c90] [c00000000072e630] kobj_attr_store+0x30/0x50 [c0000004876e3cb0] [c00000000051f954] sysfs_kf_write+0x64/0x90 [c0000004876e3cd0] [c00000000051ee40] kernfs_fop_write+0x1b0/0x290 [c0000004876e3d20] [c000000000438dd8] vfs_write+0xe8/0x290 [c0000004876e3d70] [c0000000004391ac] ksys_write+0xdc/0x130 [c0000004876e3dc0] [c000000000034e40] system_call_exception+0x160/0x270 [c0000004876e3e20] [c00000000000d740] system_call_common+0xf0/0x27c Instruction dump: 48442e35 60000000 0b030000 3cbe0001 7fa3eb78 7bc48402 38a5fffe 7ca5fa14 78a58402 48442db1 60000000 7c7c1b78 <0b030000> 7f23cb78 4bda371d 60000000 ---[ end trace 562fd6c109cd0fb2 ]--- This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Fixes: 4fbce633910e ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Fenghua Yu Cc: Nathan Lynch Cc: Scott Cheloha Cc: Tony Luck Cc: Signed-off-by: Andrew Morton --- drivers/base/node.c | 85 ++++++++++++++++++++++++++--------------- include/linux/node.h | 11 +++-- mm/memory_hotplug.c | 3 - 3 files changed, 64 insertions(+), 35 deletions(-) --- a/drivers/base/node.c~mm-dont-rely-on-system-state-to-detect-hot-plug-operations +++ a/drivers/base/node.c @@ -761,14 +761,36 @@ static int __ref get_nid_for_pfn(unsigne return pfn_to_nid(pfn); } +static int do_register_memory_block_under_node(int nid, + struct memory_block *mem_blk) +{ + int ret; + + /* + * If this memory block spans multiple nodes, we only indicate + * the last processed node. + */ + mem_blk->nid = nid; + + ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj, + &mem_blk->dev.kobj, + kobject_name(&mem_blk->dev.kobj)); + if (ret) + return ret; + + return sysfs_create_link_nowarn(&mem_blk->dev.kobj, + &node_devices[nid]->dev.kobj, + kobject_name(&node_devices[nid]->dev.kobj)); +} + /* register memory section under specified node if it spans that node */ -static int register_mem_sect_under_node(struct memory_block *mem_blk, - void *arg) +static int register_mem_block_under_node_early(struct memory_block *mem_blk, + void *arg) { unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE; unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr); unsigned long end_pfn = start_pfn + memory_block_pfns - 1; - int ret, nid = *(int *)arg; + int nid = *(int *)arg; unsigned long pfn; for (pfn = start_pfn; pfn <= end_pfn; pfn++) { @@ -785,39 +807,34 @@ static int register_mem_sect_under_node( } /* - * We need to check if page belongs to nid only for the boot - * case, during hotplug we know that all pages in the memory - * block belong to the same node. - */ - if (system_state == SYSTEM_BOOTING) { - page_nid = get_nid_for_pfn(pfn); - if (page_nid < 0) - continue; - if (page_nid != nid) - continue; - } - - /* - * If this memory block spans multiple nodes, we only indicate - * the last processed node. + * We need to check if page belongs to nid only at the boot + * case because node's ranges can be interleaved. */ - mem_blk->nid = nid; - - ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj, - &mem_blk->dev.kobj, - kobject_name(&mem_blk->dev.kobj)); - if (ret) - return ret; + page_nid = get_nid_for_pfn(pfn); + if (page_nid < 0) + continue; + if (page_nid != nid) + continue; - return sysfs_create_link_nowarn(&mem_blk->dev.kobj, - &node_devices[nid]->dev.kobj, - kobject_name(&node_devices[nid]->dev.kobj)); + return do_register_memory_block_under_node(nid, mem_blk); } /* mem section does not span the specified node */ return 0; } /* + * During hotplug we know that all pages in the memory block belong to the same + * node. + */ +static int register_mem_block_under_node_hotplug(struct memory_block *mem_blk, + void *arg) +{ + int nid = *(int *)arg; + + return do_register_memory_block_under_node(nid, mem_blk); +} + +/* * Unregister a memory block device under the node it spans. Memory blocks * with multiple nodes cannot be offlined and therefore also never be removed. */ @@ -832,11 +849,19 @@ void unregister_memory_block_under_nodes kobject_name(&node_devices[mem_blk->nid]->dev.kobj)); } -int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn) +int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn, + enum meminit_context context) { + walk_memory_blocks_func_t func; + + if (context == MEMINIT_HOTPLUG) + func = register_mem_block_under_node_hotplug; + else + func = register_mem_block_under_node_early; + return walk_memory_blocks(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn - start_pfn), (void *)&nid, - register_mem_sect_under_node); + func); } #ifdef CONFIG_HUGETLBFS --- a/include/linux/node.h~mm-dont-rely-on-system-state-to-detect-hot-plug-operations +++ a/include/linux/node.h @@ -99,11 +99,13 @@ extern struct node *node_devices[]; typedef void (*node_registration_func_t)(struct node *); #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_NUMA) -extern int link_mem_sections(int nid, unsigned long start_pfn, - unsigned long end_pfn); +int link_mem_sections(int nid, unsigned long start_pfn, + unsigned long end_pfn, + enum meminit_context context); #else static inline int link_mem_sections(int nid, unsigned long start_pfn, - unsigned long end_pfn) + unsigned long end_pfn, + enum meminit_context context) { return 0; } @@ -128,7 +130,8 @@ static inline int register_one_node(int if (error) return error; /* link memory sections under this node */ - error = link_mem_sections(nid, start_pfn, end_pfn); + error = link_mem_sections(nid, start_pfn, end_pfn, + MEMINIT_EARLY); } return error; --- a/mm/memory_hotplug.c~mm-dont-rely-on-system-state-to-detect-hot-plug-operations +++ a/mm/memory_hotplug.c @@ -1080,7 +1080,8 @@ int __ref add_memory_resource(int nid, s } /* link memory sections under this node.*/ - ret = link_mem_sections(nid, PFN_DOWN(start), PFN_UP(start + size - 1)); + ret = link_mem_sections(nid, PFN_DOWN(start), PFN_UP(start + size - 1), + MEMINIT_HOTPLUG); BUG_ON(ret); /* create new memmap entry */