From patchwork Tue Mar 22 21:38:33 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:33 -0700
Subject: [patch 001/227] linux/kthread.h: remove unused macros
Message-Id: <20220322213834.628B6C340EC@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Rasmus Villemoes
Subject: linux/kthread.h: remove unused macros

Ever since these macros were introduced in commit b56c0d8937e6
("kthread: implement kthread_worker"), there has been precisely one user
(commit 4d115420707a, "NVMe: Async IO queue deletion"), and that user
went away in 2016 with db3cbfff5bcc ("NVMe: IO queue deletion re-write").

Apart from being unused, these macros are also awkward to use (which may
contribute to them not being used): having a way to statically (or
on-stack) allocate the storage for the struct kthread_worker itself
doesn't help much, since one obviously also needs code for actually
_spawning_ the worker thread, and that code must have error checking.
And these days we have the kthread_create_worker() interface, which both
allocates the struct kthread_worker and spawns the kthread.

Link: https://lkml.kernel.org/r/20220314145343.494694-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes
Acked-by: Tejun Heo
Cc: "Eric W. Biederman"
Cc: Petr Mladek
Cc: David Hildenbrand
Cc: Yafang Shao
Cc: Cai Huoqing
Signed-off-by: Andrew Morton
---

 include/linux/kthread.h | 22 ----------------------
 1 file changed, 22 deletions(-)

--- a/include/linux/kthread.h~linux-kthreadh-remove-unused-macros
+++ a/include/linux/kthread.h
@@ -141,12 +141,6 @@ struct kthread_delayed_work {
 	struct timer_list timer;
 };
 
-#define KTHREAD_WORKER_INIT(worker)	{				\
-	.lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock),		\
-	.work_list = LIST_HEAD_INIT((worker).work_list),		\
-	.delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
-	}
-
 #define KTHREAD_WORK_INIT(work, fn)	{				\
 	.node = LIST_HEAD_INIT((work).node),				\
 	.func = (fn),							\
@@ -158,9 +152,6 @@ struct kthread_delayed_work {
 						TIMER_IRQSAFE),		\
 	}
 
-#define DEFINE_KTHREAD_WORKER(worker)					\
-	struct kthread_worker worker = KTHREAD_WORKER_INIT(worker)
-
 #define DEFINE_KTHREAD_WORK(work, fn)					\
 	struct kthread_work work = KTHREAD_WORK_INIT(work, fn)
 
@@ -168,19 +159,6 @@ struct kthread_delayed_work {
 	struct kthread_delayed_work dwork = \
 		KTHREAD_DELAYED_WORK_INIT(dwork, fn)
 
-/*
- * kthread_worker.lock needs its own lockdep class key when defined on
- * stack with lockdep enabled.  Use the following macros in such cases.
- */
-#ifdef CONFIG_LOCKDEP
-# define KTHREAD_WORKER_INIT_ONSTACK(worker)				\
-	({ kthread_init_worker(&worker); worker; })
-# define DEFINE_KTHREAD_WORKER_ONSTACK(worker)				\
-	struct kthread_worker worker = KTHREAD_WORKER_INIT_ONSTACK(worker)
-#else
-# define DEFINE_KTHREAD_WORKER_ONSTACK(worker) DEFINE_KTHREAD_WORKER(worker)
-#endif
-
 extern void __kthread_init_worker(struct kthread_worker *worker,
 				  const char *name, struct lock_class_key *key);
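[Not part of the patch; added for context.] A minimal sketch of the replacement pattern the changelog points to, using only interfaces from linux/kthread.h; the "example-worker" name and the example_* functions are hypothetical:

	#include <linux/err.h>
	#include <linux/kthread.h>

	static struct kthread_worker *example_worker;
	static struct kthread_work example_work;

	static void example_work_fn(struct kthread_work *work)
	{
		/* deferred work runs here, in the worker's kthread */
	}

	static int example_init(void)
	{
		/*
		 * Allocates the kthread_worker and spawns its kthread in
		 * one call - with the error checking that the removed
		 * static initializers could not provide.
		 */
		example_worker = kthread_create_worker(0, "example-worker");
		if (IS_ERR(example_worker))
			return PTR_ERR(example_worker);

		kthread_init_work(&example_work, example_work_fn);
		kthread_queue_work(example_worker, &example_work);
		return 0;
	}

	static void example_exit(void)
	{
		kthread_flush_work(&example_work);
		kthread_destroy_worker(example_worker);
	}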
From patchwork Tue Mar 22 21:38:36 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:36 -0700
Subject: [patch 002/227] scripts/spelling.txt: add more spellings to spelling.txt
Message-Id: <20220322213837.42856C340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Colin Ian King
Subject: scripts/spelling.txt: add more spellings to spelling.txt

Some of the more common spelling mistakes and typos that I've found while
fixing up spelling mistakes in the kernel in the past four months.

Link: https://lkml.kernel.org/r/20220216152343.105546-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King
Cc: Joe Perches
Signed-off-by: Andrew Morton
---

 scripts/spelling.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/scripts/spelling.txt~scripts-spellingtxt-add-more-spellings-to-spellingtxt
+++ a/scripts/spelling.txt
@@ -180,6 +180,7 @@ asuming||assuming
 asycronous||asynchronous
 asychronous||asynchronous
 asynchnous||asynchronous
+asynchronus||asynchronous
 asynchromous||asynchronous
 asymetric||asymmetric
 asymmeric||asymmetric
@@ -231,6 +232,7 @@ baloons||balloons
 bandwith||bandwidth
 banlance||balance
 batery||battery
+battey||battery
 beacuse||because
 becasue||because
 becomming||becoming
@@ -333,6 +335,7 @@ commoditiy||commodity
 comsume||consume
 comsumer||consumer
 comsuming||consuming
+comaptible||compatible
 compability||compatibility
 compaibility||compatibility
 comparsion||comparison
@@ -353,7 +356,9 @@ compoment||component
 comppatible||compatible
 compres||compress
 compresion||compression
+compresser||compressor
 comression||compression
+comsumed||consumed
 comunicate||communicate
 comunication||communication
 conbination||combination
@@ -530,6 +535,7 @@ dissconect||disconnect
 distiction||distinction
 divisable||divisible
 divsiors||divisors
+dsiabled||disabled
 docuentation||documentation
 documantation||documentation
 documentaion||documentation
@@ -677,6 +683,7 @@ frequence||frequency
 frequncy||frequency
 frequancy||frequency
 frome||from
+fronend||frontend
 fucntion||function
 fuction||function
 fuctions||functions
@@ -761,6 +768,7 @@ implmentation||implementation
 implmenting||implementing
 incative||inactive
 incomming||incoming
+incompaitiblity||incompatibility
 incompatabilities||incompatibilities
 incompatable||incompatible
 incompatble||incompatible
@@ -942,6 +950,7 @@ metdata||metadata
 micropone||microphone
 microprocesspr||microprocessor
 migrateable||migratable
+millenium||millennium
 milliseonds||milliseconds
 minium||minimum
 minimam||minimum
@@ -1007,6 +1016,7 @@ notity||notify
 nubmer||number
 numebr||number
 numner||number
+nunber||number
 obtaion||obtain
 obusing||abusing
 occassionally||occasionally
@@ -1136,6 +1146,7 @@ preprare||prepare
 pressre||pressure
 presuambly||presumably
 previosuly||previously
+previsously||previously
 primative||primitive
 princliple||principle
 priorty||priority
@@ -1297,6 +1308,7 @@ routins||routines
 rquest||request
 runing||running
 runned||ran
+runnnig||running
 runnning||running
 runtine||runtime
 sacrifying||sacrificing
@@ -1353,6 +1365,7 @@ similiar||similar
 simlar||similar
 simliar||similar
 simpified||simplified
+simultanous||simultaneous
 singaled||signaled
 singal||signal
 singed||signed
@@ -1461,6 +1474,7 @@ syste||system
 sytem||system
 sythesis||synthesis
 taht||that
+tained||tainted
 tansmit||transmit
 targetted||targeted
 targetting||targeting
@@ -1489,6 +1503,7 @@ timout||timeout
 tmis||this
 toogle||toggle
 torerable||tolerable
+torlence||tolerance
 traget||target
 traking||tracking
 tramsmitted||transmitted
@@ -1503,6 +1518,7 @@ transferd||transferred
 transfered||transferred
 transfering||transferring
 transision||transition
+transistioned||transitioned
 transmittd||transmitted
 transormed||transformed
 trasfer||transfer
From patchwork Tue Mar 22 21:38:39 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:39 -0700
Subject: [patch 003/227] ntfs: add sanity check on allocation size
Message-Id: <20220322213840.3117DC340F4@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Dongliang Mu
Subject: ntfs: add sanity check on allocation size

ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation
size.  This triggers a BUG in the __ntfs_malloc function.  Fix this by
adding a sanity check on ni->attr_list_size.

Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn
Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu
Acked-by: Anton Altaparmakov
Signed-off-by: Andrew Morton
---

 fs/ntfs/inode.c | 4 ++++
 1 file changed, 4 insertions(+)

--- a/fs/ntfs/inode.c~ntfs-add-sanity-check-on-allocation-size
+++ a/fs/ntfs/inode.c
@@ -1881,6 +1881,10 @@ int ntfs_read_inode_mount(struct inode *
 	}
 	/* Now allocate memory for the attribute list. */
 	ni->attr_list_size = (u32)ntfs_attr_size(a);
+	if (!ni->attr_list_size) {
+		ntfs_error(sb, "Attr_list_size is zero");
+		goto put_err_out;
+	}
 	ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
 	if (!ni->attr_list) {
 		ntfs_error(sb, "Not enough memory to allocate buffer "
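[Not part of the patch; added for context.] The BUG the changelog refers to is the allocator's assertion that the requested size is non-zero. A paraphrased, abbreviated sketch of __ntfs_malloc() from fs/ntfs/malloc.h (reconstructed from memory, so treat the details as approximate):

	static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
	{
		if (likely(size <= PAGE_SIZE)) {
			/* a crafted volume could previously reach this with
			 * size == 0 via ntfs_attr_size() */
			BUG_ON(!size);
			return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
		}
		return __vmalloc(size, gfp_mask);
	}

With the new check, a zero ni->attr_list_size is reported through ntfs_error() and the mount fails cleanly instead of hitting the assertion.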
From patchwork Tue Mar 22 21:38:42 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:42 -0700
Subject: [patch 004/227] ocfs2: cleanup some return variables
Message-Id: <20220322213843.2F9C3C340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Joseph Qi
Subject: ocfs2: cleanup some return variables

Simply return directly instead of assigning the return value to another
variable.

Link: https://lkml.kernel.org/r/20220114021641.13927-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi
Reported-by: Zeal Robot
Cc: Minghao Chi
Cc: CGEL ZTE
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
---

 fs/ocfs2/file.c       |  9 +++------
 fs/ocfs2/stack_user.c | 18 ++++++------------
 2 files changed, 9 insertions(+), 18 deletions(-)

--- a/fs/ocfs2/file.c~ocfs2-cleanup-some-return-variables
+++ a/fs/ocfs2/file.c
@@ -540,15 +540,12 @@ int ocfs2_add_inode_data(struct ocfs2_su
 			 struct ocfs2_alloc_context *meta_ac,
 			 enum ocfs2_alloc_restarted *reason_ret)
 {
-	int ret;
 	struct ocfs2_extent_tree et;
 
 	ocfs2_init_dinode_extent_tree(&et, INODE_CACHE(inode), fe_bh);
-	ret = ocfs2_add_clusters_in_btree(handle, &et, logical_offset,
-					  clusters_to_add, mark_unwritten,
-					  data_ac, meta_ac, reason_ret);
-
-	return ret;
+	return ocfs2_add_clusters_in_btree(handle, &et, logical_offset,
+					   clusters_to_add, mark_unwritten,
+					   data_ac, meta_ac, reason_ret);
 }
 
 static int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
--- a/fs/ocfs2/stack_user.c~ocfs2-cleanup-some-return-variables
+++ a/fs/ocfs2/stack_user.c
@@ -683,28 +683,22 @@ static int user_dlm_lock(struct ocfs2_cl
 			 void *name,
 			 unsigned int namelen)
 {
-	int ret;
-
 	if (!lksb->lksb_fsdlm.sb_lvbptr)
 		lksb->lksb_fsdlm.sb_lvbptr = (char *)lksb +
 					     sizeof(struct dlm_lksb);
 
-	ret = dlm_lock(conn->cc_lockspace, mode, &lksb->lksb_fsdlm,
-		       flags|DLM_LKF_NODLCKWT, name, namelen, 0,
-		       fsdlm_lock_ast_wrapper, lksb,
-		       fsdlm_blocking_ast_wrapper);
-	return ret;
+	return dlm_lock(conn->cc_lockspace, mode, &lksb->lksb_fsdlm,
+			flags|DLM_LKF_NODLCKWT, name, namelen, 0,
+			fsdlm_lock_ast_wrapper, lksb,
+			fsdlm_blocking_ast_wrapper);
 }
 
 static int user_dlm_unlock(struct ocfs2_cluster_connection *conn,
 			   struct ocfs2_dlm_lksb *lksb,
 			   u32 flags)
 {
-	int ret;
-
-	ret = dlm_unlock(conn->cc_lockspace, lksb->lksb_fsdlm.sb_lkid,
-			 flags, &lksb->lksb_fsdlm, lksb);
-	return ret;
+	return dlm_unlock(conn->cc_lockspace, lksb->lksb_fsdlm.sb_lkid,
+			  flags, &lksb->lksb_fsdlm, lksb);
 }
 
 static int user_dlm_lock_status(struct ocfs2_dlm_lksb *lksb)
From patchwork Tue Mar 22 21:38:45 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:45 -0700
Subject: [patch 005/227] fs/ocfs2: fix comments mentioning i_mutex
Message-Id: <20220322213846.39595C340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: hongnanli
Subject: fs/ocfs2: fix comments mentioning i_mutex

inode->i_mutex has been replaced with inode->i_rwsem long ago.  Fix
comments still mentioning i_mutex.

Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com
Signed-off-by: hongnanli
Acked-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
---

 fs/ocfs2/alloc.c               | 2 +-
 fs/ocfs2/aops.c                | 2 +-
 fs/ocfs2/cluster/nodemanager.c | 2 +-
 fs/ocfs2/dir.c                 | 4 ++--
 fs/ocfs2/file.c                | 4 ++--
 fs/ocfs2/inode.c               | 2 +-
 fs/ocfs2/localalloc.c          | 6 +++---
 fs/ocfs2/namei.c               | 2 +-
 fs/ocfs2/ocfs2.h               | 4 ++--
 fs/ocfs2/quota_global.c        | 2 +-
 fs/ocfs2/xattr.c               | 2 +-
 11 files changed, 16 insertions(+), 16 deletions(-)

--- a/fs/ocfs2/alloc.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/alloc.c
@@ -5981,7 +5981,7 @@ bail:
 	return status;
 }
 
-/* Expects you to already be holding tl_inode->i_mutex */
+/* Expects you to already be holding tl_inode->i_rwsem */
 int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
 {
 	int status;
--- a/fs/ocfs2/aops.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/aops.c
@@ -2311,7 +2311,7 @@ static int ocfs2_dio_end_io_write(struct
 
 	down_write(&oi->ip_alloc_sem);
 
-	/* Delete orphan before acquire i_mutex. */
+	/* Delete orphan before acquire i_rwsem. */
 	if (dwc->dw_orphaned) {
 		BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
--- a/fs/ocfs2/cluster/nodemanager.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/cluster/nodemanager.c
@@ -689,7 +689,7 @@ static struct config_group *o2nm_cluster
 	struct o2nm_node_group *ns = NULL;
 	struct config_group *o2hb_group = NULL, *ret = NULL;
 
-	/* this runs under the parent dir's i_mutex; there can be only
+	/* this runs under the parent dir's i_rwsem; there can be only
 	 * one caller in here at a time */
 	if (o2nm_single_cluster)
 		return ERR_PTR(-ENOSPC);
--- a/fs/ocfs2/dir.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/dir.c
@@ -1957,7 +1957,7 @@ bail_nolock:
 }
 
 /*
- * NOTE: this should always be called with parent dir i_mutex taken.
+ * NOTE: this should always be called with parent dir i_rwsem taken.
  */
 int ocfs2_find_files_on_disk(const char *name,
 			     int namelen,
@@ -2003,7 +2003,7 @@ int ocfs2_lookup_ino_from_name(struct in
  * Return 0 if the name does not exist
  * Return -EEXIST if the directory contains the name
  *
- * Callers should have i_mutex + a cluster lock on dir
+ * Callers should have i_rwsem + a cluster lock on dir
  */
 int ocfs2_check_dir_for_entry(struct inode *dir,
 			      const char *name,
--- a/fs/ocfs2/file.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/file.c
@@ -270,7 +270,7 @@ int ocfs2_update_inode_atime(struct inod
 
 	/*
 	 * Don't use ocfs2_mark_inode_dirty() here as we don't always
-	 * have i_mutex to guard against concurrent changes to other
+	 * have i_rwsem to guard against concurrent changes to other
 	 * inode fields.
 	 */
 	inode->i_atime = current_time(inode);
@@ -1065,7 +1065,7 @@ static int ocfs2_extend_file(struct inod
 	/*
 	 * The alloc sem blocks people in read/write from reading our
 	 * allocation until we're done changing it. We depend on
-	 * i_mutex to block other extend/truncate calls while we're
+	 * i_rwsem to block other extend/truncate calls while we're
 	 * here. We even have to hold it for sparse files because there
 	 * might be some tail zeroing.
 	 */
--- a/fs/ocfs2/inode.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/inode.c
@@ -713,7 +713,7 @@ bail:
 /*
  * Serialize with orphan dir recovery. If the process doing
  * recovery on this orphan dir does an iget() with the dir
- * i_mutex held, we'll deadlock here. Instead we detect this
+ * i_rwsem held, we'll deadlock here. Instead we detect this
  * and exit early - recovery will wipe this inode for us.
  */
 static int ocfs2_check_orphan_recovery_state(struct ocfs2_super *osb,
--- a/fs/ocfs2/localalloc.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/localalloc.c
@@ -606,7 +606,7 @@ out:
 
 /*
  * make sure we've got at least bits_wanted contiguous bits in the
- * local alloc. You lose them when you drop i_mutex.
+ * local alloc. You lose them when you drop i_rwsem.
  *
  * We will add ourselves to the transaction passed in, but may start
  * our own in order to shift windows.
@@ -636,7 +636,7 @@ int ocfs2_reserve_local_alloc_bits(struc
 
 	/*
	 * We must double check state and allocator bits because
-	 * another process may have changed them while holding i_mutex.
+	 * another process may have changed them while holding i_rwsem.
	 */
 	spin_lock(&osb->osb_lock);
 	if (!ocfs2_la_state_enabled(osb) ||
@@ -1029,7 +1029,7 @@ enum ocfs2_la_event {
 /*
  * Given an event, calculate the size of our next local alloc window.
  *
- * This should always be called under i_mutex of the local alloc inode
+ * This should always be called under i_rwsem of the local alloc inode
  * so that local alloc disabling doesn't race with processes trying to
  * use the allocator.
  *
--- a/fs/ocfs2/namei.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/namei.c
@@ -476,7 +476,7 @@ leave:
 		ocfs2_free_alloc_context(meta_ac);
 
 	/*
-	 * We should call iput after the i_mutex of the bitmap been
+	 * We should call iput after the i_rwsem of the bitmap been
	 * unlocked in ocfs2_free_alloc_context, or the
	 * ocfs2_delete_inode will mutex_lock again.
	 */
--- a/fs/ocfs2/ocfs2.h~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/ocfs2.h
@@ -355,7 +355,7 @@ struct ocfs2_super
 	struct delayed_work		la_enable_wq;
 
 	/*
-	 * Must hold local alloc i_mutex and osb->osb_lock to change
+	 * Must hold local alloc i_rwsem and osb->osb_lock to change
	 * local_alloc_bits. Reads can be done under either lock.
	 */
 	unsigned int local_alloc_bits;
@@ -430,7 +430,7 @@ struct ocfs2_super
 	atomic_t osb_tl_disable;
 	/*
	 * How many clusters in our truncate log.
-	 * It must be protected by osb_tl_inode->i_mutex.
+	 * It must be protected by osb_tl_inode->i_rwsem.
	 */
 	unsigned int truncated_clusters;
--- a/fs/ocfs2/quota_global.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/quota_global.c
@@ -36,7 +36,7 @@
  * should be obeyed by all the functions:
  * - any write of quota structure (either to local or global file) is protected
  *   by dqio_sem or dquot->dq_lock.
- * - any modification of global quota file holds inode cluster lock, i_mutex,
+ * - any modification of global quota file holds inode cluster lock, i_rwsem,
  *   and ip_alloc_sem of the global quota file (achieved by
  *   ocfs2_lock_global_qf). It also has to hold qinfo_lock.
  * - an allocation of new blocks for local quota file is protected by
--- a/fs/ocfs2/xattr.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/xattr.c
@@ -7205,7 +7205,7 @@ out:
  * Used for reflink a non-preserve-security file.
  *
  * It uses common api like ocfs2_xattr_set, so the caller
- * must not hold any lock expect i_mutex.
+ * must not hold any lock except i_rwsem.
  */
 int ocfs2_init_security_and_acl(struct inode *dir,
 				struct inode *inode,
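[Not part of the patch; added for context.] A minimal illustration of the fs.h helpers that wrap i_rwsem today; example_update_inode is a hypothetical caller:

	#include <linux/fs.h>

	static void example_update_inode(struct inode *inode)
	{
		inode_lock(inode);	/* down_write(&inode->i_rwsem) */
		/* ... modify inode fields ... */
		inode_unlock(inode);	/* up_write(&inode->i_rwsem) */
	}

Readers take inode_lock_shared()/inode_unlock_shared() instead, which is the flexibility the plain mutex never offered.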
From patchwork Tue Mar 22 21:38:48 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:48 -0700
Subject: [patch 006/227] doc: convert 'subsection' to 'section' in gfp.h
Message-Id: <20220322213849.48672C340EC@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: NeilBrown
Subject: doc: convert 'subsection' to 'section' in gfp.h

Patch series "Remove remaining parts of congestion tracking code", v2.

This patch (of 11):

Various DOC: sections in gfp.h have subsection headers (~~~) but the
place where they are included in mm-api.rst does not have section, only
chapters.

So convert to section headers (---) to avoid confusion.  Specifically,
if sections are added later in mm-api.rst, an error results.

Link: https://lkml.kernel.org/r/164549971112.9187.16871723439770288255.stgit@noble.brown
Link: https://lkml.kernel.org/r/164549983733.9187.17894407453436115822.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Jan Kara
Cc: Wu Fengguang
Cc: Jaegeuk Kim
Cc: Chao Yu
Cc: Jeff Layton
Cc: Ilya Dryomov
Cc: Miklos Szeredi
Cc: Trond Myklebust
Cc: Anna Schumaker
Cc: Ryusuke Konishi
Cc: Darrick J. Wong
Cc: Philipp Reisner
Cc: Lars Ellenberg
Cc: Paolo Valente
Cc: Jens Axboe
Signed-off-by: Andrew Morton
---

 include/linux/gfp.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/include/linux/gfp.h~doc-convert-subsection-to-section-in-gfph
+++ a/include/linux/gfp.h
@@ -79,7 +79,7 @@ struct vm_area_struct;
  * DOC: Page mobility and placement hints
  *
  * Page mobility and placement hints
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * ---------------------------------
  *
  * These flags provide hints about how mobile the page is. Pages with similar
  * mobility are placed within the same pageblocks to minimise problems due
@@ -112,7 +112,7 @@ struct vm_area_struct;
  * DOC: Watermark modifiers
  *
  * Watermark modifiers -- controls access to emergency reserves
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * ------------------------------------------------------------
  *
  * %__GFP_HIGH indicates that the caller is high-priority and that granting
  * the request is necessary before the system can make forward progress.
@@ -144,7 +144,7 @@ struct vm_area_struct;
  * DOC: Reclaim modifiers
  *
  * Reclaim modifiers
- * ~~~~~~~~~~~~~~~~~
+ * -----------------
  * Please note that all the following flags are only applicable to sleepable
  * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
  *
@@ -224,7 +224,7 @@ struct vm_area_struct;
  * DOC: Action modifiers
  *
  * Action modifiers
- * ~~~~~~~~~~~~~~~~
+ * ----------------
  *
  * %__GFP_NOWARN suppresses allocation failure reports.
  *
@@ -256,7 +256,7 @@ struct vm_area_struct;
  * DOC: Useful GFP flag combinations
  *
  * Useful GFP flag combinations
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * ----------------------------
  *
  * Useful GFP flag combinations that are commonly used.  It is recommended
  * that subsystems start with one of these combinations and then set/clear
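[Not part of the patch; added for context.] A hypothetical DOC: block showing the convention after this change: the heading is repeated below the DOC: marker and underlined with dashes, so Sphinx renders it as a section (rather than a subsection) when mm-api.rst pulls the comment in through the kernel-doc directive, as the next patch does for readahead.c. "Example modifiers" and %__GFP_EXAMPLE are made-up names:

	/**
	 * DOC: Example modifiers
	 *
	 * Example modifiers
	 * -----------------
	 *
	 * %__GFP_EXAMPLE would be documented here.
	 */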
From patchwork Tue Mar 22 21:38:51 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:38:51 -0700
Subject: [patch 007/227] mm: document and polish read-ahead code
Message-Id: <20220322213852.702A4C340F2@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: NeilBrown
Subject: mm: document and polish read-ahead code

Add some "big-picture" documentation for read-ahead and polish the code
to make it fit this documentation.

The meaning of ->async_size is clarified to match its name: any request
to ->readahead() has a sync part and an async part.  The caller will
wait for the sync pages to complete, but will not wait for the async
pages.  The first async page is still marked PG_readahead.

Note that the current function names page_cache_sync_ra() and
page_cache_async_ra() are misleading.  All ra requests are partly sync
and partly async, so either part can be empty.  A page_cache_sync_ra()
request will usually set ->async_size non-zero, implying it is not all
synchronous.

When a non-zero req_count is passed to page_cache_async_ra(), the
implication is that some prefix of the request is synchronous, though
the calculation made there is incorrect - I haven't tried to fix it.

Link: https://lkml.kernel.org/r/164549983734.9187.11586890887006601405.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 Documentation/core-api/mm-api.rst | 19 ++++-
 Documentation/filesystems/vfs.rst | 16 ++--
 include/linux/fs.h                |  9 +-
 mm/readahead.c                    | 99 ++++++++++++++++++++++++++++
 4 files changed, 133 insertions(+), 10 deletions(-)

--- a/Documentation/core-api/mm-api.rst~mm-document-and-polish-read-ahead-code
+++ a/Documentation/core-api/mm-api.rst
@@ -58,15 +58,30 @@ Virtually Contiguous Mappings
 File Mapping and Page Cache
 ===========================
 
-.. kernel-doc:: mm/readahead.c
-   :export:
+Filemap
+-------
 
 .. kernel-doc:: mm/filemap.c
    :export:
 
+Readahead
+---------
+
+.. kernel-doc:: mm/readahead.c
+   :doc: Readahead Overview
+
+.. kernel-doc:: mm/readahead.c
+   :export:
+
+Writeback
+---------
+
 .. kernel-doc:: mm/page-writeback.c
    :export:
 
+Truncate
+--------
+
 .. kernel-doc:: mm/truncate.c
    :export:
--- a/Documentation/filesystems/vfs.rst~mm-document-and-polish-read-ahead-code
+++ a/Documentation/filesystems/vfs.rst
@@ -806,12 +806,16 @@ cache in your filesystem.  The following
	object.  The pages are consecutive in the page cache and are
	locked.  The implementation should decrement the page refcount
	after starting I/O on each page.  Usually the page will be
-	unlocked by the I/O completion handler.  If the filesystem decides
-	to stop attempting I/O before reaching the end of the readahead
-	window, it can simply return.  The caller will decrement the page
-	refcount and unlock the remaining pages for you.  Set PageUptodate
-	if the I/O completes successfully.  Setting PageError on any page
-	will be ignored; simply unlock the page if an I/O error occurs.
+	unlocked by the I/O completion handler.  The set of pages are
+	divided into some sync pages followed by some async pages,
+	rac->ra->async_size gives the number of async pages.  The
+	filesystem should attempt to read all sync pages but may decide
+	to stop once it reaches the async pages.  If it does decide to
+	stop attempting I/O, it can simply return.  The caller will
+	remove the remaining pages from the address space, unlock them
+	and decrement the page refcount.  Set PageUptodate if the I/O
+	completes successfully.  Setting PageError on any page will be
+	ignored; simply unlock the page if an I/O error occurs.
 
 ``readpages``
	called by the VM to read pages associated with the address_space
--- a/include/linux/fs.h~mm-document-and-polish-read-ahead-code
+++ a/include/linux/fs.h
@@ -930,10 +930,15 @@ struct fown_struct {
  * struct file_ra_state - Track a file's readahead state.
  * @start: Where the most recent readahead started.
  * @size: Number of pages read in the most recent readahead.
- * @async_size: Start next readahead when this many pages are left.
- * @ra_pages: Maximum size of a readahead request.
+ * @async_size: Number of pages that were/are not needed immediately
+ *      and so were/are genuinely "ahead".  Start next readahead when
+ *      the first of these pages is accessed.
+ * @ra_pages: Maximum size of a readahead request, copied from the bdi.
  * @mmap_miss: How many mmap accesses missed in the page cache.
  * @prev_pos: The last byte in the most recent read request.
+ *
+ * When this structure is passed to ->readahead(), the "most recent"
+ * readahead means the current readahead.
  */
 struct file_ra_state {
	pgoff_t start;
--- a/mm/readahead.c~mm-document-and-polish-read-ahead-code
+++ a/mm/readahead.c
@@ -8,6 +8,105 @@
  *		Initial version.
  */
 
+/**
+ * DOC: Readahead Overview
+ *
+ * Readahead is used to read content into the page cache before it is
+ * explicitly requested by the application.  Readahead only ever
+ * attempts to read pages that are not yet in the page cache.  If a
+ * page is present but not up-to-date, readahead will not try to read
+ * it.  In that case a simple ->readpage() will be requested.
+ *
+ * Readahead is triggered when an application read request (whether a
+ * system call or a page fault) finds that the requested page is not in
+ * the page cache, or that it is in the page cache and has the
+ * %PG_readahead flag set.  This flag indicates that the page was loaded
+ * as part of a previous read-ahead request and now that it has been
+ * accessed, it is time for the next read-ahead.
+ *
+ * Each readahead request is partly synchronous read, and partly async
+ * read-ahead.  This is reflected in the struct file_ra_state which
+ * contains ->size being the total number of pages, and ->async_size
+ * which is the number of pages in the async section.  The first page in
+ * this async section will have %PG_readahead set as a trigger for a
+ * subsequent read ahead.  Once a series of sequential reads has been
+ * established, there should be no need for a synchronous component and
+ * all read ahead requests will be fully asynchronous.
+ *
+ * When either of the triggers causes a readahead, three numbers need to
+ * be determined: the start of the region, the size of the region, and
+ * the size of the async tail.
+ *
+ * The start of the region is simply the first page address at or after
+ * the accessed address, which is not currently populated in the page
+ * cache.  This is found with a simple search in the page cache.
+ *
+ * The size of the async tail is determined by subtracting the size that
+ * was explicitly requested from the determined request size, unless
+ * this would be less than zero - then zero is used.  NOTE THIS
+ * CALCULATION IS WRONG WHEN THE START OF THE REGION IS NOT THE ACCESSED
+ * PAGE.
+ *
+ * The size of the region is normally determined from the size of the
+ * previous readahead which loaded the preceding pages.  This may be
+ * discovered from the struct file_ra_state for simple sequential reads,
+ * or from examining the state of the page cache when multiple
+ * sequential reads are interleaved.  Specifically: where the readahead
+ * was triggered by the %PG_readahead flag, the size of the previous
+ * readahead is assumed to be the number of pages from the triggering
+ * page to the start of the new readahead.  In these cases, the size of
+ * the previous readahead is scaled, often doubled, for the new
+ * readahead, though see get_next_ra_size() for details.
+ *
+ * If the size of the previous read cannot be determined, the number of
+ * preceding pages in the page cache is used to estimate the size of
+ * a previous read.  This estimate could easily be misled by random
+ * reads being coincidentally adjacent, so it is ignored unless it is
+ * larger than the current request, and it is not scaled up, unless it
+ * is at the start of file.
+ *
+ * In general read ahead is accelerated at the start of the file, as
+ * reads from there are often sequential.  There are other minor
+ * adjustments to the read ahead size in various special cases and these
+ * are best discovered by reading the code.
+ *
+ * The above calculation determines the readahead, to which any requested
+ * read size may be added.
+ *
+ * Readahead requests are sent to the filesystem using the ->readahead()
+ * address space operation, for which mpage_readahead() is a canonical
+ * implementation.  ->readahead() should normally initiate reads on all
+ * pages, but may fail to read any or all pages without causing an IO
+ * error.  The page cache reading code will issue a ->readpage() request
+ * for any page which ->readahead() does not provide, and only an error
+ * from this will be final.
+ *
+ * ->readahead() will generally call readahead_page() repeatedly to get
+ * each page from those prepared for read ahead.  It may fail to read a
+ * page by:
+ *
+ * * not calling readahead_page() sufficiently many times, effectively
+ *   ignoring some pages, as might be appropriate if the path to
+ *   storage is congested.
+ *
+ * * failing to actually submit a read request for a given page,
+ *   possibly due to insufficient resources, or
+ *
+ * * getting an error during subsequent processing of a request.
+ *
+ * In the last two cases, the page should be unlocked to indicate that
+ * the read attempt has failed.  In the first case the page will be
+ * unlocked by the caller.
+ *
+ * Those pages not in the final ``async_size`` of the request should be
+ * considered to be important and ->readahead() should not fail them due
+ * to congestion or temporary resource unavailability, but should wait
+ * for necessary resources (e.g. memory or indexing information) to
+ * become available.  Pages in the final ``async_size`` may be
+ * considered less urgent and failure to read them is more acceptable.
+ * They will eventually be read individually using ->readpage().
+ */
+
 #include 
 #include 
 #include 
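[Not part of the patch; added for context.] A minimal hypothetical ->readahead() skeleton that follows the contract documented above; myfs_submit_read() is a made-up submission helper, the rest is the API as described:

	#include <linux/pagemap.h>

	static int myfs_submit_read(struct page *page);	/* hypothetical */

	static void myfs_readahead(struct readahead_control *rac)
	{
		struct page *page;

		/*
		 * readahead_page() hands back each prepared page, locked
		 * and with a reference held for us.
		 */
		while ((page = readahead_page(rac))) {
			if (myfs_submit_read(page) < 0) {
				/*
				 * Submission failed: unlock to signal that
				 * this read attempt failed; the caller will
				 * fall back to ->readpage().
				 */
				unlock_page(page);
			}
			/*
			 * Drop our reference; on success the I/O completion
			 * handler sets PageUptodate and unlocks the page.
			 */
			put_page(page);
		}
	}

Stopping the loop early (for the final async_size pages) is also legal; the caller unlocks and releases any pages never fetched with readahead_page().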
From: NeilBrown
Subject: mm: improve cleanup when ->readpages doesn't process all pages

If ->readpages doesn't process all the pages, then it is best to act as
though they weren't requested so that a subsequent readahead can try
again.

So:

- remove any 'ahead' pages from the page cache so they can be loaded
  with ->readahead() rather than multiple ->readpage() calls

- update the file_ra_state to reflect the reads that were actually
  submitted.

This allows ->readpages() to abort early, e.g. due to congestion, which
will then allow us to remove the inode_read_congested() test from
page_cache_async_ra().

Link: https://lkml.kernel.org/r/164549983736.9187.16755913785880819183.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 mm/readahead.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

--- a/mm/readahead.c~mm-improve-cleanup-when-readpages-doesnt-process-all-pages
+++ a/mm/readahead.c
@@ -104,7 +104,13 @@
  * for necessary resources (e.g. memory or indexing information) to
  * become available. Pages in the final ``async_size`` may be
  * considered less urgent and failure to read them is more acceptable.
- * They will eventually be read individually using ->readpage().
+ * In this case it is best to use delete_from_page_cache() to remove the
+ * pages from the page cache as is automatically done for pages that
+ * were not fetched with readahead_page(). This will allow a
+ * subsequent synchronous read ahead request to try them again. If they
+ * are left in the page cache, then they will be read individually using
+ * ->readpage().
+ *
  */

 #include
@@ -226,8 +232,17 @@ static void read_pages(struct readahead_

 	if (aops->readahead) {
 		aops->readahead(rac);
-		/* Clean up the remaining pages */
+		/*
+		 * Clean up the remaining pages. The sizes in ->ra
+		 * may be used to size the next read-ahead, so make sure
+		 * they accurately reflect what happened.
+		 */
 		while ((page = readahead_page(rac))) {
+			rac->ra->size -= 1;
+			if (rac->ra->async_size > 0) {
+				rac->ra->async_size -= 1;
+				delete_from_page_cache(page);
+			}
 			unlock_page(page);
 			put_page(page);
 		}
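As an illustration of the bookkeeping this hunk adds, here is a minimal
userspace model (hypothetical names and values; the real code manipulates
struct file_ra_state inside read_pages(), and only drops pages that still
sit in the async tail from the page cache):

  #include <stdio.h>

  struct file_ra_state { unsigned size, async_size; };

  /* For every page ->readahead() left unconsumed, shrink ra->size; pages
   * still inside the async tail are treated as "never requested" by
   * shrinking ra->async_size too (the kernel also deletes those from the
   * page cache). Assumes leftover <= size, as in read_pages(). */
  static void account_unconsumed(struct file_ra_state *ra, unsigned leftover)
  {
      while (leftover--) {
          ra->size -= 1;
          if (ra->async_size > 0)
              ra->async_size -= 1;
      }
  }

  int main(void)
  {
      struct file_ra_state ra = { .size = 32, .async_size = 16 };

      account_unconsumed(&ra, 20);  /* ->readahead() stopped 20 pages early */
      printf("size=%u async_size=%u\n", ra.size, ra.async_size);  /* 12 0 */
      return 0;
  }

With async_size trimmed to zero, a later synchronous readahead sees those
offsets as never requested and simply tries them again.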
From patchwork Tue Mar 22 21:38:58 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:38:58 -0700
Subject: [patch 009/227] fuse: remove reliance on bdi congestion
Message-Id: <20220322213858.A5995C340EC@smtp.kernel.org>

From: NeilBrown
Subject: fuse: remove reliance on bdi congestion

The bdi congestion tracking is not widely used and will be removed. Fuse
is one of a small number of filesystems that uses it, setting both the
sync (read) and async (write) congestion flags at what it determines are
appropriate times.

The only remaining effect of the sync flag is to cause read-ahead to be
skipped. The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.

So instead of setting the flags, change:

- .readahead to stop when it has submitted all non-async pages for read.

- .writepages to do nothing if WB_SYNC_NONE and the flag would be set.

- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
  flag would be set.

The .writepage change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.

Link: https://lkml.kernel.org/r/164549983737.9187.2627117501000365074.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 fs/fuse/control.c | 17 -----------------
 fs/fuse/dev.c     |  8 --------
 fs/fuse/file.c    | 17 +++++++++++++++++
 3 files changed, 17 insertions(+), 25 deletions(-)

--- a/fs/fuse/control.c~fuse-remove-reliance-on-bdi-congestion
+++ a/fs/fuse/control.c
@@ -164,7 +164,6 @@ static ssize_t fuse_conn_congestion_thre
 {
 	unsigned val;
 	struct fuse_conn *fc;
-	struct fuse_mount *fm;
 	ssize_t ret;

 	ret = fuse_conn_limit_write(file, buf, count, ppos, &val,
@@ -178,22 +177,6 @@ static ssize_t fuse_conn_congestion_thre
 	down_read(&fc->killsb);
 	spin_lock(&fc->bg_lock);
 	fc->congestion_threshold = val;
-
-	/*
-	 * Get any fuse_mount belonging to this fuse_conn; s_bdi is
-	 * shared between all of them
-	 */
-
-	if (!list_empty(&fc->mounts)) {
-		fm = list_first_entry(&fc->mounts, struct fuse_mount, fc_entry);
-		if (fc->num_background < fc->congestion_threshold) {
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		} else {
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		}
-	}
 	spin_unlock(&fc->bg_lock);
 	up_read(&fc->killsb);
 	fuse_conn_put(fc);
--- a/fs/fuse/dev.c~fuse-remove-reliance-on-bdi-congestion
+++ a/fs/fuse/dev.c
@@ -315,10 +315,6 @@ void fuse_request_end(struct fuse_req *r
 			wake_up(&fc->blocked_waitq);
 		}

-		if (fc->num_background == fc->congestion_threshold && fm->sb) {
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			clear_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		}
 		fc->num_background--;
 		fc->active_background--;
 		flush_bg_queue(fc);
@@ -540,10 +536,6 @@ static bool fuse_request_queue_backgroun
 		fc->num_background++;
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
-		if (fc->num_background == fc->congestion_threshold && fm->sb) {
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
-			set_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
-		}
 		list_add_tail(&req->list, &fc->bg_queue);
 		flush_bg_queue(fc);
 		queued = true;
--- a/fs/fuse/file.c~fuse-remove-reliance-on-bdi-congestion
+++ a/fs/fuse/file.c
@@ -966,6 +966,14 @@ static void fuse_readahead(struct readah
 		struct fuse_io_args *ia;
 		struct fuse_args_pages *ap;

+		if (fc->num_background >= fc->congestion_threshold &&
+		    rac->ra->async_size >= readahead_count(rac))
+			/*
+			 * Congested and only async pages left, so skip the
+			 * rest.
+			 */
+			break;
+
 		nr_pages = readahead_count(rac) - nr_pages;
 		if (nr_pages > max_pages)
 			nr_pages = max_pages;
@@ -1959,6 +1967,7 @@ err:
 static int fuse_writepage(struct page *page, struct writeback_control *wbc)
 {
+	struct fuse_conn *fc = get_fuse_conn(page->mapping->host);
 	int err;

 	if (fuse_page_is_writeback(page->mapping->host, page->index)) {
@@ -1974,6 +1983,10 @@ static int fuse_writepage(struct page *p
 		return 0;
 	}

+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    fc->num_background >= fc->congestion_threshold)
+		return AOP_WRITEPAGE_ACTIVATE;
+
 	err = fuse_writepage_locked(page);
 	unlock_page(page);

@@ -2227,6 +2240,10 @@ static int fuse_writepages(struct addres
 	if (fuse_is_bad(inode))
 		goto out;

+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    fc->num_background >= fc->congestion_threshold)
+		return 0;
+
 	data.inode = inode;
 	data.wpa = NULL;
 	data.ff = NULL;
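The rule these hooks implement can be shown with a toy model (names and
values are illustrative only, not the fuse API): best-effort writeback is
deferred when the connection is busy, while integrity writeback is never
skipped.

  #include <stdio.h>

  enum { WB_SYNC_NONE, WB_SYNC_ALL };
  enum wb_result { WB_WRITTEN, WB_SKIPPED, WB_ACTIVATE };

  static enum wb_result writepage_model(int sync_mode, unsigned num_background,
                                        unsigned congestion_threshold)
  {
      /* Integrity writeback (WB_SYNC_ALL) must never be skipped. */
      if (sync_mode == WB_SYNC_NONE && num_background >= congestion_threshold)
          return WB_ACTIVATE;  /* like returning AOP_WRITEPAGE_ACTIVATE */
      return WB_WRITTEN;
  }

  int main(void)
  {
      printf("%d\n", writepage_model(WB_SYNC_NONE, 12, 8));  /* 2: deferred */
      printf("%d\n", writepage_model(WB_SYNC_ALL, 12, 8));   /* 0: written anyway */
      return 0;
  }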
From patchwork Tue Mar 22 21:39:01 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:01 -0700
Subject: [patch 010/227] nfs: remove reliance on bdi congestion
Message-Id: <20220322213901.C18FBC340EC@smtp.kernel.org>

From: NeilBrown
Subject: nfs: remove reliance on bdi congestion

The bdi congestion tracking is not widely used and will be removed. NFS
is one of a small number of filesystems that uses it, setting just the
async (write) congestion flag at what it determines are appropriate
times.

The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.

So instead of setting the flag, set an internal flag and change:

- .writepages to do nothing if WB_SYNC_NONE and the flag is set.

- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
  flag is set.

The .writepage change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.

Link: https://lkml.kernel.org/r/164549983738.9187.3972219847989393182.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 fs/nfs/write.c            | 14 +++++++++++---
 include/linux/nfs_fs_sb.h |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

--- a/fs/nfs/write.c~nfs-remove-reliance-on-bdi-congestion
+++ a/fs/nfs/write.c
@@ -417,7 +417,7 @@ static void nfs_set_page_writeback(struc

 	if (atomic_long_inc_return(&nfss->writeback) >
 			NFS_CONGESTION_ON_THRESH)
-		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+		nfss->write_congested = 1;
 }

 static void nfs_end_page_writeback(struct nfs_page *req)
@@ -433,7 +433,7 @@ static void nfs_end_page_writeback(struc
 	end_page_writeback(req->wb_page);
 	if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
-		clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+		nfss->write_congested = 0;
 }

@@ -672,6 +672,10 @@ static int nfs_writepage_locked(struct p
 	struct inode *inode = page_file_mapping(page)->host;
 	int err;

+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    NFS_SERVER(inode)->write_congested)
+		return AOP_WRITEPAGE_ACTIVATE;
+
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 	nfs_pageio_init_write(&pgio, inode, 0, false,
 				&nfs_async_write_completion_ops);
@@ -719,6 +723,10 @@ int nfs_writepages(struct address_space
 	int priority = 0;
 	int err;

+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    NFS_SERVER(inode)->write_congested)
+		return 0;
+
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
 	if (!(mntflags & NFS_MOUNT_WRITE_EAGER) || wbc->for_kupdate ||
@@ -1893,7 +1901,7 @@ static void nfs_commit_release_pages(str
 	}
 	nfss = NFS_SERVER(data->inode);
 	if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
-		clear_bdi_congested(inode_to_bdi(data->inode), BLK_RW_ASYNC);
+		nfss->write_congested = 0;

 	nfs_init_cinfo(&cinfo, data->inode, data->dreq);
 	nfs_commit_end(cinfo.mds);
--- a/include/linux/nfs_fs_sb.h~nfs-remove-reliance-on-bdi-congestion
+++ a/include/linux/nfs_fs_sb.h
@@ -138,6 +138,7 @@ struct nfs_server {
 	struct nlm_host		*nlm_host;	/* NLM client handle */
 	struct nfs_iostats __percpu *io_stats;	/* I/O statistics */
 	atomic_long_t		writeback;	/* number of writeback pages */
+	unsigned int		write_congested;/* flag set when writeback gets too high */
 	unsigned int		flags;		/* various flags */

/* The following are for internal use only. Also see uapi/linux/nfs_mount.h */
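The write_congested flag is set and cleared around two different
thresholds, which gives hysteresis. A toy model of that on/off scheme
(the real code keys off NFS_CONGESTION_ON_THRESH and
NFS_CONGESTION_OFF_THRESH around an atomic page counter; all values here
are made up):

  #include <stdio.h>

  #define ON_THRESH  64
  #define OFF_THRESH 48

  static long writeback_pages;
  static int write_congested;

  static void page_writeback_started(void)
  {
      if (++writeback_pages > ON_THRESH)
          write_congested = 1;
  }

  static void page_writeback_done(void)
  {
      if (--writeback_pages < OFF_THRESH)
          write_congested = 0;
  }

  int main(void)
  {
      for (int i = 0; i < 100; i++)
          page_writeback_started();
      printf("congested=%d\n", write_congested);  /* 1 */
      while (writeback_pages > 50)
          page_writeback_done();
      printf("congested=%d\n", write_congested);  /* still 1: above OFF_THRESH */
      while (writeback_pages > 40)
          page_writeback_done();
      printf("congested=%d\n", write_congested);  /* 0 */
      return 0;
  }

The gap between the two thresholds stops the flag from flapping as the
in-flight page count hovers near a single limit.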
From patchwork Tue Mar 22 21:39:04 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:04 -0700
Subject: [patch 011/227] ceph: remove reliance on bdi congestion
Message-Id: <20220322213904.E2AFBC340EC@smtp.kernel.org>

From: NeilBrown
Subject: ceph: remove reliance on bdi congestion

The bdi congestion tracking is not widely used and will be removed.
CEPHfs is one of a small number of filesystems that uses it, setting just
the async (write) congestion flag at what it determines are appropriate
times.

The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.

So instead of setting the flag, set an internal flag and change:

- .writepages to do nothing if WB_SYNC_NONE and the flag is set.

- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
  flag is set.

The .writepage change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.

Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 fs/ceph/addr.c  | 22 +++++++++++++---------
 fs/ceph/super.c |  1 +
 fs/ceph/super.h |  1 +
 3 files changed, 15 insertions(+), 9 deletions(-)

--- a/fs/ceph/addr.c~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/addr.c
@@ -563,7 +563,7 @@ static int writepage_nounlock(struct pag

 	if (atomic_long_inc_return(&fsc->writeback_count) >
 	    CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
-		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+		fsc->write_congested = true;

 	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode),
 				    page_off, &len, 0, 1, CEPH_OSD_OP_WRITE,
 				    CEPH_OSD_FLAG_WRITE, snapc,
@@ -623,7 +623,7 @@ static int writepage_nounlock(struct pag

 	if (atomic_long_dec_return(&fsc->writeback_count) <
 	    CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
-		clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+		fsc->write_congested = false;

 	return err;
 }
@@ -635,6 +635,10 @@ static int ceph_writepage(struct page *p
 	BUG_ON(!inode);
 	ihold(inode);

+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    ceph_inode_to_client(inode)->write_congested)
+		return AOP_WRITEPAGE_ACTIVATE;
+
 	wait_on_page_fscache(page);

 	err = writepage_nounlock(page, wbc);
@@ -707,8 +711,7 @@ static void writepages_finish(struct cep
 		if (atomic_long_dec_return(&fsc->writeback_count) <
 		    CONGESTION_OFF_THRESH(
 			fsc->mount_options->congestion_kb))
-			clear_bdi_congested(inode_to_bdi(inode),
-					    BLK_RW_ASYNC);
+			fsc->write_congested = false;

 		ceph_put_snap_context(detach_page_private(page));
 		end_page_writeback(page);
@@ -760,6 +763,10 @@ static int ceph_writepages_start(struct
 	bool done = false;
 	bool caching = ceph_is_cache_enabled(inode);

+	if (wbc->sync_mode == WB_SYNC_NONE &&
+	    fsc->write_congested)
+		return 0;
+
 	dout("writepages_start %p (mode=%s)\n", inode,
 	     wbc->sync_mode == WB_SYNC_NONE ? "NONE" :
 	     (wbc->sync_mode == WB_SYNC_ALL ? "ALL" : "HOLD"));
@@ -954,11 +961,8 @@ get_more_pages:

 			if (atomic_long_inc_return(&fsc->writeback_count) >
 			    CONGESTION_ON_THRESH(
-				    fsc->mount_options->congestion_kb)) {
-				set_bdi_congested(inode_to_bdi(inode),
-						  BLK_RW_ASYNC);
-			}
-
+				    fsc->mount_options->congestion_kb))
+				fsc->write_congested = true;
 			pages[locked_pages++] = page;
 			pvec.pages[i] = NULL;
--- a/fs/ceph/super.c~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/super.c
@@ -802,6 +802,7 @@ static struct ceph_fs_client *create_fs_
 	fsc->have_copy_from2 = true;

 	atomic_long_set(&fsc->writeback_count, 0);
+	fsc->write_congested = false;

 	err = -ENOMEM;
 	/*
--- a/fs/ceph/super.h~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/super.h
@@ -121,6 +121,7 @@ struct ceph_fs_client {
 	struct ceph_mds_client *mdsc;

 	atomic_long_t writeback_count;
+	bool write_congested;

 	struct workqueue_struct *inode_wq;
 	struct workqueue_struct *cap_wq;
From patchwork Tue Mar 22 21:39:07 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:07 -0700
Subject: [patch 012/227] remove inode_congested()
Message-Id: <20220322213908.0731EC340EC@smtp.kernel.org>

From: NeilBrown
Subject: remove inode_congested()

inode_congested() reports if the backing-device for the inode is
congested. No bdi reports congestion any more, so this always returns
'false'.

So remove inode_congested() and related functions, and remove the call
sites, assuming that inode_congested() always returns 'false'.

Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 fs/fs-writeback.c           | 37 -------------------------------
 include/linux/backing-dev.h | 22 -------------------
 mm/fadvise.c                |  5 ++---
 mm/readahead.c              |  6 ------
 mm/vmscan.c                 | 17 ---------------
 5 files changed, 3 insertions(+), 84 deletions(-)

--- a/fs/fs-writeback.c~remove-inode_congested
+++ a/fs/fs-writeback.c
@@ -894,43 +894,6 @@ void wbc_account_cgroup_owner(struct wri
 EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner);

 /**
- * inode_congested - test whether an inode is congested
- * @inode: inode to test for congestion (may be NULL)
- * @cong_bits: mask of WB_[a]sync_congested bits to test
- *
- * Tests whether @inode is congested. @cong_bits is the mask of congestion
- * bits to test and the return value is the mask of set bits.
- *
- * If cgroup writeback is enabled for @inode, the congestion state is
- * determined by whether the cgwb (cgroup bdi_writeback) for the blkcg
- * associated with @inode is congested; otherwise, the root wb's congestion
- * state is used.
- *
- * @inode is allowed to be NULL as this function is often called on
- * mapping->host which is NULL for the swapper space.
- */
-int inode_congested(struct inode *inode, int cong_bits)
-{
-	/*
-	 * Once set, ->i_wb never becomes NULL while the inode is alive.
-	 * Start transaction iff ->i_wb is visible.
-	 */
-	if (inode && inode_to_wb_is_valid(inode)) {
-		struct bdi_writeback *wb;
-		struct wb_lock_cookie lock_cookie = {};
-		bool congested;
-
-		wb = unlocked_inode_to_wb_begin(inode, &lock_cookie);
-		congested = wb_congested(wb, cong_bits);
-		unlocked_inode_to_wb_end(inode, &lock_cookie);
-		return congested;
-	}
-
-	return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-EXPORT_SYMBOL_GPL(inode_congested);
-
-/**
  * wb_split_bdi_pages - split nr_pages to write according to bandwidth
  * @wb: target bdi_writeback to split @nr_pages to
  * @nr_pages: number of pages to write for the whole bdi
--- a/include/linux/backing-dev.h~remove-inode_congested
+++ a/include/linux/backing-dev.h
@@ -162,7 +162,6 @@ struct bdi_writeback *wb_get_create(stru
 				    gfp_t gfp);
 void wb_memcg_offline(struct mem_cgroup *memcg);
 void wb_blkcg_offline(struct blkcg *blkcg);
-int inode_congested(struct inode *inode, int cong_bits);

 /**
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
@@ -390,29 +389,8 @@ static inline void wb_blkcg_offline(stru
 {
 }

-static inline int inode_congested(struct inode *inode, int cong_bits)
-{
-	return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-
 #endif	/* CONFIG_CGROUP_WRITEBACK */

-static inline int inode_read_congested(struct inode *inode)
-{
-	return inode_congested(inode, 1 << WB_sync_congested);
-}
-
-static inline int inode_write_congested(struct inode *inode)
-{
-	return inode_congested(inode, 1 << WB_async_congested);
-}
-
-static inline int inode_rw_congested(struct inode *inode)
-{
-	return inode_congested(inode, (1 << WB_sync_congested) |
-				      (1 << WB_async_congested));
-}
-
 static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
 {
 	return wb_congested(&bdi->wb, cong_bits);
--- a/mm/fadvise.c~remove-inode_congested
+++ a/mm/fadvise.c
@@ -109,9 +109,8 @@ int generic_fadvise(struct file *file, l
 	case POSIX_FADV_NOREUSE:
 		break;
 	case POSIX_FADV_DONTNEED:
-		if (!inode_write_congested(mapping->host))
-			__filemap_fdatawrite_range(mapping, offset, endbyte,
-						   WB_SYNC_NONE);
+		__filemap_fdatawrite_range(mapping, offset, endbyte,
+					   WB_SYNC_NONE);

 		/*
 		 * First and last FULL page! Partial pages are deliberately
--- a/mm/readahead.c~remove-inode_congested
+++ a/mm/readahead.c
@@ -709,12 +709,6 @@ void page_cache_async_ra(struct readahea

 	folio_clear_readahead(folio);

-	/*
-	 * Defer asynchronous read-ahead on IO congestion.
-	 */
-	if (inode_read_congested(ractl->mapping->host))
-		return;
-
 	if (blk_cgroup_congested())
 		return;
--- a/mm/vmscan.c~remove-inode_congested
+++ a/mm/vmscan.c
@@ -989,17 +989,6 @@ static inline int is_page_cache_freeable
 	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
 }

-static int may_write_to_inode(struct inode *inode)
-{
-	if (current->flags & PF_SWAPWRITE)
-		return 1;
-	if (!inode_write_congested(inode))
-		return 1;
-	if (inode_to_bdi(inode) == current->backing_dev_info)
-		return 1;
-	return 0;
-}
-
 /*
  * We detected a synchronous write error writing a page out. Probably
  * -ENOSPC. We need to propagate that into the address_space for a subsequent
@@ -1201,8 +1190,6 @@ static pageout_t pageout(struct page *pa
 	}
 	if (mapping->a_ops->writepage == NULL)
 		return PAGE_ACTIVATE;
-	if (!may_write_to_inode(mapping->host))
-		return PAGE_KEEP;

 	if (clear_page_dirty_for_io(page)) {
 		int res;
@@ -1578,9 +1565,7 @@ retry:
 		 * end of the LRU a second time.
 		 */
 		mapping = page_mapping(page);
-		if (((dirty || writeback) && mapping &&
-		     inode_write_congested(mapping->host)) ||
-		    (writeback && PageReclaim(page)))
+		if (writeback && PageReclaim(page))
			stat->nr_congested++;

 		/*
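The kernel-doc removed above describes a mask-in, mask-out interface:
callers pass a mask of congestion bits and get back the subset that is
set. A toy model of that contract (bit numbers and the state variable
are illustrative, not the kernel's definitions):

  #include <stdio.h>

  enum { WB_sync_congested = 0, WB_async_congested = 1 };

  static unsigned wb_state;  /* stands in for wb->congested */

  static int wb_congested_model(int cong_bits)
  {
      return wb_state & cong_bits;
  }

  int main(void)
  {
      wb_state = 1 << WB_async_congested;  /* writes are congested */
      printf("read congested?  %d\n",
             wb_congested_model(1 << WB_sync_congested) != 0);   /* 0 */
      printf("write congested? %d\n",
             wb_congested_model(1 << WB_async_congested) != 0);  /* 1 */
      return 0;
  }

Since nothing sets those bits any more, every such query is constantly
zero, which is why each call site above folds away.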
From patchwork Tue Mar 22 21:39:10 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:10 -0700
Subject: [patch 013/227] remove bdi_congested() and wb_congested() and related functions
Message-Id: <20220322213911.300E0C340EE@smtp.kernel.org>

From: NeilBrown
Subject: remove bdi_congested() and wb_congested() and related functions

These functions are no longer useful as no BDIs report congestion any
more.

Removing the test on bdi_write_congested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.

So replace the calls by 'false' and simplify the code, and remove the
functions.

[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown
Acked-by: Ryusuke Konishi [nilfs]
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 drivers/block/drbd/drbd_int.h |  3 ---
 drivers/block/drbd/drbd_req.c |  3 +--
 fs/ext2/ialloc.c              |  5 -----
 fs/nilfs2/segbuf.c            | 16 ----------------
 fs/xfs/xfs_buf.c              |  3 ---
 include/linux/backing-dev.h   | 26 --------------------------
 mm/vmscan.c                   |  4 +---
 7 files changed, 2 insertions(+), 58 deletions(-)

--- a/drivers/block/drbd/drbd_int.h~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/drivers/block/drbd/drbd_int.h
@@ -638,9 +638,6 @@ enum {
 	STATE_SENT,		/* Do not change state/UUIDs while this is set */
 	CALLBACK_PENDING,	/* Whether we have a call_usermodehelper(, UMH_WAIT_PROC)
 				 * pending, from drbd worker context.
-				 * If set, bdi_write_congested() returns true,
-				 * so shrink_page_list() would not recurse into,
-				 * and potentially deadlock on, this drbd worker.
 				 */
 	DISCONNECT_SENT,
--- a/drivers/block/drbd/drbd_req.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/drivers/block/drbd/drbd_req.c
@@ -909,8 +909,7 @@ static bool remote_due_to_read_balancing

 	switch (rbm) {
 	case RB_CONGESTED_REMOTE:
-		return bdi_read_congested(
-			device->ldev->backing_bdev->bd_disk->bdi);
+		return 0;
 	case RB_LEAST_PENDING:
 		return atomic_read(&device->local_cnt) >
 			atomic_read(&device->ap_pending_cnt) + atomic_read(&device->rs_pending_cnt);
--- a/fs/ext2/ialloc.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/fs/ext2/ialloc.c
@@ -170,11 +170,6 @@ static void ext2_preread_inode(struct in
 	unsigned long offset;
 	unsigned long block;
 	struct ext2_group_desc * gdp;
-	struct backing_dev_info *bdi;
-
-	bdi = inode_to_bdi(inode);
-	if (bdi_rw_congested(bdi))
-		return;

 	block_group = (inode->i_ino - 1) / EXT2_INODES_PER_GROUP(inode->i_sb);
 	gdp = ext2_get_group_desc(inode->i_sb, block_group, NULL);
--- a/fs/nilfs2/segbuf.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/fs/nilfs2/segbuf.c
@@ -341,18 +341,6 @@ static int nilfs_segbuf_submit_bio(struc
 				   int mode_flags)
 {
 	struct bio *bio = wi->bio;
-	int err;
-
-	if (segbuf->sb_nbio > 0 &&
-	    bdi_write_congested(segbuf->sb_super->s_bdi)) {
-		wait_for_completion(&segbuf->sb_bio_event);
-		segbuf->sb_nbio--;
-		if (unlikely(atomic_read(&segbuf->sb_err))) {
-			bio_put(bio);
-			err = -EIO;
-			goto failed;
-		}
-	}

 	bio->bi_end_io = nilfs_end_bio_write;
 	bio->bi_private = segbuf;
@@ -365,10 +353,6 @@ static int nilfs_segbuf_submit_bio(struc
 	wi->nr_vecs = min(wi->max_pages, wi->rest_blocks);
 	wi->start = wi->end;
 	return 0;
-
- failed:
-	wi->bio = NULL;
-	return err;
 }

 /**
--- a/fs/xfs/xfs_buf.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/fs/xfs/xfs_buf.c
@@ -843,9 +843,6 @@ xfs_buf_readahead_map(
 {
 	struct xfs_buf *bp;

-	if (bdi_read_congested(target->bt_bdev->bd_disk->bdi))
-		return;
-
 	xfs_buf_read_map(target, map, nmaps,
		     XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
		     __this_address);
--- a/include/linux/backing-dev.h~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/include/linux/backing-dev.h
@@ -135,11 +135,6 @@ static inline bool writeback_in_progress

 struct backing_dev_info *inode_to_bdi(struct inode *inode);

-static inline int wb_congested(struct bdi_writeback *wb, int cong_bits)
-{
-	return wb->congested & cong_bits;
-}
-
 long congestion_wait(int sync, long timeout);

 static inline bool mapping_can_writeback(struct address_space *mapping)
@@ -391,27 +386,6 @@ static inline void wb_blkcg_offline(stru

 #endif	/* CONFIG_CGROUP_WRITEBACK */

-static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
-{
-	return wb_congested(&bdi->wb, cong_bits);
-}
-
-static inline int bdi_read_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, 1 << WB_sync_congested);
-}
-
-static inline int bdi_write_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, 1 << WB_async_congested);
-}
-
-static inline int bdi_rw_congested(struct backing_dev_info *bdi)
-{
-	return bdi_congested(bdi, (1 << WB_sync_congested) |
-			     (1 << WB_async_congested));
-}
-
 const char *bdi_dev_name(struct backing_dev_info *bdi);

 #endif	/* _LINUX_BACKING_DEV_H */
--- a/mm/vmscan.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/mm/vmscan.c
@@ -2364,9 +2364,7 @@ static unsigned int move_pages_to_lru(st
  */
 static int current_may_throttle(void)
 {
-	return !(current->flags & PF_LOCAL_THROTTLE) ||
-		current->backing_dev_info == NULL ||
-		bdi_write_congested(current->backing_dev_info);
+	return !(current->flags & PF_LOCAL_THROTTLE);
 }

 /*
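The "small change in behaviour" called out above can be made visible
with a toy truth table (flag value is made up; 'congested' here folds
together the old "no backing_dev_info" and "bdi write congested" cases,
both of which made the old code return true):

  #include <stdbool.h>
  #include <stdio.h>

  #define PF_LOCAL_THROTTLE 0x1

  static bool old_may_throttle(unsigned flags, bool congested)
  {
      return !(flags & PF_LOCAL_THROTTLE) || congested;
  }

  static bool new_may_throttle(unsigned flags)
  {
      return !(flags & PF_LOCAL_THROTTLE);
  }

  int main(void)
  {
      for (unsigned flags = 0; flags <= PF_LOCAL_THROTTLE; flags++)
          for (int congested = 0; congested < 2; congested++)
              printf("flags=%u congested=%d old=%d new=%d\n",
                     flags, congested,
                     old_may_throttle(flags, congested),
                     new_may_throttle(flags));
      return 0;
  }

The only row where old and new disagree is PF_LOCAL_THROTTLE set with
'congested' true, matching the description's caveat.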
From patchwork Tue Mar 22 21:39:13 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:13 -0700
Subject: [patch 014/227] f2fs: replace congestion_wait() calls with io_schedule_timeout()
Message-Id: <20220322213914.46BB7C340EE@smtp.kernel.org>

From: NeilBrown
Subject: f2fs: replace congestion_wait() calls with io_schedule_timeout()

As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout(). So introduce
f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE and call that
instead.

Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 fs/f2fs/compress.c | 4 +---
 fs/f2fs/data.c     | 3 +--
 fs/f2fs/f2fs.h     | 6 ++++++
 fs/f2fs/segment.c  | 8 +++-----
 fs/f2fs/super.c    | 6 ++----
 5 files changed, 13 insertions(+), 14 deletions(-)

--- a/fs/f2fs/compress.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/compress.c
@@ -1505,9 +1505,7 @@ continue_unlock:
 		if (IS_NOQUOTA(cc->inode))
 			return 0;
 		ret = 0;
-		cond_resched();
-		congestion_wait(BLK_RW_ASYNC,
-				DEFAULT_IO_TIMEOUT);
+		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 		goto retry_write;
 	}
 	return ret;
--- a/fs/f2fs/data.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/data.c
@@ -3047,8 +3047,7 @@ result:
 		} else if (ret == -EAGAIN) {
 			ret = 0;
 			if (wbc->sync_mode == WB_SYNC_ALL) {
-				cond_resched();
-				congestion_wait(BLK_RW_ASYNC,
+				f2fs_io_schedule_timeout(
 					DEFAULT_IO_TIMEOUT);
 				goto retry_write;
 			}
--- a/fs/f2fs/f2fs.h~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/f2fs.h
@@ -4426,6 +4426,12 @@ static inline bool f2fs_block_unit_disca
 	return F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK;
 }

+static inline void f2fs_io_schedule_timeout(long timeout)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	io_schedule_timeout(timeout);
+}
+
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
 #define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
--- a/fs/f2fs/segment.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/segment.c
@@ -313,8 +313,7 @@ next:
 skip:
 		iput(inode);
 	}
-	congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
-	cond_resched();
+	f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	if (gc_failure) {
 		if (++looped >= count)
 			return;
@@ -803,8 +802,7 @@ int f2fs_flush_device_cache(struct f2fs_
 		do {
 			ret = __submit_flush_wait(sbi, FDEV(i).bdev);
 			if (ret)
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 		} while (ret && --count);

 		if (ret) {
@@ -3133,7 +3131,7 @@ next:
 		blk_finish_plug(&plug);
 		mutex_unlock(&dcc->cmd_lock);
 		trimmed += __wait_all_discard_cmd(sbi, NULL);
-		congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 		goto next;
 	}
 skip:
--- a/fs/f2fs/super.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/super.c
@@ -2135,8 +2135,7 @@ static void f2fs_enable_checkpoint(struc
 	/* we should flush all the data to keep data consistency */
 	do {
 		sync_inodes_sb(sbi->sb);
-		cond_resched();
-		congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);

 	if (unlikely(retry < 0))
@@ -2504,8 +2503,7 @@ retry:
 							&page, &fsdata);
 		if (unlikely(err)) {
 			if (err == -ENOMEM) {
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 				goto retry;
 			}
 			set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
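What every converted call site keeps is a retry-with-fixed-backoff loop;
only the sleep primitive changes. A toy userspace model of that shape
(the failure source, names and timeout are made up; nanosleep() stands in
for the kernel's uninterruptible io_schedule_timeout()):

  #define _POSIX_C_SOURCE 199309L
  #include <stdio.h>
  #include <time.h>

  #define DEFAULT_IO_TIMEOUT_MS 20

  static void io_schedule_timeout_model(long ms)
  {
      struct timespec ts = { .tv_sec = 0, .tv_nsec = ms * 1000000L };
      nanosleep(&ts, NULL);
  }

  static int flaky_op(void)
  {
      static int calls;
      return ++calls < 3 ? -1 : 0;  /* fail twice, then succeed */
  }

  int main(void)
  {
      int tries = 0;

      while (flaky_op() != 0) {
          tries++;
          io_schedule_timeout_model(DEFAULT_IO_TIMEOUT_MS);
      }
      printf("succeeded after %d retries\n", tries);  /* 2 */
      return 0;
  }

With no congestion state left to wake on, a plain timed sleep gives the
same behaviour congestion_wait() had already degraded to.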
From patchwork Tue Mar 22 21:39:16 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:16 -0700
Subject: [patch 015/227] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"
Message-Id: <20220322213917.5A8B1C340EC@smtp.kernel.org>

From: NeilBrown
Subject: block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"

bfq_get_queue() expects a "bool" for the third arg, so pass "false"
rather than "BLK_RW_ASYNC" which will soon be removed.

Link: https://lkml.kernel.org/r/164549983746.9187.7949730109246767909.stgit@noble.brown
Signed-off-by: NeilBrown
Acked-by: Jens Axboe
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/block/bfq-iosched.c~block-bfq-ioschedc-use-false-rather-than-blk_rw_async
+++ a/block/bfq-iosched.c
@@ -5448,7 +5448,7 @@ static void bfq_check_ioprio_change(stru
 	bfqq = bic_to_bfqq(bic, false);
 	if (bfqq) {
 		bfq_release_process_ref(bfqd, bfqq);
-		bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic, true);
+		bfqq = bfq_get_queue(bfqd, bio, false, bic, true);
 		bic_set_bfqq(bic, bfqq, false);
 	}

From patchwork Tue Mar 22 21:39:19 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:19 -0700
Subject: [patch 016/227] remove congestion tracking framework
Message-Id: <20220322213920.78792C340EC@smtp.kernel.org>

From: NeilBrown
Subject: remove congestion tracking framework

This framework is no longer used, so discard it.

Link: https://lkml.kernel.org/r/164549983747.9187.6171768583526866601.stgit@noble.brown
Signed-off-by: NeilBrown
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Darrick J. Wong
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Jan Kara
Cc: Jeff Layton
Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: Miklos Szeredi
Cc: Paolo Valente
Cc: Philipp Reisner
Cc: Ryusuke Konishi
Cc: Trond Myklebust
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
---

 include/linux/backing-dev-defs.h |  8 -----
 include/linux/backing-dev.h      |  2 -
 include/trace/events/writeback.h | 28 ---------------
 mm/backing-dev.c                 | 57 --------------------------------
 4 files changed, 95 deletions(-)

--- a/include/linux/backing-dev-defs.h~remove-congestion-tracking-framework
+++ a/include/linux/backing-dev-defs.h
@@ -207,14 +207,6 @@ struct backing_dev_info {
 #endif
 };

-enum {
-	BLK_RW_ASYNC	= 0,
-	BLK_RW_SYNC	= 1,
-};
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
-void set_bdi_congested(struct backing_dev_info *bdi, int sync);
-
 struct wb_lock_cookie {
 	bool locked;
 	unsigned long flags;
--- a/include/linux/backing-dev.h~remove-congestion-tracking-framework
+++ a/include/linux/backing-dev.h
@@ -135,8 +135,6 @@ static inline bool writeback_in_progress

 struct backing_dev_info *inode_to_bdi(struct inode *inode);

-long congestion_wait(int sync, long timeout);
-
 static inline bool mapping_can_writeback(struct address_space *mapping)
 {
 	return inode_to_bdi(mapping->host)->capabilities & BDI_CAP_WRITEBACK;
--- a/include/trace/events/writeback.h~remove-congestion-tracking-framework
+++ a/include/trace/events/writeback.h
@@ -735,34 +735,6 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 	)
 );

-DECLARE_EVENT_CLASS(writeback_congest_waited_template,
-
-	TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
-
-	TP_ARGS(usec_timeout, usec_delayed),
-
-	TP_STRUCT__entry(
-		__field(	unsigned int,	usec_timeout	)
-		__field(	unsigned int,	usec_delayed	)
-	),
-
-	TP_fast_assign(
-		__entry->usec_timeout	= usec_timeout;
-		__entry->usec_delayed	= usec_delayed;
-	),
-
-	TP_printk("usec_timeout=%u usec_delayed=%u",
-			__entry->usec_timeout,
-			__entry->usec_delayed)
-);
-
-DEFINE_EVENT(writeback_congest_waited_template, writeback_congestion_wait,
-
-	TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
-
-	TP_ARGS(usec_timeout, usec_delayed)
-);
-
 DECLARE_EVENT_CLASS(writeback_single_inode_template,

 	TP_PROTO(struct inode *inode,
--- a/mm/backing-dev.c~remove-congestion-tracking-framework
+++ a/mm/backing-dev.c
@@ -1005,60 +1005,3 @@ const char *bdi_dev_name(struct backing_
 	return bdi->dev_name;
 }
 EXPORT_SYMBOL_GPL(bdi_dev_name);
-
-static wait_queue_head_t congestion_wqh[2] = {
-		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[0]),
-		__WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1])
-	};
-static atomic_t nr_wb_congested[2];
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
-{
-	wait_queue_head_t *wqh = &congestion_wqh[sync];
-	enum wb_congested_state bit;
-
-	bit = sync ? WB_sync_congested : WB_async_congested;
-	if (test_and_clear_bit(bit, &bdi->wb.congested))
-		atomic_dec(&nr_wb_congested[sync]);
-	smp_mb__after_atomic();
-	if (waitqueue_active(wqh))
-		wake_up(wqh);
-}
-EXPORT_SYMBOL(clear_bdi_congested);
-
-void set_bdi_congested(struct backing_dev_info *bdi, int sync)
-{
-	enum wb_congested_state bit;
-
-	bit = sync ? WB_sync_congested : WB_async_congested;
-	if (!test_and_set_bit(bit, &bdi->wb.congested))
-		atomic_inc(&nr_wb_congested[sync]);
-}
-EXPORT_SYMBOL(set_bdi_congested);
-
-/**
- * congestion_wait - wait for a backing_dev to become uncongested
- * @sync: SYNC or ASYNC IO
- * @timeout: timeout in jiffies
- *
- * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
- * write congestion. If no backing_devs are congested then just wait for the
- * next write to be completed.
- */
-long congestion_wait(int sync, long timeout)
-{
-	long ret;
-	unsigned long start = jiffies;
-	DEFINE_WAIT(wait);
-	wait_queue_head_t *wqh = &congestion_wqh[sync];
-
-	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
-	ret = io_schedule_timeout(timeout);
-	finish_wait(wqh, &wait);
-
-	trace_writeback_congestion_wait(jiffies_to_usecs(timeout),
-					jiffies_to_usecs(jiffies - start));
-
-	return ret;
-}
-EXPORT_SYMBOL(congestion_wait);

From patchwork Tue Mar 22 21:39:22 2022
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 22 Mar 2022 14:39:22 -0700
Subject: [patch 017/227] mount: warn only once about timestamp range expiration
Message-Id: <20220322213923.7648DC340EC@smtp.kernel.org>
for ; Tue, 22 Mar 2022 21:39:24 +0000 (UTC)
Date: Tue, 22 Mar 2022 14:39:22 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 017/227] mount: warn only once about timestamp range expiration
Message-Id: <20220322213923.7648DC340EC@smtp.kernel.org>

From: Anthony Iliopoulos
Subject: mount: warn only once about timestamp range expiration

Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp
expiry") introduced a mount warning regarding filesystem timestamp
limits, which is printed upon each writable mount or remount.  This can
result in a lot of unnecessary messages in the kernel log in setups
where filesystems are frequently remounted (or mounted multiple times).

Avoid this by setting a superblock flag which indicates that the warning
has been emitted at least once for any particular mount, as suggested in
[1].

[1] https://lore.kernel.org/CAHk-=wim6VGnxQmjfK_tDg6fbHYKL4EFkmnTjVr9QnRqjDBAeA@mail.gmail.com/

Link: https://lkml.kernel.org/r/20220119202934.26495-1-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos
Reviewed-by: Christoph Hellwig
Acked-by: Christian Brauner
Reviewed-by: Darrick J. Wong
Cc: Alexander Viro
Cc: Deepa Dinamani
Signed-off-by: Andrew Morton
---

 fs/namespace.c     | 2 ++
 include/linux/fs.h | 1 +
 2 files changed, 3 insertions(+)

--- a/fs/namespace.c~mount-warn-only-once-about-timestamp-range-expiration
+++ a/fs/namespace.c
@@ -2597,6 +2597,7 @@ static void mnt_warn_timestamp_expiry(st
 	struct super_block *sb = mnt->mnt_sb;
 
 	if (!__mnt_is_readonly(mnt) &&
+	    (!(sb->s_iflags & SB_I_TS_EXPIRY_WARNED)) &&
 	    (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) {
 		char *buf = (char *)__get_free_page(GFP_KERNEL);
 		char *mntpath = buf ?
 				  d_path(mountpoint, buf, PAGE_SIZE) : ERR_PTR(-ENOMEM);
@@ -2611,6 +2612,7 @@ static void mnt_warn_timestamp_expiry(st
 			tm.tm_year+1900, (unsigned long long)sb->s_time_max);
 		free_page((unsigned long)buf);
+		sb->s_iflags |= SB_I_TS_EXPIRY_WARNED;
 	}
 }
 
--- a/include/linux/fs.h~mount-warn-only-once-about-timestamp-range-expiration
+++ a/include/linux/fs.h
@@ -1440,6 +1440,7 @@ extern int send_sigurg(struct fown_struc
 #define SB_I_SKIP_SYNC	0x00000100 /* Skip superblock at global sync */
 #define SB_I_PERSB_BDI	0x00000200 /* has a per-sb bdi */
+#define SB_I_TS_EXPIRY_WARNED 0x00000400 /* warned about timestamp range expiry */
 
 /* Possible states of 'frozen' field */
 enum {
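(For illustration, a minimal user-space sketch of the warn-once pattern the
patch above adopts: test a per-object "already warned" flag before logging,
and set it after the first report.  All names here are invented for the
example, this is not the kernel code, and the 30-year horizon merely mimics
TIME_UPTIME_SEC_MAX.)

	#include <stdbool.h>
	#include <stdio.h>

	struct sb_like {
		long long time_max;	/* largest timestamp the fs can store */
		bool ts_expiry_warned;	/* set once the warning has fired */
	};

	static void warn_timestamp_expiry(struct sb_like *sb, long long now)
	{
		if (sb->ts_expiry_warned)
			return;				/* warn only once */
		if (now + 30LL * 365 * 24 * 3600 > sb->time_max) {
			fprintf(stderr, "fs will exceed max timestamp soon\n");
			sb->ts_expiry_warned = true;	/* suppress repeats */
		}
	}

	int main(void)
	{
		struct sb_like sb = { .time_max = 2147483647LL };  /* y2038 */

		warn_timestamp_expiry(&sb, 1647985163LL);  /* prints warning */
		warn_timestamp_expiry(&sb, 1647985163LL);  /* silent */
		return 0;
	}

From patchwork Tue Mar 22 21:39:25 2022
Date: Tue, 22 Mar 2022 14:39:25 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 018/227] mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory
Message-Id: <20220322213926.67512C340EE@smtp.kernel.org>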
From: Miaohe Lin
Subject: mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory

For device private memory, we do not create a linear mapping for the
memory because the device memory is inaccessible.  Thus we do not add
kasan zero shadow for it, and it is unnecessary to call
kasan_remove_zero_shadow() for it.

Link: https://lkml.kernel.org/r/20220126092602.1425-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Reviewed-by: Muchun Song
Signed-off-by: Andrew Morton
---

 mm/memremap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/memremap.c~mm-memremap-avoid-calling-kasan_remove_zero_shadow-for-device-private-memory
+++ a/mm/memremap.c
@@ -282,7 +282,8 @@ static int pagemap_range(struct dev_page
 	return 0;
 
 err_add_memory:
-	kasan_remove_zero_shadow(__va(range->start), range_len(range));
+	if (!is_private)
+		kasan_remove_zero_shadow(__va(range->start), range_len(range));
 err_kasan:
 	untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
 err_pfn_remap:

From patchwork Tue Mar 22 21:39:28 2022
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1647985169;
	bh=Id/frEGWAdaMUtqBKCeCRkwo7IwmnmPnzqwExFWQhBI=;
	h=Date:To:From:In-Reply-To:Subject:From;
	b=fY0uLUfZzss1suVmucxqiXxQoQ4/4748btUggZ8WKxtA9ZeHzzsgvcpTXLqHEVEWQ
ofnUmdpTRNwdNd7dRSe3QbprcKGf1XeXoPGZOGza5lhG5cGlTQvAij2XZBGcwFTchX 54I+Vsu9ybNZY5Bognr2eTeGsSe07daFQPFMQ+Ws= Date: Tue, 22 Mar 2022 14:39:28 -0700 To: willy@infradead.org,william.kucharski@oracle.com,vbabka@suse.cz,kirill.shutemov@linux.intel.com,hch@lst.de,hannes@cmpxchg.org,dhowells@redhat.com,agruenba@redhat.com,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 019/227] filemap: remove find_get_pages() Message-Id: <20220322213929.5BDA6C340EC@smtp.kernel.org> X-Stat-Signature: i84w4gmc9jmd9tazd4usued3zuxyfw7r Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=fY0uLUfZ; spf=pass (imf24.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: A0A54180038 X-HE-Tag: 1647985170-136421 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miaohe Lin Subject: filemap: remove find_get_pages() It's unused now. Remove it and clean up the relevant comment. Link: https://lkml.kernel.org/r/20220208134149.47299-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Christoph Hellwig Cc: Matthew Wilcox (Oracle) Cc: David Howells Cc: William Kucharski Cc: Vlastimil Babka Cc: Kirill A. Shutemov Cc: Johannes Weiner Cc: Andreas Gruenbacher Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 7 ------- mm/filemap.c | 11 ++++++----- 2 files changed, 6 insertions(+), 12 deletions(-) --- a/include/linux/pagemap.h~filemap-remove-find_get_pages +++ a/include/linux/pagemap.h @@ -594,13 +594,6 @@ static inline struct page *find_subpage( unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start, pgoff_t end, unsigned int nr_pages, struct page **pages); -static inline unsigned find_get_pages(struct address_space *mapping, - pgoff_t *start, unsigned int nr_pages, - struct page **pages) -{ - return find_get_pages_range(mapping, start, (pgoff_t)-1, nr_pages, - pages); -} unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start, unsigned int nr_pages, struct page **pages); unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, --- a/mm/filemap.c~filemap-remove-find_get_pages +++ a/mm/filemap.c @@ -2229,8 +2229,9 @@ out: * @nr_pages: The maximum number of pages * @pages: Where the resulting pages are placed * - * find_get_pages_contig() works exactly like find_get_pages(), except - * that the returned number of pages are guaranteed to be contiguous. + * find_get_pages_contig() works exactly like find_get_pages_range(), + * except that the returned number of pages are guaranteed to be + * contiguous. * * Return: the number of pages which were found. */ @@ -2290,9 +2291,9 @@ EXPORT_SYMBOL(find_get_pages_contig); * @nr_pages: the maximum number of pages * @pages: where the resulting pages are placed * - * Like find_get_pages(), except we only return head pages which are tagged - * with @tag. @index is updated to the index immediately after the last - * page we return, ready for the next iteration. + * Like find_get_pages_range(), except we only return head pages which are + * tagged with @tag. 
@index is updated to the index immediately after the + * last page we return, ready for the next iteration. * * Return: the number of pages which were found. */ From patchwork Tue Mar 22 21:39:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789060 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1117BC433EF for ; Tue, 22 Mar 2022 21:39:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A229D6B008C; Tue, 22 Mar 2022 17:39:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9CFD46B0092; Tue, 22 Mar 2022 17:39:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 849C86B0093; Tue, 22 Mar 2022 17:39:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id 6F7996B008C for ; Tue, 22 Mar 2022 17:39:35 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 3735E181D75CA for ; Tue, 22 Mar 2022 21:39:35 +0000 (UTC) X-FDA: 79273339110.30.D8E6389 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf13.hostedemail.com (Postfix) with ESMTP id C62BB2001F for ; Tue, 22 Mar 2022 21:39:34 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id A9980B81D9E; Tue, 22 Mar 2022 21:39:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4678EC340EC; Tue, 22 Mar 2022 21:39:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985172; bh=6k287UQLQFkel2wGvwxTb471aMwsHRQGu+R6HZZy33U=; h=Date:To:From:In-Reply-To:Subject:From; b=N9r8hMac9Rm9vw8tl1BkW4XBLtrMwjqV+mqGkeRigQQlM6kwNmby0mFzRX9D0jrN4 1ZxvNNGXah6BaLfFRon1rZ+yIi+JwWxd8MpgG7f1MfYfBV8lJSwKRrIOW2FutD3JXr G4U7bK+oIjRL72Jdegh+XBJzIMohqGoHMS21MMqc= Date: Tue, 22 Mar 2022 14:39:31 -0700 To: hannes@cmpxchg.org,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 020/227] mm/writeback: minor clean up for highmem_dirtyable_memory Message-Id: <20220322213932.4678EC340EC@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: qzeifxkhb9cqf9giqx1a44f567tswjot Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=N9r8hMac; dmarc=none; spf=pass (imf13.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: C62BB2001F X-HE-Tag: 1647985174-559919 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miaohe Lin Subject: mm/writeback: minor clean up for highmem_dirtyable_memory Since commit a804552b9a15 
("mm/page-writeback.c: fix dirty_balance_reserve subtraction from dirtyable memory"), local variable x can not be negative. And it can not overflow when it is the total number of dirtyable highmem pages. Thus remove the unneeded comment and overflow check. Link: https://lkml.kernel.org/r/20220224115416.46089-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: Johannes Weiner Signed-off-by: Andrew Morton --- mm/page-writeback.c | 12 ------------ 1 file changed, 12 deletions(-) --- a/mm/page-writeback.c~mm-writeback-minor-clean-up-for-highmem_dirtyable_memory +++ a/mm/page-writeback.c @@ -324,18 +324,6 @@ static unsigned long highmem_dirtyable_m } /* - * Unreclaimable memory (kernel memory or anonymous memory - * without swap) can bring down the dirtyable pages below - * the zone's dirty balance reserve and the above calculation - * will underflow. However we still want to add in nodes - * which are below threshold (negative values) to get a more - * accurate calculation but make sure that the total never - * underflows. - */ - if ((long)x < 0) - x = 0; - - /* * Make sure that the number of highmem pages is never larger * than the number of the total dirtyable memory. This can only * occur in very strange VM situations but we want to make sure From patchwork Tue Mar 22 21:39:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F285C433F5 for ; Tue, 22 Mar 2022 21:39:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AC0846B0092; Tue, 22 Mar 2022 17:39:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A6BD66B0093; Tue, 22 Mar 2022 17:39:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 90DDF6B0095; Tue, 22 Mar 2022 17:39:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id 7D5B26B0092 for ; Tue, 22 Mar 2022 17:39:37 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 5B918602 for ; Tue, 22 Mar 2022 21:39:37 +0000 (UTC) X-FDA: 79273339194.02.789301E Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf13.hostedemail.com (Postfix) with ESMTP id EF3732001F for ; Tue, 22 Mar 2022 21:39:36 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 85E34B81DB1; Tue, 22 Mar 2022 21:39:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4321CC340EC; Tue, 22 Mar 2022 21:39:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985175; bh=yP8ZE/+Cvg9nj9MqmFGn11xg9p3WZXmau+nnxD21NZ4=; h=Date:To:From:In-Reply-To:Subject:From; b=YNTI3VSA1OcG3jaSSJEMwd4bbIIznnMcVrUqGGxSGyuiise8Jl7WwP8VlxpenZsko UfWjPHrTAv3iQYgsWu/OR4Qk6C20WInLEXUGrWK1jzhJrO0gPnE1zh2oEKmCubh1BE cwwyLO9+sOh7yqQyp2n1lTEtd832G/nsm445OWzc= Date: Tue, 22 Mar 2022 14:39:34 -0700 To: 
stable@vger.kernel.org,mtosatti@redhat.com,joaodias@google.com,cgoldswo@codeaurora.org,minchan@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 021/227] mm: fs: fix lru_cache_disabled race in bh_lru
Message-Id: <20220322213935.4321CC340EC@smtp.kernel.org>

From: Minchan Kim
Subject: mm: fs: fix lru_cache_disabled race in bh_lru

Check lru_cache_disabled under bh_lru_lock.  Otherwise, it can
introduce the race below, and pages containing a buffer_head then fail
to migrate:

	CPU 0				CPU 1
	bh_lru_install			lru_cache_disable
	  lru_cache_disabled = false
					  atomic_inc(&lru_disable_count);
					  invalidate_bh_lrus_cpu of CPU 0
					    bh_lru_lock
					    __invalidate_bh_lrus
					    bh_lru_unlock
	  bh_lru_lock
	  install the bh
	  bh_lru_unlock

When this race happens, a CMA allocation fails, which is critical for
workloads that depend on CMA.

Link: https://lkml.kernel.org/r/20220308180709.2017638-1-minchan@kernel.org
Fixes: 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration")
Signed-off-by: Minchan Kim
Cc: Chris Goldsworthy
Cc: Marcelo Tosatti
Cc: John Dias
Cc: 
Signed-off-by: Andrew Morton
---

 fs/buffer.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/fs/buffer.c~mm-fs-fix-lru_cache_disabled-race-in-bh_lru
+++ a/fs/buffer.c
@@ -1235,16 +1235,18 @@ static void bh_lru_install(struct buffer
 	int i;
 
 	check_irqs_on();
+	bh_lru_lock();
+
 	/*
 	 * the refcount of buffer_head in bh_lru prevents dropping the
 	 * attached page(i.e., try_to_free_buffers) so it could cause
 	 * failing page migration.
 	 * Skip putting upcoming bh into bh_lru until migration is done.
 	 */
-	if (lru_cache_disabled())
+	if (lru_cache_disabled()) {
+		bh_lru_unlock();
 		return;
-
-	bh_lru_lock();
+	}
 
 	b = this_cpu_ptr(&bh_lrus);
 	for (i = 0; i < BH_LRU_SIZE; i++) {
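(For illustration, a minimal user-space sketch, pthreads and invented names,
of the ordering the fix above enforces: the "disabled" flag is tested only
after taking the lock, so the invalidation path can never run between the
test and the install.  This is not the kernel code.)

	#include <pthread.h>
	#include <stdbool.h>
	#include <stddef.h>

	static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
	static bool lru_disabled;
	static void *lru_slot;		/* stands in for the per-CPU LRU */

	static void lru_install(void *bh)	/* the bh_lru_install() side */
	{
		pthread_mutex_lock(&lru_lock);	/* lock first ... */
		if (lru_disabled) {		/* ... test under the lock */
			pthread_mutex_unlock(&lru_lock);
			return;
		}
		lru_slot = bh;			/* no flush can interleave */
		pthread_mutex_unlock(&lru_lock);
	}

	static void lru_disable_and_flush(void)	/* the invalidation side */
	{
		pthread_mutex_lock(&lru_lock);
		lru_disabled = true;
		lru_slot = NULL;		/* once flushed, stays flushed */
		pthread_mutex_unlock(&lru_lock);
	}

	int main(void)
	{
		int bh = 0;

		lru_install(&bh);		/* installs */
		lru_disable_and_flush();	/* flushes and disables */
		lru_install(&bh);		/* refused: flag seen under lock */
		return 0;
	}

From patchwork Tue Mar 22 21:39:37 2022
Date: Tue, 22 Mar 2022 14:39:37 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 022/227] mm: fix invalid page pointer returned with FOLL_PIN gups
Message-Id: <20220322213938.6F366C340EE@smtp.kernel.org>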
From: Peter Xu
Subject: mm: fix invalid page pointer returned with FOLL_PIN gups

Patch series "mm/gup: some cleanups", v5.

This patch (of 5):

Alex reported an invalid page pointer returned with
pin_user_pages_remote() from vfio after upstream commit 4b6c33b32296
("vfio/type1: Prepare for batched pinning with struct vfio_batch").

It turns out that it's not the fault of the vfio commit; however, after
vfio switched to a full page buffer to store the page pointers, it
started to expose the problem more easily.

The problem is that for VM_PFNMAP vmas we should normally fail with
-EFAULT, and vfio will then carry on to handle the MMIO regions.
However, when the bug triggered, follow_page_mask() returned -EEXIST for
such a page, which made the walk jump over the current page, leaving
that entry in **pages untouched.  The caller is not aware of this, so it
will reference the page as usual even though the pointer data can be
anything.

We have had that -EEXIST logic since commit 1027e4436b6a ("mm: make GUP
handle pfn mapping unless FOLL_GET is requested"), which seems very
reasonable.  It could be that when we reworked GUP with FOLL_PIN we
overlooked that special path in commit 3faa52c03f44 ("mm/gup: track
FOLL_PIN pages"), even though that commit rightfully touched up
follow_devmap_pud() on checking FOLL_PIN when it needs to return an
-EEXIST.

Attaching the Fixes tag to the FOLL_PIN rework commit, as it happened
later than 1027e4436b6a.

[jhubbard@nvidia.com: added some tags, removed a reference to an out of tree module.]
Link: https://lkml.kernel.org/r/20220207062213.235127-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-2-jhubbard@nvidia.com
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Peter Xu
Signed-off-by: John Hubbard
Reviewed-by: Claudio Imbrenda
Reported-by: Alex Williamson
Debugged-by: Alex Williamson
Tested-by: Alex Williamson
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Cc: Andrea Arcangeli
Cc: Kirill A.
Shutemov Cc: Jason Gunthorpe Cc: David Hildenbrand Cc: Lukas Bulwahn Cc: Matthew Wilcox (Oracle) Cc: Jason Gunthorpe Signed-off-by: Andrew Morton --- mm/gup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/gup.c~mm-fix-invalid-page-pointer-returned-with-foll_pin-gups +++ a/mm/gup.c @@ -465,7 +465,7 @@ static int follow_pfn_pte(struct vm_area pte_t *pte, unsigned int flags) { /* No page to get reference */ - if (flags & FOLL_GET) + if (flags & (FOLL_GET | FOLL_PIN)) return -EFAULT; if (flags & FOLL_TOUCH) { From patchwork Tue Mar 22 21:39:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789072 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0DB8AC433EF for ; Tue, 22 Mar 2022 21:40:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 084DA6B0082; Tue, 22 Mar 2022 17:40:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F001E6B0087; Tue, 22 Mar 2022 17:40:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D7A696B0099; Tue, 22 Mar 2022 17:40:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id BE0346B0082 for ; Tue, 22 Mar 2022 17:40:06 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 963E723882 for ; Tue, 22 Mar 2022 21:40:06 +0000 (UTC) X-FDA: 79273340412.15.94CF50B Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf01.hostedemail.com (Postfix) with ESMTP id 1A1A640037 for ; Tue, 22 Mar 2022 21:39:43 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DE5E3B81DB0; Tue, 22 Mar 2022 21:39:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7E9BDC340EE; Tue, 22 Mar 2022 21:39:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985181; bh=3i4QFs9B/uJnET8QE64iVchO6AOcfb7NZ+OdPYD0EhQ=; h=Date:To:From:In-Reply-To:Subject:From; b=e4Ji6y9xnPSjqjiv/7IHeYY2dhl1ypWy/fV9Wy03JRcX3zT35M9DaLn5J3LEprq6b cj1/9iK70GKHLSbaknOaJTQdhU/3fFdEP68GAhufOzDTXza6vYu5NZotBzuU6r44eG iPAofHTR6eIPTCsyee8Bp/WwU4rxr56E4QuMsvP4= Date: Tue, 22 Mar 2022 14:39:40 -0700 To: willy@infradead.org,peterx@redhat.com,lukas.bulwahn@gmail.com,kirill.shutemov@linux.intel.com,jgg@ziepe.ca,jgg@nvidia.com,jack@suse.cz,imbrenda@linux.ibm.com,hch@lst.de,david@redhat.com,alex.williamson@redhat.com,aarcange@redhat.com,jhubbard@nvidia.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 023/227] mm/gup: follow_pfn_pte(): -EEXIST cleanup Message-Id: <20220322213941.7E9BDC340EE@smtp.kernel.org> Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=e4Ji6y9x; spf=pass (imf01.hostedemail.com: domain of 
akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: jpjpnea4di9yt778y4q4oeibi9zdwn4q X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 1A1A640037 X-HE-Tag: 1647985183-676647 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: John Hubbard Subject: mm/gup: follow_pfn_pte(): -EEXIST cleanup Remove a quirky special case from follow_pfn_pte(), and adjust its callers to match. Caller changes include: __get_user_pages(): Regardless of any FOLL_* flags, get_user_pages() and its variants should handle PFN-only entries by stopping early, if the caller expected **pages to be filled in. This makes for a more reliable API, as compared to the previous approach of skipping over such entries (and thus leaving them silently unwritten). move_pages(): squash the -EEXIST error return from follow_page() into -EFAULT, because -EFAULT is listed in the man page, whereas -EEXIST is not. Link: https://lkml.kernel.org/r/20220204020010.68930-3-jhubbard@nvidia.com Signed-off-by: John Hubbard Suggested-by: Jason Gunthorpe Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Cc: Peter Xu Cc: Lukas Bulwahn Cc: Matthew Wilcox Cc: Claudio Imbrenda Cc: Alex Williamson Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Signed-off-by: Andrew Morton --- mm/gup.c | 13 ++++++++----- mm/migrate.c | 7 +++++++ 2 files changed, 15 insertions(+), 5 deletions(-) --- a/mm/gup.c~mm-gup-follow_pfn_pte-eexist-cleanup +++ a/mm/gup.c @@ -464,10 +464,6 @@ static struct page *no_page_table(struct static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, pte_t *pte, unsigned int flags) { - /* No page to get reference */ - if (flags & (FOLL_GET | FOLL_PIN)) - return -EFAULT; - if (flags & FOLL_TOUCH) { pte_t entry = *pte; @@ -1205,8 +1201,15 @@ retry: } else if (PTR_ERR(page) == -EEXIST) { /* * Proper page table entry exists, but no corresponding - * struct page. + * struct page. If the caller expects **pages to be + * filled in, bail out now, because that can't be done + * for this page. */ + if (pages) { + ret = PTR_ERR(page); + goto out; + } + goto next_page; } else if (IS_ERR(page)) { ret = PTR_ERR(page); --- a/mm/migrate.c~mm-gup-follow_pfn_pte-eexist-cleanup +++ a/mm/migrate.c @@ -1762,6 +1762,13 @@ static int do_pages_move(struct mm_struc } /* + * The move_pages() man page does not have an -EEXIST choice, so + * use -EFAULT instead. + */ + if (err == -EEXIST) + err = -EFAULT; + + /* * If the page is already on the target node (!err), store the * node, otherwise, store the err. 
 	 */
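(For illustration, a short user-space sketch of the move_pages(2) contract
the change above aligns with: per-page results come back in the status array
as a node id or a negative errno, and -EFAULT, not -EEXIST, is what the man
page documents for unmovable mappings.  This needs libnuma at link time
(-lnuma); the names and the node choice are arbitrary.)

	#define _GNU_SOURCE
	#include <numaif.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		long psz = sysconf(_SC_PAGESIZE);
		void *buf = NULL;

		if (posix_memalign(&buf, psz, psz))
			return 1;
		memset(buf, 0, psz);		/* fault the page in */

		void *pages[1] = { buf };
		int nodes[1] = { 0 };		/* request NUMA node 0 */
		int status[1] = { 0 };

		if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE)) {
			perror("move_pages");	/* call-level failure */
		} else if (status[0] < 0) {
			/* per-page failure: a negative errno, e.g. -EFAULT */
			fprintf(stderr, "not moved: %s\n", strerror(-status[0]));
		} else {
			printf("page now on node %d\n", status[0]);
		}
		return 0;
	}

From patchwork Tue Mar 22 21:39:43 2022
Date: Tue, 22 Mar 2022 14:39:43 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 024/227] mm/gup: remove unused pin_user_pages_locked()
Message-Id: <20220322213944.80D93C340EC@smtp.kernel.org>

From: John Hubbard
Subject: mm/gup: remove unused pin_user_pages_locked()

This routine was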
used for a short while, but then the calling code was refactored and the only caller was removed. Link: https://lkml.kernel.org/r/20220204020010.68930-4-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: David Hildenbrand Reviewed-by: Jason Gunthorpe Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Reviewed-by: Claudio Imbrenda Cc: Alex Williamson Cc: Andrea Arcangeli Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lukas Bulwahn Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 -- mm/gup.c | 29 ----------------------------- 2 files changed, 31 deletions(-) --- a/include/linux/mm.h~mm-gup-remove-unused-pin_user_pages_locked +++ a/include/linux/mm.h @@ -1918,8 +1918,6 @@ long pin_user_pages(unsigned long start, struct vm_area_struct **vmas); long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, int *locked); -long pin_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, --- a/mm/gup.c~mm-gup-remove-unused-pin_user_pages_locked +++ a/mm/gup.c @@ -3127,32 +3127,3 @@ long pin_user_pages_unlocked(unsigned lo return get_user_pages_unlocked(start, nr_pages, pages, gup_flags); } EXPORT_SYMBOL(pin_user_pages_unlocked); - -/* - * pin_user_pages_locked() is the FOLL_PIN variant of get_user_pages_locked(). - * Behavior is the same, except that this one sets FOLL_PIN and rejects - * FOLL_GET. - */ -long pin_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - int *locked) -{ - /* - * FIXME: Current FOLL_LONGTERM behavior is incompatible with - * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on - * vmas. As there are no users of this flag in this call we simply - * disallow this option for now. - */ - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM)) - return -EINVAL; - - /* FOLL_GET and FOLL_PIN are mutually exclusive. 
*/ - if (WARN_ON_ONCE(gup_flags & FOLL_GET)) - return -EINVAL; - - gup_flags |= FOLL_PIN; - return __get_user_pages_locked(current->mm, start, nr_pages, - pages, NULL, locked, - gup_flags | FOLL_TOUCH); -} -EXPORT_SYMBOL(pin_user_pages_locked); From patchwork Tue Mar 22 21:39:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789066 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57E65C433FE for ; Tue, 22 Mar 2022 21:39:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D7EF56B0095; Tue, 22 Mar 2022 17:39:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D2C106B0096; Tue, 22 Mar 2022 17:39:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BCC736B0098; Tue, 22 Mar 2022 17:39:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0229.hostedemail.com [216.40.44.229]) by kanga.kvack.org (Postfix) with ESMTP id AC9B66B0095 for ; Tue, 22 Mar 2022 17:39:50 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 67DC49D673 for ; Tue, 22 Mar 2022 21:39:50 +0000 (UTC) X-FDA: 79273339740.30.319896F Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf03.hostedemail.com (Postfix) with ESMTP id ED00020014 for ; Tue, 22 Mar 2022 21:39:49 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 481EE61673; Tue, 22 Mar 2022 21:39:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 971D9C340EC; Tue, 22 Mar 2022 21:39:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985187; bh=TkJ0cJQXEXCw8B24NXJixdrql+CcTRDCJStXqy3s2GY=; h=Date:To:From:In-Reply-To:Subject:From; b=QD/F/geTDZra5hTN341pb3wrNihHd/7RDPomf12PbYr3a6a4lDaT0oBLIfQyxp/4m H+oMJWGpTZrVtxdLDIefgb307E+ikHOVYlWcNZRATffqhzbZDOky6nEP0YrrMoQV+J uMkgAyyNx1cHQfMweheJegXHYWlcdwU8ML0Ft2aM= Date: Tue, 22 Mar 2022 14:39:46 -0700 To: willy@infradead.org,peterx@redhat.com,lukas.bulwahn@gmail.com,kirill.shutemov@linux.intel.com,jgg@ziepe.ca,jgg@nvidia.com,jack@suse.cz,imbrenda@linux.ibm.com,hch@lst.de,david@redhat.com,alex.williamson@redhat.com,aarcange@redhat.com,jhubbard@nvidia.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 025/227] mm: change lookup_node() to use get_user_pages_fast() Message-Id: <20220322213947.971D9C340EC@smtp.kernel.org> X-Stat-Signature: urk37ce48rgqud17bdqj1xopprm1hgna X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: ED00020014 Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="QD/F/geT"; dmarc=none; spf=pass (imf03.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 
1647985189-628598 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: John Hubbard Subject: mm: change lookup_node() to use get_user_pages_fast() The purpose of calling get_user_pages_locked() from lookup_node() was to allow for unlocking the mmap_lock when reading a page from the disk during a page fault (hidden behind VM_FAULT_RETRY). The idea was to reduce contention on the heavily-used mmap_lock. (Thanks to Jan Kara for clearly pointing that out, and in fact I've used some of his wording here.) However, it is unlikely for lookup_node() to take a page fault. With that in mind, change over to calling get_user_pages_fast(). This simplifies the code, runs a little faster in the expected case, and allows removing get_user_pages_locked() entirely, in a subsequent patch. Link: https://lkml.kernel.org/r/20220204020010.68930-5-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: Jan Kara Reviewed-by: Jason Gunthorpe Reviewed-by: Claudio Imbrenda Reviewed-by: Christoph Hellwig Cc: Alex Williamson Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lukas Bulwahn Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Signed-off-by: Andrew Morton --- mm/mempolicy.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) --- a/mm/mempolicy.c~mm-change-lookup_node-to-use-get_user_pages_fast +++ a/mm/mempolicy.c @@ -907,17 +907,14 @@ static void get_policy_nodemask(struct m static int lookup_node(struct mm_struct *mm, unsigned long addr) { struct page *p = NULL; - int err; + int ret; - int locked = 1; - err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked); - if (err > 0) { - err = page_to_nid(p); + ret = get_user_pages_fast(addr & PAGE_MASK, 1, 0, &p); + if (ret > 0) { + ret = page_to_nid(p); put_page(p); } - if (locked) - mmap_read_unlock(mm); - return err; + return ret; } /* Retrieve NUMA policy */ @@ -968,14 +965,14 @@ static long do_get_mempolicy(int *policy if (flags & MPOL_F_NODE) { if (flags & MPOL_F_ADDR) { /* - * Take a refcount on the mpol, lookup_node() - * will drop the mmap_lock, so after calling - * lookup_node() only "pol" remains valid, "vma" - * is stale. + * Take a refcount on the mpol, because we are about to + * drop the mmap_lock, after which only "pol" remains + * valid, "vma" is stale. 
*/ pol_refcount = pol; vma = NULL; mpol_get(pol); + mmap_read_unlock(mm); err = lookup_node(mm, addr); if (err < 0) goto out; From patchwork Tue Mar 22 21:39:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789067 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08AF2C433F5 for ; Tue, 22 Mar 2022 21:39:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 983EA6B0096; Tue, 22 Mar 2022 17:39:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 932DE6B0098; Tue, 22 Mar 2022 17:39:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 822706B0099; Tue, 22 Mar 2022 17:39:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 745186B0096 for ; Tue, 22 Mar 2022 17:39:53 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 402FC22EF8 for ; Tue, 22 Mar 2022 21:39:53 +0000 (UTC) X-FDA: 79273339866.10.664B5E0 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf03.hostedemail.com (Postfix) with ESMTP id E517020014 for ; Tue, 22 Mar 2022 21:39:51 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4B72A61744; Tue, 22 Mar 2022 21:39:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A164EC340EC; Tue, 22 Mar 2022 21:39:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985190; bh=HfbPuDbZwuSvs+yTlCo/b2LVPimdFYsXjwRqeE3iEsg=; h=Date:To:From:In-Reply-To:Subject:From; b=gYALjbZcqupZPDtekjtzZiwK3ulVkG3M1wISR6d/5L7tRyM/K9KVipwPO88dt/Rfo hnuNNipqpTNMSUVG+8MtglGSxeK29+iBNHSUOdfX12bGPRLXwtx5WTzInWkxjPJ+3w 3tGm5IO/2Chst27co5PIXjdcBjgecwyqnWSsp03k= Date: Tue, 22 Mar 2022 14:39:50 -0700 To: willy@infradead.org,peterx@redhat.com,lukas.bulwahn@gmail.com,kirill.shutemov@linux.intel.com,jgg@ziepe.ca,jgg@nvidia.com,jack@suse.cz,imbrenda@linux.ibm.com,hch@lst.de,david@redhat.com,alex.williamson@redhat.com,aarcange@redhat.com,jhubbard@nvidia.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 026/227] mm/gup: remove unused get_user_pages_locked() Message-Id: <20220322213950.A164EC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: k3kaboiagd7kw6wama8amgw935yzrjcg Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=gYALjbZc; spf=pass (imf03.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: E517020014 X-HE-Tag: 1647985191-507085 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: 
owner-majordomo@kvack.org List-ID: From: John Hubbard Subject: mm/gup: remove unused get_user_pages_locked() Now that the last caller of get_user_pages_locked() is gone, remove it. Link: https://lkml.kernel.org/r/20220204020010.68930-6-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: Jan Kara Reviewed-by: Jason Gunthorpe Reviewed-by: Claudio Imbrenda Reviewed-by: Christoph Hellwig Cc: Alex Williamson Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lukas Bulwahn Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 - mm/gup.c | 59 ------------------------------------------- 2 files changed, 61 deletions(-) --- a/include/linux/mm.h~mm-gup-remove-unused-get_user_pages_locked +++ a/include/linux/mm.h @@ -1916,8 +1916,6 @@ long get_user_pages(unsigned long start, long pin_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); -long get_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, --- a/mm/gup.c~mm-gup-remove-unused-get_user_pages_locked +++ a/mm/gup.c @@ -2126,65 +2126,6 @@ long get_user_pages(unsigned long start, } EXPORT_SYMBOL(get_user_pages); -/** - * get_user_pages_locked() - variant of get_user_pages() - * - * @start: starting user address - * @nr_pages: number of pages from start to pin - * @gup_flags: flags modifying lookup behaviour - * @pages: array that receives pointers to the pages pinned. - * Should be at least nr_pages long. Or NULL, if caller - * only intends to ensure the pages are faulted in. - * @locked: pointer to lock flag indicating whether lock is held and - * subsequently whether VM_FAULT_RETRY functionality can be - * utilised. Lock must initially be held. - * - * It is suitable to replace the form: - * - * mmap_read_lock(mm); - * do_something() - * get_user_pages(mm, ..., pages, NULL); - * mmap_read_unlock(mm); - * - * to: - * - * int locked = 1; - * mmap_read_lock(mm); - * do_something() - * get_user_pages_locked(mm, ..., pages, &locked); - * if (locked) - * mmap_read_unlock(mm); - * - * We can leverage the VM_FAULT_RETRY functionality in the page fault - * paths better by using either get_user_pages_locked() or - * get_user_pages_unlocked(). - * - */ -long get_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - int *locked) -{ - /* - * FIXME: Current FOLL_LONGTERM behavior is incompatible with - * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on - * vmas. As there are no users of this flag in this call we simply - * disallow this option for now. 
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
-		return -EINVAL;
-	/*
-	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
-	 * never directly by the caller, so enforce that:
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
-		return -EINVAL;
-
-	return __get_user_pages_locked(current->mm, start, nr_pages,
-				       pages, NULL, locked,
-				       gup_flags | FOLL_TOUCH);
-}
-EXPORT_SYMBOL(get_user_pages_locked);
-
 /*
  * get_user_pages_unlocked() is suitable to replace the form:
  *
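For anyone carrying an out-of-tree caller, the conversion follows the
pattern quoted in the deleted kernel-doc: let get_user_pages_unlocked()
take and drop mmap_lock itself. A minimal sketch of a converted caller,
assuming no lock is held on entry (the function name pin_range and the
FOLL_WRITE choice are illustrative, not from this patch):

	/*
	 * Sketch: get_user_pages_unlocked() takes mmap_read_lock()
	 * internally and can exploit VM_FAULT_RETRY, so the caller no
	 * longer manages the lock or a "locked" flag.
	 */
	static long pin_range(unsigned long start, unsigned long nr_pages,
			      struct page **pages)
	{
		return get_user_pages_unlocked(start, nr_pages, pages,
					       FOLL_WRITE);
	}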
From patchwork Tue Mar 22 21:39:52 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:39:52 -0700
Subject: [patch 027/227] mm/swap: fix confusing comment in folio_mark_accessed
Message-Id: <20220322213953.8DE1AC340EE@smtp.kernel.org>

From: Bang Li
Subject: mm/swap: fix confusing comment in folio_mark_accessed

For unevictable pages, we don't need to mark them: the comment should
say "unevictable", not "evictable".

Link: https://lkml.kernel.org/r/20220311141519.59948-1-libang.linuxer@gmail.com
Signed-off-by: Bang Li
Signed-off-by: Andrew Morton
---

 mm/swap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/swap.c~mm-swap-fix-confusing-comment-in-folio_mark_accessed
+++ a/mm/swap.c
@@ -425,7 +425,7 @@ void folio_mark_accessed(struct folio *f
		/*
		 * Unevictable pages are on the "LRU_UNEVICTABLE" list. But,
		 * this list is never rotated or maintained, so marking an
-		 * evictable page accessed has no effect.
+		 * unevictable page accessed has no effect.
		 */
	} else if (!folio_test_active(folio)) {
		/*
From patchwork Tue Mar 22 21:39:55 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:39:55 -0700
Subject: [patch 028/227] tmpfs: support for file creation time
Message-Id: <20220322213956.82087C340EE@smtp.kernel.org>

From: Xavier Roche
Subject: tmpfs: support for file creation time

Various filesystems (including ext4) now support file creation time.
This patch adds such support for tmpfs-based filesystems.

Note that using shmem_getattr() on file types other than regular files
requires shmem_is_huge() to check the inode type, to avoid reporting an
incorrect HPAGE_PMD_SIZE blksize.

[hughd@google.com: three tweaks to creation time patch]
Link: https://lkml.kernel.org/r/b954973a-b8d1-cab8-63bd-6ea8063de3@google.com
Link: https://lkml.kernel.org/r/20220314211150.GA123458@xavier-xps
Link: https://lkml.kernel.org/r/20220211213628.GA1919658@xavier-xps
Signed-off-by: Xavier Roche
Signed-off-by: Hugh Dickins
Tested-by: Jean Delvare
Tested-by: Sylvain Bellone
Reported-by: Xavier Grand
Reviewed-by: Jean Delvare
Signed-off-by: Andrew Morton
---

 include/linux/shmem_fs.h |    1 +
 mm/shmem.c               |   16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

--- a/include/linux/shmem_fs.h~tmpfs-support-for-file-creation-time
+++ a/include/linux/shmem_fs.h
@@ -24,6 +24,7 @@ struct shmem_inode_info {
	struct shared_policy	policy;		/* NUMA memory alloc policy */
	struct simple_xattrs	xattrs;		/* list of xattrs */
	atomic_t		stop_eviction;	/* hold when working on inode */
+	struct timespec64	i_crtime;	/* file creation time */
	struct inode		vfs_inode;
 };

--- a/mm/shmem.c~tmpfs-support-for-file-creation-time
+++ a/mm/shmem.c
@@ -476,6 +476,8 @@ bool shmem_is_huge(struct vm_area_struct
 {
	loff_t i_size;

+	if (!S_ISREG(inode->i_mode))
+		return false;
	if (shmem_huge == SHMEM_HUGE_DENY)
		return false;
	if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
@@ -1061,6 +1063,12 @@ static int shmem_getattr(struct user_nam
	if (shmem_is_huge(NULL, inode, 0))
		stat->blksize = HPAGE_PMD_SIZE;

+	if (request_mask & STATX_BTIME) {
+		stat->result_mask |= STATX_BTIME;
+		stat->btime.tv_sec = info->i_crtime.tv_sec;
+		stat->btime.tv_nsec = info->i_crtime.tv_nsec;
+	}
+
	return 0;
 }

@@ -1854,9 +1862,6 @@ repeat:
		return 0;
	}

-	/* Never use a huge page for shmem_symlink() */
-	if (S_ISLNK(inode->i_mode))
-		goto alloc_nohuge;
	if (!shmem_is_huge(vma, inode, index))
		goto alloc_nohuge;

@@ -2265,6 +2270,7 @@ static struct inode *shmem_get_inode(str
		atomic_set(&info->stop_eviction, 0);
		info->seals = F_SEAL_SEAL;
		info->flags = flags & VM_NORESERVE;
+		info->i_crtime = inode->i_mtime;
		INIT_LIST_HEAD(&info->shrinklist);
		INIT_LIST_HEAD(&info->swaplist);
		simple_xattrs_init(&info->xattrs);
@@ -3196,6 +3202,7 @@ static ssize_t shmem_listxattr(struct de
 #endif /* CONFIG_TMPFS_XATTR */

 static const struct inode_operations shmem_short_symlink_operations = {
+	.getattr	= shmem_getattr,
	.get_link	= simple_get_link,
 #ifdef CONFIG_TMPFS_XATTR
	.listxattr	= shmem_listxattr,
@@ -3203,6 +3210,7 @@ static const struct inode_operations shm
 };
 static const struct inode_operations shmem_symlink_inode_operations = {
+	.getattr	= shmem_getattr,
	.get_link	= shmem_get_link,
 #ifdef CONFIG_TMPFS_XATTR
	.listxattr	= shmem_listxattr,
@@ -3790,6 +3798,7 @@ static const struct inode_operations shm

 static const struct inode_operations shmem_dir_inode_operations = {
 #ifdef CONFIG_TMPFS
+	.getattr	= shmem_getattr,
	.create		= shmem_create,
	.lookup		= simple_lookup,
	.link		= shmem_link,
@@ -3811,6 +3820,7 @@ static const struct inode_operations shm
 };

 static const struct inode_operations shmem_special_inode_operations = {
+	.getattr	= shmem_getattr,
 #ifdef CONFIG_TMPFS_XATTR
	.listxattr	= shmem_listxattr,
 #endif
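With .getattr wired up for every tmpfs inode type, userspace can query
the creation time through statx(2). A small sketch, assuming the file
lives on a tmpfs mount with this patch applied (the path is
illustrative; the statx() wrapper needs glibc 2.28 or newer):

	#define _GNU_SOURCE
	#include <fcntl.h>	/* AT_FDCWD */
	#include <stdio.h>
	#include <sys/stat.h>	/* statx(), struct statx, STATX_BTIME */

	int main(void)
	{
		struct statx stx;

		if (statx(AT_FDCWD, "/dev/shm/file", 0, STATX_BTIME, &stx))
			return 1;
		/* The kernel acknowledges btime support via stx_mask. */
		if (stx.stx_mask & STATX_BTIME)
			printf("btime: %lld.%09u\n",
			       (long long)stx.stx_btime.tv_sec,
			       stx.stx_btime.tv_nsec);
		else
			puts("creation time not supported here");
		return 0;
	}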
From patchwork Tue Mar 22 21:39:58 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:39:58 -0700
Subject: [patch 029/227] shmem: mapping_set_exiting() to help mapped resilience
Message-Id: <20220322213959.80C97C340EC@smtp.kernel.org>

From: Hugh Dickins
Subject: shmem: mapping_set_exiting() to help mapped resilience

When I added page_mapped() resilience in __delete_from_page_cache() for
the mapping_exiting() case, I missed that mapping_set_exiting() is done
in truncate_inode_pages_final(), which is not actually called for shmem.
(Today, it is folio_mapped() resilience in filemap_unaccount_folio().)

So the fixup to avoid a memory leak in this case never worked on shmem:
add a mapping_set_exiting() in shmem_evict_inode() at last. But this is
hardly a candidate for stable, since it's only useful in the "Bad page"
case.

Link: https://lkml.kernel.org/r/beefffda-6326-e36d-2d41-ed15b51af872@google.com
Fixes: 06b241f32c71 ("mm: __delete_from_page_cache show Bad page if mapped")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
---

 mm/shmem.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/shmem.c~shmem-mapping_set_exiting-to-help-mapped-resilience
+++ a/mm/shmem.c
@@ -1129,6 +1129,7 @@ static void shmem_evict_inode(struct ino
	if (shmem_mapping(inode->i_mapping)) {
		shmem_unacct_size(info->flags, inode->i_size);
		inode->i_size = 0;
+		mapping_set_exiting(inode->i_mapping);
		shmem_truncate_range(inode, 0, (loff_t)-1);
		if (!list_empty(&info->shrinklist)) {
			spin_lock(&sbinfo->shrinklist_lock);
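For context, mapping_set_exiting() just sets the AS_EXITING bit that the
page-cache deletion path then tolerates; roughly (a paraphrase of the
include/linux/pagemap.h helpers of this era, not part of the patch):

	static inline void mapping_set_exiting(struct address_space *mapping)
	{
		set_bit(AS_EXITING, &mapping->flags);
	}

	static inline int mapping_exiting(struct address_space *mapping)
	{
		/* consulted when deciding whether a still-mapped page
		 * during final eviction is expected or a "Bad page" */
		return test_bit(AS_EXITING, &mapping->flags);
	}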
From patchwork Tue Mar 22 21:40:01 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:01 -0700
Subject: [patch 030/227] tmpfs: do not allocate pages on read
Message-Id: <20220322214002.90DB4C340EE@smtp.kernel.org>

From: Hugh Dickins
Subject: tmpfs: do not allocate pages on read

Mikulas asked in
https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/
whether we still need a0ee5ec520ed ("tmpfs: allocate on read when
stacked").

Lukas noticed this unusual behavior of a loop device backed by tmpfs in
https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/

Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading
holes; but if it looks like it might be a read for "a stacking
filesystem", it allocates actual pages to the page cache, and even marks
them as dirty. And reads from the loop device do satisfy the test that
is used.

This oddity was added for an old version of unionfs, to help to limit
its usage to the limited size of the tmpfs mount involved; but about the
same time as the tmpfs mod went in (2.6.25), unionfs was reworked to
proceed differently; and the mod was kept just in case others needed it.

Do we still need it? I cannot answer with more certainty than "Probably
not". It's nasty enough that we really should try to delete it; but if
a regression is reported somewhere, then we might have to revert later.

It's not quite as simple as just removing the test (as Mikulas did):
xfstests generic/013 hung because splice from tmpfs failed on a page not
up-to-date and page mapping unset. That can be fixed just by marking
the ZERO_PAGE as Uptodate, which of course it is: do so in
pagecache_init() - it might be useful to others than tmpfs.

My intention, though, was to stop using the ZERO_PAGE here altogether:
surely iov_iter_zero() is better for this case? Sadly not: it relies on
clear_user(), and the x86 clear_user() is slower than its copy_user():
https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/

But while we are still using the ZERO_PAGE, let's stop dirtying its
struct page cacheline with unnecessary get_page() and put_page().

Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.com
Signed-off-by: Hugh Dickins
Reported-by: Mikulas Patocka
Reported-by: Lukas Czerner
Acked-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Cc: Zdenek Kabelac
Cc: "Darrick J. Wong"
Cc: Miklos Szeredi
Cc: Borislav Petkov
Signed-off-by: Andrew Morton
---

 mm/filemap.c |    6 ++++++
 mm/shmem.c   |   20 ++++++--------------
 2 files changed, 12 insertions(+), 14 deletions(-)

--- a/mm/filemap.c~tmpfs-do-not-allocate-pages-on-read
+++ a/mm/filemap.c
@@ -1054,6 +1054,12 @@ void __init pagecache_init(void)
		init_waitqueue_head(&folio_wait_table[i]);

	page_writeback_init();
+
+	/*
+	 * tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date,
+	 * and splice's page_cache_pipe_buf_confirm() needs to see that.
+	 */
+	SetPageUptodate(ZERO_PAGE(0));
 }

 /*
--- a/mm/shmem.c~tmpfs-do-not-allocate-pages-on-read
+++ a/mm/shmem.c
@@ -2499,19 +2499,10 @@ static ssize_t shmem_file_read_iter(stru
	struct address_space *mapping = inode->i_mapping;
	pgoff_t index;
	unsigned long offset;
-	enum sgp_type sgp = SGP_READ;
	int error = 0;
	ssize_t retval = 0;
	loff_t *ppos = &iocb->ki_pos;

-	/*
-	 * Might this read be for a stacking filesystem?  Then when reading
-	 * holes of a sparse file, we actually need to allocate those pages,
-	 * and even mark them dirty, so it cannot exceed the max_blocks limit.
-	 */
-	if (!iter_is_iovec(to))
-		sgp = SGP_CACHE;
-
	index = *ppos >> PAGE_SHIFT;
	offset = *ppos & ~PAGE_MASK;

@@ -2520,6 +2511,7 @@ static ssize_t shmem_file_read_iter(stru
		pgoff_t end_index;
		unsigned long nr, ret;
		loff_t i_size = i_size_read(inode);
+		bool got_page;

		end_index = i_size >> PAGE_SHIFT;
		if (index > end_index)
@@ -2530,15 +2522,13 @@ static ssize_t shmem_file_read_iter(stru
			break;
		}

-		error = shmem_getpage(inode, index, &page, sgp);
+		error = shmem_getpage(inode, index, &page, SGP_READ);
		if (error) {
			if (error == -EINVAL)
				error = 0;
			break;
		}
		if (page) {
-			if (sgp == SGP_CACHE)
-				set_page_dirty(page);
			unlock_page(page);

			if (PageHWPoison(page)) {
@@ -2578,9 +2568,10 @@ static ssize_t shmem_file_read_iter(stru
			 */
			if (!offset)
				mark_page_accessed(page);
+			got_page = true;
		} else {
			page = ZERO_PAGE(0);
-			get_page(page);
+			got_page = false;
		}

		/*
@@ -2593,7 +2584,8 @@ static ssize_t shmem_file_read_iter(stru
		index += offset >> PAGE_SHIFT;
		offset &= ~PAGE_MASK;

-		put_page(page);
+		if (got_page)
+			put_page(page);
		if (!iov_iter_count(to))
			break;
		if (ret < nr) {
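The user-visible effect: a non-iovec read of a hole (splice, sendfile,
loop device) no longer instantiates and dirties pages. A demonstration
sketch, assuming /dev/shm is a tmpfs mount with this patch applied (file
name and sizes are illustrative):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		struct stat st;
		loff_t off = 0;
		int pfd[2];
		int fd = open("/dev/shm/hole", O_RDWR | O_CREAT | O_TRUNC, 0600);

		if (fd < 0 || ftruncate(fd, 1 << 20) != 0 || pipe(pfd) != 0)
			return 1;
		/*
		 * splice() is not an iovec iterator, so it used to take the
		 * SGP_CACHE path: allocating (and dirtying) pages just to
		 * read a hole.  Now it is served from the ZERO_PAGE.
		 */
		splice(fd, &off, pfd[1], NULL, 4096, 0);
		fstat(fd, &st);
		/* expected to stay 0 with this patch applied */
		printf("st_blocks after splicing a hole: %lld\n",
		       (long long)st.st_blocks);
		unlink("/dev/shm/hole");
		return 0;
	}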
From patchwork Tue Mar 22 21:40:04 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:04 -0700
Subject: [patch 031/227] mm: shmem: use helper macro __ATTR_RW
Message-Id: <20220322214005.7B83CC340EC@smtp.kernel.org>

From: Miaohe Lin
Subject: mm: shmem: use helper macro __ATTR_RW

Use the helper macro __ATTR_RW to define shmem_enabled_attr, making the
code clearer.  Minor readability improvement.
Link: https://lkml.kernel.org/r/20220312082252.55586-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
---

 mm/shmem.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/shmem.c~mm-shmem-use-helper-macro-__attr_rw
+++ a/mm/shmem.c
@@ -3965,8 +3965,7 @@ static ssize_t shmem_enabled_store(struc
	return count;
 }

-struct kobj_attribute shmem_enabled_attr =
-	__ATTR(shmem_enabled, 0644, shmem_enabled_show, shmem_enabled_store);
+struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */

 #else /* !CONFIG_SHMEM */
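The replacement is equivalent because __ATTR_RW() derives the show/store
callback names from the attribute name by token pasting; roughly (a
paraphrase of include/linux/sysfs.h, not part of this patch):

	#define __ATTR_RW(_name) \
		__ATTR(_name, 0644, _name##_show, _name##_store)

	/* so __ATTR_RW(shmem_enabled) expands to use mode 0644 and the
	 * existing shmem_enabled_show()/shmem_enabled_store() pair,
	 * exactly what the open-coded __ATTR() spelled out before. */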
From patchwork Tue Mar 22 21:40:07 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:07 -0700
Subject: [patch 032/227] memcg: replace in_interrupt() with !in_task()
Message-Id: <20220322214008.6D046C340F2@smtp.kernel.org>

From: Shakeel Butt
Subject: memcg: replace in_interrupt() with !in_task()

Replace the deprecated in_interrupt() with !in_task(), because
in_interrupt() returns true when BH is disabled, even if the call
happens in task context.  in_task() is the right interface to
differentiate task context from NMI, hard IRQ and softirq contexts.

Link: https://lkml.kernel.org/r/20220127162636.3461256-1-shakeelb@google.com
Signed-off-by: Shakeel Butt
Acked-by: Michal Hocko
Cc: Vasily Averin
Cc: Johannes Weiner
Cc: Roman Gushchin
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/memcontrol.c~memcg-replace-in_interrupt-with-in_task
+++ a/mm/memcontrol.c
@@ -2688,7 +2688,7 @@ done_restock:
			READ_ONCE(memcg->swap.high);

		/* Don't bother a random interrupted task */
-		if (in_interrupt()) {
+		if (!in_task()) {
			if (mem_high) {
				schedule_work(&memcg->high_work);
				break;
@@ -6968,7 +6968,7 @@ void mem_cgroup_sk_alloc(struct sock *sk
		return;

	/* Do not associate the sock with unrelated interrupted task's memcg. */
-	if (in_interrupt())
+	if (!in_task())
		return;

	rcu_read_lock();
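The distinction comes from which preempt-count bits each predicate
tests; approximately (a paraphrase of include/linux/preempt.h of this
era, not part of this patch):

	/* SOFTIRQ_MASK also counts BH-disabled sections, whereas
	 * SOFTIRQ_OFFSET is set only while actually serving a softirq. */
	#define in_interrupt() \
		(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK))
	#define in_task() \
		(!(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))

	/* Under local_bh_disable() in process context:
	 *   in_interrupt() -> true  (misleading: it is still a task)
	 *   in_task()      -> true  (what these call sites really ask)
	 */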
From patchwork Tue Mar 22 21:40:10 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:10 -0700
Subject: [patch 033/227] memcg: add per-memcg total kernel memory stat
Message-Id: <20220322214011.66567C340EE@smtp.kernel.org>

From: Yosry Ahmed
Subject: memcg: add per-memcg total kernel memory stat

Currently memcg stats show several types of kernel memory: kernel
stack, page tables, sock, vmalloc, and slab.  However, there are other
allocations with __GFP_ACCOUNT (or supersets such as GFP_KERNEL_ACCOUNT)
that are not accounted in any of those stats.  A few examples are:

 - various kvm allocations (e.g. allocated pages to create vcpus)
 - io_uring
 - tmp_page in pipes during pipe_write()
 - bpf ringbuffers
 - unix sockets

Keeping track of the total kernel memory is essential for the ease of
migration from cgroup v1 to v2, as there are large discrepancies between
v1's kmem.usage_in_bytes and the sum of the available kernel memory
stats in v2.  Adding separate memcg stats for all __GFP_ACCOUNT kernel
allocations is an impractical maintenance burden, as there are a lot of
those all over the kernel code, with more use cases likely to show up in
the future.

Therefore, add a "kernel" memcg stat that is analogous to the kmem page
counter, with added benefits such as using the rstat infrastructure,
which aggregates stats more efficiently.  Additionally, this provides a
lighter alternative in case the legacy kmem is deprecated in the future.

[yosryahmed@google.com: v2]
Link: https://lkml.kernel.org/r/20220203193856.972500-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220201200823.3283171-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Muchun Song
Signed-off-by: Andrew Morton
---

 Documentation/admin-guide/cgroup-v2.rst |    5 ++++
 include/linux/memcontrol.h              |    1 
 mm/memcontrol.c                         |   27 +++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

--- a/Documentation/admin-guide/cgroup-v2.rst~memcg-add-per-memcg-total-kernel-memory-stat
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -1301,6 +1301,11 @@ PAGE_SIZE multiple when read back.
		Amount of memory used to cache filesystem data,
		including tmpfs and shared memory.

+	  kernel (npn)
+		Amount of total kernel memory, including
+		(kernel_stack, pagetables, percpu, vmalloc, slab) in
+		addition to other kernel memory use cases.
+
	  kernel_stack
		Amount of memory allocated to kernel stacks.
--- a/include/linux/memcontrol.h~memcg-add-per-memcg-total-kernel-memory-stat
+++ a/include/linux/memcontrol.h
@@ -34,6 +34,7 @@ enum memcg_stat_item {
	MEMCG_SOCK,
	MEMCG_PERCPU_B,
	MEMCG_VMALLOC,
+	MEMCG_KMEM,
	MEMCG_NR_STAT,
 };

--- a/mm/memcontrol.c~memcg-add-per-memcg-total-kernel-memory-stat
+++ a/mm/memcontrol.c
@@ -1371,6 +1371,7 @@ struct memory_stat {
 static const struct memory_stat memory_stats[] = {
	{ "anon",		NR_ANON_MAPPED		},
	{ "file",		NR_FILE_PAGES		},
+	{ "kernel",		MEMCG_KMEM		},
	{ "kernel_stack",	NR_KERNEL_STACK_KB	},
	{ "pagetables",		NR_PAGETABLE		},
	{ "percpu",		MEMCG_PERCPU_B		},
@@ -2114,6 +2115,7 @@ static DEFINE_MUTEX(percpu_charge_mutex)
 static void drain_obj_stock(struct obj_stock *stock);
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
				     struct mem_cgroup *root_memcg);
+static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);

 #else
 static inline void drain_obj_stock(struct obj_stock *stock)
@@ -2124,6 +2126,9 @@ static bool obj_stock_flush_required(str
 {
	return false;
 }
+static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
+{
+}
 #endif

 /**
@@ -2979,6 +2984,18 @@ static void memcg_free_cache_id(int id)
	ida_simple_remove(&memcg_cache_ida, id);
 }

+static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
+{
+	mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		if (nr_pages > 0)
+			page_counter_charge(&memcg->kmem, nr_pages);
+		else
+			page_counter_uncharge(&memcg->kmem, -nr_pages);
+	}
+}
+
 /*
 * obj_cgroup_uncharge_pages: uncharge a number of kernel pages from a objcg
 * @objcg: object cgroup to uncharge
@@ -2991,8 +3008,7 @@ static void obj_cgroup_uncharge_pages(st

	memcg = get_mem_cgroup_from_objcg(objcg);

-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		page_counter_uncharge(&memcg->kmem, nr_pages);
+	memcg_account_kmem(memcg, -nr_pages);
	refill_stock(memcg, nr_pages);

	css_put(&memcg->css);
@@ -3018,8 +3034,7 @@ static int obj_cgroup_charge_pages(struc
	if (ret)
		goto out;

-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
-		page_counter_charge(&memcg->kmem, nr_pages);
+	memcg_account_kmem(memcg, nr_pages);
 out:
	css_put(&memcg->css);

@@ -6801,8 +6816,8 @@ static void uncharge_batch(const struct
		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
		if (do_memsw_account())
			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
-		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
-			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
+		if (ug->nr_kmem)
+			memcg_account_kmem(ug->memcg, -ug->nr_kmem);
		memcg_oom_recover(ug->memcg);
	}
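Anything allocated with __GFP_ACCOUNT now rolls up into the new counter,
whether or not a dedicated stat covers it. An illustrative sketch (the
helper itself is hypothetical, not from this patch):

	#include <linux/slab.h>

	/*
	 * GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT: the pages are
	 * charged to the current task's memcg and, with this patch,
	 * show up in the "kernel" entry of memory.stat (which subsumes
	 * slab, vmalloc, etc. alongside otherwise-uncovered allocations).
	 */
	static void *alloc_accounted(size_t size)
	{
		return kmalloc(size, GFP_KERNEL_ACCOUNT);
	}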
From patchwork Tue Mar 22 21:40:13 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:13 -0700
Subject: [patch 034/227] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
Message-Id: <20220322214014.76B14C340EE@smtp.kernel.org>

From: Wei Yang
Subject: mm/memcg: mem_cgroup_per_node is already set to 0 on allocation

kzalloc_node() already zeroes the allocation, so it's not necessary to
set these fields to 0 again.
Link: https://lkml.kernel.org/r/20220201004643.8391-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang
Reviewed-by: Muchun Song
Acked-by: Michal Hocko
Reviewed-by: Roman Gushchin
Reviewed-by: Mike Rapoport
Reviewed-by: Shakeel Butt
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Suren Baghdasaryan
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c |    2 --
 1 file changed, 2 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-mem_cgroup_per_node-is-already-set-to-0-on-allocation
+++ a/mm/memcontrol.c
@@ -5105,8 +5105,6 @@ static int alloc_mem_cgroup_per_node_inf
	}

	lruvec_init(&pn->lruvec);
-	pn->usage_in_excess = 0;
-	pn->on_tree = false;
	pn->memcg = memcg;

	memcg->nodeinfo[node] = pn;
From patchwork Tue Mar 22 21:40:16 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:16 -0700
Subject: [patch 035/227] mm/memcg: retrieve parent memcg from css.parent
Message-Id: <20220322214017.806F9C340EC@smtp.kernel.org>

From: Wei Yang
Subject: mm/memcg: retrieve parent memcg from css.parent

The parent we get from page_counter is correct, but page_counter and css
are two different hierarchies.  Let's retrieve the parent memcg from
css.parent, just like parent_cs(), blkcg_parent(), etc. do.

Link: https://lkml.kernel.org/r/20220201004643.8391-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang
Reviewed-by: Muchun Song
Acked-by: Michal Hocko
Reviewed-by: Roman Gushchin
Reviewed-by: Shakeel Butt
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: Yang Shi
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Cc: Mike Rapoport
Signed-off-by: Andrew Morton
---

 include/linux/memcontrol.h |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcg-retrieve-parent-memcg-from-cssparent
+++ a/include/linux/memcontrol.h
@@ -842,9 +842,7 @@ static inline struct mem_cgroup *lruvec_
 */
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
-	if (!memcg->memory.parent)
-		return NULL;
-	return mem_cgroup_from_counter(memcg->memory.parent, memory);
+	return mem_cgroup_from_css(memcg->css.parent);
 }

 static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
From patchwork Tue Mar 22 21:40:19 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:40:19 -0700
Subject: [patch 036/227] memcg: refactor mem_cgroup_oom
Message-Id: <20220322214020.7FCEBC340F3@smtp.kernel.org>

From: Shakeel Butt
Subject: memcg: refactor mem_cgroup_oom

Patch series "memcg: robust enforcement of memory.high", v2.

Due to the semantics of memory.high enforcement, i.e. throttle the
workload without oom-kill, we are trying to use it for right sizing the
workloads in our production environment.  However, we observed that the
mechanism fails for some specific applications which do a big chunk of
allocations in a single syscall.  The reason behind this failure is a
limitation of the memory.high enforcement's current implementation.
This patch series solves the issue by enforcing memory.high
synchronously if the current process has accumulated a large amount of
high overcharge.

This patch (of 4):

The function mem_cgroup_oom returns an enum which has four possible
values, but the caller does not care about those values and only cares
whether the return value is OOM_SUCCESS or not.  So, remove the enum
altogether and make mem_cgroup_oom return a simple bool.

Link: https://lkml.kernel.org/r/20220211064917.2028469-1-shakeelb@google.com
Link: https://lkml.kernel.org/r/20220211064917.2028469-2-shakeelb@google.com
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Chris Down
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c |   44 +++++++++++++++++---------------------------
 1 file changed, 17 insertions(+), 27 deletions(-)

--- a/mm/memcontrol.c~memcg-refactor-mem_cgroup_oom
+++ a/mm/memcontrol.c
@@ -1796,20 +1796,16 @@ static void memcg_oom_recover(struct mem
		__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
 }

-enum oom_status {
-	OOM_SUCCESS,
-	OOM_FAILED,
-	OOM_ASYNC,
-	OOM_SKIPPED
-};
-
-static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
+/*
+ * Returns true if successfully killed one or more processes. Though in some
+ * corner cases it can return true even without killing any process.
+ */
+static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 {
-	enum oom_status ret;
-	bool locked;
+	bool locked, ret;

	if (order > PAGE_ALLOC_COSTLY_ORDER)
-		return OOM_SKIPPED;
+		return false;

	memcg_memory_event(memcg, MEMCG_OOM);

@@ -1832,14 +1828,13 @@ static enum oom_status mem_cgroup_oom(st
	 * victim and then we have to bail out from the charge path.
	 */
	if (memcg->oom_kill_disable) {
-		if (!current->in_user_fault)
-			return OOM_SKIPPED;
-		css_get(&memcg->css);
-		current->memcg_in_oom = memcg;
-		current->memcg_oom_gfp_mask = mask;
-		current->memcg_oom_order = order;
-
-		return OOM_ASYNC;
+		if (current->in_user_fault) {
+			css_get(&memcg->css);
+			current->memcg_in_oom = memcg;
+			current->memcg_oom_gfp_mask = mask;
+			current->memcg_oom_order = order;
+		}
+		return false;
	}

	mem_cgroup_mark_under_oom(memcg);
@@ -1850,10 +1845,7 @@ static enum oom_status mem_cgroup_oom(st
	mem_cgroup_oom_notify(memcg);

	mem_cgroup_unmark_under_oom(memcg);
-	if (mem_cgroup_out_of_memory(memcg, mask, order))
-		ret = OOM_SUCCESS;
-	else
-		ret = OOM_FAILED;
+	ret = mem_cgroup_out_of_memory(memcg, mask, order);

	if (locked)
		mem_cgroup_oom_unlock(memcg);
@@ -2546,7 +2538,6 @@ static int try_charge_memcg(struct mem_c
	int nr_retries = MAX_RECLAIM_RETRIES;
	struct mem_cgroup *mem_over_limit;
	struct page_counter *counter;
-	enum oom_status oom_status;
	unsigned long nr_reclaimed;
	bool passed_oom = false;
	bool may_swap = true;
@@ -2649,9 +2640,8 @@ retry:
	 * a forward progress or bypass the charge if the oom killer
	 * couldn't make any progress.
	 */
-	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
-		       get_order(nr_pages * PAGE_SIZE));
-	if (oom_status == OOM_SUCCESS) {
+	if (mem_cgroup_oom(mem_over_limit, gfp_mask,
+			   get_order(nr_pages * PAGE_SIZE))) {
		passed_oom = true;
		nr_retries = MAX_RECLAIM_RETRIES;
		goto retry;
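At the call sites the refactor reduces to a direct boolean test;
condensed from the try_charge_memcg() hunk above (retry bookkeeping
elided):

	/* before: four states mapped onto a yes/no decision */
	if (mem_cgroup_oom(memcg, mask, order) == OOM_SUCCESS)
		goto retry;

	/* after: the return type says exactly what callers ask */
	if (mem_cgroup_oom(memcg, mask, order))
		goto retry;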
Date: Tue, 22 Mar 2022 14:40:22 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 037/227] memcg: unify force charging conditions
Message-Id: <20220322214023.74014C340F2@smtp.kernel.org>

From: Shakeel Butt
Subject: memcg: unify force charging conditions

Currently the kernel force-charges allocations which have the __GFP_HIGH
flag set, without triggering memory reclaim.  __GFP_HIGH indicates that
the caller is high priority, and since commit 869712fd3de5 ("mm:
memcontrol: fix network errors from failing __GFP_ATOMIC charges") the
kernel lets such allocations do force charging.  Please note that
__GFP_ATOMIC has been replaced by __GFP_HIGH.

__GFP_HIGH does not tell whether the caller can block or can trigger
reclaim; there are separate checks to determine that.  So, there is no
need to skip reclaim for __GFP_HIGH allocations.  Handle __GFP_HIGH
together with __GFP_NOFAIL, which also does force charging.

Please note that this is a noop change, as there are for now no
__GFP_HIGH allocators in the kernel which also use __GFP_ACCOUNT (or
SLAB_ACCOUNT) and disallow reclaim.

Link: https://lkml.kernel.org/r/20220211064917.2028469-3-shakeelb@google.com
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Cc: Roman Gushchin
Cc: Chris Down
Cc: Johannes Weiner
Cc: Michal Hocko
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

--- a/mm/memcontrol.c~memcg-unify-force-charging-conditions
+++ a/mm/memcontrol.c
@@ -2566,15 +2566,6 @@ retry:
 	}
 
 	/*
-	 * Memcg doesn't have a dedicated reserve for atomic
-	 * allocations. But like the global atomic pool, we need to
-	 * put the burden of reclaim on regular allocation requests
-	 * and let these go through as privileged allocations.
-	 */
-	if (gfp_mask & __GFP_ATOMIC)
-		goto force;
-
-	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory.
This might exceed the limits temporarily, * but we prefer facilitating memory reclaim and getting back @@ -2647,7 +2638,13 @@ retry: goto retry; } nomem: - if (!(gfp_mask & __GFP_NOFAIL)) + /* + * Memcg doesn't have a dedicated reserve for atomic + * allocations. But like the global atomic pool, we need to + * put the burden of reclaim on regular allocation requests + * and let these go through as privileged allocations. + */ + if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH))) return -ENOMEM; force: /* From patchwork Tue Mar 22 21:40:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789080 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B1CBC433F5 for ; Tue, 22 Mar 2022 21:40:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 984796B007D; Tue, 22 Mar 2022 17:40:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9333C6B009A; Tue, 22 Mar 2022 17:40:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D43E6B009B; Tue, 22 Mar 2022 17:40:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id 6696B6B007D for ; Tue, 22 Mar 2022 17:40:28 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 41AF821AB5 for ; Tue, 22 Mar 2022 21:40:28 +0000 (UTC) X-FDA: 79273341336.14.7E92528 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf02.hostedemail.com (Postfix) with ESMTP id B838C8001F for ; Tue, 22 Mar 2022 21:40:27 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1F2DE608CC; Tue, 22 Mar 2022 21:40:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 726E4C340EC; Tue, 22 Mar 2022 21:40:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985226; bh=mKC0uYOcIwKuv/QmmYXg2LhlQfLbD0dcqpP4XxWaPh8=; h=Date:To:From:In-Reply-To:Subject:From; b=dybNFcVjkHFRlycwIZPxK4aXxxwtdGwM4PvHyJRERL8MnDmYsT6zIH5jNlwfSLqIH 5haQmvNlfSbBAXhOMSh6fYuKNgy9kdiZOAbGSSXjP0oXZ10X4tSiStp8lIA/BfNv36 eTiAFwGpvwpvXrwOikM4EltXciccMpbZP0wmqMLo= Date: Tue, 22 Mar 2022 14:40:25 -0700 To: roman.gushchin@linux.dev,mhocko@suse.com,hannes@cmpxchg.org,guro@fb.com,chris@chrisdown.name,shakeelb@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 038/227] selftests: memcg: test high limit for single entry allocation Message-Id: <20220322214026.726E4C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: ka5th89gzsofhw6y6gnpyk6wkorjfh9x Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=dybNFcVj; spf=pass (imf02.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) 
From: Shakeel Butt
Subject: selftests: memcg: test high limit for single entry allocation

Test the enforcement of the memory.high limit for a large amount of
memory allocated within a single kernel entry.  There are valid use-cases
where an application can trigger a large amount of memory allocation
within a single syscall, e.g. mlock() or mmap(MAP_POPULATE).  Make sure
memory.high limit enforcement works for such use-cases.

Link: https://lkml.kernel.org/r/20220211064917.2028469-4-shakeelb@google.com
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Cc: Roman Gushchin
Cc: Chris Down
Cc: Johannes Weiner
Cc: Michal Hocko
Signed-off-by: Andrew Morton
---

 tools/testing/selftests/cgroup/cgroup_util.c     | 15 ++
 tools/testing/selftests/cgroup/cgroup_util.h     |  1
 tools/testing/selftests/cgroup/test_memcontrol.c | 78 +++++++++++++
 3 files changed, 91 insertions(+), 3 deletions(-)

--- a/tools/testing/selftests/cgroup/cgroup_util.c~selftests-memcg-test-high-limit-for-single-entry-allocation
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -583,7 +583,7 @@ int clone_into_cgroup_run_wait(const cha
 	return 0;
 }
 
-int cg_prepare_for_wait(const char *cgroup)
+static int __prepare_for_wait(const char *cgroup, const char *filename)
 {
 	int fd, ret = -1;
 
@@ -591,8 +591,7 @@ int cg_prepare_for_wait(const char *cgro
 	if (fd == -1)
 		return fd;
 
-	ret = inotify_add_watch(fd, cg_control(cgroup, "cgroup.events"),
-				IN_MODIFY);
+	ret = inotify_add_watch(fd, cg_control(cgroup, filename), IN_MODIFY);
 	if (ret == -1) {
 		close(fd);
 		fd = -1;
@@ -601,6 +600,16 @@ int cg_prepare_for_wait(const char *cgro
 	return fd;
 }
 
+int cg_prepare_for_wait(const char *cgroup)
+{
+	return __prepare_for_wait(cgroup, "cgroup.events");
+}
+
+int memcg_prepare_for_wait(const char *cgroup)
+{
+	return __prepare_for_wait(cgroup, "memory.events");
+}
+
 int cg_wait_for(int fd)
 {
 	int ret = -1;
--- a/tools/testing/selftests/cgroup/cgroup_util.h~selftests-memcg-test-high-limit-for-single-entry-allocation
+++ a/tools/testing/selftests/cgroup/cgroup_util.h
@@ -55,4 +55,5 @@ extern int clone_reap(pid_t pid, int opt
 extern int clone_into_cgroup_run_wait(const char *cgroup);
 extern int dirfd_open_opath(const char *dir);
 extern int cg_prepare_for_wait(const char *cgroup);
+extern int memcg_prepare_for_wait(const char *cgroup);
 extern int cg_wait_for(int fd);
--- a/tools/testing/selftests/cgroup/test_memcontrol.c~selftests-memcg-test-high-limit-for-single-entry-allocation
+++ a/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include <sys/mman.h>
 #include "../kselftest.h"
 #include "cgroup_util.h"
 
@@ -628,6 +629,82 @@ cleanup:
 	return ret;
 }
 
+static int alloc_anon_mlock(const char *cgroup, void *arg)
+{
+	size_t size = (size_t)arg;
+	void *buf;
+
+	buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
+		   0, 0);
+	if (buf == MAP_FAILED)
+		return -1;
+
+	mlock(buf, size);
+	munmap(buf, size);
+	return 0;
+}
+
+/*
+ * This test checks that memory.high is able to throttle big single shot
+ * allocation i.e. large allocation within one kernel entry.
+ */ +static int test_memcg_high_sync(const char *root) +{ + int ret = KSFT_FAIL, pid, fd = -1; + char *memcg; + long pre_high, pre_max; + long post_high, post_max; + + memcg = cg_name(root, "memcg_test"); + if (!memcg) + goto cleanup; + + if (cg_create(memcg)) + goto cleanup; + + pre_high = cg_read_key_long(memcg, "memory.events", "high "); + pre_max = cg_read_key_long(memcg, "memory.events", "max "); + if (pre_high < 0 || pre_max < 0) + goto cleanup; + + if (cg_write(memcg, "memory.swap.max", "0")) + goto cleanup; + + if (cg_write(memcg, "memory.high", "30M")) + goto cleanup; + + if (cg_write(memcg, "memory.max", "140M")) + goto cleanup; + + fd = memcg_prepare_for_wait(memcg); + if (fd < 0) + goto cleanup; + + pid = cg_run_nowait(memcg, alloc_anon_mlock, (void *)MB(200)); + if (pid < 0) + goto cleanup; + + cg_wait_for(fd); + + post_high = cg_read_key_long(memcg, "memory.events", "high "); + post_max = cg_read_key_long(memcg, "memory.events", "max "); + if (post_high < 0 || post_max < 0) + goto cleanup; + + if (pre_high == post_high || pre_max != post_max) + goto cleanup; + + ret = KSFT_PASS; + +cleanup: + if (fd >= 0) + close(fd); + cg_destroy(memcg); + free(memcg); + + return ret; +} + /* * This test checks that memory.max limits the amount of * memory which can be consumed by either anonymous memory @@ -1180,6 +1257,7 @@ struct memcg_test { T(test_memcg_min), T(test_memcg_low), T(test_memcg_high), + T(test_memcg_high_sync), T(test_memcg_max), T(test_memcg_oom_events), T(test_memcg_swap_max), From patchwork Tue Mar 22 21:40:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789081 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A689C433EF for ; Tue, 22 Mar 2022 21:40:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 16DB26B009A; Tue, 22 Mar 2022 17:40:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 11E936B009B; Tue, 22 Mar 2022 17:40:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F26DE6B009C; Tue, 22 Mar 2022 17:40:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id E3CB46B009A for ; Tue, 22 Mar 2022 17:40:33 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 4278E1828AC94 for ; Tue, 22 Mar 2022 21:40:32 +0000 (UTC) X-FDA: 79273341504.29.18BC1E1 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf24.hostedemail.com (Postfix) with ESMTP id BFD3C180038 for ; Tue, 22 Mar 2022 21:40:31 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B0F22B81D9E; Tue, 22 Mar 2022 21:40:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B700C340F2; Tue, 22 Mar 2022 21:40:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985229; bh=q+Rxyd8FiWtWfSb4FllBbKO/RZXn17P9B+CTV10QsCs=; h=Date:To:From:In-Reply-To:Subject:From; 
Date: Tue, 22 Mar 2022 14:40:28 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 039/227] memcg: synchronously enforce memory.high for large overcharges
Message-Id: <20220322214029.6B700C340F2@smtp.kernel.org>

From: Shakeel Butt
Subject: memcg: synchronously enforce memory.high for large overcharges

The high limit is used to throttle the workload without invoking the
oom-killer.  Recently we tried to use the high limit to right-size our
internal workloads, more specifically to dynamically adjust the limits of
a workload without letting it get oom-killed.  However, due to a
limitation in the implementation of high limit enforcement, we observed
that the mechanism fails for some real workloads.

The high limit is enforced on return to userspace, i.e. the kernel lets
the usage go over the limit and, when execution returns to userspace,
triggers the high reclaim and possibly throttles the process as well.
However, this mechanism fails for workloads which do large allocations in
a single kernel entry, e.g. applications that mlock() a large chunk of
memory in a single syscall.  Such applications bypass the high limit and
can trigger the oom-killer.

To make high limit enforcement more robust, this patch makes the limit
enforcement synchronous, but only if the accumulated overcharge becomes
larger than MEMCG_CHARGE_BATCH.  So most allocations will still be
throttled on the return-to-userspace path, but only the extreme
allocations which accumulate a large amount of overcharge without
returning to userspace will be throttled synchronously.  The value
MEMCG_CHARGE_BATCH is a bit arbitrary, but most other places in the memcg
codebase use this constant, so for now use the same one here.
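[Editor's note: for orientation, the enforcement described above boils down to
one check at the end of the charge path.  A minimal annotated sketch, with the
function and field names taken from the diff that follows:]

	/* end of try_charge_memcg(), after the charge has been accounted */
	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
	    !(current->flags & PF_MEMALLOC) &&	   /* never recurse from reclaim itself */
	    gfpflags_allow_blocking(gfp_mask)) {   /* reclaim/throttling may sleep */
		mem_cgroup_handle_over_high();	   /* same work as return-to-userspace */
	}

Allocations that cannot block keep the old behaviour and are picked up on the
next return to userspace.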
Link: https://lkml.kernel.org/r/20220211064917.2028469-5-shakeelb@google.com
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Acked-by: Chris Down
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Hocko
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c | 5 +++++
 1 file changed, 5 insertions(+)

--- a/mm/memcontrol.c~memcg-synchronously-enforce-memoryhigh-for-large-overcharges
+++ a/mm/memcontrol.c
@@ -2704,6 +2704,11 @@ done_restock:
 		}
 	} while ((memcg = parent_mem_cgroup(memcg)));
 
+	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
+	    !(current->flags & PF_MEMALLOC) &&
+	    gfpflags_allow_blocking(gfp_mask)) {
+		mem_cgroup_handle_over_high();
+	}
 	return 0;
 }

From patchwork Tue Mar 22 21:40:31 2022
Date: Tue, 22 Mar 2022 14:40:31 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 040/227] mm/memcontrol: return 1 from cgroup.memory __setup() handler
Message-Id: <20220322214032.750D6C340EC@smtp.kernel.org>
From: Randy Dunlap
Subject: mm/memcontrol: return 1 from cgroup.memory __setup() handler

__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; it just pollutes init's
environment).

The only reason that this particular __setup handler does not pollute
init's environment is that the setup string contains a '.', as in
"cgroup.memory".  This causes init/main.c::unknown_bootoption() to
consider it to be an "Unused module parameter" and ignore it.  (This is
for parsing of loadable module parameters any time after kernel init.)
Otherwise the string "cgroup.memory=whatever" would be added to init's
environment strings.

Instead of relying on this '.' quirk, just return 1 to indicate that the
boot option has been handled.

Note that there is no warning message if someone enters:
	cgroup.memory=anything_invalid

Link: https://lkml.kernel.org/r/20220222005811.10672-1-rdunlap@infradead.org
Fixes: f7e1cb6ec51b0 ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Signed-off-by: Randy Dunlap
Reported-by: Igor Zhbanov
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Reviewed-by: Michal Koutný
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Roman Gushchin
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memcontrol.c~mm-memcontrol-return-1-from-cgroupmemory-__setup-handler
+++ a/mm/memcontrol.c
@@ -7058,7 +7058,7 @@ static int __init cgroup_memory(char *s)
 		if (!strcmp(token, "nokmem"))
 			cgroup_memory_nokmem = true;
 	}
-	return 0;
+	return 1;
 }
 __setup("cgroup.memory=", cgroup_memory);
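[Editor's note: for readers unfamiliar with the convention being fixed, a
hypothetical handler following it would look roughly like this.  The option
name "myopt" and the variable are made up for illustration; only the
"return 1" convention is the point:]

	static bool myopt_enabled;

	static int __init myopt_setup(char *s)
	{
		if (s && !strcmp(s, "on"))
			myopt_enabled = true;
		/* 1: handled; 0 would leak "myopt=..." into init's environment */
		return 1;
	}
	__setup("myopt=", myopt_setup);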
From patchwork Tue Mar 22 21:40:35 2022
Date: Tue, 22 Mar 2022 14:40:35 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 041/227] mm/memcg: revert ("mm/memcg: optimize user context object stock access")
Message-Id: <20220322214035.90F20C340EC@smtp.kernel.org>

From: Michal Hocko
Subject: mm/memcg: revert ("mm/memcg: optimize user context object stock access")

Patch series "mm/memcg: Address PREEMPT_RT problems instead of disabling it", v5.

This series aims to address the memcg-related problems on PREEMPT_RT.  I
tested it on CONFIG_PREEMPT and CONFIG_PREEMPT_RT with the
tools/testing/selftests/cgroup/* tests and have not observed any
regressions (other than the lockdep report that is already there).

This patch (of 6):

The optimisation is based on a micro-benchmark in which local_irq_save()
is more expensive than preempt_disable().  There is no evidence that the
difference is visible in a real-world workload, and there are CPUs where
the opposite is true (local_irq_save() is cheaper than
preempt_disable()).

Based on micro-benchmarks, the optimisation makes sense on PREEMPT_NONE,
where preempt_disable() is optimized away.  There is no improvement with
PREEMPT_DYNAMIC, since the preemption counter is always available.  The
optimisation also makes the PREEMPT_RT integration more complicated,
since most of its assumptions are not true on PREEMPT_RT.

Revert the optimisation since it complicates the PREEMPT_RT integration
and the improvement is hardly visible.
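[Editor's note: as a rough sketch of what is being reverted, condensed from the
diff below and not the exact kernel code, the optimised path picked the
protection primitive based on context, while the reverted code always disables
interrupts:]

	/* before the revert: two per-CPU stocks, cheaper protection in task context */
	if (in_task()) {
		preempt_disable();		/* compiles away on PREEMPT_NONE */
		stock = &this_cpu_ptr(&memcg_stock)->task_obj;
	} else {
		local_irq_save(flags);		/* irq context must mask interrupts */
		stock = &this_cpu_ptr(&memcg_stock)->irq_obj;
	}

	/* after the revert: a single stock, always protected by disabling irqs */
	local_irq_save(flags);
	stock = this_cpu_ptr(&memcg_stock);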
[bigeasy@linutronix.de: patch body around Michal's diff] Link: https://lkml.kernel.org/r/20220226204144.1008339-1-bigeasy@linutronix.de Link: https://lore.kernel.org/all/YgOGkXXCrD%2F1k+p4@dhcp22.suse.cz Link: https://lkml.kernel.org/r/YdX+INO9gQje6d0S@linutronix.de Link: https://lkml.kernel.org/r/20220226204144.1008339-2-bigeasy@linutronix.de Signed-off-by: Michal Hocko Signed-off-by: Sebastian Andrzej Siewior Acked-by: Roman Gushchin Acked-by: Johannes Weiner Reviewed-by: Shakeel Butt Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Waiman Long Cc: kernel test robot Cc: Michal Hocko Cc: Michal Koutný Signed-off-by: Andrew Morton --- mm/memcontrol.c | 94 +++++++++++++--------------------------------- 1 file changed, 27 insertions(+), 67 deletions(-) --- a/mm/memcontrol.c~mm-memcg-revert-mm-memcg-optimize-user-context-object-stock-access +++ a/mm/memcontrol.c @@ -2078,23 +2078,17 @@ void unlock_page_memcg(struct page *page folio_memcg_unlock(page_folio(page)); } -struct obj_stock { +struct memcg_stock_pcp { + struct mem_cgroup *cached; /* this never be root cgroup */ + unsigned int nr_pages; + #ifdef CONFIG_MEMCG_KMEM struct obj_cgroup *cached_objcg; struct pglist_data *cached_pgdat; unsigned int nr_bytes; int nr_slab_reclaimable_b; int nr_slab_unreclaimable_b; -#else - int dummy[0]; #endif -}; - -struct memcg_stock_pcp { - struct mem_cgroup *cached; /* this never be root cgroup */ - unsigned int nr_pages; - struct obj_stock task_obj; - struct obj_stock irq_obj; struct work_struct work; unsigned long flags; @@ -2104,13 +2098,13 @@ static DEFINE_PER_CPU(struct memcg_stock static DEFINE_MUTEX(percpu_charge_mutex); #ifdef CONFIG_MEMCG_KMEM -static void drain_obj_stock(struct obj_stock *stock); +static void drain_obj_stock(struct memcg_stock_pcp *stock); static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, struct mem_cgroup *root_memcg); static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages); #else -static inline void drain_obj_stock(struct obj_stock *stock) +static inline void drain_obj_stock(struct memcg_stock_pcp *stock) { } static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, @@ -2190,9 +2184,7 @@ static void drain_local_stock(struct wor local_irq_save(flags); stock = this_cpu_ptr(&memcg_stock); - drain_obj_stock(&stock->irq_obj); - if (in_task()) - drain_obj_stock(&stock->task_obj); + drain_obj_stock(stock); drain_stock(stock); clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); @@ -2768,41 +2760,6 @@ retry: #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT) /* - * Most kmem_cache_alloc() calls are from user context. The irq disable/enable - * sequence used in this case to access content from object stock is slow. - * To optimize for user context access, there are now two object stocks for - * task context and interrupt context access respectively. - * - * The task context object stock can be accessed by disabling preemption only - * which is cheap in non-preempt kernel. The interrupt context object stock - * can only be accessed after disabling interrupt. User context code can - * access interrupt object stock, but not vice versa. 
- */ -static inline struct obj_stock *get_obj_stock(unsigned long *pflags) -{ - struct memcg_stock_pcp *stock; - - if (likely(in_task())) { - *pflags = 0UL; - preempt_disable(); - stock = this_cpu_ptr(&memcg_stock); - return &stock->task_obj; - } - - local_irq_save(*pflags); - stock = this_cpu_ptr(&memcg_stock); - return &stock->irq_obj; -} - -static inline void put_obj_stock(unsigned long flags) -{ - if (likely(in_task())) - preempt_enable(); - else - local_irq_restore(flags); -} - -/* * mod_objcg_mlstate() may be called with irq enabled, so * mod_memcg_lruvec_state() should be used. */ @@ -3082,10 +3039,13 @@ void __memcg_kmem_uncharge_page(struct p void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat, enum node_stat_item idx, int nr) { + struct memcg_stock_pcp *stock; unsigned long flags; - struct obj_stock *stock = get_obj_stock(&flags); int *bytes; + local_irq_save(flags); + stock = this_cpu_ptr(&memcg_stock); + /* * Save vmstat data in stock and skip vmstat array update unless * accumulating over a page of vmstat data or when pgdat or idx @@ -3136,26 +3096,29 @@ void mod_objcg_state(struct obj_cgroup * if (nr) mod_objcg_mlstate(objcg, pgdat, idx, nr); - put_obj_stock(flags); + local_irq_restore(flags); } static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) { + struct memcg_stock_pcp *stock; unsigned long flags; - struct obj_stock *stock = get_obj_stock(&flags); bool ret = false; + local_irq_save(flags); + + stock = this_cpu_ptr(&memcg_stock); if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) { stock->nr_bytes -= nr_bytes; ret = true; } - put_obj_stock(flags); + local_irq_restore(flags); return ret; } -static void drain_obj_stock(struct obj_stock *stock) +static void drain_obj_stock(struct memcg_stock_pcp *stock) { struct obj_cgroup *old = stock->cached_objcg; @@ -3211,13 +3174,8 @@ static bool obj_stock_flush_required(str { struct mem_cgroup *memcg; - if (in_task() && stock->task_obj.cached_objcg) { - memcg = obj_cgroup_memcg(stock->task_obj.cached_objcg); - if (memcg && mem_cgroup_is_descendant(memcg, root_memcg)) - return true; - } - if (stock->irq_obj.cached_objcg) { - memcg = obj_cgroup_memcg(stock->irq_obj.cached_objcg); + if (stock->cached_objcg) { + memcg = obj_cgroup_memcg(stock->cached_objcg); if (memcg && mem_cgroup_is_descendant(memcg, root_memcg)) return true; } @@ -3228,10 +3186,13 @@ static bool obj_stock_flush_required(str static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes, bool allow_uncharge) { + struct memcg_stock_pcp *stock; unsigned long flags; - struct obj_stock *stock = get_obj_stock(&flags); unsigned int nr_pages = 0; + local_irq_save(flags); + + stock = this_cpu_ptr(&memcg_stock); if (stock->cached_objcg != objcg) { /* reset if necessary */ drain_obj_stock(stock); obj_cgroup_get(objcg); @@ -3247,7 +3208,7 @@ static void refill_obj_stock(struct obj_ stock->nr_bytes &= (PAGE_SIZE - 1); } - put_obj_stock(flags); + local_irq_restore(flags); if (nr_pages) obj_cgroup_uncharge_pages(objcg, nr_pages); @@ -6826,7 +6787,6 @@ static void uncharge_folio(struct folio long nr_pages; struct mem_cgroup *memcg; struct obj_cgroup *objcg; - bool use_objcg = folio_memcg_kmem(folio); VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); @@ -6835,7 +6795,7 @@ static void uncharge_folio(struct folio * folio memcg or objcg at this point, we have fully * exclusive access to the folio. 
*/ - if (use_objcg) { + if (folio_memcg_kmem(folio)) { objcg = __folio_objcg(folio); /* * This get matches the put at the end of the function and @@ -6863,7 +6823,7 @@ static void uncharge_folio(struct folio nr_pages = folio_nr_pages(folio); - if (use_objcg) { + if (folio_memcg_kmem(folio)) { ug->nr_memory += nr_pages; ug->nr_kmem += nr_pages; From patchwork Tue Mar 22 21:40:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789084 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 680A8C433EF for ; Tue, 22 Mar 2022 21:40:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F27D56B0072; Tue, 22 Mar 2022 17:40:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id ED79A6B0073; Tue, 22 Mar 2022 17:40:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DA02E6B0085; Tue, 22 Mar 2022 17:40:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id CBFA56B0072 for ; Tue, 22 Mar 2022 17:40:43 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id 7507761957 for ; Tue, 22 Mar 2022 21:40:43 +0000 (UTC) X-FDA: 79273341966.09.17BCCBD Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf05.hostedemail.com (Postfix) with ESMTP id E247410002B for ; Tue, 22 Mar 2022 21:40:42 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 672916104C; Tue, 22 Mar 2022 21:40:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B891EC340EE; Tue, 22 Mar 2022 21:40:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985241; bh=xd0jUdT9Y1l0qCfV8WsVTuy85Z5R/2QOmfTzH7rvdPg=; h=Date:To:From:In-Reply-To:Subject:From; b=fgGdAX9/knMOM0cVkf08y6Oglo/K7juzIusd1p4k73wre8fVwzj7n+4x8Yhxt3eU4 LDHSX7WuPMVYDbcYCPI9gY42D90583TG1F12ebwA+czNwhw46Ad/7XNwWnU0bEOvUx hnr3l50iDyNH3sNC6G952GvqSSLLF0zcDRsAzmOw= Date: Tue, 22 Mar 2022 14:40:41 -0700 To: vdavydov.dev@gmail.com,tglx@linutronix.de,shakeelb@google.com,peterz@infradead.org,oliver.sang@intel.com,mkoutny@suse.com,mhocko@suse.com,mhocko@kernel.org,longman@redhat.com,hannes@cmpxchg.org,guro@fb.com,bigeasy@linutronix.de,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 043/227] mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed. 
Message-Id: <20220322214041.B891EC340EE@smtp.kernel.org>

From: Sebastian Andrzej Siewior
Subject: mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.

The per-CPU counters are modified with non-atomic operations; consistency
is ensured by disabling interrupts for the update.  On non-PREEMPT_RT
configurations this works because acquiring a spinlock_t-typed lock with
the _irq() suffix disables interrupts.  On PREEMPT_RT configurations the
RMW operation can be interrupted.

Another problem is that mem_cgroup_swapout() expects to be invoked with
interrupts disabled, because its caller has to acquire a spinlock_t which
is acquired with interrupts disabled.  Since spinlock_t never disables
interrupts on PREEMPT_RT, interrupts are never disabled at this point.

The code is never called from in_irq() context on PREEMPT_RT, therefore
disabling preemption during the update is sufficient there.  The sections
which explicitly disable interrupts can remain on PREEMPT_RT, because
they remain short and do not involve sleeping locks
(memcg_check_events() does nothing on PREEMPT_RT).

Disable preemption during updates of the per-CPU variables which do not
explicitly disable interrupts.

Link: https://lkml.kernel.org/r/20220226204144.1008339-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
Acked-by: Roman Gushchin
Reviewed-by: Shakeel Butt
Cc: kernel test robot
Cc: Michal Hocko
Cc: Michal Hocko
Cc: Michal Koutný
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Vladimir Davydov
Cc: Waiman Long
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c | 56 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

--- a/mm/memcontrol.c~mm-memcg-protect-per-cpu-counter-by-disabling-preemption-on-preempt_rt-where-needed
+++ a/mm/memcontrol.c
@@ -629,6 +629,35 @@ static DEFINE_SPINLOCK(stats_flush_lock)
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 
+/*
+ * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
+ * not rely on this as part of an acquired spinlock_t lock. These functions are
+ * never used in hardirq context on PREEMPT_RT and therefore disabling preemption
+ * is sufficient.
+ */
+static void memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_disable();
+#else
+	VM_BUG_ON(!irqs_disabled());
+#endif
+}
+
+static void __memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_disable();
+#endif
+}
+
+static void memcg_stats_unlock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_enable();
+#endif
+}
+
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	unsigned int x;
@@ -705,6 +734,27 @@ void __mod_memcg_lruvec_state(struct lru
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
+	/*
+	 * The callers from rmap rely on disabled preemption because they never
+	 * update their counter from in-interrupt context. For these two
+	 * counters we check that the update is never performed from an
+	 * interrupt context while other callers need interrupts to be disabled.
+	 */
+	__memcg_stats_lock();
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		switch (idx) {
+		case NR_ANON_MAPPED:
+		case NR_FILE_MAPPED:
+		case NR_ANON_THPS:
+		case NR_SHMEM_PMDMAPPED:
+		case NR_FILE_PMDMAPPED:
+			WARN_ON_ONCE(!in_task());
+			break;
+		default:
+			WARN_ON_ONCE(!irqs_disabled());
+		}
+	}
+
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
@@ -712,6 +762,7 @@ void __mod_memcg_lruvec_state(struct lru
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
 	memcg_rstat_updated(memcg, val);
+	memcg_stats_unlock();
 }
 
 /**
@@ -794,8 +845,10 @@ void __count_memcg_events(struct mem_cgr
 	if (mem_cgroup_disabled())
 		return;
 
+	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
 	memcg_rstat_updated(memcg, count);
+	memcg_stats_unlock();
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -7154,8 +7207,9 @@ void mem_cgroup_swapout(struct page *pag
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
*/ - VM_BUG_ON(!irqs_disabled()); + memcg_stats_lock(); mem_cgroup_charge_statistics(memcg, -nr_entries); + memcg_stats_unlock(); memcg_check_events(memcg, page_to_nid(page)); css_put(&memcg->css); From patchwork Tue Mar 22 21:40:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789085 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61E4FC4332F for ; Tue, 22 Mar 2022 21:40:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E681C6B0073; Tue, 22 Mar 2022 17:40:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DA0DD6B0074; Tue, 22 Mar 2022 17:40:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C8E9C6B0085; Tue, 22 Mar 2022 17:40:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id BAAE56B0073 for ; Tue, 22 Mar 2022 17:40:46 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 8CDB423F54 for ; Tue, 22 Mar 2022 21:40:46 +0000 (UTC) X-FDA: 79273342092.04.886F4C3 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf12.hostedemail.com (Postfix) with ESMTP id 033514003B for ; Tue, 22 Mar 2022 21:40:45 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7314A610A1; Tue, 22 Mar 2022 21:40:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C9F9BC340F3; Tue, 22 Mar 2022 21:40:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985244; bh=VlyY3pOJ3a0/P4OQp/qLVY9A3L9ShJeRgy6qKyNI2os=; h=Date:To:From:In-Reply-To:Subject:From; b=IFSEvgatPRLp/fZeIvHjI/i/tRPvQDXlPF/YI/s7PPlpFp1gM0HDhr88hfruEmXrh Jmpmq9G+Ypb9cCTEJPes+IWJZvi4ZCxw1Z6rgv6cjA9uwSAfW0XE4Pk39iqrpUmIOA ySPLoc8piepN+yKUd/Eqx6AU8qaAIJZjEFtVFYnA= Date: Tue, 22 Mar 2022 14:40:44 -0700 To: vdavydov.dev@gmail.com,tglx@linutronix.de,shakeelb@google.com,peterz@infradead.org,oliver.sang@intel.com,mkoutny@suse.com,mhocko@suse.com,mhocko@kernel.org,longman@redhat.com,guro@fb.com,bigeasy@linutronix.de,hannes@cmpxchg.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 044/227] mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock() Message-Id: <20220322214044.C9F9BC340F3@smtp.kernel.org> X-Stat-Signature: sc6ia8hcxnqfxa8ygkqzshdtjsfjtzgy X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 033514003B Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=IFSEvgat; dmarc=none; spf=pass (imf12.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985245-393129 X-Bogosity: Ham, tests=bogofilter, 
spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Johannes Weiner Subject: mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock() Provide the inner part of refill_stock() as __refill_stock() without disabling interrupts. This eases the integration of local_lock_t where recursive locking must be avoided. Open code obj_cgroup_uncharge_pages() in drain_obj_stock() and use __refill_stock(). The caller of drain_obj_stock() already disables interrupts. [bigeasy@linutronix.de: patch body around Johannes' diff] Link: https://lkml.kernel.org/r/20220226204144.1008339-5-bigeasy@linutronix.de Signed-off-by: Johannes Weiner Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Shakeel Butt Reviewed-by: Roman Gushchin Acked-by: Michal Hocko Cc: kernel test robot Cc: Michal Hocko Cc: Michal Koutný Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vladimir Davydov Cc: Waiman Long Signed-off-by: Andrew Morton --- mm/memcontrol.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) --- a/mm/memcontrol.c~mm-memcg-opencode-the-inner-part-of-obj_cgroup_uncharge_pages-in-drain_obj_stock +++ a/mm/memcontrol.c @@ -2251,12 +2251,9 @@ static void drain_local_stock(struct wor * Cache charges(val) to local per_cpu area. * This will be consumed by consume_stock() function, later. */ -static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) +static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) { struct memcg_stock_pcp *stock; - unsigned long flags; - - local_irq_save(flags); stock = this_cpu_ptr(&memcg_stock); if (stock->cached != memcg) { /* reset if necessary */ @@ -2268,7 +2265,14 @@ static void refill_stock(struct mem_cgro if (stock->nr_pages > MEMCG_CHARGE_BATCH) drain_stock(stock); +} + +static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) +{ + unsigned long flags; + local_irq_save(flags); + __refill_stock(memcg, nr_pages); local_irq_restore(flags); } @@ -3185,8 +3189,16 @@ static void drain_obj_stock(struct memcg unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT; unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1); - if (nr_pages) - obj_cgroup_uncharge_pages(old, nr_pages); + if (nr_pages) { + struct mem_cgroup *memcg; + + memcg = get_mem_cgroup_from_objcg(old); + + memcg_account_kmem(memcg, -nr_pages); + __refill_stock(memcg, nr_pages); + + css_put(&memcg->css); + } /* * The leftover is flushed to the centralized per-memcg value. 
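[Editor's note: to see why drain_obj_stock() must not simply keep calling
obj_cgroup_uncharge_pages(), here is a schematic of the call chain the
open-coding above avoids; a sketch in comment form, not kernel code.  The
local_lock_t conversion it prepares for lands in the next patch:]

	/*
	 * drain_obj_stock()                 <- runs with the stock protection held
	 *   obj_cgroup_uncharge_pages()
	 *     refill_stock()
	 *       local_irq_save()            <- becomes a local_lock_t acquisition
	 *
	 * Nested interrupt disabling is harmless, but once the protection
	 * becomes a local_lock_t the inner acquisition would be recursive
	 * locking on PREEMPT_RT. Hence drain_obj_stock() open-codes the
	 * uncharge and calls __refill_stock(), which assumes the protection
	 * is already held.
	 */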
From patchwork Tue Mar 22 21:40:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789086 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96D72C433F5 for ; Tue, 22 Mar 2022 21:40:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 258846B0074; Tue, 22 Mar 2022 17:40:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1E0DC6B0085; Tue, 22 Mar 2022 17:40:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A7E66B0087; Tue, 22 Mar 2022 17:40:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0175.hostedemail.com [216.40.44.175]) by kanga.kvack.org (Postfix) with ESMTP id EF9576B0074 for ; Tue, 22 Mar 2022 17:40:49 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id AD4BD8249980 for ; Tue, 22 Mar 2022 21:40:49 +0000 (UTC) X-FDA: 79273342218.23.CD2CAA8 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf10.hostedemail.com (Postfix) with ESMTP id 1B0E3C0034 for ; Tue, 22 Mar 2022 21:40:48 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 82B28610A1; Tue, 22 Mar 2022 21:40:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D3803C340F5; Tue, 22 Mar 2022 21:40:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985247; bh=wuJbZZQyJm2huM0AFr6qzWnAlPMeYzJ7RaKfMkwRdfM=; h=Date:To:From:In-Reply-To:Subject:From; b=f09YBi0TkGHfYOTgPm1pAjAECnPRwvdD1tmL7gol6nTylNeAeG/mQp1pGgz2pu++n tQolnfhBFA2206BiwYyaju3sApP1MYgYNcFiKlEaMYD26d0L6bEuamaRlrC1p8XLVz ACJ/7FJFf99DGNr5gy4qu3e2bwfnGjAu8BmNXR+I= Date: Tue, 22 Mar 2022 14:40:47 -0700 To: vdavydov.dev@gmail.com,tglx@linutronix.de,shakeelb@google.com,roman.gushchin@linux.dev,peterz@infradead.org,oliver.sang@intel.com,mkoutny@suse.com,mhocko@suse.com,longman@redhat.com,hannes@cmpxchg.org,bigeasy@linutronix.de,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 045/227] mm/memcg: protect memcg_stock with a local_lock_t Message-Id: <20220322214047.D3803C340F5@smtp.kernel.org> X-Stat-Signature: uaqwbngmkgsk4ebtrsiiwznk93ts6hkd Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=f09YBi0T; spf=pass (imf10.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 1B0E3C0034 X-HE-Tag: 1647985248-203954 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Sebastian Andrzej Siewior Subject: mm/memcg: protect memcg_stock with a local_lock_t The members of 
the per-CPU structure memcg_stock_pcp are protected by disabling
interrupts.  This does not work on PREEMPT_RT because it creates atomic
context in which actions are performed that require a preemptible
context.  One example is obj_cgroup_release().

The IRQ-disable sections can be replaced with local_lock_t, which
preserves the explicit disabling of interrupts while keeping the code
preemptible on PREEMPT_RT.

drain_obj_stock() drops a reference on obj_cgroup, which leads to an
invocation of obj_cgroup_release() if it is the last object.  This in
turn leads to recursive locking of the local_lock_t.  To avoid this,
obj_cgroup_release() is invoked outside of the locked section.

obj_cgroup_uncharge_pages() can be invoked with the local_lock_t acquired
and without it.  This will later lead to a recursion in refill_stock().
To avoid the locking recursion, provide obj_cgroup_uncharge_pages_locked()
which uses the locked version of refill_stock().

- Replace disabling interrupts for memcg_stock with a local_lock_t.

- Let drain_obj_stock() return the old struct obj_cgroup which is passed
  to obj_cgroup_put() outside of the locked section.

- Provide obj_cgroup_uncharge_pages_locked() which uses the locked
  version of refill_stock() to avoid recursive locking in
  drain_obj_stock().

Link: https://lkml.kernel.org/r/20220209014709.GA26885@xsang-OptiPlex-9020
Link: https://lkml.kernel.org/r/20220226204144.1008339-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
Reported-by: kernel test robot
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Michal Koutný
Cc: Peter Zijlstra
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Thomas Gleixner
Cc: Vladimir Davydov
Cc: Waiman Long
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c | 59 +++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 21 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-protect-memcg_stock-with-a-local_lock_t
+++ a/mm/memcontrol.c
@@ -2135,6 +2135,7 @@ void unlock_page_memcg(struct page *page
 }
 
 struct memcg_stock_pcp {
+	local_lock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
 
@@ -2150,18 +2151,21 @@ struct memcg_stock_pcp {
 	unsigned long flags;
 #define FLUSHING_CACHED_CHARGE	0
 };
-static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
+static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
+	.stock_lock = INIT_LOCAL_LOCK(stock_lock),
+};
 static DEFINE_MUTEX(percpu_charge_mutex);
 
 #ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct memcg_stock_pcp *stock);
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
 #else
-static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
+static inline struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 {
+	return NULL;
 }
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg)
@@ -2193,7 +2197,7 @@ static bool consume_stock(struct mem_cgr
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -2201,7 +2205,7 @@ static bool consume_stock(struct mem_cgr
 		ret = true;
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
@@ -2230,6 +2234,7 @@ static void
drain_stock(struct memcg_sto static void drain_local_stock(struct work_struct *dummy) { struct memcg_stock_pcp *stock; + struct obj_cgroup *old = NULL; unsigned long flags; /* @@ -2237,14 +2242,16 @@ static void drain_local_stock(struct wor * drain_stock races is that we always operate on local CPU stock * here with IRQ disabled */ - local_irq_save(flags); + local_lock_irqsave(&memcg_stock.stock_lock, flags); stock = this_cpu_ptr(&memcg_stock); - drain_obj_stock(stock); + old = drain_obj_stock(stock); drain_stock(stock); clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); - local_irq_restore(flags); + local_unlock_irqrestore(&memcg_stock.stock_lock, flags); + if (old) + obj_cgroup_put(old); } /* @@ -2271,9 +2278,9 @@ static void refill_stock(struct mem_cgro { unsigned long flags; - local_irq_save(flags); + local_lock_irqsave(&memcg_stock.stock_lock, flags); __refill_stock(memcg, nr_pages); - local_irq_restore(flags); + local_unlock_irqrestore(&memcg_stock.stock_lock, flags); } /* @@ -3100,10 +3107,11 @@ void mod_objcg_state(struct obj_cgroup * enum node_stat_item idx, int nr) { struct memcg_stock_pcp *stock; + struct obj_cgroup *old = NULL; unsigned long flags; int *bytes; - local_irq_save(flags); + local_lock_irqsave(&memcg_stock.stock_lock, flags); stock = this_cpu_ptr(&memcg_stock); /* @@ -3112,7 +3120,7 @@ void mod_objcg_state(struct obj_cgroup * * changes. */ if (stock->cached_objcg != objcg) { - drain_obj_stock(stock); + old = drain_obj_stock(stock); obj_cgroup_get(objcg); stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes) ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0; @@ -3156,7 +3164,9 @@ void mod_objcg_state(struct obj_cgroup * if (nr) mod_objcg_mlstate(objcg, pgdat, idx, nr); - local_irq_restore(flags); + local_unlock_irqrestore(&memcg_stock.stock_lock, flags); + if (old) + obj_cgroup_put(old); } static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) @@ -3165,7 +3175,7 @@ static bool consume_obj_stock(struct obj unsigned long flags; bool ret = false; - local_irq_save(flags); + local_lock_irqsave(&memcg_stock.stock_lock, flags); stock = this_cpu_ptr(&memcg_stock); if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) { @@ -3173,17 +3183,17 @@ static bool consume_obj_stock(struct obj ret = true; } - local_irq_restore(flags); + local_unlock_irqrestore(&memcg_stock.stock_lock, flags); return ret; } -static void drain_obj_stock(struct memcg_stock_pcp *stock) +static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock) { struct obj_cgroup *old = stock->cached_objcg; if (!old) - return; + return NULL; if (stock->nr_bytes) { unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT; @@ -3233,8 +3243,12 @@ static void drain_obj_stock(struct memcg stock->cached_pgdat = NULL; } - obj_cgroup_put(old); stock->cached_objcg = NULL; + /* + * The `old' objects needs to be released by the caller via + * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock. 
+ */ + return old; } static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, @@ -3255,14 +3269,15 @@ static void refill_obj_stock(struct obj_ bool allow_uncharge) { struct memcg_stock_pcp *stock; + struct obj_cgroup *old = NULL; unsigned long flags; unsigned int nr_pages = 0; - local_irq_save(flags); + local_lock_irqsave(&memcg_stock.stock_lock, flags); stock = this_cpu_ptr(&memcg_stock); if (stock->cached_objcg != objcg) { /* reset if necessary */ - drain_obj_stock(stock); + old = drain_obj_stock(stock); obj_cgroup_get(objcg); stock->cached_objcg = objcg; stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes) @@ -3276,7 +3291,9 @@ static void refill_obj_stock(struct obj_ stock->nr_bytes &= (PAGE_SIZE - 1); } - local_irq_restore(flags); + local_unlock_irqrestore(&memcg_stock.stock_lock, flags); + if (old) + obj_cgroup_put(old); if (nr_pages) obj_cgroup_uncharge_pages(objcg, nr_pages); From patchwork Tue Mar 22 21:40:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789087 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85B57C433F5 for ; Tue, 22 Mar 2022 21:40:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E4116B0085; Tue, 22 Mar 2022 17:40:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 191FC6B0087; Tue, 22 Mar 2022 17:40:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 00AB66B0088; Tue, 22 Mar 2022 17:40:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id E57E86B0085 for ; Tue, 22 Mar 2022 17:40:57 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id B213861999 for ; Tue, 22 Mar 2022 21:40:57 +0000 (UTC) X-FDA: 79273342554.03.AC75C92 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf31.hostedemail.com (Postfix) with ESMTP id 2A90920034 for ; Tue, 22 Mar 2022 21:40:56 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id CE812B81DB5; Tue, 22 Mar 2022 21:40:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 45CCBC340EC; Tue, 22 Mar 2022 21:40:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985254; bh=b9oHme6XtAb8w9ww2H03C3yiA5Do1Rpk1W4qTCLr10o=; h=Date:To:From:In-Reply-To:Subject:From; b=EV+NXRn0hRuMPW8YJgl5GlV514jr7yYEvDi34Q9EaPIiWhxOeQHKJh5acawadwcqT tWcufCYXsvsz+uWXaZh/GDRMt5zIKGoxoW8VsnS78s9toq0h+2H9nc1CwpKMrvxfyt uPjbFMrgvril94vyl/V458RQ2mBjchCDpDjuOiS0= Date: Tue, 22 Mar 2022 14:40:53 -0700 To: 
zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 047/227] mm: list_lru: transpose the array of per-node per-memcg lru lists Message-Id: <20220322214054.45CCBC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: 98ebfas7t81azdodsx9uzhkaqaegpmk6 Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=EV+NXRn0; spf=pass (imf31.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2A90920034 X-HE-Tag: 1647985256-470859 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: list_lru: transpose the array of per-node per-memcg lru lists

Patch series "Optimize list lru memory consumption", v6.

On our server, we found a suspected memory leak: the kmalloc-32 slab cache consumes more than 6GB of memory, while all other kmem_caches consume less than 2GB. Our in-depth analysis showed that the kmalloc-32 consumption is caused by list_lru_one allocations.

crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574

memcg_nr_cache_ids is very large, and the memory consumption of each list_lru can be calculated with the following formula: num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32). There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.

crash> list super_blocks | wc -l
952

Every mount registers 2 list_lrus, one for inodes and one for dentries. There are 952 super_blocks, so the total memory is 952 * 2 * 3 MB (~5.6GB). But the current number of memory cgroups is less than 500, so I guess more than 12286 memory cgroups have been created on this machine (I do not know why there are so many cgroups; it may be a user bug, or the user may really want to do that). Because memcg_nr_cache_ids has not been reduced to a suitable value, a lot of memory is wasted. The only way to reduce memcg_nr_cache_ids is to *reboot* the server, which is not what we want. I had posted a patchset [1] to reduce memcg_nr_cache_ids, but it did not fundamentally solve the problem.

We currently allocate scope for every memcg to be tracked on every superblock instantiated in the system, regardless of whether that superblock is even accessible to that memcg. These huge memcg counts come from container hosts where memcgs are confined to just a small subset of the total number of superblocks instantiated at any given point in time. For these systems with huge container counts, list_lru does not need the capability of tracking every memcg on every superblock.
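As a quick sanity check of the arithmetic above, here is an illustrative userspace C sketch (not part of this series; the node, id, and superblock counts are the values reported in this analysis):

	#include <stdio.h>

	int main(void)
	{
		/* values from the crash> output and the report above */
		unsigned long nr_nodes = 4;
		unsigned long nr_ids = 24574;     /* memcg_nr_cache_ids */
		unsigned long sbs = 952;          /* super_blocks */
		unsigned long lrus_per_sb = 2;    /* inode + dentry list_lru */

		/* one kmalloc-32 list_lru_one per node per memcg id */
		unsigned long per_lru = nr_nodes * nr_ids * 32;
		unsigned long total = per_lru * sbs * lrus_per_sb;

		printf("per list_lru: %.1f MB\n", per_lru / (1024.0 * 1024.0));
		printf("total: %.1f GB\n", total / (1024.0 * 1024.0 * 1024.0));
		return 0;
	}

This prints ~3.0 MB per list_lru and ~5.6 GB in total, matching the figures above.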
What it comes down to is that the list_lru is only needed for a given memcg if that memcg is instantiating and freeing objects on a given list_lru. As Dave said, "Which makes me think we should be moving more towards 'add the memcg to the list_lru at the first insert' model rather than 'instantiate all at memcg init time just in case'."

This patchset aims to optimize the list_lru memory consumption from different aspects. I did a simple test to show the optimization: I created 10k memory cgroups and mounted 10k filesystems, then used the free command to show how much memory the system consumes after this operation (there are 2 NUMA nodes in the system).

+-----------------------+------------------------+
| condition             | memory consumption     |
+-----------------------+------------------------+
| without this patchset | 24464 MB               |
+-----------------------+------------------------+
| after patch 1         | 21957 MB               | <--------+
+-----------------------+------------------------+          |
| after patch 10        | 6895 MB                |          |
+-----------------------+------------------------+          |
| after patch 12        | 4367 MB                |          |
+-----------------------+------------------------+          |
                                                            |
The more the number of nodes, the more obvious the effect---+

BTW, there was a recent discussion [2] on the same issue.

[1] https://lore.kernel.org/all/20210428094949.43579-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210405054848.GA1077931@in.ibm.com/

This series not only optimizes the memory usage of list_lru but also simplifies the code.

This patch (of 16):

The current scheme of maintaining per-node per-memcg lru lists looks like:

struct list_lru {
	struct list_lru_node *node;            (for each node)
		struct list_lru_memcg *memcg_lrus;
			struct list_lru_one *lru[];    (for each memcg)
}

By effectively transposing the two-dimensional array of list_lru_one structures (per-node per-memcg => per-memcg per-node) it is possible to save some memory and simplify the alloc/dealloc paths. The new scheme looks like:

struct list_lru {
	struct list_lru_memcg *mlrus;
		struct list_lru_per_memcg *mlru[];  (for each memcg)
			struct list_lru_one node[0];    (for each node)
}

Memory savings come not only from 'struct rcu_head' but also from the pointer arrays used to store the pointers to 'struct list_lru_one'. The array is per node and its size is 8 (a pointer) * num_memcgs, so the total size of the arrays is 8 * num_nodes * memcg_nr_cache_ids. After this patch, the size becomes 8 * memcg_nr_cache_ids.
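To make the transposition concrete, here is a minimal compilable sketch of the new lookup path (stand-in types with fixed array sizes for illustration only; the RCU dereferencing, locking, and the idx < 0 fallback to the root lru that the real list_lru_from_memcg_idx() keeps are omitted):

	enum { NR_NODES = 2, NR_MEMCG_IDS = 4 };

	struct list_lru_one { long nr_items; };
	/* per-memcg block holding one list per node, laid out inline */
	struct list_lru_per_memcg { struct list_lru_one node[NR_NODES]; };
	/* single array indexed by memcg id, shared by all nodes */
	struct list_lru_memcg { struct list_lru_per_memcg *mlru[NR_MEMCG_IDS]; };
	struct list_lru { struct list_lru_memcg *mlrus; };

	static struct list_lru_one *
	list_lru_lookup(struct list_lru *lru, int nid, int idx)
	{
		/* old scheme: lru->node[nid].memcg_lrus->lru[idx],
		 * i.e. one pointer array per node; the transposed
		 * scheme needs only one array for all nodes: */
		return &lru->mlrus->mlru[idx]->node[nid];
	}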
Link: https://lkml.kernel.org/r/20220228122126.37293-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220228122126.37293-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Acked-by: Johannes Weiner Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Vladimir Davydov Cc: Shakeel Butt Cc: Yang Shi Cc: Alex Shi Cc: Wei Yang Cc: Dave Chinner Cc: Trond Myklebust Cc: Anna Schumaker Cc: Jaegeuk Kim Cc: Chao Yu Cc: Kari Argillander Cc: Vlastimil Babka Cc: Qi Zheng Cc: Xiongchun Duan Cc: Fam Zheng Cc: Roman Gushchin Cc: Theodore Ts'o Signed-off-by: Andrew Morton --- include/linux/list_lru.h | 17 +-- mm/list_lru.c | 206 +++++++++++++------------------------ 2 files changed, 86 insertions(+), 137 deletions(-) --- a/include/linux/list_lru.h~mm-list_lru-transpose-the-array-of-per-node-per-memcg-lru-lists +++ a/include/linux/list_lru.h @@ -31,10 +31,15 @@ struct list_lru_one { long nr_items; }; +struct list_lru_per_memcg { + /* array of per cgroup per node lists, indexed by node id */ + struct list_lru_one node[0]; +}; + struct list_lru_memcg { - struct rcu_head rcu; + struct rcu_head rcu; /* array of per cgroup lists, indexed by memcg_cache_id */ - struct list_lru_one *lru[]; + struct list_lru_per_memcg *mlru[]; }; struct list_lru_node { @@ -42,11 +47,7 @@ struct list_lru_node { spinlock_t lock; /* global list, used for the root cgroup in cgroup aware lrus */ struct list_lru_one lru; -#ifdef CONFIG_MEMCG_KMEM - /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ - struct list_lru_memcg __rcu *memcg_lrus; -#endif - long nr_items; + long nr_items; } ____cacheline_aligned_in_smp; struct list_lru { @@ -55,6 +56,8 @@ struct list_lru { struct list_head list; int shrinker_id; bool memcg_aware; + /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ + struct list_lru_memcg __rcu *mlrus; #endif }; --- a/mm/list_lru.c~mm-list_lru-transpose-the-array-of-per-node-per-memcg-lru-lists +++ a/mm/list_lru.c @@ -49,35 +49,37 @@ static int lru_shrinker_id(struct list_l } static inline struct list_lru_one * -list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx) +list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx) { - struct list_lru_memcg *memcg_lrus; + struct list_lru_memcg *mlrus; + struct list_lru_node *nlru = &lru->node[nid]; + /* * Either lock or RCU protects the array of per cgroup lists - * from relocation (see memcg_update_list_lru_node). + * from relocation (see memcg_update_list_lru). 
*/ - memcg_lrus = rcu_dereference_check(nlru->memcg_lrus, - lockdep_is_held(&nlru->lock)); - if (memcg_lrus && idx >= 0) - return memcg_lrus->lru[idx]; + mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock)); + if (mlrus && idx >= 0) + return &mlrus->mlru[idx]->node[nid]; return &nlru->lru; } static inline struct list_lru_one * -list_lru_from_kmem(struct list_lru_node *nlru, void *ptr, +list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr, struct mem_cgroup **memcg_ptr) { + struct list_lru_node *nlru = &lru->node[nid]; struct list_lru_one *l = &nlru->lru; struct mem_cgroup *memcg = NULL; - if (!nlru->memcg_lrus) + if (!lru->mlrus) goto out; memcg = mem_cgroup_from_obj(ptr); if (!memcg) goto out; - l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg)); + l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg)); out: if (memcg_ptr) *memcg_ptr = memcg; @@ -103,18 +105,18 @@ static inline bool list_lru_memcg_aware( } static inline struct list_lru_one * -list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx) +list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx) { - return &nlru->lru; + return &lru->node[nid].lru; } static inline struct list_lru_one * -list_lru_from_kmem(struct list_lru_node *nlru, void *ptr, +list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr, struct mem_cgroup **memcg_ptr) { if (memcg_ptr) *memcg_ptr = NULL; - return &nlru->lru; + return &lru->node[nid].lru; } #endif /* CONFIG_MEMCG_KMEM */ @@ -127,7 +129,7 @@ bool list_lru_add(struct list_lru *lru, spin_lock(&nlru->lock); if (list_empty(item)) { - l = list_lru_from_kmem(nlru, item, &memcg); + l = list_lru_from_kmem(lru, nid, item, &memcg); list_add_tail(item, &l->list); /* Set shrinker bit if the first element was added */ if (!l->nr_items++) @@ -150,7 +152,7 @@ bool list_lru_del(struct list_lru *lru, spin_lock(&nlru->lock); if (!list_empty(item)) { - l = list_lru_from_kmem(nlru, item, NULL); + l = list_lru_from_kmem(lru, nid, item, NULL); list_del_init(item); l->nr_items--; nlru->nr_items--; @@ -180,12 +182,11 @@ EXPORT_SYMBOL_GPL(list_lru_isolate_move) unsigned long list_lru_count_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg) { - struct list_lru_node *nlru = &lru->node[nid]; struct list_lru_one *l; long count; rcu_read_lock(); - l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg)); + l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg)); count = READ_ONCE(l->nr_items); rcu_read_unlock(); @@ -206,16 +207,16 @@ unsigned long list_lru_count_node(struct EXPORT_SYMBOL_GPL(list_lru_count_node); static unsigned long -__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx, +__list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx, list_lru_walk_cb isolate, void *cb_arg, unsigned long *nr_to_walk) { - + struct list_lru_node *nlru = &lru->node[nid]; struct list_lru_one *l; struct list_head *item, *n; unsigned long isolated = 0; - l = list_lru_from_memcg_idx(nlru, memcg_idx); + l = list_lru_from_memcg_idx(lru, nid, memcg_idx); restart: list_for_each_safe(item, n, &l->list) { enum lru_status ret; @@ -272,8 +273,8 @@ list_lru_walk_one(struct list_lru *lru, unsigned long ret; spin_lock(&nlru->lock); - ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg, - nr_to_walk); + ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate, + cb_arg, nr_to_walk); spin_unlock(&nlru->lock); return ret; } @@ -288,8 +289,8 @@ list_lru_walk_one_irq(struct list_lru *l unsigned long ret; spin_lock_irq(&nlru->lock); - ret = 
__list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg, - nr_to_walk); + ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate, + cb_arg, nr_to_walk); spin_unlock_irq(&nlru->lock); return ret; } @@ -308,7 +309,7 @@ unsigned long list_lru_walk_node(struct struct list_lru_node *nlru = &lru->node[nid]; spin_lock(&nlru->lock); - isolated += __list_lru_walk_one(nlru, memcg_idx, + isolated += __list_lru_walk_one(lru, nid, memcg_idx, isolate, cb_arg, nr_to_walk); spin_unlock(&nlru->lock); @@ -328,166 +329,111 @@ static void init_one_lru(struct list_lru } #ifdef CONFIG_MEMCG_KMEM -static void __memcg_destroy_list_lru_node(struct list_lru_memcg *memcg_lrus, - int begin, int end) +static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus, + int begin, int end) { int i; for (i = begin; i < end; i++) - kfree(memcg_lrus->lru[i]); + kfree(mlrus->mlru[i]); } -static int __memcg_init_list_lru_node(struct list_lru_memcg *memcg_lrus, - int begin, int end) +static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus, + int begin, int end) { int i; for (i = begin; i < end; i++) { - struct list_lru_one *l; + int nid; + struct list_lru_per_memcg *mlru; - l = kmalloc(sizeof(struct list_lru_one), GFP_KERNEL); - if (!l) + mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL); + if (!mlru) goto fail; - init_one_lru(l); - memcg_lrus->lru[i] = l; + for_each_node(nid) + init_one_lru(&mlru->node[nid]); + mlrus->mlru[i] = mlru; } return 0; fail: - __memcg_destroy_list_lru_node(memcg_lrus, begin, i); + memcg_destroy_list_lru_range(mlrus, begin, i); return -ENOMEM; } -static int memcg_init_list_lru_node(struct list_lru_node *nlru) +static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) { - struct list_lru_memcg *memcg_lrus; + struct list_lru_memcg *mlrus; int size = memcg_nr_cache_ids; - memcg_lrus = kvmalloc(struct_size(memcg_lrus, lru, size), GFP_KERNEL); - if (!memcg_lrus) + lru->memcg_aware = memcg_aware; + if (!memcg_aware) + return 0; + + mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL); + if (!mlrus) return -ENOMEM; - if (__memcg_init_list_lru_node(memcg_lrus, 0, size)) { - kvfree(memcg_lrus); + if (memcg_init_list_lru_range(mlrus, 0, size)) { + kvfree(mlrus); return -ENOMEM; } - RCU_INIT_POINTER(nlru->memcg_lrus, memcg_lrus); + RCU_INIT_POINTER(lru->mlrus, mlrus); return 0; } -static void memcg_destroy_list_lru_node(struct list_lru_node *nlru) +static void memcg_destroy_list_lru(struct list_lru *lru) { - struct list_lru_memcg *memcg_lrus; + struct list_lru_memcg *mlrus; + + if (!list_lru_memcg_aware(lru)) + return; + /* * This is called when shrinker has already been unregistered, * and nobody can use it. So, there is no need to use kvfree_rcu(). 
*/ - memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true); - __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids); - kvfree(memcg_lrus); + mlrus = rcu_dereference_protected(lru->mlrus, true); + memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids); + kvfree(mlrus); } -static int memcg_update_list_lru_node(struct list_lru_node *nlru, - int old_size, int new_size) +static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size) { struct list_lru_memcg *old, *new; BUG_ON(old_size > new_size); - old = rcu_dereference_protected(nlru->memcg_lrus, + old = rcu_dereference_protected(lru->mlrus, lockdep_is_held(&list_lrus_mutex)); - new = kvmalloc(struct_size(new, lru, new_size), GFP_KERNEL); + new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL); if (!new) return -ENOMEM; - if (__memcg_init_list_lru_node(new, old_size, new_size)) { + if (memcg_init_list_lru_range(new, old_size, new_size)) { kvfree(new); return -ENOMEM; } - memcpy(&new->lru, &old->lru, flex_array_size(new, lru, old_size)); - rcu_assign_pointer(nlru->memcg_lrus, new); + memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size)); + rcu_assign_pointer(lru->mlrus, new); kvfree_rcu(old, rcu); return 0; } -static void memcg_cancel_update_list_lru_node(struct list_lru_node *nlru, - int old_size, int new_size) -{ - struct list_lru_memcg *memcg_lrus; - - memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, - lockdep_is_held(&list_lrus_mutex)); - /* do not bother shrinking the array back to the old size, because we - * cannot handle allocation failures here */ - __memcg_destroy_list_lru_node(memcg_lrus, old_size, new_size); -} - -static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) -{ - int i; - - lru->memcg_aware = memcg_aware; - - if (!memcg_aware) - return 0; - - for_each_node(i) { - if (memcg_init_list_lru_node(&lru->node[i])) - goto fail; - } - return 0; -fail: - for (i = i - 1; i >= 0; i--) { - if (!lru->node[i].memcg_lrus) - continue; - memcg_destroy_list_lru_node(&lru->node[i]); - } - return -ENOMEM; -} - -static void memcg_destroy_list_lru(struct list_lru *lru) -{ - int i; - - if (!list_lru_memcg_aware(lru)) - return; - - for_each_node(i) - memcg_destroy_list_lru_node(&lru->node[i]); -} - -static int memcg_update_list_lru(struct list_lru *lru, - int old_size, int new_size) -{ - int i; - - for_each_node(i) { - if (memcg_update_list_lru_node(&lru->node[i], - old_size, new_size)) - goto fail; - } - return 0; -fail: - for (i = i - 1; i >= 0; i--) { - if (!lru->node[i].memcg_lrus) - continue; - - memcg_cancel_update_list_lru_node(&lru->node[i], - old_size, new_size); - } - return -ENOMEM; -} - static void memcg_cancel_update_list_lru(struct list_lru *lru, int old_size, int new_size) { - int i; + struct list_lru_memcg *mlrus; - for_each_node(i) - memcg_cancel_update_list_lru_node(&lru->node[i], - old_size, new_size); + mlrus = rcu_dereference_protected(lru->mlrus, + lockdep_is_held(&list_lrus_mutex)); + /* + * Do not bother shrinking the array back to the old size, because we + * cannot handle allocation failures here. 
+ */ + memcg_destroy_list_lru_range(mlrus, old_size, new_size); } int memcg_update_all_list_lrus(int new_size) @@ -524,8 +470,8 @@ static void memcg_drain_list_lru_node(st */ spin_lock_irq(&nlru->lock); - src = list_lru_from_memcg_idx(nlru, src_idx); - dst = list_lru_from_memcg_idx(nlru, dst_idx); + src = list_lru_from_memcg_idx(lru, nid, src_idx); + dst = list_lru_from_memcg_idx(lru, nid, dst_idx); list_splice_init(&src->list, &dst->list); From patchwork Tue Mar 22 21:40:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789088 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EE3DC433FE for ; Tue, 22 Mar 2022 21:41:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9ABBB6B007D; Tue, 22 Mar 2022 17:41:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 95AAC6B0080; Tue, 22 Mar 2022 17:41:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D32B6B0083; Tue, 22 Mar 2022 17:41:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 6DCE26B007D for ; Tue, 22 Mar 2022 17:41:01 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 34E0912146C for ; Tue, 22 Mar 2022 21:41:01 +0000 (UTC) X-FDA: 79273342722.13.56DF481 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf20.hostedemail.com (Postfix) with ESMTP id 81DA91C0042 for ; Tue, 22 Mar 2022 21:41:00 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DDA39B81D77; Tue, 22 Mar 2022 21:40:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7728EC340EC; Tue, 22 Mar 2022 21:40:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985257; bh=xdI/0Pw5VwMaqDvr7JVhihZBhMxdebfhFF6pqWJ7xKQ=; h=Date:To:From:In-Reply-To:Subject:From; b=d0iEerc3NCOoxOgBkuV0WR7KOxNuQLsqAHfInMRwTOBqMzjoJE84La8o459la4ZzW 2aaALlAWNed2pUNOm1DAEjERnSHKJAI5CMSV5un7iZIyA72onp5AIFesS/lVAZzgOp A4u5fGARa6ae3Ixqhl6ovN7ipySTaQtRnw8wZzGI= Date: Tue, 22 Mar 2022 14:40:56 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 048/227] mm: introduce kmem_cache_alloc_lru Message-Id: <20220322214057.7728EC340EC@smtp.kernel.org> X-Stat-Signature: g6optcmbb7jt66ea1kfary85g16qkk3x 
X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 81DA91C0042 Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=d0iEerc3; dmarc=none; spf=pass (imf20.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985260-63646 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: introduce kmem_cache_alloc_lru

We currently allocate scope for every memcg to be tracked on every superblock instantiated in the system, regardless of whether that superblock is even accessible to that memcg. These huge memcg counts come from container hosts where memcgs are confined to just a small subset of the total number of superblocks that are instantiated at any given point in time. For these systems with huge container counts, list_lru does not need the capability of tracking every memcg on every superblock.

What it comes down to is adding the memcg to the list_lru at the first insert. So introduce kmem_cache_alloc_lru() to allocate an object together with its list_lru. In later patches, we will convert all inode and dentry allocations from kmem_cache_alloc() to kmem_cache_alloc_lru().

Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/list_lru.h | 4 + include/linux/memcontrol.h | 14 ++++ include/linux/slab.h | 3 + mm/list_lru.c | 104 +++++++++++++++++++++++++++++---- mm/memcontrol.c | 14 ---- mm/slab.c | 39 +++++++++---- mm/slab.h | 25 +++++++- mm/slob.c | 6 ++ mm/slub.c | 42 +++++++++----- 9 files changed, 198 insertions(+), 53 deletions(-) --- a/include/linux/list_lru.h~mm-introduce-kmem_cache_alloc_lru +++ a/include/linux/list_lru.h @@ -56,6 +56,8 @@ struct list_lru { struct list_head list; int shrinker_id; bool memcg_aware; + /* protects ->mlrus->mlru[i] */ + spinlock_t lock; /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ struct list_lru_memcg __rcu *mlrus; #endif @@ -72,6 +74,8 @@ int __list_lru_init(struct list_lru *lru #define list_lru_init_memcg(lru, shrinker) \ __list_lru_init((lru), true, NULL, shrinker) +int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, + gfp_t gfp); int memcg_update_all_list_lrus(int num_memcgs); void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg); --- a/include/linux/memcontrol.h~mm-introduce-kmem_cache_alloc_lru +++ a/include/linux/memcontrol.h @@ -524,6 +524,20 @@ static inline struct mem_cgroup *page_me return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); } +static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) +{ + struct mem_cgroup *memcg; + + rcu_read_lock(); +retry: + memcg = obj_cgroup_memcg(objcg); + if (unlikely(!css_tryget(&memcg->css))) + goto retry; + rcu_read_unlock(); + + return memcg; +} + #ifdef CONFIG_MEMCG_KMEM /* * folio_memcg_kmem - Check if the folio has the memcg_kmem flag set.
--- a/include/linux/slab.h~mm-introduce-kmem_cache_alloc_lru +++ a/include/linux/slab.h @@ -135,6 +135,7 @@ #include +struct list_lru; struct mem_cgroup; /* * struct kmem_cache related prototypes @@ -416,6 +417,8 @@ static __always_inline unsigned int __km void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1); void *kmem_cache_alloc(struct kmem_cache *s, gfp_t flags) __assume_slab_alignment __malloc; +void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru, + gfp_t gfpflags) __assume_slab_alignment __malloc; void kmem_cache_free(struct kmem_cache *s, void *objp); /* --- a/mm/list_lru.c~mm-introduce-kmem_cache_alloc_lru +++ a/mm/list_lru.c @@ -13,6 +13,7 @@ #include #include #include "slab.h" +#include "internal.h" #ifdef CONFIG_MEMCG_KMEM static LIST_HEAD(memcg_list_lrus); @@ -338,22 +339,30 @@ static void memcg_destroy_list_lru_range kfree(mlrus->mlru[i]); } +static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp) +{ + int nid; + struct list_lru_per_memcg *mlru; + + mlru = kmalloc(struct_size(mlru, node, nr_node_ids), gfp); + if (!mlru) + return NULL; + + for_each_node(nid) + init_one_lru(&mlru->node[nid]); + + return mlru; +} + static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus, int begin, int end) { int i; for (i = begin; i < end; i++) { - int nid; - struct list_lru_per_memcg *mlru; - - mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL); - if (!mlru) + mlrus->mlru[i] = memcg_init_list_lru_one(GFP_KERNEL); + if (!mlrus->mlru[i]) goto fail; - - for_each_node(nid) - init_one_lru(&mlru->node[nid]); - mlrus->mlru[i] = mlru; } return 0; fail: @@ -370,6 +379,8 @@ static int memcg_init_list_lru(struct li if (!memcg_aware) return 0; + spin_lock_init(&lru->lock); + mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL); if (!mlrus) return -ENOMEM; @@ -416,8 +427,11 @@ static int memcg_update_list_lru(struct return -ENOMEM; } + spin_lock_irq(&lru->lock); memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size)); rcu_assign_pointer(lru->mlrus, new); + spin_unlock_irq(&lru->lock); + kvfree_rcu(old, rcu); return 0; } @@ -502,6 +516,78 @@ void memcg_drain_all_list_lrus(int src_i memcg_drain_list_lru(lru, src_idx, dst_memcg); mutex_unlock(&list_lrus_mutex); } + +static bool memcg_list_lru_allocated(struct mem_cgroup *memcg, + struct list_lru *lru) +{ + bool allocated; + int idx; + + idx = memcg->kmemcg_id; + if (unlikely(idx < 0)) + return true; + + rcu_read_lock(); + allocated = !!rcu_dereference(lru->mlrus)->mlru[idx]; + rcu_read_unlock(); + + return allocated; +} + +int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, + gfp_t gfp) +{ + int i; + unsigned long flags; + struct list_lru_memcg *mlrus; + struct list_lru_memcg_table { + struct list_lru_per_memcg *mlru; + struct mem_cgroup *memcg; + } *table; + + if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru)) + return 0; + + gfp &= GFP_RECLAIM_MASK; + table = kmalloc_array(memcg->css.cgroup->level, sizeof(*table), gfp); + if (!table) + return -ENOMEM; + + /* + * Because the list_lru can be reparented to the parent cgroup's + * list_lru, we should make sure that this cgroup and all its + * ancestors have allocated list_lru_per_memcg. 
+ */ + for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) { + if (memcg_list_lru_allocated(memcg, lru)) + break; + + table[i].memcg = memcg; + table[i].mlru = memcg_init_list_lru_one(gfp); + if (!table[i].mlru) { + while (i--) + kfree(table[i].mlru); + kfree(table); + return -ENOMEM; + } + } + + spin_lock_irqsave(&lru->lock, flags); + mlrus = rcu_dereference_protected(lru->mlrus, true); + while (i--) { + int index = table[i].memcg->kmemcg_id; + + if (mlrus->mlru[index]) + kfree(table[i].mlru); + else + mlrus->mlru[index] = table[i].mlru; + } + spin_unlock_irqrestore(&lru->lock, flags); + + kfree(table); + + return 0; +} #else static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) { --- a/mm/memcontrol.c~mm-introduce-kmem_cache_alloc_lru +++ a/mm/memcontrol.c @@ -2805,20 +2805,6 @@ static void commit_charge(struct folio * folio->memcg_data = (unsigned long)memcg; } -static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) -{ - struct mem_cgroup *memcg; - - rcu_read_lock(); -retry: - memcg = obj_cgroup_memcg(objcg); - if (unlikely(!css_tryget(&memcg->css))) - goto retry; - rcu_read_unlock(); - - return memcg; -} - #ifdef CONFIG_MEMCG_KMEM /* * The allocated objcg pointers array is not accounted directly. --- a/mm/slab.c~mm-introduce-kmem_cache_alloc_lru +++ a/mm/slab.c @@ -3211,7 +3211,7 @@ slab_alloc_node(struct kmem_cache *cache bool init = false; flags &= gfp_allowed_mask; - cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags); + cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags); if (unlikely(!cachep)) return NULL; @@ -3287,7 +3287,8 @@ __do_cache_alloc(struct kmem_cache *cach #endif /* CONFIG_NUMA */ static __always_inline void * -slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller) +slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags, + size_t orig_size, unsigned long caller) { unsigned long save_flags; void *objp; @@ -3295,7 +3296,7 @@ slab_alloc(struct kmem_cache *cachep, gf bool init = false; flags &= gfp_allowed_mask; - cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags); + cachep = slab_pre_alloc_hook(cachep, lru, &objcg, 1, flags); if (unlikely(!cachep)) return NULL; @@ -3484,6 +3485,18 @@ void ___cache_free(struct kmem_cache *ca __free_one(ac, objp); } +static __always_inline +void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, + gfp_t flags) +{ + void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_); + + trace_kmem_cache_alloc(_RET_IP_, ret, + cachep->object_size, cachep->size, flags); + + return ret; +} + /** * kmem_cache_alloc - Allocate an object * @cachep: The cache to allocate from. 
@@ -3496,15 +3509,17 @@ void ___cache_free(struct kmem_cache *ca */ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) { - void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_); - - trace_kmem_cache_alloc(_RET_IP_, ret, - cachep->object_size, cachep->size, flags); - - return ret; + return __kmem_cache_alloc_lru(cachep, NULL, flags); } EXPORT_SYMBOL(kmem_cache_alloc); +void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, + gfp_t flags) +{ + return __kmem_cache_alloc_lru(cachep, lru, flags); +} +EXPORT_SYMBOL(kmem_cache_alloc_lru); + static __always_inline void cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p, unsigned long caller) @@ -3521,7 +3536,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca size_t i; struct obj_cgroup *objcg = NULL; - s = slab_pre_alloc_hook(s, &objcg, size, flags); + s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags); if (!s) return 0; @@ -3562,7 +3577,7 @@ kmem_cache_alloc_trace(struct kmem_cache { void *ret; - ret = slab_alloc(cachep, flags, size, _RET_IP_); + ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_); ret = kasan_kmalloc(cachep, ret, size, flags); trace_kmalloc(_RET_IP_, ret, @@ -3689,7 +3704,7 @@ static __always_inline void *__do_kmallo cachep = kmalloc_slab(size, flags); if (unlikely(ZERO_OR_NULL_PTR(cachep))) return cachep; - ret = slab_alloc(cachep, flags, size, caller); + ret = slab_alloc(cachep, NULL, flags, size, caller); ret = kasan_kmalloc(cachep, ret, size, flags); trace_kmalloc(caller, ret, --- a/mm/slab.h~mm-introduce-kmem_cache_alloc_lru +++ a/mm/slab.h @@ -231,6 +231,7 @@ struct kmem_cache { #include #include #include +#include /* * State of the slab allocator. @@ -472,6 +473,7 @@ static inline size_t obj_full_size(struc * Returns false if the allocation should fail. 
*/ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, + struct list_lru *lru, struct obj_cgroup **objcgp, size_t objects, gfp_t flags) { @@ -487,13 +489,26 @@ static inline bool memcg_slab_pre_alloc_ if (!objcg) return true; - if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) { - obj_cgroup_put(objcg); - return false; + if (lru) { + int ret; + struct mem_cgroup *memcg; + + memcg = get_mem_cgroup_from_objcg(objcg); + ret = memcg_list_lru_alloc(memcg, lru, flags); + css_put(&memcg->css); + + if (ret) + goto out; } + if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) + goto out; + *objcgp = objcg; return true; +out: + obj_cgroup_put(objcg); + return false; } static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, @@ -598,6 +613,7 @@ static inline void memcg_free_slab_cgrou } static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, + struct list_lru *lru, struct obj_cgroup **objcgp, size_t objects, gfp_t flags) { @@ -697,6 +713,7 @@ static inline size_t slab_ksize(const st } static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, + struct list_lru *lru, struct obj_cgroup **objcgp, size_t size, gfp_t flags) { @@ -707,7 +724,7 @@ static inline struct kmem_cache *slab_pr if (should_failslab(s, flags)) return NULL; - if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags)) + if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)) return NULL; return s; --- a/mm/slob.c~mm-introduce-kmem_cache_alloc_lru +++ a/mm/slob.c @@ -635,6 +635,12 @@ void *kmem_cache_alloc(struct kmem_cache } EXPORT_SYMBOL(kmem_cache_alloc); + +void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags) +{ + return slob_alloc_node(cachep, flags, NUMA_NO_NODE); +} +EXPORT_SYMBOL(kmem_cache_alloc_lru); #ifdef CONFIG_NUMA void *__kmalloc_node(size_t size, gfp_t gfp, int node) { --- a/mm/slub.c~mm-introduce-kmem_cache_alloc_lru +++ a/mm/slub.c @@ -3131,7 +3131,7 @@ static __always_inline void maybe_wipe_o * * Otherwise we can simply pick the next object from the lockless free list. 
*/ -static __always_inline void *slab_alloc_node(struct kmem_cache *s, +static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru, gfp_t gfpflags, int node, unsigned long addr, size_t orig_size) { void *object; @@ -3141,7 +3141,7 @@ static __always_inline void *slab_alloc_ struct obj_cgroup *objcg = NULL; bool init = false; - s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags); + s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags); if (!s) return NULL; @@ -3232,27 +3232,41 @@ out: return object; } -static __always_inline void *slab_alloc(struct kmem_cache *s, +static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru, gfp_t gfpflags, unsigned long addr, size_t orig_size) { - return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size); + return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size); } -void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags) +static __always_inline +void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru, + gfp_t gfpflags) { - void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size); + void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size); trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size, s->size, gfpflags); return ret; } + +void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags) +{ + return __kmem_cache_alloc_lru(s, NULL, gfpflags); +} EXPORT_SYMBOL(kmem_cache_alloc); +void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru, + gfp_t gfpflags) +{ + return __kmem_cache_alloc_lru(s, lru, gfpflags); +} +EXPORT_SYMBOL(kmem_cache_alloc_lru); + #ifdef CONFIG_TRACING void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size) { - void *ret = slab_alloc(s, gfpflags, _RET_IP_, size); + void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size); trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags); ret = kasan_kmalloc(s, ret, size, gfpflags); return ret; @@ -3263,7 +3277,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace); #ifdef CONFIG_NUMA void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node) { - void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size); + void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size); trace_kmem_cache_alloc_node(_RET_IP_, ret, s->object_size, s->size, gfpflags, node); @@ -3277,7 +3291,7 @@ void *kmem_cache_alloc_node_trace(struct gfp_t gfpflags, int node, size_t size) { - void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size); + void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size); trace_kmalloc_node(_RET_IP_, ret, size, s->size, gfpflags, node); @@ -3667,7 +3681,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca struct obj_cgroup *objcg = NULL; /* memcg and kmem_cache debug support */ - s = slab_pre_alloc_hook(s, &objcg, size, flags); + s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags); if (unlikely(!s)) return false; /* @@ -4417,7 +4431,7 @@ void *__kmalloc(size_t size, gfp_t flags if (unlikely(ZERO_OR_NULL_PTR(s))) return s; - ret = slab_alloc(s, flags, _RET_IP_, size); + ret = slab_alloc(s, NULL, flags, _RET_IP_, size); trace_kmalloc(_RET_IP_, ret, size, s->size, flags); @@ -4465,7 +4479,7 @@ void *__kmalloc_node(size_t size, gfp_t if (unlikely(ZERO_OR_NULL_PTR(s))) return s; - ret = slab_alloc_node(s, flags, node, _RET_IP_, size); + ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size); trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node); @@ -4923,7 +4937,7 @@ void *__kmalloc_track_caller(size_t 
size if (unlikely(ZERO_OR_NULL_PTR(s))) return s; - ret = slab_alloc(s, gfpflags, caller, size); + ret = slab_alloc(s, NULL, gfpflags, caller, size); /* Honor the call site pointer we received. */ trace_kmalloc(caller, ret, size, s->size, gfpflags); @@ -4954,7 +4968,7 @@ void *__kmalloc_node_track_caller(size_t if (unlikely(ZERO_OR_NULL_PTR(s))) return s; - ret = slab_alloc_node(s, gfpflags, node, caller, size); + ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size); /* Honor the call site pointer we received. */ trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node); From patchwork Tue Mar 22 21:41:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789089 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71DD9C433EF for ; Tue, 22 Mar 2022 21:41:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 082276B0080; Tue, 22 Mar 2022 17:41:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 00A2C6B0083; Tue, 22 Mar 2022 17:41:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E3BB46B0087; Tue, 22 Mar 2022 17:41:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id D4F3B6B0080 for ; Tue, 22 Mar 2022 17:41:03 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A260524783 for ; Tue, 22 Mar 2022 21:41:03 +0000 (UTC) X-FDA: 79273342806.06.E5F37E2 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf18.hostedemail.com (Postfix) with ESMTP id 0C9691C0026 for ; Tue, 22 Mar 2022 21:41:02 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F2CBAB81D9E; Tue, 22 Mar 2022 21:41:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A902AC340EC; Tue, 22 Mar 2022 21:41:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985260; bh=1//xFcecO7mK3EjNUbQzTa22CRQRRU9/xJu2uZX++mY=; h=Date:To:From:In-Reply-To:Subject:From; b=o4tP7Uo3TgYE6YsFCS5ii1Tmkb9eUqhwTeKkzK8C6thnqoyvsnwolM2KHKUAQ3GHZ J2BAwDxgNUXiBdgWF3jm6Y35UenIHPZhtVTYOeF830jIL0Nm1/cJWkqFe7Z47SGUvq WGKDLpfdNNiD8w5eDg9O5+aN6KUxXprnq8gXhwIw= Date: Tue, 22 Mar 2022 14:41:00 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 049/227] 
fs: introduce alloc_inode_sb() to allocate filesystems specific inode Message-Id: <20220322214100.A902AC340EC@smtp.kernel.org> X-Stat-Signature: mhozz86z3wxb8ixjwmrzczh1f9bjj3i9 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 0C9691C0026 Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=o4tP7Uo3; dmarc=none; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985262-560339 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: fs: introduce alloc_inode_sb() to allocate filesystems specific inode

The allocated inode cache is supposed to be added to its memcg list_lru, which should be allocated in advance as well. That can be done by kmem_cache_alloc_lru(), which allocates the object and the list_lru. File systems are the main users of it. So introduce alloc_inode_sb() to allocate filesystem-specific inodes and set up the inode reclaim context properly. The file system is supposed to use alloc_inode_sb() to allocate inodes. In later patches, we will convert all users to the new API.

Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Roman Gushchin Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- Documentation/filesystems/porting.rst | 6 ++++++ fs/inode.c | 2 +- include/linux/fs.h | 11 +++++++++++ 3 files changed, 18 insertions(+), 1 deletion(-) --- a/Documentation/filesystems/porting.rst~fs-introduce-alloc_inode_sb-to-allocate-filesystems-specific-inode +++ a/Documentation/filesystems/porting.rst @@ -45,6 +45,12 @@ typically between calling iget_locked() At some point that will become mandatory. +**mandatory** + +The foo_inode_info should always be allocated through alloc_inode_sb() rather +than kmem_cache_alloc() or kmalloc() so that the inode reclaim context is set +up correctly. + --- **mandatory** --- a/fs/inode.c~fs-introduce-alloc_inode_sb-to-allocate-filesystems-specific-inode +++ a/fs/inode.c @@ -259,7 +259,7 @@ static struct inode *alloc_inode(struct if (ops->alloc_inode) inode = ops->alloc_inode(sb); else - inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL); + inode = alloc_inode_sb(sb, inode_cachep, GFP_KERNEL); if (!inode) return NULL; --- a/include/linux/fs.h~fs-introduce-alloc_inode_sb-to-allocate-filesystems-specific-inode +++ a/include/linux/fs.h @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -3114,6 +3115,16 @@ extern void free_inode_nonrcu(struct ino extern int should_remove_suid(struct dentry *); extern int file_remove_privs(struct file *); +/* + * This must be used for allocating filesystem-specific inodes to set + * up the inode reclaim context correctly.
+ */ +static inline void * +alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp) +{ + return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp); +} + extern void __insert_inode_hash(struct inode *, unsigned long hashval); static inline void insert_inode_hash(struct inode *inode) { From patchwork Tue Mar 22 21:41:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789090 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A447EC4332F for ; Tue, 22 Mar 2022 21:41:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3B9CB6B007B; Tue, 22 Mar 2022 17:41:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 344756B0087; Tue, 22 Mar 2022 17:41:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1DEA96B0088; Tue, 22 Mar 2022 17:41:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0112.hostedemail.com [216.40.44.112]) by kanga.kvack.org (Postfix) with ESMTP id 0D14A6B007B for ; Tue, 22 Mar 2022 17:41:06 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id BFC011828A80B for ; Tue, 22 Mar 2022 21:41:05 +0000 (UTC) X-FDA: 79273342890.16.B9B1EC1 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf19.hostedemail.com (Postfix) with ESMTP id 2CF9A1A002E for ; Tue, 22 Mar 2022 21:41:05 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 87DCD6117F; Tue, 22 Mar 2022 21:41:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD2BCC340EC; Tue, 22 Mar 2022 21:41:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985264; bh=NJe3fKBe78mj29K5ZRXnnZv2tf/gzesWjIVyFub8Qco=; h=Date:To:From:In-Reply-To:Subject:From; b=p6uH8vUGX3iN10zZm5iqx4w6AoJEq4LzV6Dyl638UwbUsXaqAPL+zVjhlYQt9iT1O 17DibxUnUJOiFPS30m3FYS+PL5pCXNVvXVhbrMl7UJLDbT1tO8aYLKy/c/3BifEWN0 VymWg0npuESLmuhcMh02Fn91wZWhmXQ61k01ep6g= Date: Tue, 22 Mar 2022 14:41:03 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 050/227] fs: allocate inode by using alloc_inode_sb() Message-Id: <20220322214103.CD2BCC340EC@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 2CF9A1A002E X-Rspam-User: Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org 
From: Muchun Song
Subject: fs: allocate inode by using alloc_inode_sb()

The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Acked-by: Theodore Ts'o [ext4]
Acked-by: Roman Gushchin
Cc: Alex Shi
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Dave Chinner
Cc: Fam Zheng
Cc: Jaegeuk Kim
Cc: Johannes Weiner
Cc: Kari Argillander
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Trond Myklebust
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Xiongchun Duan
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 block/bdev.c             | 2 +-
 drivers/dax/super.c      | 2 +-
 fs/9p/vfs_inode.c        | 2 +-
 fs/adfs/super.c          | 2 +-
 fs/affs/super.c          | 2 +-
 fs/afs/super.c           | 2 +-
 fs/befs/linuxvfs.c       | 2 +-
 fs/bfs/inode.c           | 2 +-
 fs/btrfs/inode.c         | 2 +-
 fs/ceph/inode.c          | 2 +-
 fs/cifs/cifsfs.c         | 2 +-
 fs/coda/inode.c          | 2 +-
 fs/ecryptfs/super.c      | 2 +-
 fs/efs/super.c           | 2 +-
 fs/erofs/super.c         | 2 +-
 fs/exfat/super.c         | 2 +-
 fs/ext2/super.c          | 2 +-
 fs/ext4/super.c          | 2 +-
 fs/fat/inode.c           | 2 +-
 fs/freevxfs/vxfs_super.c | 2 +-
 fs/fuse/inode.c          | 2 +-
 fs/gfs2/super.c          | 2 +-
 fs/hfs/super.c           | 2 +-
 fs/hfsplus/super.c       | 2 +-
 fs/hostfs/hostfs_kern.c  | 2 +-
 fs/hpfs/super.c          | 2 +-
 fs/hugetlbfs/inode.c     | 2 +-
 fs/isofs/inode.c         | 2 +-
 fs/jffs2/super.c         | 2 +-
 fs/jfs/super.c           | 2 +-
 fs/minix/inode.c         | 2 +-
 fs/nfs/inode.c           | 2 +-
 fs/nilfs2/super.c        | 2 +-
 fs/ntfs/inode.c          | 2 +-
 fs/ntfs3/super.c         | 2 +-
 fs/ocfs2/dlmfs/dlmfs.c   | 2 +-
 fs/ocfs2/super.c         | 2 +-
 fs/openpromfs/inode.c    | 2 +-
 fs/orangefs/super.c      | 2 +-
 fs/overlayfs/super.c     | 2 +-
 fs/proc/inode.c          | 2 +-
 fs/qnx4/inode.c          | 2 +-
 fs/qnx6/inode.c          | 2 +-
 fs/reiserfs/super.c      | 2 +-
 fs/romfs/super.c         | 2 +-
 fs/squashfs/super.c      | 2 +-
 fs/sysv/inode.c          | 2 +-
 fs/ubifs/super.c         | 2 +-
 fs/udf/super.c           | 2 +-
 fs/ufs/super.c           | 2 +-
 fs/vboxsf/super.c        | 2 +-
 fs/xfs/xfs_icache.c      | 2 +-
 fs/zonefs/super.c        | 2 +-
 ipc/mqueue.c             | 2 +-
 mm/shmem.c               | 2 +-
 net/socket.c             | 2 +-
 net/sunrpc/rpc_pipe.c    | 2 +-
 57 files changed, 57 insertions(+), 57 deletions(-)

--- a/block/bdev.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/block/bdev.c
@@ -385,7 +385,7 @@ static struct kmem_cache * bdev_cachep _
 static struct inode *bdev_alloc_inode(struct super_block *sb)
 {
-	struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
+	struct bdev_inode *ei = alloc_inode_sb(sb, bdev_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
--- a/drivers/dax/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/drivers/dax/super.c
@@ -282,7 +282,7 @@ static struct inode *dax_alloc_inode(str
 	struct dax_device *dax_dev;
 	struct inode *inode;
-	dax_dev = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+	dax_dev = alloc_inode_sb(sb, dax_cache, GFP_KERNEL);
 	if (!dax_dev)
 		return NULL;
--- a/fs/9p/vfs_inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/9p/vfs_inode.c
@@ -228,7 +228,7 @@ struct inode *v9fs_alloc_inode(struct su
 {
 	struct v9fs_inode *v9inode;
-	v9inode = kmem_cache_alloc(v9fs_inode_cache, GFP_KERNEL);
+	v9inode = alloc_inode_sb(sb, v9fs_inode_cache, GFP_KERNEL);
 	if (!v9inode)
 		return NULL;
 #ifdef CONFIG_9P_FSCACHE
--- a/fs/adfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/adfs/super.c
@@ -220,7 +220,7 @@ static struct kmem_cache *adfs_inode_cac
 static struct inode *adfs_alloc_inode(struct super_block *sb)
 {
 	struct adfs_inode_info *ei;
-	ei = kmem_cache_alloc(adfs_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, adfs_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/affs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/affs/super.c
@@ -100,7 +100,7 @@ static struct inode *affs_alloc_inode(st
 {
 	struct affs_inode_info *i;
-	i = kmem_cache_alloc(affs_inode_cachep, GFP_KERNEL);
+	i = alloc_inode_sb(sb, affs_inode_cachep, GFP_KERNEL);
 	if (!i)
 		return NULL;
--- a/fs/afs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/afs/super.c
@@ -679,7 +679,7 @@ static struct inode *afs_alloc_inode(str
 {
 	struct afs_vnode *vnode;
-	vnode = kmem_cache_alloc(afs_inode_cachep, GFP_KERNEL);
+	vnode = alloc_inode_sb(sb, afs_inode_cachep, GFP_KERNEL);
 	if (!vnode)
 		return NULL;
--- a/fs/befs/linuxvfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/befs/linuxvfs.c
@@ -277,7 +277,7 @@ befs_alloc_inode(struct super_block *sb)
 {
 	struct befs_inode_info *bi;
-	bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
+	bi = alloc_inode_sb(sb, befs_inode_cachep, GFP_KERNEL);
 	if (!bi)
 		return NULL;
 	return &bi->vfs_inode;
--- a/fs/bfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/bfs/inode.c
@@ -239,7 +239,7 @@ static struct kmem_cache *bfs_inode_cach
 static struct inode *bfs_alloc_inode(struct super_block *sb)
 {
 	struct bfs_inode_info *bi;
-	bi = kmem_cache_alloc(bfs_inode_cachep, GFP_KERNEL);
+	bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL);
 	if (!bi)
 		return NULL;
 	return &bi->vfs_inode;
--- a/fs/btrfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/btrfs/inode.c
@@ -8787,7 +8787,7 @@ struct inode *btrfs_alloc_inode(struct s
 	struct btrfs_inode *ei;
 	struct inode *inode;
-	ei = kmem_cache_alloc(btrfs_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, btrfs_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
--- a/fs/ceph/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ceph/inode.c
@@ -447,7 +447,7 @@ struct inode *ceph_alloc_inode(struct su
 	struct ceph_inode_info *ci;
 	int i;
-	ci = kmem_cache_alloc(ceph_inode_cachep, GFP_NOFS);
+	ci = alloc_inode_sb(sb, ceph_inode_cachep, GFP_NOFS);
 	if (!ci)
 		return NULL;
--- a/fs/cifs/cifsfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/cifs/cifsfs.c
@@ -354,7 +354,7 @@ static struct inode *
 cifs_alloc_inode(struct super_block *sb)
 {
 	struct cifsInodeInfo *cifs_inode;
-	cifs_inode = kmem_cache_alloc(cifs_inode_cachep, GFP_KERNEL);
+	cifs_inode = alloc_inode_sb(sb, cifs_inode_cachep, GFP_KERNEL);
 	if (!cifs_inode)
 		return NULL;
 	cifs_inode->cifsAttrs = 0x20;	/* default */
--- a/fs/coda/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/coda/inode.c
@@ -43,7 +43,7 @@ static struct kmem_cache * coda_inode_ca
 static struct inode *coda_alloc_inode(struct super_block *sb)
 {
 	struct coda_inode_info *ei;
-	ei = kmem_cache_alloc(coda_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, coda_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	memset(&ei->c_fid, 0, sizeof(struct CodaFid));
--- a/fs/ecryptfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ecryptfs/super.c
@@ -38,7 +38,7 @@ static struct inode *ecryptfs_alloc_inod
 	struct ecryptfs_inode_info *inode_info;
 	struct inode *inode = NULL;
-	inode_info = kmem_cache_alloc(ecryptfs_inode_info_cache, GFP_KERNEL);
+	inode_info = alloc_inode_sb(sb, ecryptfs_inode_info_cache, GFP_KERNEL);
 	if (unlikely(!inode_info))
 		goto out;
 	if (ecryptfs_init_crypt_stat(&inode_info->crypt_stat)) {
--- a/fs/efs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/efs/super.c
@@ -69,7 +69,7 @@ static struct kmem_cache * efs_inode_cac
 static struct inode *efs_alloc_inode(struct super_block *sb)
 {
 	struct efs_inode_info *ei;
-	ei = kmem_cache_alloc(efs_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, efs_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/erofs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/erofs/super.c
@@ -84,7 +84,7 @@ static void erofs_inode_init_once(void *
 static struct inode *erofs_alloc_inode(struct super_block *sb)
 {
 	struct erofs_inode *vi =
-		kmem_cache_alloc(erofs_inode_cachep, GFP_KERNEL);
+		alloc_inode_sb(sb, erofs_inode_cachep, GFP_KERNEL);
 	if (!vi)
 		return NULL;
--- a/fs/exfat/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/exfat/super.c
@@ -183,7 +183,7 @@ static struct inode *exfat_alloc_inode(s
 {
 	struct exfat_inode_info *ei;
-	ei = kmem_cache_alloc(exfat_inode_cachep, GFP_NOFS);
+	ei = alloc_inode_sb(sb, exfat_inode_cachep, GFP_NOFS);
 	if (!ei)
 		return NULL;
--- a/fs/ext2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ext2/super.c
@@ -180,7 +180,7 @@ static struct kmem_cache * ext2_inode_ca
 static struct inode *ext2_alloc_inode(struct super_block *sb)
 {
 	struct ext2_inode_info *ei;
-	ei = kmem_cache_alloc(ext2_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, ext2_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	ei->i_block_alloc_info = NULL;
--- a/fs/ext4/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ext4/super.c
@@ -1316,7 +1316,7 @@ static struct inode *ext4_alloc_inode(st
 {
 	struct ext4_inode_info *ei;
-	ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
+	ei = alloc_inode_sb(sb, ext4_inode_cachep, GFP_NOFS);
 	if (!ei)
 		return NULL;
--- a/fs/fat/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/fat/inode.c
@@ -745,7 +745,7 @@ static struct kmem_cache *fat_inode_cach
 static struct inode *fat_alloc_inode(struct super_block *sb)
 {
 	struct msdos_inode_info *ei;
-	ei = kmem_cache_alloc(fat_inode_cachep, GFP_NOFS);
+	ei = alloc_inode_sb(sb, fat_inode_cachep, GFP_NOFS);
 	if (!ei)
 		return NULL;
--- a/fs/freevxfs/vxfs_super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/freevxfs/vxfs_super.c
@@ -124,7 +124,7 @@ static struct inode *vxfs_alloc_inode(st
 {
 	struct vxfs_inode_info *vi;
-	vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
+	vi = alloc_inode_sb(sb, vxfs_inode_cachep, GFP_KERNEL);
 	if (!vi)
 		return NULL;
 	inode_init_once(&vi->vfs_inode);
--- a/fs/fuse/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/fuse/inode.c
@@ -72,7 +72,7 @@ static struct inode *fuse_alloc_inode(st
 {
 	struct fuse_inode *fi;
-	fi = kmem_cache_alloc(fuse_inode_cachep, GFP_KERNEL);
+	fi = alloc_inode_sb(sb, fuse_inode_cachep, GFP_KERNEL);
 	if (!fi)
 		return NULL;
--- a/fs/gfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/gfs2/super.c
@@ -1425,7 +1425,7 @@ static struct inode *gfs2_alloc_inode(st
 {
 	struct gfs2_inode *ip;
-	ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL);
+	ip = alloc_inode_sb(sb, gfs2_inode_cachep, GFP_KERNEL);
 	if (!ip)
 		return NULL;
 	ip->i_flags = 0;
--- a/fs/hfsplus/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hfsplus/super.c
@@ -624,7 +624,7 @@ static struct inode *hfsplus_alloc_inode
 {
 	struct hfsplus_inode_info *i;
-	i = kmem_cache_alloc(hfsplus_inode_cachep, GFP_KERNEL);
+	i = alloc_inode_sb(sb, hfsplus_inode_cachep, GFP_KERNEL);
 	return i ? &i->vfs_inode : NULL;
 }
--- a/fs/hfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hfs/super.c
@@ -162,7 +162,7 @@ static struct inode *hfs_alloc_inode(str
 {
 	struct hfs_inode_info *i;
-	i = kmem_cache_alloc(hfs_inode_cachep, GFP_KERNEL);
+	i = alloc_inode_sb(sb, hfs_inode_cachep, GFP_KERNEL);
 	return i ? &i->vfs_inode : NULL;
 }
--- a/fs/hostfs/hostfs_kern.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hostfs/hostfs_kern.c
@@ -222,7 +222,7 @@ static struct inode *hostfs_alloc_inode(
 {
 	struct hostfs_inode_info *hi;
-	hi = kmem_cache_alloc(hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
+	hi = alloc_inode_sb(sb, hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
 	if (hi == NULL)
 		return NULL;
 	hi->fd = -1;
--- a/fs/hpfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hpfs/super.c
@@ -232,7 +232,7 @@ static struct kmem_cache * hpfs_inode_ca
 static struct inode *hpfs_alloc_inode(struct super_block *sb)
 {
 	struct hpfs_inode_info *ei;
-	ei = kmem_cache_alloc(hpfs_inode_cachep, GFP_NOFS);
+	ei = alloc_inode_sb(sb, hpfs_inode_cachep, GFP_NOFS);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/hugetlbfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hugetlbfs/inode.c
@@ -1110,7 +1110,7 @@ static struct inode *hugetlbfs_alloc_ino
 	if (unlikely(!hugetlbfs_dec_free_inodes(sbinfo)))
 		return NULL;
-	p = kmem_cache_alloc(hugetlbfs_inode_cachep, GFP_KERNEL);
+	p = alloc_inode_sb(sb, hugetlbfs_inode_cachep, GFP_KERNEL);
 	if (unlikely(!p)) {
 		hugetlbfs_inc_free_inodes(sbinfo);
 		return NULL;
--- a/fs/isofs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/isofs/inode.c
@@ -70,7 +70,7 @@ static struct kmem_cache *isofs_inode_ca
 static struct inode *isofs_alloc_inode(struct super_block *sb)
 {
 	struct iso_inode_info *ei;
-	ei = kmem_cache_alloc(isofs_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, isofs_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/jffs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/jffs2/super.c
@@ -39,7 +39,7 @@ static struct inode *jffs2_alloc_inode(s
 {
 	struct jffs2_inode_info *f;
-	f = kmem_cache_alloc(jffs2_inode_cachep, GFP_KERNEL);
+	f = alloc_inode_sb(sb, jffs2_inode_cachep, GFP_KERNEL);
 	if (!f)
 		return NULL;
 	return &f->vfs_inode;
--- a/fs/jfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/jfs/super.c
@@ -102,7 +102,7 @@ static struct inode *jfs_alloc_inode(str
 {
 	struct jfs_inode_info *jfs_inode;
-	jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
+	jfs_inode = alloc_inode_sb(sb, jfs_inode_cachep, GFP_NOFS);
 	if (!jfs_inode)
 		return NULL;
 #ifdef CONFIG_QUOTA
--- a/fs/minix/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/minix/inode.c
@@ -63,7 +63,7 @@ static struct kmem_cache * minix_inode_c
 static struct inode *minix_alloc_inode(struct super_block *sb)
 {
 	struct minix_inode_info *ei;
-	ei = kmem_cache_alloc(minix_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, minix_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/nfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/nfs/inode.c
@@ -2238,7 +2238,7 @@ static int nfs_update_inode(struct inode
 struct inode *nfs_alloc_inode(struct super_block *sb)
 {
 	struct nfs_inode *nfsi;
-	nfsi = kmem_cache_alloc(nfs_inode_cachep, GFP_KERNEL);
+	nfsi = alloc_inode_sb(sb, nfs_inode_cachep, GFP_KERNEL);
 	if (!nfsi)
 		return NULL;
 	nfsi->flags = 0UL;
--- a/fs/nilfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/nilfs2/super.c
@@ -151,7 +151,7 @@ struct inode *nilfs_alloc_inode(struct s
 {
 	struct nilfs_inode_info *ii;
-	ii = kmem_cache_alloc(nilfs_inode_cachep, GFP_NOFS);
+	ii = alloc_inode_sb(sb, nilfs_inode_cachep, GFP_NOFS);
 	if (!ii)
 		return NULL;
 	ii->i_bh = NULL;
--- a/fs/ntfs3/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ntfs3/super.c
@@ -399,7 +399,7 @@ static struct kmem_cache *ntfs_inode_cac
 static struct inode *ntfs_alloc_inode(struct super_block *sb)
 {
-	struct ntfs_inode *ni = kmem_cache_alloc(ntfs_inode_cachep, GFP_NOFS);
+	struct ntfs_inode *ni = alloc_inode_sb(sb, ntfs_inode_cachep, GFP_NOFS);
 	if (!ni)
 		return NULL;
--- a/fs/ntfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ntfs/inode.c
@@ -310,7 +310,7 @@ struct inode *ntfs_alloc_big_inode(struc
 	ntfs_inode *ni;
 	ntfs_debug("Entering.");
-	ni = kmem_cache_alloc(ntfs_big_inode_cache, GFP_NOFS);
+	ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
 	if (likely(ni != NULL)) {
 		ni->state = 0;
 		return VFS_I(ni);
--- a/fs/ocfs2/dlmfs/dlmfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ocfs2/dlmfs/dlmfs.c
@@ -280,7 +280,7 @@ static struct inode *dlmfs_alloc_inode(s
 {
 	struct dlmfs_inode_private *ip;
-	ip = kmem_cache_alloc(dlmfs_inode_cache, GFP_NOFS);
+	ip = alloc_inode_sb(sb, dlmfs_inode_cache, GFP_NOFS);
 	if (!ip)
 		return NULL;
--- a/fs/ocfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ocfs2/super.c
@@ -548,7 +548,7 @@ static struct inode *ocfs2_alloc_inode(s
 {
 	struct ocfs2_inode_info *oi;
-	oi = kmem_cache_alloc(ocfs2_inode_cachep, GFP_NOFS);
+	oi = alloc_inode_sb(sb, ocfs2_inode_cachep, GFP_NOFS);
 	if (!oi)
 		return NULL;
--- a/fs/openpromfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/openpromfs/inode.c
@@ -335,7 +335,7 @@ static struct inode *openprom_alloc_inod
 {
 	struct op_inode_info *oi;
-	oi = kmem_cache_alloc(op_inode_cachep, GFP_KERNEL);
+	oi = alloc_inode_sb(sb, op_inode_cachep, GFP_KERNEL);
 	if (!oi)
 		return NULL;
--- a/fs/orangefs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/orangefs/super.c
@@ -107,7 +107,7 @@ static struct inode *orangefs_alloc_inod
 {
 	struct orangefs_inode_s *orangefs_inode;
-	orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, GFP_KERNEL);
+	orangefs_inode = alloc_inode_sb(sb, orangefs_inode_cache, GFP_KERNEL);
 	if (!orangefs_inode)
 		return NULL;
--- a/fs/overlayfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/overlayfs/super.c
@@ -174,7 +174,7 @@ static struct kmem_cache *ovl_inode_cach
 static struct inode *ovl_alloc_inode(struct super_block *sb)
 {
-	struct ovl_inode *oi = kmem_cache_alloc(ovl_inode_cachep, GFP_KERNEL);
+	struct ovl_inode *oi = alloc_inode_sb(sb, ovl_inode_cachep, GFP_KERNEL);
 	if (!oi)
 		return NULL;
--- a/fs/proc/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/proc/inode.c
@@ -66,7 +66,7 @@ static struct inode *proc_alloc_inode(st
 {
 	struct proc_inode *ei;
-	ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, proc_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	ei->pid = NULL;
--- a/fs/qnx4/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/qnx4/inode.c
@@ -338,7 +338,7 @@ static struct kmem_cache *qnx4_inode_cac
 static struct inode *qnx4_alloc_inode(struct super_block *sb)
 {
 	struct qnx4_inode_info *ei;
-	ei = kmem_cache_alloc(qnx4_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, qnx4_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/qnx6/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/qnx6/inode.c
@@ -597,7 +597,7 @@ static struct kmem_cache *qnx6_inode_cac
 static struct inode *qnx6_alloc_inode(struct super_block *sb)
 {
 	struct qnx6_inode_info *ei;
-	ei = kmem_cache_alloc(qnx6_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, qnx6_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/fs/reiserfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/reiserfs/super.c
@@ -639,7 +639,7 @@ static struct kmem_cache *reiserfs_inode
 static struct inode *reiserfs_alloc_inode(struct super_block *sb)
 {
 	struct reiserfs_inode_info *ei;
-	ei = kmem_cache_alloc(reiserfs_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, reiserfs_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	atomic_set(&ei->openers, 0);
--- a/fs/romfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/romfs/super.c
@@ -375,7 +375,7 @@ static struct inode *romfs_alloc_inode(s
 {
 	struct romfs_inode_info *inode;
-	inode = kmem_cache_alloc(romfs_inode_cachep, GFP_KERNEL);
+	inode = alloc_inode_sb(sb, romfs_inode_cachep, GFP_KERNEL);
 	return inode ? &inode->vfs_inode : NULL;
 }
--- a/fs/squashfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/squashfs/super.c
@@ -584,7 +584,7 @@ static void __exit exit_squashfs_fs(void
 static struct inode *squashfs_alloc_inode(struct super_block *sb)
 {
 	struct squashfs_inode_info *ei =
-		kmem_cache_alloc(squashfs_inode_cachep, GFP_KERNEL);
+		alloc_inode_sb(sb, squashfs_inode_cachep, GFP_KERNEL);
 	return ei ? &ei->vfs_inode : NULL;
 }
--- a/fs/sysv/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/sysv/inode.c
@@ -306,7 +306,7 @@ static struct inode *sysv_alloc_inode(st
 {
 	struct sysv_inode_info *si;
-	si = kmem_cache_alloc(sysv_inode_cachep, GFP_KERNEL);
+	si = alloc_inode_sb(sb, sysv_inode_cachep, GFP_KERNEL);
 	if (!si)
 		return NULL;
 	return &si->vfs_inode;
--- a/fs/ubifs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ubifs/super.c
@@ -268,7 +268,7 @@ static struct inode *ubifs_alloc_inode(s
 {
 	struct ubifs_inode *ui;
-	ui = kmem_cache_alloc(ubifs_inode_slab, GFP_NOFS);
+	ui = alloc_inode_sb(sb, ubifs_inode_slab, GFP_NOFS);
 	if (!ui)
 		return NULL;
--- a/fs/udf/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/udf/super.c
@@ -136,7 +136,7 @@ static struct kmem_cache *udf_inode_cach
 static struct inode *udf_alloc_inode(struct super_block *sb)
 {
 	struct udf_inode_info *ei;
-	ei = kmem_cache_alloc(udf_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, udf_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
--- a/fs/ufs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ufs/super.c
@@ -1443,7 +1443,7 @@ static struct inode *ufs_alloc_inode(str
 {
 	struct ufs_inode_info *ei;
-	ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
+	ei = alloc_inode_sb(sb, ufs_inode_cachep, GFP_NOFS);
 	if (!ei)
 		return NULL;
--- a/fs/vboxsf/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/vboxsf/super.c
@@ -241,7 +241,7 @@ static struct inode *vboxsf_alloc_inode(
 {
 	struct vboxsf_inode *sf_i;
-	sf_i = kmem_cache_alloc(vboxsf_inode_cachep, GFP_NOFS);
+	sf_i = alloc_inode_sb(sb, vboxsf_inode_cachep, GFP_NOFS);
 	if (!sf_i)
 		return NULL;
--- a/fs/xfs/xfs_icache.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/xfs/xfs_icache.c
@@ -77,7 +77,7 @@ xfs_inode_alloc(
 	 * XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL
 	 * and return NULL here on ENOMEM.
 	 */
-	ip = kmem_cache_alloc(xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
+	ip = alloc_inode_sb(mp->m_super, xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
 	if (inode_init_always(mp->m_super, VFS_I(ip))) {
 		kmem_cache_free(xfs_inode_cache, ip);
--- a/fs/zonefs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/zonefs/super.c
@@ -1137,7 +1137,7 @@ static struct inode *zonefs_alloc_inode(
 {
 	struct zonefs_inode_info *zi;
-	zi = kmem_cache_alloc(zonefs_inode_cachep, GFP_KERNEL);
+	zi = alloc_inode_sb(sb, zonefs_inode_cachep, GFP_KERNEL);
 	if (!zi)
 		return NULL;
--- a/ipc/mqueue.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/ipc/mqueue.c
@@ -486,7 +486,7 @@ static struct inode *mqueue_alloc_inode(
 {
 	struct mqueue_inode_info *ei;
-	ei = kmem_cache_alloc(mqueue_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, mqueue_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;
--- a/mm/shmem.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/mm/shmem.c
@@ -3708,7 +3708,7 @@ static struct kmem_cache *shmem_inode_ca
 static struct inode *shmem_alloc_inode(struct super_block *sb)
 {
 	struct shmem_inode_info *info;
-	info = kmem_cache_alloc(shmem_inode_cachep, GFP_KERNEL);
+	info = alloc_inode_sb(sb, shmem_inode_cachep, GFP_KERNEL);
 	if (!info)
 		return NULL;
 	return &info->vfs_inode;
--- a/net/socket.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/net/socket.c
@@ -301,7 +301,7 @@ static struct inode *sock_alloc_inode(st
 {
 	struct socket_alloc *ei;
-	ei = kmem_cache_alloc(sock_inode_cachep, GFP_KERNEL);
+	ei = alloc_inode_sb(sb, sock_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
 	init_waitqueue_head(&ei->socket.wq.wait);
--- a/net/sunrpc/rpc_pipe.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/net/sunrpc/rpc_pipe.c
@@ -197,7 +197,7 @@ static struct inode *
 rpc_alloc_inode(struct super_block *sb)
 {
 	struct rpc_inode *rpci;
-	rpci = kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
+	rpci = alloc_inode_sb(sb, rpc_inode_cachep, GFP_KERNEL);
 	if (!rpci)
 		return NULL;
 	return &rpci->vfs_inode;
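For context, each conversion above only threads the super_block through; the cache
and GFP flags are unchanged. A minimal sketch of the helper introduced earlier in
this series (the exact definition lives in include/linux/fs.h; treat this as a
reading aid, not the authoritative source):

	/* sketch: tie inode allocation to the superblock's inode list_lru */
	static inline void *alloc_inode_sb(struct super_block *sb,
					   struct kmem_cache *cache, gfp_t gfp)
	{
		return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
	}

So after this patch, allocating an inode also guarantees that the per-memcg list
in sb->s_inode_lru exists for the memcg the inode is accounted to.
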
From patchwork Tue Mar 22 21:41:06 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:41:06 -0700
Subject: [patch 051/227] f2fs: allocate inode by using alloc_inode_sb()
Message-Id: <20220322214107.18317C340EC@smtp.kernel.org>

From: Muchun Song
Subject: f2fs: allocate inode by using alloc_inode_sb()

The inode allocation is supposed to use alloc_inode_sb(), so convert the
f2fs_kmem_cache_alloc() call in f2fs_alloc_inode() to alloc_inode_sb(),
open-coding the fault-injection check that f2fs_kmem_cache_alloc() used to
perform.
Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Cc: Alex Shi
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Dave Chinner
Cc: Fam Zheng
Cc: Jaegeuk Kim
Cc: Johannes Weiner
Cc: Kari Argillander
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Theodore Ts'o
Cc: Trond Myklebust
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Xiongchun Duan
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 fs/f2fs/super.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/fs/f2fs/super.c~f2fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/f2fs/super.c
@@ -1345,8 +1345,12 @@ static struct inode *f2fs_alloc_inode(st
 {
 	struct f2fs_inode_info *fi;
-	fi = f2fs_kmem_cache_alloc(f2fs_inode_cachep,
-				GFP_F2FS_ZERO, false, F2FS_SB(sb));
+	if (time_to_inject(F2FS_SB(sb), FAULT_SLAB_ALLOC)) {
+		f2fs_show_injection_info(F2FS_SB(sb), FAULT_SLAB_ALLOC);
+		return NULL;
+	}
+
+	fi = alloc_inode_sb(sb, f2fs_inode_cachep, GFP_F2FS_ZERO);
 	if (!fi)
 		return NULL;
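A note on the hunk above: f2fs could not keep using its own wrapper because the
wrapper allocates from the plain cache. Inferred from the removed call and the
added lines (a simplified, hypothetical shape of the helper, shown only to
explain why the check is now open-coded):

	/* roughly what f2fs_kmem_cache_alloc(cachep, flags, false, sbi) did */
	if (time_to_inject(sbi, FAULT_SLAB_ALLOC)) {
		f2fs_show_injection_info(sbi, FAULT_SLAB_ALLOC);
		return NULL;			/* simulated allocation failure */
	}
	return kmem_cache_alloc(cachep, flags);	/* not list_lru-aware */

Splitting the two steps lets the fault-injection check stay while the real
allocation goes through alloc_inode_sb().
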
From patchwork Tue Mar 22 21:41:09 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:41:09 -0700
Subject: [patch 052/227] mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
Message-Id: <20220322214110.3B2A4C340F3@smtp.kernel.org>

From: Muchun Song
Subject: mm: dcache: use kmem_cache_alloc_lru() to allocate dentry

Like the inode cache, dentries are also added to their memcg's list_lru.
So replace kmem_cache_alloc() with kmem_cache_alloc_lru() when allocating
a dentry.

Link: https://lkml.kernel.org/r/20220228122126.37293-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Cc: Alex Shi
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Dave Chinner
Cc: Fam Zheng
Cc: Jaegeuk Kim
Cc: Johannes Weiner
Cc: Kari Argillander
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Theodore Ts'o
Cc: Trond Myklebust
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Xiongchun Duan
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 fs/dcache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/dcache.c~mm-dcache-use-kmem_cache_alloc_lru-to-allocate-dentry
+++ a/fs/dcache.c
@@ -1766,7 +1766,8 @@ static struct dentry *__d_alloc(struct s
 	char *dname;
 	int err;
-	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
+	dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
+				      GFP_KERNEL);
 	if (!dentry)
 		return NULL;
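The only change from the plain call is the extra list_lru argument. A sketch of
the intended semantics, assuming the kmem_cache_alloc_lru() behaviour introduced
earlier in this series:

	/*
	 * kmem_cache_alloc_lru(cache, lru, gfp) behaves like
	 * kmem_cache_alloc(cache, gfp), but additionally makes sure the
	 * per-memcg list in @lru exists for the memcg this object is
	 * accounted to, so a later list_lru_add() of the dentry finds
	 * its per-memcg list already in place.
	 */
	dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
				      GFP_KERNEL);
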
From patchwork Tue Mar 22 21:41:12 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:41:12 -0700
Subject: [patch 053/227] xarray: use kmem_cache_alloc_lru to allocate xa_node
Message-Id: <20220322214113.62DD8C340EC@smtp.kernel.org>

From: Muchun Song
Subject: xarray: use kmem_cache_alloc_lru to allocate xa_node

The workingset code adds xa_nodes to the shadow_nodes list_lru, so those
nodes should be allocated with kmem_cache_alloc_lru().  Use xas_set_lru()
to pass the list_lru into which the xa_node will be inserted, so that the
xa_node's reclaim context is set up correctly.
Link: https://lkml.kernel.org/r/20220228122126.37293-9-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Acked-by: Roman Gushchin
Cc: Alex Shi
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Dave Chinner
Cc: Fam Zheng
Cc: Jaegeuk Kim
Cc: Kari Argillander
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Theodore Ts'o
Cc: Trond Myklebust
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Xiongchun Duan
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 include/linux/swap.h   |  5 ++++-
 include/linux/xarray.h |  9 ++++++++-
 lib/xarray.c           | 10 +++++-----
 mm/workingset.c        |  2 +-
 4 files changed, 18 insertions(+), 8 deletions(-)

--- a/include/linux/swap.h~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/include/linux/swap.h
@@ -334,9 +334,12 @@ void workingset_activation(struct folio
 /* Only track the nodes of mappings with shadow entries */
 void workingset_update_node(struct xa_node *node);
+extern struct list_lru shadow_nodes;
 #define mapping_set_update(xas, mapping) do {				\
-	if (!dax_mapping(mapping) && !shmem_mapping(mapping))		\
+	if (!dax_mapping(mapping) && !shmem_mapping(mapping)) {		\
 		xas_set_update(xas, workingset_update_node);		\
+		xas_set_lru(xas, &shadow_nodes);			\
+	}								\
 } while (0)
 
 /* linux/mm/page_alloc.c */
--- a/include/linux/xarray.h~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/include/linux/xarray.h
@@ -1317,6 +1317,7 @@ struct xa_state {
 	struct xa_node *xa_node;
 	struct xa_node *xa_alloc;
 	xa_update_node_t xa_update;
+	struct list_lru *xa_lru;
 };
 
@@ -1336,7 +1337,8 @@ struct xa_state {
 	.xa_pad = 0,					\
 	.xa_node = XAS_RESTART,				\
 	.xa_alloc = NULL,				\
-	.xa_update = NULL				\
+	.xa_update = NULL,				\
+	.xa_lru = NULL,					\
 }
 
@@ -1631,6 +1633,11 @@ static inline void xas_set_update(struct
 	xas->xa_update = update;
 }
 
+static inline void xas_set_lru(struct xa_state *xas, struct list_lru *lru)
+{
+	xas->xa_lru = lru;
+}
+
 /**
  * xas_next_entry() - Advance iterator to next present entry.
  * @xas: XArray operation state.
--- a/lib/xarray.c~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/lib/xarray.c
@@ -302,7 +302,7 @@ bool xas_nomem(struct xa_state *xas, gfp
 	}
 	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
 		gfp |= __GFP_ACCOUNT;
-	xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+	xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
 	if (!xas->xa_alloc)
 		return false;
 	xas->xa_alloc->parent = NULL;
@@ -334,10 +334,10 @@ static bool __xas_nomem(struct xa_state
 		gfp |= __GFP_ACCOUNT;
 	if (gfpflags_allow_blocking(gfp)) {
 		xas_unlock_type(xas, lock_type);
-		xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+		xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
 		xas_lock_type(xas, lock_type);
 	} else {
-		xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+		xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
 	}
 	if (!xas->xa_alloc)
 		return false;
@@ -371,7 +371,7 @@ static void *xas_alloc(struct xa_state *
 	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
 		gfp |= __GFP_ACCOUNT;
 
-	node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+	node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
 	if (!node) {
 		xas_set_err(xas, -ENOMEM);
 		return NULL;
@@ -1014,7 +1014,7 @@ void xas_split_alloc(struct xa_state *xa
 	void *sibling = NULL;
 	struct xa_node *node;
 
-	node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+	node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
 	if (!node)
 		goto nomem;
 	node->array = xas->xa;
--- a/mm/workingset.c~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/mm/workingset.c
@@ -429,7 +429,7 @@ out:
  * point where they would still be useful.
  */
 
-static struct list_lru shadow_nodes;
+struct list_lru shadow_nodes;
 
 void workingset_update_node(struct xa_node *node)
 {
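Putting the pieces together: a page-cache call site that wants its xa_nodes
tracked by the shadow-node shrinker now gets both hooks from the updated
mapping_set_update() macro. A sketch of what the macro expands to for a
regular-file mapping, based on the swap.h hunk above:

	XA_STATE(xas, &mapping->i_pages, index);

	/* mapping_set_update(&xas, mapping), expanded: */
	xas_set_update(&xas, workingset_update_node);
	xas_set_lru(&xas, &shadow_nodes);

Any xa_node allocated for this operation then comes from
kmem_cache_alloc_lru(radix_tree_node_cachep, xas.xa_lru, gfp), so it can later
be placed on (and reclaimed from) the shadow_nodes list_lru.
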
From patchwork Tue Mar 22 21:41:15 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:41:15 -0700
Subject: [patch 054/227] mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
Message-Id: <20220322214116.91195C340EC@smtp.kernel.org>

From: Muchun Song
Subject: mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()

Moving memcg_online_kmem() to mem_cgroup_css_online() simplifies the code
and removes the need to set ->kmemcg_id to -1 to indicate that the memcg
is offline.  In the next patch, ->kmemcg_id will be used to synchronize
list_lru reparenting, which requires that ->kmemcg_id not be changed.
Link: https://lkml.kernel.org/r/20220228122126.37293-10-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Acked-by: Roman Gushchin
Cc: Alex Shi
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Dave Chinner
Cc: Fam Zheng
Cc: Jaegeuk Kim
Cc: Johannes Weiner
Cc: Kari Argillander
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Theodore Ts'o
Cc: Trond Myklebust
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Xiongchun Duan
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-move-memcg_online_kmem-to-mem_cgroup_css_online
+++ a/mm/memcontrol.c
@@ -3670,7 +3670,8 @@ static int memcg_online_kmem(struct mem_
 	if (cgroup_memory_nokmem)
 		return 0;
 
-	BUG_ON(memcg->kmemcg_id >= 0);
+	if (unlikely(mem_cgroup_is_root(memcg)))
+		return 0;
 
 	memcg_id = memcg_alloc_cache_id();
 	if (memcg_id < 0)
@@ -3696,7 +3697,10 @@ static void memcg_offline_kmem(struct me
 	struct mem_cgroup *parent;
 	int kmemcg_id;
 
-	if (memcg->kmemcg_id == -1)
+	if (cgroup_memory_nokmem)
+		return;
+
+	if (unlikely(mem_cgroup_is_root(memcg)))
 		return;
 
 	parent = parent_mem_cgroup(memcg);
@@ -3706,7 +3710,6 @@ static void memcg_offline_kmem(struct me
 	memcg_reparent_objcgs(memcg, parent);
 
 	kmemcg_id = memcg->kmemcg_id;
-	BUG_ON(kmemcg_id < 0);
 
 	/*
 	 * After we have finished memcg_reparent_objcgs(), all list_lrus
@@ -3717,7 +3720,6 @@ static void memcg_offline_kmem(struct me
 	memcg_drain_all_list_lrus(kmemcg_id, parent);
 
 	memcg_free_cache_id(kmemcg_id);
-	memcg->kmemcg_id = -1;
 }
 #else
 static int memcg_online_kmem(struct mem_cgroup *memcg)
@@ -5237,7 +5239,6 @@ mem_cgroup_css_alloc(struct cgroup_subsy
 {
 	struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
 	struct mem_cgroup *memcg, *old_memcg;
-	long error = -ENOMEM;
 
 	old_memcg = set_active_memcg(parent);
 	memcg = mem_cgroup_alloc();
@@ -5266,34 +5267,26 @@ mem_cgroup_css_alloc(struct cgroup_subsy
 		return &memcg->css;
 	}
 
-	/* The following stuff does not apply to the root */
-	error = memcg_online_kmem(memcg);
-	if (error)
-		goto fail;
-
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
 		static_branch_inc(&memcg_sockets_enabled_key);
 
 	return &memcg->css;
-fail:
-	mem_cgroup_id_remove(memcg);
-	mem_cgroup_free(memcg);
-	return ERR_PTR(error);
 }
 
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+	if (memcg_online_kmem(memcg))
+		goto remove_id;
+
 	/*
 	 * A memcg must be visible for expand_shrinker_info()
 	 * by the time the maps are allocated. So, we allocate maps
 	 * here, when for_each_mem_cgroup() can't skip it.
 	 */
-	if (alloc_shrinker_info(memcg)) {
-		mem_cgroup_id_remove(memcg);
-		return -ENOMEM;
-	}
+	if (alloc_shrinker_info(memcg))
+		goto offline_kmem;
 
 	/* Online state pins memcg ID, memcg ID pins CSS */
 	refcount_set(&memcg->id.ref, 1);
@@ -5303,6 +5296,11 @@ static int mem_cgroup_css_online(struct
 		queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
 
 	return 0;
+offline_kmem:
+	memcg_offline_kmem(memcg);
+remove_id:
+	mem_cgroup_id_remove(memcg);
+	return -ENOMEM;
 }
 
 static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
@@ -5360,9 +5358,6 @@ static void mem_cgroup_css_free(struct c
 	cancel_work_sync(&memcg->high_work);
 	mem_cgroup_remove_from_trees(memcg);
 	free_shrinker_info(memcg);
-
-	/* Need to offline kmem if online_css() fails */
-	memcg_offline_kmem(memcg);
 	mem_cgroup_free(memcg);
 }
From patchwork Tue Mar 22 21:41:19 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:41:19 -0700
Subject: [patch 055/227] mm: list_lru: allocate list_lru_one only when needed
Message-Id: <20220322214119.BA884C340EC@smtp.kernel.org>

From: Muchun Song
Subject: mm: list_lru: allocate list_lru_one only when needed

On one of our servers we found a suspected memory leak: the kmalloc-32
slab cache consumed more than 6GB of memory, while every other
kmem_cache consumed less than 2GB.  In-depth analysis showed that the
kmalloc-32 consumption was caused by list_lru_one allocations.

crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574

memcg_nr_cache_ids is very large, and the memory consumption of each
list_lru can be calculated with the following formula:

	num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)

There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.

crash> list super_blocks | wc -l
952

Every mount registers two list_lrus, one for inodes and one for
dentries.  With 952 super_blocks, the total comes to 952 * 2 * 3 MB
(~5.6GB).  But the number of memory cgroups on the machine is less than
500, so presumably more than 12286 containers were deployed on it at
some point (whether by a user bug or deliberately), and
memcg_nr_cache_ids was never reduced to a suitable value.  This wastes a
lot of memory.

Now that the infrastructure for dynamic list_lru_one allocation is
ready, remove the statically allocated lists to save that memory.
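Spelling out the arithmetic from the numbers above:

	per list_lru:	4 nodes * 24574 ids * 32 B = 3,145,472 B  (~3 MB)
	total:		952 super_blocks * 2 list_lrus * ~3 MB    (~5.6 GB)

which matches the observed >6GB of kmalloc-32 consumption, even though fewer
than 500 memory cgroups actually exist.
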
Link: https://lkml.kernel.org/r/20220228122126.37293-11-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Cc: Alex Shi
Cc: Anna Schumaker
Cc: Chao Yu
Cc: Dave Chinner
Cc: Fam Zheng
Cc: Jaegeuk Kim
Cc: Johannes Weiner
Cc: Kari Argillander
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Theodore Ts'o
Cc: Trond Myklebust
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Cc: Wei Yang
Cc: Xiongchun Duan
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 include/linux/list_lru.h |   7 +-
 mm/list_lru.c            | 121 ++++++++++++++++++++-----------------
 mm/memcontrol.c          |   6 +
 3 files changed, 77 insertions(+), 57 deletions(-)

--- a/include/linux/list_lru.h~mm-list_lru-allocate-list_lru_one-only-when-needed
+++ a/include/linux/list_lru.h
@@ -32,14 +32,15 @@ struct list_lru_one {
 };
 
 struct list_lru_per_memcg {
+	struct rcu_head rcu;
 	/* array of per cgroup per node lists, indexed by node id */
-	struct list_lru_one node[0];
+	struct list_lru_one node[];
 };
 
 struct list_lru_memcg {
 	struct rcu_head rcu;
 	/* array of per cgroup lists, indexed by memcg_cache_id */
-	struct list_lru_per_memcg *mlru[];
+	struct list_lru_per_memcg __rcu *mlru[];
 };
 
 struct list_lru_node {
@@ -77,7 +78,7 @@ int __list_lru_init(struct list_lru *lru
 int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 			 gfp_t gfp);
 int memcg_update_all_list_lrus(int num_memcgs);
-void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg);
+void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst);
 
 /**
  * list_lru_add: add an element to the lru list's tail
--- a/mm/list_lru.c~mm-list_lru-allocate-list_lru_one-only-when-needed
+++ a/mm/list_lru.c
@@ -60,8 +60,12 @@ list_lru_from_memcg_idx(struct list_lru
 	 * from relocation (see memcg_update_list_lru).
 	 */
 	mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
-	if (mlrus && idx >= 0)
-		return &mlrus->mlru[idx]->node[nid];
+	if (mlrus && idx >= 0) {
+		struct list_lru_per_memcg *mlru;
+
+		mlru = rcu_dereference_check(mlrus->mlru[idx], true);
+		return mlru ? &mlru->node[nid] : NULL;
+	}
 	return &nlru->lru;
 }
@@ -188,7 +192,7 @@ unsigned long list_lru_count_one(struct
 	rcu_read_lock();
 	l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
-	count = READ_ONCE(l->nr_items);
+	count = l ? READ_ONCE(l->nr_items) : 0;
 	rcu_read_unlock();
 
 	if (unlikely(count < 0))
@@ -217,8 +221,11 @@ __list_lru_walk_one(struct list_lru *lru
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
 
-	l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
 restart:
+	l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
+	if (!l)
+		goto out;
+
 	list_for_each_safe(item, n, &l->list) {
 		enum lru_status ret;
@@ -262,6 +269,7 @@ restart:
 			BUG();
 		}
 	}
+out:
 	return isolated;
 }
@@ -354,20 +362,25 @@ static struct list_lru_per_memcg *memcg_
 	return mlru;
 }
 
-static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
-				     int begin, int end)
+static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
 {
-	int i;
+	struct list_lru_memcg *mlrus;
+	struct list_lru_per_memcg *mlru;
 
-	for (i = begin; i < end; i++) {
-		mlrus->mlru[i] = memcg_init_list_lru_one(GFP_KERNEL);
-		if (!mlrus->mlru[i])
-			goto fail;
-	}
-	return 0;
-fail:
-	memcg_destroy_list_lru_range(mlrus, begin, i);
-	return -ENOMEM;
+	spin_lock_irq(&lru->lock);
+	mlrus = rcu_dereference_protected(lru->mlrus, true);
+	mlru = rcu_dereference_protected(mlrus->mlru[src_idx], true);
+	rcu_assign_pointer(mlrus->mlru[src_idx], NULL);
+	spin_unlock_irq(&lru->lock);
+
+	/*
+	 * The __list_lru_walk_one() can walk the list of this node.
+	 * We need kvfree_rcu() here. And the walking of the list
+	 * is under lru->node[nid]->lock, which can serve as a RCU
+	 * read-side critical section.
+	 */
+	if (mlru)
+		kvfree_rcu(mlru, rcu);
 }
 
 static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
@@ -381,14 +394,10 @@ static int memcg_init_list_lru(struct li
 	spin_lock_init(&lru->lock);
 
-	mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
+	mlrus = kvzalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
 	if (!mlrus)
 		return -ENOMEM;
 
-	if (memcg_init_list_lru_range(mlrus, 0, size)) {
-		kvfree(mlrus);
-		return -ENOMEM;
-	}
 	RCU_INIT_POINTER(lru->mlrus, mlrus);
 
 	return 0;
@@ -422,13 +431,9 @@ static int memcg_update_list_lru(struct
 	if (!new)
 		return -ENOMEM;
 
-	if (memcg_init_list_lru_range(new, old_size, new_size)) {
-		kvfree(new);
-		return -ENOMEM;
-	}
-
 	spin_lock_irq(&lru->lock);
 	memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
+	memset(&new->mlru[old_size], 0, flex_array_size(new, mlru, new_size - old_size));
 	rcu_assign_pointer(lru->mlrus, new);
 	spin_unlock_irq(&lru->lock);
 
@@ -436,20 +441,6 @@ static int memcg_update_list_lru(struct
 	return 0;
 }
 
-static void memcg_cancel_update_list_lru(struct list_lru *lru,
-					 int old_size, int new_size)
-{
-	struct list_lru_memcg *mlrus;
-
-	mlrus = rcu_dereference_protected(lru->mlrus,
-					  lockdep_is_held(&list_lrus_mutex));
-	/*
-	 * Do not bother shrinking the array back to the old size, because we
-	 * cannot handle allocation failures here.
-	 */
-	memcg_destroy_list_lru_range(mlrus, old_size, new_size);
-}
-
 int memcg_update_all_list_lrus(int new_size)
 {
 	int ret = 0;
@@ -460,15 +451,10 @@ int memcg_update_all_list_lrus(int new_s
 	list_for_each_entry(lru, &memcg_list_lrus, list) {
 		ret = memcg_update_list_lru(lru, old_size, new_size);
 		if (ret)
-			goto fail;
+			break;
 	}
-out:
 	mutex_unlock(&list_lrus_mutex);
 	return ret;
-fail:
-	list_for_each_entry_continue_reverse(lru, &memcg_list_lrus, list)
-		memcg_cancel_update_list_lru(lru, old_size, new_size);
-	goto out;
 }
 
 static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
@@ -485,6 +471,8 @@ static void memcg_drain_list_lru_node(st
 	spin_lock_irq(&nlru->lock);
 
 	src = list_lru_from_memcg_idx(lru, nid, src_idx);
+	if (!src)
+		goto out;
 	dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
 
 	list_splice_init(&src->list, &dst->list);
@@ -494,7 +482,7 @@ static void memcg_drain_list_lru_node(st
 		set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
 		src->nr_items = 0;
 	}
-
+out:
 	spin_unlock_irq(&nlru->lock);
 }
 
@@ -505,15 +493,41 @@ static void memcg_drain_list_lru(struct
 	for_each_node(i)
 		memcg_drain_list_lru_node(lru, i, src_idx, dst_memcg);
+
+	memcg_list_lru_free(lru, src_idx);
 }
 
-void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
+void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst)
 {
+	struct cgroup_subsys_state *css;
 	struct list_lru *lru;
+	int src_idx = src->kmemcg_id;
+
+	/*
+	 * Change kmemcg_id of this cgroup and all its descendants to the
+	 * parent's id, and then move all entries from this cgroup's list_lrus
+	 * to ones of the parent.
+	 *
+	 * After we have finished, all list_lrus corresponding to this cgroup
+	 * are guaranteed to remain empty. So we can safely free this cgroup's
+	 * list lrus in memcg_list_lru_free().
+	 *
+	 * Changing ->kmemcg_id to the parent can prevent memcg_list_lru_alloc()
+	 * from allocating list lrus for this cgroup after memcg_list_lru_free()
+	 * call.
+	 */
+	rcu_read_lock();
+	css_for_each_descendant_pre(css, &src->css) {
+		struct mem_cgroup *memcg;
+
+		memcg = mem_cgroup_from_css(css);
+		memcg->kmemcg_id = dst->kmemcg_id;
+	}
+	rcu_read_unlock();
 
 	mutex_lock(&list_lrus_mutex);
 	list_for_each_entry(lru, &memcg_list_lrus, list)
-		memcg_drain_list_lru(lru, src_idx, dst_memcg);
+		memcg_drain_list_lru(lru, src_idx, dst);
 	mutex_unlock(&list_lrus_mutex);
 }
@@ -528,7 +542,7 @@ static bool memcg_list_lru_allocated(str
 		return true;
 
 	rcu_read_lock();
-	allocated = !!rcu_dereference(lru->mlrus)->mlru[idx];
+	allocated = !!rcu_access_pointer(rcu_dereference(lru->mlrus)->mlru[idx]);
 	rcu_read_unlock();
 
 	return allocated;
@@ -576,11 +590,12 @@ int memcg_list_lru_alloc(struct mem_cgro
 	mlrus = rcu_dereference_protected(lru->mlrus, true);
 	while (i--) {
 		int index = table[i].memcg->kmemcg_id;
+		struct list_lru_per_memcg *mlru = table[i].mlru;
 
-		if (mlrus->mlru[index])
-			kfree(table[i].mlru);
+		if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true))
+			kfree(mlru);
 		else
-			mlrus->mlru[index] = table[i].mlru;
+			rcu_assign_pointer(mlrus->mlru[index], mlru);
 	}
 
 	spin_unlock_irqrestore(&lru->lock, flags);
--- a/mm/memcontrol.c~mm-list_lru-allocate-list_lru_one-only-when-needed
+++ a/mm/memcontrol.c
@@ -3709,6 +3709,10 @@ static void memcg_offline_kmem(struct me
 	memcg_reparent_objcgs(memcg, parent);
 
+	/*
+	 * memcg_drain_all_list_lrus() can change memcg->kmemcg_id.
+	 * Cache it to local @kmemcg_id.
+	 */
 	kmemcg_id = memcg->kmemcg_id;
 
 	/*
@@ -3717,7 +3721,7 @@ static void memcg_offline_kmem(struct me
 	 * The ordering is imposed by list_lru_node->lock taken by
 	 * memcg_drain_all_list_lrus().
 	 */
-	memcg_drain_all_list_lrus(kmemcg_id, parent);
+	memcg_drain_all_list_lrus(memcg, parent);
 
 	memcg_free_cache_id(kmemcg_id);
 }
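With the statically allocated per-memcg lists gone, the missing list has to be
created lazily before an object can be put on it. Presumably the allocation
path does this via kmem_cache_alloc_lru(), along the lines of the following
sketch (the exact call site is introduced elsewhere in the series and is an
assumption here; memcg_list_lru_allocated() and memcg_list_lru_alloc() are the
helpers from the hunks above):

	/* before handing out an object charged to @memcg for use with @lru */
	if (!memcg_list_lru_allocated(memcg, lru) &&
	    memcg_list_lru_alloc(memcg, lru, gfp))
		return NULL;	/* treat it like any other allocation failure */
	return kmem_cache_alloc(cachep, gfp);
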
header.d=linux-foundation.org header.s=korg header.b=AxxLuPYD; dmarc=none; spf=pass (imf28.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: 4jxt9qmhw8k7gq8coj1x8u61g5xhywdf X-HE-Tag: 1647985284-636774 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus The purpose of memcg_drain_all_list_lrus() is to reparent list_lrus, which is very similar to what memcg_reparent_objcgs() does for object cgroups. Rename it to memcg_reparent_list_lrus() so that the name is more consistent with memcg_reparent_objcgs(). Link: https://lkml.kernel.org/r/20220228122126.37293-12-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/list_lru.h | 2 +- mm/list_lru.c | 24 ++++++++++++------------ mm/memcontrol.c | 6 +++--- 3 files changed, 16 insertions(+), 16 deletions(-) --- a/include/linux/list_lru.h~mm-list_lru-rename-memcg_drain_all_list_lrus-to-memcg_reparent_list_lrus +++ a/include/linux/list_lru.h @@ -78,7 +78,7 @@ int __list_lru_init(struct list_lru *lru int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, gfp_t gfp); int memcg_update_all_list_lrus(int num_memcgs); -void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst); +void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent); /** * list_lru_add: add an element to the lru list's tail --- a/mm/list_lru.c~mm-list_lru-rename-memcg_drain_all_list_lrus-to-memcg_reparent_list_lrus +++ a/mm/list_lru.c @@ -457,8 +457,8 @@ int memcg_update_all_list_lrus(int new_s return ret; } -static void memcg_drain_list_lru_node(struct list_lru *lru, int nid, - int src_idx, struct mem_cgroup *dst_memcg) +static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid, + int src_idx, struct mem_cgroup *dst_memcg) { struct list_lru_node *nlru = &lru->node[nid]; int dst_idx = dst_memcg->kmemcg_id; @@ -486,22 +486,22 @@ out: spin_unlock_irq(&nlru->lock); } -static void memcg_drain_list_lru(struct list_lru *lru, - int src_idx, struct mem_cgroup *dst_memcg) +static void memcg_reparent_list_lru(struct list_lru *lru, + int src_idx, struct mem_cgroup *dst_memcg) { int i; for_each_node(i) - memcg_drain_list_lru_node(lru, i, src_idx, dst_memcg); + memcg_reparent_list_lru_node(lru, i, src_idx, dst_memcg); memcg_list_lru_free(lru, src_idx); } -void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst) +void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent) { struct cgroup_subsys_state *css; struct list_lru *lru; - int src_idx = src->kmemcg_id; + int src_idx = memcg->kmemcg_id; /* * Change kmemcg_id of this cgroup and all its descendants to the @@ -517,17 +517,17 @@ void memcg_drain_all_list_lrus(struct me * call.
*/ rcu_read_lock(); - css_for_each_descendant_pre(css, &src->css) { - struct mem_cgroup *memcg; + css_for_each_descendant_pre(css, &memcg->css) { + struct mem_cgroup *child; - memcg = mem_cgroup_from_css(css); - memcg->kmemcg_id = dst->kmemcg_id; + child = mem_cgroup_from_css(css); + child->kmemcg_id = parent->kmemcg_id; } rcu_read_unlock(); mutex_lock(&list_lrus_mutex); list_for_each_entry(lru, &memcg_list_lrus, list) - memcg_drain_list_lru(lru, src_idx, dst); + memcg_reparent_list_lru(lru, src_idx, parent); mutex_unlock(&list_lrus_mutex); } --- a/mm/memcontrol.c~mm-list_lru-rename-memcg_drain_all_list_lrus-to-memcg_reparent_list_lrus +++ a/mm/memcontrol.c @@ -3710,7 +3710,7 @@ static void memcg_offline_kmem(struct me memcg_reparent_objcgs(memcg, parent); /* - * memcg_drain_all_list_lrus() can change memcg->kmemcg_id. + * memcg_reparent_list_lrus() can change memcg->kmemcg_id. * Cache it to local @kmemcg_id. */ kmemcg_id = memcg->kmemcg_id; @@ -3719,9 +3719,9 @@ static void memcg_offline_kmem(struct me * After we have finished memcg_reparent_objcgs(), all list_lrus * corresponding to this cgroup are guaranteed to remain empty. * The ordering is imposed by list_lru_node->lock taken by - * memcg_drain_all_list_lrus(). + * memcg_reparent_list_lrus(). */ - memcg_drain_all_list_lrus(memcg, parent); + memcg_reparent_list_lrus(memcg, parent); memcg_free_cache_id(kmemcg_id); } From patchwork Tue Mar 22 21:41:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789107 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CB6DC433EF for ; Tue, 22 Mar 2022 21:42:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 00D656B00A8; Tue, 22 Mar 2022 17:42:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id ED7006B00A9; Tue, 22 Mar 2022 17:42:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2AAC6B00AA; Tue, 22 Mar 2022 17:42:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0202.hostedemail.com [216.40.44.202]) by kanga.kvack.org (Postfix) with ESMTP id BA0A66B00A8 for ; Tue, 22 Mar 2022 17:42:02 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 786A7A2B13 for ; Tue, 22 Mar 2022 21:42:02 +0000 (UTC) X-FDA: 79273345284.17.AC90E31 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf12.hostedemail.com (Postfix) with ESMTP id 7D7E840036 for ; Tue, 22 Mar 2022 21:41:27 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C4D14610A1; Tue, 22 Mar 2022 21:41:26 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1F5EDC340EC; Tue, 22 Mar 2022 21:41:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985286; bh=JuTSjcCp+NTzB4cIPz60w+hHPA62H4XTW1BXemRJUU0=; h=Date:To:From:In-Reply-To:Subject:From; b=G+RTXAaIYf7cDyceDoAxiSdYeNVYEVGFbwKoc6DYywoSdCsCd7/2IgfBB/iGErlXB TtywwV1Df5KlUdU45W7o1YWCBDuCMh76T3x6WwrFrLfB9GvI6s5veCOn49lzAl+TGB 
LxFYI+xTrucuYGzY5AOsJwfPSJEZKOceMnx043Vc= Date: Tue, 22 Mar 2022 14:41:25 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 057/227] mm: list_lru: replace linear array with xarray Message-Id: <20220322214126.1F5EDC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: j1bzodty3ogdqicspchxmpym3kaiheyy Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=G+RTXAaI; spf=pass (imf12.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 7D7E840036 X-HE-Tag: 1647985287-926601 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: list_lru: replace linear array with xarray If we run 10k containers in the system, the size of the list_lru_memcg->lrus array can be ~96KB per list_lru. When the number of containers decreases, the array is never shrunk, so this does not scale. The xarray is a good choice for this case: it can save a lot of memory when there are tens of thousands of containers in the system. Using an xarray also lets us remove the array-resizing logic, which simplifies the code.
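As a minimal sketch of the pattern this patch adopts (the types are simplified, and demo_lru, demo_lookup_mlru and demo_install_mlru are illustrative names rather than code from the patch), an xarray keyed by kmemcg_id replaces the preallocated pointer array:

#include <linux/slab.h>
#include <linux/xarray.h>

struct demo_lru {
	struct xarray xa;	/* kmemcg_id -> per-memcg lists */
};

static void demo_lru_init(struct demo_lru *lru)
{
	/* entries are erased with IRQs disabled, as in memcg_list_lru_free() */
	xa_init_flags(&lru->xa, XA_FLAGS_LOCK_IRQ);
}

static void *demo_lookup_mlru(struct demo_lru *lru, int kmemcg_id)
{
	/* sparse lookup: unpopulated IDs consume no memory, unlike a linear array */
	return kmemcg_id >= 0 ? xa_load(&lru->xa, kmemcg_id) : NULL;
}

static int demo_install_mlru(struct demo_lru *lru, int kmemcg_id, void *mlru)
{
	/* xa_insert() returns -EBUSY if the slot is already populated */
	return xa_insert(&lru->xa, kmemcg_id, mlru, GFP_KERNEL);
}

Only populated indices allocate xarray nodes, which is where the memory saving over the ~96KB preallocated array comes from.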
[akpm@linux-foundation.org: remove unused local] Link: https://lkml.kernel.org/r/20220228122126.37293-13-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/list_lru.h | 13 -- include/linux/memcontrol.h | 23 --- mm/list_lru.c | 203 +++++++++++------------------------ mm/memcontrol.c | 77 ------------- 4 files changed, 73 insertions(+), 243 deletions(-) --- a/include/linux/list_lru.h~mm-list_lru-replace-linear-array-with-xarray +++ a/include/linux/list_lru.h @@ -11,6 +11,7 @@ #include #include #include +#include struct mem_cgroup; @@ -37,12 +38,6 @@ struct list_lru_per_memcg { struct list_lru_one node[]; }; -struct list_lru_memcg { - struct rcu_head rcu; - /* array of per cgroup lists, indexed by memcg_cache_id */ - struct list_lru_per_memcg __rcu *mlru[]; -}; - struct list_lru_node { /* protects all lists on the node, including per cgroup */ spinlock_t lock; @@ -57,10 +52,7 @@ struct list_lru { struct list_head list; int shrinker_id; bool memcg_aware; - /* protects ->mlrus->mlru[i] */ - spinlock_t lock; - /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ - struct list_lru_memcg __rcu *mlrus; + struct xarray xa; #endif }; @@ -77,7 +69,6 @@ int __list_lru_init(struct list_lru *lru int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, gfp_t gfp); -int memcg_update_all_list_lrus(int num_memcgs); void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent); /** --- a/include/linux/memcontrol.h~mm-list_lru-replace-linear-array-with-xarray +++ a/include/linux/memcontrol.h @@ -1685,18 +1685,6 @@ void obj_cgroup_uncharge(struct obj_cgro extern struct static_key_false memcg_kmem_enabled_key; -extern int memcg_nr_cache_ids; -void memcg_get_cache_ids(void); -void memcg_put_cache_ids(void); - -/* - * Helper macro to loop through all memcg-specific caches. Callers must still - * check if the cache is valid (it is either valid or NULL). - * the slab_mutex must be held when looping through those caches - */ -#define for_each_memcg_cache_index(_idx) \ - for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++) - static inline bool memcg_kmem_enabled(void) { return static_branch_likely(&memcg_kmem_enabled_key); @@ -1753,9 +1741,6 @@ static inline void __memcg_kmem_uncharge { } -#define for_each_memcg_cache_index(_idx) \ - for (; NULL; ) - static inline bool memcg_kmem_enabled(void) { return false; @@ -1766,14 +1751,6 @@ static inline int memcg_cache_id(struct return -1; } -static inline void memcg_get_cache_ids(void) -{ -} - -static inline void memcg_put_cache_ids(void) -{ -} - static inline struct mem_cgroup *mem_cgroup_from_obj(void *p) { return NULL; --- a/mm/list_lru.c~mm-list_lru-replace-linear-array-with-xarray +++ a/mm/list_lru.c @@ -52,21 +52,12 @@ static int lru_shrinker_id(struct list_l static inline struct list_lru_one * list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx) { - struct list_lru_memcg *mlrus; - struct list_lru_node *nlru = &lru->node[nid]; - - /* - * Either lock or RCU protects the array of per cgroup lists - * from relocation (see memcg_update_list_lru). 
- */ - mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock)); - if (mlrus && idx >= 0) { - struct list_lru_per_memcg *mlru; + if (list_lru_memcg_aware(lru) && idx >= 0) { + struct list_lru_per_memcg *mlru = xa_load(&lru->xa, idx); - mlru = rcu_dereference_check(mlrus->mlru[idx], true); return mlru ? &mlru->node[nid] : NULL; } - return &nlru->lru; + return &lru->node[nid].lru; } static inline struct list_lru_one * @@ -77,7 +68,7 @@ list_lru_from_kmem(struct list_lru *lru, struct list_lru_one *l = &nlru->lru; struct mem_cgroup *memcg = NULL; - if (!lru->mlrus) + if (!list_lru_memcg_aware(lru)) goto out; memcg = mem_cgroup_from_obj(ptr); @@ -309,16 +300,20 @@ unsigned long list_lru_walk_node(struct unsigned long *nr_to_walk) { long isolated = 0; - int memcg_idx; isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg, nr_to_walk); + +#ifdef CONFIG_MEMCG_KMEM if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) { - for_each_memcg_cache_index(memcg_idx) { + struct list_lru_per_memcg *mlru; + unsigned long index; + + xa_for_each(&lru->xa, index, mlru) { struct list_lru_node *nlru = &lru->node[nid]; spin_lock(&nlru->lock); - isolated += __list_lru_walk_one(lru, nid, memcg_idx, + isolated += __list_lru_walk_one(lru, nid, index, isolate, cb_arg, nr_to_walk); spin_unlock(&nlru->lock); @@ -327,6 +322,8 @@ unsigned long list_lru_walk_node(struct break; } } +#endif + return isolated; } EXPORT_SYMBOL_GPL(list_lru_walk_node); @@ -338,15 +335,6 @@ static void init_one_lru(struct list_lru } #ifdef CONFIG_MEMCG_KMEM -static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus, - int begin, int end) -{ - int i; - - for (i = begin; i < end; i++) - kfree(mlrus->mlru[i]); -} - static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp) { int nid; @@ -364,14 +352,7 @@ static struct list_lru_per_memcg *memcg_ static void memcg_list_lru_free(struct list_lru *lru, int src_idx) { - struct list_lru_memcg *mlrus; - struct list_lru_per_memcg *mlru; - - spin_lock_irq(&lru->lock); - mlrus = rcu_dereference_protected(lru->mlrus, true); - mlru = rcu_dereference_protected(mlrus->mlru[src_idx], true); - rcu_assign_pointer(mlrus->mlru[src_idx], NULL); - spin_unlock_irq(&lru->lock); + struct list_lru_per_memcg *mlru = xa_erase_irq(&lru->xa, src_idx); /* * The __list_lru_walk_one() can walk the list of this node. @@ -383,78 +364,27 @@ static void memcg_list_lru_free(struct l kvfree_rcu(mlru, rcu); } -static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) +static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) { - struct list_lru_memcg *mlrus; - int size = memcg_nr_cache_ids; - + if (memcg_aware) + xa_init_flags(&lru->xa, XA_FLAGS_LOCK_IRQ); lru->memcg_aware = memcg_aware; - if (!memcg_aware) - return 0; - - spin_lock_init(&lru->lock); - - mlrus = kvzalloc(struct_size(mlrus, mlru, size), GFP_KERNEL); - if (!mlrus) - return -ENOMEM; - - RCU_INIT_POINTER(lru->mlrus, mlrus); - - return 0; } static void memcg_destroy_list_lru(struct list_lru *lru) { - struct list_lru_memcg *mlrus; + XA_STATE(xas, &lru->xa, 0); + struct list_lru_per_memcg *mlru; if (!list_lru_memcg_aware(lru)) return; - /* - * This is called when shrinker has already been unregistered, - * and nobody can use it. So, there is no need to use kvfree_rcu(). 
- */ - mlrus = rcu_dereference_protected(lru->mlrus, true); - memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids); - kvfree(mlrus); -} - -static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size) -{ - struct list_lru_memcg *old, *new; - - BUG_ON(old_size > new_size); - - old = rcu_dereference_protected(lru->mlrus, - lockdep_is_held(&list_lrus_mutex)); - new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL); - if (!new) - return -ENOMEM; - - spin_lock_irq(&lru->lock); - memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size)); - memset(&new->mlru[old_size], 0, flex_array_size(new, mlru, new_size - old_size)); - rcu_assign_pointer(lru->mlrus, new); - spin_unlock_irq(&lru->lock); - - kvfree_rcu(old, rcu); - return 0; -} - -int memcg_update_all_list_lrus(int new_size) -{ - int ret = 0; - struct list_lru *lru; - int old_size = memcg_nr_cache_ids; - - mutex_lock(&list_lrus_mutex); - list_for_each_entry(lru, &memcg_list_lrus, list) { - ret = memcg_update_list_lru(lru, old_size, new_size); - if (ret) - break; + xas_lock_irq(&xas); + xas_for_each(&xas, mlru, ULONG_MAX) { + kfree(mlru); + xas_store(&xas, NULL); } - mutex_unlock(&list_lrus_mutex); - return ret; + xas_unlock_irq(&xas); } static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid, @@ -521,7 +451,7 @@ void memcg_reparent_list_lrus(struct mem struct mem_cgroup *child; child = mem_cgroup_from_css(css); - child->kmemcg_id = parent->kmemcg_id; + WRITE_ONCE(child->kmemcg_id, parent->kmemcg_id); } rcu_read_unlock(); @@ -531,21 +461,12 @@ void memcg_reparent_list_lrus(struct mem mutex_unlock(&list_lrus_mutex); } -static bool memcg_list_lru_allocated(struct mem_cgroup *memcg, - struct list_lru *lru) +static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg, + struct list_lru *lru) { - bool allocated; - int idx; - - idx = memcg->kmemcg_id; - if (unlikely(idx < 0)) - return true; + int idx = memcg->kmemcg_id; - rcu_read_lock(); - allocated = !!rcu_access_pointer(rcu_dereference(lru->mlrus)->mlru[idx]); - rcu_read_unlock(); - - return allocated; + return idx < 0 || xa_load(&lru->xa, idx); } int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, @@ -553,11 +474,11 @@ int memcg_list_lru_alloc(struct mem_cgro { int i; unsigned long flags; - struct list_lru_memcg *mlrus; struct list_lru_memcg_table { struct list_lru_per_memcg *mlru; struct mem_cgroup *memcg; } *table; + XA_STATE(xas, &lru->xa, 0); if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru)) return 0; @@ -586,27 +507,48 @@ int memcg_list_lru_alloc(struct mem_cgro } } - spin_lock_irqsave(&lru->lock, flags); - mlrus = rcu_dereference_protected(lru->mlrus, true); + xas_lock_irqsave(&xas, flags); while (i--) { - int index = table[i].memcg->kmemcg_id; + int index = READ_ONCE(table[i].memcg->kmemcg_id); struct list_lru_per_memcg *mlru = table[i].mlru; - if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true)) + xas_set(&xas, index); +retry: + if (unlikely(index < 0 || xas_error(&xas) || xas_load(&xas))) { kfree(mlru); - else - rcu_assign_pointer(mlrus->mlru[index], mlru); + } else { + xas_store(&xas, mlru); + if (xas_error(&xas) == -ENOMEM) { + xas_unlock_irqrestore(&xas, flags); + if (xas_nomem(&xas, gfp)) + xas_set_err(&xas, 0); + xas_lock_irqsave(&xas, flags); + /* + * The xas lock has been released, this memcg + * can be reparented before us. So reload + * memcg id. More details see the comments + * in memcg_reparent_list_lrus(). 
+ */ + index = READ_ONCE(table[i].memcg->kmemcg_id); + if (index < 0) + xas_set_err(&xas, 0); + else if (!xas_error(&xas) && index != xas.xa_index) + xas_set(&xas, index); + goto retry; + } + } } - spin_unlock_irqrestore(&lru->lock, flags); - + /* xas_nomem() is used to free memory instead of memory allocation. */ + if (xas.xa_alloc) + xas_nomem(&xas, gfp); + xas_unlock_irqrestore(&xas, flags); kfree(table); - return 0; + return xas_error(&xas); } #else -static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) +static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware) { - return 0; } static void memcg_destroy_list_lru(struct list_lru *lru) @@ -618,7 +560,6 @@ int __list_lru_init(struct list_lru *lru struct lock_class_key *key, struct shrinker *shrinker) { int i; - int err = -ENOMEM; #ifdef CONFIG_MEMCG_KMEM if (shrinker) @@ -626,11 +567,10 @@ int __list_lru_init(struct list_lru *lru else lru->shrinker_id = -1; #endif - memcg_get_cache_ids(); lru->node = kcalloc(nr_node_ids, sizeof(*lru->node), GFP_KERNEL); if (!lru->node) - goto out; + return -ENOMEM; for_each_node(i) { spin_lock_init(&lru->node[i].lock); @@ -639,18 +579,10 @@ int __list_lru_init(struct list_lru *lru init_one_lru(&lru->node[i].lru); } - err = memcg_init_list_lru(lru, memcg_aware); - if (err) { - kfree(lru->node); - /* Do this so a list_lru_destroy() doesn't crash: */ - lru->node = NULL; - goto out; - } - + memcg_init_list_lru(lru, memcg_aware); list_lru_register(lru); -out: - memcg_put_cache_ids(); - return err; + + return 0; } EXPORT_SYMBOL_GPL(__list_lru_init); @@ -660,8 +592,6 @@ void list_lru_destroy(struct list_lru *l if (!lru->node) return; - memcg_get_cache_ids(); - list_lru_unregister(lru); memcg_destroy_list_lru(lru); @@ -671,6 +601,5 @@ void list_lru_destroy(struct list_lru *l #ifdef CONFIG_MEMCG_KMEM lru->shrinker_id = -1; #endif - memcg_put_cache_ids(); } EXPORT_SYMBOL_GPL(list_lru_destroy); --- a/mm/memcontrol.c~mm-list_lru-replace-linear-array-with-xarray +++ a/mm/memcontrol.c @@ -351,42 +351,17 @@ static void memcg_reparent_objcgs(struct * This will be used as a shrinker list's index. * The main reason for not using cgroup id for this: * this works better in sparse environments, where we have a lot of memcgs, - * but only a few kmem-limited. Or also, if we have, for instance, 200 - * memcgs, and none but the 200th is kmem-limited, we'd have to have a - * 200 entry array for that. - * - * The current size of the caches array is stored in memcg_nr_cache_ids. It - * will double each time we have to increase it. + * but only a few kmem-limited. */ static DEFINE_IDA(memcg_cache_ida); -int memcg_nr_cache_ids; - -/* Protects memcg_nr_cache_ids */ -static DECLARE_RWSEM(memcg_cache_ids_sem); - -void memcg_get_cache_ids(void) -{ - down_read(&memcg_cache_ids_sem); -} - -void memcg_put_cache_ids(void) -{ - up_read(&memcg_cache_ids_sem); -} /* - * MIN_SIZE is different than 1, because we would like to avoid going through - * the alloc/free process all the time. In a small machine, 4 kmem-limited - * cgroups is a reasonable guess. In the future, it could be a parameter or - * tunable, but that is strictly not necessary. - * * MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get * this constant directly from cgroup, but it is understandable that this is * better kept as an internal representation in cgroup.c. In any case, the * cgrp_id space is not getting any smaller, and we don't have to necessarily * increase ours as well if it increases. 
*/ -#define MEMCG_CACHES_MIN_SIZE 4 #define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX /* @@ -2944,49 +2919,6 @@ __always_inline struct obj_cgroup *get_o return objcg; } -static int memcg_alloc_cache_id(void) -{ - int id, size; - int err; - - id = ida_simple_get(&memcg_cache_ida, - 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL); - if (id < 0) - return id; - - if (id < memcg_nr_cache_ids) - return id; - - /* - * There's no space for the new id in memcg_caches arrays, - * so we have to grow them. - */ - down_write(&memcg_cache_ids_sem); - - size = 2 * (id + 1); - if (size < MEMCG_CACHES_MIN_SIZE) - size = MEMCG_CACHES_MIN_SIZE; - else if (size > MEMCG_CACHES_MAX_SIZE) - size = MEMCG_CACHES_MAX_SIZE; - - err = memcg_update_all_list_lrus(size); - if (!err) - memcg_nr_cache_ids = size; - - up_write(&memcg_cache_ids_sem); - - if (err) { - ida_simple_remove(&memcg_cache_ida, id); - return err; - } - return id; -} - -static void memcg_free_cache_id(int id) -{ - ida_simple_remove(&memcg_cache_ida, id); -} - static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages) { mod_memcg_state(memcg, MEMCG_KMEM, nr_pages); @@ -3673,13 +3605,14 @@ static int memcg_online_kmem(struct mem_ if (unlikely(mem_cgroup_is_root(memcg))) return 0; - memcg_id = memcg_alloc_cache_id(); + memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1, + GFP_KERNEL); if (memcg_id < 0) return memcg_id; objcg = obj_cgroup_alloc(); if (!objcg) { - memcg_free_cache_id(memcg_id); + ida_free(&memcg_cache_ida, memcg_id); return -ENOMEM; } objcg->memcg = memcg; @@ -3723,7 +3656,7 @@ static void memcg_offline_kmem(struct me */ memcg_reparent_list_lrus(memcg, parent); - memcg_free_cache_id(kmemcg_id); + ida_free(&memcg_cache_ida, kmemcg_id); } #else static int memcg_online_kmem(struct mem_cgroup *memcg) From patchwork Tue Mar 22 21:41:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789096 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B7CDC433FE for ; Tue, 22 Mar 2022 21:41:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2F3B46B0093; Tue, 22 Mar 2022 17:41:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2A2396B0095; Tue, 22 Mar 2022 17:41:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1BA296B0096; Tue, 22 Mar 2022 17:41:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 0DA156B0093 for ; Tue, 22 Mar 2022 17:41:31 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id DEA621943 for ; Tue, 22 Mar 2022 21:41:30 +0000 (UTC) X-FDA: 79273343940.07.D54656D Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf01.hostedemail.com (Postfix) with ESMTP id 6409B40021 for ; Tue, 22 Mar 2022 21:41:30 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C5DE4611D1; Tue, 22 Mar 2022 21:41:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with 
ESMTPSA id 431F9C340EC; Tue, 22 Mar 2022 21:41:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985289; bh=Q8K3zCzmlXsZoH+fkhRGs2h/kdfXjuXuiDO9aNZLWSQ=; h=Date:To:From:In-Reply-To:Subject:From; b=Cxup5RnnUU81jFJJzQQVdnBWZUoy+u9tI/1W70ikMUO+89XWGyXMNaL7G7+gaYcBe FcZIXKeuwf0Y7NT1c76gD5fU9qhOLc92F5xozpze/6pGmZW/qALGJguA9in6NGWAJY dLNcTJK5i5XeKnon3S6MQfcO6Wr8aHn/UgBVvytg= Date: Tue, 22 Mar 2022 14:41:28 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 058/227] mm: memcontrol: reuse memory cgroup ID for kmem ID Message-Id: <20220322214129.431F9C340EC@smtp.kernel.org> X-Rspam-User: Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Cxup5Rnn; spf=pass (imf01.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 6409B40021 X-Stat-Signature: 1yt97hpxghfnyo1rb8z4w5ye3i3ffez5 X-HE-Tag: 1647985290-198224 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: memcontrol: reuse memory cgroup ID for kmem ID The memory cgroup code uses two ID allocators: one for the kmem ID and another for the memory cgroup ID. The maximum ID of each is 64Ki, so either of them can limit the total number of memory cgroups. We can simply reuse the memory cgroup ID for the kmem ID and simplify the code. Link: https://lkml.kernel.org/r/20220228122126.37293-14-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/memcontrol.c | 39 +++------------------------------------ 1 file changed, 3 insertions(+), 36 deletions(-) --- a/mm/memcontrol.c~mm-memcontrol-reuse-memory-cgroup-id-for-kmem-id +++ a/mm/memcontrol.c @@ -348,23 +348,6 @@ static void memcg_reparent_objcgs(struct } /* - * This will be used as a shrinker list's index. - * The main reason for not using cgroup id for this: - * this works better in sparse environments, where we have a lot of memcgs, - * but only a few kmem-limited. - */ -static DEFINE_IDA(memcg_cache_ida); - -/* - * MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get - * this constant directly from cgroup, but it is understandable that this is - * better kept as an internal representation in cgroup.c.
In any case, the - * cgrp_id space is not getting any smaller, and we don't have to necessarily - * increase ours as well if it increases. - */ -#define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX - -/* * A lot of the calls to the cache allocation functions are expected to be * inlined by the compiler. Since the calls to memcg_slab_pre_alloc_hook() are * conditional to this static branch, we'll have to allow modules that does @@ -3597,7 +3580,6 @@ static u64 mem_cgroup_read_u64(struct cg static int memcg_online_kmem(struct mem_cgroup *memcg) { struct obj_cgroup *objcg; - int memcg_id; if (cgroup_memory_nokmem) return 0; @@ -3605,22 +3587,16 @@ static int memcg_online_kmem(struct mem_ if (unlikely(mem_cgroup_is_root(memcg))) return 0; - memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1, - GFP_KERNEL); - if (memcg_id < 0) - return memcg_id; - objcg = obj_cgroup_alloc(); - if (!objcg) { - ida_free(&memcg_cache_ida, memcg_id); + if (!objcg) return -ENOMEM; - } + objcg->memcg = memcg; rcu_assign_pointer(memcg->objcg, objcg); static_branch_enable(&memcg_kmem_enabled_key); - memcg->kmemcg_id = memcg_id; + memcg->kmemcg_id = memcg->id.id; return 0; } @@ -3628,7 +3604,6 @@ static int memcg_online_kmem(struct mem_ static void memcg_offline_kmem(struct mem_cgroup *memcg) { struct mem_cgroup *parent; - int kmemcg_id; if (cgroup_memory_nokmem) return; @@ -3643,20 +3618,12 @@ static void memcg_offline_kmem(struct me memcg_reparent_objcgs(memcg, parent); /* - * memcg_reparent_list_lrus() can change memcg->kmemcg_id. - * Cache it to local @kmemcg_id. - */ - kmemcg_id = memcg->kmemcg_id; - - /* * After we have finished memcg_reparent_objcgs(), all list_lrus * corresponding to this cgroup are guaranteed to remain empty. * The ordering is imposed by list_lru_node->lock taken by * memcg_reparent_list_lrus(). 
*/ memcg_reparent_list_lrus(memcg, parent); - - ida_free(&memcg_cache_ida, kmemcg_id); } #else static int memcg_online_kmem(struct mem_cgroup *memcg) From patchwork Tue Mar 22 21:41:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789097 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39F01C433F5 for ; Tue, 22 Mar 2022 21:41:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C1B336B0096; Tue, 22 Mar 2022 17:41:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BCA7B6B0098; Tue, 22 Mar 2022 17:41:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ADF906B0099; Tue, 22 Mar 2022 17:41:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id A01EA6B0096 for ; Tue, 22 Mar 2022 17:41:35 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 7318324410 for ; Tue, 22 Mar 2022 21:41:35 +0000 (UTC) X-FDA: 79273344150.03.ADD720D Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf19.hostedemail.com (Postfix) with ESMTP id F17F31A002E for ; Tue, 22 Mar 2022 21:41:34 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DC64CB81DB5; Tue, 22 Mar 2022 21:41:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6A250C340EC; Tue, 22 Mar 2022 21:41:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985292; bh=ilvcIuf6uKvI4zi0SpRf5kzXybDRnj5OhDN9qxRlO14=; h=Date:To:From:In-Reply-To:Subject:From; b=V/aR8zAfJVtlfGnVqBmgKbSbpAtxVoeAR7Vjvlfv0mEujtFtRo5Yt2/TONlYBVxDr e5t3W3qu9uIcN9hGhLgo7dGjJ09TIW2H0i1THaAZs5MqRflm8T2RzR1hyVtezbwBOS UvCTDPErP+M5TFRoTeYaMGBgfv+wWWTh4/LaGRuY= Date: Tue, 22 Mar 2022 14:41:31 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 059/227] mm: memcontrol: fix cannot alloc the maximum memcg ID Message-Id: <20220322214132.6A250C340EC@smtp.kernel.org> X-Stat-Signature: 6b6nzecofjr9iyb9ctt3jpycy7ceuwzu Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="V/aR8zAf"; spf=pass (imf19.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; 
dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: F17F31A002E X-HE-Tag: 1647985294-916190 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: memcontrol: fix cannot alloc the maximum memcg ID idr_alloc() does not include the @max ID, i.e. its end argument is exclusive. So in the current implementation, the maximum memcg ID is 65534 instead of 65535. This is an off-by-one bug; fix it by passing MEM_CGROUP_ID_MAX + 1 as the end argument. Link: https://lkml.kernel.org/r/20220228122126.37293-15-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- mm/memcontrol.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/mm/memcontrol.c~mm-memcontrol-fix-cannot-alloc-the-maximum-memcg-id +++ a/mm/memcontrol.c @@ -5088,8 +5088,7 @@ static struct mem_cgroup *mem_cgroup_all return ERR_PTR(error); memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL, - 1, MEM_CGROUP_ID_MAX, - GFP_KERNEL); + 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL); if (memcg->id.id < 0) { error = memcg->id.id; goto fail; From patchwork Tue Mar 22 21:41:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789098 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B67BC433EF for ; Tue, 22 Mar 2022 21:41:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CB6E56B0099; Tue, 22 Mar 2022 17:41:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C662E6B009A; Tue, 22 Mar 2022 17:41:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B7D4F6B009B; Tue, 22 Mar 2022 17:41:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0195.hostedemail.com [216.40.44.195]) by kanga.kvack.org (Postfix) with ESMTP id A85876B0099 for ; Tue, 22 Mar 2022 17:41:37 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6ECC7182895A5 for ; Tue, 22 Mar 2022 21:41:37 +0000 (UTC) X-FDA: 79273344234.26.C614B96 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf14.hostedemail.com (Postfix) with ESMTP id 75032100025 for ; Tue, 22 Mar 2022 21:41:36 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4544E6117F; Tue, 22 Mar 2022 21:41:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90FF6C340EC; Tue, 22 Mar 2022 21:41:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985295; bh=CpEDdkiTdOKrDeK8LGm+Eoj1/yCpSJX3PCy3yXJxufA=; h=Date:To:From:In-Reply-To:Subject:From; b=lodacTqqxDt8Su6/1gc99hLNFRgwKIwP2ea1/gr+SvAK4YKzPb8tx7fjGXo2Qv5Mg
evfLhJZMVel1zL0l6XBeYFN4m9vtgTG++2m/Tn6ntP7f048MsozaRfs4b1FVu+SnlD 3U2m6SPLpm8XbATFUyhe3Ic6Gc5Z/vgPufQXvwXM= Date: Tue, 22 Mar 2022 14:41:35 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 060/227] mm: list_lru: rename list_lru_per_memcg to list_lru_memcg Message-Id: <20220322214135.90FF6C340EC@smtp.kernel.org> Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=lodacTqq; spf=pass (imf14.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 75032100025 X-Stat-Signature: i7b5goqqqwam3so6g73g4x69fo84dm5b X-HE-Tag: 1647985296-59989 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: list_lru: rename list_lru_per_memcg to list_lru_memcg The name list_lru_memcg was already taken, and the previous commit freed it up. Rename list_lru_per_memcg to list_lru_memcg since the shorter name is just as descriptive. Link: https://lkml.kernel.org/r/20220228122126.37293-16-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/list_lru.h | 2 +- mm/list_lru.c | 18 +++++++++--------- 2 files changed, 10 insertions(+), 10 deletions(-) --- a/include/linux/list_lru.h~mm-list_lru-rename-list_lru_per_memcg-to-list_lru_memcg +++ a/include/linux/list_lru.h @@ -32,7 +32,7 @@ struct list_lru_one { long nr_items; }; -struct list_lru_per_memcg { +struct list_lru_memcg { struct rcu_head rcu; /* array of per cgroup per node lists, indexed by node id */ struct list_lru_one node[]; --- a/mm/list_lru.c~mm-list_lru-rename-list_lru_per_memcg-to-list_lru_memcg +++ a/mm/list_lru.c @@ -53,7 +53,7 @@ static inline struct list_lru_one * list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx) { if (list_lru_memcg_aware(lru) && idx >= 0) { - struct list_lru_per_memcg *mlru = xa_load(&lru->xa, idx); + struct list_lru_memcg *mlru = xa_load(&lru->xa, idx); return mlru ?
&mlru->node[nid] : NULL; } @@ -306,7 +306,7 @@ unsigned long list_lru_walk_node(struct #ifdef CONFIG_MEMCG_KMEM if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) { - struct list_lru_per_memcg *mlru; + struct list_lru_memcg *mlru; unsigned long index; xa_for_each(&lru->xa, index, mlru) { @@ -335,10 +335,10 @@ static void init_one_lru(struct list_lru } #ifdef CONFIG_MEMCG_KMEM -static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp) +static struct list_lru_memcg *memcg_init_list_lru_one(gfp_t gfp) { int nid; - struct list_lru_per_memcg *mlru; + struct list_lru_memcg *mlru; mlru = kmalloc(struct_size(mlru, node, nr_node_ids), gfp); if (!mlru) @@ -352,7 +352,7 @@ static struct list_lru_per_memcg *memcg_ static void memcg_list_lru_free(struct list_lru *lru, int src_idx) { - struct list_lru_per_memcg *mlru = xa_erase_irq(&lru->xa, src_idx); + struct list_lru_memcg *mlru = xa_erase_irq(&lru->xa, src_idx); /* * The __list_lru_walk_one() can walk the list of this node. @@ -374,7 +374,7 @@ static inline void memcg_init_list_lru(s static void memcg_destroy_list_lru(struct list_lru *lru) { XA_STATE(xas, &lru->xa, 0); - struct list_lru_per_memcg *mlru; + struct list_lru_memcg *mlru; if (!list_lru_memcg_aware(lru)) return; @@ -475,7 +475,7 @@ int memcg_list_lru_alloc(struct mem_cgro int i; unsigned long flags; struct list_lru_memcg_table { - struct list_lru_per_memcg *mlru; + struct list_lru_memcg *mlru; struct mem_cgroup *memcg; } *table; XA_STATE(xas, &lru->xa, 0); @@ -491,7 +491,7 @@ int memcg_list_lru_alloc(struct mem_cgro /* * Because the list_lru can be reparented to the parent cgroup's * list_lru, we should make sure that this cgroup and all its - * ancestors have allocated list_lru_per_memcg. + * ancestors have allocated list_lru_memcg. */ for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) { if (memcg_list_lru_allocated(memcg, lru)) @@ -510,7 +510,7 @@ int memcg_list_lru_alloc(struct mem_cgro xas_lock_irqsave(&xas, flags); while (i--) { int index = READ_ONCE(table[i].memcg->kmemcg_id); - struct list_lru_per_memcg *mlru = table[i].mlru; + struct list_lru_memcg *mlru = table[i].mlru; xas_set(&xas, index); retry: From patchwork Tue Mar 22 21:41:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789099 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 438B1C43217 for ; Tue, 22 Mar 2022 21:41:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D2D2A6B009B; Tue, 22 Mar 2022 17:41:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CDB646B009C; Tue, 22 Mar 2022 17:41:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B07CE6B009D; Tue, 22 Mar 2022 17:41:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 9A0CF6B009B for ; Tue, 22 Mar 2022 17:41:41 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id 41AAB613CA for ; Tue, 22 Mar 2022 21:41:41 +0000 (UTC) X-FDA: 79273344402.07.2302657 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf29.hostedemail.com (Postfix) with ESMTP id 
B6E95120012 for ; Tue, 22 Mar 2022 21:41:40 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 98C4EB81DAF; Tue, 22 Mar 2022 21:41:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C09F2C340F3; Tue, 22 Mar 2022 21:41:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985298; bh=rb2yf8BdDL/eNtOb+t+e3KEJEoeZ3+IyJOBGGQFF94s=; h=Date:To:From:In-Reply-To:Subject:From; b=xFpbAOQ6am1bFejycXmVpX6HbTgZ7UsoLU1GSwzh7yf0PWctrGkXVAuKTHqTg4lzT UZmBBs85sZohcdp42cmAcrs93tNGd4840A9nDbRe3ofWWP4o9ei95NOOekgwE5cOys avJwJJR24G9ePEcM7/5fGW+SeCrK5k7okHRZU/hw= Date: Tue, 22 Mar 2022 14:41:38 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,vdavydov.dev@gmail.com,vbabka@suse.cz,tytso@mit.edu,trond.myklebust@hammerspace.com,shy828301@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,richard.weiyang@gmail.com,mhocko@kernel.org,kari.argillander@gmail.com,jaegeuk@kernel.org,hannes@cmpxchg.org,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@fromorbit.com,chao@kernel.org,Anna.Schumaker@Netapp.com,alexs@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 061/227] mm: memcontrol: rename memcg_cache_id to memcg_kmem_id Message-Id: <20220322214138.C09F2C340F3@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: B6E95120012 X-Rspam-User: Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=xFpbAOQ6; dmarc=none; spf=pass (imf29.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: sf94m4pyx6hgk1sjb1jh8pd5o16i4z9k X-HE-Tag: 1647985300-587107 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: memcontrol: rename memcg_cache_id to memcg_kmem_id memcg_cache_id(), introduced by commit 2633d7a02823 ("slab/slub: consider a memcg parameter in kmem_create_cache"), was used to index into the kmem_cache->memcg_params->memcg_caches array. Since kmem_cache->memcg_params.memcg_caches was removed by commit 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all accounted allocations"), the name no longer needs to refer to caches. Rename it to memcg_kmem_id, which reflects that it is a kmem-related ID.
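As a usage sketch (demo_lru_count() is an invented caller; the helper body and the lookup follow the hunks below), the renamed helper maps a memcg to the kmem ID that indexes the per-memcg LRU lists:

static inline int memcg_kmem_id(struct mem_cgroup *memcg)
{
	return memcg ? memcg->kmemcg_id : -1;	/* -1 selects the root list */
}

static long demo_lru_count(struct list_lru *lru, int nid, struct mem_cgroup *memcg)
{
	struct list_lru_one *l;
	long count;

	/* the in-tree list_lru_count_one() takes rcu_read_lock() the same way */
	rcu_read_lock();
	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
	count = l ? READ_ONCE(l->nr_items) : 0;
	rcu_read_unlock();

	return count;
}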
Link: https://lkml.kernel.org/r/20220228122126.37293-17-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 4 ++-- mm/list_lru.c | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) --- a/include/linux/memcontrol.h~mm-memcontrol-rename-memcg_cache_id-to-memcg_kmem_id +++ a/include/linux/memcontrol.h @@ -1708,7 +1708,7 @@ static inline void memcg_kmem_uncharge_p * A helper for accessing memcg's kmem_id, used for getting * corresponding LRU lists. */ -static inline int memcg_cache_id(struct mem_cgroup *memcg) +static inline int memcg_kmem_id(struct mem_cgroup *memcg) { return memcg ? memcg->kmemcg_id : -1; } @@ -1746,7 +1746,7 @@ static inline bool memcg_kmem_enabled(vo return false; } -static inline int memcg_cache_id(struct mem_cgroup *memcg) +static inline int memcg_kmem_id(struct mem_cgroup *memcg) { return -1; } --- a/mm/list_lru.c~mm-memcontrol-rename-memcg_cache_id-to-memcg_kmem_id +++ a/mm/list_lru.c @@ -75,7 +75,7 @@ list_lru_from_kmem(struct list_lru *lru, if (!memcg) goto out; - l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg)); + l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); out: if (memcg_ptr) *memcg_ptr = memcg; @@ -182,7 +182,7 @@ unsigned long list_lru_count_one(struct long count; rcu_read_lock(); - l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg)); + l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg)); count = l ? 
READ_ONCE(l->nr_items) : 0; rcu_read_unlock(); @@ -273,7 +273,7 @@ list_lru_walk_one(struct list_lru *lru, unsigned long ret; spin_lock(&nlru->lock); - ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate, + ret = __list_lru_walk_one(lru, nid, memcg_kmem_id(memcg), isolate, cb_arg, nr_to_walk); spin_unlock(&nlru->lock); return ret; @@ -289,7 +289,7 @@ list_lru_walk_one_irq(struct list_lru *l unsigned long ret; spin_lock_irq(&nlru->lock); - ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate, + ret = __list_lru_walk_one(lru, nid, memcg_kmem_id(memcg), isolate, cb_arg, nr_to_walk); spin_unlock_irq(&nlru->lock); return ret; From patchwork Tue Mar 22 21:41:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789100 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59839C433F5 for ; Tue, 22 Mar 2022 21:41:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EDA406B009D; Tue, 22 Mar 2022 17:41:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DC7686B009E; Tue, 22 Mar 2022 17:41:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BF1446B009F; Tue, 22 Mar 2022 17:41:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id A12756B009D for ; Tue, 22 Mar 2022 17:41:43 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 6005DA15CA for ; Tue, 22 Mar 2022 21:41:43 +0000 (UTC) X-FDA: 79273344486.24.6C40C06 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf18.hostedemail.com (Postfix) with ESMTP id DF6871C001E for ; Tue, 22 Mar 2022 21:41:42 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 70595611B2; Tue, 22 Mar 2022 21:41:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C8D12C340EC; Tue, 22 Mar 2022 21:41:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985301; bh=NdXGFOaH/G3Ca+xSreW2a3NNoD/Ri6hzHW0e16G2ANk=; h=Date:To:From:In-Reply-To:Subject:From; b=xM7c0/hKuiYxABcDQtmhtKSsKP9CF1oJaNsw3haREOPpZBlOUr12yS9goyrqfOyBc iO26lcfOxTDm6/hYThVAvrudhs0Dd8PMmY8acex+4lEyiT85fSEZjRihmA8zPgbvle mW2Vcg3DzllDGuG0tKzNSEHoqq0gPx9avys7B5w0= Date: Tue, 22 Mar 2022 14:41:41 -0700 To: vdavydov.dev@gmail.com,shakeelb@google.com,roman.gushchin@linux.dev,mhocko@kernel.org,jirislaby@kernel.org,hannes@cmpxchg.org,gregkh@linuxfoundation.org,vvs@virtuozzo.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 062/227] memcg: enable accounting for tty-related objects Message-Id: <20220322214141.C8D12C340EC@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: 6yatjhduxcfcp1666jn144n1g7dyaft1 
Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="xM7c0/hK"; dmarc=none; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: DF6871C001E X-HE-Tag: 1647985302-334747 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Vasily Averin Subject: memcg: enable accounting for tty-related objects At each login the user forces the kernel to create a new terminal and allocate up to ~1Kb of memory for the tty-related structures. By default it is allowed to create up to 4096 ptys, with 1024 reserved for the initial mount namespace only, and the settings are controlled by the host admin. This default is not enough for hosters with thousands of containers per node, so the host admin may be forced to increase it, up to NR_UNIX98_PTY_MAX = 1<<20. By default a container is restricted by pty mount_opt.max = 1024, but the admin inside the container can change it via remount. As a result, one container can consume almost all allowed ptys and allocate up to 1Gb of unaccounted memory. That is not enough per se to trigger OOM on the host, but it still allows the container to significantly exceed its assigned memcg limit and leads to trouble on an over-committed node. It makes sense to account for these allocations so that the host's memory consumption can be restricted from inside the memcg-limited container. Link: https://lkml.kernel.org/r/5d4bca06-7d4f-a905-e518-12981ebca1b3@virtuozzo.com Signed-off-by: Vasily Averin Cc: Michal Hocko Cc: Shakeel Butt Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Roman Gushchin Cc: Greg Kroah-Hartman Cc: Jiri Slaby Signed-off-by: Andrew Morton --- drivers/tty/tty_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/tty/tty_io.c~memcg-enable-accounting-for-tty-related-objects +++ a/drivers/tty/tty_io.c @@ -3088,7 +3088,7 @@ struct tty_struct *alloc_tty_struct(stru { struct tty_struct *tty; - tty = kzalloc(sizeof(*tty), GFP_KERNEL); + tty = kzalloc(sizeof(*tty), GFP_KERNEL_ACCOUNT); if (!tty) return NULL; From patchwork Tue Mar 22 21:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789101 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5F37C4321E for ; Tue, 22 Mar 2022 21:41:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7FBC96B0072; Tue, 22 Mar 2022 17:41:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 783BC6B009F; Tue, 22 Mar 2022 17:41:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 64BBF6B00A0; Tue, 22 Mar 2022 17:41:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0144.hostedemail.com [216.40.44.144]) by kanga.kvack.org (Postfix) with ESMTP id 4FC596B0072 for ; Tue, 22 Mar 2022 17:41:47 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 1BA3C18209BD4 for ; Tue, 22 Mar 2022 21:41:47 +0000 (UTC) X-FDA: 79273344654.24.E271404 Received: from ams.source.kernel.org (ams.source.kernel.org
[145.40.68.75]) by imf18.hostedemail.com (Postfix) with ESMTP id 6EA801C001E for ; Tue, 22 Mar 2022 21:41:46 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2C689B81D77; Tue, 22 Mar 2022 21:41:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C4157C340F4; Tue, 22 Mar 2022 21:41:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985304; bh=yTJNuKPkDxUvMYoCRQwhtdWlDRoWERSjgS/Q5ChvmME=; h=Date:To:From:In-Reply-To:Subject:From; b=nCMDUFmY6nRZYVQInBxbm9qNWx8N3CjAANQ4+QCmCsvZnlNmx8mbbudvD+u2lnq0R 7OLTkKRLal+u8sWS8bEyef0K45+u0bradk6HapnQ4iMzQavVI7xYn7oE3z6AVN+YYr pR2IRSJJd8L2hciBMRdO1SdTzrW/kDmXrwXpZP6c= Date: Tue, 22 Mar 2022 14:41:44 -0700 To: shuah@kernel.org,groeck@google.com,dave.hansen@linux.intel.com,bp@suse.de,bot@kernelci.org,guillaume.tucker@collabora.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 063/227] selftests, x86: fix how check_cc.sh is being invoked Message-Id: <20220322214144.C4157C340F4@smtp.kernel.org> Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=nCMDUFmY; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: utrpuki3ujgmhcgou8sqjbyqxjfbm7y7 X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 6EA801C001E X-HE-Tag: 1647985306-664301 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Guillaume Tucker Subject: selftests, x86: fix how check_cc.sh is being invoked The $(CC) variable used in Makefiles could contain several arguments such as "ccache gcc". These need to be passed as a single string to check_cc.sh, otherwise only the first argument will be used as the compiler command. Without quotes, the $(CC) variable is passed as distinct arguments which causes the script to fail to build trivial programs. Fix this by adding quotes around $(CC) when calling check_cc.sh to pass the whole string as a single argument to the script even if it has several words such as "ccache gcc". 
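To illustrate the failure mode, here is a minimal sketch (not part of the patch; the CAN_BUILD variable and file name are placeholders) of how the two invocations differ once make hands the command line to the shell:

# Build started as: make CC="ccache gcc"

# Unquoted: the shell word-splits the expansion, so check_cc.sh sees
# $1="ccache", $2="gcc", $3="trivial_program.c" and tries to use
# "ccache" alone as the compiler while treating "gcc" as an input file.
CAN_BUILD := $(shell ./check_cc.sh $(CC) trivial_program.c)

# Quoted: check_cc.sh sees $1="ccache gcc", $2="trivial_program.c",
# which is what the script expects.
CAN_BUILD := $(shell ./check_cc.sh "$(CC)" trivial_program.c)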
Link: https://lkml.kernel.org/r/d0d460d7be0107a69e3c52477761a6fe694c1840.1646991629.git.guillaume.tucker@collabora.com Fixes: e9886ace222e ("selftests, x86: Rework x86 target architecture detection") Signed-off-by: Guillaume Tucker Tested-by: "kernelci.org bot" Reviewed-by: Guenter Roeck Cc: Shuah Khan Cc: Borislav Petkov Cc: Dave Hansen Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/Makefile | 6 +++--- tools/testing/selftests/x86/Makefile | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) --- a/tools/testing/selftests/vm/Makefile~selftests-x86-fix-how-check_ccsh-is-being-invoked +++ a/tools/testing/selftests/vm/Makefile @@ -51,9 +51,9 @@ TEST_GEN_FILES += split_huge_page_test TEST_GEN_FILES += ksm_tests ifeq ($(MACHINE),x86_64) -CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_32bit_program.c -m32) -CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_64bit_program.c) -CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_program.c -no-pie) +CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32) +CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c) +CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie) TARGETS := protection_keys BINARIES_32 := $(TARGETS:%=%_32) --- a/tools/testing/selftests/x86/Makefile~selftests-x86-fix-how-check_ccsh-is-being-invoked +++ a/tools/testing/selftests/x86/Makefile @@ -6,9 +6,9 @@ include ../lib.mk .PHONY: all all_32 all_64 warn_32bit_failure clean UNAME_M := $(shell uname -m) -CAN_BUILD_I386 := $(shell ./check_cc.sh $(CC) trivial_32bit_program.c -m32) -CAN_BUILD_X86_64 := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c) -CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) +CAN_BUILD_I386 := $(shell ./check_cc.sh "$(CC)" trivial_32bit_program.c -m32) +CAN_BUILD_X86_64 := $(shell ./check_cc.sh "$(CC)" trivial_64bit_program.c) +CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ From patchwork Tue Mar 22 21:41:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789102 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBEF3C433FE for ; Tue, 22 Mar 2022 21:41:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6017E6B0073; Tue, 22 Mar 2022 17:41:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 587D06B00A0; Tue, 22 Mar 2022 17:41:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4060F6B00A1; Tue, 22 Mar 2022 17:41:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0167.hostedemail.com [216.40.44.167]) by kanga.kvack.org (Postfix) with ESMTP id 2AA976B0073 for ; Tue, 22 Mar 2022 17:41:51 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E7027182895A5 for ; Tue, 22 Mar 2022 21:41:50 +0000 (UTC) X-FDA: 79273344780.29.F3E7666 Received: from ams.source.kernel.org 
(ams.source.kernel.org [145.40.68.75]) by imf11.hostedemail.com (Postfix) with ESMTP id 53EA84000A for ; Tue, 22 Mar 2022 21:41:50 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3934BB81D77; Tue, 22 Mar 2022 21:41:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CA76DC340EC; Tue, 22 Mar 2022 21:41:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985307; bh=6Ox1Cq4jv+rMUGMZobKy2Bxyrl25aNYX2jq5jSm2Dm0=; h=Date:To:From:In-Reply-To:Subject:From; b=I22S8OTiko93Apyew/HmJi6rZnupb8MaEjcgx1gvaXQQg5abLlPK43W8ttcaT+Lao Qujotaa5U9uqKWRgfr4WZape37qNBTUyn9m2p6CYAP5LTdE+JMDS/qv0JTa+2cDxH6 n8ZmlU967jmxyE6Lz5nIKMpHJc6JMR0r0rFPjWfU= Date: Tue, 22 Mar 2022 14:41:47 -0700 To: will@kernel.org,paulus@samba.org,mpe@ellerman.id.au,mike.kravetz@oracle.com,davem@davemloft.net,christophe.leroy@csgroup.eu,catalin.marinas@arm.com,anshuman.khandual@arm.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 064/227] mm: merge pte_mkhuge() call into arch_make_huge_pte() Message-Id: <20220322214147.CA76DC340EC@smtp.kernel.org> X-Stat-Signature: tdyogx3xxr4cqouqu4gy7ow5yop5fnbh Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=I22S8OTi; spf=pass (imf11.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 53EA84000A X-HE-Tag: 1647985310-964095 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Anshuman Khandual Subject: mm: merge pte_mkhuge() call into arch_make_huge_pte() Each call into pte_mkhuge() is invariably followed by arch_make_huge_pte(), so arch_make_huge_pte() can instead do the pte_mkhuge() step at the beginning. This updates the generic fallback stub for arch_make_huge_pte() and the definitions on the platforms that override it. This makes huge pte creation much cleaner and easier to follow. Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Reviewed-by: Christophe Leroy Acked-by: Mike Kravetz Acked-by: Catalin Marinas Cc: Will Deacon Cc: Michael Ellerman Cc: Paul Mackerras Cc: "David S.
Miller" Cc: Mike Kravetz Signed-off-by: Andrew Morton --- arch/arm64/mm/hugetlbpage.c | 1 + arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 4 ++-- arch/sparc/mm/hugetlbpage.c | 1 + include/linux/hugetlb.h | 2 +- mm/hugetlb.c | 3 +-- mm/vmalloc.c | 1 - 6 files changed, 6 insertions(+), 6 deletions(-) --- a/arch/arm64/mm/hugetlbpage.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte +++ a/arch/arm64/mm/hugetlbpage.c @@ -347,6 +347,7 @@ pte_t arch_make_huge_pte(pte_t entry, un { size_t pagesize = 1UL << shift; + entry = pte_mkhuge(entry); if (pagesize == CONT_PTE_SIZE) { entry = pte_mkcont(entry); } else if (pagesize == CONT_PMD_SIZE) { --- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte +++ a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h @@ -71,9 +71,9 @@ static inline pte_t arch_make_huge_pte(p size_t size = 1UL << shift; if (size == SZ_16K) - return __pte(pte_val(entry) & ~_PAGE_HUGE); + return __pte(pte_val(entry) | _PAGE_SPS); else - return entry; + return __pte(pte_val(entry) | _PAGE_SPS | _PAGE_HUGE); } #define arch_make_huge_pte arch_make_huge_pte #endif --- a/arch/sparc/mm/hugetlbpage.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte +++ a/arch/sparc/mm/hugetlbpage.c @@ -181,6 +181,7 @@ pte_t arch_make_huge_pte(pte_t entry, un { pte_t pte; + entry = pte_mkhuge(entry); pte = hugepage_shift_to_tte(entry, shift); #ifdef CONFIG_SPARC64 --- a/include/linux/hugetlb.h~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte +++ a/include/linux/hugetlb.h @@ -754,7 +754,7 @@ static inline void arch_clear_hugepage_f static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags) { - return entry; + return pte_mkhuge(entry); } #endif --- a/mm/hugetlb.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte +++ a/mm/hugetlb.c @@ -4637,7 +4637,6 @@ static pte_t make_huge_pte(struct vm_are vma->vm_page_prot)); } entry = pte_mkyoung(entry); - entry = pte_mkhuge(entry); entry = arch_make_huge_pte(entry, shift, vma->vm_flags); return entry; @@ -6171,7 +6170,7 @@ unsigned long hugetlb_change_protection( unsigned int shift = huge_page_shift(hstate_vma(vma)); old_pte = huge_ptep_modify_prot_start(vma, address, ptep); - pte = pte_mkhuge(huge_pte_modify(old_pte, newprot)); + pte = huge_pte_modify(old_pte, newprot); pte = arch_make_huge_pte(pte, shift, vma->vm_flags); huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); pages++; --- a/mm/vmalloc.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte +++ a/mm/vmalloc.c @@ -118,7 +118,6 @@ static int vmap_pte_range(pmd_t *pmd, un if (size != PAGE_SIZE) { pte_t entry = pfn_pte(pfn, prot); - entry = pte_mkhuge(entry); entry = arch_make_huge_pte(entry, ilog2(size), 0); set_huge_pte_at(&init_mm, addr, pte, entry); pfn += PFN_DOWN(size); From patchwork Tue Mar 22 21:41:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789103 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CF08C433F5 for ; Tue, 22 Mar 2022 21:41:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 00D756B0074; Tue, 22 Mar 2022 17:41:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EB1096B00A2; Tue, 22 Mar 2022 17:41:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org 
(Postfix, from userid 63042) id CB56B6B00A3; Tue, 22 Mar 2022 17:41:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id AAD5A6B00A1 for ; Tue, 22 Mar 2022 17:41:52 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 709358249980 for ; Tue, 22 Mar 2022 21:41:52 +0000 (UTC) X-FDA: 79273344864.16.386597B Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf30.hostedemail.com (Postfix) with ESMTP id AD94E8001F for ; Tue, 22 Mar 2022 21:41:51 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 242DF60A14; Tue, 22 Mar 2022 21:41:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DBF46C340F2; Tue, 22 Mar 2022 21:41:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985311; bh=9VpQhWU+v6rIebMhpj+1pe+IbKQ0Me39rlEhhcxuX5U=; h=Date:To:From:In-Reply-To:Subject:From; b=bYG1f6ZhH7QkZ9uuSR3upcqXx2Gm+r7YZKc7VGfmRTLTGPAdXfOzRgzfeL3fAtHQQ h2Ui0atfVkAL+Ha3UMLAM2nXVfxxAMNkbWxWr4kB+M53zozg5uitaiLgnAGPJEODO/ DBopNoFVHgQKFFFl6I4fX8MWb8cCnj18EGzbDF1s= Date: Tue, 22 Mar 2022 14:41:50 -0700 To: wangkefeng.wang@huawei.com,stefan.kristiansson@saunalahti.fi,rppt@linux.ibm.com,rmk+kernel@armlinux.org.uk,nickhu@andestech.com,jonas@southpole.se,green.hu@gmail.com,deanbo422@gmail.com,david@redhat.com,dave.hansen@linux.intel.com,christophe.leroy@c-s.fr,bcain@codeaurora.org,shorne@gmail.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 065/227] mm: remove mmu_gathers storage from remaining architectures Message-Id: <20220322214150.DBF46C340F2@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: AD94E8001F X-Stat-Signature: 7gy3mc9919a5bb3a1s5ydjdny6x1p4ny X-Rspam-User: Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=bYG1f6Zh; dmarc=none; spf=pass (imf30.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1647985311-140807 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Stafford Horne Subject: mm: remove mmu_gathers storage from remaining architectures Originally the mmu_gathers were removed in commit 1c3951769621 ("mm: now that all old mmu_gather code is gone, remove the storage"). However, the openrisc and hexagon architectures were merged around the same time, and their mmu_gathers were not removed. This patch removes them from openrisc, hexagon and nds32. Noticed while cleaning up this warning: arch/openrisc/mm/init.c:41:1: warning: symbol 'mmu_gathers' was not declared. Should it be static?
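For context, a brief sketch of why this storage is dead code: since the generic mmu_gather rework, the gather state lives on the caller's stack rather than in per-CPU data, so the declarations deleted below are never referenced. The example function is illustrative only (roughly the generic API of this era's kernels, not code introduced by this patch):

/* The straggler declaration being deleted -- no remaining users: */
DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);

/* How mmu_gather is actually used now -- on the caller's stack: */
static void zap_range_example(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      unsigned long start, unsigned long end)
{
	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm);          /* start batching ptes/pages */
	unmap_vmas(&tlb, vma, start, end); /* clear ptes, queue pages */
	tlb_finish_mmu(&tlb);              /* flush TLB, free the pages */
}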
Link: https://lkml.kernel.org/r/20220205141956.3315419-1-shorne@gmail.com Signed-off-by: Stafford Horne Acked-by: Mike Rapoport Cc: Brian Cain Cc: Nick Hu Cc: Greentime Hu Cc: Vincent Chen Cc: Jonas Bonn Cc: Stefan Kristiansson Cc: Russell King Cc: David Hildenbrand Cc: Dave Hansen Cc: Kefeng Wang Cc: Christophe Leroy Signed-off-by: Andrew Morton --- arch/hexagon/mm/init.c | 2 -- arch/nds32/mm/init.c | 1 - arch/openrisc/mm/init.c | 2 -- 3 files changed, 5 deletions(-) --- a/arch/hexagon/mm/init.c~mm-remove-mmu_gathers-storage-from-remaining-architectures +++ a/arch/hexagon/mm/init.c @@ -29,8 +29,6 @@ int max_kernel_seg = 0x303; /* indicate pfn's of high memory */ unsigned long highstart_pfn, highend_pfn; -DEFINE_PER_CPU(struct mmu_gather, mmu_gathers); - /* Default cache attribute for newly created page tables */ unsigned long _dflt_cache_att = CACHEDEF; --- a/arch/nds32/mm/init.c~mm-remove-mmu_gathers-storage-from-remaining-architectures +++ a/arch/nds32/mm/init.c @@ -18,7 +18,6 @@ #include #include -DEFINE_PER_CPU(struct mmu_gather, mmu_gathers); DEFINE_SPINLOCK(anon_alias_lock); extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; --- a/arch/openrisc/mm/init.c~mm-remove-mmu_gathers-storage-from-remaining-architectures +++ a/arch/openrisc/mm/init.c @@ -38,8 +38,6 @@ int mem_init_done; -DEFINE_PER_CPU(struct mmu_gather, mmu_gathers); - static void __init zone_sizes_init(void) { unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 }; From patchwork Tue Mar 22 21:41:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789104 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82553C433EF for ; Tue, 22 Mar 2022 21:41:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 147A76B00A2; Tue, 22 Mar 2022 17:41:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0CDF16B00A3; Tue, 22 Mar 2022 17:41:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E627E6B00A4; Tue, 22 Mar 2022 17:41:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id D018C6B00A2 for ; Tue, 22 Mar 2022 17:41:55 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id ADB0EAA2 for ; Tue, 22 Mar 2022 21:41:55 +0000 (UTC) X-FDA: 79273344990.04.09BAEAD Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf19.hostedemail.com (Postfix) with ESMTP id 3B59B1A0033 for ; Tue, 22 Mar 2022 21:41:55 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 992796104C; Tue, 22 Mar 2022 21:41:54 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E91DEC340F3; Tue, 22 Mar 2022 21:41:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985314; bh=paGbHyEWO4fjAlnjjA8dly23PY250zM57eYN+jleBqs=; h=Date:To:From:In-Reply-To:Subject:From; b=sTZ2XgW4E9TKeEU6uYhAm0Ve4/UkkL2ZUGcA15UbqNUh+VviIoeoC6FQ2QRmqyVi5 
B42gRgT3VVfdvxgw3aupwrFAdoiNzbA5/5Gbtd677lp7tTpPYhbc/lJud052hFLAHH h4+OhxykYkGhqeTt+ID2bQHRDyaEoKZUc15BcR24= Date: Tue, 22 Mar 2022 14:41:53 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 066/227] mm: thp: fix wrong cache flush in remove_migration_pmd() Message-Id: <20220322214153.E91DEC340F3@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: hauig5nocpq1sctkro1z63at1echnroo Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=sTZ2XgW4; spf=pass (imf19.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 3B59B1A0033 X-HE-Tag: 1647985315-234484 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: thp: fix wrong cache flush in remove_migration_pmd() Patch series "Fix some cache flush bugs", v5. This series focuses on fixing cache maintenance. This patch (of 7): flush_cache_range() is only justified once the page is already placed in the process page table, but here that placement happens right after the flush_cache_range() call, so using this interface is wrong. And there is no need to invalidate the cache, since the mapping was non-present before in remove_migration_pmd(). So just remove the call. Link: https://lkml.kernel.org/r/20220210123058.79206-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220210123058.79206-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Zi Yan Cc: Kirill A.
Shutemov Cc: David Rientjes Cc: Lars Persson Cc: Mike Kravetz Cc: Zi Yan Cc: Xiongchun Duan Cc: Fam Zheng Cc: Muchun Song Cc: Axel Rasmussen Cc: Peter Xu Signed-off-by: Andrew Morton --- mm/huge_memory.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/huge_memory.c~mm-thp-fix-wrong-cache-flush-in-remove_migration_pmd +++ a/mm/huge_memory.c @@ -3197,7 +3197,6 @@ void remove_migration_pmd(struct page_vm if (pmd_swp_uffd_wp(*pvmw->pmd)) pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde)); - flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE); if (PageAnon(new)) page_add_anon_rmap(new, vma, mmun_start, true); else @@ -3205,6 +3204,8 @@ void remove_migration_pmd(struct page_vm set_pmd_at(mm, mmun_start, pvmw->pmd, pmde); if ((vma->vm_flags & VM_LOCKED) && !PageDoubleMap(new)) mlock_vma_page(new); + + /* No need to invalidate - it was non-present before */ update_mmu_cache_pmd(vma, address, pvmw->pmd); } #endif From patchwork Tue Mar 22 21:41:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789106 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8080C433F5 for ; Tue, 22 Mar 2022 21:41:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1962C6B00A5; Tue, 22 Mar 2022 17:41:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 11C9F6B00A6; Tue, 22 Mar 2022 17:41:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED7236B00A7; Tue, 22 Mar 2022 17:41:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id D21C76B00A5 for ; Tue, 22 Mar 2022 17:41:58 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id AC80881A2A for ; Tue, 22 Mar 2022 21:41:58 +0000 (UTC) X-FDA: 79273345116.14.76334B4 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf01.hostedemail.com (Postfix) with ESMTP id 345A040030 for ; Tue, 22 Mar 2022 21:41:58 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A7A5F60A1B; Tue, 22 Mar 2022 21:41:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0D905C340EC; Tue, 22 Mar 2022 21:41:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985317; bh=tvkOCvNvu86WaZ7JJiDd5vDGRJ4/ZlmTN54lTdhKS5g=; h=Date:To:From:In-Reply-To:Subject:From; b=0lYBmuN9qqJ/SfjJeLRfxlYUpYI75Ko2Iarel0UBf6JAP05S+6lXtdkgONuDyGr38 EVU06pRUEq+0iP1ye1jgGp8UNz7m1vPh31xyPKN4ybUF7u3Rb8w9tsvP410HJV+uK0 tsY1gOHp82fp5SQ8pg2gqr5bEBzcXKjgffA2I1tw= Date: Tue, 22 Mar 2022 14:41:56 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org 
From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 067/227] mm: fix missing cache flush for all tail pages of compound page Message-Id: <20220322214157.0D905C340EC@smtp.kernel.org> Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=0lYBmuN9; spf=pass (imf01.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 345A040030 X-Stat-Signature: 8eq9ma4o68qw8uj7nhac3ascx8ojwpmg X-HE-Tag: 1647985318-649447 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: fix missing cache flush for all tail pages of compound page The D-cache maintenance inside move_to_new_page() only considers one page, so there is still a D-cache maintenance issue for the tail pages of a compound page (e.g. THP or HugeTLB). THP migration is only enabled on x86_64, ARM64 and powerpc; powerpc and arm64 need to maintain consistency between the I-cache and the D-cache, which relies on flush_dcache_page(). But there are no issues on arm64 and powerpc, since they already consider compound pages in their icache flush functions. HugeTLB migration is enabled on arm, arm64, mips, parisc, powerpc, riscv, s390 and sh; arm handles the compound page cache flush in flush_dcache_page(), but most of the others do not. In theory, the issue exists on many architectures. Fix it by flushing the D-cache for every subpage; flush_dcache_folio() is not used here since it is not backportable. Link: https://lkml.kernel.org/r/20220210123058.79206-3-songmuchun@bytedance.com Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Muchun Song Reviewed-by: Zi Yan Cc: Axel Rasmussen Cc: David Rientjes Cc: Fam Zheng Cc: Kirill A.
Shutemov Cc: Lars Persson Cc: Mike Kravetz Cc: Peter Xu Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- mm/migrate.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) --- a/mm/migrate.c~mm-fix-missing-cache-flush-for-all-tail-pages-of-compound-page +++ a/mm/migrate.c @@ -916,9 +916,12 @@ static int move_to_new_page(struct page if (!PageMappingFlags(page)) page->mapping = NULL; - if (likely(!is_zone_device_page(newpage))) - flush_dcache_page(newpage); + if (likely(!is_zone_device_page(newpage))) { + int i, nr = compound_nr(newpage); + for (i = 0; i < nr; i++) + flush_dcache_page(newpage + i); + } } out: return rc; From patchwork Tue Mar 22 21:41:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789108 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EDBBC433F5 for ; Tue, 22 Mar 2022 21:42:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CE12A6B00A9; Tue, 22 Mar 2022 17:42:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C3FBC6B00AA; Tue, 22 Mar 2022 17:42:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A1BBC6B00AB; Tue, 22 Mar 2022 17:42:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0159.hostedemail.com [216.40.44.159]) by kanga.kvack.org (Postfix) with ESMTP id 892606B00A9 for ; Tue, 22 Mar 2022 17:42:03 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 48AA9A0FAD for ; Tue, 22 Mar 2022 21:42:03 +0000 (UTC) X-FDA: 79273345326.26.01DABD5 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf30.hostedemail.com (Postfix) with ESMTP id BE5208000C for ; Tue, 22 Mar 2022 21:42:02 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id A11EFB81DAF; Tue, 22 Mar 2022 21:42:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 28467C36AE3; Tue, 22 Mar 2022 21:42:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985320; bh=pvulDPwyizhgGDxa4jhwBc0U3Z5T0u8pV4nxemqXHIo=; h=Date:To:From:In-Reply-To:Subject:From; b=PvCTD+JzzxO/UlwcN4mHCJbJ4+rUhaARV7ubSxnLxFkENXo2LFcyV+P1fVYvg9FHz 3IUvp9keQEPtHk5ZVtX6QiCOw0hG9NhRLCUGY2eYZnogQtdX5iNrOwQu04Qnr1jkoV M3etRAbIjHBkdEThQDt2D1YCf3hT5qOaKrocjV2Q= Date: Tue, 22 Mar 2022 14:41:59 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 068/227] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() Message-Id: <20220322214200.28467C36AE3@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: 
X-Stat-Signature: 1nos8jxxogynqn1hucr75dqxbb5yumyc Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=PvCTD+Jz; dmarc=none; spf=pass (imf30.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: BE5208000C X-HE-Tag: 1647985322-632772 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() userfaultfd calls copy_huge_page_from_user() which does not do any cache flushing for the target page. Then the target page will be mapped to the user space with a different address (user address), which might have an alias issue with the kernel address used to copy the data from the user to. Fix this issue by flushing dcache in copy_huge_page_from_user(). Link: https://lkml.kernel.org/r/20220210123058.79206-4-songmuchun@bytedance.com Fixes: fa4d75c1de13 ("userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Cc: Axel Rasmussen Cc: David Rientjes Cc: Fam Zheng Cc: Kirill A. Shutemov Cc: Lars Persson Cc: Peter Xu Cc: Xiongchun Duan Cc: Zi Yan Signed-off-by: Andrew Morton --- mm/memory.c | 2 ++ 1 file changed, 2 insertions(+) --- a/mm/memory.c~mm-hugetlb-fix-missing-cache-flush-in-copy_huge_page_from_user +++ a/mm/memory.c @@ -5444,6 +5444,8 @@ long copy_huge_page_from_user(struct pag if (rc) break; + flush_dcache_page(subpage); + cond_resched(); } return ret_val; From patchwork Tue Mar 22 21:42:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BD60C433F5 for ; Tue, 22 Mar 2022 21:42:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6E9296B00AB; Tue, 22 Mar 2022 17:42:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 673A96B00AC; Tue, 22 Mar 2022 17:42:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 49E5A6B00AD; Tue, 22 Mar 2022 17:42:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236]) by kanga.kvack.org (Postfix) with ESMTP id 2FB7F6B00AB for ; Tue, 22 Mar 2022 17:42:05 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E90CD8249980 for ; Tue, 22 Mar 2022 21:42:04 +0000 (UTC) X-FDA: 79273345368.30.B83B9D7 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf11.hostedemail.com (Postfix) with ESMTP id 86A254000A for ; Tue, 22 Mar 2022 21:42:04 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id E315960A1B; Tue, 22 Mar 2022 21:42:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4160AC340EC; Tue, 22 Mar 2022 21:42:03 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985323; bh=CUWR0+8hDBfmEuWKrPR7y1DbelYcnQOgiVeFj9pX1Po=; h=Date:To:From:In-Reply-To:Subject:From; b=WZcXpGybdIqznUaKJACHWOlloWUkGpqNMF4Bq6GB0Ddi8heGmeAIa9LgZGhd1TzFM 3WEmx8cvpKJdHYaeAbiIh4DTXKrY+Vnng3sCF1A+COE5g4W1p7H8mDXuBuSIqzGPHS ObIRNEKSTYC8g6cdIrMDMf6aC5eUNmQWuoNksgFU= Date: Tue, 22 Mar 2022 14:42:02 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 069/227] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() Message-Id: <20220322214203.4160AC340EC@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 86A254000A X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=WZcXpGyb; dmarc=none; spf=pass (imf11.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: 1ruo68chm7dfaa9epgjp6ryhiywwfone X-HE-Tag: 1647985324-856208 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() folio_copy() will copy the data from one page to the target page, then the target page will be mapped to the user space address, which might have an alias issue with the kernel address used to copy the data from the page to. There are two ways to fix this issue: 1) insert flush_dcache_page() after folio_copy(), or 2) replace folio_copy() with copy_user_huge_page(), which already takes care of the cache maintenance. We chose way 2) since architectures can optimize this situation, and it also makes backports easier. Link: https://lkml.kernel.org/r/20220210123058.79206-5-songmuchun@bytedance.com Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Cc: Axel Rasmussen Cc: David Rientjes Cc: Fam Zheng Cc: Kirill A.
Shutemov Cc: Lars Persson Cc: Peter Xu Cc: Xiongchun Duan Cc: Zi Yan Signed-off-by: Andrew Morton --- mm/hugetlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/hugetlb.c~mm-hugetlb-fix-missing-cache-flush-in-hugetlb_mcopy_atomic_pte +++ a/mm/hugetlb.c @@ -5816,7 +5816,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_s *pagep = NULL; goto out; } - folio_copy(page_folio(page), page_folio(*pagep)); + copy_user_huge_page(page, *pagep, dst_addr, dst_vma, + pages_per_huge_page(h)); put_page(*pagep); *pagep = NULL; } From patchwork Tue Mar 22 21:42:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789110 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46AA6C433EF for ; Tue, 22 Mar 2022 21:42:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1BA286B00AD; Tue, 22 Mar 2022 17:42:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1441C6B00AE; Tue, 22 Mar 2022 17:42:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED7B06B00AF; Tue, 22 Mar 2022 17:42:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0148.hostedemail.com [216.40.44.148]) by kanga.kvack.org (Postfix) with ESMTP id CD6016B00AD for ; Tue, 22 Mar 2022 17:42:07 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 8CBBA1828AC92 for ; Tue, 22 Mar 2022 21:42:07 +0000 (UTC) X-FDA: 79273345494.21.C29E59C Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf11.hostedemail.com (Postfix) with ESMTP id 21B6840012 for ; Tue, 22 Mar 2022 21:42:07 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 953206104C; Tue, 22 Mar 2022 21:42:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 56E32C340F5; Tue, 22 Mar 2022 21:42:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985326; bh=Ggz33SJm+G5IlaykbrhNZ7fUyEpBzLe9xRoIA5YhLXc=; h=Date:To:From:In-Reply-To:Subject:From; b=rm9xtmJmEgVCqnlgZnbXdqcvs3ysN86UdsWn6J05jUxTY45VkW5jMEwBY1YTcw3iy K52fASLSV9xfl4n6tCwdBTOCdAarbUYTzAkdu4lt2q0qhqUzK5CCQ4IpUPwTHxuf6r qgGfFa4NmdnYCzL1gIttWPTpfjl/4ii8VoGowC44= Date: Tue, 22 Mar 2022 14:42:05 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 070/227] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() Message-Id: <20220322214206.56E32C340F5@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: 5wgshbpqbsd1wg4tgqc5sbt5ffyg57ox Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=linux-foundation.org 
header.s=korg header.b=rm9xtmJm; spf=pass (imf11.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 21B6840012 X-HE-Tag: 1647985327-75796 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() userfaultfd calls shmem_mfill_atomic_pte() which does not do any cache flushing for the target page. Then the target page will be mapped to the user space with a different address (user address), which might have an alias issue with the kernel address used to copy the data from the user to. Insert flush_dcache_page() in the non-zero-page case, and replace clear_highpage() with clear_user_highpage(), which already takes care of the cache maintenance. Link: https://lkml.kernel.org/r/20220210123058.79206-6-songmuchun@bytedance.com Fixes: 8d1039634206 ("userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support") Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Cc: Axel Rasmussen Cc: David Rientjes Cc: Fam Zheng Cc: Kirill A. Shutemov Cc: Lars Persson Cc: Peter Xu Cc: Xiongchun Duan Cc: Zi Yan Signed-off-by: Andrew Morton --- mm/shmem.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/mm/shmem.c~mm-shmem-fix-missing-cache-flush-in-shmem_mfill_atomic_pte +++ a/mm/shmem.c @@ -2364,8 +2364,10 @@ int shmem_mfill_atomic_pte(struct mm_str /* don't free the page */ goto out_unacct_blocks; } + + flush_dcache_page(page); } else { /* ZEROPAGE */ - clear_highpage(page); + clear_user_highpage(page, dst_addr); } } else { page = *pagep; From patchwork Tue Mar 22 21:42:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789111 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 092BBC433F5 for ; Tue, 22 Mar 2022 21:42:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9A9966B00B0; Tue, 22 Mar 2022 17:42:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9328E6B00B1; Tue, 22 Mar 2022 17:42:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D1B36B00B2; Tue, 22 Mar 2022 17:42:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0076.hostedemail.com [216.40.44.76]) by kanga.kvack.org (Postfix) with ESMTP id 630AE6B00B0 for ; Tue, 22 Mar 2022 17:42:11 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2685E8249980 for ; Tue, 22 Mar 2022 21:42:11 +0000 (UTC) X-FDA: 79273345662.23.2C895F7 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf01.hostedemail.com (Postfix) with ESMTP id A769940034 for ; Tue, 22 Mar 2022 21:42:10 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org
(Postfix) with ESMTPS id 187C360A1B; Tue, 22 Mar 2022 21:42:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7207CC340EC; Tue, 22 Mar 2022 21:42:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985329; bh=fBURoQBf/NowRAcjUkoyhi0STImz+EpOKIqpkfRNgXY=; h=Date:To:From:In-Reply-To:Subject:From; b=FlYP2pY1YK9tZkBy5h0DoAbInaMf/yVsWIYfUyzfn4cF3vmwcpgIGT/ig5W5gwrr6 ZJf5skhU194M5+cNoz/vd7JNrk/kzUemIb7lax6/bleB+Np26O/3dv4wtsIx51hfio U2rf9OQ5bYaz/vTL1VzlMbFyozyycBD9Wwi1VdAU= Date: Tue, 22 Mar 2022 14:42:08 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 071/227] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() Message-Id: <20220322214209.7207CC340EC@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: tzpz3ewb9zp7x17zang8q8suziwy6ih3 Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=FlYP2pY1; dmarc=none; spf=pass (imf01.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: A769940034 X-HE-Tag: 1647985330-717461 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() userfaultfd calls mcopy_atomic_pte() and __mcopy_atomic() which do not do any cache flushing for the target page. Then the target page will be mapped to the user space with a different address (user address), which might have an alias issue with the kernel address used to copy the data from the user to. Fix this by inserting flush_dcache_page() after copy_from_user() succeeds. Link: https://lkml.kernel.org/r/20220210123058.79206-7-songmuchun@bytedance.com Fixes: b6ebaedb4cb1 ("userfaultfd: avoid mmap_sem read recursion in mcopy_atomic") Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") Signed-off-by: Muchun Song Cc: Axel Rasmussen Cc: David Rientjes Cc: Fam Zheng Cc: Kirill A.
Shutemov Cc: Lars Persson Cc: Mike Kravetz Cc: Peter Xu Cc: Xiongchun Duan Cc: Zi Yan Signed-off-by: Andrew Morton --- mm/userfaultfd.c | 3 +++ 1 file changed, 3 insertions(+) --- a/mm/userfaultfd.c~mm-userfaultfd-fix-missing-cache-flush-in-mcopy_atomic_pte-and-__mcopy_atomic +++ a/mm/userfaultfd.c @@ -150,6 +150,8 @@ static int mcopy_atomic_pte(struct mm_st /* don't free the page */ goto out; } + + flush_dcache_page(page); } else { page = *pagep; *pagep = NULL; @@ -625,6 +627,7 @@ retry: err = -EFAULT; goto out; } + flush_dcache_page(page); goto retry; } else BUG_ON(page); From patchwork Tue Mar 22 21:42:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789112 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E15C9C433F5 for ; Tue, 22 Mar 2022 21:42:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6EB196B00B2; Tue, 22 Mar 2022 17:42:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6A58B6B00B3; Tue, 22 Mar 2022 17:42:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5140D6B00B4; Tue, 22 Mar 2022 17:42:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 39DD76B00B2 for ; Tue, 22 Mar 2022 17:42:17 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id 0779681A3E for ; Tue, 22 Mar 2022 21:42:17 +0000 (UTC) X-FDA: 79273345914.01.3409BF4 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf14.hostedemail.com (Postfix) with ESMTP id DFD67100025 for ; Tue, 22 Mar 2022 21:42:12 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id B4DA66101E; Tue, 22 Mar 2022 21:42:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 79B3BC340EC; Tue, 22 Mar 2022 21:42:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985332; bh=ktw58zu8xgeKHCa3X+x2f4+6x3oBLULyuw0PLP0DeCE=; h=Date:To:From:In-Reply-To:Subject:From; b=jpR7VHY/RGLdGpuQJbzxribKOCS9jZvbwYbAe1LtsPLrOIGuCaEjTRy5IkAUcgnzd dJmt2NiUWEPr8FshxzgMM4Fi956HcAhdMfJEeiH+VxAB2/vjH6hzNK8OU/dRp34ApI OdbcUpPBPw6wJRZhC/XiGqYCDmx9h3bSr5MwDROM= Date: Tue, 22 Mar 2022 14:42:11 -0700 To: ziy@nvidia.com,rientjes@google.com,peterx@redhat.com,mike.kravetz@oracle.com,lars.persson@axis.com,kirill.shutemov@linux.intel.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,axelrasmussen@google.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 072/227] mm: replace multiple dcache flush with flush_dcache_folio() Message-Id: <20220322214212.79B3BC340EC@smtp.kernel.org> X-Stat-Signature: z1eoxqkb5c5m53681r8fzfcodgycdtsh Authentication-Results: imf14.hostedemail.com; dkim=pass 
header.d=linux-foundation.org header.s=korg header.b="jpR7VHY/"; spf=pass (imf14.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: DFD67100025 X-HE-Tag: 1647985332-597714 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: replace multiple dcache flush with flush_dcache_folio() Simplify the code by using flush_dcache_folio(). Link: https://lkml.kernel.org/r/20220210123058.79206-8-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Cc: Axel Rasmussen Cc: David Rientjes Cc: Fam Zheng Cc: Kirill A. Shutemov Cc: Lars Persson Cc: Peter Xu Cc: Xiongchun Duan Cc: Zi Yan Signed-off-by: Andrew Morton --- mm/migrate.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) --- a/mm/migrate.c~mm-replace-multiple-dcache-flush-with-flush_dcache_folio +++ a/mm/migrate.c @@ -916,12 +916,8 @@ static int move_to_new_page(struct page if (!PageMappingFlags(page)) page->mapping = NULL; - if (likely(!is_zone_device_page(newpage))) { - int i, nr = compound_nr(newpage); - - for (i = 0; i < nr; i++) - flush_dcache_page(newpage + i); - } + if (likely(!is_zone_device_page(newpage))) + flush_dcache_folio(page_folio(newpage)); } out: return rc; From patchwork Tue Mar 22 21:42:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789113 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD62CC43217 for ; Tue, 22 Mar 2022 21:42:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5155F6B00B4; Tue, 22 Mar 2022 17:42:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 452E36B00B5; Tue, 22 Mar 2022 17:42:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2A31C6B00B6; Tue, 22 Mar 2022 17:42:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 1581C6B00B4 for ; Tue, 22 Mar 2022 17:42:19 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id C1DEB613CA for ; Tue, 22 Mar 2022 21:42:18 +0000 (UTC) X-FDA: 79273345956.08.D13810A Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf13.hostedemail.com (Postfix) with ESMTP id 3E89F2001F for ; Tue, 22 Mar 2022 21:42:18 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2030FB81D77; Tue, 22 Mar 2022 21:42:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2B8DC340EC; Tue, 22 Mar 2022 21:42:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985335; bh=KRojZYKCv6yjC0VBAD6SaG4fLDLF6PUf95HEkgydt+g=; h=Date:To:From:In-Reply-To:Subject:From; b=oXxrjkqzCdbkSb23O9q5Uy+DNgI+YLq2CrJMrygfwOtZD2882/RN8Gjp7yrWzCdp7 
31Fy0HTifLsiwh1bVD0PiE3iAp5yBgS1IAbVzWaqIjSU/CRGc5Rawgp/myspUc07eF QcEsAVdpB9AftpfCMFBZHna9zxl5987F+ISbdV00= Date: Tue, 22 Mar 2022 14:42:15 -0700 To: willy@infradead.org,vbabka@suse.cz,stable@vger.kernel.org,shy828301@gmail.com,kirill@shutemov.name,jhubbard@nvidia.com,hughd@google.com,david@redhat.com,apopple@nvidia.com,aarcange@redhat.com,peterx@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 073/227] mm: don't skip swap entry even if zap_details specified Message-Id: <20220322214215.B2B8DC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: twhoegbbsu9qganqbq5h48bryrr3ew1i Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=oXxrjkqz; spf=pass (imf13.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 3E89F2001F X-HE-Tag: 1647985338-193452 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peter Xu Subject: mm: don't skip swap entry even if zap_details specified Patch series "mm: Rework zap ptes on swap entries", v5. Patch 1 should fix a long-standing bug in zap_pte_range()'s use of zap_details. The risk is that some swap entries could be skipped when we should have zapped them. Migration entries are not the major concern, because file-backed memory is always zapped in the pattern "first without the page lock, then re-zapped with the page lock", hence the second zap will always make sure all migration entries are recovered. However, real swap entries can be skipped erroneously. There is a reproducer for that in the commit message of patch 1. Patches 2-4 are cleanups based on patch 1. With the whole patchset applied, we have a very clean view of zap_pte_range(). Only patch 1 needs to be backported to stable, if necessary. This patch (of 4): The "details" pointer shouldn't be the token that decides whether we should skip swap entries. For example, when a caller specifies details->zap_mapping==NULL, it means the user wants to zap all the pages (including COWed pages); then we need to look into swap entries, because there can be private COWed pages that were swapped out. Skipping some swap entries when details is non-NULL may wrongly leave behind swap entries that we should have zapped.
A reproducer of the problem:

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

void main(void)
{
    int ret;
    char val;

    page_size = getpagesize();
    shmem_fd = memfd_create("test", 0);
    assert(shmem_fd >= 0);

    ret = ftruncate(shmem_fd, page_size * 2);
    assert(ret == 0);

    buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE, shmem_fd, 0);
    assert(buffer != MAP_FAILED);

    /* Write private page, swap it out */
    buffer[page_size] = 1;
    madvise(buffer, page_size * 2, MADV_PAGEOUT);

    /* This should drop private buffer[page_size] already */
    ret = ftruncate(shmem_fd, page_size);
    assert(ret == 0);

    /* Recover the size */
    ret = ftruncate(shmem_fd, page_size * 2);
    assert(ret == 0);

    /* Re-read the data, it should be all zero */
    val = buffer[page_size];
    if (val == 0)
        printf("Good\n");
    else
        printf("BUG\n");
}
===8<===

We don't need to touch up the pmd path, because pmd never had an issue with swap entries. For example, shmem pmd migration is always split to pte level, and the same holds for swapping on anonymous memory. Add another helper, should_zap_cows(), so that we can also check whether we should zap private mappings when there is no page pointer specified. This patch drops the old trick of skipping all swap entries when details is non-NULL, so we handle swap ptes coherently. Meanwhile we do the same check for migration entries, hwpoison entries and genuine swap entries too. To be explicit, we still remember to keep the private entries if even_cows==false, and always zap them when even_cows==true. The issue seems to exist starting from the initial commit of git. [peterx@redhat.com: comment tweaks] Link: https://lkml.kernel.org/r/20220217060746.71256-2-peterx@redhat.com Link: https://lkml.kernel.org/r/20220217060746.71256-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Peter Xu Reviewed-by: John Hubbard Cc: David Hildenbrand Cc: Hugh Dickins Cc: Alistair Popple Cc: Andrea Arcangeli Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Vlastimil Babka Cc: Yang Shi Cc: Signed-off-by: Andrew Morton --- mm/memory.c | 40 +++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) --- a/mm/memory.c~mm-dont-skip-swap-entry-even-if-zap_details-specified +++ a/mm/memory.c @@ -1313,6 +1313,17 @@ struct zap_details { struct folio *single_folio; /* Locked folio to be unmapped */ }; +/* Whether we should zap all COWed (private) pages too */ +static inline bool should_zap_cows(struct zap_details *details) +{ + /* By default, zap all pages */ + if (!details) + return true; + + /* Or, we zap COWed pages only if the caller wants to */ + return !details->zap_mapping; +} + /* * We set details->zap_mapping when we want to unmap shared but keep private * pages. Return true if skip zapping this page, false otherwise. @@ -1320,11 +1331,15 @@ struct zap_details { static inline bool zap_skip_check_mapping(struct zap_details *details, struct page *page) { - if (!details || !page) + /* If we can make a decision without *page.. */ + if (should_zap_cows(details)) + return false; + + /* E.g.
From patchwork Tue Mar 22 21:42:18 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:18 -0700
Subject: [patch 074/227] mm: rename zap_skip_check_mapping() to should_zap_page()
Message-Id: <20220322214218.BA2AEC340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Peter Xu
Subject: mm: rename zap_skip_check_mapping() to should_zap_page()

The previous name runs against the natural way people think.  Invert the
meaning and the return value accordingly.  No functional change intended.

Link: https://lkml.kernel.org/r/20220216094810.60572-3-peterx@redhat.com
Signed-off-by: Peter Xu
Suggested-by: David Hildenbrand
Suggested-by: Hugh Dickins
Reviewed-by: David Hildenbrand
Reviewed-by: John Hubbard
Cc: Alistair Popple
Cc: Andrea Arcangeli
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Vlastimil Babka
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 mm/memory.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/mm/memory.c~mm-rename-zap_skip_check_mapping-to-should_zap_page
+++ a/mm/memory.c
@@ -1326,20 +1326,19 @@ static inline bool should_zap_cows(struc
 
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
- * pages. Return true if skip zapping this page, false otherwise.
+ * pages. Return true if we should zap this page, false otherwise.
  */
-static inline bool
-zap_skip_check_mapping(struct zap_details *details, struct page *page)
+static inline bool should_zap_page(struct zap_details *details, struct page *page)
 {
 	/* If we can make a decision without *page.. */
 	if (should_zap_cows(details))
-		return false;
+		return true;
 
 	/* E.g. the caller passes NULL for the case of a zero page */
 	if (!page)
-		return false;
+		return true;
 
-	return details->zap_mapping != page_rmapping(page);
+	return details->zap_mapping == page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1374,7 +1373,7 @@ again:
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(zap_skip_check_mapping(details, page)))
+			if (unlikely(!should_zap_page(details, page)))
 				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
@@ -1408,7 +1407,7 @@ again:
 		    is_device_exclusive_entry(entry)) {
 			struct page *page = pfn_swap_entry_to_page(entry);
 
-			if (unlikely(zap_skip_check_mapping(details, page)))
+			if (unlikely(!should_zap_page(details, page)))
 				continue;
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
@@ -1429,7 +1428,7 @@ again:
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
-			if (zap_skip_check_mapping(details, page))
+			if (!should_zap_page(details, page))
 				continue;
 			rss[mm_counter(page)]--;
 		} else if (is_hwpoison_entry(entry)) {
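Since the rename also inverts the return value, every branch flips;
condensing both versions to boolean expressions makes it easy to check
that the inversion is faithful (schematic C, not the kernel source):

===8<===
/* old: "should we SKIP zapping this page?" */
skip = details && details->zap_mapping && page &&
       details->zap_mapping != page_rmapping(page);

/* new: "should we ZAP this page?" */
zap  = !details || !details->zap_mapping || !page ||
       details->zap_mapping == page_rmapping(page);

/* zap == !skip for every input, so call sites simply negate:
 *	if (unlikely(!should_zap_page(details, page)))
 *		continue;
 */
===8<===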
From patchwork Tue Mar 22 21:42:21 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:21 -0700
Subject: [patch 075/227] mm: change zap_details.zap_mapping into even_cows
Message-Id: <20220322214221.D2EEDC340F4@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Peter Xu
Subject: mm: change zap_details.zap_mapping into even_cows

Currently we maintain a zap_mapping pointer in zap_details; when it is
specified, we only want to zap the pages that have the same mapping as
the one the caller specified.

But what we want to do is actually simpler: we want to skip zapping
private (COW-ed) pages in some cases.  We can refer to the
unmap_mapping_pages() callers, where we could have passed in different
even_cows values.  The other user is unmap_mapping_folio(), where we
always want to skip private pages.

According to Hugh, we used a mapping pointer for historical reasons, as
explained here:

  https://lore.kernel.org/lkml/391aa58d-ce84-9d4-d68d-d98a9c533255@google.com/

Quoting partly from Hugh:

  Which raises the question again of why I did not just use a boolean
  flag there originally: aah, I think I've found why.  In those days
  there was a horrible "optimization", for better performance on some
  benchmark I guess, which when you read from /dev/zero into a private
  mapping, would map the zero page there (look up read_zero_pagealigned()
  and zeromap_page_range() if you dare).  So there was another category
  of page to be skipped along with the anon COWs, and I didn't want
  multiple tests in the zap loop, so checking check_mapping against
  page->mapping did both.  I think nowadays you could do it by checking
  for PageAnon page (or genuine swap entry) instead.

This patch replaces the zap_details.zap_mapping pointer with the
even_cows boolean, and we then check against PageAnon instead.

Link: https://lkml.kernel.org/r/20220216094810.60572-4-peterx@redhat.com
Signed-off-by: Peter Xu
Suggested-by: Hugh Dickins
Reviewed-by: John Hubbard
Cc: David Hildenbrand
Cc: Alistair Popple
Cc: Andrea Arcangeli
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Vlastimil Babka
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 mm/memory.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

--- a/mm/memory.c~mm-change-zap_detailszap_mapping-into-even_cows
+++ a/mm/memory.c
@@ -1309,8 +1309,8 @@ copy_page_range(struct vm_area_struct *d
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-	struct address_space *zap_mapping;	/* Check page->mapping if set */
 	struct folio *single_folio;	/* Locked folio to be unmapped */
+	bool even_cows;			/* Zap COWed private pages too? */
 };
 
 /* Whether we should zap all COWed (private) pages too */
@@ -1321,13 +1321,10 @@ static inline bool should_zap_cows(struc
 		return true;
 
 	/* Or, we zap COWed pages only if the caller wants to */
-	return !details->zap_mapping;
+	return details->even_cows;
 }
 
-/*
- * We set details->zap_mapping when we want to unmap shared but keep private
- * pages. Return true if we should zap this page, false otherwise.
- */
+/* Decides whether we should zap this page with the page pointer specified */
 static inline bool should_zap_page(struct zap_details *details, struct page *page)
 {
 	/* If we can make a decision without *page.. */
@@ -1338,7 +1335,8 @@ static inline bool should_zap_page(struc
 	if (!page)
 		return true;
 
-	return details->zap_mapping == page_rmapping(page);
+	/* Otherwise we should only zap non-anon pages */
+	return !PageAnon(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -3398,7 +3396,7 @@ void unmap_mapping_folio(struct folio *f
 	first_index = folio->index;
 	last_index = folio->index + folio_nr_pages(folio) - 1;
 
-	details.zap_mapping = mapping;
+	details.even_cows = false;
 	details.single_folio = folio;
 
 	i_mmap_lock_write(mapping);
@@ -3427,7 +3425,7 @@ void unmap_mapping_pages(struct address_
 	pgoff_t first_index = start;
 	pgoff_t last_index = start + nr - 1;
 
-	details.zap_mapping = even_cows ? NULL : mapping;
+	details.even_cows = even_cows;
 
 	if (last_index < first_index)
 		last_index = ULONG_MAX;
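From a caller's point of view the boolean reads much more directly than
the old mapping test; an illustrative pair of call sites (a sketch only,
with argument values following the commit text rather than a particular
in-tree caller):

===8<===
/* Truncation: the file data is gone, so private COW copies of it
 * must go too. */
unmap_mapping_pages(mapping, first, nr, /* even_cows = */ true);

/* Cache invalidation: only drop ptes still backed by the file;
 * anonymous COW copies keep their private data. */
unmap_mapping_pages(mapping, first, nr, /* even_cows = */ false);
===8<===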
From patchwork Tue Mar 22 21:42:24 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:24 -0700
Subject: [patch 076/227] mm: rework swap handling of zap_pte_range
Message-Id: <20220322214224.E25CFC340F4@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Peter Xu
Subject: mm: rework swap handling of zap_pte_range

Clean the code up by merging the device private/exclusive swap entry
handling with the rest, then merging the pte clear operation too.

The struct page pointer is defined in multiple places in the function;
move it upward.

free_swap_and_cache() is only useful for the !non_swap_entry() case, so
move it into that branch.

No functional change intended.

Link: https://lkml.kernel.org/r/20220216094810.60572-5-peterx@redhat.com
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
Cc: David Hildenbrand
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Yang Shi
Cc: Andrea Arcangeli
Cc: Alistair Popple
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
---

 mm/memory.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

--- a/mm/memory.c~mm-rework-swap-handling-of-zap_pte_range
+++ a/mm/memory.c
@@ -1361,6 +1361,8 @@ again:
 	arch_enter_lazy_mmu_mode();
 	do {
 		pte_t ptent = *pte;
+		struct page *page;
+
 		if (pte_none(ptent))
 			continue;
 
@@ -1368,8 +1370,6 @@ again:
 			break;
 
 		if (pte_present(ptent)) {
-			struct page *page;
-
 			page = vm_normal_page(vma, addr, ptent);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
@@ -1403,28 +1403,21 @@ again:
 		entry = pte_to_swp_entry(ptent);
 		if (is_device_private_entry(entry) ||
 		    is_device_exclusive_entry(entry)) {
-			struct page *page = pfn_swap_entry_to_page(entry);
-
+			page = pfn_swap_entry_to_page(entry);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
-			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
-
 			if (is_device_private_entry(entry))
 				page_remove_rmap(page, false);
-
 			put_page(page);
-			continue;
-		}
-
-		if (!non_swap_entry(entry)) {
+		} else if (!non_swap_entry(entry)) {
 			/* Genuine swap entry, hence a private anon page */
 			if (!should_zap_cows(details))
 				continue;
 			rss[MM_SWAPENTS]--;
+			if (unlikely(!free_swap_and_cache(entry)))
+				print_bad_pte(vma, addr, ptent, NULL);
 		} else if (is_migration_entry(entry)) {
-			struct page *page;
-
 			page = pfn_swap_entry_to_page(entry);
 			if (!should_zap_page(details, page))
 				continue;
@@ -1436,8 +1429,6 @@ again:
 			/* We should have covered all the swap entry types */
 			WARN_ON_ONCE(1);
 		}
-		if (unlikely(!free_swap_and_cache(entry)))
-			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
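After the rework, the non-present side of the loop is one if/else ladder
over the entry type, with a single pte clear at the end; in outline
(condensed pseudo-code of the patched function, not a compilable
excerpt):

===8<===
entry = pte_to_swp_entry(ptent);
if (device private/exclusive entry) {
	/* should_zap_page() check; drop rmap/refcount on the page */
} else if (!non_swap_entry(entry)) {
	/* genuine swap: should_zap_cows() check, MM_SWAPENTS--,
	   free_swap_and_cache() now lives here */
} else if (migration entry) {
	/* should_zap_page() check; drop the rss counter */
} else if (hwpoison entry) {
	/* should_zap_cows() check */
} else {
	WARN_ON_ONCE(1);	/* unknown swap entry type */
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
===8<===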
From patchwork Tue Mar 22 21:42:27 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:27 -0700
Subject: [patch 077/227] mm/mmap: return 1 from stack_guard_gap __setup() handler
Message-Id: <20220322214227.D8AADC340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Randy Dunlap
Subject: mm/mmap: return 1 from stack_guard_gap __setup() handler

__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; it just pollutes init's
environment).  This prevents:

  Unknown kernel command line parameters \
  "BOOT_IMAGE=/boot/bzImage-517rc5 stack_guard_gap=100", will be \
  passed to user space.

  Run /sbin/init as init process
    with arguments:
      /sbin/init
    with environment:
      HOME=/
      TERM=linux
      BOOT_IMAGE=/boot/bzImage-517rc5
      stack_guard_gap=100

Return 1 to indicate that the boot option has been handled.

Note that there is no warning message if someone enters
stack_guard_gap=anything_invalid; 'val' and stack_guard_gap are then both
set to 0, due to the use of simple_strtoul().  This could be improved by
using kstrtoxxx() and checking for an error.

It appears that having stack_guard_gap == 0 is valid (if unexpected),
since using "stack_guard_gap=0" on the kernel command line does that.

Link: https://lkml.kernel.org/r/20220222005817.11087-1-rdunlap@infradead.org
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Fixes: 1be7107fbe18e ("mm: larger stack guard gap, between vmas")
Signed-off-by: Randy Dunlap
Reported-by: Igor Zhbanov
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
---

 mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/mmap.c~mm-mmap-return-1-from-stack_guard_gap-__setup-handler
+++ a/mm/mmap.c
@@ -2557,7 +2557,7 @@ static int __init cmdline_parse_stack_gu
 	if (!*endptr)
 		stack_guard_gap = val << PAGE_SHIFT;
 
-	return 0;
+	return 1;
 }
 __setup("stack_guard_gap=", cmdline_parse_stack_guard_gap);
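The stricter parsing the changelog suggests would look roughly like this;
a sketch only, since the patch itself deliberately changes nothing but
the return value (kstrtoul() rejects trailing garbage where
simple_strtoul() silently stops):

===8<===
static int __init cmdline_parse_stack_guard_gap(char *p)
{
	unsigned long val;

	if (kstrtoul(p, 10, &val))
		pr_warn("stack_guard_gap: ignoring invalid value '%s'\n", p);
	else
		stack_guard_gap = val << PAGE_SHIFT;

	return 1;	/* handled: keep it out of init's environment */
}
__setup("stack_guard_gap=", cmdline_parse_stack_guard_gap);
===8<===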
From patchwork Tue Mar 22 21:42:30 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:30 -0700
Subject: [patch 078/227] mm/memory.c: use helper function range_in_vma()
Message-Id: <20220322214230.CB061C340EC@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Miaohe Lin
Subject: mm/memory.c: use helper function range_in_vma()

Use the helper function range_in_vma() to check whether address and
address + size lie within the vma's range.  Minor readability
improvement.

Link: https://lkml.kernel.org/r/20220219021441.29173-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Signed-off-by: Andrew Morton
---

 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memory.c~mm-use-helper-function-range_in_vma
+++ a/mm/memory.c
@@ -1715,7 +1715,7 @@ static void zap_page_range_single(struct
 void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size)
 {
-	if (address < vma->vm_start || address + size > vma->vm_end ||
+	if (!range_in_vma(vma, address, address + size) ||
 			!(vma->vm_flags & VM_PFNMAP))
 		return;
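For reference, the helper being adopted is defined in include/linux/mm.h
along these lines, which makes the equivalence with the open-coded check
easy to verify:

===8<===
static inline bool range_in_vma(struct vm_area_struct *vma,
				unsigned long start, unsigned long end)
{
	return (vma && vma->vm_start <= start && end <= vma->vm_end);
}
===8<===

This is the negation of the old open-coded condition, plus a NULL check
that the caller here already guarantees, which is why the substitution
is safe.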
From patchwork Tue Mar 22 21:42:33 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:33 -0700
Subject: [patch 079/227] mm/memory.c: use helper macro min and max in unmap_mapping_range_tree()
Message-Id: <20220322214233.BE339C340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Miaohe Lin
Subject: mm/memory.c: use helper macro min and max in unmap_mapping_range_tree()

Use the helper macros min and max to simplify the code logic.  Minor
readability improvement.

Link: https://lkml.kernel.org/r/20220224121134.35068-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Signed-off-by: Andrew Morton
---

 mm/memory.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/mm/memory.c~mm-use-helper-macro-min-and-max-in-unmap_mapping_range_tree
+++ a/mm/memory.c
@@ -3350,12 +3350,8 @@ static inline void unmap_mapping_range_t
 	vma_interval_tree_foreach(vma, root, first_index, last_index) {
 		vba = vma->vm_pgoff;
 		vea = vba + vma_pages(vma) - 1;
-		zba = first_index;
-		if (zba < vba)
-			zba = vba;
-		zea = last_index;
-		if (zea > vea)
-			zea = vea;
+		zba = max(first_index, vba);
+		zea = min(last_index, vea);
 
 		unmap_mapping_range_vma(vma,
 			((zba - vba) << PAGE_SHIFT) + vma->vm_start,
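The clamping is a plain interval intersection between the range being
unmapped and the file pages the vma covers; a worked example with
hypothetical numbers:

===8<===
vba = 100, vea = 199	/* vma maps file pages 100..199   */
first_index = 150	/* range to unmap: pages 150..400 */
last_index  = 400

zba = max(first_index, vba) = max(150, 100) = 150
zea = min(last_index,  vea) = min(400, 199) = 199

/* => zap file pages 150..199 within this vma */
===8<===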
From patchwork Tue Mar 22 21:42:36 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:36 -0700
Subject: [patch 080/227] mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK
Message-Id: <20220322214236.B221EC340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Hugh Dickins
Subject: mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK

_install_special_mapping() adds the VM_SPECIAL bit VM_DONTEXPAND (and
never attempts to update locked_vm), so it ought to be consistent with
mmap_region() and mlock_fixup(), making sure not to add VM_LOCKED or
VM_LOCKONFAULT.  I doubt that this fixes any problem in practice: just do
it for consistency.

Link: https://lkml.kernel.org/r/a85315a9-21d1-6133-c5fc-c89863dfb25b@google.com
Signed-off-by: Hugh Dickins
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
---

 mm/mmap.c | 1 +
 1 file changed, 1 insertion(+)

--- a/mm/mmap.c~mm-_install_special_mapping-apply-vm_locked_clear_mask
+++ a/mm/mmap.c
@@ -3448,6 +3448,7 @@ static struct vm_area_struct *__install_
 	vma->vm_end = addr + len;
 
 	vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
+	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
 	vma->vm_ops = ops;
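For reference, the mask is defined in include/linux/mm.h as roughly
"#define VM_LOCKED_CLEAR_MASK (~(VM_LOCKED | VM_LOCKONFAULT))", so the
added line is shorthand for:

===8<===
vma->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
===8<===

In other words, whatever the caller passed in vm_flags or mm->def_flags,
the special mapping never ends up mlock-ed.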
From patchwork Tue Mar 22 21:42:39 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:39 -0700
Subject: [patch 081/227] mm/mmap: remove obsolete comment in ksys_mmap_pgoff
Message-Id: <20220322214239.A11A7C340F3@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Miaohe Lin
Subject: mm/mmap: remove obsolete comment in ksys_mmap_pgoff

RLIMIT_MEMLOCK is now reimplemented on top of ucounts, and since commit
83c1fd763b32 ("mm,hugetlb: remove mlock ulimit for SHM_HUGETLB"), the
mlock ulimit for SHM_HUGETLB has been removed as well.  So remove this
obsolete comment.

Link: https://lkml.kernel.org/r/20220309090623.13036-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Signed-off-by: Andrew Morton
---

 mm/mmap.c | 2 --
 1 file changed, 2 deletions(-)

--- a/mm/mmap.c~mm-mmap-remove-obsolete-comment-in-ksys_mmap_pgoff
+++ a/mm/mmap.c
@@ -1616,8 +1616,6 @@ unsigned long ksys_mmap_pgoff(unsigned l
 		/*
 		 * VM_NORESERVE is used because the reservations will be
 		 * taken when vm_ops->mmap() is called
-		 * A dummy user value is used because we are not locking
-		 * memory so no accounting is necessary
 		 */
 		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
 				VM_NORESERVE,

From patchwork Tue Mar 22 21:42:41 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:41 -0700
Subject: [patch 082/227] mm/mremap:: use vma_lookup() instead of find_vma()
Message-Id: <20220322214242.906AFC340EC@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Miaohe Lin
Subject: mm/mremap:: use vma_lookup() instead of find_vma()

Using vma_lookup() verifies that the address is contained in the found
vma.  This results in easier-to-read code.

Link: https://lkml.kernel.org/r/20220312083118.48284-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Reviewed-by: Andrew Morton
Reviewed-by: David Hildenbrand
Signed-off-by: Andrew Morton
---

 mm/mremap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/mremap.c~mm-mremap-use-vma_lookup-instead-of-find_vma
+++ a/mm/mremap.c
@@ -942,8 +942,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, a
 	if (mmap_write_lock_killable(current->mm))
 		return -EINTR;
 
-	vma = find_vma(mm, addr);
-	if (!vma || vma->vm_start > addr) {
+	vma = vma_lookup(mm, addr);
+	if (!vma) {
 		ret = EFAULT;
 		goto out;
 	}
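For reference, vma_lookup() wraps find_vma() and rejects the case where
the returned vma starts above the address, which is exactly the
open-coded test being removed; in include/linux/mm.h it reads
approximately:

===8<===
static inline
struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma = find_vma(mm, addr);

	if (vma && addr < vma->vm_start)
		vma = NULL;

	return vma;
}
===8<===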
From patchwork Tue Mar 22 21:42:44 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:44 -0700
Subject: [patch 083/227] mm/sparse: make mminit_validate_memmodel_limits() static
Message-Id: <20220322214245.7E61AC340EE@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Miaohe Lin
Subject: mm/sparse: make mminit_validate_memmodel_limits() static

It's only used in sparse.c now, so we can make it static and further
clean up the relevant code.

Link: https://lkml.kernel.org/r/20220127093221.63524-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Reviewed-by: Mike Rapoport
Signed-off-by: Andrew Morton
---

 mm/internal.h | 11 -----------
 mm/sparse.c   |  2 +-
 2 files changed, 1 insertion(+), 12 deletions(-)

--- a/mm/internal.h~mm-sparse-make-mminit_validate_memmodel_limits-static
+++ a/mm/internal.h
@@ -572,17 +572,6 @@ static inline void mminit_verify_zonelis
 }
 #endif /* CONFIG_DEBUG_MEMORY_INIT */
 
-/* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
-#if defined(CONFIG_SPARSEMEM)
-extern void mminit_validate_memmodel_limits(unsigned long *start_pfn,
-		unsigned long *end_pfn);
-#else
-static inline void mminit_validate_memmodel_limits(unsigned long *start_pfn,
-		unsigned long *end_pfn)
-{
-}
-#endif /* CONFIG_SPARSEMEM */
-
 #define NODE_RECLAIM_NOSCAN	-2
 #define NODE_RECLAIM_FULL	-1
 #define NODE_RECLAIM_SOME	0
--- a/mm/sparse.c~mm-sparse-make-mminit_validate_memmodel_limits-static
+++ a/mm/sparse.c
@@ -126,7 +126,7 @@ static inline int sparse_early_nid(struc
 }
 
 /* Validate the physical addressing limitations of the model */
-void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
+static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
 					unsigned long *end_pfn)
 {
 	unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
From patchwork Tue Mar 22 21:42:47 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:47 -0700
Subject: [patch 084/227] mm/vmalloc: remove unneeded function forward declaration
Message-Id: <20220322214248.69587C340EC@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>

From: Miaohe Lin
Subject: mm/vmalloc: remove unneeded function forward declaration

The forward declaration for lazy_max_pages() is unnecessary.  Remove it.

Link: https://lkml.kernel.org/r/20220124133752.60663-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
---

 mm/vmalloc.c | 1 -
 1 file changed, 1 deletion(-)

--- a/mm/vmalloc.c~mm-vmalloc-remove-unneeded-function-forward-declaration
+++ a/mm/vmalloc.c
@@ -791,7 +791,6 @@ RB_DECLARE_CALLBACKS_MAX(static, free_vm
 
 static void purge_vmap_area_lazy(void);
 static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
-static unsigned long lazy_max_pages(void);
 static atomic_long_t nr_vmalloc_pages;

From patchwork Tue Mar 22 21:42:50 2022
From: Andrew Morton
Date: Tue, 22 Mar 2022 14:42:50 -0700
Subject: [patch 085/227] mm/vmalloc: Move draining areas out of caller context
Message-Id: <20220322214251.7163FC340EC@smtp.kernel.org>
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
From patchwork Tue Mar 22 21:42:50 2022
X-Patchwork-Id: 12789125
Date: Tue, 22 Mar 2022 14:42:50 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 085/227] mm/vmalloc: Move draining areas out of caller context
Message-Id: <20220322214251.7163FC340EC@smtp.kernel.org>

From: "Uladzislau Rezki (Sony)"
Subject: mm/vmalloc: Move draining areas out of caller context

A caller initiates the drain process from its own context once the drain
threshold is reached or exceeded.  There are at least two drawbacks to
doing so:

a) The caller can be a high-priority or RT task.  In that case it can get
   stuck doing the actual drain of all lazily freed areas.  This is not
   optimal because such tasks are usually latency sensitive, and control
   should be returned to them as soon as possible so they can drive their
   workloads in time.  See 96e2db456135 ("mm/vmalloc: rework the drain
   logic").

b) It is not safe to call vfree() while holding a spinlock, due to the
   vmap_purge_lock mutex.  There was a report about this from Zeal Robot
   here: https://lore.kernel.org/all/20211222081026.484058-1-chi.minghao@zte.com.cn

Moving the drain to a separate work context addresses both issues.

v1->v2: - Added the "_work" suffix to the drain worker function.
v2->v3: - Removed drain_vmap_work_in_progress.  Extra queueing is expected
          under heavy load, but it can be disregarded because the work
          will bail out if there is nothing to be done.

Link: https://lkml.kernel.org/r/20220131144058.35608-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Nicholas Piggin
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Cc: Vasily Averin
Signed-off-by: Andrew Morton
---

 mm/vmalloc.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-move-draining-areas-out-of-caller-context
+++ a/mm/vmalloc.c
@@ -791,6 +791,8 @@ RB_DECLARE_CALLBACKS_MAX(static, free_vm
 static void purge_vmap_area_lazy(void);
 static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
+static void drain_vmap_area_work(struct work_struct *work);
+static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);

 static atomic_long_t nr_vmalloc_pages;

@@ -1718,18 +1720,6 @@ static bool __purge_vmap_area_lazy(unsig
 }

 /*
- * Kick off a purge of the outstanding lazy areas. Don't bother if somebody
- * is already purging.
- */
-static void try_purge_vmap_area_lazy(void)
-{
-    if (mutex_trylock(&vmap_purge_lock)) {
-        __purge_vmap_area_lazy(ULONG_MAX, 0);
-        mutex_unlock(&vmap_purge_lock);
-    }
-}
-
-/*
  * Kick off a purge of the outstanding lazy areas.
  */
 static void purge_vmap_area_lazy(void)
@@ -1740,6 +1730,20 @@ static void purge_vmap_area_lazy(void)
     mutex_unlock(&vmap_purge_lock);
 }

+static void drain_vmap_area_work(struct work_struct *work)
+{
+    unsigned long nr_lazy;
+
+    do {
+        mutex_lock(&vmap_purge_lock);
+        __purge_vmap_area_lazy(ULONG_MAX, 0);
+        mutex_unlock(&vmap_purge_lock);
+
+        /* Recheck if further work is required. */
+        nr_lazy = atomic_long_read(&vmap_lazy_nr);
+    } while (nr_lazy > lazy_max_pages());
+}
+
 /*
  * Free a vmap area, caller ensuring that the area has been unmapped
  * and flush_cache_vunmap had been called for the correct range
@@ -1766,7 +1770,7 @@ static void free_vmap_area_noflush(struc

     /* After this point, we may free va at any time */
     if (unlikely(nr_lazy > lazy_max_pages()))
-        try_purge_vmap_area_lazy();
+        schedule_work(&drain_vmap_work);
 }

 /*
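The mechanics of the deferral, reduced to a minimal sketch (the names below are hypothetical, not the ones the patch adds): the hot path only queues work, and scheduling a work item that is already pending is a no-op, which is why the v3 revision could drop its in-progress flag.

    #include <linux/workqueue.h>
    #include <linux/atomic.h>

    #define LAZY_THRESHOLD  1024            /* hypothetical drain threshold */

    static atomic_long_t nr_lazy_items;     /* hypothetical counter */

    static void my_drain_work(struct work_struct *work)
    {
            do {
                    /* ... drain one batch under the purge mutex ... */
            } while (atomic_long_read(&nr_lazy_items) > LAZY_THRESHOLD);
    }
    static DECLARE_WORK(my_drain, my_drain_work);

    /* Hot path: may run as an RT task or while a spinlock is held. */
    static void note_lazy_free(void)
    {
            if (atomic_long_inc_return(&nr_lazy_items) > LAZY_THRESHOLD)
                    schedule_work(&my_drain);   /* no-op if already queued */
    }

The caller never blocks on the drain itself; at worst it pays for one atomic increment and one schedule_work() call.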
From patchwork Tue Mar 22 21:42:53 2022
X-Patchwork-Id: 12789126
Date: Tue, 22 Mar 2022 14:42:53 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 086/227] mm/vmalloc: add adjust_search_size parameter
Message-Id: <20220322214254.7C278C340EE@smtp.kernel.org>

From: Uladzislau Rezki
Subject: mm/vmalloc: add adjust_search_size parameter

Extend the find_vmap_lowest_match() function with one more parameter: an
"adjust_search_size" boolean that makes it possible to control the
accuracy of the search when a specific alignment is required.

With this patch the search size is normally adjusted so that a request is
served as fast as possible, for performance reasons.  There is one
exception though: short ranges, where the requested size exactly matches
the passed vstart/vend restriction together with a specific alignment
request.  In such a scenario the adjustment would not lead to a
successful allocation.

Link: https://lkml.kernel.org/r/20220119143540.601149-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki
Signed-off-by: Uladzislau Rezki (Sony)
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Nicholas Piggin
Cc: Oleksiy Avramchenko
Cc: Vasily Averin
Signed-off-by: Andrew Morton
---

 mm/vmalloc.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-add-adjust_search_size-parameter
+++ a/mm/vmalloc.c
@@ -1189,22 +1189,28 @@ is_within_this_va(struct vmap_area *va,

 /*
  * Find the first free block(lowest start address) in the tree,
  * that will accomplish the request corresponding to passing
- * parameters.
+ * parameters. Please note, with an alignment bigger than PAGE_SIZE,
+ * a search length is adjusted to account for worst case alignment
+ * overhead.
  */
 static __always_inline struct vmap_area *
-find_vmap_lowest_match(unsigned long size,
-    unsigned long align, unsigned long vstart)
+find_vmap_lowest_match(unsigned long size, unsigned long align,
+    unsigned long vstart, bool adjust_search_size)
 {
     struct vmap_area *va;
     struct rb_node *node;
+    unsigned long length;

     /* Start from the root. */
     node = free_vmap_area_root.rb_node;

+    /* Adjust the search size for alignment overhead. */
+    length = adjust_search_size ? size + align - 1 : size;
+
     while (node) {
         va = rb_entry(node, struct vmap_area, rb_node);

-        if (get_subtree_max_size(node->rb_left) >= size &&
+        if (get_subtree_max_size(node->rb_left) >= length &&
             vstart < va->va_start) {
             node = node->rb_left;
         } else {
@@ -1214,9 +1220,9 @@ find_vmap_lowest_match(unsigned long siz
             /*
              * Does not make sense to go deeper towards the right
              * sub-tree if it does not have a free block that is
-             * equal or bigger to the requested search size.
+             * equal or bigger to the requested search length.
              */
-            if (get_subtree_max_size(node->rb_right) >= size) {
+            if (get_subtree_max_size(node->rb_right) >= length) {
                 node = node->rb_right;
                 continue;
             }
@@ -1232,7 +1238,7 @@ find_vmap_lowest_match(unsigned long siz
             if (is_within_this_va(va, size, align, vstart))
                 return va;

-            if (get_subtree_max_size(node->rb_right) >= size &&
+            if (get_subtree_max_size(node->rb_right) >= length &&
                 vstart <= va->va_start) {
                 /*
                  * Shift the vstart forward. Please note, we update it with
@@ -1280,7 +1286,7 @@ find_vmap_lowest_match_check(unsigned lo
     get_random_bytes(&rnd, sizeof(rnd));
     vstart = VMALLOC_START + rnd;

-    va_1 = find_vmap_lowest_match(size, align, vstart);
+    va_1 = find_vmap_lowest_match(size, align, vstart, false);
     va_2 = find_vmap_lowest_linear_match(size, align, vstart);

     if (va_1 != va_2)
@@ -1431,12 +1437,25 @@ static __always_inline unsigned long
 __alloc_vmap_area(unsigned long size, unsigned long align,
     unsigned long vstart, unsigned long vend)
 {
+    bool adjust_search_size = true;
     unsigned long nva_start_addr;
     struct vmap_area *va;
     enum fit_type type;
     int ret;

-    va = find_vmap_lowest_match(size, align, vstart);
+    /*
+     * Do not adjust when:
+     *   a) align <= PAGE_SIZE, because it does not make any sense.
+     *      All blocks(their start addresses) are at least PAGE_SIZE
+     *      aligned anyway;
+     *   b) a short range where a requested size corresponds to exactly
+     *      specified [vstart:vend] interval and an alignment > PAGE_SIZE.
+     *      With adjusted search length an allocation would not succeed.
+     */
+    if (align <= PAGE_SIZE || (align > PAGE_SIZE && (vend - vstart) == size))
+        adjust_search_size = false;
+
+    va = find_vmap_lowest_match(size, align, vstart, adjust_search_size);
     if (unlikely(!va))
         return vend;
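The size + align - 1 adjustment covers the worst case: a free block may begin just past an aligned boundary, losing up to align - 1 bytes to alignment before a size-byte allocation can start. A standalone sketch of the arithmetic (plain user-space C, not kernel code):

    #include <assert.h>

    /* Round addr up to the next multiple of align (a power of two). */
    static unsigned long align_up(unsigned long addr, unsigned long align)
    {
            return (addr + align - 1) & ~(align - 1);
    }

    int main(void)
    {
            unsigned long align = 64, size = 100;
            /* Worst case: the block starts one byte past a boundary. */
            unsigned long start = 1;
            unsigned long waste = align_up(start, align) - start; /* 63 */

            /* A block of size + align - 1 bytes always suffices. */
            assert(waste + size <= size + align - 1);
            return 0;
    }

This is also why the adjustment must be skipped when vend - vstart == size with align > PAGE_SIZE: such a range has no slack at all, yet may still happen to be exactly aligned.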
From patchwork Tue Mar 22 21:42:56 2022
X-Patchwork-Id: 12789127
Date: Tue, 22 Mar 2022 14:42:56 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 087/227] mm/vmalloc: eliminate an extra orig_gfp_mask
Message-Id: <20220322214257.7B129C340EC@smtp.kernel.org>

From: "Uladzislau Rezki (Sony)"
Subject: mm/vmalloc: eliminate an extra orig_gfp_mask

The extra variable was introduced just to keep the originally passed
gfp_mask, because gfp_mask was updated with __GFP_NOWARN on entry and the
error-handling messages were therefore broken.

Instead, keep the original gfp_mask unmodified and pass __GFP_NOWARN
OR-ed together with gfp_mask as a parameter to the vm_area_alloc_pages()
function.  This makes things less confusing.

Link: https://lkml.kernel.org/r/20220119143540.601149-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Cc: Vasily Averin
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Nicholas Piggin
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
---

 mm/vmalloc.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-eliminate-an-extra-orig_gfp_mask
+++ a/mm/vmalloc.c
@@ -2946,7 +2946,6 @@ static void *__vmalloc_area_node(struct
         int node)
 {
     const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-    const gfp_t orig_gfp_mask = gfp_mask;
     bool nofail = gfp_mask & __GFP_NOFAIL;
     unsigned long addr = (unsigned long)area->addr;
     unsigned long size = get_vm_area_size(area);
@@ -2970,7 +2969,7 @@ static void *__vmalloc_area_node(struct
     }

     if (!area->pages) {
-        warn_alloc(orig_gfp_mask, NULL,
+        warn_alloc(gfp_mask, NULL,
             "vmalloc error: size %lu, failed to allocated page array size %lu",
             nr_small_pages * PAGE_SIZE, array_size);
         free_vm_area(area);
@@ -2980,8 +2979,8 @@ static void *__vmalloc_area_node(struct
     set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
     page_order = vm_area_page_order(area);

-    area->nr_pages = vm_area_alloc_pages(gfp_mask, node,
-        page_order, nr_small_pages, area->pages);
+    area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
+        node, page_order, nr_small_pages, area->pages);

     atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
     if (gfp_mask & __GFP_ACCOUNT) {
@@ -2997,7 +2996,7 @@ static void *__vmalloc_area_node(struct
      * allocation request, free them via __vfree() if any.
      */
     if (area->nr_pages != nr_small_pages) {
-        warn_alloc(orig_gfp_mask, NULL,
+        warn_alloc(gfp_mask, NULL,
             "vmalloc error: size %lu, page order %u, failed to allocate pages",
             area->nr_pages * PAGE_SIZE, page_order);
         goto fail;
     }
@@ -3025,7 +3024,7 @@ static void *__vmalloc_area_node(struct
     memalloc_noio_restore(flags);

     if (ret < 0) {
-        warn_alloc(orig_gfp_mask, NULL,
+        warn_alloc(gfp_mask, NULL,
             "vmalloc error: size %lu, failed to map pages",
             area->nr_pages * PAGE_SIZE);
         goto fail;
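The shape of the fix, as a minimal sketch (my_alloc_pages() is a hypothetical wrapper; warn_alloc() and the gfp flags are the real kernel symbols): the caller's mask stays pristine for diagnostics, and __GFP_NOWARN is OR-ed in only at the one call site the noise would come from.

    static unsigned int my_alloc_pages(gfp_t gfp, unsigned int nr); /* hypothetical */

    static void area_alloc(gfp_t gfp_mask, unsigned int nr_pages)
    {
            /* Suppress per-page warnings; never write the OR back into gfp_mask. */
            unsigned int got = my_alloc_pages(gfp_mask | __GFP_NOWARN, nr_pages);

            if (got != nr_pages)
                    /* Diagnostics report the mask exactly as the caller passed it. */
                    warn_alloc(gfp_mask, NULL,
                               "vmalloc error: failed to allocate pages");
    }

Compared with the old code, there is no shadow copy of the mask to keep in sync, so the "which mask do I print here?" question disappears.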
From patchwork Tue Mar 22 21:42:59 2022
X-Patchwork-Id: 12789128
Date: Tue, 22 Mar 2022 14:42:59 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 088/227] mm/vmalloc.c: fix "unused function" warning
Message-Id: <20220322214300.675CAC340EC@smtp.kernel.org>

From: Jiapeng Chong
Subject: mm/vmalloc.c: fix "unused function" warning

compute_subtree_max_size() is only called from the
DEBUG_AUGMENT_PROPAGATE_CHECK code, so it is unused in other
configurations:

mm/vmalloc.c:785:1: warning: unused function 'compute_subtree_max_size' [-Wunused-function]

Move it under the same preprocessor guard as its single caller.

Link: https://lkml.kernel.org/r/20220129034652.75359-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jiapeng Chong
Reported-by: Abaci Robot
Signed-off-by: Andrew Morton
---

 mm/vmalloc.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- a/mm/vmalloc.c~mm-vmallocc-fix-unused-function-warning
+++ a/mm/vmalloc.c
@@ -775,17 +775,6 @@ get_subtree_max_size(struct rb_node *nod
     return va ? va->subtree_max_size : 0;
 }

-/*
- * Gets called when remove the node and rotate.
- */
-static __always_inline unsigned long
-compute_subtree_max_size(struct vmap_area *va)
-{
-    return max3(va_size(va),
-        get_subtree_max_size(va->rb_node.rb_left),
-        get_subtree_max_size(va->rb_node.rb_right));
-}
-
 RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
     struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)

@@ -973,6 +962,17 @@ unlink_va(struct vmap_area *va, struct r
 }

 #if DEBUG_AUGMENT_PROPAGATE_CHECK
+/*
+ * Gets called when remove the node and rotate.
+ */
+static __always_inline unsigned long
+compute_subtree_max_size(struct vmap_area *va)
+{
+    return max3(va_size(va),
+        get_subtree_max_size(va->rb_node.rb_left),
+        get_subtree_max_size(va->rb_node.rb_right));
+}
+
 static void
 augment_tree_propagate_check(void)
 {
From patchwork Tue Mar 22 21:43:02 2022
X-Patchwork-Id: 12789129
Date: Tue, 22 Mar 2022 14:43:02 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 089/227] mm/vmalloc: fix comments about vmap_area struct
Message-Id: <20220322214303.61053C340F2@smtp.kernel.org>

From: Bang Li
Subject: mm/vmalloc: fix comments about vmap_area struct

vmap_area_root is the root of the "busy" tree and free_vmap_area_root is
the root of the "free" tree; the comments had the two roots swapped.

Link: https://lkml.kernel.org/r/20220305011510.33596-1-libang.linuxer@gmail.com
Fixes: 688fcbfc06e4 ("mm/vmalloc: modify struct vmap_area to reduce its size")
Signed-off-by: Bang Li
Reviewed-by: Uladzislau Rezki (Sony)
Cc: Pengfei Li
Signed-off-by: Andrew Morton
---

 include/linux/vmalloc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/vmalloc.h~mm-vmalloc-fix-comments-about-vmap_area-struct
+++ a/include/linux/vmalloc.h
@@ -80,8 +80,8 @@ struct vmap_area {
     /*
      * The following two variables can be packed, because
      * a vmap_area object can be either:
-     * 1) in "free" tree (root is vmap_area_root)
-     * 2) or "busy" tree (root is free_vmap_area_root)
+     * 1) in "free" tree (root is free_vmap_area_root)
+     * 2) or "busy" tree (root is vmap_area_root)
      */
     union {
         unsigned long subtree_max_size; /* in "free" tree */
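For reference, the structure the corrected comment describes, condensed from include/linux/vmalloc.h of this kernel generation (some members omitted): which union member is meaningful depends on which tree currently holds the area, so getting the two roots right matters to anyone reading the lookup code.

    struct vmap_area {
            unsigned long va_start;
            unsigned long va_end;

            struct rb_node rb_node;         /* node in one of the two trees */
            struct list_head list;          /* address-sorted list */

            union {
                    unsigned long subtree_max_size; /* "free" tree, rooted at
                                                       free_vmap_area_root */
                    struct vm_struct *vm;           /* "busy" tree, rooted at
                                                       vmap_area_root */
            };
    };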
From patchwork Tue Mar 22 21:43:05 2022
X-Patchwork-Id: 12789130
Date: Tue, 22 Mar 2022 14:43:05 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 090/227] mm: page_alloc: avoid merging non-fallbackable pageblocks with others
Message-Id: <20220322214306.6CF5AC340F5@smtp.kernel.org>

From: Zi Yan
Subject: mm: page_alloc: avoid merging non-fallbackable pageblocks with others

This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance. 
It prepares for the upcoming removal of the MAX_ORDER-1 alignment
requirement for CMA and alloc_contig_range().

MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.

Remove MIGRATE_CMA and MIGRATE_ISOLATE from the fallbacks list, since
they are never used.
[1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/

Link: https://lkml.kernel.org/r/20220124175957.1261961-1-zi.yan@sent.com
Signed-off-by: Zi Yan
Acked-by: Mel Gorman
Acked-by: David Hildenbrand
Acked-by: Vlastimil Babka
Acked-by: Mike Rapoport
Reviewed-by: Oscar Salvador
Cc: Mike Rapoport
Signed-off-by: Andrew Morton
---

 include/linux/mmzone.h | 11 +++++++++
 mm/page_alloc.c        | 46 ++++++++++++++++++---------------------
 2 files changed, 33 insertions(+), 24 deletions(-)

--- a/include/linux/mmzone.h~mm-page_alloc-avoid-merging-non-fallbackable-pageblocks-with-others
+++ a/include/linux/mmzone.h
@@ -83,6 +83,17 @@ static inline bool is_migrate_movable(in
     return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
 }

+/*
+ * Check whether a migratetype can be merged with another migratetype.
+ *
+ * It is only mergeable when it can fall back to other migratetypes for
+ * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c.
+ */
+static inline bool migratetype_is_mergeable(int mt)
+{
+    return mt < MIGRATE_PCPTYPES;
+}
+
 #define for_each_migratetype_order(order, type) \
     for (order = 0; order < MAX_ORDER; order++) \
         for (type = 0; type < MIGRATE_TYPES; type++)

--- a/mm/page_alloc.c~mm-page_alloc-avoid-merging-non-fallbackable-pageblocks-with-others
+++ a/mm/page_alloc.c
@@ -1117,25 +1117,24 @@ continue_merging:
     }
     if (order < MAX_ORDER - 1) {
         /* If we are here, it means order is >= pageblock_order.
-         * We want to prevent merge between freepages on isolate
-         * pageblock and normal pageblock. Without this, pageblock
-         * isolation could cause incorrect freepage or CMA accounting.
+         * We want to prevent merge between freepages on pageblock
+         * without fallbacks and normal pageblock. Without this,
+         * pageblock isolation could cause incorrect freepage or CMA
+         * accounting or HIGHATOMIC accounting.
          *
          * We don't want to hit this code for the more frequent
         * low-order merging.
         */
-        if (unlikely(has_isolate_pageblock(zone))) {
-            int buddy_mt;
+        int buddy_mt;

-            buddy_pfn = __find_buddy_pfn(pfn, order);
-            buddy = page + (buddy_pfn - pfn);
-            buddy_mt = get_pageblock_migratetype(buddy);
-
-            if (migratetype != buddy_mt
-                    && (is_migrate_isolate(migratetype) ||
-                        is_migrate_isolate(buddy_mt)))
-                goto done_merging;
-        }
+        buddy_pfn = __find_buddy_pfn(pfn, order);
+        buddy = page + (buddy_pfn - pfn);
+        buddy_mt = get_pageblock_migratetype(buddy);
+
+        if (migratetype != buddy_mt
+                && (!migratetype_is_mergeable(migratetype) ||
+                    !migratetype_is_mergeable(buddy_mt)))
+            goto done_merging;
         max_order = order + 1;
         goto continue_merging;
     }
@@ -2479,17 +2478,13 @@ struct page *__rmqueue_smallest(struct z
 /*
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
+ *
+ * The other migratetypes do not have fallbacks.
  */
 static int fallbacks[MIGRATE_TYPES][3] = {
     [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
     [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
     [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
-#ifdef CONFIG_CMA
-    [MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
-#endif
-#ifdef CONFIG_MEMORY_ISOLATION
-    [MIGRATE_ISOLATE]     = { MIGRATE_TYPES }, /* Never used */
-#endif
 };

 #ifdef CONFIG_CMA
@@ -2795,8 +2790,8 @@ static void reserve_highatomic_pageblock

     /* Yoink! */
     mt = get_pageblock_migratetype(page);
-    if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
-        && !is_migrate_cma(mt)) {
+    /* Only reserve normal pageblocks (i.e., they can merge with others) */
+    if (migratetype_is_mergeable(mt)) {
         zone->nr_reserved_highatomic += pageblock_nr_pages;
         set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
         move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3545,8 +3540,11 @@ int __isolate_free_page(struct page *pag
         struct page *endpage = page + (1 << order) - 1;
         for (; page < endpage; page += pageblock_nr_pages) {
             int mt = get_pageblock_migratetype(page);
-            if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
-                && !is_migrate_highatomic(mt))
+            /*
+             * Only change normal pageblocks (i.e., they can merge
+             * with others)
+             */
+            if (migratetype_is_mergeable(mt))
                 set_pageblock_migratetype(page, MIGRATE_MOVABLE);
         }
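migratetype_is_mergeable() leans on the layout of the migratetype enum: every type that has fallbacks is defined before MIGRATE_PCPTYPES, so a single comparison covers MIGRATE_UNMOVABLE, MIGRATE_MOVABLE and MIGRATE_RECLAIMABLE while excluding HIGHATOMIC, CMA and ISOLATE. Condensed from include/linux/mmzone.h of this era:

    enum migratetype {
            MIGRATE_UNMOVABLE,
            MIGRATE_MOVABLE,
            MIGRATE_RECLAIMABLE,
            MIGRATE_PCPTYPES,   /* the number of types on the pcp lists */
            MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
    #ifdef CONFIG_CMA
            MIGRATE_CMA,
    #endif
    #ifdef CONFIG_MEMORY_ISOLATION
            MIGRATE_ISOLATE,
    #endif
            MIGRATE_TYPES
    };

    /* mt < MIGRATE_PCPTYPES  <=>  mt has an entry in fallbacks[] */

The trade-off is implicit coupling: anyone reordering the enum must know that the mergeability test depends on this ordering, which is why the helper carries a comment pointing at fallbacks[].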
From patchwork Tue Mar 22 21:43:08 2022
X-Patchwork-Id: 12789131
Date: Tue, 22 Mar 2022 14:43:08 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 091/227] mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last()
Message-Id: <20220322214309.62FC5C340EC@smtp.kernel.org>

From: Peter Collingbourne
Subject: mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last()

This will let us avoid an additional read from page->flags when retrying
the compare-exchange on some architectures.

Link: https://lkml.kernel.org/r/20220120011200.1322836-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I2e1f5b5b080ac9c4e0eb7f98768dba6fd7821693
Signed-off-by: Peter Collingbourne
Suggested-by: Peter Zijlstra
Cc: Andrey Konovalov
Cc: Mel Gorman
Signed-off-by: Andrew Morton
---

 mm/mmzone.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/mm/mmzone.c~mm-mmzonec-use-try_cmpxchg-in-page_cpupid_xchg_last
+++ a/mm/mmzone.c
@@ -89,13 +89,14 @@ int page_cpupid_xchg_last(struct page *p
     unsigned long old_flags, flags;
     int last_cpupid;

+    old_flags = READ_ONCE(page->flags);
     do {
-        old_flags = flags = page->flags;
-        last_cpupid = page_cpupid_last(page);
+        flags = old_flags;
+        last_cpupid = (flags >> LAST_CPUPID_PGSHIFT) & LAST_CPUPID_MASK;

         flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
         flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
-    } while (unlikely(cmpxchg(&page->flags, old_flags, flags) != old_flags));
+    } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));

     return last_cpupid;
 }
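The two loop shapes, side by side in a reduced sketch (transform() is a hypothetical stand-in for the flag manipulation): try_cmpxchg() writes the freshly observed value back into 'old' on failure, so the retry path needs no extra load of its own.

    unsigned long old, new;

    /* cmpxchg() form: *ptr must be re-read at the top of every retry. */
    do {
            old = READ_ONCE(*ptr);
            new = transform(old);
    } while (cmpxchg(ptr, old, new) != old);

    /* try_cmpxchg() form: a failed attempt refreshes 'old' in place. */
    old = READ_ONCE(*ptr);
    do {
            new = transform(old);
    } while (!try_cmpxchg(ptr, &old, new));

On architectures where the compare-exchange instruction already returns the current value (e.g. x86 CMPXCHG), the second form lets the compiler reuse it instead of emitting another load.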
From patchwork Tue Mar 22 21:43:11 2022
X-Patchwork-Id: 12789132
Date: Tue, 22 Mar 2022 14:43:11 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 092/227] mm/mmzone.h: remove unused macros
Message-Id: <20220322214312.4278BC340EE@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/mmzone.h: remove unused macros

Remove pgdat_page_nr, nid_page_nr and NODE_MEM_MAP.  They are unused now.

Link: https://lkml.kernel.org/r/20220127093210.62293-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Reviewed-by: David Hildenbrand
Reviewed-by: Mike Rapoport
Signed-off-by: Andrew Morton
---

 include/linux/mmzone.h | 7 -------
 1 file changed, 7 deletions(-)

--- a/include/linux/mmzone.h~mm-mmzoneh-remove-unused-macros
+++ a/include/linux/mmzone.h
@@ -931,12 +931,6 @@ typedef struct pglist_data {
 #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
 #define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
-#ifdef CONFIG_FLATMEM
-#define pgdat_page_nr(pgdat, pagenr)    ((pgdat)->node_mem_map + (pagenr))
-#else
-#define pgdat_page_nr(pgdat, pagenr)    pfn_to_page((pgdat)->node_start_pfn + (pagenr))
-#endif
-#define nid_page_nr(nid, pagenr)        pgdat_page_nr(NODE_DATA(nid),(pagenr))

 #define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
 #define node_end_pfn(nid)   pgdat_end_pfn(NODE_DATA(nid))
@@ -1112,7 +1106,6 @@ static inline struct pglist_data *NODE_D
 {
     return &contig_page_data;
 }
-#define NODE_MEM_MAP(nid)   mem_map

 #else /* CONFIG_NUMA */
From patchwork Tue Mar 22 21:43:14 2022
X-Patchwork-Id: 12789133
Date: Tue, 22 Mar 2022 14:43:14 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 093/227] mm/page_alloc: don't pass pfn to free_unref_page_commit()
Message-Id: <20220322214315.30E58C340EC@smtp.kernel.org>

From: Nicolas Saenz Julienne
Subject: mm/page_alloc: don't pass pfn to free_unref_page_commit()

free_unref_page_commit() doesn't make use of its pfn argument, so get rid
of it.
Link: https://lkml.kernel.org/r/20220202140451.415928-1-nsaenzju@redhat.com
Signed-off-by: Nicolas Saenz Julienne
Reviewed-by: Vlastimil Babka
Reviewed-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-dont-pass-pfn-to-free_unref_page_commit
+++ a/mm/page_alloc.c
@@ -3366,8 +3366,8 @@ static int nr_pcp_high(struct per_cpu_pa
     return min(READ_ONCE(pcp->batch) << 2, high);
 }

-static void free_unref_page_commit(struct page *page, unsigned long pfn,
-                   int migratetype, unsigned int order)
+static void free_unref_page_commit(struct page *page, int migratetype,
+                   unsigned int order)
 {
     struct zone *zone = page_zone(page);
     struct per_cpu_pages *pcp;
@@ -3416,7 +3416,7 @@ void free_unref_page(struct page *page,
     }

     local_lock_irqsave(&pagesets.lock, flags);
-    free_unref_page_commit(page, pfn, migratetype, order);
+    free_unref_page_commit(page, migratetype, order);
     local_unlock_irqrestore(&pagesets.lock, flags);
 }

@@ -3426,13 +3426,13 @@ void free_unref_page(struct page *page,
 void free_unref_page_list(struct list_head *list)
 {
     struct page *page, *next;
-    unsigned long flags, pfn;
+    unsigned long flags;
     int batch_count = 0;
     int migratetype;

     /* Prepare pages for freeing */
     list_for_each_entry_safe(page, next, list, lru) {
-        pfn = page_to_pfn(page);
+        unsigned long pfn = page_to_pfn(page);
         if (!free_unref_page_prepare(page, pfn, 0)) {
             list_del(&page->lru);
             continue;
@@ -3448,15 +3448,10 @@ void free_unref_page_list(struct list_he
             free_one_page(page_zone(page), page, pfn, 0,
                       migratetype, FPI_NONE);
             continue;
         }
-
-        set_page_private(page, pfn);
     }

     local_lock_irqsave(&pagesets.lock, flags);
     list_for_each_entry_safe(page, next, list, lru) {
-        pfn = page_private(page);
-        set_page_private(page, 0);
-
         /*
          * Non-isolated types over MIGRATE_PCPTYPES get added
          * to the MIGRATE_MOVABLE pcp list.
@@ -3466,7 +3461,7 @@ void free_unref_page_list(struct list_he
             migratetype = MIGRATE_MOVABLE;

         trace_mm_page_free_batched(page);
-        free_unref_page_commit(page, pfn, migratetype, 0);
+        free_unref_page_commit(page, migratetype, 0);

         /*
          * Guard against excessive IRQ disabled times when we get
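A note on what the cleanup buys beyond the signature change: before the patch, the pfn was stashed in page_private() during the first loop pass solely so the second pass could hand it to a consumer that ignored it. Once the parameter goes, the whole round-trip goes with it. A generic sketch of the anti-pattern (hypothetical names):

    /* Before: state threaded through the object for an unused argument. */
    stash(obj, value);          /* first pass  */
    consume(obj, unstash(obj)); /* second pass; consume() never reads value */

    /* After: no stash, no unstash, and value becomes loop-local where it
       is actually needed. */
    consume(obj);

Dead-parameter removal like this is worth doing eagerly in hot paths: it removes two stores and a load per page here, and it stops readers from hunting for a data dependency that does not exist.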
From patchwork Tue Mar 22 21:43:17 2022
X-Patchwork-Id: 12789134
Date: Tue, 22 Mar 2022 14:43:17 -0700
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 094/227] cma: factor out minimum alignment requirement
Message-Id: <20220322214318.52B6BC340EC@smtp.kernel.org>

From: David Hildenbrand
Subject: cma: factor out minimum alignment requirement

Patch series "mm: enforce pageblock_order < MAX_ORDER".

Having pageblock_order >= MAX_ORDER seems to be able to happen in corner
cases, and some parts of the kernel are not prepared for it.

For example, Aneesh has shown [1] that such kernels can be compiled on
ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will
run into a WARN_ON_ONCE(order >= MAX_ORDER) in compaction code right
during boot.

We can get pageblock_order >= MAX_ORDER when the default hugetlb size is
bigger than the maximum allocation granularity of the buddy, in which
case we are no longer talking about huge pages but instead gigantic
pages.

Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of
such gigantic pages more likely to succeed.

Reliable use of gigantic pages either requires boot time allocation or
CMA; there is no need to overcomplicate some places in the kernel to
optimize for corner cases that are broken in other areas of the kernel.

This patch (of 2):

Let's enforce pageblock_order < MAX_ORDER and simplify.

Especially patch #1 can be regarded as a cleanup before:
  [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range
  alignment. [2]

[1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com
[2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi.yan@sent.com

Link: https://lkml.kernel.org/r/20220214174132.219303-2-david@redhat.com
Signed-off-by: David Hildenbrand
Reviewed-by: Zi Yan
Acked-by: Rob Herring
Cc: Aneesh Kumar K.V
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Frank Rowand
Cc: Michael S. Tsirkin
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Minchan Kim
Cc: Vlastimil Babka
Cc: John Garry via iommu
Signed-off-by: Andrew Morton
---

 arch/powerpc/include/asm/fadump-internal.h |  5 ----
 arch/powerpc/kernel/fadump.c               |  2 -
 drivers/of/of_reserved_mem.c               |  9 ++------
 include/linux/cma.h                        |  9 ++++++++
 kernel/dma/contiguous.c                    |  4 ---
 mm/cma.c                                   | 20 ++++---------------
 6 files changed, 19 insertions(+), 30 deletions(-)

--- a/arch/powerpc/include/asm/fadump-internal.h~cma-factor-out-minimum-alignment-requirement
+++ a/arch/powerpc/include/asm/fadump-internal.h
@@ -19,11 +19,6 @@

 #define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt)

-/* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT    (PAGE_SIZE << \
-                                 max_t(unsigned long, MAX_ORDER - 1, \
-                                 pageblock_order))
-
 /* FAD commands */
 #define FADUMP_REGISTER     1
 #define FADUMP_UNREGISTER   2

--- a/arch/powerpc/kernel/fadump.c~cma-factor-out-minimum-alignment-requirement
+++ a/arch/powerpc/kernel/fadump.c
@@ -544,7 +544,7 @@ int __init fadump_reserve_mem(void)
     if (!fw_dump.nocma) {
         fw_dump.boot_memory_size =
             ALIGN(fw_dump.boot_memory_size,
-                  FADUMP_CMA_ALIGNMENT);
+                  CMA_MIN_ALIGNMENT_BYTES);
     }
 #endif

--- a/drivers/of/of_reserved_mem.c~cma-factor-out-minimum-alignment-requirement
+++ a/drivers/of/of_reserved_mem.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include <linux/cma.h>

 #include "of_private.h"

@@ -116,12 +117,8 @@ static int __init __reserved_mem_alloc_s
     if (IS_ENABLED(CONFIG_CMA)
         && of_flat_dt_is_compatible(node, "shared-dma-pool")
         && of_get_flat_dt_prop(node, "reusable", NULL)
-        && !nomap) {
-        unsigned long order =
-            max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
-        align = max(align, (phys_addr_t)PAGE_SIZE << order);
-    }
+        && !nomap)
+        align = max_t(phys_addr_t, align, CMA_MIN_ALIGNMENT_BYTES);

     prop = of_get_flat_dt_prop(node, "alloc-ranges", &len);
     if (prop) {

--- a/include/linux/cma.h~cma-factor-out-minimum-alignment-requirement
+++ a/include/linux/cma.h
@@ -20,6 +20,15 @@

 #define CMA_MAX_NAME 64

+/*
+ * TODO: once the buddy -- especially pageblock merging and alloc_contig_range()
+ * -- can deal with only some pageblocks of a higher-order page being
+ * MIGRATE_CMA, we can use pageblock_nr_pages.
+ */
+#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
+                                      pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
+
 struct cma;

 extern unsigned long totalcma_pages;

--- a/kernel/dma/contiguous.c~cma-factor-out-minimum-alignment-requirement
+++ a/kernel/dma/contiguous.c
@@ -399,8 +399,6 @@ static const struct reserved_mem_ops rme
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-    phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
-    phys_addr_t mask = align - 1;
     unsigned long node = rmem->fdt_node;
     bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
     struct cma *cma;
@@ -416,7 +414,7 @@ static int __init rmem_cma_setup(struct
         of_get_flat_dt_prop(node, "no-map", NULL))
         return -EINVAL;

-    if ((rmem->base & mask) || (rmem->size & mask)) {
+    if (!IS_ALIGNED(rmem->base | rmem->size, CMA_MIN_ALIGNMENT_BYTES)) {
         pr_err("Reserved memory: incorrect alignment of CMA region\n");
         return -EINVAL;
     }

--- a/mm/cma.c~cma-factor-out-minimum-alignment-requirement
+++ a/mm/cma.c
@@ -168,7 +168,6 @@ int __init cma_init_reserved_mem(phys_ad
     struct cma **res_cma)
 {
     struct cma *cma;
-    phys_addr_t alignment;

     /* Sanity checks */
     if (cma_area_count == ARRAY_SIZE(cma_areas)) {
@@ -179,15 +178,12 @@ int __init cma_init_reserved_mem(phys_ad
     if (!size || !memblock_is_region_reserved(base, size))
         return -EINVAL;

-    /* ensure minimal alignment required by mm core */
-    alignment = PAGE_SIZE <<
-            max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
     /* alignment should be aligned with order_per_bit */
-    if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
+    if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
         return -EINVAL;

-    if (ALIGN(base, alignment) != base || ALIGN(size, alignment) != size)
+    /* ensure minimal alignment required by mm core */
+    if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
         return -EINVAL;

     /*
@@ -262,14 +258,8 @@ int __init cma_declare_contiguous_nid(ph
     if (alignment && !is_power_of_2(alignment))
         return -EINVAL;

-    /*
-     * Sanitise input arguments.
-     * Pages both ends in CMA area could be merged into adjacent unmovable
-     * migratetype page by page allocator's buddy algorithm. In the case,
-     * you couldn't get a contiguous memory, which is not what we want.
-     */
-    alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
-              max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+    /* Sanitise input arguments. */
+    alignment = max_t(phys_addr_t, alignment, CMA_MIN_ALIGNMENT_BYTES);

     if (fixed && base & (alignment - 1)) {
         ret = -EINVAL;
         pr_err("Region at %pa must be aligned to %pa bytes\n",
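The IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES) form used above checks both values with one test: OR-ing them preserves any low-order bit set in either operand, so the result is aligned exactly when both inputs are. A standalone sketch (user-space C with a simplified IS_ALIGNED, not the kernel macro):

    #include <stdio.h>

    /* Simplified: assumes a is a power of two. */
    #define IS_ALIGNED(x, a)    (((x) & ((a) - 1)) == 0)

    int main(void)
    {
            unsigned long align = 1UL << 22;    /* e.g. 4 MiB */
            unsigned long base  = 8UL << 20;    /* aligned */
            unsigned long size  = 4UL << 20;    /* aligned */

            printf("%d\n", IS_ALIGNED(base | size, align));       /* 1 */
            printf("%d\n", IS_ALIGNED(base | (size + 1), align)); /* 0 */
            return 0;
    }

Beyond the one-liner, the real gain of the patch is that the "max of MAX_ORDER - 1 and pageblock_order" expression, previously duplicated in five places, now lives in a single pair of CMA_MIN_ALIGNMENT_* macros that the follow-up patch can simplify in one spot.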
!is_power_of_2(alignment)) return -EINVAL; - /* - * Sanitise input arguments. - * Pages both ends in CMA area could be merged into adjacent unmovable - * migratetype page by page allocator's buddy algorithm. In the case, - * you couldn't get a contiguous memory, which is not what we want. - */ - alignment = max(alignment, (phys_addr_t)PAGE_SIZE << - max_t(unsigned long, MAX_ORDER - 1, pageblock_order)); + /* Sanitise input arguments. */ + alignment = max_t(phys_addr_t, alignment, CMA_MIN_ALIGNMENT_BYTES); if (fixed && base & (alignment - 1)) { ret = -EINVAL; pr_err("Region at %pa must be aligned to %pa bytes\n", From patchwork Tue Mar 22 21:43:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74227C433EF for ; Tue, 22 Mar 2022 21:43:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 11B316B00E2; Tue, 22 Mar 2022 17:43:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A4926B00E3; Tue, 22 Mar 2022 17:43:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EAE196B00E4; Tue, 22 Mar 2022 17:43:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236]) by kanga.kvack.org (Postfix) with ESMTP id D4A2D6B00E2 for ; Tue, 22 Mar 2022 17:43:24 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 9D7A51828AC94 for ; Tue, 22 Mar 2022 21:43:24 +0000 (UTC) X-FDA: 79273348728.18.08F26D2 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf02.hostedemail.com (Postfix) with ESMTP id E9BE68001E for ; Tue, 22 Mar 2022 21:43:23 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B6599B81DAC; Tue, 22 Mar 2022 21:43:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 737AFC340EC; Tue, 22 Mar 2022 21:43:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985401; bh=JbxjnACt75jLJ1na1EB5yAF54JD2P6L7KwOtRJMAY78=; h=Date:To:From:In-Reply-To:Subject:From; b=RAXwG0t4EXQ2J8tNHurQmNhTnwvRa2eh1l/H8FF8gY8m61cR0ukWzfVPRCfIyCTdk 7Khhrc2gq//r072gYTT9NpqX3gvljKFSW/fpu0u2FMxAjwk/El2NRGkSxvaim4sISr xX8VyHfVt2JPw420GWj+TKww8ZPjli+2sX5xqwg8= Date: Tue, 22 Mar 2022 14:43:20 -0700 To: ziy@nvidia.com,vbabka@suse.cz,robin.murphy@arm.com,robh+dt@kernel.org,paulus@samba.org,m.szyprowski@samsung.com,mst@redhat.com,mpe@ellerman.id.au,minchan@kernel.org,iommu@lists.linux-foundation.org,hch@lst.de,frowand.list@gmail.com,benh@kernel.crashing.org,aneesh.kumar@linux.ibm.com,david@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 095/227] mm: enforce pageblock_order < MAX_ORDER Message-Id: <20220322214321.737AFC340EC@smtp.kernel.org> X-Stat-Signature: 
fg6u5f78s5js1ig5xbhugt3133yaso4r Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=RAXwG0t4; spf=pass (imf02.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: E9BE68001E X-HE-Tag: 1647985403-472990 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Hildenbrand Subject: mm: enforce pageblock_order < MAX_ORDER Some places in the kernel don't really expect pageblock_order >= MAX_ORDER, and it looks like this is only possible in corner cases: 1) With CONFIG_DEFERRED_STRUCT_PAGE_INIT, we'll end up freeing pageblock_order pages via __free_pages_core(), which cannot possibly work. 2) find_zone_movable_pfns_for_nodes() will round up the ZONE_MOVABLE start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger pageblock_order, we could have a single pageblock partially managed by two zones. 3) compaction code runs into __fragmentation_index() with order >= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1] 4) mm/page_reporting.c won't be reporting any pages with default page_reporting_order == pageblock_order, as we'll be skipping the reporting loop inside page_reporting_process_zone(). 5) __rmqueue_fallback() will never be able to steal with ALLOC_NOFRAGMENT. pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization for making alloc_contig_range(), as used for allocation of gigantic pages, a little more reliable to succeed. However, if there is demand for somewhat reliable allocation of gigantic pages, affected setups should be using CMA or boot-time allocations instead. So let's make sure that pageblock_order < MAX_ORDER and simplify. [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Zi Yan Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Christoph Hellwig Cc: Frank Rowand Cc: John Garry via iommu Cc: Marek Szyprowski Cc: Michael Ellerman Cc: Michael S. Tsirkin Cc: Minchan Kim Cc: Paul Mackerras Cc: Rob Herring Cc: Robin Murphy Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- drivers/virtio/virtio_mem.c | 9 ++------ include/linux/cma.h | 3 -- include/linux/pageblock-flags.h | 7 ++++-- mm/Kconfig | 3 ++ mm/page_alloc.c | 32 +++++++----------------- 5 files changed, 20 insertions(+), 34 deletions(-) --- a/drivers/virtio/virtio_mem.c~mm-enforce-pageblock_order-max_order +++ a/drivers/virtio/virtio_mem.c @@ -2476,13 +2476,10 @@ static int virtio_mem_init_hotplug(struc VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD); /* - * We want subblocks to span at least MAX_ORDER_NR_PAGES and - * pageblock_nr_pages pages. This: - * - Is required for now for alloc_contig_range() to work reliably - - * it doesn't properly handle smaller granularity on ZONE_NORMAL. + * TODO: once alloc_contig_range() works reliably with pageblock + * granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
*/ - sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES, - pageblock_nr_pages) * PAGE_SIZE; + sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES; sb_size = max_t(uint64_t, vm->device_block_size, sb_size); if (sb_size < memory_block_size_bytes() && !force_bbm) { --- a/include/linux/cma.h~mm-enforce-pageblock_order-max_order +++ a/include/linux/cma.h @@ -25,8 +25,7 @@ * -- can deal with only some pageblocks of a higher-order page being * MIGRATE_CMA, we can use pageblock_nr_pages. */ -#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \ - pageblock_nr_pages) +#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES) struct cma; --- a/include/linux/pageblock-flags.h~mm-enforce-pageblock_order-max_order +++ a/include/linux/pageblock-flags.h @@ -37,8 +37,11 @@ extern unsigned int pageblock_order; #else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ -/* Huge pages are a constant size */ -#define pageblock_order HUGETLB_PAGE_ORDER +/* + * Huge pages are a constant size, but don't exceed the maximum allocation + * granularity. + */ +#define pageblock_order min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1) #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ --- a/mm/Kconfig~mm-enforce-pageblock_order-max_order +++ a/mm/Kconfig @@ -262,6 +262,9 @@ config HUGETLB_PAGE_SIZE_VARIABLE HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available on a platform. + Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be + clamped down to MAX_ORDER - 1. + config CONTIG_ALLOC def_bool (MEMORY_ISOLATION && COMPACTION) || CMA --- a/mm/page_alloc.c~mm-enforce-pageblock_order-max_order +++ a/mm/page_alloc.c @@ -1072,14 +1072,12 @@ static inline void __free_one_page(struc int migratetype, fpi_t fpi_flags) { struct capture_control *capc = task_capc(zone); + unsigned int max_order = pageblock_order; unsigned long buddy_pfn; unsigned long combined_pfn; - unsigned int max_order; struct page *buddy; bool to_tail; - max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order); - VM_BUG_ON(!zone_is_initialized(zone)); VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); @@ -2259,19 +2257,8 @@ void __init init_cma_reserved_pageblock( } while (++p, --i); set_pageblock_migratetype(page, MIGRATE_CMA); - - if (pageblock_order >= MAX_ORDER) { - i = pageblock_nr_pages; - p = page; - do { - set_page_refcounted(p); - __free_pages(p, MAX_ORDER - 1); - p += MAX_ORDER_NR_PAGES; - } while (i -= MAX_ORDER_NR_PAGES); - } else { - set_page_refcounted(page); - __free_pages(page, pageblock_order); - } + set_page_refcounted(page); + __free_pages(page, pageblock_order); adjust_managed_page_count(page, pageblock_nr_pages); page_zone(page)->cma_pages += pageblock_nr_pages; @@ -7382,16 +7369,15 @@ static inline void setup_usemap(struct z /* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */ void __init set_pageblock_order(void) { - unsigned int order; + unsigned int order = MAX_ORDER - 1; /* Check that pageblock_nr_pages has not already been setup */ if (pageblock_order) return; - if (HPAGE_SHIFT > PAGE_SHIFT) + /* Don't let pageblocks exceed the maximum allocation granularity. */ + if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order) order = HUGETLB_PAGE_ORDER; - else - order = MAX_ORDER - 1; /* * Assume the largest contiguous order of interest is a huge page. 
@@ -8979,14 +8965,12 @@ struct page *has_unmovable_pages(struct #ifdef CONFIG_CONTIG_ALLOC static unsigned long pfn_max_align_down(unsigned long pfn) { - return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES, - pageblock_nr_pages) - 1); + return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES); } static unsigned long pfn_max_align_up(unsigned long pfn) { - return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES, - pageblock_nr_pages)); + return ALIGN(pfn, MAX_ORDER_NR_PAGES); } #if defined(CONFIG_DYNAMIC_DEBUG) || \ From patchwork Tue Mar 22 21:43:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789196 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 540F2C433EF for ; Tue, 22 Mar 2022 21:44:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D9CF56B0104; Tue, 22 Mar 2022 17:44:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D4D546B0105; Tue, 22 Mar 2022 17:44:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C152C6B0106; Tue, 22 Mar 2022 17:44:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id B07E86B0104 for ; Tue, 22 Mar 2022 17:44:18 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 758E91216E2 for ; Tue, 22 Mar 2022 21:44:18 +0000 (UTC) X-FDA: 79273350996.06.4D2414E Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf07.hostedemail.com (Postfix) with ESMTP id A28E34000B for ; Tue, 22 Mar 2022 21:43:25 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1A8546119A; Tue, 22 Mar 2022 21:43:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6E8CBC340EE; Tue, 22 Mar 2022 21:43:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985404; bh=BTbxSfAV6DWjTz04fFk5UPDaQzZk2XbSXqpFX9Uk0ko=; h=Date:To:From:In-Reply-To:Subject:From; b=stNCqLzUl+qLe87CLvT7a/c/BAlfAplrekX21RPDgkScfqvJ4rKEKVaab9rUR+wxk OcOFY8Hah43pX0eJkfdbYOHcXe07Pmppisq/JU1jSXyFIWjIJs2C459xsRDm0lSlRo YTrg77RI8o2KGaHzwx3ZuBp2KyWfSjHr48yvJ8Xk= Date: Tue, 22 Mar 2022 14:43:23 -0700 To: peterz@infradead.org,ndesaulniers@google.com,bot@kernelci.org,bigeasy@linutronix.de,nathan@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 096/227] mm/page_alloc: mark pagesets as __maybe_unused Message-Id: <20220322214324.6E8CBC340EE@smtp.kernel.org> Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=stNCqLzU; spf=pass (imf07.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam10 
X-Rspamd-Queue-Id: A28E34000B X-Stat-Signature: 3x9u9dfgpxgygbcjxncdskn963zxappp X-HE-Tag: 1647985405-493297 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Nathan Chancellor Subject: mm/page_alloc: mark pagesets as __maybe_unused Commit 9983a9d577db ("locking/local_lock: Make the empty local_lock_*() function a macro.") in the -tip tree converted the local_lock_*() functions into macros, which causes a warning with clang with CONFIG_PREEMPT_RT=n + CONFIG_DEBUG_LOCK_ALLOC=n: mm/page_alloc.c:131:40: error: variable 'pagesets' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] static DEFINE_PER_CPU(struct pagesets, pagesets) = { ^ 1 error generated. Prior to that change, clang was not able to tell that pagesets was unused in this configuration because it does not perform cross function analysis in the frontend. After that change, it sees that the macros just do a typecheck on the lock member of pagesets, which is evaluated at compile time (so the variable is technically "used"), meaning the variable is not needed in the final assembly, as the warning states. Mark the variable as __maybe_unused to make it clear to clang that this is expected in this configuration so there is no more warning. Link: https://github.com/ClangBuiltLinux/linux/issues/1593 Link: https://lkml.kernel.org/r/20220215184322.440969-1-nathan@kernel.org Signed-off-by: Nathan Chancellor Suggested-by: Nick Desaulniers Reported-by: "kernelci.org bot" Cc: Sebastian Andrzej Siewior Cc: Peter Zijlstra Signed-off-by: Andrew Morton --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/page_alloc.c~mm-page_alloc-mark-pagesets-as-__maybe_unused +++ a/mm/page_alloc.c @@ -128,7 +128,7 @@ static DEFINE_MUTEX(pcp_batch_high_lock) struct pagesets { local_lock_t lock; }; -static DEFINE_PER_CPU(struct pagesets, pagesets) = { +static DEFINE_PER_CPU(struct pagesets, pagesets) __maybe_unused = { .lock = INIT_LOCAL_LOCK(lock), }; From patchwork Tue Mar 22 21:43:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789136 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E9BDC433F5 for ; Tue, 22 Mar 2022 21:43:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A6A526B00E4; Tue, 22 Mar 2022 17:43:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9F4C46B00E5; Tue, 22 Mar 2022 17:43:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 893E76B00E6; Tue, 22 Mar 2022 17:43:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 777426B00E4 for ; Tue, 22 Mar 2022 17:43:29 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 4426121D12 for ; Tue, 22 Mar 2022 21:43:29 +0000 (UTC) X-FDA: 79273348938.15.A420E25 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf17.hostedemail.com (Postfix) with ESMTP id C01984002D for ; Tue, 22 Mar 2022 21:43:28 +0000 (UTC) Received: from 
smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3C05F6101E; Tue, 22 Mar 2022 21:43:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8E144C340EC; Tue, 22 Mar 2022 21:43:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985407; bh=lbXbgM49FkvqVbL9ZyFann/L6qHd8yb1fr80Coe+S5g=; h=Date:To:From:In-Reply-To:Subject:From; b=WnsGfEp5PdH2jDhopowP+b/eAokXEQ7l2T/lUAoKfxqDwWOHXZ6oUovsd0v2XqfE4 jUKZy1P4xv1lXLb0iBz65aMPGFSeEwjZX6h1fEXgP3tKmlegNo7H02p0IkVnqnZijk UiZg0dJZKpDPey7Ha9jcmrvkq+GaeXEO5vVbTqec= Date: Tue, 22 Mar 2022 14:43:26 -0700 To: ziy@nvidia.com,stable@vger.kernel.org,osalvador@suse.de,mgorman@techsingularity.net,jhubbard@nvidia.com,david@redhat.com,anshuman.khandual@arm.com,apopple@nvidia.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 097/227] mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node Message-Id: <20220322214327.8E144C340EC@smtp.kernel.org> X-Stat-Signature: ird6ckhhjfryhi7xeacm95yj17jgnp59 Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=WnsGfEp5; spf=pass (imf17.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: C01984002D X-HE-Tag: 1647985408-704688 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alistair Popple Subject: mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is not enough room for ZONE_MOVABLE on that node. Unfortunately this condition is not checked for. This leads to zone_movable_pfn[] getting set to a pfn greater than the last pfn in a node. calculate_node_totalpages() then sets zone->present_pages to be greater than zone->spanned_pages which is invalid, as spanned_pages represents the maximum number of pages in a zone assuming no holes. Subsequently it is possible free_area_init_core() will observe a zone of size zero with present pages. In this case it will skip setting up the zone, including the initialisation of free_lists[]. However populated_zone() checks zone->present_pages to see if a zone has memory available. This is used by iterators such as walk_zones_in_node(). pagetypeinfo_showfree() uses this to walk the free_list of each zone in each node, which are assumed to be initialised due to the zone not being empty. 
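The broken invariant can be sketched outside the kernel. The following is a minimal userspace illustration, not kernel code: struct zone_stub and its field values are made-up stand-ins, and zone_is_populated() only mirrors the real populated_zone() check on present_pages:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for the relevant struct zone fields. */
struct zone_stub {
        unsigned long spanned_pages;        /* max pages, assuming no holes */
        unsigned long present_pages;        /* pages actually present */
        bool free_lists_initialised;
};

/* Mirrors populated_zone(): present_pages is the only thing checked. */
static bool zone_is_populated(const struct zone_stub *zone)
{
        return zone->present_pages != 0;
}

int main(void)
{
        /* The invalid state described above: present > spanned (== 0). */
        struct zone_stub movable = {
                .spanned_pages = 0,
                .present_pages = 128,
                .free_lists_initialised = false,        /* setup was skipped */
        };

        if (zone_is_populated(&movable) && !movable.free_lists_initialised)
                printf("iterator would walk uninitialised free_lists[]\n");
        return 0;
}
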
As free_area_init_core() never initialised the free_lists[] this results in the following kernel crash when trying to read /proc/pagetypeinfo: [ 67.534914] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 67.535429] #PF: supervisor read access in kernel mode [ 67.535789] #PF: error_code(0x0000) - not-present page [ 67.536128] PGD 0 P4D 0 [ 67.536305] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI [ 67.536696] CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461 [ 67.537096] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 67.537638] RIP: 0010:pagetypeinfo_show+0x163/0x460 [ 67.537992] Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82 [ 67.538259] RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003 [ 67.538259] RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001 [ 67.538259] RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b [ 67.538259] RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e [ 67.538259] R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00 [ 67.538259] R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000 [ 67.538259] FS: 00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 67.538259] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 67.538259] CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0 [ 67.538259] Call Trace: [ 67.538259] [ 67.538259] seq_read_iter+0x128/0x460 [ 67.538259] ? aa_file_perm+0x1af/0x5f0 [ 67.538259] proc_reg_read_iter+0x51/0x80 [ 67.538259] ? lock_is_held_type+0xea/0x140 [ 67.538259] new_sync_read+0x113/0x1a0 [ 67.538259] vfs_read+0x136/0x1d0 [ 67.538259] ksys_read+0x70/0xf0 [ 67.538259] __x64_sys_read+0x1a/0x20 [ 67.538259] do_syscall_64+0x3b/0xc0 [ 67.538259] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 67.538259] RIP: 0033:0x7f9c83e23cce [ 67.538259] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 13 0a 00 e8 c9 e3 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28 [ 67.538259] RSP: 002b:00007fff116e1a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 67.538259] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f9c83e23cce [ 67.538259] RDX: 0000000000020000 RSI: 00007f9c83a2c000 RDI: 0000000000000003 [ 67.538259] RBP: 00007f9c83a2c000 R08: 00007f9c83a2b010 R09: 0000000000000000 [ 67.538259] R10: 00007f9c83f2d7d0 R11: 0000000000000246 R12: 0000000000000000 [ 67.538259] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [ 67.538259] Fix this by checking that the aligned zone_movable_pfn[] does not exceed the end of the node, and if it does skip creating a movable zone on this node. 
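For concreteness, the arithmetic behind the fix can be reproduced in a standalone sketch. The node layout below is hypothetical, and roundup_pow2() is a userspace stand-in for the kernel's roundup() with a power-of-2 alignment:

#include <stdio.h>

/* Round x up to the next multiple of align (align must be a power of 2). */
static unsigned long roundup_pow2(unsigned long x, unsigned long align)
{
        return (x + align - 1) & ~(align - 1);
}

int main(void)
{
        unsigned long max_order_nr_pages = 1024;  /* e.g. order-10 max-order blocks */
        /* Hypothetical node spanning pfns [0x2000, 0x2100): only 256 pages. */
        unsigned long end_pfn = 0x2100;
        unsigned long zone_movable_pfn = 0x2080;  /* remaining memory in the node */

        zone_movable_pfn = roundup_pow2(zone_movable_pfn, max_order_nr_pages);
        if (zone_movable_pfn >= end_pfn) {
                /* The check the patch adds: no room, so no ZONE_MOVABLE here. */
                printf("0x%lx is past node end 0x%lx: skip ZONE_MOVABLE\n",
                       zone_movable_pfn, end_pfn);
                zone_movable_pfn = 0;
        }
        return 0;
}

Without the final check, 0x2400 would be recorded as the start of ZONE_MOVABLE even though the node ends at 0x2100, which is exactly the state that leads to the invalid present_pages above.
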
Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com Fixes: 2a1e274acf0b ("Create the ZONE_MOVABLE zone") Signed-off-by: Alistair Popple Acked-by: David Hildenbrand Acked-by: Mel Gorman Cc: John Hubbard Cc: Zi Yan Cc: Anshuman Khandual Cc: Oscar Salvador Cc: Signed-off-by: Andrew Morton --- mm/page_alloc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) --- a/mm/page_alloc.c~mm-pages_allocc-dont-create-zone_movable-beyond-the-end-of-a-node +++ a/mm/page_alloc.c @@ -7951,10 +7951,17 @@ restart: out2: /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ - for (nid = 0; nid < MAX_NUMNODES; nid++) + for (nid = 0; nid < MAX_NUMNODES; nid++) { + unsigned long start_pfn, end_pfn; + zone_movable_pfn[nid] = roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); + get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); + if (zone_movable_pfn[nid] >= end_pfn) + zone_movable_pfn[nid] = 0; + } + out: /* restore the node_state */ node_states[N_MEMORY] = saved_node_state; From patchwork Tue Mar 22 21:43:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 285BAC433F5 for ; Tue, 22 Mar 2022 21:43:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B7C356B00E6; Tue, 22 Mar 2022 17:43:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B05936B00E7; Tue, 22 Mar 2022 17:43:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9F3606B00E8; Tue, 22 Mar 2022 17:43:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0219.hostedemail.com [216.40.44.219]) by kanga.kvack.org (Postfix) with ESMTP id 8C2386B00E6 for ; Tue, 22 Mar 2022 17:43:32 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 553708249980 for ; Tue, 22 Mar 2022 21:43:32 +0000 (UTC) X-FDA: 79273349064.30.DE3A17B Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf24.hostedemail.com (Postfix) with ESMTP id D211B180039 for ; Tue, 22 Mar 2022 21:43:31 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4A9C260A14; Tue, 22 Mar 2022 21:43:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9D78CC340EE; Tue, 22 Mar 2022 21:43:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985410; bh=MvlE/YdWISrTBlm8HEn6KO4CRbuwRXLBW8jijui9aFI=; h=Date:To:From:In-Reply-To:Subject:From; b=DGpPWhMES2JMpX2swDCOx9utDKzljS3ZJ5ZWXAvyrGKZYpzz+4e9lfJoDMg6QDTe0 qOBHj5cVN7Nd4hioRvQ9zP2TpNxSHKgfo6CfkYOjIsq1jTcCkQNRJV3p4FjEEhyg1V HEjkPa6slnz33/z6mrmGvAZmMtunYb7ljDvEXPH0= Date: Tue, 22 Mar 2022 14:43:30 -0700 To: vbabka@suse.cz,mhocko@kernel.org,dave.hansen@linux.intel.com,brouer@redhat.com,aaron.lu@intel.com,mgorman@techsingularity.net,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew 
Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 098/227] mm/page_alloc: fetch the correct pcp buddy during bulk free Message-Id: <20220322214330.9D78CC340EE@smtp.kernel.org> X-Stat-Signature: 9ahkh3c3q7mjw4fuqo87jj19unibhcbj X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: D211B180039 Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=DGpPWhME; dmarc=none; spf=pass (imf24.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985411-293489 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mel Gorman Subject: mm/page_alloc: fetch the correct pcp buddy during bulk free Patch series "Follow-up on high-order PCP caching", v2. Commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists") was primarily aimed at reducing the cost of SLUB cache refills of high-order pages in two ways. Firstly, zone lock acquisitions were reduced and secondly, there were fewer buddy list modifications. This is a follow-up series fixing some issues that became apparent after merging. Patch 1 is a functional fix. It's harmless but inefficient. Patches 2-5 reduce the overhead of bulk freeing of PCP pages. While the overhead is small, it's cumulative and noticeable when truncating large files. The changelog for patch 4 includes results of a microbenchmark that deletes large sparse files with data in page cache. Sparse files were used to eliminate filesystem overhead. Patch 6 addresses issues with high-order PCP pages being stored on PCP lists for too long. Pages freed on a CPU may not be reused quickly and in some cases this can increase cache miss rates. Details are included in the changelog. This patch (of 6): free_pcppages_bulk() prefetches buddies about to be freed but the order must also be passed in as PCP lists store multiple orders. Link: https://lkml.kernel.org/r/20220217002227.5739-1-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20220217002227.5739-2-mgorman@techsingularity.net Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists") Signed-off-by: Mel Gorman Reviewed-by: Vlastimil Babka Reviewed-by: Aaron Lu Tested-by: Aaron Lu Cc: Dave Hansen Cc: Michal Hocko Cc: Jesper Dangaard Brouer Signed-off-by: Andrew Morton --- mm/page_alloc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-fetch-the-correct-pcp-buddy-during-bulk-free +++ a/mm/page_alloc.c @@ -1429,10 +1429,10 @@ static bool bulkfree_pcp_prepare(struct } #endif /* CONFIG_DEBUG_VM */ -static inline void prefetch_buddy(struct page *page) +static inline void prefetch_buddy(struct page *page, unsigned int order) { unsigned long pfn = page_to_pfn(page); - unsigned long buddy_pfn = __find_buddy_pfn(pfn, 0); + unsigned long buddy_pfn = __find_buddy_pfn(pfn, order); struct page *buddy = page + (buddy_pfn - pfn); prefetch(buddy); @@ -1509,7 +1509,7 @@ static void free_pcppages_bulk(struct zo * prefetch buddy for the first pcp->batch nr of pages.
*/ if (prefetch_nr) { - prefetch_buddy(page); + prefetch_buddy(page, order); prefetch_nr--; } } while (count > 0 && --batch_free && !list_empty(list)); From patchwork Tue Mar 22 21:43:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789138 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60DB7C433F5 for ; Tue, 22 Mar 2022 21:43:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E61986B00E8; Tue, 22 Mar 2022 17:43:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E113E6B00E9; Tue, 22 Mar 2022 17:43:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CB3786B00EA; Tue, 22 Mar 2022 17:43:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0009.hostedemail.com [216.40.44.9]) by kanga.kvack.org (Postfix) with ESMTP id B41E46B00E8 for ; Tue, 22 Mar 2022 17:43:36 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6F02818265D22 for ; Tue, 22 Mar 2022 21:43:36 +0000 (UTC) X-FDA: 79273349232.26.F5594A9 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf04.hostedemail.com (Postfix) with ESMTP id EBB2C40032 for ; Tue, 22 Mar 2022 21:43:35 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DB7CDB81DB3; Tue, 22 Mar 2022 21:43:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 98B0DC340F3; Tue, 22 Mar 2022 21:43:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985413; bh=6rFJhdM8DpdUmiWloX+OzeAm5fksq9BQ8wrsG4qfAa8=; h=Date:To:From:In-Reply-To:Subject:From; b=jWC7TeGPiqTuOVK4FvS7uYiFQn+RRjUWSjjmwX83IlZBrdfcs9usbJ3sOxSapLi3B 4u0Tuw855LHK3sHcP22fJJ6t5MlqWPn+vt9nQmJoysibiIMtQ4FLhLP6zctzB2tRbB wKYP8cUq7TmzqRBOqf1PiFtw+eJXTGtkk7gWRH7s= Date: Tue, 22 Mar 2022 14:43:33 -0700 To: vbabka@suse.cz,mhocko@kernel.org,dave.hansen@linux.intel.com,brouer@redhat.com,aaron.lu@intel.com,mgorman@techsingularity.net,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 099/227] mm/page_alloc: track range of active PCP lists during bulk free Message-Id: <20220322214333.98B0DC340F3@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: ojre56k3yfyurwjfjstd7ypfnwk41pxy Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=jWC7TeGP; spf=pass (imf04.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: EBB2C40032 X-HE-Tag: 1647985415-530114 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mel Gorman Subject: mm/page_alloc: 
track range of active PCP lists during bulk free free_pcppages_bulk() frees pages in a round-robin fashion. Originally, this was dealing only with migratetypes but storing high-order pages means that there can be many more empty lists that are uselessly checked. Track the minimum and maximum active pindex to reduce the search space. Link: https://lkml.kernel.org/r/20220217002227.5739-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman Reviewed-by: Vlastimil Babka Tested-by: Aaron Lu Cc: Dave Hansen Cc: Jesper Dangaard Brouer Cc: Michal Hocko Signed-off-by: Andrew Morton --- mm/page_alloc.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-track-range-of-active-pcp-lists-during-bulk-free +++ a/mm/page_alloc.c @@ -1447,6 +1447,8 @@ static void free_pcppages_bulk(struct zo struct per_cpu_pages *pcp) { int pindex = 0; + int min_pindex = 0; + int max_pindex = NR_PCP_LISTS - 1; int batch_free = 0; int nr_freed = 0; unsigned int order; @@ -1472,13 +1474,20 @@ static void free_pcppages_bulk(struct zo */ do { batch_free++; - if (++pindex == NR_PCP_LISTS) - pindex = 0; + if (++pindex > max_pindex) + pindex = min_pindex; list = &pcp->lists[pindex]; - } while (list_empty(list)); + if (!list_empty(list)) + break; + + if (pindex == max_pindex) + max_pindex--; + if (pindex == min_pindex) + min_pindex++; + } while (1); /* This is the only non-empty list. Free them all. */ - if (batch_free == NR_PCP_LISTS) + if (batch_free >= max_pindex - min_pindex) batch_free = count; order = pindex_to_order(pindex); From patchwork Tue Mar 22 21:43:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789139 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77D52C433F5 for ; Tue, 22 Mar 2022 21:43:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 11C1D6B00EA; Tue, 22 Mar 2022 17:43:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A56F6B00EB; Tue, 22 Mar 2022 17:43:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED4AF6B00EC; Tue, 22 Mar 2022 17:43:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id DB8896B00EA for ; Tue, 22 Mar 2022 17:43:39 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id B9DE722C37 for ; Tue, 22 Mar 2022 21:43:39 +0000 (UTC) X-FDA: 79273349358.02.7D6C669 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf02.hostedemail.com (Postfix) with ESMTP id 1EDEE80020 for ; Tue, 22 Mar 2022 21:43:38 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DFABBB81D9E; Tue, 22 Mar 2022 21:43:37 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 94B20C340EC; Tue, 22 Mar 2022 21:43:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985416; bh=oTGU3v+FDaYpYNMTFZCrhPgnhnewJiycMKKjLxSu9UU=; 
h=Date:To:From:In-Reply-To:Subject:From; b=w23Cyq1+EgB/5f5QLoQy7WpBKbgeT+xcpWc9zaWc8A9DZ4uCvUSig2l54Hce2lU7p lxmVG1ncXorc9wcKMux3+u2AuNQip0ZNvR8xkrF3V9EsqpMsah0XAJqJSOxoKv5ejV bVPjb9TRKj9RgdRKBzTu+2D5zYxbl5EREcV+DNZw= Date: Tue, 22 Mar 2022 14:43:36 -0700 To: vbabka@suse.cz,mhocko@kernel.org,dave.hansen@linux.intel.com,brouer@redhat.com,aaron.lu@intel.com,mgorman@techsingularity.net,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 100/227] mm/page_alloc: simplify how many pages are selected per pcp list during bulk free Message-Id: <20220322214336.94B20C340EC@smtp.kernel.org> X-Stat-Signature: gafyg7i6acthp7rgo741biachbb9mhb5 Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=w23Cyq1+; spf=pass (imf02.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 1EDEE80020 X-HE-Tag: 1647985418-497238 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mel Gorman Subject: mm/page_alloc: simplify how many pages are selected per pcp list during bulk free free_pcppages_bulk() selects pages to free by round-robining between lists. Originally this was to evenly shrink pages by migratetype but uneven freeing is inevitable due to high pages. Simplify list selection by starting with a list that definitely has pages on it in free_unref_page_commit() and for drain, it does not matter where draining starts as all pages are removed. Link: https://lkml.kernel.org/r/20220217002227.5739-4-mgorman@techsingularity.net Signed-off-by: Mel Gorman Reviewed-by: Vlastimil Babka Tested-by: Aaron Lu Cc: Dave Hansen Cc: Jesper Dangaard Brouer Cc: Michal Hocko Signed-off-by: Andrew Morton --- mm/page_alloc.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-simplify-how-many-pages-are-selected-per-pcp-list-during-bulk-free +++ a/mm/page_alloc.c @@ -1444,13 +1444,11 @@ static inline void prefetch_buddy(struct * count is the number of pages to free. */ static void free_pcppages_bulk(struct zone *zone, int count, - struct per_cpu_pages *pcp) + struct per_cpu_pages *pcp, + int pindex) { - int pindex = 0; int min_pindex = 0; int max_pindex = NR_PCP_LISTS - 1; - int batch_free = 0; - int nr_freed = 0; unsigned int order; int prefetch_nr = READ_ONCE(pcp->batch); bool isolated_pageblocks; @@ -1464,16 +1462,10 @@ static void free_pcppages_bulk(struct zo count = min(pcp->count, count); while (count > 0) { struct list_head *list; + int nr_pages; - /* - * Remove pages from lists in a round-robin fashion. A - * batch_free count is maintained that is incremented when an - * empty list is encountered. This is so more pages are freed - * off fuller lists instead of spinning excessively around empty - * lists - */ + /* Remove pages from lists in a round-robin fashion. */ do { - batch_free++; if (++pindex > max_pindex) pindex = min_pindex; list = &pcp->lists[pindex]; @@ -1486,18 +1478,15 @@ static void free_pcppages_bulk(struct zo min_pindex++; } while (1); - /* This is the only non-empty list. Free them all. 
*/ - if (batch_free >= max_pindex - min_pindex) - batch_free = count; - order = pindex_to_order(pindex); + nr_pages = 1 << order; BUILD_BUG_ON(MAX_ORDER >= (1<lru); - nr_freed += 1 << order; - count -= 1 << order; + count -= nr_pages; + pcp->count -= nr_pages; if (bulkfree_pcp_prepare(page)) continue; @@ -1521,9 +1510,8 @@ static void free_pcppages_bulk(struct zo prefetch_buddy(page, order); prefetch_nr--; } - } while (count > 0 && --batch_free && !list_empty(list)); + } while (count > 0 && !list_empty(list)); } - pcp->count -= nr_freed; /* * local_lock_irq held so equivalent to spin_lock_irqsave for @@ -3077,7 +3065,7 @@ void drain_zone_pages(struct zone *zone, batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); if (to_drain > 0) - free_pcppages_bulk(zone, to_drain, pcp); + free_pcppages_bulk(zone, to_drain, pcp, 0); local_unlock_irqrestore(&pagesets.lock, flags); } #endif @@ -3098,7 +3086,7 @@ static void drain_pages_zone(unsigned in pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); if (pcp->count) - free_pcppages_bulk(zone, pcp->count, pcp); + free_pcppages_bulk(zone, pcp->count, pcp, 0); local_unlock_irqrestore(&pagesets.lock, flags); } @@ -3379,7 +3367,7 @@ static void free_unref_page_commit(struc if (pcp->count >= high) { int batch = READ_ONCE(pcp->batch); - free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp); + free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex); } } From patchwork Tue Mar 22 21:43:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789140 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C35BC433EF for ; Tue, 22 Mar 2022 21:43:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 959286B00EC; Tue, 22 Mar 2022 17:43:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 908976B00ED; Tue, 22 Mar 2022 17:43:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 75B8A6B00EE; Tue, 22 Mar 2022 17:43:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 560566B00EC for ; Tue, 22 Mar 2022 17:43:41 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 2A1BF61E04 for ; Tue, 22 Mar 2022 21:43:41 +0000 (UTC) X-FDA: 79273349442.07.B764251 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf03.hostedemail.com (Postfix) with ESMTP id B705720021 for ; Tue, 22 Mar 2022 21:43:40 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 349CB60A1B; Tue, 22 Mar 2022 21:43:40 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8FFC5C340EC; Tue, 22 Mar 2022 21:43:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985419; bh=1sREwPqsoklvKO9TTinMABRXKUs84U+dTlddvMWZ4Sk=; h=Date:To:From:In-Reply-To:Subject:From; b=QgJAwkajMt1MxK3aqVK3NEFE90p6sL9qzBbvM9GDkcTpawj0lNzhzfO5AinhNevqW 
fgfQZVTMDrLnaDNl94FS025UuK4e4fFuWd9TDPdnE+8Uxi5CZLJtRk8KjXQZL0z6OH 7Lj1YDWcH3zGN2IFMh5234CxObWx+W8Axopzg0xI= Date: Tue, 22 Mar 2022 14:43:38 -0700 To: vbabka@suse.cz,mhocko@kernel.org,dave.hansen@linux.intel.com,brouer@redhat.com,aaron.lu@intel.com,mgorman@techsingularity.net,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 101/227] mm/page_alloc: drain the requested list first during bulk free Message-Id: <20220322214339.8FFC5C340EC@smtp.kernel.org> Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=QgJAwkaj; spf=pass (imf03.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: B705720021 X-Stat-Signature: keepnjz5othyubcdzotydmtwow81hwcz X-HE-Tag: 1647985420-30661 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mel Gorman Subject: mm/page_alloc: drain the requested list first during bulk free Prior to the series, pindex 0 (order-0 MIGRATE_UNMOVABLE) was always skipped first and the precise reason is forgotten. A potential reason may have been to artificially preserve MIGRATE_UNMOVABLE but there is no reason why that would be optimal as it depends on the workload. The more likely reason is that it was less complicated to do a pre-increment instead of a post-increment in terms of overall code flow. As free_pcppages_bulk() now typically receives the pindex of the PCP list that exceeded high, always start draining that list. Link: https://lkml.kernel.org/r/20220217002227.5739-5-mgorman@techsingularity.net Signed-off-by: Mel Gorman Reviewed-by: Vlastimil Babka Tested-by: Aaron Lu Cc: Dave Hansen Cc: Jesper Dangaard Brouer Cc: Michal Hocko Signed-off-by: Andrew Morton --- mm/page_alloc.c | 4 ++++ 1 file changed, 4 insertions(+) --- a/mm/page_alloc.c~mm-page_alloc-drain-the-requested-list-first-during-bulk-free +++ a/mm/page_alloc.c @@ -1460,6 +1460,10 @@ static void free_pcppages_bulk(struct zo * below while (list_empty(list)) loop. */ count = min(pcp->count, count); + + /* Ensure requested pindex is drained first. 
*/ + pindex = pindex - 1; + while (count > 0) { struct list_head *list; int nr_pages; From patchwork Tue Mar 22 21:43:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789141 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88510C433F5 for ; Tue, 22 Mar 2022 21:43:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 18FCF6B00EE; Tue, 22 Mar 2022 17:43:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 119A06B00EF; Tue, 22 Mar 2022 17:43:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EAEB66B00F0; Tue, 22 Mar 2022 17:43:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0114.hostedemail.com [216.40.44.114]) by kanga.kvack.org (Postfix) with ESMTP id D5CC96B00EE for ; Tue, 22 Mar 2022 17:43:45 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9EBFEA4DDF for ; Tue, 22 Mar 2022 21:43:45 +0000 (UTC) X-FDA: 79273349610.16.DBF703E Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf04.hostedemail.com (Postfix) with ESMTP id 1365140013 for ; Tue, 22 Mar 2022 21:43:44 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id E8CF2B81DB5; Tue, 22 Mar 2022 21:43:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A0B96C340EC; Tue, 22 Mar 2022 21:43:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985422; bh=BjpmI5D87g7Z+XJBfZmngKJHs0dzuJgyOI0NRbyHgrI=; h=Date:To:From:In-Reply-To:Subject:From; b=NA6Fcr6HNRh7AY555+tV9Zr9eFFMtLt5I/q8LFe163c1+Avfu7CKA+mzgxLhxTa6N eC+RZCZjTFrjrpiA+x6CpVcJwmL68yu2OoWPPKQXPYOzMAslo4XehA/FegN0c5GbZC 7YLNDSc7EFMyWUO4jz2KGt4G640P5Y8hQIvG378o= Date: Tue, 22 Mar 2022 14:43:42 -0700 To: vbabka@suse.cz,mhocko@kernel.org,dave.hansen@linux.intel.com,brouer@redhat.com,aaron.lu@intel.com,mgorman@techsingularity.net,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 102/227] mm/page_alloc: free pages in a single pass during bulk free Message-Id: <20220322214342.A0B96C340EC@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 1365140013 X-Stat-Signature: 99ssfayjzbmokifdyyc7m7jjdgx56ha4 X-Rspam-User: Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=NA6Fcr6H; dmarc=none; spf=pass (imf04.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1647985424-751173 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mel Gorman Subject: mm/page_alloc: free pages in a single pass during bulk free free_pcppages_bulk() has 
taken two passes through the pcp lists since commit 0a5f4e5b4562 ("mm/free_pcppages_bulk: do not hold lock when picking pages to free") due to deferring the cost of selecting PCP lists until the zone lock is held. Now that list selection is simpler, the main cost during selection is bulkfree_pcp_prepare(), which in the normal case is a simple check plus prefetching. As the list manipulations have a cost of their own, go back to freeing pages in a single pass. The series up to this point was evaluated using a trunc microbenchmark that truncates sparse files stored in the page cache (mmtests config config-io-trunc). Sparse files were used to limit filesystem interaction. The results versus a revert of storing high-order pages in the PCP lists are:

1-socket Skylake
                         5.17.0-rc3           5.17.0-rc3           5.17.0-rc3
                            vanilla  mm-reverthighpcp-v1     mm-highpcpopt-v2
Min       elapsed   540.00 (  0.00%)    530.00 (  1.85%)    530.00 (  1.85%)
Amean     elapsed   543.00 (  0.00%)    530.00 *  2.39%*    530.00 *  2.39%*
Stddev    elapsed     4.83 (  0.00%)      0.00 (100.00%)      0.00 (100.00%)
CoeffVar  elapsed     0.89 (  0.00%)      0.00 (100.00%)      0.00 (100.00%)
Max       elapsed   550.00 (  0.00%)    530.00 (  3.64%)    530.00 (  3.64%)
BAmean-50 elapsed   540.00 (  0.00%)    530.00 (  1.85%)    530.00 (  1.85%)
BAmean-95 elapsed   542.22 (  0.00%)    530.00 (  2.25%)    530.00 (  2.25%)
BAmean-99 elapsed   542.22 (  0.00%)    530.00 (  2.25%)    530.00 (  2.25%)

2-socket CascadeLake
                         5.17.0-rc3           5.17.0-rc3           5.17.0-rc3
                            vanilla  mm-reverthighpcp-v1     mm-highpcpopt-v2
Min       elapsed   510.00 (  0.00%)    500.00 (  1.96%)    500.00 (  1.96%)
Amean     elapsed   529.00 (  0.00%)    521.00 (  1.51%)    510.00 *  3.59%*
Stddev    elapsed    16.63 (  0.00%)     12.87 ( 22.64%)     11.55 ( 30.58%)
CoeffVar  elapsed     3.14 (  0.00%)      2.47 ( 21.46%)      2.26 ( 27.99%)
Max       elapsed   550.00 (  0.00%)    540.00 (  1.82%)    530.00 (  3.64%)
BAmean-50 elapsed   516.00 (  0.00%)    512.00 (  0.78%)    500.00 (  3.10%)
BAmean-95 elapsed   526.67 (  0.00%)    518.89 (  1.48%)    507.78 (  3.59%)
BAmean-99 elapsed   526.67 (  0.00%)    518.89 (  1.48%)    507.78 (  3.59%)

The original motivation for multi-passes was will-it-scale page_fault1 using $nr_cpu processes.

2-socket CascadeLake (40 cores, 80 CPUs HT enabled)
                                           5.17.0-rc3            5.17.0-rc3
                                              vanilla      mm-highpcpopt-v2
Hmean page_fault1-processes-2    2694662.26 (  0.00%)  2695780.35 (  0.04%)
Hmean page_fault1-processes-5    6425819.34 (  0.00%)  6435544.57 *  0.15%*
Hmean page_fault1-processes-8    9642169.10 (  0.00%)  9658962.39 (  0.17%)
Hmean page_fault1-processes-12  12167502.10 (  0.00%) 12190163.79 (  0.19%)
Hmean page_fault1-processes-21  15636859.03 (  0.00%) 15612447.26 ( -0.16%)
Hmean page_fault1-processes-30  25157348.61 (  0.00%) 25169456.65 (  0.05%)
Hmean page_fault1-processes-48  27694013.85 (  0.00%) 27671111.46 ( -0.08%)
Hmean page_fault1-processes-79  25928742.64 (  0.00%) 25934202.02 (  0.02%) <--
Hmean page_fault1-processes-110 25730869.75 (  0.00%) 25671880.65 * -0.23%*
Hmean page_fault1-processes-141 25626992.42 (  0.00%) 25629551.61 (  0.01%)
Hmean page_fault1-processes-172 25611651.35 (  0.00%) 25614927.99 (  0.01%)
Hmean page_fault1-processes-203 25577298.75 (  0.00%) 25583445.59 (  0.02%)
Hmean page_fault1-processes-234 25580686.07 (  0.00%) 25608240.71 (  0.11%)
Hmean page_fault1-processes-265 25570215.47 (  0.00%) 25568647.58 ( -0.01%)
Hmean page_fault1-processes-296 25549488.62 (  0.00%) 25543935.00 ( -0.02%)
Hmean page_fault1-processes-320 25555149.05 (  0.00%) 25575696.74 (  0.08%)

The differences are mostly within the noise and the difference close to $nr_cpus is negligible.
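The control-flow change is easier to see stripped of the kernel types. The sketch below is a toy model, not the kernel code: page_stub, the hand-rolled list and buddy_free() are stand-ins, and the real function also handles migratetypes, prefetching and pindex selection:

#include <stdio.h>

/* Toy stand-ins for a PCP list entry and the buddy-allocator free. */
struct page_stub {
        int id;
        struct page_stub *next;
};

static void buddy_free(struct page_stub *page)
{
        printf("freed page %d to buddy\n", page->id);
}

/*
 * Single-pass shape: the (conceptual) zone lock is taken once up front
 * and each page goes straight from its PCP list to the buddy, with no
 * intermediate staging list and no second walk.
 */
static void free_bulk_single_pass(struct page_stub **list, int count)
{
        /* spin_lock(&zone->lock) would be taken here, once */
        while (count-- > 0 && *list) {
                struct page_stub *page = *list;

                *list = page->next;
                buddy_free(page);        /* no list_add_tail() to a head list */
        }
        /* spin_unlock(&zone->lock) */
}

int main(void)
{
        struct page_stub c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
        struct page_stub *pcp_list = &a;

        free_bulk_single_pass(&pcp_list, 2);        /* frees pages 1 and 2 */
        return 0;
}

The two-pass variant being removed did the same removal under the local lock, staged pages on a temporary list, and only then took the zone lock to walk that list; the single pass trades a slightly longer zone lock hold for fewer list manipulations.
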
Link: https://lkml.kernel.org/r/20220217002227.5739-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman Reviewed-by: Vlastimil Babka Tested-by: Aaron Lu Cc: Dave Hansen Cc: Jesper Dangaard Brouer Cc: Michal Hocko Signed-off-by: Andrew Morton --- mm/page_alloc.c | 56 +++++++++++++++++----------------------------- 1 file changed, 21 insertions(+), 35 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-free-pages-in-a-single-pass-during-bulk-free +++ a/mm/page_alloc.c @@ -1452,8 +1452,7 @@ static void free_pcppages_bulk(struct zo unsigned int order; int prefetch_nr = READ_ONCE(pcp->batch); bool isolated_pageblocks; - struct page *page, *tmp; - LIST_HEAD(head); + struct page *page; /* * Ensure proper count is passed which otherwise would stuck in the @@ -1464,6 +1463,13 @@ static void free_pcppages_bulk(struct zo /* Ensure requested pindex is drained first. */ pindex = pindex - 1; + /* + * local_lock_irq held so equivalent to spin_lock_irqsave for + * both PREEMPT_RT and non-PREEMPT_RT configurations. + */ + spin_lock(&zone->lock); + isolated_pageblocks = has_isolate_pageblock(zone); + while (count > 0) { struct list_head *list; int nr_pages; @@ -1486,7 +1492,11 @@ static void free_pcppages_bulk(struct zo nr_pages = 1 << order; BUILD_BUG_ON(MAX_ORDER >= (1<lru); count -= nr_pages; @@ -1495,12 +1505,6 @@ static void free_pcppages_bulk(struct zo if (bulkfree_pcp_prepare(page)) continue; - /* Encode order with the migratetype */ - page->index <<= NR_PCP_ORDER_WIDTH; - page->index |= order; - - list_add_tail(&page->lru, &head); - /* * We are going to put the page back to the global * pool, prefetch its buddy to speed up later access @@ -1514,36 +1518,18 @@ static void free_pcppages_bulk(struct zo prefetch_buddy(page, order); prefetch_nr--; } - } while (count > 0 && !list_empty(list)); - } - /* - * local_lock_irq held so equivalent to spin_lock_irqsave for - * both PREEMPT_RT and non-PREEMPT_RT configurations. - */ - spin_lock(&zone->lock); - isolated_pageblocks = has_isolate_pageblock(zone); + /* MIGRATE_ISOLATE page should not go to pcplists */ + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); + /* Pageblock could have been isolated meanwhile */ + if (unlikely(isolated_pageblocks)) + mt = get_pageblock_migratetype(page); - /* - * Use safe version since after __free_one_page(), - * page->lru.next will not point to original list. 
- */ - list_for_each_entry_safe(page, tmp, &head, lru) { - int mt = get_pcppage_migratetype(page); - - /* mt has been encoded with the order (see above) */ - order = mt & NR_PCP_ORDER_MASK; - mt >>= NR_PCP_ORDER_WIDTH; - - /* MIGRATE_ISOLATE page should not go to pcplists */ - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); - /* Pageblock could have been isolated meanwhile */ - if (unlikely(isolated_pageblocks)) - mt = get_pageblock_migratetype(page); - - __free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE); - trace_mm_page_pcpu_drain(page, order, mt); + __free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE); + trace_mm_page_pcpu_drain(page, order, mt); + } while (count > 0 && !list_empty(list)); } + spin_unlock(&zone->lock); } From patchwork Tue Mar 22 21:43:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789142 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E0B8C433EF for ; Tue, 22 Mar 2022 21:43:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DC6696B00F0; Tue, 22 Mar 2022 17:43:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D4D8C6B00F1; Tue, 22 Mar 2022 17:43:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BA2516B00F2; Tue, 22 Mar 2022 17:43:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 9D87D6B00F0 for ; Tue, 22 Mar 2022 17:43:47 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 6CAC81212F5 for ; Tue, 22 Mar 2022 21:43:47 +0000 (UTC) X-FDA: 79273349694.10.B4D75D2 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf15.hostedemail.com (Postfix) with ESMTP id F0AFCA0019 for ; Tue, 22 Mar 2022 21:43:46 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6DBFA6149E; Tue, 22 Mar 2022 21:43:46 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C4A70C340EC; Tue, 22 Mar 2022 21:43:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985425; bh=ikz4+EgGmBvYeY85qoGEj16OtIH5w8CZum76gehh5ik=; h=Date:To:From:In-Reply-To:Subject:From; b=pAsHzGHTcQrrIuLtVpq6GGYbHwL/WL3uuRSwb2d8zooG5Xnv2zYgviz4X2J6ibmLg nCHCcRuTwlLRxopiMVTo7nChzsr6vc5p9OhlzI/Qiz0yJ86Neu7APElTZGNPWl8hIW /SjvYPIS4lCx1e+wdc/scL035vDMJoFydRVSuQhY= Date: Tue, 22 Mar 2022 14:43:45 -0700 To: vbabka@suse.cz,mhocko@kernel.org,dave.hansen@linux.intel.com,brouer@redhat.com,aaron.lu@intel.com,mgorman@techsingularity.net,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 103/227] mm/page_alloc: limit number of high-order pages on PCP during bulk free Message-Id: <20220322214345.C4A70C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: 
From: Mel Gorman
Subject: mm/page_alloc: limit number of high-order pages on PCP during bulk free

When a PCP is mostly used for frees, high-order pages can exist on PCP lists for some time. This is problematic when the allocation pattern is all allocations from one CPU and all frees from another, resulting in colder pages being used. When bulk freeing pages, limit the number of high-order pages that are stored on the PCP lists.

Netperf running on localhost exhibits this pattern and while it does not matter for some machines, it does matter for others with smaller caches, where cache misses cause problems due to reduced page reuse. Pages freed directly to the buddy list may be reused quickly while still cache hot, whereas pages stored on the PCP lists may be cold by the time free_pcppages_bulk() is called.

Using perf kmem:mm_page_alloc, the 5 most used page frames were

5.17-rc3
   13041 pfn=0x111a30
   13081 pfn=0x5814d0
   13097 pfn=0x108258
   13121 pfn=0x689598
   13128 pfn=0x5814d8

5.17-revert-highpcp
  192009 pfn=0x54c140
  195426 pfn=0x1081d0
  200908 pfn=0x61c808
  243515 pfn=0xa9dc20
  402523 pfn=0x222bb8

5.17-full-series
  142693 pfn=0x346208
  162227 pfn=0x13bf08
  166413 pfn=0x2711e0
  166950 pfn=0x2702f8

The spread is wider as there is still time before pages freed to one PCP get released, with a tradeoff between fast reuse and reduced zone lock acquisition. On the machine used to gather the traces, the headline performance was equivalent.
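The decision the patch adds can be condensed into a standalone sketch. Note that struct pcp_state and the hard-coded costly order of 3 below are simplified stand-ins for the kernel's struct per_cpu_pages and PAGE_ALLOC_COSTLY_ORDER, not the real implementation:

	#include <stdbool.h>
	#include <stdio.h>

	/* Simplified stand-in for struct per_cpu_pages (illustrative only). */
	struct pcp_state {
		int count;       /* pages currently on the PCP lists */
		int high;        /* high watermark */
		int batch;       /* normal drain batch size */
		int free_factor; /* non-zero when frees dominate allocations */
	};

	/* Mirrors the shape of the patched nr_pcp_free(): drain everything
	 * when batch freeing high-order pages, otherwise free a batch. */
	static int nr_pcp_free(struct pcp_state *pcp, bool free_high)
	{
		if (free_high)
			return pcp->count;	/* free everything */
		if (pcp->high < pcp->batch)
			return 1;		/* PCP disabled or boot pageset */
		return pcp->batch;
	}

	int main(void)
	{
		struct pcp_state pcp = { .count = 96, .high = 64, .batch = 32,
					 .free_factor = 1 };
		int order = 2;	/* a high-order but non-costly free */

		/* The patch only triggers this for 0 < order <= 3. */
		bool free_high = pcp.free_factor && order && order <= 3;

		printf("pages to free: %d\n", nr_pcp_free(&pcp, free_high)); /* 96 */
		return 0;
	}

The measured results follow.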
netperf-tcp
                           5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                              vanilla  mm-reverthighpcp-v1r1     mm-highpcplimit-v2
Hmean     64      839.93 (   0.00%)      840.77 (   0.10%)      841.02 (   0.13%)
Hmean    128     1614.22 (   0.00%)     1622.07 *   0.49%*     1636.41 *   1.37%*
Hmean    256     2952.00 (   0.00%)     2953.19 (   0.04%)     2977.76 *   0.87%*
Hmean   1024    10291.67 (   0.00%)    10239.17 (  -0.51%)    10434.41 *   1.39%*
Hmean   2048    17335.08 (   0.00%)    17399.97 (   0.37%)    17134.81 *  -1.16%*
Hmean   3312    22628.15 (   0.00%)    22471.97 (  -0.69%)    22422.78 (  -0.91%)
Hmean   4096    25009.50 (   0.00%)    24752.83 *  -1.03%*    24740.41 (  -1.08%)
Hmean   8192    32745.01 (   0.00%)    31682.63 *  -3.24%*    32153.50 *  -1.81%*
Hmean  16384    39759.59 (   0.00%)    36805.78 *  -7.43%*    38948.13 *  -2.04%*

On a 1-socket skylake machine with a small CPU cache that suffers more if cache misses are too high:

netperf-tcp
                           5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                              vanilla    mm-reverthighpcp-v1     mm-highpcplimit-v2
Hmean     64      938.95 (   0.00%)      941.50 *   0.27%*      943.61 *   0.50%*
Hmean    128     1843.10 (   0.00%)     1857.58 *   0.79%*     1861.09 *   0.98%*
Hmean    256     3573.07 (   0.00%)     3667.45 *   2.64%*     3674.91 *   2.85%*
Hmean   1024    13206.52 (   0.00%)    13487.80 *   2.13%*    13393.21 *   1.41%*
Hmean   2048    22870.23 (   0.00%)    23337.96 *   2.05%*    23188.41 *   1.39%*
Hmean   3312    31001.99 (   0.00%)    32206.50 *   3.89%*    31863.62 *   2.78%*
Hmean   4096    35364.59 (   0.00%)    36490.96 *   3.19%*    36112.54 *   2.11%*
Hmean   8192    48497.71 (   0.00%)    49954.05 *   3.00%*    49588.26 *   2.25%*
Hmean  16384    58410.86 (   0.00%)    60839.80 *   4.16%*    62282.96 *   6.63%*

Note that this was a machine that did not benefit from caching high-order pages and performance is almost restored with the series applied. It's not fully restored as cache misses are still higher. This is a trade-off between optimising for a workload that does all allocs on one CPU and frees on another or more general workloads that need high-order pages for SLUB and benefit from avoiding zone->lock for every SLUB refill/drain.

Link: https://lkml.kernel.org/r/20220217002227.5739-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Reviewed-by: Vlastimil Babka
Tested-by: Aaron Lu
Cc: Dave Hansen
Cc: Jesper Dangaard Brouer
Cc: Michal Hocko
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-limit-number-of-high-order-pages-on-pcp-during-bulk-free
+++ a/mm/page_alloc.c
@@ -3299,10 +3299,15 @@ static bool free_unref_page_prepare(stru
 	return true;
 }

-static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
+static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch,
+		       bool free_high)
 {
 	int min_nr_free, max_nr_free;

+	/* Free everything if batch freeing high-order pages. */
+	if (unlikely(free_high))
+		return pcp->count;
+
 	/* Check for PCP disabled or boot pageset */
 	if (unlikely(high < batch))
 		return 1;
@@ -3323,11 +3328,12 @@ static int nr_pcp_fa
 	return batch;
 }

-static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
+static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
+		       bool free_high)
 {
 	int high = READ_ONCE(pcp->high);

-	if (unlikely(!high))
+	if (unlikely(!high || free_high))
 		return 0;

 	if (!test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
@@ -3347,17 +3353,27 @@ static void free_unref_page_commit(struc
 	struct per_cpu_pages *pcp;
 	int high;
 	int pindex;
+	bool free_high;

 	__count_vm_event(PGFREE);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
 	pindex = order_to_pindex(migratetype, order);
 	list_add(&page->lru, &pcp->lists[pindex]);
 	pcp->count += 1 << order;
-	high = nr_pcp_high(pcp, zone);
+
+	/*
+	 * As high-order pages other than THP's stored on PCP can contribute
+	 * to fragmentation, limit the number stored when PCP is heavily
+	 * freeing without allocation. The remainder after bulk freeing
+	 * stops will be drained from vmstat refresh context.
+	 */
+	free_high = (pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER);
+
+	high = nr_pcp_high(pcp, zone, free_high);
 	if (pcp->count >= high) {
 		int batch = READ_ONCE(pcp->batch);

-		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex);
+		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
 	}
 }

From patchwork Tue Mar 22 21:43:48 2022
Date: Tue, 22 Mar 2022 14:43:48 -0700
From: Andrew Morton
Subject: [patch 104/227] mm/page_alloc: do not prefetch buddies during bulk free
Message-Id: <20220322214348.D21E5C340F2@smtp.kernel.org>

From: Mel Gorman
Subject: mm/page_alloc: do not prefetch buddies during bulk free

free_pcppages_bulk() has taken two passes through the pcp lists since commit 0a5f4e5b4562 ("mm/free_pcppages_bulk: do not hold lock when picking pages to free") due to deferring the cost of selecting PCP lists until the zone lock is held. As the list processing now takes place under the zone lock, it's less clear that this will always be a benefit, for two reasons.

1. There is a guaranteed cost to calculating the buddy, which definitely has to be calculated again. However, as the zone lock is held and there is no deferring of buddy merging, there is no guarantee that the prefetch will have completed when the second buddy calculation takes place and buddies are being merged. With or without the prefetch, there may be further stalls depending on how many pages get merged. In other words, a stall due to merging is inevitable, and at best only one stall might be avoided at the cost of calculating the buddy location twice.

2. As the zone lock is held, prefetch_nr makes less sense, because once prefetch_nr expires, the cache lines of interest have already been merged.

The main concern is that there is a definite cost to calculating the buddy location early for the prefetch, and it is a "maybe win" depending on whether the CPU prefetch logic and memory are fast enough. Remove the prefetch logic on the basis that reduced instructions in a path is always a saving, whereas the prefetch might save one memory stall depending on the CPU and memory.

In most cases, this has marginal benefit as the calculations are a small part of the overall freeing of pages. However, it was detectable on at least one machine.
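For reference, the buddy location that the removed prefetch computed is just an XOR on the pfn. A minimal userspace rendering of that calculation (the real helper, __find_buddy_pfn(), lives in mm/internal.h; this standalone version is illustrative only):

	#include <stdio.h>

	/* The buddy of a 2^order block is found by flipping bit 'order'
	 * of the pfn, mirroring __find_buddy_pfn() in mm/internal.h. */
	static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
	{
		return pfn ^ (1UL << order);
	}

	int main(void)
	{
		/* Order-1 blocks pair pfns {4,5} with {6,7}, and so on. */
		printf("%lu\n", find_buddy_pfn(4, 1));	/* prints 6 */
		printf("%lu\n", find_buddy_pfn(6, 1));	/* prints 4 */
		return 0;
	}

The measured elapsed times follow.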
                           5.17.0-rc3             5.17.0-rc3
                 mm-highpcplimit-v2r1     mm-noprefetch-v1r1
Min   elapsed     630.00 (   0.00%)      610.00 (   3.17%)
Amean elapsed     639.00 (   0.00%)      623.00 *   2.50%*
Max   elapsed     660.00 (   0.00%)      660.00 (   0.00%)

Link: https://lkml.kernel.org/r/20220221094119.15282-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Suggested-by: Aaron Lu
Reviewed-by: Vlastimil Babka
Reviewed-by: Aaron Lu
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 24 ------------------------
 1 file changed, 24 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-do-not-prefetch-buddies-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1429,15 +1429,6 @@ static bool bulkfree_pcp_prepare(struct
 }
 #endif /* CONFIG_DEBUG_VM */

-static inline void prefetch_buddy(struct page *page, unsigned int order)
-{
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long buddy_pfn = __find_buddy_pfn(pfn, order);
-	struct page *buddy = page + (buddy_pfn - pfn);
-
-	prefetch(buddy);
-}
-
 /*
  * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone.
@@ -1450,7 +1441,6 @@ static void free_pcppages_bulk(struct zo
 	int min_pindex = 0;
 	int max_pindex = NR_PCP_LISTS - 1;
 	unsigned int order;
-	int prefetch_nr = READ_ONCE(pcp->batch);
 	bool isolated_pageblocks;
 	struct page *page;
@@ -1505,20 +1495,6 @@ static void free_pcppages_bulk(struct zo
 		if (bulkfree_pcp_prepare(page))
 			continue;

-		/*
-		 * We are going to put the page back to the global
-		 * pool, prefetch its buddy to speed up later access
-		 * under zone->lock. It is believed the overhead of
-		 * an additional test and calculating buddy_pfn here
-		 * can be offset by reduced memory latency later. To
-		 * avoid excessive prefetching due to large count, only
-		 * prefetch buddy for the first pcp->batch nr of pages.
-		 */
-		if (prefetch_nr) {
-			prefetch_buddy(page, order);
-			prefetch_nr--;
-		}
-
 		/* MIGRATE_ISOLATE page should not go to pcplists */
 		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
 		/* Pageblock could have been isolated meanwhile */

From patchwork Tue Mar 22 21:43:51 2022
Date: Tue, 22 Mar 2022 14:43:51 -0700
From: Andrew Morton
Subject: [patch 105/227] arch/x86/mm/numa: Do not initialize nodes twice
Message-Id: <20220322214352.20C03C340F2@smtp.kernel.org>
From: Oscar Salvador
Subject: arch/x86/mm/numa: Do not initialize nodes twice

On x86, prior to ("mm: handle uninitialized numa nodes gracefully"), NUMA nodes could be allocated at three different places.

 - numa_register_memblks
 - init_cpu_to_node
 - init_gi_nodes

All these calls happen at setup_arch, and have the following order:

setup_arch
  ...
  x86_numa_init
    numa_init
      numa_register_memblks
  ...
  init_cpu_to_node
    init_memory_less_node
      alloc_node_data
      free_area_init_memoryless_node
  init_gi_nodes
    init_memory_less_node
      alloc_node_data
      free_area_init_memoryless_node

numa_register_memblks() is only interested in those nodes which have memory, so it skips over any memoryless node it finds. Later on, when we have read ACPI's SRAT table, we call init_cpu_to_node() and init_gi_nodes(), which initialize any memoryless node we might have that has either CPU or Initiator affinity, meaning we allocate a pg_data_t struct for them and mark them as ONLINE.

So far so good, but the thing is that after ("mm: handle uninitialized numa nodes gracefully"), we allocate all possible NUMA nodes in free_area_init(), meaning we have a picture like the following:

setup_arch
  x86_numa_init
    numa_init
      numa_register_memblks      <-- allocate non-memoryless node
  x86_init.paging.pagetable_init
    ...
    free_area_init
      free_area_init_memoryless  <-- allocate memoryless node
  init_cpu_to_node
    alloc_node_data              <-- allocate memoryless node with CPU
    free_area_init_memoryless_node
  init_gi_nodes
    alloc_node_data              <-- allocate memoryless node with Initiator
    free_area_init_memoryless_node

free_area_init() already allocates all possible NUMA nodes, but init_cpu_to_node() and init_gi_nodes() are clueless about that, so they go ahead and allocate a new pg_data_t struct without checking anything, meaning we end up allocating twice.

It should be made clear that this only happens in the case where a memoryless NUMA node happens to have a CPU/Initiator affinity.

So get rid of init_memory_less_node() and just set the node online. Note that setting the node online is needed, otherwise we choke down the chain when bringup_nonboot_cpus() ends up calling __try_online_node()->register_one_node()->... and we blow up in bus_add_device().
As can be seen here:

==========
[    0.585060] BUG: kernel NULL pointer dereference, address: 0000000000000060
[    0.586091] #PF: supervisor read access in kernel mode
[    0.586831] #PF: error_code(0x0000) - not-present page
[    0.586930] PGD 0 P4D 0
[    0.586930] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[    0.586930] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-1-default+ #45
[    0.586930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/4
[    0.586930] RIP: 0010:bus_add_device+0x5a/0x140
[    0.586930] Code: 8b 74 24 20 48 89 df e8 84 96 ff ff 85 c0 89 c5 75 38 48 8b 53 50 48 85 d2 0f 84 bb 00 004
[    0.586930] RSP: 0000:ffffc9000022bd10 EFLAGS: 00010246
[    0.586930] RAX: 0000000000000000 RBX: ffff888100987400 RCX: ffff8881003e4e19
[    0.586930] RDX: ffff8881009a5e00 RSI: ffff888100987400 RDI: ffff888100987400
[    0.586930] RBP: 0000000000000000 R08: ffff8881003e4e18 R09: ffff8881003e4c98
[    0.586930] R10: 0000000000000000 R11: ffff888100402bc0 R12: ffffffff822ceba0
[    0.586930] R13: 0000000000000000 R14: ffff888100987400 R15: 0000000000000000
[    0.586930] FS:  0000000000000000(0000) GS:ffff88853fc00000(0000) knlGS:0000000000000000
[    0.586930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.586930] CR2: 0000000000000060 CR3: 000000000200a001 CR4: 00000000001706b0
[    0.586930] Call Trace:
[    0.586930]  device_add+0x4c0/0x910
[    0.586930]  __register_one_node+0x97/0x2d0
[    0.586930]  __try_online_node+0x85/0xc0
[    0.586930]  try_online_node+0x25/0x40
[    0.586930]  cpu_up+0x4f/0x100
[    0.586930]  bringup_nonboot_cpus+0x4f/0x60
[    0.586930]  smp_init+0x26/0x79
[    0.586930]  kernel_init_freeable+0x130/0x2f1
[    0.586930]  ? rest_init+0x100/0x100
[    0.586930]  kernel_init+0x17/0x150
[    0.586930]  ? rest_init+0x100/0x100
[    0.586930]  ret_from_fork+0x22/0x30
[    0.586930] Modules linked in:
[    0.586930] CR2: 0000000000000060
[    0.586930] ---[ end trace 0000000000000000 ]---
==========

The reason is simple: by the time bringup_nonboot_cpus() gets called, we have not registered the node_subsys bus yet, so we crash when bus_add_device() tries to dereference bus()->p. The following shows the order of the calls:

kernel_init_freeable
  smp_init
    bringup_nonboot_cpus
      ...
      bus_add_device()  <- we did not register node_subsys yet

do_basic_setup
  do_initcalls
    postcore_initcall(register_node_type);
    register_node_type
      subsys_system_register
        subsys_register
          bus_register  <- register node_subsys bus

Why does setting the node online save us, then? Simply because __try_online_node() backs off when the node is online, meaning we do not end up calling register_one_node() in the first place.

This is subtle and broken, and deserves a deep analysis and thought about how to put this into shape, but for now let us have this easy fix for the leaking memory issue.
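The early bail-out that the fix relies on can be condensed into a standalone sketch. The node-online bitmap and register_one_node() below are userspace stubs, and try_online_node_sketch() only paraphrases the shape of __try_online_node() in mm/memory_hotplug.c, not its literal code:

	#include <stdbool.h>
	#include <stdio.h>

	/* Stand-ins for the kernel's node-online bitmap (illustrative only). */
	static bool online[8];
	static bool node_online(int nid)     { return online[nid]; }
	static void node_set_online(int nid) { online[nid] = true; }

	/* Would crash before node_subsys is registered; stubbed out here. */
	static int register_one_node(int nid)
	{
		printf("registering node %d (unsafe this early!)\n", nid);
		return 0;
	}

	/* Condensed shape of __try_online_node(). */
	static int try_online_node_sketch(int nid)
	{
		if (node_online(nid))
			return 0;	/* the fix relies on bailing out here */
		return register_one_node(nid);
	}

	int main(void)
	{
		node_set_online(1);		/* what the patch does at boot */
		try_online_node_sketch(1);	/* no-op: node already online */
		try_online_node_sketch(2);	/* would have been the crash path */
		return 0;
	}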
[osalvador@suse.de: add comments]
Link: https://lkml.kernel.org/r/20220221142649.3457-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20220218224302.5282-2-osalvador@suse.de
Fixes: da4490c958ad ("mm: handle uninitialized numa nodes gracefully")
Signed-off-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: David Hildenbrand
Cc: Rafael Aquini
Cc: Dave Hansen
Cc: Wei Yang
Cc: Dennis Zhou
Cc: Alexey Makhalov
Signed-off-by: Andrew Morton
---

 arch/x86/mm/numa.c | 33 ++++++++++++++++++++-------------
 include/linux/mm.h |  1 -
 mm/page_alloc.c    |  2 +-
 3 files changed, 21 insertions(+), 15 deletions(-)

--- a/arch/x86/mm/numa.c~arch-x86-mm-numa-do-not-initialize-nodes-twice
+++ a/arch/x86/mm/numa.c
@@ -738,17 +738,6 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }

-static void __init init_memory_less_node(int nid)
-{
-	/* Allocate and initialize node data. Memory-less node is now online.*/
-	alloc_node_data(nid);
-	free_area_init_memoryless_node(nid);
-
-	/*
-	 * All zonelists will be built later in start_kernel() after per cpu
-	 * areas are initialized.
-	 */
-}
 /*
  * A node may exist which has one or more Generic Initiators but no CPUs and no
@@ -766,9 +755,18 @@ void __init init_gi_nodes(void)
 {
 	int nid;

+	/*
+	 * Exclude this node from
+	 * bringup_nonboot_cpus
+	 *  cpu_up
+	 *   __try_online_node
+	 *    register_one_node
+	 * because node_subsys is not initialized yet.
+	 * TODO remove dependency on node_online
+	 */
 	for_each_node_state(nid, N_GENERIC_INITIATOR)
 		if (!node_online(nid))
-			init_memory_less_node(nid);
+			node_set_online(nid);
 }

 /*
@@ -798,8 +796,17 @@ void __init init_cpu_to_node(void)
 		if (node == NUMA_NO_NODE)
 			continue;

+	/*
+	 * Exclude this node from
+	 * bringup_nonboot_cpus
+	 *  cpu_up
+	 *   __try_online_node
+	 *    register_one_node
+	 * because node_subsys is not initialized yet.
+	 * TODO remove dependency on node_online
+	 */
 		if (!node_online(node))
-			init_memory_less_node(node);
+			node_set_online(node);

 		numa_set_node(cpu, node);
 	}

--- a/include/linux/mm.h~arch-x86-mm-numa-do-not-initialize-nodes-twice
+++ a/include/linux/mm.h
@@ -2449,7 +2449,6 @@ static inline spinlock_t *pud_lock(struc
 }

 extern void __init pagecache_init(void);
-extern void __init free_area_init_memoryless_node(int nid);
 extern void free_initmem(void);

 /*
--- a/mm/page_alloc.c~arch-x86-mm-numa-do-not-initialize-nodes-twice
+++ a/mm/page_alloc.c
@@ -7626,7 +7626,7 @@ static void __init free_area_init_node(i
 	free_area_init_core(pgdat);
 }

-void __init free_area_init_memoryless_node(int nid)
+static void __init free_area_init_memoryless_node(int nid)
 {
 	free_area_init_node(nid);
 }

From patchwork Tue Mar 22 21:43:54 2022
Date: Tue, 22 Mar 2022 14:43:54 -0700
From: Andrew Morton
Subject: [patch 106/227] mm: count time in drain_all_pages during direct reclaim as memory pressure
Message-Id: <20220322214355.28386C340EC@smtp.kernel.org>
From: Suren Baghdasaryan
Subject: mm: count time in drain_all_pages during direct reclaim as memory pressure

When page allocation in the direct reclaim path fails, the system will make one attempt to shrink per-cpu page lists and free pages from high alloc reserves. Draining per-cpu pages into the buddy allocator can be a very slow operation because it's done using workqueues, and the task in direct reclaim waits for all of them to finish before proceeding. Currently this time is not accounted as a psi memory stall.

While testing mobile devices under extreme memory pressure, when allocations were failing during direct reclaim, we noticed that psi events which would be expected in such conditions were not triggered. After profiling these cases it was determined that the reason for missing psi events was that a big chunk of time spent in direct reclaim is not accounted as memory stall, therefore psi would not reach the levels at which an event is generated. Further investigation revealed that the bulk of that unaccounted time was spent inside the drain_all_pages call.

A typical captured case when the drain_all_pages path gets activated:

__alloc_pages_slowpath  took 44.644.613ns
    __perform_reclaim   took    751.668ns (1.7%)
    drain_all_pages     took 43.887.167ns (98.3%)

PSI in this case records the time spent in __perform_reclaim but ignores drain_all_pages; IOW it misses 98.3% of the time spent in __alloc_pages_slowpath.

Annotate __alloc_pages_direct_reclaim in its entirety so that delays from handling page allocation failure in the direct reclaim path are accounted as memory stall.
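The shape of the fix is to widen the psi bracketing from __perform_reclaim() alone to the whole reclaim-plus-drain sequence. A userspace mock of that bracketing pattern (the psi_memstall_* functions and the two workload stubs below are stand-ins, not the kernel implementations):

	#include <stdio.h>

	/* Userspace stubs standing in for the kernel's psi annotations. */
	static void psi_memstall_enter(unsigned long *pflags)
	{
		*pflags = 0;
		puts("memstall accounting: on");
	}

	static void psi_memstall_leave(unsigned long *pflags)
	{
		(void)pflags;
		puts("memstall accounting: off");
	}

	static void perform_reclaim(void) { puts("  __perform_reclaim (1.7% of the stall)"); }
	static void drain_all_pages(void) { puts("  drain_all_pages (98.3% of the stall)"); }

	/* After the patch, both phases fall inside one memstall window. */
	static void direct_reclaim_sketch(void)
	{
		unsigned long pflags;

		psi_memstall_enter(&pflags);
		perform_reclaim();
		drain_all_pages();	/* previously fell outside the window */
		psi_memstall_leave(&pflags);
	}

	int main(void)
	{
		direct_reclaim_sketch();
		return 0;
	}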
Link: https://lkml.kernel.org/r/20220223194812.1299646-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan
Reported-by: Tim Murray
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Reviewed-by: Shakeel Butt
Cc: Petr Mladek
Cc: Peter Zijlstra
Cc: Roman Gushchin
Cc: Minchan Kim
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/mm/page_alloc.c~mm-count-time-in-drain_all_pages-during-direct-reclaim-as-memory-pressure
+++ a/mm/page_alloc.c
@@ -4554,13 +4554,12 @@ __perform_reclaim(gfp_t gfp_mask, unsign
 	const struct alloc_context *ac)
 {
 	unsigned int noreclaim_flag;
-	unsigned long pflags, progress;
+	unsigned long progress;

 	cond_resched();

 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
-	psi_memstall_enter(&pflags);
 	fs_reclaim_acquire(gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
@@ -4569,7 +4568,6 @@ __perform_reclaim(gfp_t gfp_mask, unsign
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
-	psi_memstall_leave(&pflags);

 	cond_resched();
@@ -4583,11 +4581,13 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
 	unsigned long *did_some_progress)
 {
 	struct page *page = NULL;
+	unsigned long pflags;
 	bool drained = false;

+	psi_memstall_enter(&pflags);
 	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
 	if (unlikely(!(*did_some_progress)))
-		return NULL;
+		goto out;

 retry:
 	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
@@ -4603,6 +4603,8 @@ retry:
 		drained = true;
 		goto retry;
 	}
+out:
+	psi_memstall_leave(&pflags);

 	return page;
 }

From patchwork Tue Mar 22 21:43:57 2022
Date: Tue, 22 Mar 2022 14:43:57 -0700
From: Andrew Morton
Subject: [patch 107/227] mm/page_alloc: call check_new_pages() while zone spinlock is not held
Message-Id: <20220322214358.358BDC340EE@smtp.kernel.org>

From: Eric Dumazet
Subject: mm/page_alloc: call check_new_pages() while zone spinlock is not held

For high-order pages not using the pcp, rmqueue() currently calls the costly check_new_pages() while the zone spinlock is held and hard IRQs are masked. This is not needed; we can release the spinlock sooner to reduce zone spinlock contention.

Note that after this patch, we call __mod_zone_freepage_state() before deciding to leak the page because it is in a bad state.

Link: https://lkml.kernel.org/r/20220304170215.1868106-1-eric.dumazet@gmail.com
Signed-off-by: Eric Dumazet
Reviewed-by: Shakeel Butt
Acked-by: David Rientjes
Acked-by: Mel Gorman
Reviewed-by: Vlastimil Babka
Cc: Michal Hocko
Cc: Wei Xu
Cc: Greg Thelen
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held
+++ a/mm/page_alloc.c
@@ -3665,10 +3665,10 @@ struct page *rmqueue(struct zone *prefer
 	 * allocate greater than order-1 page units with __GFP_NOFAIL.
 	 */
 	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
-	spin_lock_irqsave(&zone->lock, flags);

 	do {
 		page = NULL;
+		spin_lock_irqsave(&zone->lock, flags);
 		/*
 		 * order-0 request can reach here when the pcplist is skipped
 		 * due to non-CMA allocation context. HIGHATOMIC area is
@@ -3680,15 +3680,15 @@ struct page *rmqueue(struct zone *prefer
 			if (page)
 				trace_mm_page_alloc_zone_locked(page, order, migratetype);
 		}
-		if (!page)
+		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
-	} while (page && check_new_pages(page, order));
-	if (!page)
-		goto failed;
-
-	__mod_zone_freepage_state(zone, -(1 << order),
-				  get_pcppage_migratetype(page));
-	spin_unlock_irqrestore(&zone->lock, flags);
+			if (!page)
+				goto failed;
+		}
+		__mod_zone_freepage_state(zone, -(1 << order),
+					  get_pcppage_migratetype(page));
+		spin_unlock_irqrestore(&zone->lock, flags);
+	} while (check_new_pages(page, order));

 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
 	zone_statistics(preferred_zone, zone, 1);

From patchwork Tue Mar 22 21:44:00 2022
Date: Tue, 22 Mar 2022 14:44:00 -0700
From: Andrew Morton
Subject: [patch 108/227] mm/page_alloc: check high-order pages for corruption during PCP operations
Message-Id: <20220322214401.3E00CC340EC@smtp.kernel.org>
From: Mel Gorman
Subject: mm/page_alloc: check high-order pages for corruption during PCP operations

Eric Dumazet pointed out that commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists") only checks the head page during PCP refill and allocation operations. This was an oversight and all pages should be checked. This will incur a small performance penalty but it's necessary for correctness.

Link: https://lkml.kernel.org/r/20220310092456.GJ15701@techsingularity.net
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Mel Gorman
Reported-by: Eric Dumazet
Acked-by: Eric Dumazet
Reviewed-by: Shakeel Butt
Acked-by: Vlastimil Babka
Acked-by: David Rientjes
Cc: Michal Hocko
Cc: Wei Xu
Cc: Greg Thelen
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-check-high-order-pages-for-corruption-during-pcp-operations
+++ a/mm/page_alloc.c
@@ -2291,23 +2291,36 @@ static inline int check_new_page(struct
 	return 1;
 }

+static bool check_new_pages(struct page *page, unsigned int order)
+{
+	int i;
+	for (i = 0; i < (1 << order); i++) {
+		struct page *p = page + i;
+
+		if (unlikely(check_new_page(p)))
+			return true;
+	}
+
+	return false;
+}
+
 #ifdef CONFIG_DEBUG_VM
 /*
  * With DEBUG_VM enabled, order-0 pages are checked for expected state when
  * being allocated from pcp lists. With debug_pagealloc also enabled, they are
  * also checked when pcp lists are refilled from the free lists.
  */
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
 {
 	if (debug_pagealloc_enabled_static())
-		return check_new_page(page);
+		return check_new_pages(page, order);
 	else
 		return false;
 }

-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
 {
-	return check_new_page(page);
+	return check_new_pages(page, order);
 }
 #else
 /*
@@ -2315,32 +2328,19 @@ static inline bool check_new_pcp(struct
  * when pcp lists are being refilled from the free lists. With debug_pagealloc
  * enabled, they are also checked when being allocated from the pcp lists.
  */
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
 {
-	return check_new_page(page);
+	return check_new_pages(page, order);
 }
-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
 {
 	if (debug_pagealloc_enabled_static())
-		return check_new_page(page);
+		return check_new_pages(page, order);
 	else
 		return false;
 }
 #endif /* CONFIG_DEBUG_VM */

-static bool check_new_pages(struct page *page, unsigned int order)
-{
-	int i;
-	for (i = 0; i < (1 << order); i++) {
-		struct page *p = page + i;
-
-		if (unlikely(check_new_page(p)))
-			return true;
-	}
-
-	return false;
-}
-
 inline void post_alloc_hook(struct page *page, unsigned int order,
 				gfp_t gfp_flags)
 {
@@ -2982,7 +2982,7 @@ static int rmqueue_bulk(struct zone *zon
 		if (unlikely(page == NULL))
 			break;

-		if (unlikely(check_pcp_refill(page)))
+		if (unlikely(check_pcp_refill(page, order)))
 			continue;

 		/*
@@ -3600,7 +3600,7 @@ struct page *__rmqueue_pcplist(struct zo
 			page = list_first_entry(list, struct page, lru);
 			list_del(&page->lru);
 			pcp->count -= 1 << order;
-		} while (check_new_pcp(page));
+		} while (check_new_pcp(page, order));

 	return page;
 }

From patchwork Tue Mar 22 21:44:03 2022
Date: Tue, 22 Mar 2022 14:44:03 -0700
From: Andrew Morton
Subject: [patch 109/227] mm/memory-failure.c: remove obsolete comment
Message-Id: <20220322214404.3450DC340EC@smtp.kernel.org>

From: Naoya Horiguchi
Subject: mm/memory-failure.c: remove obsolete comment

With the introduction of mf_mutex, most of the memory error handling process is mutually exclusive, so the in-line comment about the subtlety of double-checking PageHWPoison is no longer correct. Remove it.

Link: https://lkml.kernel.org/r/20220125025601.3054511-1-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi
Suggested-by: Mike Kravetz
Reviewed-by: Miaohe Lin
Reviewed-by: Anshuman Khandual
Reviewed-by: Oscar Salvador
Reviewed-by: Yang Shi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c | 6 ------
 1 file changed, 6 deletions(-)

--- a/mm/memory-failure.c~mm-hwpoison-remove-obsolete-comment
+++ a/mm/memory-failure.c
@@ -2150,12 +2150,6 @@ static int __soft_offline_page(struct pa
 		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
 	};

-	/*
-	 * Check PageHWPoison again inside page lock because PageHWPoison
-	 * is set by memory_failure() outside page lock. Note that
-	 * memory_failure() also double-checks PageHWPoison inside page lock,
-	 * so there's no race between soft_offline_page() and memory_failure().
-	 */
 	lock_page(page);
 	if (!PageHuge(page))
 		wait_on_page_writeback(page);

From patchwork Tue Mar 22 21:44:06 2022
Date: Tue, 22 Mar 2022 14:44:06 -0700
From: Andrew Morton
Subject: [patch 110/227] mm/hwpoison: fix error page recovered but reported "not recovered"
Message-Id: <20220322214407.42FB7C340F2@smtp.kernel.org>

From: Naoya Horiguchi
Subject: mm/hwpoison: fix error page recovered but reported "not recovered"

When an uncorrected memory error is consumed, there is a race between the CMCI from the
memory controller reporting an uncorrected error with a UCNA signature, and the core reporting an SRAR signature machine check when the data is about to be consumed.

If the CMCI wins that race, the page is marked poisoned when uc_decode_notifier() calls memory_failure(), and the machine check processing code finds the page already poisoned. It calls kill_accessing_process() to make sure a SIGBUS is sent, but returns the wrong error code.

The console log looks like this:

[34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400
[34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
[34775.690310] Memory failure: 0x3710b3: already hardware poisoned
[34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
[34775.706072] mce: Memory error not recovered

kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS has already been sent to the process and that kill_me_maybe() doesn't have to send it again. But the current code simply fails to do this, so fix it to work as intended. This change avoids the noise message "Memory error not recovered" and skips duplicate SIGBUSes.

[tony.luck@intel.com: reword some parts of commit message]
Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Naoya Horiguchi
Reported-by: Youquan Song
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/memory-failure.c~mm-hwpoison-fix-error-page-recovered-but-reported-not-recovered
+++ a/mm/memory-failure.c
@@ -707,8 +707,10 @@ static int kill_accessing_process(struct
 				  (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
+	else
+		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret ? -EFAULT : -EHWPOISON;
+	return ret > 0 ? -EHWPOISON : -EFAULT;
 }

 static const char *action_name[] = {

From patchwork Tue Mar 22 21:44:09 2022
Date: Tue, 22 Mar 2022 14:44:09 -0700
From: Andrew Morton
Subject: [patch 111/227] mm: invalidate hwpoison page cache page in fault path
Message-Id: <20220322214410.4F104C340EE@smtp.kernel.org>

From: Rik van Riel
Subject: mm: invalidate hwpoison page cache page in fault path
Sometimes the page offlining code can leave behind a hwpoisoned clean page cache page. This can lead to programs being killed over and over and over again as they fault in the hwpoisoned page, get killed, and then get re-spawned by whatever wanted to run them.

This is particularly embarrassing when the page was offlined due to having too many corrected memory errors. Now we are killing tasks due to them trying to access memory that probably isn't even corrupted.

This problem can be avoided by invalidating the page from the page fault handler, which already has a branch for dealing with these kinds of pages. With this patch we simply pretend the page fault was successful if the page was invalidated, return to userspace, incur another page fault, read in the file from disk (to a new memory page), and then everything works again.

Link: https://lkml.kernel.org/r/20220212213740.423efcea@imladris.surriel.com
Signed-off-by: Rik van Riel
Reviewed-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Reviewed-by: Oscar Salvador
Cc: John Hubbard
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Matthew Wilcox
Cc:
Signed-off-by: Andrew Morton
---

 mm/memory.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/mm/memory.c~mm-clean-up-hwpoison-page-cache-page-in-fault-path
+++ a/mm/memory.c
@@ -3877,11 +3877,16 @@ static vm_fault_t __do_fault(struct vm_f
 		return ret;

 	if (unlikely(PageHWPoison(vmf->page))) {
-		if (ret & VM_FAULT_LOCKED)
+		vm_fault_t poisonret = VM_FAULT_HWPOISON;
+		if (ret & VM_FAULT_LOCKED) {
+			/* Retry if a clean page was removed from the cache. */
+			if (invalidate_inode_page(vmf->page))
+				poisonret = 0;
 			unlock_page(vmf->page);
+		}
 		put_page(vmf->page);
 		vmf->page = NULL;
-		return VM_FAULT_HWPOISON;
+		return poisonret;
 	}

 	if (unlikely(!(ret & VM_FAULT_LOCKED)))

From patchwork Tue Mar 22 21:44:12 2022
From patchwork Tue Mar 22 21:44:12 2022
Subject: [patch 112/227] mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap
Date: Tue, 22 Mar 2022 14:44:12 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap

Patch series "A few cleanup and fixup patches for memory failure", v3.

This series contains a few patches to simplify the code logic, remove an
unneeded variable, and remove an obsolete comment.  It also fixes the
race with a page changing compound state more robustly in
memory_failure().  More details can be found in the respective
changelogs.


This patch (of 8):

The flags always have MF_ACTION_REQUIRED and MF_MUST_KILL set here, so
we do not need to check these flags again.

Link: https://lkml.kernel.org/r/20220218090118.1105-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220218090118.1105-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-minor-clean-up-for-memory_failure_dev_pagemap
+++ a/mm/memory-failure.c
@@ -1640,7 +1640,7 @@ static int memory_failure_dev_pagemap(un
 	 * SIGBUS (i.e. MF_MUST_KILL)
 	 */
 	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
-	collect_procs(page, &tokill, flags & MF_ACTION_REQUIRED);
+	collect_procs(page, &tokill, true);
 
 	list_for_each_entry(tk, &tokill, nd)
 		if (tk->size_shift)
@@ -1655,7 +1655,7 @@ static int memory_failure_dev_pagemap(un
 		start = (page->index << PAGE_SHIFT) & ~(size - 1);
 		unmap_mapping_range(page->mapping, start, size, 0);
 	}
-	kill_procs(&tokill, flags & MF_MUST_KILL, false, pfn, flags);
+	kill_procs(&tokill, true, false, pfn, flags);
 
 	rc = 0;
 unlock:
 	dax_unlock_page(page, cookie);

From patchwork Tue Mar 22 21:44:15 2022
Subject: [patch 113/227] mm/memory-failure.c: catch unexpected -EFAULT from vma_address()
Date: Tue, 22 Mar 2022 14:44:15 -0700
From: Miaohe Lin
Subject: mm/memory-failure.c: catch unexpected -EFAULT from vma_address()

It is unexpected to walk the page table when vma_address() returns
-EFAULT.  But dev_pagemap_mapping_shift() is only called after a vma
associated with the error page has been found in
collect_procs_{file,anon}(), so vma_address() should not return -EFAULT
unless there is a bug, as Naoya pointed out.  Use VM_BUG_ON_VMA() to
catch such a bug here.

Link: https://lkml.kernel.org/r/20220218090118.1105-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/memory-failure.c~mm-memory-failurec-catch-unexpected-efault-from-vma_address
+++ a/mm/memory-failure.c
@@ -315,6 +315,7 @@ static unsigned long dev_pagemap_mapping
 	pmd_t *pmd;
 	pte_t *pte;
 
+	VM_BUG_ON_VMA(address == -EFAULT, vma);
 	pgd = pgd_offset(vma->vm_mm, address);
 	if (!pgd_present(*pgd))
 		return 0;

From patchwork Tue Mar 22 21:44:18 2022
Subject: [patch 114/227] mm/memory-failure.c: rework the signaling logic in kill_proc
Date: Tue, 22 Mar 2022 14:44:18 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: rework the signaling logic in kill_proc

The BUS_MCEERR_AR code is only sent when MF_ACTION_REQUIRED is set and
the target is current.  Rework the code to make this clear.

Link: https://lkml.kernel.org/r/20220218090118.1105-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-rework-the-signaling-logic-in-kill_proc
+++ a/mm/memory-failure.c
@@ -258,16 +258,13 @@ static int kill_proc(struct to_kill *tk,
 	pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
 		pfn, t->comm, t->pid);
 
-	if (flags & MF_ACTION_REQUIRED) {
-		if (t == current)
-			ret = force_sig_mceerr(BUS_MCEERR_AR,
-					 (void __user *)tk->addr, addr_lsb);
-		else
-			/* Signal other processes sharing the page if they have PF_MCE_EARLY set. */
-			ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
-				addr_lsb, t);
-	} else {
+	if ((flags & MF_ACTION_REQUIRED) && (t == current))
+		ret = force_sig_mceerr(BUS_MCEERR_AR,
+				 (void __user *)tk->addr, addr_lsb);
+	else
 		/*
+		 * Signal other processes sharing the page if they have
+		 * PF_MCE_EARLY set.
 		 * Don't use force here, it's convenient if the signal
 		 * can be temporarily blocked.
 		 * This could cause a loop when the user sets SIGBUS
@@ -275,7 +272,6 @@ static int kill_proc(struct to_kill *tk,
 		 */
 		ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
 			addr_lsb, t);	/* synchronous? */
-	}
 	if (ret < 0)
 		pr_info("Memory failure: Error sending signal to %s:%d: %d\n",
 			t->comm, t->pid, ret);
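For context, the PF_MCE_EARLY flag referenced in the comment above is
set from userspace via prctl().  A brief illustrative sketch (standard
Linux APIs, not part of the patch) of a process opting in to early
BUS_MCEERR_AO notifications and telling the two si_code values apart:

/* mce_opt_in.c: receive early "action optional" memory error signals */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <unistd.h>

static void mce_handler(int sig, siginfo_t *si, void *ctx)
{
	if (si->si_code == BUS_MCEERR_AR) {
		/* Action required: the faulting access cannot proceed. */
		fprintf(stderr, "AR fault at %p\n", si->si_addr);
		_exit(1);
	} else if (si->si_code == BUS_MCEERR_AO) {
		/* Action optional: data at si_addr is lost, but we may
		 * still recover it from another source. */
		fprintf(stderr, "AO notice for %p (lsb %d)\n",
			si->si_addr, (int)si->si_addr_lsb);
	}
}

int main(void)
{
	struct sigaction sa = { .sa_sigaction = mce_handler,
				.sa_flags = SA_SIGINFO };

	sigaction(SIGBUS, &sa, NULL);
	/* Ask for early AO signals; this is what sets PF_MCE_EARLY. */
	if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0))
		perror("prctl");
	pause();
	return 0;
}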
From patchwork Tue Mar 22 21:44:21 2022
Subject: [patch 115/227] mm/memory-failure.c: fix race with changing page more robustly
Date: Tue, 22 Mar 2022 14:44:21 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: fix race with changing page more robustly

We only intend to deal with the non-compound page after we split the
thp in memory_failure().
However, the page could have changed to another compound page due to a
race window.  If this happens, we can retry once to hopefully handle the
page in the next round.

Also remove the unneeded orig_head.  It is always equal to hpage, so we
can use hpage directly and remove this redundant variable.

Link: https://lkml.kernel.org/r/20220218090118.1105-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-fix-race-with-changing-page-more-robustly
+++ a/mm/memory-failure.c
@@ -1686,7 +1686,6 @@ int memory_failure(unsigned long pfn, in
 {
 	struct page *p;
 	struct page *hpage;
-	struct page *orig_head;
 	struct dev_pagemap *pgmap;
 	int res = 0;
 	unsigned long page_flags;
@@ -1732,7 +1731,7 @@ try_again:
 		goto unlock_mutex;
 	}
 
-	orig_head = hpage = compound_head(p);
+	hpage = compound_head(p);
 	num_poisoned_pages_inc();
 
 	/*
@@ -1813,10 +1812,21 @@ try_again:
 	lock_page(p);
 
 	/*
-	 * The page could have changed compound pages during the locking.
-	 * If this happens just bail out.
+	 * We're only intended to deal with the non-Compound page here.
+	 * However, the page could have changed compound pages due to
+	 * race window. If this happens, we could try again to hopefully
+	 * handle the page next round.
 	 */
-	if (PageCompound(p) && compound_head(p) != orig_head) {
+	if (PageCompound(p)) {
+		if (retry) {
+			if (TestClearPageHWPoison(p))
+				num_poisoned_pages_dec();
+			unlock_page(p);
+			put_page(p);
+			flags &= ~MF_COUNT_INCREASED;
+			retry = false;
+			goto try_again;
+		}
 		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
 		res = -EBUSY;
 		goto unlock_page;
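The shape of the fix is a common pattern: re-validate an assumption
after taking the lock, and go around exactly once if it no longer
holds.  A condensed, self-contained model of the idiom (illustrative
names only, not the kernel code):

#include <stdbool.h>

/* Stand-ins for the kernel primitives (lock_page(), PageCompound(), ...). */
static void lock(void)   { }
static void unlock(void) { }
static bool page_is_still_plain(void) { return true; }
static int  handle_page(void) { unlock(); return 0; }

/* Re-check the assumption under the lock; retry at most once. */
static int handle_with_one_retry(void)
{
	bool retry = true;

again:
	lock();
	if (!page_is_still_plain()) {
		/* Undo side effects here (poison flag, refcount, ...). */
		unlock();
		if (retry) {
			retry = false;
			goto again;
		}
		return -1;	/* still racing: bail out, like -EBUSY */
	}
	return handle_page();	/* safe: invariant held under the lock */
}

int main(void)
{
	return handle_with_one_retry();
}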
From patchwork Tue Mar 22 21:44:24 2022
Subject: [patch 116/227] mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev
Date: Tue, 22 Mar 2022 14:44:24 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev

Since commit 03e5ac2fc3bf ("mm: fix crash when using XFS on loopback"),
page_mapping() can handle slab pages.  So remove this unnecessary
PageSlab check and the obsolete comment.

Link: https://lkml.kernel.org/r/20220218090118.1105-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-remove-pageslab-check-in-hwpoison_filter_dev
+++ a/mm/memory-failure.c
@@ -130,12 +130,6 @@ static int hwpoison_filter_dev(struct pa
 	    hwpoison_filter_dev_minor == ~0U)
 		return 0;
 
-	/*
-	 * page_mapping() does not accept slab pages.
-	 */
-	if (PageSlab(p))
-		return -EINVAL;
-
 	mapping = page_mapping(p);
 	if (mapping == NULL || mapping->host == NULL)
 		return -EINVAL;

From patchwork Tue Mar 22 21:44:27 2022
Subject: [patch 117/227] mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings()
Date: Tue, 22 Mar 2022 14:44:27 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings()

Only for hugetlb pages in shared
mappings should try_to_unmap() take the mapping's semaphore in write
mode here.  Rework the code to make this clear.

Link: https://lkml.kernel.org/r/20220218090118.1105-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-rework-the-try_to_unmap-logic-in-hwpoison_user_mappings
+++ a/mm/memory-failure.c
@@ -1404,26 +1404,22 @@ static bool hwpoison_user_mappings(struc
 	if (kill)
 		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
 
-	if (!PageHuge(hpage)) {
-		try_to_unmap(hpage, ttu);
+	if (PageHuge(hpage) && !PageAnon(hpage)) {
+		/*
+		 * For hugetlb pages in shared mappings, try_to_unmap
+		 * could potentially call huge_pmd_unshare.  Because of
+		 * this, take semaphore in write mode here and set
+		 * TTU_RMAP_LOCKED to indicate we have taken the lock
+		 * at this higher level.
+		 */
+		mapping = hugetlb_page_mapping_lock_write(hpage);
+		if (mapping) {
+			try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
+			i_mmap_unlock_write(mapping);
+		} else
+			pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn);
 	} else {
-		if (!PageAnon(hpage)) {
-			/*
-			 * For hugetlb pages in shared mappings, try_to_unmap
-			 * could potentially call huge_pmd_unshare.  Because of
-			 * this, take semaphore in write mode here and set
-			 * TTU_RMAP_LOCKED to indicate we have taken the lock
-			 * at this higher level.
-			 */
-			mapping = hugetlb_page_mapping_lock_write(hpage);
-			if (mapping) {
-				try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
-				i_mmap_unlock_write(mapping);
-			} else
-				pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn);
-		} else {
-			try_to_unmap(hpage, ttu);
-		}
+		try_to_unmap(hpage, ttu);
 	}
 
 	unmap_success = !page_mapped(hpage);

From patchwork Tue Mar 22 21:44:30 2022
Subject: [patch 118/227] mm/memory-failure.c: remove obsolete comment in __soft_offline_page
Date: Tue, 22 Mar 2022 14:44:30 -0700
From: Miaohe Lin
Subject: mm/memory-failure.c: remove obsolete comment in __soft_offline_page

Since commit add05cecef80 ("mm: soft-offline: don't free target page in
successful page migration"), the set_migratetype_isolate logic has been
removed.  Remove this obsolete comment.

Link: https://lkml.kernel.org/r/20220218090118.1105-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-remove-obsolete-comment-in-__soft_offline_page
+++ a/mm/memory-failure.c
@@ -2167,10 +2167,6 @@ static int __soft_offline_page(struct pa
 	ret = invalidate_inode_page(page);
 	unlock_page(page);
 
-	/*
-	 * RED-PEN would be better to keep it isolated here, but we
-	 * would need to fix isolation locking first.
-	 */
 	if (ret) {
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
 		page_handle_poison(page, false, true);

From patchwork Tue Mar 22 21:44:33 2022
Subject: [patch 119/227] mm/memory-failure.c: remove unnecessary PageTransTail check
Date: Tue, 22 Mar 2022 14:44:33 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: remove unnecessary PageTransTail check

When we reach here, we are guaranteed to have a non-compound page, as
the thp has already been split.
Remove this unnecessary PageTransTail check.

Link: https://lkml.kernel.org/r/20220218090118.1105-9-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memory-failure.c~mm-memory-failurec-remove-unnecessary-pagetranstail-check
+++ a/mm/memory-failure.c
@@ -1844,7 +1844,7 @@ try_again:
 	 * page_lock. We need wait writeback completion for this page or it
 	 * may trigger vfs BUG while evict inode.
 	 */
-	if (!PageTransTail(p) && !PageLRU(p) && !PageWriteback(p))
+	if (!PageLRU(p) && !PageWriteback(p))
 		goto identify_page_state;
 
 	/*

From patchwork Tue Mar 22 21:44:35 2022
Subject: [patch 120/227] mm/hwpoison-inject: support injecting hwpoison to free page
Date: Tue, 22 Mar 2022 14:44:35 -0700
From: Miaohe Lin
Subject: mm/hwpoison-inject: support injecting hwpoison to free page

memory_failure() can handle free buddy pages.  Support injecting
hwpoison into a free page by adding an is_free_buddy_page() check for
the case where the hwpoison filter is disabled.

[akpm@linux-foundation.org: export is_free_buddy_page() to modules]
Link: https://lkml.kernel.org/r/20220218092052.3853-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Cc: Naoya Horiguchi
Signed-off-by: Andrew Morton
---

 mm/hwpoison-inject.c |    4 ++--
 mm/page_alloc.c      |    1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

--- a/mm/hwpoison-inject.c~mm-hwpoison-inject-support-injecting-hwpoison-to-free-page
+++ a/mm/hwpoison-inject.c
@@ -32,9 +32,9 @@ static int hwpoison_inject(void *data, u
 	shake_page(hpage);
 
 	/*
-	 * This implies unable to support non-LRU pages.
+	 * This implies unable to support non-LRU pages except free page.
 	 */
-	if (!PageLRU(hpage) && !PageHuge(p))
+	if (!PageLRU(hpage) && !PageHuge(p) && !is_free_buddy_page(p))
 		return 0;
 
 	/*
--- a/mm/page_alloc.c~mm-hwpoison-inject-support-injecting-hwpoison-to-free-page
+++ a/mm/page_alloc.c
@@ -9417,6 +9417,7 @@ bool is_free_buddy_page(struct page *pag
 
 	return order < MAX_ORDER;
 }
+EXPORT_SYMBOL(is_free_buddy_page);
 
 #ifdef CONFIG_MEMORY_FAILURE
 /*
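With is_free_buddy_page() now exported, modular code (such as the
hwpoison injector built as a module) can link against it.  A minimal
hypothetical module sketch, assuming the function's declaration is
visible via linux/mm.h; all names and parameters here are illustrative:

/* freecheck.c: report whether a given pfn is currently a free buddy page */
#include <linux/module.h>
#include <linux/mm.h>

static unsigned long pfn;
module_param(pfn, ulong, 0444);

static int __init freecheck_init(void)
{
	struct page *page;

	if (!pfn_valid(pfn))
		return -ENXIO;
	page = pfn_to_page(pfn);
	pr_info("pfn %#lx: %s\n", pfn,
		is_free_buddy_page(page) ? "free buddy page" : "in use");
	return 0;
}

static void __exit freecheck_exit(void)
{
}

module_init(freecheck_init);
module_exit(freecheck_exit);
MODULE_LICENSE("GPL");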
From patchwork Tue Mar 22 21:44:38 2022
Subject: [patch 121/227] mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
Date: Tue, 22 Mar 2022 14:44:38 -0700

From: luofei
Subject: mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler

When the hwpoisoned page meets the filter conditions, it should not be
regarded as successful memory_failure() processing by the mce handler;
a distinct value should be returned instead.  Otherwise the mce handler
regards the error page as having been identified and isolated, which
may lead to calling set_mce_nospec() to change page attributes, etc.

Make memory_failure() return -EOPNOTSUPP to indicate that the error
event was filtered.  The mce handler should not take any action in this
situation, and the hwpoison injector should treat it as success.

Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.com
Signed-off-by: luofei
Acked-by: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Thomas Gleixner
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 arch/x86/kernel/cpu/mce/core.c |    8 +++++---
 drivers/base/memory.c          |    2 ++
 mm/hwpoison-inject.c           |    3 ++-
 mm/madvise.c                   |    2 ++
 mm/memory-failure.c            |    9 +++++++--
 5 files changed, 18 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/cpu/mce/core.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/arch/x86/kernel/cpu/mce/core.c
@@ -1304,10 +1304,12 @@ static void kill_me_maybe(struct callbac
 
 	/*
 	 * -EHWPOISON from memory_failure() means that it already sent SIGBUS
-	 * to the current process with the proper error info, so no need to
-	 * send SIGBUS here again.
+	 * to the current process with the proper error info,
+	 * -EOPNOTSUPP means hwpoison_filter() filtered the error event,
+	 *
+	 * In both cases, no further processing is required.
 	 */
-	if (ret == -EHWPOISON)
+	if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
 		return;
 
 	pr_err("Memory error not recovered");
--- a/drivers/base/memory.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/drivers/base/memory.c
@@ -555,6 +555,8 @@ static ssize_t hard_offline_page_store(s
 		return -EINVAL;
 	pfn >>= PAGE_SHIFT;
 	ret = memory_failure(pfn, 0);
+	if (ret == -EOPNOTSUPP)
+		ret = 0;
 	return ret ? ret : count;
 }
 
--- a/mm/hwpoison-inject.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/mm/hwpoison-inject.c
@@ -48,7 +48,8 @@ static int hwpoison_inject(void *data, u
 
 inject:
 	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
-	return memory_failure(pfn, 0);
+	err = memory_failure(pfn, 0);
+	return (err == -EOPNOTSUPP) ? 0 : err;
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
--- a/mm/madvise.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/mm/madvise.c
@@ -1067,6 +1067,8 @@ static int madvise_inject_error(int beha
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				 pfn, start);
 		ret = memory_failure(pfn, MF_COUNT_INCREASED);
+		if (ret == -EOPNOTSUPP)
+			ret = 0;
 	}
 
 	if (ret)
--- a/mm/memory-failure.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/mm/memory-failure.c
@@ -1515,7 +1515,7 @@ static int memory_failure_hugetlb(unsign
 			if (TestClearPageHWPoison(head))
 				num_poisoned_pages_dec();
 			unlock_page(head);
-			return 0;
+			return -EOPNOTSUPP;
 		}
 		unlock_page(head);
 		res = MF_FAILED;
@@ -1602,7 +1602,7 @@ static int memory_failure_dev_pagemap(un
 		goto out;
 
 	if (hwpoison_filter(page)) {
-		rc = 0;
+		rc = -EOPNOTSUPP;
 		goto unlock;
 	}
 
@@ -1671,6 +1671,10 @@ static DEFINE_MUTEX(mf_mutex);
  *
  * Must run in process context (e.g. a work queue) with interrupts
  * enabled and no spinlocks hold.
+ *
+ * Return: 0 for successfully handled the memory error,
+ *         -EOPNOTSUPP for memory_filter() filtered the error event,
+ *         < 0 (except -EOPNOTSUPP) on failure.
  */
 int memory_failure(unsigned long pfn, int flags)
 {
@@ -1836,6 +1840,7 @@ try_again:
 			num_poisoned_pages_dec();
 			unlock_page(p);
 			put_page(p);
+			res = -EOPNOTSUPP;
 			goto unlock_mutex;
 		}
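Put together, callers of memory_failure() now have a three-way contract
to honor.  A condensed, hypothetical caller sketch (mirroring what the
patched kill_me_maybe() above does; names are illustrative):

/* Hypothetical kernel-side caller of the new memory_failure() contract. */
static void handle_memory_error(unsigned long pfn)
{
	int ret = memory_failure(pfn, 0);

	if (ret == -EHWPOISON)
		return;		/* handled; SIGBUS already delivered */
	if (ret == -EOPNOTSUPP)
		return;		/* filtered by hwpoison_filter(): no action */
	if (ret) {
		pr_err("pfn %#lx not recovered: %d\n", pfn, ret);
		return;
	}
	/* ret == 0: recovered; follow-up such as set_mce_nospec() is safe */
}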
From patchwork Tue Mar 22 21:44:41 2022
Subject: [patch 122/227] mm/hwpoison: add in-use hugepage hwpoison filter judgement
Date: Tue, 22 Mar 2022 14:44:41 -0700

From: luofei
Subject: mm/hwpoison: add in-use hugepage hwpoison filter judgement

After successfully obtaining a reference on the huge page, it is still
necessary to call hwpoison_filter() to make a filter judgement;
otherwise a filtered hugepage will be unmapped and the related process
may be killed.

Link: https://lkml.kernel.org/r/20220223082254.2769757-1-luofei@unicloud.com
Signed-off-by: luofei
Reviewed-by: Miaohe Lin
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Naoya Horiguchi
Cc: Thomas Gleixner
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/memory-failure.c~mm-hwpoison-add-in-use-hugepage-hwpoison-filter-judgement
+++ a/mm/memory-failure.c
@@ -1534,6 +1534,14 @@ static int memory_failure_hugetlb(unsign
 	lock_page(head);
 	page_flags = head->flags;
 
+	if (hwpoison_filter(p)) {
+		if (TestClearPageHWPoison(head))
+			num_poisoned_pages_dec();
+		put_page(p);
+		res = -EOPNOTSUPP;
+		goto out;
+	}
+
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
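The filter consulted here is configured from userspace through debugfs.
An illustrative helper sketch: the file names follow the layout the
hwpoison documentation describes for CONFIG_HWPOISON_INJECT
(corrupt-filter-*, corrupt-pfn); the device numbers and pfn below are
placeholders, not values from this patch:

/* hwpoison_filter_setup.c: enable the filter for one device, then inject */
#include <stdio.h>

static int write_val(const char *path, unsigned long long val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%llu\n", val);
	return fclose(f);
}

int main(void)
{
	const char *d = "/sys/kernel/debug/hwpoison";
	char path[256];

	/* Only poison pages backed by device 8:0 (illustrative). */
	snprintf(path, sizeof(path), "%s/corrupt-filter-dev-major", d);
	write_val(path, 8);
	snprintf(path, sizeof(path), "%s/corrupt-filter-dev-minor", d);
	write_val(path, 0);
	snprintf(path, sizeof(path), "%s/corrupt-filter-enable", d);
	write_val(path, 1);

	/* Inject at a placeholder pfn; filtered pages now surface
	 * -EOPNOTSUPP inside the kernel and are left untouched. */
	snprintf(path, sizeof(path), "%s/corrupt-pfn", d);
	return write_val(path, 0x12345) < 0;
}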
From patchwork Tue Mar 22 21:44:44 2022
Subject: [patch 123/227] mm/memory-failure.c: fix race with changing page compound again
Date: Tue, 22 Mar 2022 14:44:44 -0700

From: Miaohe Lin
Subject: mm/memory-failure.c: fix race with changing page compound again

Patch series "A few fixup patches for memory failure", v2.

This series contains a few patches to fix the race with a page changing
compound state, make non-LRU movable pages unhandlable, and so on.  More
details can be found in the respective changelogs.

There is a race window where, after we get the compound_head, the
hugetlb page could be freed to buddy, or even changed to another
compound page, just before we try to get the hwpoison page.  Think
about the race window below:

  CPU 1					CPU 2
  memory_failure_hugetlb
  struct page *head = compound_head(p);
					hugetlb page might be freed to
					buddy, or even changed to another
					compound page.

  get_hwpoison_page -- page is not what we want now...

If this race happens, just bail out.  Also MF_MSG_DIFFERENT_PAGE_SIZE is
introduced to record this event.
[akpm@linux-foundation.org: s@/**@/*@, per Naoya Horiguchi]
Link: https://lkml.kernel.org/r/20220312074613.4798-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220312074613.4798-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Acked-by: Naoya Horiguchi
Cc: Tony Luck
Cc: Borislav Petkov
Cc: Mike Kravetz
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 include/linux/mm.h      |    1 +
 include/ras/ras_event.h |    1 +
 mm/memory-failure.c     |   12 ++++++++++++
 3 files changed, 14 insertions(+)

--- a/include/linux/mm.h~mm-memory-failurec-fix-race-with-changing-page-compound-again
+++ a/include/linux/mm.h
@@ -3239,6 +3239,7 @@ enum mf_action_page_type {
 	MF_MSG_BUDDY,
 	MF_MSG_DAX,
 	MF_MSG_UNSPLIT_THP,
+	MF_MSG_DIFFERENT_PAGE_SIZE,
 	MF_MSG_UNKNOWN,
 };
 
--- a/include/ras/ras_event.h~mm-memory-failurec-fix-race-with-changing-page-compound-again
+++ a/include/ras/ras_event.h
@@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_BUDDY, "free buddy page" )				\
 	EM ( MF_MSG_DAX, "dax page" )					\
 	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
+	EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )	\
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
 
 /*
--- a/mm/memory-failure.c~mm-memory-failurec-fix-race-with-changing-page-compound-again
+++ a/mm/memory-failure.c
@@ -732,6 +732,7 @@ static const char * const action_page_ty
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
+	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1532,6 +1533,17 @@ static int memory_failure_hugetlb(unsign
 	}
 
 	lock_page(head);
+
+	/*
+	 * The page could have changed compound pages due to race window.
+	 * If this happens just bail out.
+	 */
+	if (!PageHuge(p) || compound_head(p) != head) {
+		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
 	page_flags = head->flags;
 
 	if (hwpoison_filter(p)) {

From patchwork Tue Mar 22 21:44:47 2022
Subject: [patch 124/227] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
Date: Tue, 22 Mar 2022 14:44:47 -0700
Date: Tue, 22 Mar 2022 14:44:47 -0700
To: tony.luck@intel.com,shy828301@gmail.com,naoya.horiguchi@nec.com,mike.kravetz@oracle.com,bp@alien8.de,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 124/227] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
Message-Id: <20220322214448.8208FC340F2@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages

Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
into its one caller"), invalidate_inode_page() can invalidate pages in the
swap cache, because the check of page->mapping != mapping was removed.
But invalidate_inode_page() is not expected to deal with pages in the swap
cache.  Non-LRU movable pages can reach here too, and they are not page
cache pages either.  Skip both kinds by checking PageLRU and
PageSwapCache.

Link: https://lkml.kernel.org/r/20220312074613.4798-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Cc: Borislav Petkov
Cc: Mike Kravetz
Cc: Naoya Horiguchi
Cc: Tony Luck
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/memory-failure.c~mm-memory-failurec-avoid-calling-invalidate_inode_page-with-unexpected-pages
+++ a/mm/memory-failure.c
@@ -2184,7 +2184,7 @@ static int __soft_offline_page(struct pa
 		return 0;
 	}

-	if (!PageHuge(page))
+	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
 		/*
 		 * Try to invalidate first. This should work for
 		 * non dirty unmapped page cache pages.
From patchwork Tue Mar 22 21:44:50 2022

Date: Tue, 22 Mar 2022 14:44:50 -0700
To: tony.luck@intel.com,shy828301@gmail.com,naoya.horiguchi@nec.com,mike.kravetz@oracle.com,bp@alien8.de,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 125/227] mm/memory-failure.c: make non-LRU movable pages unhandlable
Message-Id: <20220322214451.79723C340EE@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/memory-failure.c: make non-LRU movable pages unhandlable

We cannot really handle non-LRU movable pages in memory failure.
Typically they are balloon pages, zsmalloc pages, etc.
Assuming we run into a base (4K) non-LRU movable page, we could reach as
far as identify_page_state(), where it should not fall into any category
except me_unknown.  Non-LRU compound movable pages could be mistaken for
transhuge pages, but it is unexpected to split non-LRU movable pages via
split_huge_page_to_list() in memory_failure().  So simply make non-LRU
movable pages unhandlable to avoid these possible nasty cases.

Link: https://lkml.kernel.org/r/20220312074613.4798-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Suggested-by: Yang Shi
Reviewed-by: Yang Shi
Acked-by: Naoya Horiguchi
Cc: Borislav Petkov
Cc: Mike Kravetz
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

--- a/mm/memory-failure.c~mm-memory-failurec-make-non-lru-movable-pages-unhandlable
+++ a/mm/memory-failure.c
@@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct pa
  * does not return true for hugetlb or device memory pages, so it's assumed
  * to be called only in the context where we never have such pages.
  */
-static inline bool HWPoisonHandlable(struct page *page)
+static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 {
-	return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
+	bool movable = false;
+
+	/* Soft offline could migrate non-LRU movable pages */
+	if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
+		movable = true;
+
+	return movable || PageLRU(page) || is_free_buddy_page(page);
 }

-static int __get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct page *head = compound_head(page);
 	int ret = 0;
@@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct pa
 	 * for any unsupported type of page in order to reduce the risk of
 	 * unexpected races caused by taking a page refcount.
 	 */
-	if (!HWPoisonHandlable(head))
+	if (!HWPoisonHandlable(head, flags))
 		return -EBUSY;

 	if (get_page_unless_zero(head)) {
@@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p,

 try_again:
 	if (!count_increased) {
-		ret = __get_hwpoison_page(p);
+		ret = __get_hwpoison_page(p, flags);
 		if (!ret) {
 			if (page_count(p)) {
 				/* We raced with an allocation, retry. */
@@ -1249,7 +1255,7 @@ try_again:
 		}
 	}

-	if (PageHuge(p) || HWPoisonHandlable(p)) {
+	if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
 		ret = 1;
 	} else {
 		/*
@@ -2302,7 +2308,7 @@ int soft_offline_page(unsigned long pfn,

 retry:
 	get_online_mems();
-	ret = get_hwpoison_page(page, flags);
+	ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE);
 	put_online_mems();

 	if (ret > 0) {

From patchwork Tue Mar 22 21:44:53 2022

Date: Tue, 22 Mar 2022 14:44:53 -0700
To: willy@infradead.org,mgorman@techsingularity.net,david@redhat.com,vbabka@suse.cz,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 126/227] mm, fault-injection: declare should_fail_alloc_page()
Message-Id: <20220322214454.6384CC340F2@smtp.kernel.org>
From: Vlastimil Babka
Subject: mm, fault-injection: declare should_fail_alloc_page()

The mm/ directory can almost fully be built with W=1, which would help in
local development.  One remaining issue is a missing prototype for
should_fail_alloc_page().  Add it next to the should_failslab()
prototype.

Note that a previous attempt, commit f7173090033c ("mm/page_alloc: make
should_fail_alloc_page() static"), had to be reverted by commit
54aa386661fe because it caused an unresolved symbol error with
CONFIG_DEBUG_INFO_BTF=y.

Link: https://lkml.kernel.org/r/20220314165724.16071-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Matthew Wilcox
Cc: David Hildenbrand
Signed-off-by: Andrew Morton
---

 include/linux/fault-inject.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/linux/fault-inject.h~mm-fault-injection-declare-should_fail_alloc_page
+++ a/include/linux/fault-inject.h
@@ -64,6 +64,8 @@ static inline struct dentry *fault_creat

 struct kmem_cache;

+bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order);
+
 int should_failslab(struct kmem_cache *s, gfp_t gfpflags);
 #ifdef CONFIG_FAILSLAB
 extern bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags);
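For readers who want to reproduce the motivation: kbuild can raise the
warning level for a single directory.  A quick sketch (assuming a configured
x86_64 tree; the exact warning set depends on the compiler version):

# Build only mm/ with kbuild's extra warning level; without this patch,
# gcc warns "no previous prototype for 'should_fail_alloc_page'".
make defconfig
make W=1 mm/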
From patchwork Tue Mar 22 21:44:56 2022

Date: Tue, 22 Mar 2022 14:44:56 -0700
To: hughd@google.com,herbert.van.den.bergh@oracle.com,chris.mason@oracle.com,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 127/227] mm/mlock: fix potential imbalanced rlimit ucounts adjustment
Message-Id: <20220322214457.65988C340EC@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/mlock: fix potential imbalanced rlimit ucounts adjustment

user_shm_lock() forgets to set allowed to 0 when get_ucounts() fails, so
a later user_shm_unlock() might do an extra dec_rlimit_ucounts().  Fix
this by resetting allowed to 0.

Link: https://lkml.kernel.org/r/20220310132417.41189-1-linmiaohe@huawei.com
Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Signed-off-by: Miaohe Lin
Reviewed-by: Andrew Morton
Acked-by: Hugh Dickins
Cc: Herbert van den Bergh
Cc: Chris Mason
Signed-off-by: Andrew Morton
---

 mm/mlock.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/mlock.c~mm-mlock-fix-potential-imbalanced-rlimit-ucounts-adjustment
+++ a/mm/mlock.c
@@ -839,6 +839,7 @@ int user_shm_lock(size_t size, struct uc
 	}
 	if (!get_ucounts(ucounts)) {
 		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
+		allowed = 0;
 		goto out;
 	}
 	allowed = 1;
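The bug class here is a success flag that survives a failure path.  A
minimal userspace sketch of the invariant (editor's illustration with
hypothetical names, not the kernel code):

/* lock()/unlock() must keep the counter balanced.  Returning "allowed"
 * without resetting it on the failure path would make the caller's
 * unlock() decrement a counter that was never charged to it. */
#include <stdio.h>

static long rlimit_count;

static int my_lock(int fail_second_step)
{
        int allowed = 0;

        rlimit_count++;                 /* first step: charge the rlimit */
        if (fail_second_step) {
                rlimit_count--;         /* roll back the charge */
                allowed = 0;            /* the fix: report failure, too */
                goto out;
        }
        allowed = 1;
out:
        return allowed;
}

static void my_unlock(void)
{
        rlimit_count--;                 /* only legal after a successful lock */
}

int main(void)
{
        if (my_lock(1))                 /* failure path taken */
                my_unlock();            /* correctly skipped now */
        printf("rlimit_count = %ld (balanced if 0)\n", rlimit_count);
        return 0;
}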
From patchwork Tue Mar 22 21:45:00 2022

Date: Tue, 22 Mar 2022 14:45:00 -0700
To: zhengqi.arch@bytedance.com,willy@infradead.org,song.bao.hua@hisilicon.com,osalvador@suse.de,mike.kravetz@oracle.com,mhocko@suse.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@redhat.com,corbet@lwn.net,chenhuang5@huawei.com,bodeddub@amazon.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 128/227] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
Message-Id: <20220322214500.AE0EAC340EE@smtp.kernel.org>

From: Muchun Song
Subject: mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page

Patch series "Free the 2nd vmemmap page associated with each HugeTLB
page", v7.

This series can minimize the overhead of struct page for 2MB HugeTLB
pages significantly.  It further reduces the overhead of struct page by
12.5% for a 2MB HugeTLB compared to the previous approach, which means
2GB per 1TB of HugeTLB.  It is a nice gain.  Comments and reviews are
welcome.  Thanks.

The main implementation and details can be found in the commit log of
patch 1.  In this series, I have changed the following four helpers; the
following table shows the impact on the overhead of those helpers.

	+------------------+-----------------------+
	|       APIs       | head page | tail page |
	+------------------+-----------+-----------+
	|    PageHead()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|    PageTail()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|  PageCompound()  |     N     |     N     |
	+------------------+-----------+-----------+
	|  compound_head() |     Y     |     N     |
	+------------------+-----------+-----------+

	Y: Overhead is increased.
	N: Overhead is _NOT_ increased.

It shows that the overhead of those helpers on a tail page does not
change between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off",
but the overhead on a head page is increased when
"hugetlb_free_vmemmap=on" (except PageCompound()).  So I believe that
Matthew Wilcox's folio series will help with this.
The users of PageHead() and PageTail() are much fewer than those of
compound_head(), and most users of PageTail() are VM_BUG_ON()s, so I have
done some tests on the overhead of calling compound_head() on head pages.

The overhead of calling compound_head() on a head page is 2.11ns
(measured by calling compound_head() 10 million times and averaging).
For a head page whose address is not aligned with PAGE_SIZE, or a
non-compound page, the overhead of compound_head() is 2.54ns, an increase
of 20%.  For a head page whose address is aligned with PAGE_SIZE, the
overhead of compound_head() is 2.97ns, an increase of 40%.  Most pages
are of the former kind.  I do not think the overhead is significant since
the overhead of compound_head() itself is low.

This patch (of 5):

This patch minimizes the overhead of struct page for 2MB HugeTLB pages
significantly.  It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB of HugeTLB (2MB type).

After the feature of "Free some vmemmap pages of HugeTLB page" is
enabled, the mapping of the vmemmap addresses associated with a 2MB
HugeTLB page becomes the figure below.

    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    2MB    |                     +-----------+                       | | |
 |           |                     |     5     | ----------------------+ | |
 |           |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped.  However, the 2nd vmemmap page frame can also be freed to the
buddy allocator; then we can change the mapping from the figure above to
the figure below.

    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
 |           |                     +-----------+                  | | | | | |
 |           |                     |     2     | -----------------+ | | | | |
 |           |                     +-----------+                    | | | | |
 |           |                     |     3     | -------------------+ | | | |
 |           |                     +-----------+                      | | | |
 |           |                     |     4     | ---------------------+ | | |
 |    2MB    |                     +-----------+                        | | |
 |           |                     |     5     | -----------------------+ | |
 |           |                     +-----------+                          | |
 |           |                     |     6     | -------------------------+ |
 |           |                     +-----------+                            |
 |           |                     |     7     | ---------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0).  In other words, there is more than one struct
page with PG_head associated with each HugeTLB page.  We __know__ that
there is only one head page struct; the tail page structs with PG_head
are fake head page structs.  We need an approach to distinguish between
those two different types of page structs so that compound_head(),
PageHead() and PageTail() can work properly when the parameter is a tail
page struct with PG_head set.  The following code snippet describes how
to distinguish between real and fake head page structs.

	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);

		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}

We can safely access the field of the @page[1] with PG_head because the
@page is a compound page composed of at least two contiguous pages.
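To make the encoding concrete: the low bit of page[1].compound_head tags a
tail page, and the remaining bits point at the head page.  A standalone
userspace sketch of that tagging trick (editor's illustration; the type and
field names are hypothetical stand-ins for struct page):

/* Userspace sketch of the compound_head low-bit tagging scheme.
 * A tail stores (head_pointer | 1); the head stores 0 there. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct fake_page {
        uintptr_t compound_head;        /* tagged pointer, as in struct page */
};

static void make_compound(struct fake_page *pages, int n)
{
        /* Tail pages point at the head with the low bit set. */
        for (int i = 1; i < n; i++)
                pages[i].compound_head = (uintptr_t)&pages[0] | 1;
        pages[0].compound_head = 0;
}

static struct fake_page *compound_head(struct fake_page *page)
{
        uintptr_t head = page->compound_head;

        if (head & 1)                   /* tagged: this is a tail page */
                return (struct fake_page *)(head - 1);
        return page;                    /* untagged: already the head */
}

int main(void)
{
        struct fake_page pages[8];

        make_compound(pages, 8);
        assert(compound_head(&pages[5]) == &pages[0]);
        assert(compound_head(&pages[0]) == &pages[0]);
        printf("tagging scheme works\n");
        return 0;
}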
[songmuchun@bytedance.com: restore lost comment changes]
Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Reviewed-by: Barry Song
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Michal Hocko
Cc: David Hildenbrand
Cc: Chen Huang
Cc: Bodeddula Balasubramaniam
Cc: Jonathan Corbet
Cc: Matthew Wilcox
Cc: Xiongchun Duan
Cc: Fam Zheng
Cc: Qi Zheng
Signed-off-by: Andrew Morton
---

 Documentation/admin-guide/kernel-parameters.txt |    2 
 include/linux/page-flags.h                      |   78 +++++++++++-
 mm/hugetlb_vmemmap.c                            |   62 +++++-----
 mm/sparse-vmemmap.c                             |   21 +++
 4 files changed, 130 insertions(+), 33 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/Documentation/admin-guide/kernel-parameters.txt
@@ -1625,7 +1625,7 @@
 			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
 			enabled.
 			Allows heavy hugetlb users to free up some more
-			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+			memory (7 * PAGE_SIZE for each 2MB hugetlb page).
 			Format: { on | off (default) }

 			on: enable the feature
--- a/include/linux/page-flags.h~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/include/linux/page-flags.h
@@ -190,13 +190,69 @@ enum pageflags {

 #ifndef __GENERATING_BOUNDS_H

+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+/*
+ * If the feature of freeing some vmemmap pages associated with each HugeTLB
+ * page is enabled, the head vmemmap page frame is reused and all of the tail
+ * vmemmap addresses map to the head vmemmap page frame (further details can
+ * be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
+ * words, there is more than one page struct with PG_head associated with each
+ * HugeTLB page.  We __know__ that there is only one head page struct; the
+ * tail page structs with PG_head are fake head page structs.  We need an
+ * approach to distinguish between those two different types of page structs
+ * so that compound_head() can return the real head page struct when the
+ * parameter is the tail page struct but with PG_head.
+ *
+ * The page_fixed_fake_head() returns the real head page struct if the @page
+ * is a fake page head; otherwise it returns @page, which can either be a
+ * true page head or a tail.
+ */
+static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
+{
+	if (!hugetlb_free_vmemmap_enabled)
+		return page;
+
+	/*
+	 * Only addresses aligned with PAGE_SIZE of struct page may be fake head
+	 * struct page.  The alignment check aims to avoid accessing the fields
+	 * (e.g. compound_head) of the @page[1], which can avoid touching a
+	 * (possibly) cold cacheline in some cases.
+	 */
+	if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
+	    test_bit(PG_head, &page->flags)) {
+		/*
+		 * We can safely access the field of the @page[1] with PG_head
+		 * because the @page is a compound page composed of at least
+		 * two contiguous pages.
+		 */
+		unsigned long head = READ_ONCE(page[1].compound_head);
+
+		if (likely(head & 1))
+			return (const struct page *)(head - 1);
+	}
+	return page;
+}
+#else
+static inline const struct page *page_fixed_fake_head(const struct page *page)
+{
+	return page;
+}
+#endif
+
+static __always_inline int page_is_fake_head(struct page *page)
+{
+	return page_fixed_fake_head(page) != page;
+}
+
 static inline unsigned long _compound_head(const struct page *page)
 {
 	unsigned long head = READ_ONCE(page->compound_head);

 	if (unlikely(head & 1))
 		return head - 1;
-	return (unsigned long)page;
+	return (unsigned long)page_fixed_fake_head(page);
 }

 #define compound_head(page)	((typeof(page))_compound_head(page))
@@ -231,12 +287,13 @@ static inline unsigned long _compound_he

 static __always_inline int PageTail(struct page *page)
 {
-	return READ_ONCE(page->compound_head) & 1;
+	return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page);
 }

 static __always_inline int PageCompound(struct page *page)
 {
-	return test_bit(PG_head, &page->flags) || PageTail(page);
+	return test_bit(PG_head, &page->flags) ||
+	       READ_ONCE(page->compound_head) & 1;
 }

 #define	PAGE_POISON_PATTERN	-1l
@@ -695,7 +752,20 @@ static inline bool test_set_page_writeba
 	return set_page_writeback(page);
 }

-__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
+static __always_inline bool folio_test_head(struct folio *folio)
+{
+	return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
+}
+
+static __always_inline int PageHead(struct page *page)
+{
+	PF_POISONED_CHECK(page);
+	return test_bit(PG_head, &page->flags) && !page_is_fake_head(page);
+}
+
+__SETPAGEFLAG(Head, head, PF_ANY)
+__CLEARPAGEFLAG(Head, head, PF_ANY)
+CLEARPAGEFLAG(Head, head, PF_ANY)

 /**
  * folio_test_large() - Does this folio contain more than one page?
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/mm/hugetlb_vmemmap.c
@@ -124,9 +124,9 @@
  * page of page structs (page 0) associated with the HugeTLB page contains the 4
  * page structs necessary to describe the HugeTLB. The only use of the remaining
  * pages of page structs (page 1 to page 7) is to point to page->compound_head.
- * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
+ * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
  * will be used for each HugeTLB page. This will allow us to free the remaining
- * 6 pages to the buddy allocator.
+ * 7 pages to the buddy allocator.
  *
  * Here is how things look after remapping.
 *
@@ -134,30 +134,30 @@
  * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
  * |           |                     |     0     | -------------> |     0     |
  * |           |                     +-----------+                +-----------+
- * |           |                     |     1     | -------------> |     1     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
- * |           |                     +-----------+                   | | | | |
- * |           |                     |     3     | ------------------+ | | | |
- * |           |                     +-----------+                     | | | |
- * |           |                     |     4     | --------------------+ | | |
- * |    PMD    |                     +-----------+                       | | |
- * |   level   |                     |     5     | ----------------------+ | |
- * |  mapping  |                     +-----------+                         | |
- * |           |                     |     6     | ------------------------+ |
- * |           |                     +-----------+                           |
- * |           |                     |     7     | --------------------------+
+ * |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
+ * |           |                     +-----------+                  | | | | | |
+ * |           |                     |     2     | -----------------+ | | | | |
+ * |           |                     +-----------+                    | | | | |
+ * |           |                     |     3     | -------------------+ | | | |
+ * |           |                     +-----------+                      | | | |
+ * |           |                     |     4     | ---------------------+ | | |
+ * |    PMD    |                     +-----------+                        | | |
+ * |   level   |                     |     5     | -----------------------+ | |
+ * |  mapping  |                     +-----------+                          | |
+ * |           |                     |     6     | -------------------------+ |
+ * |           |                     +-----------+                            |
+ * |           |                     |     7     | ---------------------------+
  * |           |                     +-----------+
  * |           |
  * |           |
  * |           |
  * +-----------+
  *
- * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
+ * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
  * vmemmap pages and restore the previous mapping relationship.
  *
  * For the HugeTLB page of the pud level mapping. It is similar to the former.
- * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
+ * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
  *
  * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
  * (e.g. aarch64) provides a contiguous bit in the translation table entries
@@ -166,7 +166,13 @@
  *
  * The contiguous bit is used to increase the mapping size at the pmd and pte
  * (last) level. So this type of HugeTLB page can be optimized only when its
- * size of the struct page structs is greater than 2 pages.
+ * size of the struct page structs is greater than 1 page.
+ *
+ * Notice: The head vmemmap page is not freed to the buddy allocator and all
+ * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
+ * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
+ * associated with each HugeTLB page. The compound_head() can handle this
+ * correctly (more details refer to the comment above compound_head()).
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt

@@ -175,19 +181,21 @@
 /*
  * There are a lot of struct page structures associated with each HugeTLB page.
  * For tail pages, the value of compound_head is the same. So we can reuse first
- * page of tail page structures. We map the virtual addresses of the remaining
- * pages of tail page structures to the first tail page struct, and then free
- * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
+ * page of head page structures. We map the virtual addresses of all the pages
+ * of tail page structures to the head page struct, and then free these page
+ * frames. Therefore, we need to reserve one page as vmemmap areas.
  */
-#define RESERVE_VMEMMAP_NR		2U
+#define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)

-bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+bool hugetlb_free_vmemmap_enabled __read_mostly =
+	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);

 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
 	/* We cannot optimize if a "struct page" crosses page boundaries. */
-	if ((!is_power_of_2(sizeof(struct page)))) {
+	if (!is_power_of_2(sizeof(struct page))) {
 		pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
 		return 0;
 	}
@@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstat
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
 				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-
 	if (!ret)
 		ClearHPageVmemmapOptimized(head);

@@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;

 	/*
-	 * The head page and the first tail page are not to be freed to buddy
-	 * allocator, the other pages will map to the first tail page, so they
-	 * can be freed.
+	 * The head page is not to be freed to buddy allocator, the other tail
+	 * pages will map to the head page, so they can be freed.
 	 *
 	 * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages?  It is true
 	 * on some architectures (e.g. aarch64).  See Documentation/arm64/
--- a/mm/sparse-vmemmap.c~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/mm/sparse-vmemmap.c
@@ -245,6 +245,26 @@ static void vmemmap_remap_pte(pte_t *pte
 	set_pte_at(&init_mm, addr, pte, entry);
 }

+/*
+ * How many struct page structs need to be reset.  When we reuse the head
+ * struct page, the special metadata (e.g. page->flags or page->mapping)
+ * cannot be copied to the tail struct page structs.  The invalid value will
+ * be checked in free_tail_pages_check().  To avoid the message of "corrupted
+ * mapping in tail page", we need to reset at least 3 (one head struct page
+ * struct and two tail struct page structs) struct page structs.
+ */
+#define NR_RESET_STRUCT_PAGE		3
+
+static inline void reset_struct_pages(struct page *start)
+{
+	int i;
+	struct page *from = start + NR_RESET_STRUCT_PAGE;
+
+	for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
+		memcpy(start + i, from, sizeof(*from));
+}
+
 static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 				struct vmemmap_remap_walk *walk)
 {
@@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *p
 	list_del(&page->lru);
 	to = page_to_virt(page);
 	copy_page(to, (void *)walk->reuse_addr);
+	reset_struct_pages(to);

 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
 }

From patchwork Tue Mar 22 21:45:03 2022

Date: Tue, 22 Mar 2022 14:45:03 -0700
To: zhengqi.arch@bytedance.com,willy@infradead.org,song.bao.hua@hisilicon.com,osalvador@suse.de,mike.kravetz@oracle.com,mhocko@suse.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@redhat.com,corbet@lwn.net,chenhuang5@huawei.com,bodeddub@amazon.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 129/227] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
Message-Id: <20220322214503.BDFA1C340EE@smtp.kernel.org>
From: Muchun Song
Subject: mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key

The page_fixed_fake_head() is used throughout memory management, and its
conditional check requires reading a global variable.  Although the
overhead of this check may be small, it increases when the memory cache
comes under pressure.  Also, the global variable will not be modified
after system boot, so it is a very good fit for the static key mechanism.

Link: https://lkml.kernel.org/r/20211101031651.75851-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Reviewed-by: Barry Song
Cc: Bodeddula Balasubramaniam
Cc: Chen Huang
Cc: David Hildenbrand
Cc: Fam Zheng
Cc: Jonathan Corbet
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Qi Zheng
Cc: Xiongchun Duan
Signed-off-by: Andrew Morton
---

 include/linux/hugetlb.h    |    6 ------
 include/linux/page-flags.h |   16 ++++++++++++++--
 mm/hugetlb_vmemmap.c       |   12 ++++++------
 mm/memory_hotplug.c        |    2 +-
 4 files changed, 21 insertions(+), 15 deletions(-)

--- a/include/linux/hugetlb.h~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/include/linux/hugetlb.h
@@ -1075,12 +1075,6 @@ static inline void set_huge_swap_pte_at(
 }
 #endif	/* CONFIG_HUGETLB_PAGE */

-#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
-#else
-#define hugetlb_free_vmemmap_enabled	false
-#endif
-
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
--- a/include/linux/page-flags.h~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/include/linux/page-flags.h
@@ -191,7 +191,14 @@ enum pageflags {
 #ifndef __GENERATING_BOUNDS_H

 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
+DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			 hugetlb_free_vmemmap_enabled_key);
+
+static __always_inline bool hugetlb_free_vmemmap_enabled(void)
+{
+	return static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+				   &hugetlb_free_vmemmap_enabled_key);
+}

 /*
  * If the feature of freeing some vmemmap pages associated with each HugeTLB
@@ -211,7 +218,7 @@ extern bool hugetlb_free_vmemmap_enabled
  */
 static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
 {
-	if (!hugetlb_free_vmemmap_enabled)
+	if (!hugetlb_free_vmemmap_enabled())
 		return page;

 	/*
@@ -239,6 +246,11 @@ static inline const struct page *page_fi
 {
 	return page;
 }
+
+static inline bool hugetlb_free_vmemmap_enabled(void)
+{
+	return false;
+}
 #endif

 static __always_inline int page_is_fake_head(struct page *page)
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/mm/hugetlb_vmemmap.c
@@ -188,9 +188,9 @@
 #define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)

-bool hugetlb_free_vmemmap_enabled __read_mostly =
-	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
-EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
+DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			hugetlb_free_vmemmap_enabled_key);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);

 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -204,9 +204,9 @@ static int __init early_hugetlb_free_vme
 		return -EINVAL;

 	if (!strcmp(buf, "on"))
-		hugetlb_free_vmemmap_enabled = true;
+		static_branch_enable(&hugetlb_free_vmemmap_enabled_key);
 	else if (!strcmp(buf, "off"))
-		hugetlb_free_vmemmap_enabled = false;
+		static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
 	else
 		return -EINVAL;

@@ -284,7 +284,7 @@ void __init hugetlb_vmemmap_init(struct
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >= RESERVE_VMEMMAP_SIZE /
 		     sizeof(struct page));

-	if (!hugetlb_free_vmemmap_enabled)
+	if (!hugetlb_free_vmemmap_enabled())
 		return;

 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
--- a/mm/memory_hotplug.c~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/mm/memory_hotplug.c
@@ -1327,7 +1327,7 @@ bool mhp_supports_memmap_on_memory(unsig
 	 * populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !hugetlb_free_vmemmap_enabled &&
+	       !hugetlb_free_vmemmap_enabled() &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
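For readers unfamiliar with static keys: instead of loading and testing a
variable, the kernel patches the branch site itself when the key flips.  A
minimal sketch of the pattern (editor's illustration with hypothetical
names; see include/linux/jump_label.h for the real API):

/* Static-key pattern sketch; "my_feature" is a hypothetical name. */
#include <linux/jump_label.h>
#include <linux/init.h>
#include <linux/string.h>

DEFINE_STATIC_KEY_FALSE(my_feature_key);

static int __init setup_my_feature(char *buf)
{
	if (!strcmp(buf, "on"))
		static_branch_enable(&my_feature_key);	/* rewrites branch sites */
	return 0;
}
early_param("my_feature", setup_my_feature);

static __always_inline bool my_feature_enabled(void)
{
	/* Compiles to a NOP (or unconditional jump): no load, no compare. */
	return static_branch_unlikely(&my_feature_key);
}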
From patchwork Tue Mar 22 21:45:06 2022

Date: Tue, 22 Mar 2022 14:45:06 -0700
To: zhengqi.arch@bytedance.com,willy@infradead.org,song.bao.hua@hisilicon.com,osalvador@suse.de,mike.kravetz@oracle.com,mhocko@suse.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@redhat.com,corbet@lwn.net,chenhuang5@huawei.com,bodeddub@amazon.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 130/227] mm: sparsemem: use page table lock to protect kernel pmd operations
Message-Id: <20220322214506.CDC35C340EE@smtp.kernel.org>

From: Muchun Song
Subject: mm: sparsemem: use page table lock to protect kernel pmd operations

The init_mm.page_table_lock is used to protect kernel page tables, so we
can use it to serialize splitting vmemmap PMD mappings instead of the
mmap write lock, which can increase the concurrency of
vmemmap_remap_free().  In practice, this increases the concurrency
between allocations of HugeTLB pages.  But that is not the only benefit:
there are many users of the mmap read lock of init_mm.  The mmap write
lock used to be held throughout vmemmap_remap_free(); removing that usage
means the operation no longer blocks other users of the mmap read lock.
It does not make anything worse and is always a win to move.

Now the kernel page table walker does not hold the page_table_lock when
walking pmd entries, so there may be a consistency issue for a pmd entry,
because it might change from a huge pmd entry to a PTE page table at any
time.  There is only one user of the kernel page table walker, namely
ptdump.  Ptdump already accounts for this consistency: it uses a local
variable to cache the value of the pmd entry.  But we also need to update
->action to ACTION_CONTINUE to make sure the walker does not walk every
pte entry again when a concurrent thread has split the huge pmd.
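The ptdump side of the change relies on the pagewalk API's per-entry
action.  As a reminder of how that protocol looks, a minimal sketch of a
leaf-handling callback (editor's illustration; note_leaf_mapping() is a
hypothetical consumer, the rest follows include/linux/pagewalk.h):

/* Pagewalk pmd callback that snapshots the entry and skips the PTE
 * level for leaf mappings. */
#include <linux/pagewalk.h>

static void note_leaf_mapping(unsigned long addr, unsigned long val);

static int my_pmd_entry(pmd_t *pmd, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	/* Snapshot once: a concurrent split may change *pmd underneath us. */
	pmd_t val = READ_ONCE(*pmd);

	if (pmd_leaf(val)) {
		note_leaf_mapping(addr, pmd_val(val));
		/* Do not descend: a huge mapping has no PTE table to walk. */
		walk->action = ACTION_CONTINUE;
	}
	return 0;
}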
Link: https://lkml.kernel.org/r/20211101031651.75851-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song
Cc: Barry Song
Cc: Bodeddula Balasubramaniam
Cc: Chen Huang
Cc: David Hildenbrand
Cc: Fam Zheng
Cc: Jonathan Corbet
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Qi Zheng
Cc: Xiongchun Duan
Signed-off-by: Andrew Morton
---

 mm/ptdump.c         |   16 ++++++++++----
 mm/sparse-vmemmap.c |   47 +++++++++++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 20 deletions(-)

--- a/mm/ptdump.c~mm-sparsemem-use-page-table-lock-to-protect-kernel-pmd-operations
+++ a/mm/ptdump.c
@@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd,
 	if (st->effective_prot)
 		st->effective_prot(st, 0, pgd_val(val));

-	if (pgd_leaf(val))
+	if (pgd_leaf(val)) {
 		st->note_page(st, addr, 0, pgd_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
@@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d,
 	if (st->effective_prot)
 		st->effective_prot(st, 1, p4d_val(val));

-	if (p4d_leaf(val))
+	if (p4d_leaf(val)) {
 		st->note_page(st, addr, 1, p4d_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
@@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud,
 	if (st->effective_prot)
 		st->effective_prot(st, 2, pud_val(val));

-	if (pud_leaf(val))
+	if (pud_leaf(val)) {
 		st->note_page(st, addr, 2, pud_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
@@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd,
 	if (st->effective_prot)
 		st->effective_prot(st, 3, pmd_val(val));

-	if (pmd_leaf(val))
+	if (pmd_leaf(val)) {
 		st->note_page(st, addr, 3, pmd_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
--- a/mm/sparse-vmemmap.c~mm-sparsemem-use-page-table-lock-to-protect-kernel-pmd-operations
+++ a/mm/sparse-vmemmap.c
@@ -53,8 +53,7 @@ struct vmemmap_remap_walk {
 	struct list_head *vmemmap_pages;
 };

-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
-				  struct vmemmap_remap_walk *walk)
+static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
 {
 	pmd_t __pmd;
 	int i;
@@ -76,15 +75,34 @@ static int split_vmemmap_huge_pmd(pmd_t
 		set_pte_at(&init_mm, addr, pte, entry);
 	}

-	/* Make pte visible before pmd. See comment in pmd_install(). */
-	smp_wmb();
-	pmd_populate_kernel(&init_mm, pmd, pgtable);
-
-	flush_tlb_kernel_range(start, start + PMD_SIZE);
+	spin_lock(&init_mm.page_table_lock);
+	if (likely(pmd_leaf(*pmd))) {
+		/* Make pte visible before pmd. See comment in pmd_install(). */
+		smp_wmb();
+		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		flush_tlb_kernel_range(start, start + PMD_SIZE);
+	} else {
+		pte_free_kernel(&init_mm, pgtable);
+	}
+	spin_unlock(&init_mm.page_table_lock);

 	return 0;
 }

+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
+{
+	int leaf;
+
+	spin_lock(&init_mm.page_table_lock);
+	leaf = pmd_leaf(*pmd);
+	spin_unlock(&init_mm.page_table_lock);
+
+	if (!leaf)
+		return 0;
+
+	return __split_vmemmap_huge_pmd(pmd, start);
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -121,13 +139,12 @@ static int vmemmap_pmd_range(pud_t *pud,

 	pmd = pmd_offset(pud, addr);
 	do {
-		if (pmd_leaf(*pmd)) {
-			int ret;
+		int ret;
+
+		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
+		if (ret)
+			return ret;

-			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
-			if (ret)
-				return ret;
-		}
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
@@ -321,10 +338,8 @@ int vmemmap_remap_free(unsigned long sta
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);

-	mmap_write_lock(&init_mm);
+	mmap_read_lock(&init_mm);
 	ret = vmemmap_remap_range(reuse, end, &walk);
-	mmap_write_downgrade(&init_mm);
-
 	if (ret && walk.nr_walked) {
 		end = reuse + walk.nr_walked * PAGE_SIZE;
 		/*

From patchwork Tue Mar 22 21:45:09 2022
Date: Tue, 22 Mar 2022 14:45:09 -0700
To: zhengqi.arch@bytedance.com,willy@infradead.org,song.bao.hua@hisilicon.com,osalvador@suse.de,mike.kravetz@oracle.com,mhocko@suse.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@redhat.com,corbet@lwn.net,chenhuang5@huawei.com,bodeddub@amazon.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 131/227] selftests: vm: add a hugetlb test case
Message-Id: <20220322214509.DDC56C340F2@smtp.kernel.org>

From: Muchun Song
Subject: selftests: vm: add a hugetlb test case

Since the head vmemmap page frame associated with each HugeTLB page is
reused, we should hide the PG_head flag of tail struct pages from the
user.  Add a test case to check whether this works properly.  The test
steps are as follows.

  1) alloc 2MB hugeTLB
  2) get each page frame
  3) apply those APIs in each page frame
  4) Those APIs work completely the same as before.

Reading the flags of a page via /proc/kpageflags is done in
stable_page_flags(), which invokes PageHead(), PageTail(), PageCompound()
and compound_head().  If those APIs work properly, the head page must
have bits 15 and 17 set, and tail pages must have bits 16 and 17 set but
bit 15 unset.  Those flags are checked in check_page_flags().
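A quick way to exercise the new test outside the full run_vmtests.sh sweep
(a sketch; it assumes a kernel tree checkout and root, since reading
/proc/kpageflags is privileged):

# Reserve a few default-sized (2MB) huge pages, then build and run the test.
sudo sh -c 'echo 8 > /proc/sys/vm/nr_hugepages'
make -C tools/testing/selftests/vm hugepage-vmemmap
sudo ./tools/testing/selftests/vm/hugepage-vmemmap && echo PASS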
+ */ +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <sys/mman.h> +#include <fcntl.h> + +#define MAP_LENGTH (2UL * 1024 * 1024) + +#ifndef MAP_HUGETLB +#define MAP_HUGETLB 0x40000 /* arch specific */ +#endif + +#define PAGE_SIZE 4096 + +#define PAGE_COMPOUND_HEAD (1UL << 15) +#define PAGE_COMPOUND_TAIL (1UL << 16) +#define PAGE_HUGE (1UL << 17) + +#define HEAD_PAGE_FLAGS (PAGE_COMPOUND_HEAD | PAGE_HUGE) +#define TAIL_PAGE_FLAGS (PAGE_COMPOUND_TAIL | PAGE_HUGE) + +#define PM_PFRAME_BITS 55 +#define PM_PFRAME_MASK ~((1UL << PM_PFRAME_BITS) - 1) + +/* + * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages. + * That means the addresses starting with 0x800000... will need to be + * specified. Specifying a fixed address is not required on ppc64, i386 + * or x86_64. + */ +#ifdef __ia64__ +#define MAP_ADDR (void *)(0x8000000000000000UL) +#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED) +#else +#define MAP_ADDR NULL +#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) +#endif + +static void write_bytes(char *addr, size_t length) +{ + unsigned long i; + + for (i = 0; i < length; i++) + *(addr + i) = (char)i; +} + +static unsigned long virt_to_pfn(void *addr) +{ + int fd; + unsigned long pagemap; + + fd = open("/proc/self/pagemap", O_RDONLY); + if (fd < 0) + return -1UL; + + lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET); + read(fd, &pagemap, sizeof(pagemap)); + close(fd); + + return pagemap & ~PM_PFRAME_MASK; +} + +static int check_page_flags(unsigned long pfn) +{ + int fd, i; + unsigned long pageflags; + + fd = open("/proc/kpageflags", O_RDONLY); + if (fd < 0) + return -1; + + lseek(fd, pfn * sizeof(pageflags), SEEK_SET); + + read(fd, &pageflags, sizeof(pageflags)); + if ((pageflags & HEAD_PAGE_FLAGS) != HEAD_PAGE_FLAGS) { + close(fd); + printf("Head page flags (%lx) is invalid\n", pageflags); + return -1; + } + + /* + * pages other than the first page must be tail and shouldn't be head; + * this also verifies kernel has correctly set the fake page_head to tail + * while hugetlb_free_vmemmap is enabled. + */ + for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) { + read(fd, &pageflags, sizeof(pageflags)); + if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS || + (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) { + close(fd); + printf("Tail page flags (%lx) is invalid\n", pageflags); + return -1; + } + } + + close(fd); + + return 0; +} + +int main(int argc, char **argv) +{ + void *addr; + unsigned long pfn; + + addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0); + if (addr == MAP_FAILED) { + perror("mmap"); + exit(1); + } + + /* Trigger allocation of HugeTLB page.
*/ + write_bytes(addr, MAP_LENGTH); + + pfn = virt_to_pfn(addr); + if (pfn == -1UL) { + munmap(addr, MAP_LENGTH); + perror("virt_to_pfn"); + exit(1); + } + + printf("Returned address is %p whose pfn is %lx\n", addr, pfn); + + if (check_page_flags(pfn) < 0) { + munmap(addr, MAP_LENGTH); + perror("check_page_flags"); + exit(1); + } + + /* munmap() length of MAP_HUGETLB memory must be hugepage aligned */ + if (munmap(addr, MAP_LENGTH)) { + perror("munmap"); + exit(1); + } + + return 0; +} --- a/tools/testing/selftests/vm/Makefile~selftests-vm-add-a-hugetlb-test-case +++ a/tools/testing/selftests/vm/Makefile @@ -33,6 +33,7 @@ TEST_GEN_FILES += hmm-tests TEST_GEN_FILES += hugepage-mmap TEST_GEN_FILES += hugepage-mremap TEST_GEN_FILES += hugepage-shm +TEST_GEN_FILES += hugepage-vmemmap TEST_GEN_FILES += khugepaged TEST_GEN_FILES += madv_populate TEST_GEN_FILES += map_fixed_noreplace --- a/tools/testing/selftests/vm/run_vmtests.sh~selftests-vm-add-a-hugetlb-test-case +++ a/tools/testing/selftests/vm/run_vmtests.sh @@ -120,6 +120,17 @@ else fi rm -f $mnt/huge_mremap +echo "------------------------" +echo "running hugepage-vmemmap" +echo "------------------------" +./hugepage-vmemmap +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + echo "NOTE: The above hugetlb tests provide minimal coverage. Use" echo " https://github.com/libhugetlbfs/libhugetlbfs.git for" echo " hugetlb regression testing." From patchwork Tue Mar 22 21:45:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789216 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E6ACC433F5 for ; Tue, 22 Mar 2022 21:45:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 251F26B012C; Tue, 22 Mar 2022 17:45:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 201566B012D; Tue, 22 Mar 2022 17:45:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F1526B012E; Tue, 22 Mar 2022 17:45:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 007E36B012C for ; Tue, 22 Mar 2022 17:45:15 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id D2BC823F83 for ; Tue, 22 Mar 2022 21:45:15 +0000 (UTC) X-FDA: 79273353390.02.9680ACB Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf01.hostedemail.com (Postfix) with ESMTP id 6434A4002C for ; Tue, 22 Mar 2022 21:45:15 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 48393B81DAD; Tue, 22 Mar 2022 21:45:14 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0295AC340EC; Tue, 22 Mar 2022 21:45:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985513; bh=EsPE2pXMRoOflKuqFla5ZNuHb5UVnMvBNCPCleVykuc=; h=Date:To:From:In-Reply-To:Subject:From; b=SOgYP04XHqcfUNy9KbObB7Xo1YGyJs1vLJHWSI9nVGsBHSjTXrz5hXWB2qhcF3Iex 
+W8Tz7WHWE/U4nvanutYygDHfuGa9h5sGks3hfljY9JdZoRYgfCGyn5dygVgnjylrB miUSVufxp7fEfaW9i8RUGCxJ5awDpMMznHto5U4M= Date: Tue, 22 Mar 2022 14:45:12 -0700 To: zhengqi.arch@bytedance.com,willy@infradead.org,song.bao.hua@hisilicon.com,osalvador@suse.de,mike.kravetz@oracle.com,mhocko@suse.com,fam.zheng@bytedance.com,duanxiongchun@bytedance.com,david@redhat.com,corbet@lwn.net,chenhuang5@huawei.com,bodeddub@amazon.com,songmuchun@bytedance.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 132/227] mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP Message-Id: <20220322214513.0295AC340EC@smtp.kernel.org> X-Rspam-User: Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=SOgYP04X; spf=pass (imf01.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 6434A4002C X-Stat-Signature: z1y6pkrxoujpedqms8fxfzndbxih55p3 X-HE-Tag: 1647985515-787547 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Muchun Song Subject: mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP The vmemmap_remap_free/alloc functions are relevant only to HugeTLB, so move them into the scope of CONFIG_HUGETLB_PAGE_FREE_VMEMMAP. Link: https://lkml.kernel.org/r/20211101031651.75851-6-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Barry Song Cc: Bodeddula Balasubramaniam Cc: Chen Huang Cc: David Hildenbrand Cc: Fam Zheng Cc: Jonathan Corbet Cc: Matthew Wilcox Cc: Michal Hocko Cc: Mike Kravetz Cc: Oscar Salvador Cc: Qi Zheng Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 ++ mm/sparse-vmemmap.c | 2 ++ 2 files changed, 4 insertions(+) --- a/include/linux/mm.h~mm-sparsemem-move-vmemmap-related-to-hugetlb-to-config_hugetlb_page_free_vmemmap +++ a/include/linux/mm.h @@ -3146,10 +3146,12 @@ static inline void print_vma_addr(char * } #endif +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP int vmemmap_remap_free(unsigned long start, unsigned long end, unsigned long reuse); int vmemmap_remap_alloc(unsigned long start, unsigned long end, unsigned long reuse, gfp_t gfp_mask); +#endif void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, --- a/mm/sparse-vmemmap.c~mm-sparsemem-move-vmemmap-related-to-hugetlb-to-config_hugetlb_page_free_vmemmap +++ a/mm/sparse-vmemmap.c @@ -34,6 +34,7 @@ #include #include +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP /** * struct vmemmap_remap_walk - walk vmemmap page table * @@ -419,6 +420,7 @@ int vmemmap_remap_alloc(unsigned long st return 0; } +#endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP */ /* * Allocate a block of memory to be used to back the virtual memory map From patchwork Tue Mar 22 21:45:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789217 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org
(Postfix) with ESMTP id C69C1C433F5 for ; Tue, 22 Mar 2022 21:45:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 587C46B012E; Tue, 22 Mar 2022 17:45:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 536D96B012F; Tue, 22 Mar 2022 17:45:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 424C26B0130; Tue, 22 Mar 2022 17:45:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0105.hostedemail.com [216.40.44.105]) by kanga.kvack.org (Postfix) with ESMTP id 318B26B012E for ; Tue, 22 Mar 2022 17:45:20 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0325FA2B0A for ; Tue, 22 Mar 2022 21:45:20 +0000 (UTC) X-FDA: 79273353600.18.F7FD1C6 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf11.hostedemail.com (Postfix) with ESMTP id 2C3CE4000F for ; Tue, 22 Mar 2022 21:45:19 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A23886104C; Tue, 22 Mar 2022 21:45:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0493DC340EE; Tue, 22 Mar 2022 21:45:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985518; bh=KgFxigUkeYF+PFEggiPurq/9Jlhixybpa2gQoa4tROE=; h=Date:To:From:In-Reply-To:Subject:From; b=yFEY1JqFq+c/WASd69mRQGgIsHZuntzedT1Dy/MC7tuoM2VPZ8NcYRfu9TkQ3Tfcx 9PjuN1yvKoeIQQI6RDLQvZhlhAPAm4MdistvVRCG0USTxptNzKgX/yd1T5y/LmvJfj 4OF15R7wpAyScgMIq1GQpzdV1UU0HTkuuPZuCIs8= Date: Tue, 22 Mar 2022 14:45:15 -0700 To: tglx@linutronix.de,paul.walmsley@sifive.com,palmer@dabbelt.com,mingo@redhat.com,mike.kravetz@oracle.com,linux@armlinux.org.uk,anshuman.khandual@arm.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 133/227] mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB Message-Id: <20220322214518.0493DC340EE@smtp.kernel.org> Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=yFEY1JqF; spf=pass (imf11.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: q97pajbn1nmuazdz1yhb7iaagpwzp3hd X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 2C3CE4000F X-HE-Tag: 1647985519-103890 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Anshuman Khandual Subject: mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB ARCH_WANT_GENERAL_HUGETLB config has duplicate definitions on platforms that subscribe it. Instead make it a generic config option which can be selected on applicable platforms when required. 
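The resulting Kconfig pattern, distilled from the diff below as a sketch (not part of the patch text): the option is declared once as a bare bool in mm/Kconfig, and each platform opts in with a select instead of carrying its own def_bool y copy.

	config ARCH_WANT_GENERAL_HUGETLB	# declared once, in mm/Kconfig
		bool

	config X86				# arm and riscv opt in the same way
		select ARCH_WANT_GENERAL_HUGETLB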
Link: https://lkml.kernel.org/r/1643718465-4324-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Cc: Russell King Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Mike Kravetz Signed-off-by: Andrew Morton --- arch/arm/Kconfig | 4 +--- arch/riscv/Kconfig | 4 +--- arch/x86/Kconfig | 4 +--- mm/Kconfig | 3 +++ 4 files changed, 6 insertions(+), 9 deletions(-) --- a/arch/arm/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb +++ a/arch/arm/Kconfig @@ -37,6 +37,7 @@ config ARM select ARCH_USE_CMPXCHG_LOCKREF select ARCH_USE_MEMTEST select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU + select ARCH_WANT_GENERAL_HUGETLB select ARCH_WANT_IPC_PARSE_VERSION select ARCH_WANT_LD_ORPHAN_WARN select BINFMT_FLAT_ARGVP_ENVP_ON_STACK @@ -1508,9 +1509,6 @@ config HW_PERF_EVENTS def_bool y depends on ARM_PMU -config ARCH_WANT_GENERAL_HUGETLB - def_bool y - config ARM_MODULE_PLTS bool "Use PLTs to allow module memory to spill over into vmalloc area" depends on MODULES --- a/arch/riscv/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb +++ a/arch/riscv/Kconfig @@ -40,6 +40,7 @@ config RISCV select ARCH_USE_MEMTEST select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU select ARCH_WANT_FRAME_POINTERS + select ARCH_WANT_GENERAL_HUGETLB select ARCH_WANT_HUGE_PMD_SHARE if 64BIT select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU select BUILDTIME_TABLE_SORT if MMU @@ -171,9 +172,6 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SELECT_MEMORY_MODEL def_bool ARCH_SPARSEMEM_ENABLE -config ARCH_WANT_GENERAL_HUGETLB - def_bool y - config ARCH_SUPPORTS_UPROBES def_bool y --- a/arch/x86/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb +++ a/arch/x86/Kconfig @@ -118,6 +118,7 @@ config X86 select ARCH_WANT_DEFAULT_BPF_JIT if X86_64 select ARCH_WANTS_DYNAMIC_TASK_STRUCT select ARCH_WANTS_NO_INSTR + select ARCH_WANT_GENERAL_HUGETLB select ARCH_WANT_HUGE_PMD_SHARE select ARCH_WANT_LD_ORPHAN_WARN select ARCH_WANTS_THP_SWAP if X86_64 @@ -347,9 +348,6 @@ config ARCH_NR_GPIO config ARCH_SUSPEND_POSSIBLE def_bool y -config ARCH_WANT_GENERAL_HUGETLB - def_bool y - config AUDIT_ARCH def_bool y if X86_64 --- a/mm/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb +++ a/mm/Kconfig @@ -414,6 +414,9 @@ choice benefit. 
endchoice +config ARCH_WANT_GENERAL_HUGETLB + bool + config ARCH_WANTS_THP_SWAP def_bool n From patchwork Tue Mar 22 21:45:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789218 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1250DC433FE for ; Tue, 22 Mar 2022 21:45:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9A4DF6B0130; Tue, 22 Mar 2022 17:45:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 92DBE6B0131; Tue, 22 Mar 2022 17:45:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 81F736B0132; Tue, 22 Mar 2022 17:45:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0223.hostedemail.com [216.40.44.223]) by kanga.kvack.org (Postfix) with ESMTP id 7243A6B0130 for ; Tue, 22 Mar 2022 17:45:22 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 38CBF1828AE47 for ; Tue, 22 Mar 2022 21:45:22 +0000 (UTC) X-FDA: 79273353684.16.2EC00F4 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf25.hostedemail.com (Postfix) with ESMTP id C8393A0035 for ; Tue, 22 Mar 2022 21:45:21 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4472E6119A; Tue, 22 Mar 2022 21:45:21 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 06E1EC340EC; Tue, 22 Mar 2022 21:45:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985521; bh=jH8dF7GDqUGZPmmXDZ4LtM2Kj40YUDVUvC12nBofJUE=; h=Date:To:From:In-Reply-To:Subject:From; b=0w+vLIgBJ6Y0OOXMHFc2ZCycmpK2I/yanNMe6iD9OLNpVDCf1NyvXdeGRWotLn1p3 R7ZtxbsbKPGx8Wju5EWlwV8ttD1E+Ow6Ud+4EpMvprE6p8J1/t2EMCaPVxLBVKXMIo 2GGhH1+aM0pkv5wOPJmjzctEMoFnNPMp9QtxEJRk= Date: Tue, 22 Mar 2022 14:45:20 -0700 To: yaozhenguo1@gmail.com,mhocko@suse.com,liuyuntao10@huawei.com,dan.carpenter@oracle.com,baolin.wang@linux.alibaba.com,mike.kravetz@oracle.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 134/227] hugetlb: clean up potential spectre issue warnings Message-Id: <20220322214521.06E1EC340EC@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: mkp46o5156mwc68c4w5xrohpqt3n5dk3 Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=0w+vLIgB; dmarc=none; spf=pass (imf25.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: C8393A0035 X-HE-Tag: 1647985521-337311 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mike Kravetz Subject: hugetlb: clean up potential spectre issue warnings Recently 
introduced code allows numa nodes to be specified on the kernel command line for hugetlb allocations or CMA reservations. The node values are user specified and used as indices into arrays. This generated the following smatch warnings: mm/hugetlb.c:4170 hugepages_setup() warn: potential spectre issue 'default_hugepages_in_node' [w] mm/hugetlb.c:4172 hugepages_setup() warn: potential spectre issue 'parsed_hstate->max_huge_pages_node' [w] mm/hugetlb.c:6898 cmdline_parse_hugetlb_cma() warn: potential spectre issue 'hugetlb_cma_size_in_node' [w] (local cap) Clean up by using array_index_nospec to sanitize array indices. The routine cmdline_parse_hugetlb_cma has the same overflow/truncation issue addressed in [1]. That is also fixed with this change. [1] https://lore.kernel.org/linux-mm/20220209134018.8242-1-liuyuntao10@huawei.com/ As Michal pointed out, this is unlikely to be exploitable because it is __init code. But the patch suppresses the warnings. [mike.kravetz@oracle.com: v2] Link: https://lkml.kernel.org/r/20220218212946.35441-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20220217234218.192885-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz Cc: Baolin Wang Cc: Zhenguo Yao Cc: Liu Yuntao Cc: Dan Carpenter Cc: Michal Hocko Signed-off-by: Andrew Morton --- mm/hugetlb.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) --- a/mm/hugetlb.c~hugetlb-clean-up-potential-spectre-issue-warnings +++ a/mm/hugetlb.c @@ -31,6 +31,7 @@ #include #include #include +#include <linux/nospec.h> #include #include @@ -4161,7 +4162,7 @@ static int __init hugepages_setup(char * } if (tmp >= nr_online_nodes) goto invalid; - node = tmp; + node = array_index_nospec(tmp, nr_online_nodes); p += count + 1; /* Parse hugepages */ if (sscanf(p, "%lu%n", &tmp, &count) != 1) @@ -6889,9 +6890,9 @@ static int __init cmdline_parse_hugetlb_ break; if (s[count] == ':') { - nid = tmp; - if (nid < 0 || nid >= MAX_NUMNODES) + if (tmp >= MAX_NUMNODES) break; + nid = array_index_nospec(tmp, MAX_NUMNODES); s += count + 1; tmp = memparse(s, &s); From patchwork Tue Mar 22 21:45:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789220 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E072EC433EF for ; Tue, 22 Mar 2022 21:45:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 403656B0133; Tue, 22 Mar 2022 17:45:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2EC5C6B0134; Tue, 22 Mar 2022 17:45:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F0016B0135; Tue, 22 Mar 2022 17:45:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0040.hostedemail.com [216.40.44.40]) by kanga.kvack.org (Postfix) with ESMTP id E2EAF6B0133 for ; Tue, 22 Mar 2022 17:45:30 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 9FE968249980 for ; Tue, 22 Mar 2022 21:45:30 +0000 (UTC) X-FDA: 79273354020.22.F447989 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf07.hostedemail.com (Postfix) with ESMTP id 0B2B440045 for ; Tue, 22 Mar 2022 21:45:24 +0000 (UTC) Received: from smtp.kernel.org
(relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 993C06104C; Tue, 22 Mar 2022 21:45:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F1983C340EC; Tue, 22 Mar 2022 21:45:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985524; bh=6w9+CxZXSPgJv6PDHksGM+bIyjBwyBPpEv+CDSlrXOw=; h=Date:To:From:In-Reply-To:Subject:From; b=rfVIue0METJk/Xq0XBuXCGRC1J5UqPwuNboPwgrH1I718HqnYRlvxWoJaUZAOdVJZ scwdb57/6JNvykH5vQAkqdimy284yXdDPSjIB3P4VgfBj7vzxMGOzocz/8csP4vnTb 0YaKkJ5WqWi3qhC3t2NBXqjRUc9zJHBgkCal67Q8= Date: Tue, 22 Mar 2022 14:45:23 -0700 To: songmuchun@bytedance.com,mike.kravetz@oracle.com,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 135/227] mm/hugetlb: use helper macro __ATTR_RW Message-Id: <20220322214523.F1983C340EC@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 0B2B440045 X-Stat-Signature: sdwzjehazsarw7cpcs7qnnndtfomc7uz X-Rspam-User: Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=rfVIue0M; dmarc=none; spf=pass (imf07.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1647985524-91056 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miaohe Lin Subject: mm/hugetlb: use helper macro __ATTR_RW Use helper macro __ATTR_RW to define HSTATE_ATTR to make code more clear. Minor readability improvement. 
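For reference, __ATTR_RW() is defined in include/linux/sysfs.h, so the substitution is a pure expansion-equivalence; a sketch (not part of the patch):

	/* include/linux/sysfs.h */
	#define __ATTR_RW(_name) __ATTR(_name, 0644, _name##_show, _name##_store)

	/* so HSTATE_ATTR(foo) (illustrative name) still expands to: */
	static struct kobj_attribute foo_attr =
		__ATTR(foo, 0644, foo_show, foo_store);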
Link: https://lkml.kernel.org/r/20220222112731.33479-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Mike Kravetz Reviewed-by: Muchun Song Signed-off-by: Andrew Morton --- mm/hugetlb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/mm/hugetlb.c~mm-hugetlb-use-helper-macro-__attr_rw +++ a/mm/hugetlb.c @@ -3499,8 +3499,7 @@ static int demote_pool_huge_page(struct static struct kobj_attribute _name##_attr = __ATTR_WO(_name) #define HSTATE_ATTR(_name) \ - static struct kobj_attribute _name##_attr = \ - __ATTR(_name, 0644, _name##_show, _name##_store) + static struct kobj_attribute _name##_attr = __ATTR_RW(_name) static struct kobject *hugepages_kobj; static struct kobject *hstate_kobjs[HUGE_MAX_HSTATE]; From patchwork Tue Mar 22 21:45:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789219 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3087C433F5 for ; Tue, 22 Mar 2022 21:45:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6BEB46B0132; Tue, 22 Mar 2022 17:45:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 66DF56B0133; Tue, 22 Mar 2022 17:45:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 535ED6B0134; Tue, 22 Mar 2022 17:45:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 411E86B0132 for ; Tue, 22 Mar 2022 17:45:30 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 20DF522B99 for ; Tue, 22 Mar 2022 21:45:30 +0000 (UTC) X-FDA: 79273354020.11.9EB6CD1 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf28.hostedemail.com (Postfix) with ESMTP id 8DF29C002D for ; Tue, 22 Mar 2022 21:45:29 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5E559B81D77; Tue, 22 Mar 2022 21:45:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 05F8DC340EC; Tue, 22 Mar 2022 21:45:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985527; bh=OetOeqPdLI4ZR8ZX+aYpyGdrzB+LenaHlloTwIB5/zo=; h=Date:To:From:In-Reply-To:Subject:From; b=0NRfyC9mVWsQ2z6NwepGE44iY/ttwFuM+IrsGtJazjJ6psqb4bLNkayertwW2EDcl BR9wLLR5fjUE6S87h21iXE7NLN8iIqEyMYiYw1FiTtRec3eh4sPlnAbHrJ007Pr9MB AQzdyigSVQbmWHLVsP49D+3n0Ud1+NFIRRv0Eo3A= Date: Tue, 22 Mar 2022 14:45:26 -0700 To: willy@infradead.org,mike.kravetz@oracle.com,kirill@shutemov.name,hch@infradead.org,dhowells@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 136/227] mm/hugetlb.c: export PageHeadHuge() Message-Id: <20220322214527.05F8DC340EC@smtp.kernel.org> X-Stat-Signature: yub7f65qyzu4rsi3qjno48x7rx47gwke X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 8DF29C002D 
Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=0NRfyC9m; dmarc=none; spf=pass (imf28.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985529-770832 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Howells Subject: mm/hugetlb.c: export PageHeadHuge() Export PageHeadHuge() - it's used by folio_test_hugetlb() and thence by such as folio_file_page() and folio_contains(). Matthew suggested I use the first of those instead of doing the same calculation manually - but I can't call it from a module. Kirill suggested rearranging things to put it in a header, but that introduces header dependencies because of where constants are defined. [akpm@linux-foundation.org: s/EXPORT_SYMBOL/EXPORT_SYMBOL_GPL/, per Christoph] Link: https://lkml.kernel.org/r/2494562.1646054576@warthog.procyon.org.uk Link: https://lore.kernel.org/r/163707085314.3221130.14783857863702203440.stgit@warthog.procyon.org.uk/ Signed-off-by: David Howells Cc: Matthew Wilcox (Oracle) Cc: Kirill A. Shutemov Cc: Christoph Hellwig Cc: Mike Kravetz Signed-off-by: Andrew Morton --- mm/hugetlb.c | 1 + 1 file changed, 1 insertion(+) --- a/mm/hugetlb.c~mm-export-pageheadhuge +++ a/mm/hugetlb.c @@ -1855,6 +1855,7 @@ int PageHeadHuge(struct page *page_head) return page_head[1].compound_dtor == HUGETLB_PAGE_DTOR; } +EXPORT_SYMBOL_GPL(PageHeadHuge); /* * Find and lock address space (mapping) in write mode. From patchwork Tue Mar 22 21:45:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789221 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9C66C433EF for ; Tue, 22 Mar 2022 21:45:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EFCC76B0134; Tue, 22 Mar 2022 17:45:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E364B6B0135; Tue, 22 Mar 2022 17:45:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFD4E6B0136; Tue, 22 Mar 2022 17:45:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id BE8A36B0134 for ; Tue, 22 Mar 2022 17:45:31 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 921EA16E6 for ; Tue, 22 Mar 2022 21:45:31 +0000 (UTC) X-FDA: 79273354062.03.AE4F95B Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf29.hostedemail.com (Postfix) with ESMTP id 1A411120022 for ; Tue, 22 Mar 2022 21:45:30 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 873526119A; Tue, 22 Mar 2022 21:45:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DC679C340EC; Tue, 22 Mar 2022 21:45:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; 
d=linux-foundation.org; s=korg; t=1647985530; bh=4St+tgtH1Jiny/9gcE5NZiAuSuxhgUqN6h/sMNCrcOo=; h=Date:To:From:In-Reply-To:Subject:From; b=S1uAoqi///NFwCX24MtkGNPepHOZfwW155zP4chrh2hVdkY79uJtB1qbqmlgPDDe/ d0Rx+TLhOzr/fyOYkooGv0iUnvRciVPcpK2PWshewPCpaciNPI3ZRJ6eXIEQdi1+00 Dpa0wpWzmZjRNqAoSery7cvJ8qtZtXdlvyi4mbag= Date: Tue, 22 Mar 2022 14:45:29 -0700 To: anshuman.khandual@arm.com,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 137/227] mm: remove unneeded local variable follflags Message-Id: <20220322214529.DC679C340EC@smtp.kernel.org> X-Stat-Signature: g9km1bsbjxywf1ht4wk7jzuwjr5uwgc8 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 1A411120022 Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="S1uAoqi/"; dmarc=none; spf=pass (imf29.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985530-406589 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miaohe Lin Subject: mm: remove unneeded local variable follflags We can pass FOLL_GET | FOLL_DUMP to follow_page directly to simplify the code a bit in add_page_for_migration and split_huge_pages_pid. Link: https://lkml.kernel.org/r/20220311072002.35575-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- mm/huge_memory.c | 4 +--- mm/migrate.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-) --- a/mm/huge_memory.c~mm-remove-unneeded-local-variable-follflags-v2 +++ a/mm/huge_memory.c @@ -2953,7 +2953,6 @@ static int split_huge_pages_pid(int pid, */ for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) { struct vm_area_struct *vma = find_vma(mm, addr); - unsigned int follflags; struct page *page; if (!vma || addr < vma->vm_start) @@ -2966,8 +2965,7 @@ static int split_huge_pages_pid(int pid, } /* FOLL_DUMP to ignore special (like zero) pages */ - follflags = FOLL_GET | FOLL_DUMP; - page = follow_page(vma, addr, follflags); + page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP); if (IS_ERR(page)) continue; --- a/mm/migrate.c~mm-remove-unneeded-local-variable-follflags-v2 +++ a/mm/migrate.c @@ -1611,7 +1611,6 @@ static int add_page_for_migration(struct { struct vm_area_struct *vma; struct page *page; - unsigned int follflags; int err; mmap_read_lock(mm); @@ -1621,8 +1620,7 @@ static int add_page_for_migration(struct goto out; /* FOLL_DUMP to ignore special (like zero) pages */ - follflags = FOLL_GET | FOLL_DUMP; - page = follow_page(vma, addr, follflags); + page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP); err = PTR_ERR(page); if (IS_ERR(page)) From patchwork Tue Mar 22 21:45:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789222 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD90AC433FE for ; Tue, 22 Mar 2022 21:45:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 
75AE06B0138; Tue, 22 Mar 2022 17:45:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 70AB56B0139; Tue, 22 Mar 2022 17:45:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5D2166B013A; Tue, 22 Mar 2022 17:45:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0142.hostedemail.com [216.40.44.142]) by kanga.kvack.org (Postfix) with ESMTP id 4C4F66B0138 for ; Tue, 22 Mar 2022 17:45:36 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 10B381828AE4F for ; Tue, 22 Mar 2022 21:45:36 +0000 (UTC) X-FDA: 79273354272.29.006EAB6 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf26.hostedemail.com (Postfix) with ESMTP id 6E81A140032 for ; Tue, 22 Mar 2022 21:45:35 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 57065B81DAF; Tue, 22 Mar 2022 21:45:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F2426C340EC; Tue, 22 Mar 2022 21:45:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985533; bh=+5TUPhQ/6ZbOV7CYkFiqJ8UHMbcsztMM9/gHKodPFi0=; h=Date:To:From:In-Reply-To:Subject:From; b=p9E78Kf0r21zwNZJu8g7VK+1MQm7Vyk4AW5RkyLMRR9fca5AVQ9DU31kepYyB1Yws c5snnsZAseDZa6HTZBVjTdFDcELgmK22sBlw3qCTIcSWQ0GFAJh00Ka+SU++F86rP5 G8znH9k/5imvaasTT9fXNKmwqo7NtzqSP5soyo6g= Date: Tue, 22 Mar 2022 14:45:32 -0700 To: rppt@linux.ibm.com,peterx@redhat.com,jack@suse.cz,david@redhat.com,aarcange@redhat.com,namit@vmware.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 138/227] userfaultfd: provide unmasked address on page-fault Message-Id: <20220322214532.F2426C340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 6E81A140032 X-Stat-Signature: 5f7i5xd683cafjzc9fs4qs3yrc9k4iks Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=p9E78Kf0; dmarc=none; spf=pass (imf26.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985535-78203 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Nadav Amit Subject: userfaultfd: provide unmasked address on page-fault Userfaultfd is supposed to provide the full address (i.e., unmasked) of the faulting access back to userspace. However, that has not been the case for quite some time. Even running "userfaultfd_demo" from the userfaultfd man page provides the wrong output (and contradicts the man page). Notice that "UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and not the first read address (0x7fc5e30b300f).
Address returned by mmap() = 0x7fc5e30b3000 fault_handler_thread(): poll() returns: nready = 1; POLLIN = 1; POLLERR = 0 UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000 (uffdio_copy.copy returned 4096) Read address 0x7fc5e30b300f in main(): A Read address 0x7fc5e30b340f in main(): A Read address 0x7fc5e30b380f in main(): A Read address 0x7fc5e30b3c0f in main(): A The exact address is useful for various reasons and specifically for prefetching decisions. If it is known that the memory is populated by certain objects whose size is not page-aligned, then based on the faulting address, the uffd-monitor can decide whether to prefetch and prefault the adjacent page. This bug has been in the kernel for quite some time: since commit 1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address"), which dates back to 2016. A concern has been raised that existing userspace applications might rely on the old/wrong behavior in which the address is masked. Therefore, it was suggested to provide the masked address unless the user explicitly asks for the exact address. Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct userfaultfd to provide the exact address. Add a new "real_address" field to vmf to hold the unmasked address. Provide the address to userspace accordingly. Initialize real_address in various code-paths to be consistent with address, even when it is not used, to be on the safe side. [namit@vmware.com: initialize real_address on all code paths, per Jan] Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com [akpm@linux-foundation.org: fix typo in comment, per Jan] Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com Signed-off-by: Nadav Amit Acked-by: Peter Xu Reviewed-by: David Hildenbrand Acked-by: Mike Rapoport Reviewed-by: Jan Kara Cc: Andrea Arcangeli Signed-off-by: Andrew Morton --- fs/userfaultfd.c | 5 ++++- include/linux/mm.h | 3 ++- include/uapi/linux/userfaultfd.h | 8 +++++++- mm/hugetlb.c | 6 ++++-- mm/memory.c | 1 + mm/swapfile.c | 1 + 6 files changed, 19 insertions(+), 5 deletions(-) --- a/fs/userfaultfd.c~userfaultfd-provide-unmasked-address-on-page-fault +++ a/fs/userfaultfd.c @@ -198,6 +198,9 @@ static inline struct uffd_msg userfault_ struct uffd_msg msg; msg_init(&msg); msg.event = UFFD_EVENT_PAGEFAULT; + + if (!(features & UFFD_FEATURE_EXACT_ADDRESS)) + address &= PAGE_MASK; msg.arg.pagefault.address = address; /* * These flags indicate why the userfault occurred: @@ -482,7 +485,7 @@ vm_fault_t handle_userfault(struct vm_fa init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function); uwq.wq.private = current; - uwq.msg = userfault_msg(vmf->address, vmf->flags, reason, + uwq.msg = userfault_msg(vmf->real_address, vmf->flags, reason, ctx->features); uwq.ctx = ctx; uwq.waken = false; --- a/include/linux/mm.h~userfaultfd-provide-unmasked-address-on-page-fault +++ a/include/linux/mm.h @@ -478,7 +478,8 @@ struct vm_fault { struct vm_area_struct *vma; /* Target VMA */ gfp_t gfp_mask; /* gfp mask to be used for allocations */ pgoff_t pgoff; /* Logical page offset based on vma */ - unsigned long address; /* Faulting virtual address */ + unsigned long address; /* Faulting virtual address - masked */ + unsigned long real_address; /* Faulting virtual address - unmasked */ }; enum fault_flag flags; /* FAULT_FLAG_xxx flags * XXX: should really be 'const' */ --- a/include/uapi/linux/userfaultfd.h~userfaultfd-provide-unmasked-address-on-page-fault +++ a/include/uapi/linux/userfaultfd.h @@
-32,7 +32,8 @@ UFFD_FEATURE_SIGBUS | \ UFFD_FEATURE_THREAD_ID | \ UFFD_FEATURE_MINOR_HUGETLBFS | \ - UFFD_FEATURE_MINOR_SHMEM) + UFFD_FEATURE_MINOR_SHMEM | \ + UFFD_FEATURE_EXACT_ADDRESS) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -189,6 +190,10 @@ struct uffdio_api { * * UFFD_FEATURE_MINOR_SHMEM indicates the same support as * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead. + * + * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page + * faults would be provided and the offset within the page would not be + * masked. */ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -201,6 +206,7 @@ struct uffdio_api { #define UFFD_FEATURE_THREAD_ID (1<<8) #define UFFD_FEATURE_MINOR_HUGETLBFS (1<<9) #define UFFD_FEATURE_MINOR_SHMEM (1<<10) +#define UFFD_FEATURE_EXACT_ADDRESS (1<<11) __u64 features; __u64 ioctls; --- a/mm/hugetlb.c~userfaultfd-provide-unmasked-address-on-page-fault +++ a/mm/hugetlb.c @@ -5341,6 +5341,7 @@ static inline vm_fault_t hugetlb_handle_ pgoff_t idx, unsigned int flags, unsigned long haddr, + unsigned long addr, unsigned long reason) { vm_fault_t ret; @@ -5348,6 +5349,7 @@ static inline vm_fault_t hugetlb_handle_ struct vm_fault vmf = { .vma = vma, .address = haddr, + .real_address = addr, .flags = flags, /* @@ -5416,7 +5418,7 @@ retry: /* Check for page in userfault range */ if (userfaultfd_missing(vma)) { ret = hugetlb_handle_userfault(vma, mapping, idx, - flags, haddr, + flags, haddr, address, VM_UFFD_MISSING); goto out; } @@ -5480,7 +5482,7 @@ retry: unlock_page(page); put_page(page); ret = hugetlb_handle_userfault(vma, mapping, idx, - flags, haddr, + flags, haddr, address, VM_UFFD_MINOR); goto out; } --- a/mm/memory.c~userfaultfd-provide-unmasked-address-on-page-fault +++ a/mm/memory.c @@ -4633,6 +4633,7 @@ static vm_fault_t __handle_mm_fault(stru struct vm_fault vmf = { .vma = vma, .address = address & PAGE_MASK, + .real_address = address, .flags = flags, .pgoff = linear_page_index(vma, address), .gfp_mask = __get_fault_gfp_mask(vma), --- a/mm/swapfile.c~userfaultfd-provide-unmasked-address-on-page-fault +++ a/mm/swapfile.c @@ -1951,6 +1951,7 @@ static int unuse_pte_range(struct vm_are struct vm_fault vmf = { .vma = vma, .address = addr, + .real_address = addr, .pmd = pmd, }; From patchwork Tue Mar 22 21:45:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789223 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7FFA6C43219 for ; Tue, 22 Mar 2022 21:45:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 143EF6B013A; Tue, 22 Mar 2022 17:45:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 055576B013B; Tue, 22 Mar 2022 17:45:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E0FE96B013C; Tue, 22 Mar 2022 17:45:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id CC1FE6B013A for ; Tue, 22 Mar 2022 17:45:37 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 9BBDD23FE6 for ; 
Tue, 22 Mar 2022 21:45:37 +0000 (UTC) X-FDA: 79273354314.08.16879F8 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf19.hostedemail.com (Postfix) with ESMTP id 2875B1A0036 for ; Tue, 22 Mar 2022 21:45:37 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 991166104C; Tue, 22 Mar 2022 21:45:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFA3DC340EC; Tue, 22 Mar 2022 21:45:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985536; bh=koN+O93h9DoI06AbB3NNqvHiV3pa1CEpORDqQtz7MlY=; h=Date:To:From:In-Reply-To:Subject:From; b=mmRClisvqLONimhzS6Jl8prC2DOaOWqTd/7U/ZY1/iHvTSRhouVWoEcnHUduH7GRO Mtrgyql6QXsB+uZAy/i2zRR+w8Lyn8Za5EXd5roaBrzLzlY+bwxbAfylVCRhAj8XAv c4q3/A98BRJW2feZMgcuBeYRETxNJCXKdENTNwjM= Date: Tue, 22 Mar 2022 14:45:35 -0700 To: shuah@kernel.org,guozhengkui@vivo.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 139/227] userfaultfd/selftests: fix uninitialized_var.cocci warning Message-Id: <20220322214535.EFA3DC340EC@smtp.kernel.org> Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=mmRClisv; spf=pass (imf19.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 2875B1A0036 X-Stat-Signature: dfu35119a5tthm33g5w6rn3g4e1nb9t1 X-HE-Tag: 1647985536-222366 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Guo Zhengkui Subject: userfaultfd/selftests: fix uninitialized_var.cocci warning Fix the following coccicheck warning: tools/testing/selftests/vm/userfaultfd.c:556:23-24: WARNING this kind of initialization is deprecated `unsigned long page_nr = *(&page_nr)` has the same form as the uninitialized_var() macro. Remove the redundant assignment. It has been tested with gcc (Debian 8.3.0-6) 8.3.0. The patch which removed uninitialized_var() is: https://lore.kernel.org/all/20121028102007.GA7547@gmail.com/ And there are very few "/* GCC */" comments in the Linux kernel code now.
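The deprecated idiom and the fix, side by side (a sketch of the hunk below):

	/* before: uninitialized_var()-style self-initialization; it only
	 * silences -Wmaybe-uninitialized, page_nr stays indeterminate */
	unsigned long page_nr = *(&(page_nr));

	/* after: plain declaration; page_nr is assigned before first use */
	unsigned long page_nr;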
Link: https://lkml.kernel.org/r/20220304082333.9252-1-guozhengkui@vivo.com Signed-off-by: Guo Zhengkui Cc: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/vm/userfaultfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/tools/testing/selftests/vm/userfaultfd.c~userfaultfd-selftests-fix-uninitialized_varcocci-warning +++ a/tools/testing/selftests/vm/userfaultfd.c @@ -540,7 +540,7 @@ static void continue_range(int ufd, __u6 static void *locking_thread(void *arg) { unsigned long cpu = (unsigned long) arg; - unsigned long page_nr = *(&(page_nr)); /* uninitialized warning */ + unsigned long page_nr; unsigned long long count; if (!(bounces & BOUNCE_RANDOM)) { From patchwork Tue Mar 22 21:45:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789224 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF2B3C433EF for ; Tue, 22 Mar 2022 21:45:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 783526B013C; Tue, 22 Mar 2022 17:45:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 731376B013D; Tue, 22 Mar 2022 17:45:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 584A86B013E; Tue, 22 Mar 2022 17:45:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 488B46B013C for ; Tue, 22 Mar 2022 17:45:42 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 1C61124293 for ; Tue, 22 Mar 2022 21:45:42 +0000 (UTC) X-FDA: 79273354524.07.1D9A40C Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf03.hostedemail.com (Postfix) with ESMTP id 6D8F920033 for ; Tue, 22 Mar 2022 21:45:41 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5CDFDB81DAD; Tue, 22 Mar 2022 21:45:40 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 03621C340EE; Tue, 22 Mar 2022 21:45:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985539; bh=EioTn1AYwKLEaPNTWCWOFb/pRNMlD4WN2TI+GFaUeJ8=; h=Date:To:From:In-Reply-To:Subject:From; b=p96dNi2j6jLI67sQUAX7c5LC5z00zLvjrTTSJuis3p7GDjDQztcYow2qy+ILvXOiy Wr3ydweFPqsnu5NvRROkN3vX3mbGYt4TFL7ecbrUyui3Ti6WnV8V0hBYTiEbgJDuFn z87ieqAjqZ6WzTyqOgi44uQxT8LCRANCrsyGbi6U= Date: Tue, 22 Mar 2022 14:45:38 -0700 To: willy@infradead.org,neilb@suse.de,jack@suse.de,djwong@kernel.org,david@fromorbit.com,hughd@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 140/227] mm/fs: delete PF_SWAPWRITE Message-Id: <20220322214539.03621C340EE@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 6D8F920033 X-Rspam-User: Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg 
header.b=p96dNi2j; dmarc=none; spf=pass (imf03.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: fjywywj7f6ii9ikcejxgz5yz4mz4qhpd X-HE-Tag: 1647985541-591927 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm/fs: delete PF_SWAPWRITE PF_SWAPWRITE has been redundant since v3.2 commit ee72886d8ed5 ("mm: vmscan: do not writeback filesystem pages in direct reclaim"). Coincidentally, NeilBrown's current patch "remove inode_congested()" deletes may_write_to_inode(), which appeared to be the one function which took notice of PF_SWAPWRITE. But if you study the old logic, and the conditions under which may_write_to_inode() was called, you discover that flag and function have been pointless for a decade. Link: https://lkml.kernel.org/r/75e80e7-742d-e3bd-531-614db8961e4@google.com Signed-off-by: Hugh Dickins Cc: NeilBrown Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Dave Chinner Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- fs/fs-writeback.c | 3 --- fs/xfs/libxfs/xfs_btree.c | 2 +- include/linux/sched.h | 1 - mm/migrate.c | 7 ------- mm/vmscan.c | 8 ++------ 5 files changed, 3 insertions(+), 18 deletions(-) --- a/fs/fs-writeback.c~mm-fs-delete-pf_swapwrite +++ a/fs/fs-writeback.c @@ -2197,7 +2197,6 @@ void wb_workfn(struct work_struct *work) long pages_written; set_worker_desc("flush-%s", bdi_dev_name(wb->bdi)); - current->flags |= PF_SWAPWRITE; if (likely(!current_is_workqueue_rescuer() || !test_bit(WB_registered, &wb->state))) { @@ -2226,8 +2225,6 @@ void wb_workfn(struct work_struct *work) wb_wakeup(wb); else if (wb_has_dirty_io(wb) && dirty_writeback_interval) wb_wakeup_delayed(wb); - - current->flags &= ~PF_SWAPWRITE; } /* --- a/fs/xfs/libxfs/xfs_btree.c~mm-fs-delete-pf_swapwrite +++ a/fs/xfs/libxfs/xfs_btree.c @@ -2818,7 +2818,7 @@ xfs_btree_split_worker( * in any way. */ if (args->kswapd) - new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD; + new_pflags |= PF_MEMALLOC | PF_KSWAPD; current_set_flags_nested(&pflags, new_pflags); xfs_trans_set_context(args->cur->bc_tp); --- a/include/linux/sched.h~mm-fs-delete-pf_swapwrite +++ a/include/linux/sched.h @@ -1689,7 +1689,6 @@ extern struct pid *cad_pid; * I am cleaning dirty pages from some other bdi. */ #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ -#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */ #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_PIN 0x10000000 /* Allocation context constrained to zones which allow long term pinning. 
*/ --- a/mm/migrate.c~mm-fs-delete-pf_swapwrite +++ a/mm/migrate.c @@ -1350,7 +1350,6 @@ int migrate_pages(struct list_head *from bool is_thp = false; struct page *page; struct page *page2; - int swapwrite = current->flags & PF_SWAPWRITE; int rc, nr_subpages; LIST_HEAD(ret_pages); LIST_HEAD(thp_split_pages); @@ -1359,9 +1358,6 @@ int migrate_pages(struct list_head *from trace_mm_migrate_pages_start(mode, reason); - if (!swapwrite) - current->flags |= PF_SWAPWRITE; - thp_subpage_migration: for (pass = 0; pass < 10 && (retry || thp_retry); pass++) { retry = 0; @@ -1516,9 +1512,6 @@ out: trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded, nr_thp_failed, nr_thp_split, mode, reason); - if (!swapwrite) - current->flags &= ~PF_SWAPWRITE; - if (ret_succeeded) *ret_succeeded = nr_succeeded; --- a/mm/vmscan.c~mm-fs-delete-pf_swapwrite +++ a/mm/vmscan.c @@ -4457,7 +4457,7 @@ static int kswapd(void *p) * us from recursively trying to free more memory as we're * trying to free the first piece of memory in the first place). */ - tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD; + tsk->flags |= PF_MEMALLOC | PF_KSWAPD; set_freezable(); WRITE_ONCE(pgdat->kswapd_order, 0); @@ -4508,7 +4508,7 @@ kswapd_try_sleep: goto kswapd_try_sleep; } - tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD); + tsk->flags &= ~(PF_MEMALLOC | PF_KSWAPD); return 0; } @@ -4749,11 +4749,8 @@ static int __node_reclaim(struct pglist_ fs_reclaim_acquire(sc.gfp_mask); /* * We need to be able to allocate from the reserves for RECLAIM_UNMAP - * and we also need to be able to write out pages for RECLAIM_WRITE - * and RECLAIM_UNMAP. */ noreclaim_flag = memalloc_noreclaim_save(); - p->flags |= PF_SWAPWRITE; set_task_reclaim_state(p, &sc.reclaim_state); if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) { @@ -4767,7 +4764,6 @@ static int __node_reclaim(struct pglist_ } set_task_reclaim_state(p, NULL); - current->flags &= ~PF_SWAPWRITE; memalloc_noreclaim_restore(noreclaim_flag); fs_reclaim_release(sc.gfp_mask); psi_memstall_leave(&pflags); From patchwork Tue Mar 22 21:45:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789225 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E21EBC433F5 for ; Tue, 22 Mar 2022 21:45:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 783D96B013E; Tue, 22 Mar 2022 17:45:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 730E96B013F; Tue, 22 Mar 2022 17:45:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5D1E26B0140; Tue, 22 Mar 2022 17:45:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 4D76D6B013E for ; Tue, 22 Mar 2022 17:45:45 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 26DDD23FAE for ; Tue, 22 Mar 2022 21:45:45 +0000 (UTC) X-FDA: 79273354650.08.EEB7E63 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf18.hostedemail.com (Postfix) with ESMTP id 9055D1C002F for ; Tue, 22 Mar 2022 21:45:44 +0000 (UTC) Received: from smtp.kernel.org 
(relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5ED7BB81DB8; Tue, 22 Mar 2022 21:45:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFBAAC340EE; Tue, 22 Mar 2022 21:45:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985542; bh=SYqAIWdzQCET7gkLPY8nAnHNdVgCb4scExLimb1I8DM=; h=Date:To:From:In-Reply-To:Subject:From; b=EjZr6vcL+/ZZSGYZdQT5choOW4azawRhAR324F5cSWGup8XwI/66ibLzGXtHmRoZv 6NF2LllKKcxD+LOD+oXLNK6isii4DkHBdEQ8Kv3x2E2NEqCxjQcqbYvTXoTAKO9M8Z /4Dpnh8juCRnDlKnqDqQAiUV+qrfI+bkgZtELQtY= Date: Tue, 22 Mar 2022 14:45:41 -0700 To: rientjes@google.com,alexs@kernel.org,alexander.duyck@gmail.com,hughd@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 141/227] mm: __isolate_lru_page_prepare() in isolate_migratepages_block() Message-Id: <20220322214541.EFBAAC340EE@smtp.kernel.org> X-Rspam-User: Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=EjZr6vcL; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 9055D1C002F X-Stat-Signature: xhrc5cdpxi4ptueg37715fae7nyou7qi X-HE-Tag: 1647985544-493574 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm: __isolate_lru_page_prepare() in isolate_migratepages_block() __isolate_lru_page_prepare() conflates two unrelated functions, with the flags to one disjoint from the flags to the other; and hides some of the important checks outside of isolate_migratepages_block(), where the sequence is better to be visible. It comes from the days of lumpy reclaim, before compaction, when the combination made more sense. Move what's needed by mm/compaction.c isolate_migratepages_block() inline there, and what's needed by mm/vmscan.c isolate_lru_pages() inline there. Shorten "isolate_mode" to "mode", so the sequence of conditions is easier to read. Declare a "mapping" variable, to save one call to page_mapping() (but not another: calling again after page is locked is necessary). Simplify isolate_lru_pages() with a "move_to" list pointer. 
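For readers of the diff below, the reworked isolate_lru_pages() loop has roughly the following shape (a condensed sketch of the mm/vmscan.c hunk that follows; the zone-skip statistics and THP tail-page accounting are omitted here). Every rejection path now falls through to a single list_move() at the bottom, with the "move_to" pointer selecting the destination list:

	while (scan < nr_to_scan && !list_empty(src)) {
		struct list_head *move_to = src;	/* default: leave page on source list */
		struct page *page = lru_to_page(src);

		if (page_zonenum(page) > sc->reclaim_idx) {
			move_to = &pages_skipped;	/* ineligible zone */
			goto move;
		}
		if (!PageLRU(page))
			goto move;			/* being freed elsewhere */
		if (!sc->may_unmap && page_mapped(page))
			goto move;
		if (unlikely(!get_page_unless_zero(page)))
			goto move;			/* page is being freed */
		if (!TestClearPageLRU(page)) {
			put_page(page);			/* another thread is isolating it */
			goto move;
		}
		move_to = dst;				/* isolated successfully */
move:
		list_move(&page->lru, move_to);
	}
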
Link: https://lkml.kernel.org/r/879d62a8-91cc-d3c6-fb3b-69768236df68@google.com Signed-off-by: Hugh Dickins Acked-by: David Rientjes Reviewed-by: Alex Shi Cc: Alexander Duyck Signed-off-by: Andrew Morton --- include/linux/swap.h | 1 mm/compaction.c | 51 +++++++++++++++++--- mm/vmscan.c | 101 +++++++---------------------------------- 3 files changed, 62 insertions(+), 91 deletions(-) --- a/include/linux/swap.h~mm-__isolate_lru_page_prepare-in-isolate_migratepages_block +++ a/include/linux/swap.h @@ -387,7 +387,6 @@ extern void lru_cache_add_inactive_or_un extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, --- a/mm/compaction.c~mm-__isolate_lru_page_prepare-in-isolate_migratepages_block +++ a/mm/compaction.c @@ -785,7 +785,7 @@ static bool too_many_isolated(pg_data_t * @cc: Compaction control structure. * @low_pfn: The first PFN to isolate * @end_pfn: The one-past-the-last PFN to isolate, within same pageblock - * @isolate_mode: Isolation mode to be used. + * @mode: Isolation mode to be used. * * Isolate all pages that can be migrated from the range specified by * [low_pfn, end_pfn). The range is expected to be within same pageblock. @@ -798,7 +798,7 @@ static bool too_many_isolated(pg_data_t */ static int isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, - unsigned long end_pfn, isolate_mode_t isolate_mode) + unsigned long end_pfn, isolate_mode_t mode) { pg_data_t *pgdat = cc->zone->zone_pgdat; unsigned long nr_scanned = 0, nr_isolated = 0; @@ -806,6 +806,7 @@ isolate_migratepages_block(struct compac unsigned long flags = 0; struct lruvec *locked = NULL; struct page *page = NULL, *valid_page = NULL; + struct address_space *mapping; unsigned long start_pfn = low_pfn; bool skip_on_failure = false; unsigned long next_skip_pfn = 0; @@ -990,7 +991,7 @@ isolate_migratepages_block(struct compac locked = NULL; } - if (!isolate_movable_page(page, isolate_mode)) + if (!isolate_movable_page(page, mode)) goto isolate_success; } @@ -1002,15 +1003,15 @@ isolate_migratepages_block(struct compac * so avoid taking lru_lock and isolating it unnecessarily in an * admittedly racy check. */ - if (!page_mapping(page) && - page_count(page) > page_mapcount(page)) + mapping = page_mapping(page); + if (!mapping && page_count(page) > page_mapcount(page)) goto isolate_fail; /* * Only allow to migrate anonymous pages in GFP_NOFS context * because those do not depend on fs locks. */ - if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) + if (!(cc->gfp_mask & __GFP_FS) && mapping) goto isolate_fail; /* @@ -1021,9 +1022,45 @@ isolate_migratepages_block(struct compac if (unlikely(!get_page_unless_zero(page))) goto isolate_fail; - if (!__isolate_lru_page_prepare(page, isolate_mode)) + /* Only take pages on LRU: a check now makes later tests safe */ + if (!PageLRU(page)) goto isolate_fail_put; + /* Compaction might skip unevictable pages but CMA takes them */ + if (!(mode & ISOLATE_UNEVICTABLE) && PageUnevictable(page)) + goto isolate_fail_put; + + /* + * To minimise LRU disruption, the caller can indicate with + * ISOLATE_ASYNC_MIGRATE that it only wants to isolate pages + * it will be able to migrate without blocking - clean pages + * for the most part. 
PageWriteback would require blocking. + */ + if ((mode & ISOLATE_ASYNC_MIGRATE) && PageWriteback(page)) + goto isolate_fail_put; + + if ((mode & ISOLATE_ASYNC_MIGRATE) && PageDirty(page)) { + bool migrate_dirty; + + /* + * Only pages without mappings or that have a + * ->migratepage callback are possible to migrate + * without blocking. However, we can be racing with + * truncation so it's necessary to lock the page + * to stabilise the mapping as truncation holds + * the page lock until after the page is removed + * from the page cache. + */ + if (!trylock_page(page)) + goto isolate_fail_put; + + mapping = page_mapping(page); + migrate_dirty = !mapping || mapping->a_ops->migratepage; + unlock_page(page); + if (!migrate_dirty) + goto isolate_fail_put; + } + /* Try isolate the page */ if (!TestClearPageLRU(page)) goto isolate_fail_put; --- a/mm/vmscan.c~mm-__isolate_lru_page_prepare-in-isolate_migratepages_block +++ a/mm/vmscan.c @@ -1999,69 +1999,6 @@ unsigned int reclaim_clean_pages_from_li } /* - * Attempt to remove the specified page from its LRU. Only take this page - * if it is of the appropriate PageActive status. Pages which are being - * freed elsewhere are also ignored. - * - * page: page to consider - * mode: one of the LRU isolation modes defined above - * - * returns true on success, false on failure. - */ -bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) -{ - /* Only take pages on the LRU. */ - if (!PageLRU(page)) - return false; - - /* Compaction should not handle unevictable pages but CMA can do so */ - if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) - return false; - - /* - * To minimise LRU disruption, the caller can indicate that it only - * wants to isolate pages it will be able to operate on without - * blocking - clean pages for the most part. - * - * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants to pages - * that it is possible to migrate without blocking - */ - if (mode & ISOLATE_ASYNC_MIGRATE) { - /* All the caller can do on PageWriteback is block */ - if (PageWriteback(page)) - return false; - - if (PageDirty(page)) { - struct address_space *mapping; - bool migrate_dirty; - - /* - * Only pages without mappings or that have a - * ->migratepage callback are possible to migrate - * without blocking. However, we can be racing with - * truncation so it's necessary to lock the page - * to stabilise the mapping as truncation holds - * the page lock until after the page is removed - * from the page cache. - */ - if (!trylock_page(page)) - return false; - - mapping = page_mapping(page); - migrate_dirty = !mapping || mapping->a_ops->migratepage; - unlock_page(page); - if (!migrate_dirty) - return false; - } - } - - if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) - return false; - - return true; -} - -/* * Update LRU sizes after isolating pages. The LRU size updates must * be complete before mem_cgroup_update_lru_size due to a sanity check. */ @@ -2112,11 +2049,11 @@ static unsigned long isolate_lru_pages(u unsigned long skipped = 0; unsigned long scan, total_scan, nr_pages; LIST_HEAD(pages_skipped); - isolate_mode_t mode = (sc->may_unmap ? 
0 : ISOLATE_UNMAPPED); total_scan = 0; scan = 0; while (scan < nr_to_scan && !list_empty(src)) { + struct list_head *move_to = src; struct page *page; page = lru_to_page(src); @@ -2126,9 +2063,9 @@ static unsigned long isolate_lru_pages(u total_scan += nr_pages; if (page_zonenum(page) > sc->reclaim_idx) { - list_move(&page->lru, &pages_skipped); nr_skipped[page_zonenum(page)] += nr_pages; - continue; + move_to = &pages_skipped; + goto move; } /* @@ -2136,37 +2073,34 @@ static unsigned long isolate_lru_pages(u * return with no isolated pages if the LRU mostly contains * ineligible pages. This causes the VM to not reclaim any * pages, triggering a premature OOM. - * - * Account all tail pages of THP. This would not cause - * premature OOM since __isolate_lru_page() returns -EBUSY - * only when the page is being freed somewhere else. + * Account all tail pages of THP. */ scan += nr_pages; - if (!__isolate_lru_page_prepare(page, mode)) { - /* It is being freed elsewhere */ - list_move(&page->lru, src); - continue; - } + + if (!PageLRU(page)) + goto move; + if (!sc->may_unmap && page_mapped(page)) + goto move; + /* * Be careful not to clear PageLRU until after we're * sure the page is not being freed elsewhere -- the * page release code relies on it. */ - if (unlikely(!get_page_unless_zero(page))) { - list_move(&page->lru, src); - continue; - } + if (unlikely(!get_page_unless_zero(page))) + goto move; if (!TestClearPageLRU(page)) { /* Another thread is already isolating this page */ put_page(page); - list_move(&page->lru, src); - continue; + goto move; } nr_taken += nr_pages; nr_zone_taken[page_zonenum(page)] += nr_pages; - list_move(&page->lru, dst); + move_to = dst; +move: + list_move(&page->lru, move_to); } /* @@ -2190,7 +2124,8 @@ static unsigned long isolate_lru_pages(u } *nr_scanned = total_scan; trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, - total_scan, skipped, nr_taken, mode, lru); + total_scan, skipped, nr_taken, + sc->may_unmap ? 
0 : ISOLATE_UNMAPPED, lru); update_lru_sizes(lruvec, lru, nr_zone_taken); return nr_taken; } From patchwork Tue Mar 22 21:45:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789226 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C08C9C433F5 for ; Tue, 22 Mar 2022 21:45:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5D4056B0140; Tue, 22 Mar 2022 17:45:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 585646B0141; Tue, 22 Mar 2022 17:45:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4725A6B0142; Tue, 22 Mar 2022 17:45:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0154.hostedemail.com [216.40.44.154]) by kanga.kvack.org (Postfix) with ESMTP id 3757D6B0140 for ; Tue, 22 Mar 2022 17:45:48 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id F14D6A32B2 for ; Tue, 22 Mar 2022 21:45:47 +0000 (UTC) X-FDA: 79273354734.24.2BBF4A5 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf02.hostedemail.com (Postfix) with ESMTP id 67A9180010 for ; Tue, 22 Mar 2022 21:45:47 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 4DF37B81DAD; Tue, 22 Mar 2022 21:45:46 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0AF3BC340EC; Tue, 22 Mar 2022 21:45:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985545; bh=J09gnwx+pM4OrqpRiCN3GoxRGIEM2JJo77DhAVWt2/o=; h=Date:To:From:In-Reply-To:Subject:From; b=Z29Wx5GtoDHoMxP/UeJv0ahkKwvT7trNuaLYNZYmdV5hP9wwM38v/0M0h1JnS6uZ3 4ZAyhMilkOT0yTM9hfGX3ypoYuhRsMWTMu1iPLN9uBVbQnomGeh5K/4aAK4LkGR9hm jbJy5M2iUhFSr+KzeM25ZzroUYyWoISJFWA21BzI= Date: Tue, 22 Mar 2022 14:45:44 -0700 To: songmuchun@bytedance.com,shakeelb@google.com,roman.gushchin@linux.dev,mhocko@suse.com,hannes@cmpxchg.org,longman@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 142/227] mm/list_lru: optimize memcg_reparent_list_lru_node() Message-Id: <20220322214545.0AF3BC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: zzq5omadp63aihx3tt6cp1tcbdka9adn Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Z29Wx5Gt; spf=pass (imf02.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 67A9180010 X-HE-Tag: 1647985547-148316 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Waiman Long Subject: mm/list_lru: optimize memcg_reparent_list_lru_node() Since commit 2c80cd57c743 
("mm/list_lru.c: fix list_lru_count_node() to be race free"), we are tracking the total number of lru entries in a list_lru_node in its nr_items field. In the case of memcg_reparent_list_lru_node(), there is nothing to be done if nr_items is 0. We don't even need to take the nlru->lock as no new lru entry could be added by a racing list_lru_add() to the draining src_idx memcg at this point. On systems that serve a lot of containers, thousands of list_lru's can be present, since each container may mount its own container-specific filesystems. As a typical container uses only a few cpus, it is likely that only the list_lru_node containing those cpus will be utilized, while the rest may be empty. In other words, there can be a lot of list_lru_node's with 0 nr_items. By skipping the lock/unlock operation and the load of a cacheline from memcg_lrus, a sizeable number of cpu cycles can be saved; that saving becomes substantial when thousands of list_lru_node's with 0 nr_items are involved. Link: https://lkml.kernel.org/r/20220309144000.1470138-1-longman@redhat.com Signed-off-by: Waiman Long Reviewed-by: Roman Gushchin Cc: Muchun Song Cc: Michal Hocko Cc: Johannes Weiner Cc: Shakeel Butt Signed-off-by: Andrew Morton --- mm/list_lru.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/mm/list_lru.c~mm-list_lru-optimize-memcg_reparent_list_lru_node +++ a/mm/list_lru.c @@ -395,6 +395,12 @@ static void memcg_reparent_list_lru_node struct list_lru_one *src, *dst; /* + * If there is no lru entry in this nlru, we can skip it immediately. + */ + if (!READ_ONCE(nlru->nr_items)) + return; + + /* * Since list_lru_{add,del} may be called under an IRQ-safe lock, * we have to use IRQ-safe primitives here to avoid deadlock. */ From patchwork Tue Mar 22 21:45:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789227 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44342C433EF for ; Tue, 22 Mar 2022 21:45:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C8AA56B0142; Tue, 22 Mar 2022 17:45:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C39806B0143; Tue, 22 Mar 2022 17:45:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ADB666B0144; Tue, 22 Mar 2022 17:45:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0118.hostedemail.com [216.40.44.118]) by kanga.kvack.org (Postfix) with ESMTP id 9E8096B0142 for ; Tue, 22 Mar 2022 17:45:51 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 5F8A51827C15C for ; Tue, 22 Mar 2022 21:45:51 +0000 (UTC) X-FDA: 79273354902.18.5CBFB51 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf07.hostedemail.com (Postfix) with ESMTP id BBD094001E for ; Tue, 22 Mar 2022 21:45:50 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 6EFF3B81DB7; Tue, 22 Mar 2022 21:45:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id
101B4C340EE; Tue, 22 Mar 2022 21:45:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985548; bh=nuD6SuBfmNlTRsYolyl+S0xAQThzwrc+qWwimrO1VD8=; h=Date:To:From:In-Reply-To:Subject:From; b=bAcNaAD2Xq1jBAA3MzY1BWeJLg9DDwuyaUdjcWXRiDIUkifXWHzS+h5qAm6J7rAwS taQusEnhZR9NsqkOUBdeEZ9YKGCXSNL8VlOXPZKyCgOTugrnSqAYT/XnSrD0W9QE5R 9roQtvhpItOoFLa07bzoUz+PVUC0tgPOdZkF281w= Date: Tue, 22 Mar 2022 14:45:47 -0700 To: willy@infradead.org,tglx@linutronix.de,paulmck@kernel.org,nsaenzju@redhat.com,minchan@kernel.org,mgorman@techsingularity.net,juri.lelli@redhat.com,bigeasy@linutronix.de,mtosatti@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 143/227] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu Message-Id: <20220322214548.101B4C340EE@smtp.kernel.org> X-Stat-Signature: tugbs9jb63jqdhkj5hen4jfc64ibm4jz Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=bAcNaAD2; spf=pass (imf07.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: BBD094001E X-HE-Tag: 1647985550-558475 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Marcelo Tosatti Subject: mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu On systems that run FIFO:1 applications that busy loop, any SCHED_OTHER task that attempts to execute on such a CPU (such as work threads) will not be scheduled, which leads to system hangs. Commit d479960e44f27e0e5 ("mm: disable LRU pagevec during the migration temporarily") relies on queueing work items on all online CPUs to ensure visibility of lru_disable_count. To fix this, replace the usage of work items with synchronize_rcu, which provides the same guarantees. Readers of lru_disable_count are protected by either disabling preemption or rcu_read_lock: preempt_disable, local_irq_disable [bh_lru_lock()] rcu_read_lock [rt_spin_lock CONFIG_PREEMPT_RT] preempt_disable [local_lock !CONFIG_PREEMPT_RT] Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on preempt_disable() regions of code. So any CPU which sees lru_disable_count = 0 will have exited the critical section when synchronize_rcu() returns. Link: https://lkml.kernel.org/r/Yin7hDxdt0s/x+fp@fuller.cnet Signed-off-by: Marcelo Tosatti Reviewed-by: Nicolas Saenz Julienne Acked-by: Minchan Kim Cc: Matthew Wilcox Cc: Mel Gorman Cc: Juri Lelli Cc: Thomas Gleixner Cc: Sebastian Andrzej Siewior Cc: Paul E. 
McKenney Signed-off-by: Andrew Morton --- mm/swap.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) --- a/mm/swap.c~mm-lru_cache_disable-replace-work-queue-synchronization-with-synchronize_rcu +++ a/mm/swap.c @@ -831,8 +831,7 @@ inline void __lru_add_drain_all(bool for for_each_online_cpu(cpu) { struct work_struct *work = &per_cpu(lru_add_drain_work, cpu); - if (force_all_cpus || - pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) || + if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) || data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) || pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) || pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) || @@ -876,15 +875,21 @@ atomic_t lru_disable_count = ATOMIC_INIT void lru_cache_disable(void) { atomic_inc(&lru_disable_count); -#ifdef CONFIG_SMP /* - * lru_add_drain_all in the force mode will schedule draining on - * all online CPUs so any calls of lru_cache_disabled wrapped by - * local_lock or preemption disabled would be ordered by that. - * The atomic operation doesn't need to have stronger ordering - * requirements because that is enforced by the scheduling - * guarantees. + * Readers of lru_disable_count are protected by either disabling + * preemption or rcu_read_lock: + * + * preempt_disable, local_irq_disable [bh_lru_lock()] + * rcu_read_lock [rt_spin_lock CONFIG_PREEMPT_RT] + * preempt_disable [local_lock !CONFIG_PREEMPT_RT] + * + * Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on + * preempt_disable() regions of code. So any CPU which sees + * lru_disable_count = 0 will have exited the critical + * section when synchronize_rcu() returns. */ + synchronize_rcu(); +#ifdef CONFIG_SMP __lru_add_drain_all(true); #else lru_add_and_bh_lrus_drain(); From patchwork Tue Mar 22 21:45:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789228 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 885F5C433FE for ; Tue, 22 Mar 2022 21:45:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 031716B0143; Tue, 22 Mar 2022 17:45:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F21536B0145; Tue, 22 Mar 2022 17:45:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E3AAD6B0146; Tue, 22 Mar 2022 17:45:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0087.hostedemail.com [216.40.44.87]) by kanga.kvack.org (Postfix) with ESMTP id D2CF06B0143 for ; Tue, 22 Mar 2022 17:45:52 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 91663A5658 for ; Tue, 22 Mar 2022 21:45:52 +0000 (UTC) X-FDA: 79273354944.30.5341EC6 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf27.hostedemail.com (Postfix) with ESMTP id 29B4840035 for ; Tue, 22 Mar 2022 21:45:52 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A5A0C6119A; Tue, 22 Mar 2022 21:45:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) 
with ESMTPSA id 0E6EAC340F2; Tue, 22 Mar 2022 21:45:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985551; bh=5y73nK+IfYhE2dzuCLdG3voFW/Zf/4PV3Up3kcCaWYI=; h=Date:To:From:In-Reply-To:Subject:From; b=sjtkuscVntASDfgT3zmiaNhOOI6kH82m0qm0MTFbnF2tmYxduAtM419gkhGLEi5JB NXfq3dZkh2PAY1+SfTIufW58Lpgn5RlbGbSWaNBMsg6hGCxcY2BYnvXZTvOWbumwxR HoHjrdtu3koKT0HSbslAl2ytwOO8dusRMgdw7ihA= Date: Tue, 22 Mar 2022 14:45:50 -0700 To: tj@kernel.org,tglx@linutronix.de,lizefan.x@bytedance.com,hannes@cmpxchg.org,bigeasy@linutronix.de,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 144/227] mm: workingset: replace IRQ-off check with a lockdep assert. Message-Id: <20220322214551.0E6EAC340F2@smtp.kernel.org> X-Stat-Signature: xgkpaaodic8df9acwdepfroofkupexto Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=sjtkuscV; spf=pass (imf27.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 29B4840035 X-HE-Tag: 1647985552-817603 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Sebastian Andrzej Siewior Subject: mm: workingset: replace IRQ-off check with a lockdep assert. Commit 68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes") introduced an IRQ-off check to ensure that a lock is held which also disabled interrupts. This does not work the same way on PREEMPT_RT because none of the locks, that are held, disable interrupts. Replace this check with a lockdep assert which ensures that the lock is held. Link: https://lkml.kernel.org/r/20220301122143.1521823-3-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior Cc: Johannes Weiner Cc: Tejun Heo Cc: Zefan Li Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- mm/workingset.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- a/mm/workingset.c~mm-workingset-replace-irq-off-check-with-a-lockdep-assert +++ a/mm/workingset.c @@ -433,6 +433,8 @@ struct list_lru shadow_nodes; void workingset_update_node(struct xa_node *node) { + struct address_space *mapping; + /* * Track non-empty nodes that contain only shadow entries; * unlink those that contain pages or are being freed. @@ -441,7 +443,8 @@ void workingset_update_node(struct xa_no * already where they should be. The list_empty() test is safe * as node->private_list is protected by the i_pages lock. 
*/ - VM_WARN_ON_ONCE(!irqs_disabled()); /* For __inc_lruvec_page_state */ + mapping = container_of(node->array, struct address_space, i_pages); + lockdep_assert_held(&mapping->i_pages.xa_lock); if (node->count && node->count == node->nr_values) { if (list_empty(&node->private_list)) { From patchwork Tue Mar 22 21:45:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789229 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2BA0C433EF for ; Tue, 22 Mar 2022 21:45:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7CF846B0146; Tue, 22 Mar 2022 17:45:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 77EE96B0147; Tue, 22 Mar 2022 17:45:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 66E076B0148; Tue, 22 Mar 2022 17:45:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0049.hostedemail.com [216.40.44.49]) by kanga.kvack.org (Postfix) with ESMTP id 551B46B0146 for ; Tue, 22 Mar 2022 17:45:57 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1B670A4DCE for ; Tue, 22 Mar 2022 21:45:57 +0000 (UTC) X-FDA: 79273355154.29.349FDA6 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf27.hostedemail.com (Postfix) with ESMTP id 7C2DE4003A for ; Tue, 22 Mar 2022 21:45:56 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 6110AB81DAF; Tue, 22 Mar 2022 21:45:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 08273C340EC; Tue, 22 Mar 2022 21:45:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985554; bh=rkVyZCLtKId1gqM4EKvVgPOCeh/3NOaqkpucbTzplJU=; h=Date:To:From:In-Reply-To:Subject:From; b=pR2xMY8NY5adcB2gvwwTAHHq0TKoaWsUe6151vAC62IoBZMrlAfjhkg6UKmE4uufO s3miP1zLFP73kMYmLrUcJb9qaOAsnDjBfYZ1m/thBtzeAW02k1ZyuoZvRP+MzpuIGY kI4So82P/ZAugJuzFNjW2kuW6ZwTw22/FptLHeKo= Date: Tue, 22 Mar 2022 14:45:53 -0700 To: vbabka@suse.cz,mhocko@suse.com,iamjoonsoo.kim@lge.com,hannes@cmpxchg.org,quic_charante@quicinc.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 145/227] mm: vmscan: fix documentation for page_check_references() Message-Id: <20220322214554.08273C340EC@smtp.kernel.org> X-Stat-Signature: exqizrggmwbr4kmz59n4yan111cegdf6 Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=pR2xMY8N; spf=pass (imf27.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 7C2DE4003A X-HE-Tag: 1647985556-225584 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org 
Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Charan Teja Kalla Subject: mm: vmscan: fix documentation for page_check_references() commit b518154e59aa ("mm/vmscan: protect the workingset on anonymous LRU") requires both mapped anon and file pages to be looked at twice when they are used more than once, before the decision to reclaim or activate them is taken. Correct the documentation accordingly. Link: https://lkml.kernel.org/r/1646925640-21324-1-git-send-email-quic_charante@quicinc.com Signed-off-by: Charan Teja Kalla Cc: Joonsoo Kim Cc: Michal Hocko Cc: Johannes Weiner Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/vmscan.c~mm-vmscan-fix-documentation-for-page_check_references +++ a/mm/vmscan.c @@ -1385,7 +1385,7 @@ static enum page_references page_check_r /* * All mapped pages start out with page table * references from the instantiating fault, so we need - * to look twice if a mapped file page is used more + * to look twice if a mapped file/anon page is used more * than once. * * Mark it and spare it for another trip around the From patchwork Tue Mar 22 21:45:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789230 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1B91C4167B for ; Tue, 22 Mar 2022 21:46:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5ACE86B0148; Tue, 22 Mar 2022 17:46:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 55BA66B0149; Tue, 22 Mar 2022 17:46:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44BA16B014A; Tue, 22 Mar 2022 17:46:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0243.hostedemail.com [216.40.44.243]) by kanga.kvack.org (Postfix) with ESMTP id 35C106B0148 for ; Tue, 22 Mar 2022 17:46:00 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E89AB18259C5A for ; Tue, 22 Mar 2022 21:45:59 +0000 (UTC) X-FDA: 79273355238.29.89053CF Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf27.hostedemail.com (Postfix) with ESMTP id 544B540038 for ; Tue, 22 Mar 2022 21:45:59 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 35B4CB81DB7; Tue, 22 Mar 2022 21:45:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E507AC340EC; Tue, 22 Mar 2022 21:45:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985557; bh=Zz3C5jyZcF3r7n9TLJbiOQTwALX+wG89V4HOV0ON8Ww=; h=Date:To:From:In-Reply-To:Subject:From; b=oK+TTQ1IsDbLedzlt9cvWJikwYu5U4B2pHTxa3GCkiDgq2BJ6p2iwBEaW8oPN8yRz AuptZEiwDTtti9/N8kYes9XIp4wieRqJf+o09k5liBBpEcWFmSvQLwZO6RfcdHGqQD E/JTBdYBPLmgBfu6IqySJwVGjhgplAgK6qpMO7r8= Date: Tue, 22 Mar 2022 14:45:56 -0700 To:
rostedt@goodmis.org,mingo@redhat.com,baolin.wang@linux.alibaba.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 146/227] mm: compaction: cleanup the compaction trace events Message-Id: <20220322214556.E507AC340EC@smtp.kernel.org> X-Rspam-User: Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=oK+TTQ1I; spf=pass (imf27.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 544B540038 X-Stat-Signature: 7zx41gwqboebjfmskiiqsyxh71sxsc5t X-HE-Tag: 1647985559-373541 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Baolin Wang Subject: mm: compaction: cleanup the compaction trace events As Steven suggested [1], we should access the pointers from the trace event to avoid dereferencing them to the tracepoint function when the tracepoint is disabled. [1] https://lkml.org/lkml/2021/11/3/409 Link: https://lkml.kernel.org/r/4cd393b4d57f8f01ed72c001509b28e3a3b1a8c1.1646985115.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang Cc: Steven Rostedt (Google) Cc: Ingo Molnar Signed-off-by: Andrew Morton --- include/trace/events/compaction.h | 26 +++++++++++++------------- mm/compaction.c | 9 +++------ 2 files changed, 16 insertions(+), 19 deletions(-) --- a/include/trace/events/compaction.h~mm-compaction-cleanup-the-compaction-trace-events +++ a/include/trace/events/compaction.h @@ -67,10 +67,10 @@ DEFINE_EVENT(mm_compaction_isolate_templ #ifdef CONFIG_COMPACTION TRACE_EVENT(mm_compaction_migratepages, - TP_PROTO(unsigned long nr_all, + TP_PROTO(struct compact_control *cc, unsigned int nr_succeeded), - TP_ARGS(nr_all, nr_succeeded), + TP_ARGS(cc, nr_succeeded), TP_STRUCT__entry( __field(unsigned long, nr_migrated) @@ -79,7 +79,7 @@ TRACE_EVENT(mm_compaction_migratepages, TP_fast_assign( __entry->nr_migrated = nr_succeeded; - __entry->nr_failed = nr_all - nr_succeeded; + __entry->nr_failed = cc->nr_migratepages - nr_succeeded; ), TP_printk("nr_migrated=%lu nr_failed=%lu", @@ -88,10 +88,10 @@ TRACE_EVENT(mm_compaction_migratepages, ); TRACE_EVENT(mm_compaction_begin, - TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn, - unsigned long free_pfn, unsigned long zone_end, bool sync), + TP_PROTO(struct compact_control *cc, unsigned long zone_start, + unsigned long zone_end, bool sync), - TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync), + TP_ARGS(cc, zone_start, zone_end, sync), TP_STRUCT__entry( __field(unsigned long, zone_start) @@ -103,8 +103,8 @@ TRACE_EVENT(mm_compaction_begin, TP_fast_assign( __entry->zone_start = zone_start; - __entry->migrate_pfn = migrate_pfn; - __entry->free_pfn = free_pfn; + __entry->migrate_pfn = cc->migrate_pfn; + __entry->free_pfn = cc->free_pfn; __entry->zone_end = zone_end; __entry->sync = sync; ), @@ -118,11 +118,11 @@ TRACE_EVENT(mm_compaction_begin, ); TRACE_EVENT(mm_compaction_end, - TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn, - unsigned long free_pfn, unsigned long zone_end, bool sync, + TP_PROTO(struct compact_control *cc, unsigned long zone_start, + unsigned long zone_end, bool sync, int status), - 
TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync, status), + TP_ARGS(cc, zone_start, zone_end, sync, status), TP_STRUCT__entry( __field(unsigned long, zone_start) @@ -135,8 +135,8 @@ TRACE_EVENT(mm_compaction_end, TP_fast_assign( __entry->zone_start = zone_start; - __entry->migrate_pfn = migrate_pfn; - __entry->free_pfn = free_pfn; + __entry->migrate_pfn = cc->migrate_pfn; + __entry->free_pfn = cc->free_pfn; __entry->zone_end = zone_end; __entry->sync = sync; __entry->status = status; --- a/mm/compaction.c~mm-compaction-cleanup-the-compaction-trace-events +++ a/mm/compaction.c @@ -2387,8 +2387,7 @@ compact_zone(struct compact_control *cc, update_cached = !sync && cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1]; - trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, - cc->free_pfn, end_pfn, sync); + trace_mm_compaction_begin(cc, start_pfn, end_pfn, sync); /* lru_add_drain_all could be expensive with involving other CPUs */ lru_add_drain(); @@ -2438,8 +2437,7 @@ compact_zone(struct compact_control *cc, compaction_free, (unsigned long)cc, cc->mode, MR_COMPACTION, &nr_succeeded); - trace_mm_compaction_migratepages(cc->nr_migratepages, - nr_succeeded); + trace_mm_compaction_migratepages(cc, nr_succeeded); /* All pages were either migrated or will be released */ cc->nr_migratepages = 0; @@ -2515,8 +2513,7 @@ out: count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned); count_compact_events(COMPACTFREE_SCANNED, cc->total_free_scanned); - trace_mm_compaction_end(start_pfn, cc->migrate_pfn, - cc->free_pfn, end_pfn, sync, ret); + trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret); return ret; } From patchwork Tue Mar 22 21:45:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB30BC433F5 for ; Tue, 22 Mar 2022 21:46:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 86AE46B014A; Tue, 22 Mar 2022 17:46:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 81AFD6B014B; Tue, 22 Mar 2022 17:46:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 70A0A6B014C; Tue, 22 Mar 2022 17:46:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0071.hostedemail.com [216.40.44.71]) by kanga.kvack.org (Postfix) with ESMTP id 617286B014A for ; Tue, 22 Mar 2022 17:46:03 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 1BAE3A4DCE for ; Tue, 22 Mar 2022 21:46:03 +0000 (UTC) X-FDA: 79273355406.23.E6465E5 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf14.hostedemail.com (Postfix) with ESMTP id 1CD2810002C for ; Tue, 22 Mar 2022 21:46:01 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5D24CB81DAD; Tue, 22 Mar 2022 21:46:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id ED3C8C340EE; Tue, 22 Mar 2022 21:45:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985560; bh=vksHlh41Yl8TntR4WEXfMzLzqp4dQjqNQcVeZWBmRXI=; h=Date:To:From:In-Reply-To:Subject:From; b=kBzVph6uHpG5u7ts3Lowr6XsitPhMfna0Tw9PvqqB5J8umZ/w7LKMaAQE46N5cw4B c5P5IW7TtpPd1ltOob5kQcKROBkO5AI0+IYlZp4a/dOP1qXhfc35exp4VTrDxJJy7g unis9W2tc8uU0g3+xvOPHfGIglVgVM8VdrGpwscc= Date: Tue, 22 Mar 2022 14:45:59 -0700 To: vbabka@suse.cz,stable@vger.kernel.org,oleg@redhat.com,Liam.Howlett@oracle.com,hughd@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 147/227] mempolicy: mbind_range() set_policy() after vma_merge() Message-Id: <20220322214559.ED3C8C340EE@smtp.kernel.org> X-Stat-Signature: 1hnoyz9aabufsdh4b9mes19db6qy9j8h Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=kBzVph6u; spf=pass (imf14.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 1CD2810002C X-HE-Tag: 1647985561-525790 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mempolicy: mbind_range() set_policy() after vma_merge() v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced vma_merge() to mbind_range(); but unlike madvise, mlock and mprotect, it put a "continue" to next vma where its precedents go to update flags on current vma before advancing: that left vma with the wrong setting in the infamous vma_merge() case 8. v3.10 commit 1444f92c8498 ("mm: merging memory blocks resets mempolicy") tried to fix that in vma_adjust(), without fully understanding the issue. v3.11 commit 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction") reverted that, and went about the fix in the right way, but chose to optimize out an unnecessary mpol_dup() with a prior mpol_equal() test. But on tmpfs, that also pessimized out the vital call to its ->set_policy(), leaving the new mbind unenforced. The user visible effect was that the pages got allocated on the local node (happened to be 0), after the mbind() caller had specifically asked for them to be allocated on node 1. There was not any page migration involved in the case reported: the pages simply got allocated on the wrong node. Just delete that optimization now (though it could be made conditional on vma not having a set_policy). Also remove the "next" variable: it turned out to be blameless, but also pointless. Link: https://lkml.kernel.org/r/319e4db9-64ae-4bca-92f0-ade85d342ff@google.com Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction") Signed-off-by: Hugh Dickins Acked-by: Oleg Nesterov Reviewed-by: Liam R. 
Howlett Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton --- mm/mempolicy.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) --- a/mm/mempolicy.c~mempolicy-mbind_range-set_policy-after-vma_merge +++ a/mm/mempolicy.c @@ -786,7 +786,6 @@ static int vma_replace_policy(struct vm_ static int mbind_range(struct mm_struct *mm, unsigned long start, unsigned long end, struct mempolicy *new_pol) { - struct vm_area_struct *next; struct vm_area_struct *prev; struct vm_area_struct *vma; int err = 0; @@ -801,8 +800,7 @@ static int mbind_range(struct mm_struct if (start > vma->vm_start) prev = vma; - for (; vma && vma->vm_start < end; prev = vma, vma = next) { - next = vma->vm_next; + for (; vma && vma->vm_start < end; prev = vma, vma = vma->vm_next) { vmstart = max(start, vma->vm_start); vmend = min(end, vma->vm_end); @@ -817,10 +815,6 @@ static int mbind_range(struct mm_struct anon_vma_name(vma)); if (prev) { vma = prev; - next = vma->vm_next; - if (mpol_equal(vma_policy(vma), new_pol)) - continue; - /* vma_merge() joined vma && vma->next, case 8 */ goto replace; } if (vma->vm_start != vmstart) { From patchwork Tue Mar 22 21:46:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789232 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9018DC433EF for ; Tue, 22 Mar 2022 21:46:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 252EE6B014C; Tue, 22 Mar 2022 17:46:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 202AF6B014D; Tue, 22 Mar 2022 17:46:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F0DD6B014E; Tue, 22 Mar 2022 17:46:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70]) by kanga.kvack.org (Postfix) with ESMTP id F3E186B014C for ; Tue, 22 Mar 2022 17:46:05 -0400 (EDT) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id BA42D8249980 for ; Tue, 22 Mar 2022 21:46:05 +0000 (UTC) X-FDA: 79273355490.31.EEC153F Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf08.hostedemail.com (Postfix) with ESMTP id 423DB160026 for ; Tue, 22 Mar 2022 21:46:05 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 30886B81DBA; Tue, 22 Mar 2022 21:46:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2866C340EC; Tue, 22 Mar 2022 21:46:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985563; bh=L/2shDOh3xnUyMXVRhAr6pEdFIeqol9RjJVq2cMCBtQ=; h=Date:To:From:In-Reply-To:Subject:From; b=H0wzKb6jnOn1VRyAvw5n7VEQMVNrILzbRdZ5D8tTlPj4gw0y/9+JnjAuAu95gTE2Z wYqu5lwJBnCs7BCJnKaJmyS48xHflWTauGkX7h7El85is5UCy58HArWy7GxBatQ9uW 6/XSblcr7BZuoXlGitx1rzZq1YM3WsW2o2IGcJ6k= Date: Tue, 22 Mar 2022 14:46:02 -0700 To: 
rientjes@google.com,mhocko@suse.com,linmiaohe@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 148/227] mm/oom_kill: remove unneeded is_memcg_oom check Message-Id: <20220322214602.E2866C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: jth4nhquu9f5dasmwzp4sxcphiyc17mh Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=H0wzKb6j; spf=pass (imf08.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 423DB160026 X-HE-Tag: 1647985565-742197 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miaohe Lin Subject: mm/oom_kill: remove unneeded is_memcg_oom check oom_cpuset_eligible() is always called when !is_memcg_oom(). Remove this unnecessary check. Link: https://lkml.kernel.org/r/20220224115933.20154-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Acked-by: David Rientjes Acked-by: Michal Hocko Signed-off-by: Andrew Morton --- mm/oom_kill.c | 3 --- 1 file changed, 3 deletions(-) --- a/mm/oom_kill.c~mm-oom_kill-remove-unneeded-is_memcg_oom-check +++ a/mm/oom_kill.c @@ -93,9 +93,6 @@ static bool oom_cpuset_eligible(struct t bool ret = false; const nodemask_t *mask = oc->nodemask; - if (is_memcg_oom(oc)) - return true; - rcu_read_lock(); for_each_thread(start, tsk) { if (mask) { From patchwork Tue Mar 22 21:46:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E13EDC433EF for ; Tue, 22 Mar 2022 21:46:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 780F96B014E; Tue, 22 Mar 2022 17:46:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 730D16B014F; Tue, 22 Mar 2022 17:46:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F8DD6B0150; Tue, 22 Mar 2022 17:46:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 50AFB6B014E for ; Tue, 22 Mar 2022 17:46:09 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 22D8B247A7 for ; Tue, 22 Mar 2022 21:46:09 +0000 (UTC) X-FDA: 79273355658.12.469AA1F Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf31.hostedemail.com (Postfix) with ESMTP id 7753420002 for ; Tue, 22 Mar 2022 21:46:08 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 6A9BAB81DAD; Tue, 22 Mar 2022 21:46:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 00702C340EE; Tue, 22 Mar 2022 21:46:05 +0000 
(UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985566; bh=twgtnMsCKbLVHH5ZK/yXbLcXsd805xYGF/oG8kSHFMY=; h=Date:To:From:In-Reply-To:Subject:From; b=X07MdqreJqPOBhicaHrfB0ohQzVYaAAIQ5g+s4jLo+7C93Bhh8qf19o7gq2o0pVCX 5DJhtqRZ0dZyPMQCEafH1nIUEwJPmMBxsgAUr7LPps6vrFyCg0lMZ2Y7xZSmtAdX2+ 41mx0ZK59HcxuRtV8N4KGdZg0cfVQ9DZfKD9rzCc= Date: Tue, 22 Mar 2022 14:46:05 -0700 To: ziy@nvidia.com,zhongjiang-ali@linux.alibaba.com,xlpang@linux.alibaba.com,shy828301@gmail.com,osalvador@suse.de,mgorman@techsingularity.net,dave.hansen@linux.intel.com,baolin.wang@linux.alibaba.com,ying.huang@intel.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 149/227] mm,migrate: fix establishing demotion target Message-Id: <20220322214606.00702C340EE@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 7753420002 X-Stat-Signature: fikzfseeymjjdzmpg8z8754h9jf4mrac Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=X07Mdqre; dmarc=none; spf=pass (imf31.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985568-710098 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Huang Ying Subject: mm,migrate: fix establishing demotion target In commit ac16ec835314 ("mm: migrate: support multiple target nodes demotion"), after the first demotion target node is found, we will continue to check the next candidate obtained via find_next_best_node(). This is to find all demotion target nodes with the same NUMA distance. But one side effect of find_next_best_node() is that the candidate node returned is also marked in the "used" parameter, so even if the candidate fails the subsequent NUMA distance check, it can no longer be chosen as a demotion target by any of the following nodes. For example, on a system with the following node distances:

node distances:
node   0   1   2   3
  0:  10  21  17  28
  1:  21  10  28  17
  2:  17  28  10  28
  3:  28  17  28  10

when we establish the demotion target for node 0, node 2 is added to the demotion target set in the first round. In the second round, node 3 is checked and rejected because distance(0, 3) > distance(0, 2), but node 3 is left set in the "used" nodemask. When we then establish the demotion target for node 1, no node is available. This is wrong: node 3 should become the demotion target of node 1. To fix this, a candidate node that fails the distance check is now cleared from the "used" nodemask, so that it remains available for the following nodes. The bug can be reproduced, and is fixed by this patch, on a 2-socket server with DRAM and PMEM.
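Condensed, the fixed flow of establish_migrate_target() looks like the sketch below (a simplification of the mm/migrate.c hunk that follows; the bookkeeping of the demotion-target array is omitted):

	static int establish_migrate_target(int node, nodemask_t *used,
					    int best_distance)
	{
		/* find_next_best_node() also sets its return value in *used */
		int migration_target = find_next_best_node(node, used);

		if (migration_target == NUMA_NO_NODE)
			return NUMA_NO_NODE;

		if (best_distance != -1 &&
		    node_distance(node, migration_target) > best_distance) {
			/*
			 * The fix: hand the rejected candidate back, so that
			 * a later source node can still demote to it.
			 */
			node_clear(migration_target, *used);
			return NUMA_NO_NODE;
		}

		return migration_target;
	}
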
Link: https://lkml.kernel.org/r/20220128055940.1792614-1-ying.huang@intel.com Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion") Signed-off-by: "Huang, Ying" Reviewed-by: Baolin Wang Cc: Baolin Wang Cc: Dave Hansen Cc: Zi Yan Cc: Oscar Salvador Cc: Yang Shi Cc: zhongjiang-ali Cc: Xunlei Pang Cc: Mel Gorman Signed-off-by: Andrew Morton --- mm/migrate.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) --- a/mm/migrate.c~mmmigrate-fix-establishing-demotion-target +++ a/mm/migrate.c @@ -3079,18 +3079,21 @@ static int establish_migrate_target(int if (best_distance != -1) { val = node_distance(node, migration_target); if (val > best_distance) - return NUMA_NO_NODE; + goto out_clear; } index = nd->nr; if (WARN_ONCE(index >= DEMOTION_TARGET_NODES, "Exceeds maximum demotion target nodes\n")) - return NUMA_NO_NODE; + goto out_clear; nd->nodes[index] = migration_target; nd->nr++; return migration_target; +out_clear: + node_clear(migration_target, *used); + return NUMA_NO_NODE; } /* From patchwork Tue Mar 22 21:46:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789234 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B64CC433F5 for ; Tue, 22 Mar 2022 21:46:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 893906B0150; Tue, 22 Mar 2022 17:46:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 843266B0151; Tue, 22 Mar 2022 17:46:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 709676B0152; Tue, 22 Mar 2022 17:46:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 610806B0150 for ; Tue, 22 Mar 2022 17:46:12 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 2171F121237 for ; Tue, 22 Mar 2022 21:46:12 +0000 (UTC) X-FDA: 79273355784.12.D8546AE Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf05.hostedemail.com (Postfix) with ESMTP id 85228100032 for ; Tue, 22 Mar 2022 21:46:11 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 6A3C3B81DB9; Tue, 22 Mar 2022 21:46:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0CCD5C340EC; Tue, 22 Mar 2022 21:46:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985569; bh=gQQxkqj9KAdsN9UOV7tN0boTk/XrOQnfxwa7dIsj5Vo=; h=Date:To:From:In-Reply-To:Subject:From; b=v7WyapuT3n0yOLtPI537aNWnzrYma3Dx5ZKOGIayDiqMbHEcs0c+dmjI+3FUo30vF b2JwMKIKedLa1IUysHmQ87Q4nK/9BW5fyDzXOFD+/7Il+5Rk12Cx6pu9cBUMyNUWa3 yCRsE0FThhn5VRZYQ1EX4UEkhBRuPYlJflHN6aNU= Date: Tue, 22 Mar 2022 14:46:08 -0700 To: 
willy@infradead.org,william.kucharski@oracle.com,vbabka@suse.cz,shy828301@gmail.com,nicholas.tang@mediatek.com,maz@kernel.org,matthias.bgg@gmail.com,Kuan-Ying.Lee@mediatek.com,dhowells@redhat.com,david@redhat.com,andrew.yang@mediatek.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 150/227] mm/migrate: fix race between lock page and clear PG_Isolated Message-Id: <20220322214609.0CCD5C340EC@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 85228100032 X-Rspam-User: Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=v7WyapuT; dmarc=none; spf=pass (imf05.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: tgqkaeyszgn3h6be5jzr1bj3wkf4g9ad X-HE-Tag: 1647985571-805412 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: "andrew.yang" Subject: mm/migrate: fix race between lock page and clear PG_Isolated When memory is tight, the system may start to compact memory to satisfy demands for large contiguous memory. If one process tries to lock a memory page that has been locked and isolated for compaction, it may wait a long time or even forever. This is because compaction performs a non-atomic clear of PG_Isolated while holding the page lock, and this may overwrite the PG_waiters bit set by a process that failed to obtain the page lock and added itself to the waiting queue to wait for the lock to be released:

CPU1                              CPU2
lock_page(page); (successful)
                                  lock_page(); (failed)
__ClearPageIsolated(page);        SetPageWaiters(page) (may be overwritten)
unlock_page(page);

The solution is to not perform non-atomic operations on page flags while holding the page lock.
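As a compile-only sketch of the underlying hazard in plain C11 (user-space, not kernel code; the flag bit values and helper names are illustrative analogues of the kernel's __ClearPageIsolated()/ClearPageIsolated() semantics):

#include <stdatomic.h>

#define PG_ISOLATED (1ul << 0)
#define PG_WAITERS  (1ul << 1)

/*
 * Non-atomic clear, analogous to __ClearPageIsolated(): a plain
 * load/modify/store. If another CPU atomically sets PG_WAITERS
 * between the load and the store, that update is silently lost.
 */
static void nonatomic_clear_isolated(unsigned long *flags)
{
	unsigned long v = *flags;  /* load */

	v &= ~PG_ISOLATED;         /* modify */
	*flags = v;                /* store: may overwrite a concurrent set */
}

/*
 * Atomic clear, analogous to ClearPageIsolated() after this patch:
 * the read-modify-write is indivisible, so a concurrent atomic
 * SetPageWaiters() cannot be overwritten.
 */
static void atomic_clear_isolated(_Atomic unsigned long *flags)
{
	atomic_fetch_and(flags, ~PG_ISOLATED);
}

This is exactly the difference between the kernel's double-underscore (non-atomic) and plain (atomic) page flag operations that the patch switches between.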
Link: https://lkml.kernel.org/r/20220315030515.20263-1-andrew.yang@mediatek.com Signed-off-by: andrew.yang Cc: Matthias Brugger Cc: Matthew Wilcox Cc: "Vlastimil Babka" Cc: David Howells Cc: "William Kucharski" Cc: David Hildenbrand Cc: Yang Shi Cc: Marc Zyngier Cc: Nicholas Tang Cc: Kuan-Ying Lee Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 2 +- mm/migrate.c | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-) --- a/include/linux/page-flags.h~mm-migrate-fix-race-between-lock-page-and-clear-pg_isolated +++ a/include/linux/page-flags.h @@ -1000,7 +1000,7 @@ PAGE_TYPE_OPS(Guard, guard) extern bool is_free_buddy_page(struct page *page); -__PAGEFLAG(Isolated, isolated, PF_ANY); +PAGEFLAG(Isolated, isolated, PF_ANY); #ifdef CONFIG_MMU #define __PG_MLOCKED (1UL << PG_mlocked) --- a/mm/migrate.c~mm-migrate-fix-race-between-lock-page-and-clear-pg_isolated +++ a/mm/migrate.c @@ -107,7 +107,7 @@ int isolate_movable_page(struct page *pa /* Driver shouldn't use PG_isolated bit of page->flags */ WARN_ON_ONCE(PageIsolated(page)); - __SetPageIsolated(page); + SetPageIsolated(page); unlock_page(page); return 0; @@ -126,7 +126,7 @@ static void putback_movable_page(struct mapping = page_mapping(page); mapping->a_ops->putback_page(page); - __ClearPageIsolated(page); + ClearPageIsolated(page); } /* @@ -159,7 +159,7 @@ void putback_movable_pages(struct list_h if (PageMovable(page)) putback_movable_page(page); else - __ClearPageIsolated(page); + ClearPageIsolated(page); unlock_page(page); put_page(page); } else { @@ -883,7 +883,7 @@ static int move_to_new_page(struct page VM_BUG_ON_PAGE(!PageIsolated(page), page); if (!PageMovable(page)) { rc = MIGRATEPAGE_SUCCESS; - __ClearPageIsolated(page); + ClearPageIsolated(page); goto out; } @@ -905,7 +905,7 @@ static int move_to_new_page(struct page * We clear PG_movable under page_lock so any compactor * cannot try to migrate this page. 
*/ - __ClearPageIsolated(page); + ClearPageIsolated(page); } /* @@ -1091,7 +1091,7 @@ static int unmap_and_move(new_page_t get if (unlikely(__PageMovable(page))) { lock_page(page); if (!PageMovable(page)) - __ClearPageIsolated(page); + ClearPageIsolated(page); unlock_page(page); } goto out; From patchwork Tue Mar 22 21:46:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7AA4C433EF for ; Tue, 22 Mar 2022 21:46:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 586456B0152; Tue, 22 Mar 2022 17:46:15 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5342E6B0153; Tue, 22 Mar 2022 17:46:15 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 423D46B0154; Tue, 22 Mar 2022 17:46:15 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id 32D056B0152 for ; Tue, 22 Mar 2022 17:46:15 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id C3D211827C179 for ; Tue, 22 Mar 2022 21:46:14 +0000 (UTC) X-FDA: 79273355868.30.6CA71BC Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf22.hostedemail.com (Postfix) with ESMTP id 59DB3C002D for ; Tue, 22 Mar 2022 21:46:14 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 4BC30B81D77; Tue, 22 Mar 2022 21:46:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 09E2AC340EC; Tue, 22 Mar 2022 21:46:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985572; bh=1ms0A4CRfbE+q+WFuovPaSOgayD/+vzxjQilvcPcAH4=; h=Date:To:From:In-Reply-To:Subject:From; b=jhKGz1weAcvN1OjR4QHCsvWBLJ56SZz2uJxcpHltSp0LprHZZDx4wHWEDkCHVTYMV vz+UqA0RbTzggObouoddRUjH4xCgGx+X0Rz9GB+nZ4AiPPwf1DZgzhf6vUhLOvRusy RlWy5BTRfV3+/e68FlE/ZhwC7CeA6zawpEI6qPq8= Date: Tue, 22 Mar 2022 14:46:11 -0700 To: ziy@nvidia.com,shy828301@gmail.com,rcampbell@nvidia.com,kirill.shutemov@linux.intel.com,hughd@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 151/227] mm/thp: refix __split_huge_pmd_locked() for migration PMD Message-Id: <20220322214612.09E2AC340EC@smtp.kernel.org> Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=jhKGz1we; spf=pass (imf22.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: zo7nj4do7egjzn8qgouwmpjcbcea399p X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 59DB3C002D X-HE-Tag: 1647985574-109996 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: 
owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm/thp: refix __split_huge_pmd_locked() for migration PMD Migration entries do not contribute to a page's reference count: move __split_huge_pmd_locked()'s page_ref_add() into pmd_migration's else block (along with the page_count() check - a page is quite likely to have reference count frozen to 0 when a migration entry is found). This will fix a very rare anonymous memory leak, after a split_huge_pmd() raced with an anon split_huge_page() or an anon THP migrate_pages(): since the wrongly raised refcount stopped the page (perhaps small, perhaps huge, depending on when the race hit) from ever being freed. At first I thought there were worse risks, from prematurely unfreezing a frozen page: but now think that would only affect page cache pages, which do not come this way (except for anonymous pages in swap cache, perhaps). Link: https://lkml.kernel.org/r/84792468-f512-e48f-378c-e34c3641e97@google.com Fixes: ec0abae6dcdf ("mm/thp: fix __split_huge_pmd_locked() for migration PMD") Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi Cc: Ralph Campbell Cc: Zi Yan Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton --- mm/huge_memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/mm/huge_memory.c~mm-thp-refix-__split_huge_pmd_locked-for-migration-pmd +++ a/mm/huge_memory.c @@ -2055,9 +2055,9 @@ static void __split_huge_pmd_locked(stru young = pmd_young(old_pmd); soft_dirty = pmd_soft_dirty(old_pmd); uffd_wp = pmd_uffd_wp(old_pmd); + VM_BUG_ON_PAGE(!page_count(page), page); + page_ref_add(page, HPAGE_PMD_NR - 1); } - VM_BUG_ON_PAGE(!page_count(page), page); - page_ref_add(page, HPAGE_PMD_NR - 1); /* * Withdraw the table only after we mark the pmd entry invalid. 
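To make the refcount rule concrete, here is a small runnable user-space analogue (an illustrative simplification, not kernel code): a present huge PMD contributes HPAGE_PMD_NR - 1 extra references when split into PTE mappings, while a migration entry contributes none.

#include <assert.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512

struct page { int refcount; };

/*
 * Splitting a present huge PMD turns one PMD mapping into
 * HPAGE_PMD_NR PTE mappings, i.e. HPAGE_PMD_NR - 1 new references.
 * A migration entry holds no reference, so nothing is added for it.
 */
static void split_pmd(struct page *page, int is_migration_entry)
{
	if (!is_migration_entry) {
		assert(page->refcount > 0);	/* cf. VM_BUG_ON_PAGE() above */
		page->refcount += HPAGE_PMD_NR - 1;
	}
	/* ... remap the range as PTEs ... */
}

int main(void)
{
	/*
	 * The racy case: migrate_pages() froze the refcount to 0 and left
	 * a migration entry behind. Bumping the count here (the old, buggy
	 * placement) would strand 511 references and leak the page.
	 */
	struct page p = { .refcount = 0 };

	split_pmd(&p, 1 /* migration entry */);
	printf("refcount after split: %d\n", p.refcount);	/* 0 -> freeable */
	return 0;
}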
From patchwork Tue Mar 22 21:46:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF1C1C433EF for ; Tue, 22 Mar 2022 21:46:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 70BB76B0154; Tue, 22 Mar 2022 17:46:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6BC9B6B0155; Tue, 22 Mar 2022 17:46:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 584586B0156; Tue, 22 Mar 2022 17:46:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 49AC06B0154 for ; Tue, 22 Mar 2022 17:46:18 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 2820961D0A for ; Tue, 22 Mar 2022 21:46:18 +0000 (UTC) X-FDA: 79273356036.02.BC1C896 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf10.hostedemail.com (Postfix) with ESMTP id 8ACE1C002A for ; Tue, 22 Mar 2022 21:46:17 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 627B9B81DBA; Tue, 22 Mar 2022 21:46:16 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B859C340EE; Tue, 22 Mar 2022 21:46:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985575; bh=RZk32U2wkpIOQUJvahJ71fz2IWCApZYl7wNieg6vRYk=; h=Date:To:From:In-Reply-To:Subject:From; b=NoVE+Vijy36uDUoTptgMWu633gboe8ANZ77+QSIL93sbpeSoW6AFsnHkDJpxcc6wj qutkvzmUc/DrNA7oTuMTQ2LIvfY/ES0g8sP/dpov9jXpIpIFZdtSnzckKWNMVoBRf+ hunfFRtRniKuk/SDxiyhJsMXrxPZSNaMuXAGvLcs= Date: Tue, 22 Mar 2022 14:46:14 -0700 To: sourabhjain@linux.ibm.com,osalvador@suse.de,mpe@ellerman.id.au,mike.kravetz@oracle.com,mahesh@linux.ibm.com,david@redhat.com,hbathini@linux.ibm.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 152/227] mm/cma: provide option to opt out from exposing pages on activation failure Message-Id: <20220322214615.0B859C340EE@smtp.kernel.org> X-Stat-Signature: x8pqj5c6idjrjyzohy5ju6ds3gxomeew Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=NoVE+Vij; spf=pass (imf10.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 8ACE1C002A X-HE-Tag: 1647985577-435476 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hari Bathini Subject: mm/cma: provide option to opt out from exposing pages on activation failure Patch series "powerpc/fadump: handle CMA activation failure 
appropriately", v3. Commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if activation of an area fails") started exposing all pages to buddy allocator on CMA activation failure. But there can be CMA users that want to handle the reserved memory differently on CMA allocation failure. Provide an option to opt out from exposing pages to buddy for such cases. Link: https://lkml.kernel.org/r/20220117075246.36072-1-hbathini@linux.ibm.com Link: https://lkml.kernel.org/r/20220117075246.36072-2-hbathini@linux.ibm.com Signed-off-by: Hari Bathini Reviewed-by: David Hildenbrand Cc: Oscar Salvador Cc: Mike Kravetz Cc: Mahesh Salgaonkar Cc: Sourabh Jain Cc: Michael Ellerman Signed-off-by: Andrew Morton --- include/linux/cma.h | 2 ++ mm/cma.c | 11 +++++++++-- mm/cma.h | 1 + 3 files changed, 12 insertions(+), 2 deletions(-) --- a/include/linux/cma.h~mm-cma-provide-option-to-opt-out-from-exposing-pages-on-activation-failure +++ a/include/linux/cma.h @@ -58,4 +58,6 @@ extern bool cma_pages_valid(struct cma * extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count); extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); + +extern void cma_reserve_pages_on_error(struct cma *cma); #endif --- a/mm/cma.c~mm-cma-provide-option-to-opt-out-from-exposing-pages-on-activation-failure +++ a/mm/cma.c @@ -131,8 +131,10 @@ not_in_zone: bitmap_free(cma->bitmap); out_error: /* Expose all pages to the buddy, they are useless for CMA. */ - for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++) - free_reserved_page(pfn_to_page(pfn)); + if (!cma->reserve_pages_on_error) { + for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++) + free_reserved_page(pfn_to_page(pfn)); + } totalcma_pages -= cma->count; cma->count = 0; pr_err("CMA area %s could not be activated\n", cma->name); @@ -150,6 +152,11 @@ static int __init cma_init_reserved_area } core_initcall(cma_init_reserved_areas); +void __init cma_reserve_pages_on_error(struct cma *cma) +{ + cma->reserve_pages_on_error = true; +} + /** * cma_init_reserved_mem() - create custom contiguous area from reserved memory * @base: Base address of the reserved area --- a/mm/cma.h~mm-cma-provide-option-to-opt-out-from-exposing-pages-on-activation-failure +++ a/mm/cma.h @@ -30,6 +30,7 @@ struct cma { /* kobject requires dynamic object */ struct cma_kobject *cma_kobj; #endif + bool reserve_pages_on_error; }; extern struct cma cma_areas[MAX_CMA_AREAS]; From patchwork Tue Mar 22 21:46:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76D62C433FE for ; Tue, 22 Mar 2022 21:46:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0CA9B6B0156; Tue, 22 Mar 2022 17:46:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 079DC6B0157; Tue, 22 Mar 2022 17:46:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D725C6B0158; Tue, 22 Mar 2022 17:46:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id C6C7D6B0156 for ; Tue, 22 Mar 2022 17:46:19 -0400 (EDT) Received: from smtpin10.hostedemail.com 
(a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id A9A0481A3D for ; Tue, 22 Mar 2022 21:46:19 +0000 (UTC) X-FDA: 79273356078.10.408EB16 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf04.hostedemail.com (Postfix) with ESMTP id 4164C40021 for ; Tue, 22 Mar 2022 21:46:19 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id AB1CC6165D; Tue, 22 Mar 2022 21:46:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0DC70C340EC; Tue, 22 Mar 2022 21:46:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985578; bh=rVcXrpMApCnqbgK4FnFZ6aZ0OghhdBHDhXCnJqs3fYI=; h=Date:To:From:In-Reply-To:Subject:From; b=mM9yP7B55z3rV7AECvjzPz28YcUlLU/2X/Vz+4F+KqAeAE41tEsyvKR1YVlDPAFTw Cj1j1hNqOqbHCrMkCKI5CjhUO8FluGDwe2dQDyFAw7a8IYP1gKfa6ilD/zS2BMMnlh LWjIwyNxlQdmA3puTLc6Dknwnvj9YeR4k/TSqw8E= Date: Tue, 22 Mar 2022 14:46:17 -0700 To: sourabhjain@linux.ibm.com,osalvador@suse.de,mpe@ellerman.id.au,mike.kravetz@oracle.com,mahesh@linux.ibm.com,david@redhat.com,hbathini@linux.ibm.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 153/227] powerpc/fadump: opt out from freeing pages on cma activation failure Message-Id: <20220322214618.0DC70C340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 4164C40021 X-Stat-Signature: f3afg7c64kk83hc3uuuo7e31ah5ozj75 Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=mM9yP7B5; dmarc=none; spf=pass (imf04.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985579-74845 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hari Bathini Subject: powerpc/fadump: opt out from freeing pages on cma activation failure With commit a4e92ce8e4c8 ("powerpc/fadump: Reservationless firmware assisted dump"), Linux kernel's Contiguous Memory Allocator (CMA) based reservation was introduced in fadump. That change was aimed at using CMA to let applications utilize the memory reserved for fadump while blocking it from being used for kernel pages. The assumption was, even if CMA activation fails for whatever reason, the memory still remains reserved to avoid it from being used for kernel pages. But commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if activation of an area fails") breaks this assumption as it started exposing all pages to buddy allocator on CMA activation failure. It led to warning messages like below while running crash-utility on vmcore of a kernel having above two commits: crash: seek error: kernel virtual address: To fix this problem, opt out from exposing pages to buddy allocator on CMA activation failure for fadump reserved memory. 
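For a CMA user other than fadump, opting out looks roughly like the following kernel-style sketch (the area name, variable, and call site are hypothetical; the use of cma_init_reserved_mem() assumes the area is created from pre-reserved memory, as fadump's is, and must happen before the core_initcall-time activation):

#include <linux/cma.h>
#include <linux/init.h>

static struct cma *my_cma;	/* hypothetical CMA area */

static int __init my_cma_setup(phys_addr_t base, phys_addr_t size)
{
	int rc;

	rc = cma_init_reserved_mem(base, size, 0, "my_cma", &my_cma);
	if (rc)
		return rc;

	/*
	 * If activation of this area fails later, keep the pages
	 * reserved instead of exposing them to the buddy allocator,
	 * mirroring the fadump change in this patch.
	 */
	cma_reserve_pages_on_error(my_cma);
	return 0;
}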
Link: https://lkml.kernel.org/r/20220117075246.36072-3-hbathini@linux.ibm.com Signed-off-by: Hari Bathini Acked-by: David Hildenbrand Acked-by: Michael Ellerman Cc: Mahesh Salgaonkar Cc: Mike Kravetz Cc: Oscar Salvador Cc: Sourabh Jain Signed-off-by: Andrew Morton --- arch/powerpc/kernel/fadump.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/arch/powerpc/kernel/fadump.c~powerpc-fadump-opt-out-from-freeing-pages-on-cma-activation-failure +++ a/arch/powerpc/kernel/fadump.c @@ -113,6 +113,12 @@ static int __init fadump_cma_init(void) } /* + * If CMA activation fails, keep the pages reserved, instead of + * exposing them to buddy allocator. Same as 'fadump=nocma' case. + */ + cma_reserve_pages_on_error(fadump_cma); + + /* * So we now have successfully initialized cma area for fadump. */ pr_info("Initialized 0x%lx bytes cma area at %ldMB from 0x%lx " From patchwork Tue Mar 22 21:46:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789238 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68AB8C433EF for ; Tue, 22 Mar 2022 21:46:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EAC806B0158; Tue, 22 Mar 2022 17:46:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E5DBF6B0159; Tue, 22 Mar 2022 17:46:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D24C96B015A; Tue, 22 Mar 2022 17:46:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id C37876B0158 for ; Tue, 22 Mar 2022 17:46:24 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 78BE318289DE1 for ; Tue, 22 Mar 2022 21:46:24 +0000 (UTC) X-FDA: 79273356288.21.F416DC5 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf19.hostedemail.com (Postfix) with ESMTP id CF1E61A0028 for ; Tue, 22 Mar 2022 21:46:23 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id A2E64B81D5F; Tue, 22 Mar 2022 21:46:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 464CDC340EC; Tue, 22 Mar 2022 21:46:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985581; bh=hIclrAO0aJSlCuB+gAg4to78GxvsR5pR8C0Gwnzx6Lo=; h=Date:To:From:In-Reply-To:Subject:From; b=2jujozzTHD90rlhhAPNDFauvplfQsZOaf1iDlhfJ54zquLzmxry0TbuvSPBVmeBgP 2OYLIWcaUv7GZpGuJqCw+RykZnUHusBZjPzgMOYMnsyw2lI1uFY5IxNPsnl88nBeVJ HuM+a4GYa041+8rndz2073PoSv0E12eoe0ogqKK4= Date: Tue, 22 Mar 2022 14:46:20 -0700 To: 
ziy@nvidia.com,zhongjiang-ali@linux.alibaba.com,weixugc@google.com,shy828301@gmail.com,shakeelb@google.com,riel@surriel.com,rdunlap@infradead.org,peterz@infradead.org,osalvador@suse.de,mhocko@suse.com,mgorman@techsingularity.net,hannes@cmpxchg.org,feng.tang@intel.com,dave.hansen@linux.intel.com,baolin.wang@linux.alibaba.com,ying.huang@intel.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 154/227] NUMA Balancing: add page promotion counter Message-Id: <20220322214621.464CDC340EC@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: st76i5sh1j56qouu9hqiabidnfurkn4a Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=2jujozzT; dmarc=none; spf=pass (imf19.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: CF1E61A0028 X-HE-Tag: 1647985583-81950 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Huang Ying Subject: NUMA Balancing: add page promotion counter Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13. With the advent of various new memory types, some machines will have multiple types of memory, e.g. DRAM and PMEM (persistent memory). The memory subsystem of these machines can be called a memory tiering system, because the performance of the different types of memory is different. After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM"), PMEM can be used as cost-effective volatile memory in separate NUMA nodes. In a typical memory tiering system, there are CPUs, DRAM and PMEM in each physical NUMA node. The CPUs and the DRAM will be put in one logical node, while the PMEM will be put in another (faked) logical node. To optimize overall system performance, the hot pages should be placed in the DRAM node. To do that, we need to identify the hot pages in the PMEM node and migrate them to the DRAM node via NUMA migration. The original NUMA balancing already has a set of mechanisms to identify the pages recently accessed by the CPUs in a node and to migrate those pages to that node. So we can reuse these mechanisms to optimize page placement in the memory tiering system. This is implemented in this patchset. On the other hand, the cold pages should be placed in the PMEM node. So we also need to identify the cold pages in the DRAM node and migrate them to the PMEM node. In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a mechanism to demote the cold DRAM pages to the PMEM node under memory pressure is implemented. Based on that, the cold DRAM pages can be demoted to the PMEM node proactively to free some memory space on the DRAM node to accommodate the promoted hot PMEM pages. This is implemented in this patchset too. We have tested the solution with the pmbench memory accessing benchmark with the 80:20 read/write ratio and the Gauss access address distribution on a 2-socket Intel server with Optane DC Persistent Memory Modules. The test results show that the pmbench score can improve by up to 95.9%. This patch (of 3): In a system with multiple memory types, e.g. DRAM and PMEM, the CPU and DRAM in one socket will be put in one NUMA node as before, while the PMEM will be put in another NUMA node, as described in commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM"). So the NUMA balancing mechanism will identify all PMEM accesses as remote accesses and try to promote the PMEM pages to DRAM. To distinguish the number of inter-type promoted pages from that of inter-socket migrated pages, a new vmstat counter is added. The counter is per-node (counted in the target node), so it can be used to identify promotion imbalance among the NUMA nodes. Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" Reviewed-by: Yang Shi Tested-by: Baolin Wang Reviewed-by: Baolin Wang Acked-by: Johannes Weiner Reviewed-by: Oscar Salvador Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Dave Hansen Cc: Zi Yan Cc: Wei Xu Cc: Shakeel Butt Cc: zhongjiang-ali Cc: Feng Tang Cc: Randy Dunlap Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 3 +++ include/linux/node.h | 5 +++++ mm/migrate.c | 13 ++++++++++--- mm/vmstat.c | 3 +++ 4 files changed, 21 insertions(+), 3 deletions(-) --- a/include/linux/mmzone.h~numa-balancing-add-page-promotion-counter +++ a/include/linux/mmzone.h @@ -222,6 +222,9 @@ enum node_stat_item { #ifdef CONFIG_SWAP NR_SWAPCACHE, #endif +#ifdef CONFIG_NUMA_BALANCING + PGPROMOTE_SUCCESS, /* promote successfully */ +#endif NR_VM_NODE_STAT_ITEMS }; --- a/include/linux/node.h~numa-balancing-add-page-promotion-counter +++ a/include/linux/node.h @@ -181,4 +181,9 @@ static inline void register_hugetlbfs_wi #define to_node(device) container_of(device, struct node, dev) +static inline bool node_is_toptier(int node) +{ + return node_state(node, N_CPU); +} + #endif /* _LINUX_NODE_H_ */ --- a/mm/migrate.c~numa-balancing-add-page-promotion-counter +++ a/mm/migrate.c @@ -2069,6 +2069,7 @@ int migrate_misplaced_page(struct page * pg_data_t *pgdat = NODE_DATA(node); int isolated; int nr_remaining; + unsigned int nr_succeeded; LIST_HEAD(migratepages); new_page_t *new; bool compound; @@ -2107,7 +2108,8 @@ int migrate_misplaced_page(struct page * list_add(&page->lru, &migratepages); nr_remaining = migrate_pages(&migratepages, *new, NULL, node, - MIGRATE_ASYNC, MR_NUMA_MISPLACED, NULL); + MIGRATE_ASYNC, MR_NUMA_MISPLACED, + &nr_succeeded); if (nr_remaining) { if (!list_empty(&migratepages)) { list_del(&page->lru); @@ -2116,8 +2118,13 @@ int migrate_misplaced_page(struct page * putback_lru_page(page); } isolated = 0; - } else - count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_pages); + } + if (nr_succeeded) { + count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); + if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node)) + mod_node_page_state(pgdat, PGPROMOTE_SUCCESS, + nr_succeeded); + } BUG_ON(!list_empty(&migratepages)); return isolated; --- a/mm/vmstat.c~numa-balancing-add-page-promotion-counter +++ a/mm/vmstat.c @@ -1242,6 +1242,9 @@ const char * const vmstat_text[] = { #ifdef CONFIG_SWAP "nr_swapcached", #endif +#ifdef CONFIG_NUMA_BALANCING + "pgpromote_success", +#endif /* enum writeback_stat_item counters */ "nr_dirty_threshold", From patchwork Tue Mar 22 21:46:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789239 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3752DC433F5 for ; Tue, 22 Mar 2022 21:46:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C8FAC6B015A; Tue, 22 Mar 2022 17:46:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C133E6B015B; Tue, 22 Mar 2022 17:46:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AE04A6B015C; Tue, 22 Mar 2022 17:46:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0099.hostedemail.com [216.40.44.99]) by kanga.kvack.org (Postfix) with ESMTP id 9C84C6B015A for ; Tue, 22 Mar 2022 17:46:26 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 54944A565B for ; Tue, 22 Mar 2022 21:46:26 +0000 (UTC) X-FDA: 79273356372.21.CDC1BD1 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf11.hostedemail.com (Postfix) with ESMTP id C125740017 for ; Tue, 22 Mar 2022 21:46:25 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 2B0A36100A; Tue, 22 Mar 2022 21:46:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7DD22C340EC; Tue, 22 Mar 2022 21:46:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985584; bh=dEwK72wY/c2fYqbBvUjTF6jyvKp0v2Yq8oFmJJCBEnc=; h=Date:To:From:In-Reply-To:Subject:From; b=Su7L95Ei2JRfi1vVD0WtQmvh28nnvmD+Izw8xZoh6BX2eiJlQHGXZ7Emv+048Q2mm 6jVg6BtRCB1jGs6kXZ5zkD3pCiHZ5b+OQK1tODrtHwPjfZWzKpbS2BiHBEGemeUw8n YFy2s1q1npXxnb0FhZ94pZdWHD9DfrZfAJUOL91g= Date: Tue, 22 Mar 2022 14:46:23 -0700 To: ziy@nvidia.com,zhongjiang-ali@linux.alibaba.com,weixugc@google.com,shy828301@gmail.com,shakeelb@google.com,riel@surriel.com,rdunlap@infradead.org,peterz@infradead.org,osalvador@suse.de,mhocko@suse.com,mgorman@techsingularity.net,hannes@cmpxchg.org,feng.tang@intel.com,dave.hansen@linux.intel.com,baolin.wang@linux.alibaba.com,ying.huang@intel.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 155/227] NUMA balancing: optimize page placement for memory tiering system Message-Id: <20220322214624.7DD22C340EC@smtp.kernel.org> X-Stat-Signature: 1jusgnn99bj8rcey85he7en9ahucbh4g X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: C125740017 Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Su7L95Ei; dmarc=none; spf=pass (imf11.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985585-721666 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Huang Ying Subject: NUMA balancing: optimize page placement for memory tiering system With the 
advent of various new memory types, some machines will have multiple types of memory, e.g. DRAM and PMEM (persistent memory). The memory subsystem of these machines can be called a memory tiering system, because the performance of the different types of memory is usually different. In such a system, because of changes in the memory access pattern etc., some pages in the slow memory may become hot globally. So in this patch, the NUMA balancing mechanism is enhanced to optimize page placement among the different memory types dynamically according to hot/cold. In a typical memory tiering system, there are CPUs, fast memory and slow memory in each physical NUMA node. The CPUs and the fast memory will be put in one logical node (called the fast memory node), while the slow memory will be put in another (faked) logical node (called the slow memory node). That is, the fast memory is regarded as local while the slow memory is regarded as remote. So it's possible for recently accessed pages in the slow memory node to be promoted to the fast memory node via the existing NUMA balancing mechanism. The original NUMA balancing mechanism stops migrating pages if the free memory of the target node falls below the high watermark. This is a reasonable policy if there's only one memory type, but it renders the original NUMA balancing mechanism almost entirely ineffective at optimizing page placement among different memory types. Details are as follows. It is common for the working-set size of the workload to be larger than the size of the fast memory nodes; otherwise, it would be unnecessary to use the slow memory at all. So there are almost never enough free pages in the fast memory nodes, and the globally hot pages in the slow memory node cannot be promoted to the fast memory node. To solve the issue, we have 2 choices as follows:

a. Ignore the free pages watermark checking when promoting hot pages from the slow memory node to the fast memory node. This will create some memory pressure in the fast memory node and thus trigger memory reclaiming, so that the cold pages in the fast memory node will be demoted to the slow memory node.

b. Define a new watermark called wmark_promo which is higher than wmark_high, and have kswapd reclaim pages until free pages reach that watermark. The scenario is as follows: when we want to promote hot pages from slow memory to fast memory, but the fast memory's free pages would go below the high watermark with such promotion, we wake up kswapd with the wmark_promo watermark in order to demote cold pages and free up some space. So, the next time we want to promote hot pages we might have a chance of doing so.

Choice "a" may create high memory pressure in the fast memory node. If the memory pressure of the workload is high, the pressure may become so high that the memory allocation latency of the workload is affected, e.g. direct reclaim may be triggered. Choice "b" works much better in this respect: if the memory pressure of the workload is high, hot page promotion stops earlier because its allocation watermark is higher than that of normal memory allocation. So in this patch, choice "b" is implemented. A new zone watermark (WMARK_PROMO) is added, which is larger than the high watermark and can be controlled via watermark_scale_factor. In addition to the original page placement optimization among sockets, the NUMA balancing mechanism is extended to optimize page placement according to hot/cold among different memory types.
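A condensed C sketch of how choice "b" shows up in kswapd's balance check (based on the mm/vmscan.c hunk later in this patch; the helper name here is invented for illustration, and the surrounding loop over zones is elided):

/* Illustrative helper, not a real kernel function. */
static bool zone_balanced_for_tiering(struct zone *zone, int order,
				      int highest_zoneidx)
{
	unsigned long mark;

	if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
		mark = wmark_pages(zone, WMARK_PROMO);	/* higher target: keep headroom for promotions */
	else
		mark = high_wmark_pages(zone);		/* classic single-tier behaviour */

	return zone_watermark_ok_safe(zone, order, mark, highest_zoneidx);
}

In tiering mode kswapd therefore keeps reclaiming (demoting cold pages) until the larger PROMO watermark is met, leaving room for subsequent promotions from the slow memory node.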
So the sysctl user space interface (numa_balancing) is extended in a backward-compatible way as follows, so that users can enable/disable these functionalities individually. The sysctl is converted from a Boolean value to a bits field. The definition of the flags is:

- 0: NUMA_BALANCING_DISABLED
- 1: NUMA_BALANCING_NORMAL
- 2: NUMA_BALANCING_MEMORY_TIERING

We have tested the patch with the pmbench memory accessing benchmark with the 80:20 read/write ratio and the Gauss access address distribution on a 2-socket Intel server with Optane DC Persistent Memory Modules. The test results show that the pmbench score can improve by up to 95.9%. Thanks to Andrew Morton for helping to fix the documentation format error. Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com Signed-off-by: "Huang, Ying" Tested-by: Baolin Wang Reviewed-by: Baolin Wang Acked-by: Johannes Weiner Reviewed-by: Oscar Salvador Reviewed-by: Yang Shi Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Dave Hansen Cc: Zi Yan Cc: Wei Xu Cc: Shakeel Butt Cc: zhongjiang-ali Cc: Randy Dunlap Cc: Feng Tang Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 31 ++++++++++++------ include/linux/mmzone.h | 1 include/linux/sched/sysctl.h | 10 +++++ kernel/sched/core.c | 21 +++++++++--- kernel/sysctl.c | 2 - mm/migrate.c | 16 ++++++++- mm/page_alloc.c | 3 + mm/vmscan.c | 6 ++- 8 files changed, 71 insertions(+), 19 deletions(-) --- a/Documentation/admin-guide/sysctl/kernel.rst~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/Documentation/admin-guide/sysctl/kernel.rst @@ -595,16 +595,23 @@ Documentation/admin-guide/kernel-paramet numa_balancing ============== -Enables/disables automatic page fault based NUMA memory -balancing. Memory is moved automatically to nodes -that access it often. - -Enables/disables automatic NUMA memory balancing. On NUMA machines, there -is a performance penalty if remote memory is accessed by a CPU. When this -feature is enabled the kernel samples what task thread is accessing memory -by periodically unmapping pages and later trapping a page fault. At the -time of the page fault, it is determined if the data being accessed should -be migrated to a local memory node. +Enables/disables and configures automatic page fault based NUMA memory +balancing. Memory is moved automatically to nodes that access it often. +The value to set can be the result of ORing the following: + += ================================= +0 NUMA_BALANCING_DISABLED +1 NUMA_BALANCING_NORMAL +2 NUMA_BALANCING_MEMORY_TIERING += ================================= + +Or NUMA_BALANCING_NORMAL to optimize page placement among different +NUMA nodes to reduce remote accessing. On NUMA machines, there is a +performance penalty if remote memory is accessed by a CPU. When this +feature is enabled the kernel samples what task thread is accessing +memory by periodically unmapping pages and later trapping a page +fault. At the time of the page fault, it is determined if the data +being accessed should be migrated to a local memory node. The unmapping of pages and trapping faults incur additional overhead that ideally is offset by improved memory locality but there is no universal @@ -615,6 +622,10 @@ faults may be controlled by the `numa_ba numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
+Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among +different types of memory (represented as different NUMA nodes) to +place the hot pages in the fast memory. This is implemented based on +unmapping and page fault too. numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb =============================================================================================================================== --- a/include/linux/mmzone.h~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/include/linux/mmzone.h @@ -353,6 +353,7 @@ enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, + WMARK_PROMO, NR_WMARK }; --- a/include/linux/sched/sysctl.h~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/include/linux/sched/sysctl.h @@ -23,6 +23,16 @@ enum sched_tunable_scaling { SCHED_TUNABLESCALING_END, }; +#define NUMA_BALANCING_DISABLED 0x0 +#define NUMA_BALANCING_NORMAL 0x1 +#define NUMA_BALANCING_MEMORY_TIERING 0x2 + +#ifdef CONFIG_NUMA_BALANCING +extern int sysctl_numa_balancing_mode; +#else +#define sysctl_numa_balancing_mode 0 +#endif + /* * control realtime throttling: * --- a/kernel/sched/core.c~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/kernel/sched/core.c @@ -4279,7 +4279,9 @@ DEFINE_STATIC_KEY_FALSE(sched_numa_balan #ifdef CONFIG_NUMA_BALANCING -void set_numabalancing_state(bool enabled) +int sysctl_numa_balancing_mode; + +static void __set_numabalancing_state(bool enabled) { if (enabled) static_branch_enable(&sched_numa_balancing); @@ -4287,13 +4289,22 @@ void set_numabalancing_state(bool enable static_branch_disable(&sched_numa_balancing); } +void set_numabalancing_state(bool enabled) +{ + if (enabled) + sysctl_numa_balancing_mode = NUMA_BALANCING_NORMAL; + else + sysctl_numa_balancing_mode = NUMA_BALANCING_DISABLED; + __set_numabalancing_state(enabled); +} + #ifdef CONFIG_PROC_SYSCTL int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct ctl_table t; int err; - int state = static_branch_likely(&sched_numa_balancing); + int state = sysctl_numa_balancing_mode; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; @@ -4303,8 +4314,10 @@ int sysctl_numa_balancing(struct ctl_tab err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos); if (err < 0) return err; - if (write) - set_numabalancing_state(state); + if (write) { + sysctl_numa_balancing_mode = state; + __set_numabalancing_state(state); + } return err; } #endif --- a/kernel/sysctl.c~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/kernel/sysctl.c @@ -1696,7 +1696,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = sysctl_numa_balancing, .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, + .extra2 = SYSCTL_FOUR, }, #endif /* CONFIG_NUMA_BALANCING */ { --- a/mm/migrate.c~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/mm/migrate.c @@ -51,6 +51,7 @@ #include #include #include +#include #include @@ -2031,16 +2032,27 @@ static int numamigrate_isolate_page(pg_d { int page_lru; int nr_pages = thp_nr_pages(page); + int order = compound_order(page); - VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); + VM_BUG_ON_PAGE(order && !PageTransHuge(page), page); /* Do not migrate THP mapped by multiple processes */ if (PageTransHuge(page) && total_mapcount(page) > 1) return 0; /* Avoid migrating to a node that is nearly full */ - if 
(!migrate_balanced_pgdat(pgdat, nr_pages)) + if (!migrate_balanced_pgdat(pgdat, nr_pages)) { + int z; + + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)) + return 0; + for (z = pgdat->nr_zones - 1; z >= 0; z--) { + if (populated_zone(pgdat->node_zones + z)) + break; + } + wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE); return 0; + } if (isolate_lru_page(page)) return 0; --- a/mm/page_alloc.c~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/mm/page_alloc.c @@ -8441,7 +8441,8 @@ static void __setup_per_zone_wmarks(void zone->watermark_boost = 0; zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp; - zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2; + zone->_watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp; + zone->_watermark[WMARK_PROMO] = high_wmark_pages(zone) + tmp; spin_unlock_irqrestore(&zone->lock, flags); } --- a/mm/vmscan.c~numa-balancing-optimize-page-placement-for-memory-tiering-system +++ a/mm/vmscan.c @@ -56,6 +56,7 @@ #include #include +#include #include "internal.h" @@ -3895,7 +3896,10 @@ static bool pgdat_balanced(pg_data_t *pg if (!managed_zone(zone)) continue; - mark = high_wmark_pages(zone); + if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) + mark = wmark_pages(zone, WMARK_PROMO); + else + mark = high_wmark_pages(zone); if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx)) return true; } From patchwork Tue Mar 22 21:46:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789240 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BC77C433F5 for ; Tue, 22 Mar 2022 21:46:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F21846B015C; Tue, 22 Mar 2022 17:46:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id ED2596B015D; Tue, 22 Mar 2022 17:46:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D996C6B015E; Tue, 22 Mar 2022 17:46:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id C90EE6B015C for ; Tue, 22 Mar 2022 17:46:30 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 9C7B424396 for ; Tue, 22 Mar 2022 21:46:30 +0000 (UTC) X-FDA: 79273356540.03.E411823 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf07.hostedemail.com (Postfix) with ESMTP id 13AFA4001E for ; Tue, 22 Mar 2022 21:46:29 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F368FB81D5F; Tue, 22 Mar 2022 21:46:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B01E4C340EC; Tue, 22 Mar 2022 21:46:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985587; bh=9IqJpby1rXcDo2CXiHZeKmOcsdDkqvvMLJ8P6jqL7js=; h=Date:To:From:In-Reply-To:Subject:From; b=NU13ZgZaOecsXMgL969FB7+Dl297YXkEvTedEQ4LZBaEGlT4wEsxmGCAJOdZtwpB2 
vHsINt3DiUbjgW7TeVCBlSGNcapqIDleBr0qzT/lFULeY0Q50c/6DxtMtfJiUVYBlc ywzQL9ynkOqN8Gu8/wH6aqviwoLhO6KX2+K/fYOQ= Date: Tue, 22 Mar 2022 14:46:27 -0700 To: ziy@nvidia.com,zhongjiang-ali@linux.alibaba.com,weixugc@google.com,shy828301@gmail.com,shakeelb@google.com,riel@surriel.com,rdunlap@infradead.org,peterz@infradead.org,osalvador@suse.de,mhocko@suse.com,mgorman@techsingularity.net,hannes@cmpxchg.org,feng.tang@intel.com,dave.hansen@linux.intel.com,baolin.wang@linux.alibaba.com,ying.huang@intel.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 156/227] memory tiering: skip to scan fast memory Message-Id: <20220322214627.B01E4C340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 13AFA4001E X-Stat-Signature: aar49cf98c5sbhwjdfb6k35qyccms139 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=NU13ZgZa; dmarc=none; spf=pass (imf07.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985589-573156 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Huang Ying Subject: memory tiering: skip to scan fast memory If the NUMA balancing isn't used to optimize page placement among sockets but only among memory types, the hot pages in the fast memory node cannot be migrated (promoted) anywhere. So it's unnecessary to scan the pages in the fast memory node by changing their PTE/PMD mappings to PROT_NONE, and the page faults can be avoided too. In the test, if only the memory tiering NUMA balancing mode is enabled, the number of NUMA balancing hint faults for the DRAM node is reduced to almost 0 with the patch, while the benchmark score doesn't change visibly. Link: https://lkml.kernel.org/r/20220221084529.1052339-4-ying.huang@intel.com Signed-off-by: "Huang, Ying" Suggested-by: Dave Hansen Tested-by: Baolin Wang Reviewed-by: Baolin Wang Acked-by: Johannes Weiner Reviewed-by: Oscar Salvador Reviewed-by: Yang Shi Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Zi Yan Cc: Wei Xu Cc: Shakeel Butt Cc: zhongjiang-ali Cc: Feng Tang Cc: Randy Dunlap Signed-off-by: Andrew Morton --- mm/huge_memory.c | 30 +++++++++++++++++++++--------- mm/mprotect.c | 13 ++++++++++++- 2 files changed, 33 insertions(+), 10 deletions(-) --- a/mm/huge_memory.c~memory-tiering-skip-to-scan-fast-memory +++ a/mm/huge_memory.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -1766,17 +1767,28 @@ int change_huge_pmd(struct vm_area_struc } #endif - /* - * Avoid trapping faults against the zero page. The read-only - * data is likely to be read-cached on the local CPU and - * local/remote hits to the zero page are not interesting. - */ - if (prot_numa && is_huge_zero_pmd(*pmd)) - goto unlock; + if (prot_numa) { + struct page *page; + /* + * Avoid trapping faults against the zero page. The read-only + * data is likely to be read-cached on the local CPU and + * local/remote hits to the zero page are not interesting.
+ */ + if (is_huge_zero_pmd(*pmd)) + goto unlock; - if (prot_numa && pmd_protnone(*pmd)) - goto unlock; + if (pmd_protnone(*pmd)) + goto unlock; + page = pmd_page(*pmd); + /* + * Skip scanning top tier node if normal numa + * balancing is disabled + */ + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && + node_is_toptier(page_to_nid(page))) + goto unlock; + } /* * In case prot_numa, we are under mmap_read_lock(mm). It's critical * to not clear pmd intermittently to avoid race with MADV_DONTNEED --- a/mm/mprotect.c~memory-tiering-skip-to-scan-fast-memory +++ a/mm/mprotect.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -83,6 +84,7 @@ static unsigned long change_pte_range(st */ if (prot_numa) { struct page *page; + int nid; /* Avoid TLB flush if possible */ if (pte_protnone(oldpte)) @@ -109,7 +111,16 @@ static unsigned long change_pte_range(st * Don't mess with PTEs if page is already on the node * a single-threaded process is running on. */ - if (target_node == page_to_nid(page)) + nid = page_to_nid(page); + if (target_node == nid) + continue; + + /* + * Skip scanning top tier node if normal numa + * balancing is disabled + */ + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && + node_is_toptier(nid)) continue; } From patchwork Tue Mar 22 21:46:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789241 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3859AC433EF for ; Tue, 22 Mar 2022 21:46:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C64546B015E; Tue, 22 Mar 2022 17:46:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B545B6B015F; Tue, 22 Mar 2022 17:46:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A16DC6B0160; Tue, 22 Mar 2022 17:46:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0093.hostedemail.com [216.40.44.93]) by kanga.kvack.org (Postfix) with ESMTP id 89DC26B015E for ; Tue, 22 Mar 2022 17:46:32 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 4EA2AA32B2 for ; Tue, 22 Mar 2022 21:46:32 +0000 (UTC) X-FDA: 79273356624.23.0707DFE Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf27.hostedemail.com (Postfix) with ESMTP id DEC4F4003B for ; Tue, 22 Mar 2022 21:46:31 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5F18861661; Tue, 22 Mar 2022 21:46:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B1191C340EE; Tue, 22 Mar 2022 21:46:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985590; bh=xaK+sbY5miXqyqOSu9bwyslH+MF1ZJGfIkEfy8Dq/fE=; h=Date:To:From:In-Reply-To:Subject:From; b=KaTZQrBxUn7M4axLQnzaI3F+LIAs4kV4m/L5UN/n+rpQiUhh1TjYhBdWKIlnENeQz uHgZcga2OmYVFeqSTY0l+KjPv9VPrpfoWvhiO1/S21T1eTldaaWDjztfYgRU5h3AHS qOibatlrMN0G/RXSSOVBSyZa0/uOOQUH4yVY+KKA= Date: Tue, 22 Mar 2022 14:46:30 -0700 To: 
From patchwork Tue Mar 22 21:46:30 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789241
Date: Tue, 22 Mar 2022 14:46:30 -0700
From: Andrew Morton
Subject: [patch 157/227] mm: page_io: fix psi memory pressure error on cold swapins
Message-Id: <20220322214630.B1191C340EE@smtp.kernel.org>

From: Johannes Weiner
Subject: mm: page_io: fix psi memory pressure error on cold swapins

Once upon a time, all swapins counted toward memory pressure[1].  Then
Joonsoo introduced workingset detection for anonymous pages and we gained
the ability to distinguish hot from cold swapins[2][3].  But we failed to
update swap_readpage() accordingly, and now we account partial memory
pressure in the swapin path of cold memory.

Not for all situations - which adds more inconsistency: paths using the
conventional submit_bio() and lock_page() route will not see much
pressure - unless storage itself is heavily congested and the bio
submissions stall.  ZRAM and ZSWAP do most of the work directly from
swap_readpage() and will see all swapins reflected as pressure.

IOW, a workload doing cold swapins could see little to no pressure
reported with on-disk swap, but potentially high pressure with a zram or
zswap backend.  That confuses any psi-based health monitoring, load
shedding, proactive reclaim, or userspace OOM killing schemes that might
be in place for the workload.

Restore consistency by making all swapin stall accounting conditional on
the page actually being part of the workingset.

[1] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
[3] commit cad8320b4b39 ("mm/swap: don't SetPageWorkingset unconditionally during swapin")

Link: https://lkml.kernel.org/r/20220214214921.419687-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Reported-by: CGEL
Acked-by: Minchan Kim
Cc: Joonsoo Kim
Cc: Yu Zhao
Signed-off-by: Andrew Morton
---

 mm/page_io.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/page_io.c~mm-page_io-fix-psi-memory-pressure-error-on-cold-swapins
+++ a/mm/page_io.c
@@ -359,6 +359,7 @@ int swap_readpage(struct page *page, boo
 	struct bio *bio;
 	int ret = 0;
 	struct swap_info_struct *sis = page_swap_info(page);
+	bool workingset = PageWorkingset(page);
 	unsigned long pflags;
 
 	VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
@@ -370,7 +371,8 @@ int swap_readpage(struct page *page, boo
 	 * or the submitting cgroup IO-throttled, submission can be a
 	 * significant part of overall IO time.
 	 */
-	psi_memstall_enter(&pflags);
+	if (workingset)
+		psi_memstall_enter(&pflags);
 	delayacct_swapin_start();
 
 	if (frontswap_load(page) == 0) {
@@ -433,7 +435,8 @@ int swap_readpage(struct page *page, boo
 	bio_put(bio);
 out:
-	psi_memstall_leave(&pflags);
+	if (workingset)
+		psi_memstall_leave(&pflags);
 	delayacct_swapin_end();
 	return ret;
 }
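For illustration, the net effect on swap_readpage() is the guard pattern
below - a condensed sketch with the actual I/O submission elided, not an
exact excerpt of the function:

	bool workingset = PageWorkingset(page);
	unsigned long pflags;

	if (workingset)			/* hot swapin: workingset refault */
		psi_memstall_enter(&pflags);
	/* ... synchronous (zram/zswap) or bio-based swapin I/O ... */
	if (workingset)
		psi_memstall_leave(&pflags);

Cold swapins (!PageWorkingset) now contribute nothing to psi, regardless
of whether the backend completes the read synchronously or via
submit_bio().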
From patchwork Tue Mar 22 21:46:33 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789242
Date: Tue, 22 Mar 2022 14:46:33 -0700
From: Andrew Morton
Subject: [patch 158/227] mm/vmstat: add event for ksm swapping in copy
Message-Id: <20220322214633.AAEE2C340F4@smtp.kernel.org>

From: Yang Yang
Subject: mm/vmstat: add event for ksm swapping in copy

When a page that used to be a KSM page is faulted in from swap, and that
page had been swapped in before, the system has to make a fresh copy and
leave remerging of the pages to a later pass of ksmd.  That is not good
for performance, so we'd better reduce this kind of copying.  There are
some ways to reduce it, for example lowering swappiness or applying
madvise(addr, length, MADV_MERGEABLE) to the affected range.  So add this
event to support doing such tuning, just like the earlier patch "mm, THP,
swap: add THP swapping out fallback counting".

Link: https://lkml.kernel.org/r/20220113023839.758845-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang
Reviewed-by: Ran Xiaokai
Cc: Hugh Dickins
Cc: Yang Shi
Cc: Dave Hansen
Cc: Saravanan D
Signed-off-by: Andrew Morton
---

 include/linux/vm_event_item.h |    3 +++
 mm/ksm.c                      |    3 +++
 mm/vmstat.c                   |    3 +++
 3 files changed, 9 insertions(+)

--- a/include/linux/vm_event_item.h~mm-vmstat-add-event-for-ksm-swapping-in-copy
+++ a/include/linux/vm_event_item.h
@@ -129,6 +129,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 #ifdef CONFIG_SWAP
 		SWAP_RA,
 		SWAP_RA_HIT,
+#ifdef CONFIG_KSM
+		KSM_SWPIN_COPY,
+#endif
 #endif
 #ifdef CONFIG_X86
 		DIRECT_MAP_LEVEL2_SPLIT,
--- a/mm/ksm.c~mm-vmstat-add-event-for-ksm-swapping-in-copy
+++ a/mm/ksm.c
@@ -2595,6 +2595,9 @@ struct page *ksm_might_need_to_copy(stru
 		SetPageDirty(new_page);
 		__SetPageUptodate(new_page);
 		__SetPageLocked(new_page);
+#ifdef CONFIG_SWAP
+		count_vm_event(KSM_SWPIN_COPY);
+#endif
 	}
 
 	return new_page;
--- a/mm/vmstat.c~mm-vmstat-add-event-for-ksm-swapping-in-copy
+++ a/mm/vmstat.c
@@ -1388,6 +1388,9 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_SWAP
 	"swap_ra",
 	"swap_ra_hit",
+#ifdef CONFIG_KSM
+	"ksm_swpin_copy",
+#endif
 #endif
 #ifdef CONFIG_X86
 	"direct_map_level2_splits",
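A hypothetical userspace helper for consuming the new counter - it only
assumes the standard "name value" line format of /proc/vmstat; nothing
here is part of the patch:

	#include <stdio.h>
	#include <string.h>

	/* Return the ksm_swpin_copy count, or -1 if unavailable. */
	static long read_ksm_swpin_copy(void)
	{
		char key[64];
		long val;
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f)
			return -1;
		while (fscanf(f, "%63s %ld", key, &val) == 2) {
			if (!strcmp(key, "ksm_swpin_copy")) {
				fclose(f);
				return val;
			}
		}
		fclose(f);
		return -1;
	}

Sampling this before and after a tuning change (e.g. lower swappiness)
shows whether the rate of KSM swapin copies actually went down.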
From patchwork Tue Mar 22 21:46:35 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789243
Date: Tue, 22 Mar 2022 14:46:35 -0700
From: Andrew Morton
Subject: [patch 159/227] mm/ksm: use helper macro __ATTR_RW
Message-Id: <20220322214636.9103FC340EC@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/ksm: use helper macro __ATTR_RW

Use the helper macro __ATTR_RW to define KSM_ATTR to make the code
clearer.  Minor readability improvement.

Link: https://lkml.kernel.org/r/20220221115809.26381-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Signed-off-by: Andrew Morton
---

 mm/ksm.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/ksm.c~mm-ksm-use-helper-macro-__attr_rw
+++ a/mm/ksm.c
@@ -2829,8 +2829,7 @@ static void wait_while_offlining(void)
 #define KSM_ATTR_RO(_name) \
 	static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
 #define KSM_ATTR(_name) \
-	static struct kobj_attribute _name##_attr = \
-		__ATTR(_name, 0644, _name##_show, _name##_store)
+	static struct kobj_attribute _name##_attr = __ATTR_RW(_name)
 
 static ssize_t sleep_millisecs_show(struct kobject *kobj,
 				    struct kobj_attribute *attr, char *buf)
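For reference, in current kernels include/linux/sysfs.h defines
__ATTR_RW(_name) as __ATTR(_name, 0644, _name##_show, _name##_store), so
the replacement is a drop-in.  A sketch of what an invocation still
expands to:

	KSM_ATTR(sleep_millisecs);
	/* expands (via __ATTR_RW) to roughly:
	 *
	 * static struct kobj_attribute sleep_millisecs_attr =
	 *	__ATTR(sleep_millisecs, 0644,
	 *	       sleep_millisecs_show, sleep_millisecs_store);
	 */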
From patchwork Tue Mar 22 21:46:38 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789244
Date: Tue, 22 Mar 2022 14:46:38 -0700
From: Andrew Morton
Subject: [patch 160/227] mm/hwpoison: check the subpage, not the head page
Message-Id: <20220322214639.8E875C340EC@smtp.kernel.org>

From: "Matthew Wilcox (Oracle)"
Subject: mm/hwpoison: check the subpage, not the head page

Hardware poison is tracked on a per-page basis, not on the head page.

Link: https://lkml.kernel.org/r/20220130013042.1906881-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle)
Acked-by: Naoya Horiguchi
Reviewed-by: Yang Shi
Cc: David Rientjes
Cc: Mike Kravetz
Signed-off-by: Andrew Morton
---

 mm/rmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/rmap.c~mm-hwpoison-check-the-subpage-not-the-head-page
+++ a/mm/rmap.c
@@ -1553,7 +1553,7 @@ static bool try_to_unmap_one(struct page
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
 
-		if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
+		if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (PageHuge(page)) {
 				hugetlb_count_sub(compound_nr(page), mm);
@@ -1873,7 +1873,7 @@ static bool try_to_migrate_one(struct pa
 			 * memory are supported.
 			 */
 			subpage = page;
-		} else if (PageHWPoison(page)) {
+		} else if (PageHWPoison(subpage)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (PageHuge(page)) {
 				hugetlb_count_sub(compound_nr(page), mm);
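To make the head/subpage distinction concrete, an illustrative sketch
(the offset arithmetic is schematic, not an excerpt from rmap.c):

	/*
	 * Poison is recorded on the exact base page that failed.  For a
	 * compound page (e.g. a THP), that may be a tail page, which
	 * PageHWPoison() on the head does not reflect.
	 */
	struct page *head = compound_head(page);
	struct page *subpage = head + offset_within_compound;	/* schematic */

	PageHWPoison(head);	/* old check: misses poison on a tail page */
	PageHWPoison(subpage);	/* new check: tests the page being unmapped */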
From patchwork Tue Mar 22 21:46:41 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789245
Date: Tue, 22 Mar 2022 14:46:41 -0700
From: Andrew Morton
Subject: [patch 161/227] mm/madvise: use vma_lookup() instead of find_vma()
Message-Id: <20220322214642.7F6CEC340EC@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/madvise: use vma_lookup() instead of find_vma()

Using vma_lookup() verifies that the start address is contained in the
found vma.  This makes the code easier to read.

Link: https://lkml.kernel.org/r/20220311082731.63513-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Reviewed-by: David Hildenbrand
Signed-off-by: Andrew Morton
---

 mm/madvise.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/madvise.c~mm-madvise-use-vma_lookup-instead-of-find_vma
+++ a/mm/madvise.c
@@ -849,8 +849,8 @@ static long madvise_populate(struct vm_a
 		 * our VMA might have been split.
 		 */
 		if (!vma || start >= vma->vm_end) {
-			vma = find_vma(mm, start);
-			if (!vma || start < vma->vm_start)
+			vma = vma_lookup(mm, start);
+			if (!vma)
 				return -ENOMEM;
 		}
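Side by side, the two patterns (sketch; recall that find_vma() returns
the first vma with vm_end > start, which may lie entirely above start):

	/* open-coded pattern being replaced */
	vma = find_vma(mm, start);
	if (!vma || start < vma->vm_start)	/* hole below the vma? */
		return -ENOMEM;

	/* equivalent, with the containment check folded in */
	vma = vma_lookup(mm, start);		/* NULL unless start is inside */
	if (!vma)
		return -ENOMEM;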
From patchwork Tue Mar 22 21:46:44 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789246
Date: Tue, 22 Mar 2022 14:46:44 -0700
From: Andrew Morton
Subject: [patch 162/227] mm: madvise: return correct bytes advised with process_madvise
Message-Id: <20220322214645.900B1C340F2@smtp.kernel.org>

From: Charan Teja Kalla
Subject: mm: madvise: return correct bytes advised with process_madvise

Patch series "mm: madvise: return correct bytes processed with
process_madvise", v2.

With process_madvise(), always choose to return the number of bytes
processed (when non-zero) over an error.  This helps the user know on
which VMA in the passed 'struct iovec' vector list the advice failed, so
they can decide whether to retry or skip that VMA.

This patch (of 2):

The process_madvise() system call returns an error even after processing
some of the VMAs passed in the 'struct iovec' vector list, which leaves
the user unsure where to restart the advice next.  It also contradicts
the syscall's man page[1], which documents that the "return value may be
less than the total number of requested bytes, if an error occurred after
some iovec elements were already processed.".

Consider a user who passes 10 VMAs in the 'struct iovec' vector list, of
which 9 are processed and one fails.  The syscall then just returns the
error raised on the failed VMA, despite the first 9 VMAs having been
processed, leaving the user unsure which VMA failed.  Returning the
number of bytes processed instead tells the user where the failure
occurred, so the advice on that VMA can be retried or skipped.

[1]https://man7.org/linux/man-pages/man2/process_madvise.2.html.

Link: https://lkml.kernel.org/r/cover.1647008754.git.quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/125b61a0edcee5c2db8658aed9d06a43a19ccafc.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Cc: David Rientjes
Cc: Stephen Rothwell
Cc: Minchan Kim
Cc: Nadav Amit
Cc: Michal Hocko
Cc:
Signed-off-by: Andrew Morton
---

 mm/madvise.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/madvise.c~mm-madvise-return-correct-bytes-advised-with-process_madvise
+++ a/mm/madvise.c
@@ -1435,8 +1435,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 		iov_iter_advance(&iter, iovec.iov_len);
 	}
 
-	if (ret == 0)
-		ret = total_len - iov_iter_count(&iter);
+	ret = (total_len - iov_iter_count(&iter)) ? : ret;
 
 release_mm:
 	mmput(mm);
From patchwork Tue Mar 22 21:46:48 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789248
Date: Tue, 22 Mar 2022 14:46:48 -0700
From: Andrew Morton
Subject: [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise
Message-Id: <20220322214648.AB7A1C340EC@smtp.kernel.org>

From: Charan Teja Kalla
Subject: mm: madvise: skip unmapped vma holes passed to process_madvise

The process_madvise() system call is expected to skip holes in the VMA
ranges passed through the 'struct iovec' vector list.  But do_madvise(),
which process_madvise() calls for each range, returns ENOMEM when it
encounters unmapped holes, even though the VMA is otherwise processed.
Thus process_madvise() should treat ENOMEM as expected, consider the
passed VMA as processed, and continue processing the other VMAs in the
vector list.  Returning -ENOMEM to the user even though the VMA was
processed leaves the user unable to figure out where to start the next
madvise.

Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla
Cc: David Rientjes
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Nadav Amit
Cc: Stephen Rothwell
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
---

 mm/madvise.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
+++ a/mm/madvise.c
@@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
+		/*
+		 * do_madvise returns ENOMEM if unmapped holes are present
+		 * in the passed VMA. process_madvise() is expected to skip
+		 * unmapped holes passed to it in the 'struct iovec' list
+		 * and not fail because of them. Thus treat -ENOMEM return
+		 * from do_madvise as valid and continue processing.
+		 */
 		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
 				iovec.iov_len, behavior);
-		if (ret < 0)
+		if (ret < 0 && ret != -ENOMEM)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);
 	}
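Taken together with the previous patch, the syscall's main loop ends up
with these semantics - a condensed sketch of the two diffs above, not new
logic:

	while (iov_iter_count(&iter)) {
		iovec = iov_iter_iovec(&iter);
		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
				 iovec.iov_len, behavior);
		if (ret < 0 && ret != -ENOMEM)	/* holes are not fatal */
			break;
		iov_iter_advance(&iter, iovec.iov_len);
	}
	/* report partial progress in preference to the raw error code */
	ret = (total_len - iov_iter_count(&iter)) ? : ret;

(The "x ? : y" form is the GCC conditional-with-omitted-operand
extension, common in kernel code: it yields x when x is non-zero,
otherwise y.)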
From patchwork Tue Mar 22 21:46:51 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789247
Date: Tue, 22 Mar 2022 14:46:51 -0700
From: Andrew Morton
Subject: [patch 164/227] mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG
Message-Id: <20220322214651.D1831C340F2@smtp.kernel.org>

From: Michal Hocko
Subject: mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG

Patch series "mm, memory_hotplug: handle uninitialized numa node
gracefully".

The core of the fix is patch 2, which also links existing bug reports.
The high level goal is to have all possible numa nodes have their pgdat
allocated and initialized, so for_each_possible_node(nid) NODE_DATA(nid)
will never return garbage.  This has proven to be a problem in several
places when an offline numa node is used for an allocation, just to
realize that node_data and therefore allocation fallback zonelists are
not initialized and such an allocation request blows up.

There were attempts to address that by checking node_online in several
places, including the page allocator.  This patchset approaches the
problem from a different perspective and, instead of special casing,
which just adds a runtime overhead, it allocates pglist_data for each
possible node.  This can add some memory overhead for platforms with a
high number of possible nodes if they do not contain any memory.  This
should be a rather rare configuration though.

How to test this?  David has provided an excellent howto:
http://lkml.kernel.org/r/6e5ebc19-890c-b6dd-1924-9f25c441010d@redhat.com

Patches 1 and 3-6 are mostly cleanups.  The patchset has been reviewed by
Rafael (thanks!) and the core fix tested by Rafael and Alexey (thanks to
both).  David has tested as per instructions above and hasn't found any
fallouts in the memory hotplug scenarios.

This patch (of 6):

This is a preparatory patch and it doesn't introduce any functional
change.  It merely pulls out arch_alloc_nodedata (and co) outside of
CONFIG_MEMORY_HOTPLUG because the following patch will need to call this
from the generic MM code.
Link: https://lkml.kernel.org/r/20220127085305.20890-1-mhocko@kernel.org
Link: https://lkml.kernel.org/r/20220127085305.20890-2-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Rafael Aquini
Acked-by: David Hildenbrand
Acked-by: Mike Rapoport
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
Cc: Alexey Makhalov
Cc: Christoph Lameter
Cc: Dennis Zhou
Cc: Eric Dumazet
Cc: Nico Pache
Cc: Tejun Heo
Signed-off-by: Andrew Morton
---

 arch/ia64/mm/discontig.c       |    2 
 include/linux/memory_hotplug.h |  119 +++++++++++++++----------------
 2 files changed, 59 insertions(+), 62 deletions(-)

--- a/arch/ia64/mm/discontig.c~mm-memory_hotplug-make-arch_alloc_nodedata-independent-on-config_memory_hotplug
+++ a/arch/ia64/mm/discontig.c
@@ -608,7 +608,6 @@ void __init paging_init(void)
 	zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
 pg_data_t *arch_alloc_nodedata(int nid)
 {
 	unsigned long size = compute_pernodesize(nid);
@@ -626,7 +625,6 @@ void arch_refresh_nodedata(int update_no
 	pgdat_list[update_node] = update_pgdat;
 	scatter_node_data();
 }
-#endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-make-arch_alloc_nodedata-independent-on-config_memory_hotplug
+++ a/include/linux/memory_hotplug.h
@@ -16,6 +16,65 @@ struct memory_group;
 struct resource;
 struct vmem_altmap;
 
+#ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
+/*
+ * For supporting node-hotadd, we have to allocate a new pgdat.
+ *
+ * If an arch has generic style NODE_DATA(),
+ * node_data[nid] = kzalloc() works well. But it depends on the architecture.
+ *
+ * In general, generic_alloc_nodedata() is used.
+ * Now, arch_free_nodedata() is just defined for error path of node_hot_add.
+ *
+ */
+extern pg_data_t *arch_alloc_nodedata(int nid);
+extern void arch_free_nodedata(pg_data_t *pgdat);
+extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat);
+
+#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
+
+#define arch_alloc_nodedata(nid)	generic_alloc_nodedata(nid)
+#define arch_free_nodedata(pgdat)	generic_free_nodedata(pgdat)
+
+#ifdef CONFIG_NUMA
+/*
+ * XXX: node aware allocation can't work well to get new node's memory at this time.
+ *	Because, pgdat for the new node is not allocated/initialized yet itself.
+ *	To use new node's memory, more consideration will be necessary.
+ */
+#define generic_alloc_nodedata(nid)				\
+({								\
+	kzalloc(sizeof(pg_data_t), GFP_KERNEL);			\
+})
+/*
+ * This definition is just for error path in node hotadd.
+ * For node hotremove, we have to replace this.
+ */
+#define generic_free_nodedata(pgdat)	kfree(pgdat)
+
+extern pg_data_t *node_data[];
+static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
+{
+	node_data[nid] = pgdat;
+}
+
+#else /* !CONFIG_NUMA */
+
+/* never called */
+static inline pg_data_t *generic_alloc_nodedata(int nid)
+{
+	BUG();
+	return NULL;
+}
+static inline void generic_free_nodedata(pg_data_t *pgdat)
+{
+}
+static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
+{
+}
+#endif /* CONFIG_NUMA */
+#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 struct page *pfn_to_online_page(unsigned long pfn);
@@ -154,66 +213,6 @@ int add_pages(int nid, unsigned long sta
 	      struct mhp_params *params);
 #endif /* ARCH_HAS_ADD_PAGES */
 
-#ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
-/*
- * For supporting node-hotadd, we have to allocate a new pgdat.
- *
- * If an arch has generic style NODE_DATA(),
- * node_data[nid] = kzalloc() works well. But it depends on the architecture.
- *
- * In general, generic_alloc_nodedata() is used.
- * Now, arch_free_nodedata() is just defined for error path of node_hot_add.
- *
- */
-extern pg_data_t *arch_alloc_nodedata(int nid);
-extern void arch_free_nodedata(pg_data_t *pgdat);
-extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat);
-
-#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
-
-#define arch_alloc_nodedata(nid)	generic_alloc_nodedata(nid)
-#define arch_free_nodedata(pgdat)	generic_free_nodedata(pgdat)
-
-#ifdef CONFIG_NUMA
-/*
- * If ARCH_HAS_NODEDATA_EXTENSION=n, this func is used to allocate pgdat.
- * XXX: kmalloc_node() can't work well to get new node's memory at this time.
- *	Because, pgdat for the new node is not allocated/initialized yet itself.
- *	To use new node's memory, more consideration will be necessary.
- */
-#define generic_alloc_nodedata(nid)				\
-({								\
-	kzalloc(sizeof(pg_data_t), GFP_KERNEL);			\
-})
-/*
- * This definition is just for error path in node hotadd.
- * For node hotremove, we have to replace this.
- */
-#define generic_free_nodedata(pgdat)	kfree(pgdat)
-
-extern pg_data_t *node_data[];
-static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
-{
-	node_data[nid] = pgdat;
-}
-
-#else /* !CONFIG_NUMA */
-
-/* never called */
-static inline pg_data_t *generic_alloc_nodedata(int nid)
-{
-	BUG();
-	return NULL;
-}
-static inline void generic_free_nodedata(pg_data_t *pgdat)
-{
-}
-static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
-{
-}
-#endif /* CONFIG_NUMA */
-#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
-
 void get_online_mems(void);
 void put_online_mems(void);
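What this enables, sketched: generic boot code outside any
CONFIG_MEMORY_HOTPLUG guard can now allocate and publish a pgdat - an
illustrative fragment anticipating the next patch, not code from this
one:

	pg_data_t *pgdat = arch_alloc_nodedata(nid);

	if (!pgdat)
		return;				/* carry on without this node */
	arch_refresh_nodedata(nid, pgdat);	/* publish via NODE_DATA(nid) */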
From patchwork Tue Mar 22 21:46:54 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789249
Date: Tue, 22 Mar 2022 14:46:54 -0700
From: Andrew Morton
Subject: [patch 165/227] mm: handle uninitialized numa nodes gracefully
Message-Id: <20220322214655.05995C340EC@smtp.kernel.org>

From: Michal Hocko
Subject: mm: handle uninitialized numa nodes gracefully

We have had several reports [1][2][3] that the page allocator blows up
when an allocation from a possible node is requested.  The underlying
reason is that NODE_DATA for the specific node is not allocated.

NUMA specific initialization is arch specific and it can vary a lot.
E.g. x86 tries to initialize all nodes that have some cpu affinity (see
init_cpu_to_node) but this can be insufficient because the node might be
cpuless for example.

One way to address this problem would be to check for !node_online nodes
when trying to get a zonelist and silently fall back to another node.
That is unfortunately adding a branch into the allocator hot path and it
doesn't handle any other potential NODE_DATA users.

This patch takes a different approach (following a lead of [3]) and it
preallocates pgdat for all possible nodes in arch-independent code -
free_area_init.  All uninitialized nodes are treated as memoryless nodes.
node_state of the node is not changed because that would lead to other
side effects - e.g. sysfs representation of such a node - and from past
discussions [4] it is known that some tools might have problems digesting
that.

Newly allocated pgdat only gets a minimal initialization and the rest of
the work is expected to be done by the memory hotplug - hotadd_new_pgdat
(renamed to hotadd_init_pgdat).

generic_alloc_nodedata is changed to use the memblock allocator because
neither page nor slab allocators are available at the stage when all
pgdats are allocated.  Hotplug doesn't allocate pgdat anymore so we can
use the early boot allocator.  The only arch specific implementation is
ia64 and that is changed to use the early allocator as well.

[1] http://lkml.kernel.org/r/20211101201312.11589-1-amakhalov@vmware.com
[2] http://lkml.kernel.org/r/20211207224013.880775-1-npache@redhat.com
[3] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org
[4] http://lkml.kernel.org/r/20200428093836.27190-1-srikar@linux.vnet.ibm.com

[akpm@linux-foundation.org: replace comment, per Mike]
Link: https://lkml.kernel.org/r/Yfe7RBeLCijnWBON@dhcp22.suse.cz
Reported-by: Alexey Makhalov
Tested-by: Alexey Makhalov
Reported-by: Nico Pache
Acked-by: Rafael Aquini
Tested-by: Rafael Aquini
Acked-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Acked-by: Mike Rapoport
Signed-off-by: Michal Hocko
Cc: Christoph Lameter
Cc: Dennis Zhou
Cc: Eric Dumazet
Cc: Tejun Heo
Cc: Wei Yang
Signed-off-by: Andrew Morton
---

 arch/ia64/mm/discontig.c       |    4 +--
 include/linux/memory_hotplug.h |    2 -
 mm/internal.h                  |    2 +
 mm/memory_hotplug.c            |   21 ++++++----------
 mm/page_alloc.c                |   40 +++++++++++++++++++++++++++----
 5 files changed, 50 insertions(+), 19 deletions(-)

--- a/arch/ia64/mm/discontig.c~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/arch/ia64/mm/discontig.c
@@ -608,11 +608,11 @@ void __init paging_init(void)
 	zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
 }
 
-pg_data_t *arch_alloc_nodedata(int nid)
+pg_data_t * __init arch_alloc_nodedata(int nid)
 {
 	unsigned long size = compute_pernodesize(nid);
 
-	return kzalloc(size, GFP_KERNEL);
+	return memblock_alloc(size, SMP_CACHE_BYTES);
 }
 
 void arch_free_nodedata(pg_data_t *pgdat)
--- a/include/linux/memory_hotplug.h~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/include/linux/memory_hotplug.h
@@ -44,7 +44,7 @@ extern void arch_refresh_nodedata(int ni
  */
 #define generic_alloc_nodedata(nid)				\
 ({								\
-	kzalloc(sizeof(pg_data_t), GFP_KERNEL);			\
+	memblock_alloc(sizeof(*pgdat), SMP_CACHE_BYTES);	\
 })
 /*
  * This definition is just for error path in node hotadd.
--- a/mm/internal.h~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/mm/internal.h
@@ -707,4 +707,6 @@ void vunmap_range_noflush(unsigned long
 int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 		      unsigned long addr, int page_nid, int *flags);
 
+DECLARE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
+
 #endif	/* __MM_INTERNAL_H */
--- a/mm/memory_hotplug.c~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/mm/memory_hotplug.c
@@ -1162,19 +1162,21 @@ static void reset_node_present_pages(pg_
 }
 
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-static pg_data_t __ref *hotadd_new_pgdat(int nid)
+static pg_data_t __ref *hotadd_init_pgdat(int nid)
 {
 	struct pglist_data *pgdat;
 
 	pgdat = NODE_DATA(nid);
-	if (!pgdat) {
-		pgdat = arch_alloc_nodedata(nid);
-		if (!pgdat)
-			return NULL;
 
+	/*
+	 * NODE_DATA is preallocated (free_area_init) but its internal
+	 * state is not allocated completely. Add missing pieces.
+	 * Completely offline nodes stay around and they just need
+	 * reintialization.
+	 */
+	if (pgdat->per_cpu_nodestats == &boot_nodestats) {
 		pgdat->per_cpu_nodestats =
 			alloc_percpu(struct per_cpu_nodestat);
-		arch_refresh_nodedata(nid, pgdat);
 	} else {
 		int cpu;
 		/*
@@ -1193,8 +1195,6 @@ static pg_data_t __ref *hotadd_new_pgdat
 		}
 	}
 
-	/* we can use NODE_DATA(nid) from here */
-	pgdat->node_id = nid;
 	pgdat->node_start_pfn = 0;
 
 	/* init node's zones as empty zones, we don't have any present pages.*/
@@ -1246,7 +1246,7 @@ static int __try_online_node(int nid, bo
 	if (node_online(nid))
 		return 0;
 
-	pgdat = hotadd_new_pgdat(nid);
+	pgdat = hotadd_init_pgdat(nid);
 	if (!pgdat) {
 		pr_err("Cannot online node %d due to NULL pgdat\n", nid);
 		ret = -ENOMEM;
@@ -1445,9 +1445,6 @@ int __ref add_memory_resource(int nid, s
 	return ret;
 error:
-	/* rollback pgdat allocation and others */
-	if (new_node)
-		rollback_node_hotadd(nid);
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		memblock_remove(start, size);
 error_mem_hotplug_end:
--- a/mm/page_alloc.c~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/mm/page_alloc.c
@@ -6341,7 +6341,7 @@ static void per_cpu_pages_init(struct pe
 #define BOOT_PAGESET_BATCH	1
 static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset);
 static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats);
-static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
+DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
 static void __build_all_zonelists(void *data)
 {
@@ -6363,7 +6363,11 @@ static void __build_all_zonelists(void *
 	if (self && !node_online(self->node_id)) {
 		build_zonelists(self);
 	} else {
-		for_each_online_node(nid) {
+		/*
+		 * All possible nodes have pgdat preallocated
+		 * in free_area_init
+		 */
+		for_each_node(nid) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 
 			build_zonelists(pgdat);
@@ -8063,8 +8067,36 @@ void __init free_area_init(unsigned long
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
-	for_each_online_node(nid) {
-		pg_data_t *pgdat = NODE_DATA(nid);
+	for_each_node(nid) {
+		pg_data_t *pgdat;
+
+		if (!node_online(nid)) {
+			pr_info("Initializing node %d as memoryless\n", nid);
+
+			/* Allocator not initialized yet */
+			pgdat = arch_alloc_nodedata(nid);
+			if (!pgdat) {
+				pr_err("Cannot allocate %zuB for node %d.\n",
+						sizeof(*pgdat), nid);
+				continue;
+			}
+			arch_refresh_nodedata(nid, pgdat);
+			free_area_init_memoryless_node(nid);
+
+			/*
+			 * We do not want to confuse userspace by sysfs
+			 * files/directories for node without any memory
+			 * attached to it, so this node is not marked as
+			 * N_MEMORY and not marked online so that no sysfs
+			 * hierarchy will be created via register_one_node for
+			 * it. The pgdat will get fully initialized by
+			 * hotadd_init_pgdat() when memory is hotplugged into
+			 * this node.
+			 */
+			continue;
+		}
+
+		pgdat = NODE_DATA(nid);
 		free_area_init_node(nid);
 
 		/* Any memory on that node */
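The guarantee this buys, sketched: generic code may now iterate all
possible nodes and dereference NODE_DATA() without online checks - an
illustrative fragment mirroring the zonelist hunk above:

	int nid;

	for_each_node(nid) {		/* every *possible* node */
		pg_data_t *pgdat = NODE_DATA(nid);

		/*
		 * Never NULL after free_area_init(): offline/memoryless
		 * nodes carry a minimally initialized pgdat until
		 * hotadd_init_pgdat() completes it on hotplug.
		 */
		build_zonelists(pgdat);
	}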
From patchwork Tue Mar 22 21:46:57 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789250
Date: Tue, 22 Mar 2022 14:46:57 -0700
From: Andrew Morton
Subject: [patch 166/227] mm, memory_hotplug: drop arch_free_nodedata
Message-Id: <20220322214658.1B596C340EE@smtp.kernel.org>

From: Michal Hocko
Subject: mm, memory_hotplug: drop arch_free_nodedata

Prior to "mm: handle uninitialized numa nodes gracefully", memory hotplug
used to allocate the pgdat when memory was added to a node
(hotadd_init_pgdat).  arch_free_nodedata has only ever been used in the
failure path, because once the pgdat is exported (made visible via
NODE_DATA(nid)) it cannot really be freed - there is no synchronization
available for that.

pgdat is allocated for each possible node now, so memory hotplug never
needs to use arch_free_nodedata; drop it.

This patch doesn't introduce any functional change.

Link: https://lkml.kernel.org/r/20220127085305.20890-4-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Rafael Aquini
Acked-by: David Hildenbrand
Acked-by: Mike Rapoport
Reviewed-by: Oscar Salvador
Cc: Alexey Makhalov
Cc: Christoph Lameter
Cc: Dennis Zhou
Cc: Eric Dumazet
Cc: Nico Pache
Cc: Tejun Heo
Cc: Wei Yang
Signed-off-by: Andrew Morton
---

 arch/ia64/mm/discontig.c       |    5 -----
 include/linux/memory_hotplug.h |    3 ---
 mm/memory_hotplug.c            |   10 ----------
 3 files changed, 18 deletions(-)

--- a/arch/ia64/mm/discontig.c~mm-memory_hotplug-drop-arch_free_nodedata
+++ a/arch/ia64/mm/discontig.c
@@ -615,11 +615,6 @@ pg_data_t * __init arch_alloc_nodedata(i
 	return memblock_alloc(size, SMP_CACHE_BYTES);
 }
 
-void arch_free_nodedata(pg_data_t *pgdat)
-{
-	kfree(pgdat);
-}
-
 void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
 {
 	pgdat_list[update_node] = update_pgdat;
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-drop-arch_free_nodedata
+++ a/include/linux/memory_hotplug.h
@@ -24,17 +24,14 @@ struct vmem_altmap;
  * node_data[nid] = kzalloc() works well. But it depends on the architecture.
  *
  * In general, generic_alloc_nodedata() is used.
- * Now, arch_free_nodedata() is just defined for error path of node_hot_add.
  *
 */
 extern pg_data_t *arch_alloc_nodedata(int nid);
-extern void arch_free_nodedata(pg_data_t *pgdat);
 extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat);
 
 #else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
 #define arch_alloc_nodedata(nid)	generic_alloc_nodedata(nid)
-#define arch_free_nodedata(pgdat)	generic_free_nodedata(pgdat)
 
 #ifdef CONFIG_NUMA
 /*
--- a/mm/memory_hotplug.c~mm-memory_hotplug-drop-arch_free_nodedata
+++ a/mm/memory_hotplug.c
@@ -1217,16 +1217,6 @@ static pg_data_t __ref *hotadd_init_pgda
 	return pgdat;
 }
 
-static void rollback_node_hotadd(int nid)
-{
-	pg_data_t *pgdat = NODE_DATA(nid);
-
-	arch_refresh_nodedata(nid, NULL);
-	free_percpu(pgdat->per_cpu_nodestats);
-	arch_free_nodedata(pgdat);
-}
-
-
 /*
 * __try_online_node - online a node if offlined
 * @nid: the node ID
From patchwork Tue Mar 22 21:47:00 2022 Date: Tue, 22 Mar 2022 14:47:00 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 167/227] mm, memory_hotplug: reorganize new pgdat initialization Message-Id: <20220322214701.383FDC340EC@smtp.kernel.org>
From: Michal Hocko Subject: mm, memory_hotplug: reorganize new pgdat initialization
When a !node_online node is brought up, it needs hotplug-specific initialization, because the node could be either still uninitialized or recycled after a previous hotremove. hotadd_init_pgdat is responsible for that. Internal pgdat state is currently initialized in two places: - hotadd_init_pgdat - free_area_init_core_hotplug There is no real clear cut for what should go where, but this patch chooses to move the whole internal state initialization into free_area_init_core_hotplug. hotadd_init_pgdat is still responsible for pulling all the parts together - most notably for initializing zonelists, because those depend on the overall topology. This patch doesn't introduce any functional change.
Link: https://lkml.kernel.org/r/20220127085305.20890-5-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Rafael Aquini Acked-by: David Hildenbrand Reviewed-by: Oscar Salvador Cc: Alexey Makhalov Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Dumazet Cc: Mike Rapoport Cc: Nico Pache Cc: Tejun Heo Cc: Wei Yang Signed-off-by: Andrew Morton
--- include/linux/memory_hotplug.h | 2 +- mm/memory_hotplug.c | 28 +++------------------------- mm/page_alloc.c | 25 +++++++++++++++++++++++-- 3 files changed, 27 insertions(+), 28 deletions(-)
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-reorganize-new-pgdat-initialization +++ a/include/linux/memory_hotplug.h @@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct z extern void clear_zone_contiguous(struct zone *zone); #ifdef CONFIG_MEMORY_HOTPLUG -extern void __ref free_area_init_core_hotplug(int nid); +extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat); extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags); extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags); extern int add_memory_resource(int nid, struct resource *resource,
--- a/mm/memory_hotplug.c~mm-memory_hotplug-reorganize-new-pgdat-initialization +++ a/mm/memory_hotplug.c @@ -1166,39 +1166,16 @@ static pg_data_t __ref *hotadd_init_pgda { struct pglist_data *pgdat; - pgdat = NODE_DATA(nid); - /* * NODE_DATA is preallocated (free_area_init) but its internal * state is not allocated completely. Add missing pieces. * Completely offline nodes stay around and they just need * reintialization. */ - if (pgdat->per_cpu_nodestats == &boot_nodestats) { - pgdat->per_cpu_nodestats = - alloc_percpu(struct per_cpu_nodestat); - } else { - int cpu; - /* - * Reset the nr_zones, order and highest_zoneidx before reuse. - * Note that kswapd will init kswapd_highest_zoneidx properly - * when it starts in the near future.
- */ - pgdat->nr_zones = 0; - pgdat->kswapd_order = 0; - pgdat->kswapd_highest_zoneidx = 0; - for_each_online_cpu(cpu) { - struct per_cpu_nodestat *p; - - p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); - memset(p, 0, sizeof(*p)); - } - } - - pgdat->node_start_pfn = 0; + pgdat = NODE_DATA(nid); /* init node's zones as empty zones, we don't have any present pages.*/ - free_area_init_core_hotplug(nid); + free_area_init_core_hotplug(pgdat); /* * The node we allocated has no zone fallback lists. For avoiding @@ -1210,6 +1187,7 @@ static pg_data_t __ref *hotadd_init_pgda * When memory is hot-added, all the memory is in offline state. So * clear all zones' present_pages because they will be updated in * online_pages() and offline_pages(). + * TODO: should be in free_area_init_core_hotplug? */ reset_node_managed_pages(pgdat); reset_node_present_pages(pgdat);
--- a/mm/page_alloc.c~mm-memory_hotplug-reorganize-new-pgdat-initialization +++ a/mm/page_alloc.c @@ -7466,12 +7466,33 @@ static void __meminit zone_init_internal * NOTE: this function is only called during memory hotplug */ #ifdef CONFIG_MEMORY_HOTPLUG -void __ref free_area_init_core_hotplug(int nid) +void __ref free_area_init_core_hotplug(struct pglist_data *pgdat) { + int nid = pgdat->node_id; enum zone_type z; - pg_data_t *pgdat = NODE_DATA(nid); + int cpu; pgdat_init_internals(pgdat); + + if (pgdat->per_cpu_nodestats == &boot_nodestats) + pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); + + /* + * Reset the nr_zones, order and highest_zoneidx before reuse. + * Note that kswapd will init kswapd_highest_zoneidx properly + * when it starts in the near future. + */ + pgdat->nr_zones = 0; + pgdat->kswapd_order = 0; + pgdat->kswapd_highest_zoneidx = 0; + pgdat->node_start_pfn = 0; + for_each_online_cpu(cpu) { + struct per_cpu_nodestat *p; + + p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); + memset(p, 0, sizeof(*p)); + } + for (z = 0; z < MAX_NR_ZONES; z++) zone_init_internals(&pgdat->node_zones[z], z, nid, 0); }
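The resulting division of labor can be condensed into a short sketch (illustrative only; the authoritative bodies are in the hunks above): free_area_init_core_hotplug() now owns all internal pgdat state, while hotadd_init_pgdat() only stitches together the topology-dependent parts.

static pg_data_t * __ref hotadd_init_pgdat_sketch(int nid)
{
	pg_data_t *pgdat = NODE_DATA(nid);	/* preallocated by free_area_init() */

	free_area_init_core_hotplug(pgdat);	/* all internal state, incl. percpu nodestats */
	build_all_zonelists(pgdat);		/* topology-dependent zone fallback lists */
	reset_node_managed_pages(pgdat);	/* hot-added memory starts out offline */
	reset_node_present_pages(pgdat);
	return pgdat;
}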
From patchwork Tue Mar 22 21:47:03 2022 Date: Tue, 22 Mar 2022 14:47:03 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 168/227] mm: make free_area_init_node aware of memory less nodes Message-Id: <20220322214704.4F7C7C340EE@smtp.kernel.org>
From: Michal Hocko Subject: mm: make free_area_init_node aware of memory less nodes
free_area_init_node is also called from the memory-less node initialization path (free_area_init_memoryless_node). It doesn't really make much sense to display the physical memory range for those nodes: Initmem setup node XX [mem 0x0000000000000000-0x0000000000000000] Instead, be explicit that the node is memoryless: Initmem setup node XX as memoryless
Link: https://lkml.kernel.org/r/20220127085305.20890-6-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Rafael Aquini Acked-by: David Hildenbrand Reviewed-by: Mike Rapoport Reviewed-by: Oscar Salvador Cc: Alexey Makhalov Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Dumazet Cc: Nico Pache Cc: Tejun Heo Cc: Wei Yang Signed-off-by: Andrew Morton
--- mm/page_alloc.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/mm/page_alloc.c~mm-make-free_area_init_node-aware-of-memory-less-nodes +++ a/mm/page_alloc.c @@ -7642,9 +7642,14 @@ static void __init free_area_init_node(i pgdat->node_start_pfn = start_pfn; pgdat->per_cpu_nodestats = NULL; - pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid, - (u64)start_pfn << PAGE_SHIFT, - end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0); + if (start_pfn != end_pfn) { + pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid, + (u64)start_pfn << PAGE_SHIFT, + end_pfn ?
((u64)end_pfn << PAGE_SHIFT) - 1 : 0); + } else { + pr_info("Initmem setup node %d as memoryless\n", nid); + } + calculate_node_totalpages(pgdat, start_pfn, end_pfn); alloc_node_mem_map(pgdat);
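For illustration, on a machine whose node 1 is possible but currently without memory, the two branches above would produce boot lines like the following (addresses invented; only the message shapes follow from the pr_info format strings in the hunk):

Initmem setup node 0 [mem 0x0000000000001000-0x000000023fffffff]
Initmem setup node 1 as memoryless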
From patchwork Tue Mar 22 21:47:06 2022 Date: Tue, 22 Mar 2022 14:47:06 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 169/227] memcg: do not tweak node in alloc_mem_cgroup_per_node_info Message-Id: <20220322214707.5F9C2C340F2@smtp.kernel.org>
From: Wei Yang Subject: memcg: do not tweak node in alloc_mem_cgroup_per_node_info
alloc_mem_cgroup_per_node_info is called for each possible node, and this used to be a problem because !node_online nodes didn't have the appropriate data structures allocated. This has changed with "mm: handle uninitialized numa nodes gracefully", so we can drop the special casing here.
Link: https://lkml.kernel.org/r/20220127085305.20890-7-mhocko@kernel.org Signed-off-by: Wei Yang Signed-off-by: Michal Hocko Cc: David Hildenbrand Cc: Alexey Makhalov Cc: Dennis Zhou Cc: Eric Dumazet Cc: Tejun Heo Cc: Christoph Lameter Cc: Nico Pache Cc: Wei Yang Cc: Mike Rapoport Cc: Oscar Salvador Cc: Rafael Aquini Signed-off-by: Andrew Morton
--- mm/memcontrol.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)
--- a/mm/memcontrol.c~memcg-do-not-tweak-node-in-alloc_mem_cgroup_per_node_info +++ a/mm/memcontrol.c @@ -5020,18 +5020,8 @@ struct mem_cgroup *mem_cgroup_from_id(un static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node) { struct mem_cgroup_per_node *pn; - int tmp = node; - /* - * This routine is called against possible nodes. - * But it's BUG to call kmalloc() against offline node. - * - * TODO: this routine can waste much memory for nodes which will - * never be onlined. It's better to use memory hotplug callback - * function. - */ - if (!node_state(node, N_NORMAL_MEMORY)) - tmp = -1; - pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp); + + pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node); if (!pn) return 1;
From patchwork Tue Mar 22 21:47:09 2022
Date: Tue, 22 Mar 2022 14:47:09 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 170/227] drivers/base/memory: add memory block to memory group after registration succeeded Message-Id: <20220322214710.78FFEC340EE@smtp.kernel.org>
From: David Hildenbrand Subject: drivers/base/memory: add memory block to memory group after registration succeeded
If register_memory() fails, we free the memory block although it has already been added to the group list; not good. Let's defer adding the block to the memory group until after registering the memory block device has succeeded. We do handle it properly during unregister_memory(), but that's not called when the registration fails.
Link: https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks") Signed-off-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Cc: Greg Kroah-Hartman Cc: "Rafael J.
Wysocki" Signed-off-by: Andrew Morton
--- drivers/base/memory.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded +++ a/drivers/base/memory.c @@ -665,14 +665,16 @@ static int init_memory_block(unsigned lo mem->nr_vmemmap_pages = nr_vmemmap_pages; INIT_LIST_HEAD(&mem->group_next); + ret = register_memory(mem); + if (ret) + return ret; + if (group) { mem->group = group; list_add(&mem->group_next, &group->memory_blocks); } - ret = register_memory(mem); - - return ret; + return 0; } static int add_memory_block(unsigned long base_section_nr)
From patchwork Tue Mar 22 21:47:13 2022
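The fix is an instance of a general ordering rule for failable initialization: perform the step that can fail before publishing the object on any list that only the teardown path would unwind. A minimal sketch of that pattern with hypothetical names (struct obj, struct group, register_obj and friends are illustrative, not kernel API):

#include <linux/list.h>

struct group {
	struct list_head members;
};

struct obj {
	struct list_head group_next;
	struct group *group;
};

int register_obj(struct obj *o);	/* may fail; releases o on failure */

static int init_obj(struct obj *o, struct group *g)
{
	int ret = register_obj(o);

	if (ret)
		return ret;	/* o was never linked: nothing to unwind */

	if (g) {
		o->group = g;
		list_add(&o->group_next, &g->members);	/* publish only after success */
	}
	return 0;
}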
Date: Tue, 22 Mar 2022 14:47:13 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 171/227] drivers/base/node: consolidate node device subsystem initialization in node_dev_init() Message-Id: <20220322214713.B60AFC340F2@smtp.kernel.org>
From: David Hildenbrand Subject: drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
... and call node_dev_init() after memory_dev_init() from driver_init(), so before any of the existing arch/subsys calls. All online nodes should be known at that point: early during boot, arch code determines node and zone ranges and sets the relevant nodes online; usually this happens in setup_arch(). This is in line with memory_dev_init(), which initializes the memory device subsystem and creates all memory block devices. Similar to memory_dev_init(), panic() if anything goes wrong; we don't want to continue with such basic initialization errors. The important part is that node_dev_init() gets called after memory_dev_init() and after cpu_dev_init(), but before any of the relevant archs call register_cpu() to register the new cpu device under the node device. The latter should be the case for the current users of topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Oscar Salvador Tested-by: Anatoly Pugachev (sparc64) Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: Oscar Salvador Cc: Mike Rapoport Cc: Catalin Marinas Cc: Will Deacon Cc: Thomas Bogendoerfer Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Albert Ou Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Yoshinori Sato Cc: Rich Felker Cc: "David S. Miller" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: "Rafael J.
Wysocki" Signed-off-by: Andrew Morton --- arch/arm64/kernel/setup.c | 3 --- arch/ia64/kernel/topology.c | 10 ---------- arch/mips/kernel/topology.c | 5 ----- arch/powerpc/kernel/sysfs.c | 17 ----------------- arch/riscv/kernel/setup.c | 3 --- arch/s390/kernel/numa.c | 7 ------- arch/sh/kernel/topology.c | 5 ----- arch/sparc/kernel/sysfs.c | 12 ------------ arch/x86/kernel/topology.c | 5 ----- drivers/base/init.c | 1 + drivers/base/node.c | 30 +++++++++++++++++------------- include/linux/node.h | 4 ++++ 12 files changed, 22 insertions(+), 80 deletions(-) --- a/arch/arm64/kernel/setup.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/arm64/kernel/setup.c @@ -406,9 +406,6 @@ static int __init topology_init(void) { int i; - for_each_online_node(i) - register_one_node(i); - for_each_possible_cpu(i) { struct cpu *cpu = &per_cpu(cpu_data.cpu, i); cpu->hotpluggable = cpu_can_disable(i); --- a/arch/ia64/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/ia64/kernel/topology.c @@ -70,16 +70,6 @@ static int __init topology_init(void) { int i, err = 0; -#ifdef CONFIG_NUMA - /* - * MCD - Do we want to register all ONLINE nodes, or all POSSIBLE nodes? - */ - for_each_online_node(i) { - if ((err = register_one_node(i))) - goto out; - } -#endif - sysfs_cpus = kcalloc(NR_CPUS, sizeof(struct ia64_cpu), GFP_KERNEL); if (!sysfs_cpus) panic("kzalloc in topology_init failed - NR_CPUS too big?"); --- a/arch/mips/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/mips/kernel/topology.c @@ -12,11 +12,6 @@ static int __init topology_init(void) { int i, ret; -#ifdef CONFIG_NUMA - for_each_online_node(i) - register_one_node(i); -#endif /* CONFIG_NUMA */ - for_each_present_cpu(i) { struct cpu *c = &per_cpu(cpu_devices, i); --- a/arch/powerpc/kernel/sysfs.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/powerpc/kernel/sysfs.c @@ -1110,14 +1110,6 @@ EXPORT_SYMBOL_GPL(cpu_remove_dev_attr_gr /* NUMA stuff */ #ifdef CONFIG_NUMA -static void __init register_nodes(void) -{ - int i; - - for (i = 0; i < MAX_NUMNODES; i++) - register_one_node(i); -} - int sysfs_add_device_to_node(struct device *dev, int nid) { struct node *node = node_devices[nid]; @@ -1132,13 +1124,6 @@ void sysfs_remove_device_from_node(struc sysfs_remove_link(&node->dev.kobj, kobject_name(&dev->kobj)); } EXPORT_SYMBOL_GPL(sysfs_remove_device_from_node); - -#else -static void __init register_nodes(void) -{ - return; -} - #endif /* Only valid if CPU is present. 
*/ @@ -1155,8 +1140,6 @@ static int __init topology_init(void) { int cpu, r; - register_nodes(); - for_each_possible_cpu(cpu) { struct cpu *c = &per_cpu(cpu_devices, cpu); --- a/arch/riscv/kernel/setup.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/riscv/kernel/setup.c @@ -301,9 +301,6 @@ static int __init topology_init(void) { int i, ret; - for_each_online_node(i) - register_one_node(i); - for_each_possible_cpu(i) { struct cpu *cpu = &per_cpu(cpu_devices, i); --- a/arch/s390/kernel/numa.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/s390/kernel/numa.c @@ -33,10 +33,3 @@ void __init numa_setup(void) NODE_DATA(0)->node_spanned_pages = memblock_end_of_DRAM() >> PAGE_SHIFT; NODE_DATA(0)->node_id = 0; } - -static int __init numa_init_late(void) -{ - register_one_node(0); - return 0; -} -arch_initcall(numa_init_late); --- a/arch/sh/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/sh/kernel/topology.c @@ -46,11 +46,6 @@ static int __init topology_init(void) { int i, ret; -#ifdef CONFIG_NUMA - for_each_online_node(i) - register_one_node(i); -#endif - for_each_present_cpu(i) { struct cpu *c = &per_cpu(cpu_devices, i); --- a/arch/sparc/kernel/sysfs.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/sparc/kernel/sysfs.c @@ -244,22 +244,10 @@ static void __init check_mmu_stats(void) mmu_stats_supported = 1; } -static void register_nodes(void) -{ -#ifdef CONFIG_NUMA - int i; - - for (i = 0; i < MAX_NUMNODES; i++) - register_one_node(i); -#endif -} - static int __init topology_init(void) { int cpu, ret; - register_nodes(); - check_mmu_stats(); for_each_possible_cpu(cpu) { --- a/arch/x86/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/arch/x86/kernel/topology.c @@ -154,11 +154,6 @@ static int __init topology_init(void) { int i; -#ifdef CONFIG_NUMA - for_each_online_node(i) - register_one_node(i); -#endif - for_each_present_cpu(i) arch_register_cpu(i); --- a/drivers/base/init.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/drivers/base/init.c @@ -35,5 +35,6 @@ void __init driver_init(void) auxiliary_bus_init(); cpu_dev_init(); memory_dev_init(); + node_dev_init(); container_dev_init(); } --- a/drivers/base/node.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/drivers/base/node.c @@ -1065,26 +1065,30 @@ static const struct attribute_group *cpu }; #define NODE_CALLBACK_PRI 2 /* lower than SLAB */ -static int __init register_node_type(void) +void __init node_dev_init(void) { - int ret; + static struct notifier_block node_memory_callback_nb = { + .notifier_call = node_memory_callback, + .priority = NODE_CALLBACK_PRI, + }; + int ret, i; BUILD_BUG_ON(ARRAY_SIZE(node_state_attr) != NR_NODE_STATES); BUILD_BUG_ON(ARRAY_SIZE(node_state_attrs)-1 != NR_NODE_STATES); ret = subsys_system_register(&node_subsys, cpu_root_attr_groups); - if (!ret) { - static struct notifier_block node_memory_callback_nb = { - .notifier_call = node_memory_callback, - .priority = NODE_CALLBACK_PRI, - }; - register_hotmemory_notifier(&node_memory_callback_nb); - } + if (ret) + panic("%s() failed to register subsystem: %d\n", __func__, ret); + + register_hotmemory_notifier(&node_memory_callback_nb); /* - * Note: we're not going to unregister the node class if we fail - * to register the node 
state class attribute files. + * Create all node devices, which will properly link the node + * to applicable memory block devices and already created cpu devices. */ - return ret; + for_each_online_node(i) { + ret = register_one_node(i); + if (ret) + panic("%s() failed to add node: %d\n", __func__, ret); + } } -postcore_initcall(register_node_type);
--- a/include/linux/node.h~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init +++ a/include/linux/node.h @@ -112,6 +112,7 @@ static inline void link_mem_sections(int extern void unregister_node(struct node *node); #ifdef CONFIG_NUMA +extern void node_dev_init(void); /* Core of the node registration - only memory hotplug should use this */ extern int __register_one_node(int nid); @@ -149,6 +150,9 @@ extern void register_hugetlbfs_with_node node_registration_func_t unregister); #endif #else +static inline void node_dev_init(void) +{ +} static inline int __register_one_node(int nid) { return 0;
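Condensed from the drivers/base/init.c hunk above, the resulting bring-up order is as follows (sketch; the elided earlier members of driver_init() are unchanged by the patch):

void __init driver_init(void)
{
	/* ... buses, classes, firmware, platform, auxiliary bus, ... */
	cpu_dev_init();		/* cpu devices exist first, */
	memory_dev_init();	/* then all memory block devices, */
	node_dev_init();	/* then nodes, linking blocks and cpus under them */
	container_dev_init();
}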
From patchwork Tue Mar 22 21:47:16 2022 Date: Tue, 22 Mar 2022 14:47:16 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 172/227] mm/memory_hotplug: remove obsolete comment of __add_pages Message-Id: <20220322214716.C3A08C340EE@smtp.kernel.org>
From: Miaohe Lin Subject: mm/memory_hotplug: remove obsolete comment of __add_pages
Patch series "A few cleanup patches around memory_hotplug". This series contains a few patches to fix obsolete and misplaced comments, clean up the try_offline_node function and so on. This patch (of 4): Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online"), there is no need to pass in the zone. [akpm@linux-foundation.org: remove the comment altogether, per David]
Link: https://lkml.kernel.org/r/20220207133643.23427-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220207133643.23427-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: David Hildenbrand Signed-off-by: Andrew Morton
--- mm/memory_hotplug.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-remove-obsolete-comment-of-__add_pages +++ a/mm/memory_hotplug.c @@ -295,12 +295,6 @@ struct page *pfn_to_online_page(unsigned } EXPORT_SYMBOL_GPL(pfn_to_online_page); -/* - * Reasonably generic function for adding memory. It is - * expected that archs that support memory hotplug will - * call this function after deciding the zone to which to - * add the new pages.
- */ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, struct mhp_params *params) {
From patchwork Tue Mar 22 21:47:19 2022 Date: Tue, 22 Mar 2022 14:47:19 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 173/227] mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL Message-Id: <20220322214719.A67DCC340F2@smtp.kernel.org>
From: Miaohe Lin Subject: mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL
If zid reaches ZONE_NORMAL, the caller will always get the NORMAL zone no
matter what zone_intersects() returns. So we can save some cpu cycles by avoiding the zone_intersects() call for ZONE_NORMAL.
Link: https://lkml.kernel.org/r/20220207133643.23427-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Andrew Morton
--- mm/memory_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-avoid-calling-zone_intersects-for-zone_normal +++ a/mm/memory_hotplug.c @@ -823,7 +823,7 @@ static struct zone *default_kernel_zone_ struct pglist_data *pgdat = NODE_DATA(nid); int zid; - for (zid = 0; zid <= ZONE_NORMAL; zid++) { + for (zid = 0; zid < ZONE_NORMAL; zid++) { struct zone *zone = &pgdat->node_zones[zid]; if (zone_intersects(zone, start_pfn, nr_pages))
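To see why the tighter bound is equivalent, here is the shape of the whole function with the changed loop; the tail does not appear in the hunk above and is reconstructed here for illustration only. Once the loop has rejected every zone below ZONE_NORMAL, the NORMAL zone is returned unconditionally, so evaluating zone_intersects() for ZONE_NORMAL could never change the result.

static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
						unsigned long nr_pages)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	int zid;

	for (zid = 0; zid < ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}
	return &pgdat->node_zones[ZONE_NORMAL];	/* reconstructed fallthrough */
}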
From patchwork Tue Mar 22 21:47:22 2022 Date: Tue, 22 Mar 2022 14:47:22 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 174/227] mm/memory_hotplug: clean up try_offline_node Message-Id: <20220322214722.9A78BC340EC@smtp.kernel.org>
From: Miaohe Lin Subject: mm/memory_hotplug: clean up try_offline_node
We can use the helper macro node_spanned_pages to check whether a node spans pages. And we can change the parameter of check_cpu_on_node to nid, as that is all it really cares about. Thus we can further get rid of the local variable pgdat and improve the readability a bit.
Link: https://lkml.kernel.org/r/20220207133643.23427-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Andrew Morton
--- mm/memory_hotplug.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-clean-up-try_offline_node +++ a/mm/memory_hotplug.c @@ -2005,12 +2005,12 @@ static int get_nr_vmemmap_pages_cb(struc return mem->nr_vmemmap_pages; } -static int check_cpu_on_node(pg_data_t *pgdat) +static int check_cpu_on_node(int nid) { int cpu; for_each_present_cpu(cpu) { - if (cpu_to_node(cpu) == pgdat->node_id) + if (cpu_to_node(cpu) == nid) /* * the cpu on this node isn't removed, and we can't * offline this node. @@ -2044,7 +2044,6 @@ static int check_no_memblock_for_node_cb */ void try_offline_node(int nid) { - pg_data_t *pgdat = NODE_DATA(nid); int rc; /* @@ -2052,7 +2051,7 @@ void try_offline_node(int nid) * offline it. A node spans memory after move_pfn_range_to_zone(), * e.g., after the memory block was onlined.
*/ - if (pgdat->node_spanned_pages) + if (node_spanned_pages(nid)) return; /* @@ -2064,7 +2063,7 @@ void try_offline_node(int nid) if (rc) return; - if (check_cpu_on_node(pgdat)) + if (check_cpu_on_node(nid)) return; /*
From patchwork Tue Mar 22 21:47:24 2022 Date: Tue, 22 Mar 2022 14:47:24 -0700 From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 175/227] mm/memory_hotplug: fix misplaced comment in offline_pages Message-Id: <20220322214725.9C4D1C340EC@smtp.kernel.org>
From: Miaohe Lin Subject: mm/memory_hotplug:
fix misplaced comment in offline_pages
The comment has been misplaced since commit 7960509329c2 ("mm, memory_hotplug: print reason for the offlining failure"). Move it to the right place.
Link: https://lkml.kernel.org/r/20220207133643.23427-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Andrew Morton
--- mm/memory_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-misplaced-comment-in-offline_pages +++ a/mm/memory_hotplug.c @@ -1963,6 +1963,7 @@ int __ref offline_pages(unsigned long st return 0; failed_removal_isolated: + /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); memory_notify(MEM_CANCEL_OFFLINE, &arg); failed_removal_pcplists_disabled: @@ -1973,7 +1974,6 @@ failed_removal: (unsigned long long) start_pfn << PAGE_SHIFT, ((unsigned long long) end_pfn << PAGE_SHIFT) - 1, reason); - /* pushback to free area */ mem_hotplug_done(); return ret; }
From patchwork Tue Mar 22 21:47:28 2022 Date: Tue, 22 Mar 2022 14:47:28 -0700 From: Andrew Morton In-Reply-To:
<20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 176/227] drivers/base/node: rename link_mem_sections() to register_memory_block_under_node() Message-Id: <20220322214728.A7226C340EC@smtp.kernel.org>
From: David Hildenbrand Subject: drivers/base/node: rename link_mem_sections() to register_memory_block_under_node()
Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2. I remember talking to Michal in the past about removing test_pages_in_a_zone(), which we use for: * verifying that a memory block we intend to offline is really only managed by a single zone. We don't support offlining of memory blocks that are managed by multiple zones (e.g., multiple nodes, DMA and DMA32) * exposing that zone to user space via /sys/devices/system/memory/memory*/valid_zones Now that I have identified some more cases where test_pages_in_a_zone() might go wrong, and we received an UBSAN report (see patch #3), let's get rid of this PFN walker. So instead of detecting the zone at runtime with test_pages_in_a_zone() by scanning the memmap, let's determine and remember for each memory block whether it's managed by a single zone. The stored zone can then be used for the above two cases, avoiding a manual lookup using test_pages_in_a_zone(). This avoids eventually stumbling over uninitialized memmaps in corner cases, especially when ZONE_DEVICE ranges partly fall into memory blocks (that are responsible for managing System RAM). Handling memory onlining is easy, because we online to exactly one zone. Handling boot memory is more tricky, because we want to avoid scanning all zones of all nodes to detect possible zones that overlap with the physical memory region of interest. Fortunately, we already have code that determines the applicable nodes for a memory block, to create sysfs links -- we'll hook into that. Patch #1 is a simple cleanup I had laying around for a longer time. Patch #2 contains the main logic to remove test_pages_in_a_zone() and further details. [1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com [2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com This patch (of 2): Let's adjust the stale terminology, making it match unregister_memory_block_under_nodes() and do_register_memory_block_under_node(). We're dealing with memory block devices, which span 1..X memory sections.
Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Oscar Salvador Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: "Rafael J.
Wysocki" Cc: Rafael Parra Signed-off-by: Andrew Morton
--- drivers/base/node.c | 5 +++-- include/linux/node.h | 16 ++++++++-------- mm/memory_hotplug.c | 6 +++--- 3 files changed, 14 insertions(+), 13 deletions(-)
--- a/drivers/base/node.c~drivers-base-node-rename-link_mem_sections-to-register_memory_block_under_node +++ a/drivers/base/node.c @@ -892,8 +892,9 @@ void unregister_memory_block_under_nodes kobject_name(&node_devices[mem_blk->nid]->dev.kobj)); } -void link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn, - enum meminit_context context) +void register_memory_blocks_under_node(int nid, unsigned long start_pfn, + unsigned long end_pfn, + enum meminit_context context) { walk_memory_blocks_func_t func;
--- a/include/linux/node.h~drivers-base-node-rename-link_mem_sections-to-register_memory_block_under_node +++ a/include/linux/node.h @@ -99,13 +99,13 @@ extern struct node *node_devices[]; typedef void (*node_registration_func_t)(struct node *); #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA) -void link_mem_sections(int nid, unsigned long start_pfn, - unsigned long end_pfn, - enum meminit_context context); +void register_memory_blocks_under_node(int nid, unsigned long start_pfn, + unsigned long end_pfn, + enum meminit_context context); #else -static inline void link_mem_sections(int nid, unsigned long start_pfn, - unsigned long end_pfn, - enum meminit_context context) +static inline void register_memory_blocks_under_node(int nid, unsigned long start_pfn, + unsigned long end_pfn, + enum meminit_context context) { } #endif @@ -129,8 +129,8 @@ static inline int register_one_node(int error = __register_one_node(nid); if (error) return error; - /* link memory sections under this node */ - link_mem_sections(nid, start_pfn, end_pfn, MEMINIT_EARLY); + register_memory_blocks_under_node(nid, start_pfn, end_pfn, + MEMINIT_EARLY); } return error;
--- a/mm/memory_hotplug.c~drivers-base-node-rename-link_mem_sections-to-register_memory_block_under_node +++ a/mm/memory_hotplug.c @@ -1383,9 +1383,9 @@ int __ref add_memory_resource(int nid, s BUG_ON(ret); } - /* link memory sections under this node.*/ - link_mem_sections(nid, PFN_DOWN(start), PFN_UP(start + size - 1), - MEMINIT_HOTPLUG); + register_memory_blocks_under_node(nid, PFN_DOWN(start), + PFN_UP(start + size - 1), + MEMINIT_HOTPLUG); /* create new memmap entry */ if (!strcmp(res->name, "System RAM"))
From patchwork Tue Mar 22 21:47:31 2022
[10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3237818289DE1 for ; Tue, 22 Mar 2022 21:47:35 +0000 (UTC) X-FDA: 79273359270.25.19CFB06 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf30.hostedemail.com (Postfix) with ESMTP id 91FDC80018 for ; Tue, 22 Mar 2022 21:47:34 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 4DF35B81D5F; Tue, 22 Mar 2022 21:47:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D5659C340EC; Tue, 22 Mar 2022 21:47:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985651; bh=KgFU0n1i+NCLiF8oiQcUfsg9Cc1y0rto5vEx9fZitxQ=; h=Date:To:From:In-Reply-To:Subject:From; b=alVAI++1FipzlO36SSW/dHTeK+I8CgkdoT9AO7kK7OokaLChcUDVxuvz+fpOTGFFn 6t+4IMaeRx9fSOR8KpWangS0E6ayZ6CU2PKZRMRI4VjX6yXgIZ5DTVS5S4qRcHvKio YYZWyA8oqotmKt72C56EtfKQmab0t/WD83vqKWiw= Date: Tue, 22 Mar 2022 14:47:31 -0700 To: rparrazo@redhat.com,rafael@kernel.org,osalvador@suse.de,mhocko@suse.com,gregkh@linuxfoundation.org,david@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 177/227] drivers/base/memory: determine and store zone for single-zone memory blocks Message-Id: <20220322214731.D5659C340EC@smtp.kernel.org> X-Stat-Signature: a7paenejsscemieszremc63zteasnb5y X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 91FDC80018 Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=alVAI++1; dmarc=none; spf=pass (imf30.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985654-251152 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Hildenbrand Subject: drivers/base/memory: determine and store zone for single-zone memory blocks test_pages_in_a_zone() is just another nasty PFN walker that can easily stumble over ZONE_DEVICE memory ranges falling into the same memory block as ordinary system RAM: the memmap of parts of these ranges might possibly be uninitialized. In fact, we observed (on an older kernel) with UBSAN: [ 7691.855626] UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50 [ 7691.862155] index 7 is out of range for type 'zone [5]' [ 7691.867393] CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...] [ 7691.879990] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019 [ 7691.887643] Call Trace: [ 7691.890107] dump_stack+0x9a/0xf0 [ 7691.893438] ubsan_epilogue+0x9/0x7a [ 7691.897025] __ubsan_handle_out_of_bounds+0x13a/0x181 [ 7691.902086] ? __ubsan_handle_shift_out_of_bounds+0x289/0x289 [ 7691.907841] ? sched_clock_cpu+0x18/0x1e0 [ 7691.911867] ? __lock_acquire+0x610/0x38d0 [ 7691.915979] test_pages_in_a_zone+0x3c4/0x500 [ 7691.920357] show_valid_zones+0x1fa/0x380 [ 7691.924375] ? print_allowed_zone+0x80/0x80 [ 7691.928571] ? __lock_is_held+0xb4/0x140 [ 7691.932509] ? __lock_is_held+0xb4/0x140 [ 7691.936447] ? 
dev_attr_store+0x70/0x70 [ 7691.940296] dev_attr_show+0x43/0xb0 [ 7691.943884] ? memset+0x1f/0x40 [ 7691.947042] sysfs_kf_seq_show+0x1c5/0x440 [ 7691.951153] seq_read+0x49d/0x1190 [ 7691.954574] ? seq_escape+0x1f0/0x1f0 [ 7691.958249] ? fsnotify_first_mark+0x150/0x150 [ 7691.962713] vfs_read+0xff/0x300 [ 7691.965952] ksys_read+0xb8/0x170 [ 7691.969279] ? kernel_write+0x130/0x130 [ 7691.973126] ? entry_SYSCALL_64_after_hwframe+0x7a/0xdf [ 7691.978365] ? do_syscall_64+0x22/0x4b0 [ 7691.982212] do_syscall_64+0xa5/0x4b0 [ 7691.985887] entry_SYSCALL_64_after_hwframe+0x6a/0xdf [ 7691.990947] RIP: 0033:0x7f01f4439b52 We seem to stumble over a memmap that contains a garbage zone id. While we could try inserting pfn_to_online_page() calls, it will just make memory offlining slower, because we use test_pages_in_a_zone() to make sure we're offlining pages that all belong to the same zone. Let's just get rid of this PFN walker and determine the single zone of a memory block -- if any -- for early memory blocks during boot. For memory onlining, we know the single zone already. Let's avoid any additional memmap scanning and just rely on the zone information available during boot. For memory hot(un)plug, we only really care about memory blocks that: * span a single zone (and, thereby, a single node) * are completely System RAM (IOW, no holes, no ZONE_DEVICE) If one of these conditions is not met, we reject memory offlining. Hotplugged memory blocks (starting out offline), always meet both conditions. There are three scenarios to handle: (1) Memory hot(un)plug A memory block with zone == NULL cannot be offlined, corresponding to our previous test_pages_in_a_zone() check. After successful memory onlining/offlining, we simply set the zone accordingly. * Memory onlining: set the zone we just used for onlining * Memory offlining: set zone = NULL So a hotplugged memory block starts with zone = NULL. Once memory onlining is done, we set the proper zone. (2) Boot memory with !CONFIG_NUMA We know that there is just a single pgdat, so we simply scan all zones of that pgdat for an intersection with our memory block PFN range when adding the memory block. If more than one zone intersects (e.g., DMA and DMA32 on x86 for the first memory block) we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. (3) Boot memory with CONFIG_NUMA At the point in time we create the memory block devices during boot, we don't know yet which nodes *actually* span a memory block. While we could scan all zones of all nodes for intersections, overlapping nodes complicate the situation and scanning all nodes is possibly expensive. But that problem has already been solved by the code that sets the node of a memory block and creates the link in the sysfs -- do_register_memory_block_under_node(). So, we hook into the code that sets the node id for a memory block. If we already have a different node id set for the memory block, we know that multiple nodes *actually* have PFNs falling into our memory block: we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. If there is no node id set, we do the same as (2) for the given node. 
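To condense the three scenarios into one place, the lifecycle of the cached zone can be modeled by the following stand-alone C sketch (illustration only: struct memory_block_model and the model_* helpers are made up for this summary, while the real logic lives in memory_block_online(), memory_block_offline(), early_node_zone_for_memory_block() and memory_block_add_nid() in the patch below):

	#include <stddef.h>

	#define NUMA_NO_NODE	(-1)

	struct zone;			/* opaque in this sketch */

	struct memory_block_model {
		struct zone *zone;	/* NULL: no single zone known */
		int nid;		/* NUMA_NO_NODE until known */
	};

	/* (1) Memory hot(un)plug: the cached zone tracks the online state. */
	static void model_online(struct memory_block_model *m, struct zone *z)
	{
		m->zone = z;		/* the single zone we onlined to */
	}

	static void model_offline(struct memory_block_model *m)
	{
		m->zone = NULL;		/* offline blocks carry no zone */
	}

	/*
	 * (2)+(3) Boot memory: the first node to claim the block supplies its
	 * single intersecting zone (NULL if several of its zones intersect);
	 * a second, different node invalidates the zone, so multi-node blocks
	 * keep zone == NULL and can never be offlined.
	 */
	static void model_add_nid(struct memory_block_model *m, int nid,
				  struct zone *single_zone_of_nid)
	{
		if (m->nid == NUMA_NO_NODE)
			m->zone = single_zone_of_nid;
		else if (m->nid != nid)
			m->zone = NULL;
		m->nid = nid;
	}
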
Note that the call order in driver_init() is: -> memory_dev_init(): create memory block devices -> node_dev_init(): link memory block devices to the node and set the node id So in summary, we detect if there is a single zone responsible for this memory block and we consequently store the zone in that case in the memory block, updating it during memory onlining/offlining. Link: https://lkml.kernel.org/r/20220210184359.235565-3-david@redhat.com Signed-off-by: David Hildenbrand Reported-by: Rafael Parra Reviewed-by: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: Rafael Parra Signed-off-by: Andrew Morton --- drivers/base/memory.c | 101 +++++++++++++++++++++++++++++-- drivers/base/node.c | 13 +-- include/linux/memory.h | 12 +++ include/linux/memory_hotplug.h | 6 - mm/memory_hotplug.c | 50 +++------------ 5 files changed, 125 insertions(+), 57 deletions(-) --- a/drivers/base/memory.c~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks +++ a/drivers/base/memory.c @@ -215,6 +215,7 @@ static int memory_block_online(struct me adjust_present_page_count(pfn_to_page(start_pfn), mem->group, nr_vmemmap_pages); + mem->zone = zone; return ret; } @@ -225,6 +226,9 @@ static int memory_block_offline(struct m unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages; int ret; + if (!mem->zone) + return -EINVAL; + /* * Unaccount before offlining, such that unpopulated zone and kthreads * can properly be torn down in offline_pages(). @@ -234,7 +238,7 @@ static int memory_block_offline(struct m -nr_vmemmap_pages); ret = offline_pages(start_pfn + nr_vmemmap_pages, - nr_pages - nr_vmemmap_pages, mem->group); + nr_pages - nr_vmemmap_pages, mem->zone, mem->group); if (ret) { /* offline_pages() failed. Account back. */ if (nr_vmemmap_pages) @@ -246,6 +250,7 @@ static int memory_block_offline(struct m if (nr_vmemmap_pages) mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages); + mem->zone = NULL; return ret; } @@ -411,11 +416,10 @@ static ssize_t valid_zones_show(struct d */ if (mem->state == MEM_ONLINE) { /* - * The block contains more than one zone can not be offlined. - * This can happen e.g. for ZONE_DMA and ZONE_DMA32 + * If !mem->zone, the memory block spans multiple zones and + * cannot get offlined. */ - default_zone = test_pages_in_a_zone(start_pfn, - start_pfn + nr_pages); + default_zone = mem->zone; if (!default_zone) return sysfs_emit(buf, "%s\n", "none"); len += sysfs_emit_at(buf, len, "%s", default_zone->name); @@ -643,6 +647,82 @@ int register_memory(struct memory_block return ret; } +static struct zone *early_node_zone_for_memory_block(struct memory_block *mem, + int nid) +{ + const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); + const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; + struct zone *zone, *matching_zone = NULL; + pg_data_t *pgdat = NODE_DATA(nid); + int i; + + /* + * This logic only works for early memory, when the applicable zones + * already span the memory block. We don't expect overlapping zones on + * a single node for early memory. So if we're told that some PFNs + * of a node fall into this memory block, we can assume that all node + * zones that intersect with the memory block are actually applicable. + * No need to look at the memmap. 
+ */ + for (i = 0; i < MAX_NR_ZONES; i++) { + zone = pgdat->node_zones + i; + if (!populated_zone(zone)) + continue; + if (!zone_intersects(zone, start_pfn, nr_pages)) + continue; + if (!matching_zone) { + matching_zone = zone; + continue; + } + /* Spans multiple zones ... */ + matching_zone = NULL; + break; + } + return matching_zone; +} + +#ifdef CONFIG_NUMA +/** + * memory_block_add_nid() - Indicate that system RAM falling into this memory + * block device (partially) belongs to the given node. + * @mem: The memory block device. + * @nid: The node id. + * @context: The memory initialization context. + * + * Indicate that system RAM falling into this memory block (partially) belongs + * to the given node. If the context indicates ("early") that we are adding the + * node during node device subsystem initialization, this will also properly + * set/adjust mem->zone based on the zone ranges of the given node. + */ +void memory_block_add_nid(struct memory_block *mem, int nid, + enum meminit_context context) +{ + if (context == MEMINIT_EARLY && mem->nid != nid) { + /* + * For early memory we have to determine the zone when setting + * the node id and handle multiple nodes spanning a single + * memory block by indicate via zone == NULL that we're not + * dealing with a single zone. So if we're setting the node id + * the first time, determine if there is a single zone. If we're + * setting the node id a second time to a different node, + * invalidate the single detected zone. + */ + if (mem->nid == NUMA_NO_NODE) + mem->zone = early_node_zone_for_memory_block(mem, nid); + else + mem->zone = NULL; + } + + /* + * If this memory block spans multiple nodes, we only indicate + * the last processed node. If we span multiple nodes (not applicable + * to hotplugged memory), zone == NULL will prohibit memory offlining + * and consequently unplug. + */ + mem->nid = nid; +} +#endif + static int init_memory_block(unsigned long block_id, unsigned long state, unsigned long nr_vmemmap_pages, struct memory_group *group) @@ -665,6 +745,17 @@ static int init_memory_block(unsigned lo mem->nr_vmemmap_pages = nr_vmemmap_pages; INIT_LIST_HEAD(&mem->group_next); +#ifndef CONFIG_NUMA + if (state == MEM_ONLINE) + /* + * MEM_ONLINE at this point implies early memory. With NUMA, + * we'll determine the zone when setting the node id via + * memory_block_add_nid(). Memory hotplug updated the zone + * manually when memory onlining/offlining succeeds. + */ + mem->zone = early_node_zone_for_memory_block(mem, NUMA_NO_NODE); +#endif /* CONFIG_NUMA */ + ret = register_memory(mem); if (ret) return ret; --- a/drivers/base/node.c~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks +++ a/drivers/base/node.c @@ -796,15 +796,12 @@ static int __ref get_nid_for_pfn(unsigne } static void do_register_memory_block_under_node(int nid, - struct memory_block *mem_blk) + struct memory_block *mem_blk, + enum meminit_context context) { int ret; - /* - * If this memory block spans multiple nodes, we only indicate - * the last processed node. 
- */ - mem_blk->nid = nid; + memory_block_add_nid(mem_blk, nid, context); ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj, &mem_blk->dev.kobj, @@ -857,7 +854,7 @@ static int register_mem_block_under_node if (page_nid != nid) continue; - do_register_memory_block_under_node(nid, mem_blk); + do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY); return 0; } /* mem section does not span the specified node */ @@ -873,7 +870,7 @@ static int register_mem_block_under_node { int nid = *(int *)arg; - do_register_memory_block_under_node(nid, mem_blk); + do_register_memory_block_under_node(nid, mem_blk, MEMINIT_HOTPLUG); return 0; } --- a/include/linux/memory.h~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks +++ a/include/linux/memory.h @@ -70,6 +70,13 @@ struct memory_block { unsigned long state; /* serialized by the dev->lock */ int online_type; /* for passing data to online routine */ int nid; /* NID for this memory block */ + /* + * The single zone of this memory block if all PFNs of this memory block + * that are System RAM (not a memory hole, not ZONE_DEVICE ranges) are + * managed by a single zone. NULL if multiple zones (including nodes) + * apply. + */ + struct zone *zone; struct device dev; /* * Number of vmemmap pages. These pages @@ -161,6 +168,11 @@ int walk_dynamic_memory_groups(int nid, }) #define register_hotmemory_notifier(nb) register_memory_notifier(nb) #define unregister_hotmemory_notifier(nb) unregister_memory_notifier(nb) + +#ifdef CONFIG_NUMA +void memory_block_add_nid(struct memory_block *mem, int nid, + enum meminit_context context); +#endif /* CONFIG_NUMA */ #endif /* CONFIG_MEMORY_HOTPLUG */ /* --- a/include/linux/memory_hotplug.h~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks +++ a/include/linux/memory_hotplug.h @@ -163,8 +163,6 @@ extern int mhp_init_memmap_on_memory(uns extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages); extern int online_pages(unsigned long pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group); -extern struct zone *test_pages_in_a_zone(unsigned long start_pfn, - unsigned long end_pfn); extern void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn); @@ -293,7 +291,7 @@ static inline void pgdat_resize_init(str extern void try_offline_node(int nid); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct memory_group *group); + struct zone *zone, struct memory_group *group); extern int remove_memory(u64 start, u64 size); extern void __remove_memory(u64 start, u64 size); extern int offline_and_remove_memory(u64 start, u64 size); @@ -302,7 +300,7 @@ extern int offline_and_remove_memory(u64 static inline void try_offline_node(int nid) {} static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct memory_group *group) + struct zone *zone, struct memory_group *group) { return -EINVAL; } --- a/mm/memory_hotplug.c~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks +++ a/mm/memory_hotplug.c @@ -1549,38 +1549,6 @@ bool mhp_range_allowed(u64 start, u64 si #ifdef CONFIG_MEMORY_HOTREMOVE /* - * Confirm all pages in a range [start, end) belong to the same zone (skipping - * memory holes). When true, return the zone. 
- */ -struct zone *test_pages_in_a_zone(unsigned long start_pfn, - unsigned long end_pfn) -{ - unsigned long pfn, sec_end_pfn; - struct zone *zone = NULL; - struct page *page; - - for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn + 1); - pfn < end_pfn; - pfn = sec_end_pfn, sec_end_pfn += PAGES_PER_SECTION) { - /* Make sure the memory section is present first */ - if (!present_section_nr(pfn_to_section_nr(pfn))) - continue; - for (; pfn < sec_end_pfn && pfn < end_pfn; - pfn += MAX_ORDER_NR_PAGES) { - /* Check if we got outside of the zone */ - if (zone && !zone_spans_pfn(zone, pfn)) - return NULL; - page = pfn_to_page(pfn); - if (zone && page_zone(page) != zone) - return NULL; - zone = page_zone(page); - } - } - - return zone; -} - -/* * Scan pfn range [start,end) to find movable/migratable pages (LRU pages, * non-lru movable pages and hugepages). Will skip over most unmovable * pages (esp., pages that can be skipped when offlining), but bail out on @@ -1803,15 +1771,15 @@ static int count_system_ram_pages_cb(uns } int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct memory_group *group) + struct zone *zone, struct memory_group *group) { const unsigned long end_pfn = start_pfn + nr_pages; unsigned long pfn, system_ram_pages = 0; + const int node = zone_to_nid(zone); unsigned long flags; - struct zone *zone; struct memory_notify arg; - int ret, node; char *reason; + int ret; /* * {on,off}lining is constrained to full memory sections (or more @@ -1843,15 +1811,17 @@ int __ref offline_pages(unsigned long st goto failed_removal; } - /* This makes hotplug much easier...and readable. - we assume this for now. .*/ - zone = test_pages_in_a_zone(start_pfn, end_pfn); - if (!zone) { + /* + * We only support offlining of memory blocks managed by a single zone, + * checked by calling code. This is just a sanity check that we might + * want to remove in the future. 
+ */ + if (WARN_ON_ONCE(page_zone(pfn_to_page(start_pfn)) != zone || + page_zone(pfn_to_page(end_pfn - 1)) != zone)) { ret = -EINVAL; reason = "multizone range"; goto failed_removal; } - node = zone_to_nid(zone); /* * Disable pcplists so that page isolation cannot race with freeing From patchwork Tue Mar 22 21:47:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789262 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70806C433F5 for ; Tue, 22 Mar 2022 21:47:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DC0C36B0188; Tue, 22 Mar 2022 17:47:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D72D26B0189; Tue, 22 Mar 2022 17:47:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C38C16B018A; Tue, 22 Mar 2022 17:47:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id B37E26B0188 for ; Tue, 22 Mar 2022 17:47:36 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 8C79D23FF7 for ; Tue, 22 Mar 2022 21:47:36 +0000 (UTC) X-FDA: 79273359312.06.DC7DC2B Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf01.hostedemail.com (Postfix) with ESMTP id 0CFAA4001E for ; Tue, 22 Mar 2022 21:47:35 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7ECDD61658; Tue, 22 Mar 2022 21:47:35 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D1A7BC340EC; Tue, 22 Mar 2022 21:47:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985654; bh=hZmnlFBROMKtttPam+a64W/geo9wV78nPOc3rq/yyfQ=; h=Date:To:From:In-Reply-To:Subject:From; b=E6XXtJCZESguDOPaRnXuMGIHtD+ba/+H7VYtPoiL4FoC1zlzlCUA25KlQm/K0GWKX M3vJIU10jgbXapsuqtkCcfkChFEu9+sEYLdllhrfCWV6V/ZHcv7rZ+OiQd+EwZw1v2 gM+rr0vCsDUNJnLec/XxT3j3d3crNSAgoq6Y8PHs= Date: Tue, 22 Mar 2022 14:47:34 -0700 To: rafael@kernel.org,osalvador@suse.de,mhocko@suse.com,gregkh@linuxfoundation.org,david@redhat.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 178/227] drivers/base/memory: clarify adding and removing of memory blocks Message-Id: <20220322214734.D1A7BC340EC@smtp.kernel.org> X-Stat-Signature: 1idwhnr8kbhpuj9c1cwa3ccsedshoutb Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=E6XXtJCZ; dmarc=none; spf=pass (imf01.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 0CFAA4001E X-HE-Tag: 1647985655-480799 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org 
Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Hildenbrand Subject: drivers/base/memory: clarify adding and removing of memory blocks Let's make it clearer at which places we actually add and remove memory blocks -- streamlining the terminology -- and highlight which memory blocks start out online and which start out offline. * rename add_memory_block -> add_boot_memory_block * rename init_memory_block -> add_memory_block * rename unregister_memory -> remove_memory_block * rename register_memory -> __add_memory_block * add add_hotplug_memory_block * mark add_boot_memory_block with __init (suggested by Oscar) __add_memory_block() is a pure helper for add_memory_block(); remove the somewhat obvious comment. Link: https://lkml.kernel.org/r/20220221154531.11382-1-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Greg Kroah-Hartman Cc: Michal Hocko Signed-off-by: Andrew Morton --- drivers/base/memory.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) --- a/drivers/base/memory.c~drivers-base-memory-clarify-adding-and-removing-of-memory-blocks +++ a/drivers/base/memory.c @@ -619,11 +619,7 @@ static const struct attribute_group *mem NULL, }; -/* - * register_memory - Setup a sysfs device for a memory block - */ -static -int register_memory(struct memory_block *memory) +static int __add_memory_block(struct memory_block *memory) { int ret; @@ -723,9 +719,9 @@ void memory_block_add_nid(struct memory_ } #endif -static int init_memory_block(unsigned long block_id, unsigned long state, - unsigned long nr_vmemmap_pages, - struct memory_group *group) +static int add_memory_block(unsigned long block_id, unsigned long state, + unsigned long nr_vmemmap_pages, + struct memory_group *group) { struct memory_block *mem; int ret = 0; @@ -756,7 +752,7 @@ static int init_memory_block(unsigned lo mem->zone = early_node_zone_for_memory_block(mem, NUMA_NO_NODE); #endif /* CONFIG_NUMA */ - ret = register_memory(mem); + ret = __add_memory_block(mem); if (ret) return ret; @@ -768,7 +764,7 @@ static int init_memory_block(unsigned lo return 0; } -static int add_memory_block(unsigned long base_section_nr) +static int __init add_boot_memory_block(unsigned long base_section_nr) { int section_count = 0; unsigned long nr; @@ -780,11 +776,18 @@ static int add_memory_block(unsigned lon if (section_count == 0) return 0; - return init_memory_block(memory_block_id(base_section_nr), - MEM_ONLINE, 0, NULL); + return add_memory_block(memory_block_id(base_section_nr), + MEM_ONLINE, 0, NULL); +} + +static int add_hotplug_memory_block(unsigned long block_id, + unsigned long nr_vmemmap_pages, + struct memory_group *group) +{ + return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group); } -static void unregister_memory(struct memory_block *memory) +static void remove_memory_block(struct memory_block *memory) { if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys)) return; @@ -823,8 +826,7 @@ int create_memory_block_devices(unsigned return -EINVAL; for (block_id = start_block_id; block_id != end_block_id; block_id++) { - ret = init_memory_block(block_id, MEM_OFFLINE, vmemmap_pages, - group); + ret = add_hotplug_memory_block(block_id, vmemmap_pages, group); if (ret) break; } @@ -835,7 +837,7 @@ int create_memory_block_devices(unsigned mem = find_memory_block_by_id(block_id); if (WARN_ON_ONCE(!mem)) continue; - unregister_memory(mem); + remove_memory_block(mem); } } return ret; @@ -864,7 +866,7 @@ void 
remove_memory_block_devices(unsigne if (WARN_ON_ONCE(!mem)) continue; unregister_memory_block_under_nodes(mem); - unregister_memory(mem); + remove_memory_block(mem); } } @@ -924,7 +926,7 @@ void __init memory_dev_init(void) */ for (nr = 0; nr <= __highest_present_section_nr; nr += sections_per_block) { - ret = add_memory_block(nr); + ret = add_boot_memory_block(nr); if (ret) panic("%s() failed to add memory block: %d\n", __func__, ret); From patchwork Tue Mar 22 21:47:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789263 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7D6AC433F5 for ; Tue, 22 Mar 2022 21:47:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 498FA6B018A; Tue, 22 Mar 2022 17:47:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4493F6B018B; Tue, 22 Mar 2022 17:47:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 311BC6B018C; Tue, 22 Mar 2022 17:47:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 234A16B018A for ; Tue, 22 Mar 2022 17:47:41 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id F020020E55 for ; Tue, 22 Mar 2022 21:47:40 +0000 (UTC) X-FDA: 79273359480.04.15A0AB2 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf23.hostedemail.com (Postfix) with ESMTP id 4CF37140034 for ; Tue, 22 Mar 2022 21:47:40 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3BC60B81DB0; Tue, 22 Mar 2022 21:47:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EDF5BC340EC; Tue, 22 Mar 2022 21:47:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985658; bh=PKBM4rTGgiS0BGv1YOa2LNcKh6Y8F4G0wabfTEppIwc=; h=Date:To:From:In-Reply-To:Subject:From; b=y63Q6xmI2dDwq+Pl3592Z/OD/jYEdT1Q/B+Qg/qpHt8tp0PJ9VDnVY1KS6D9pzIlX F2nLV6kdwug+d7/ZxMtsXD7Pf6NodpQ02lx51zPMPx6673fawzo55spDW4MJWF3yfx t65dC9zXN89PYZnSVf521Ltt4dfvKCqmTF+iGqN4= Date: Tue, 22 Mar 2022 14:47:37 -0700 To: ying.huang@intel.com,stable@vger.kernel.org,huntbag@linux.vnet.ibm.com,dave.hansen@linux.intel.com,baolin.wang@linux.alibaba.com,osalvador@suse.de,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 179/227] mm: only re-generate demotion targets when a numa node changes its N_CPU state Message-Id: <20220322214737.EDF5BC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: ji1o1jnayioau73f9myij1i4demjkja6 Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=y63Q6xmI; spf=pass (imf23.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) 
smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 4CF37140034 X-HE-Tag: 1647985660-307607 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Oscar Salvador Subject: mm: only re-generate demotion targets when a numa node changes its N_CPU state Abhishek reported that after patch [1], hotplug operations are taking ~double the expected time. [2] The reason behind this is that the CPU callbacks that migrate_on_reclaim_init() sets always call set_migration_target_nodes() whenever a CPU is brought up/down. But we only care about numa nodes going from having CPUs to becoming cpuless, and vice versa, as that influences the demotion_target order. We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead()) that check exactly that, so get rid of the CPU callbacks in migrate_on_reclaim_init() and only call set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a numa node changes its N_CPU state. [1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/ [2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/ [osalvador@suse.de: add feedback from Huang Ying] Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Oscar Salvador Reviewed-by: Baolin Wang Tested-by: Baolin Wang Reported-by: Abhishek Goel Cc: Dave Hansen Cc: "Huang, Ying" Cc: Abhishek Goel Cc: Signed-off-by: Andrew Morton --- include/linux/migrate.h | 8 ++++++ mm/migrate.c | 47 ++++++++------------------------ mm/vmstat.c | 13 +++++++++- 3 files changed, 30 insertions(+), 38 deletions(-) --- a/include/linux/migrate.h~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state +++ a/include/linux/migrate.h @@ -48,7 +48,15 @@ int folio_migrate_mapping(struct address struct folio *newfolio, struct folio *folio, int extra_count); extern bool numa_demotion_enabled; +extern void migrate_on_reclaim_init(void); +#ifdef CONFIG_HOTPLUG_CPU +extern void set_migration_target_nodes(void); #else +static inline void set_migration_target_nodes(void) {} +#endif +#else + +static inline void set_migration_target_nodes(void) {} static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t new, --- a/mm/migrate.c~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state +++ a/mm/migrate.c @@ -3209,7 +3209,7 @@ again: /* * For callers that do not hold get_online_mems() already. */ -static void set_migration_target_nodes(void) +void set_migration_target_nodes(void) { get_online_mems(); __set_migration_target_nodes(); @@ -3273,51 +3273,24 @@ static int __meminit migrate_on_reclaim_ return notifier_from_errno(0); } -/* - * React to hotplug events that might affect the migration targets - * like events that online or offline NUMA nodes. - * - * The ordering is also currently dependent on which nodes have - * CPUs. That means we need CPU on/offline notification too. 
- */ -static int migration_online_cpu(unsigned int cpu) -{ - set_migration_target_nodes(); - return 0; -} - -static int migration_offline_cpu(unsigned int cpu) -{ - set_migration_target_nodes(); - return 0; -} - -static int __init migrate_on_reclaim_init(void) +void __init migrate_on_reclaim_init(void) { - int ret; - node_demotion = kmalloc_array(nr_node_ids, sizeof(struct demotion_nodes), GFP_KERNEL); WARN_ON(!node_demotion); - ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline", - NULL, migration_offline_cpu); + hotplug_memory_notifier(migrate_on_reclaim_callback, 100); /* - * In the unlikely case that this fails, the automatic - * migration targets may become suboptimal for nodes - * where N_CPU changes. With such a small impact in a - * rare case, do not bother trying to do anything special. + * At this point, all numa nodes with memory/CPUs have their state + * properly set, so we can build the demotion order now. + * Let us just hold the cpu_hotplug lock, as we could possibly have + * CPU hotplug events during boot. */ - WARN_ON(ret < 0); - ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online", - migration_online_cpu, NULL); - WARN_ON(ret < 0); - - hotplug_memory_notifier(migrate_on_reclaim_callback, 100); - return 0; + cpus_read_lock(); + set_migration_target_nodes(); + cpus_read_unlock(); } -late_initcall(migrate_on_reclaim_init); #endif /* CONFIG_HOTPLUG_CPU */ bool numa_demotion_enabled = false; --- a/mm/vmstat.c~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state +++ a/mm/vmstat.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "internal.h" @@ -2049,7 +2050,12 @@ static void __init init_cpu_node_state(v static int vmstat_cpu_online(unsigned int cpu) { refresh_zone_stat_thresholds(); - node_set_state(cpu_to_node(cpu), N_CPU); + + if (!node_state(cpu_to_node(cpu), N_CPU)) { + node_set_state(cpu_to_node(cpu), N_CPU); + set_migration_target_nodes(); + } + return 0; } @@ -2072,6 +2078,8 @@ static int vmstat_cpu_dead(unsigned int return 0; node_clear_state(node, N_CPU); + set_migration_target_nodes(); + return 0; } @@ -2103,6 +2111,9 @@ void __init init_mm_internals(void) start_shepherd_timer(); #endif +#if defined(CONFIG_MIGRATION) && defined(CONFIG_HOTPLUG_CPU) + migrate_on_reclaim_init(); +#endif #ifdef CONFIG_PROC_FS proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op); proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op); From patchwork Tue Mar 22 21:47:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789264 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93DF7C433EF for ; Tue, 22 Mar 2022 21:47:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2289B6B018C; Tue, 22 Mar 2022 17:47:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D8E56B018D; Tue, 22 Mar 2022 17:47:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 079E96B018E; Tue, 22 Mar 2022 17:47:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0093.hostedemail.com [216.40.44.93]) by kanga.kvack.org (Postfix) with ESMTP id E95066B018C for ; Tue, 22 Mar 2022 17:47:43 -0400 (EDT) Received: from 
smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id B44FDA4DAC for ; Tue, 22 Mar 2022 21:47:43 +0000 (UTC) X-FDA: 79273359606.17.5F5AC79 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf02.hostedemail.com (Postfix) with ESMTP id 468858000B for ; Tue, 22 Mar 2022 21:47:43 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 32167B81DAB; Tue, 22 Mar 2022 21:47:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E123AC340EC; Tue, 22 Mar 2022 21:47:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985661; bh=dF8NF39j9FHPKjFgkJgcfFPmWB58valHeHG0ZipHqBo=; h=Date:To:From:In-Reply-To:Subject:From; b=Gb6IAwQ82eXxeIfhTlABwtWjdeiHo4n19Tmy4e37auS9yFMbjUypdUlgE5rgSdDVT aYPSF4vGqTTIwGtwGyxw1tEbcqaL5qKdYLfsoSxeq2IaSzPU5SvNHEwaMsO5+6PBdz 3dz814Z/AhV6OKvE80lJtRFHGI2IBsOPrvvDW4E8= Date: Tue, 22 Mar 2022 14:47:40 -0700 To: shy828301@gmail.com,kirill.shutemov@linux.intel.com,hughd@google.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 180/227] mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Message-Id: <20220322214740.E123AC340EC@smtp.kernel.org> X-Stat-Signature: g85xcnei9jf6dayi5fg59j4oogrkc8s7 Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Gb6IAwQ8; spf=pass (imf02.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 468858000B X-HE-Tag: 1647985663-360934 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm/thp: ClearPageDoubleMap in first page_add_file_rmap() PageDoubleMap is maintained differently for anon and for shmem+file: the shmem+file one was never cleared, because a safe place to do so could not be found; so it would blight future use of the cached hugepage until evicted. See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/ But page_add_file_rmap() does provide a safe place to do so (though later than one might wish): allowing testing to return to an initial state without a damaging drop_caches. Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages") Signed-off-by: Hugh Dickins Reviewed-by: Yang Shi Cc: "Kirill A. 
Shutemov" Signed-off-by: Andrew Morton --- mm/rmap.c | 11 +++++++++++ 1 file changed, 11 insertions(+) --- a/mm/rmap.c~mm-thp-clearpagedoublemap-in-first-page_add_file_rmap +++ a/mm/rmap.c @@ -1252,6 +1252,17 @@ void page_add_file_rmap(struct page *pag } if (!atomic_inc_and_test(compound_mapcount_ptr(page))) goto out; + + /* + * It is racy to ClearPageDoubleMap in page_remove_file_rmap(); + * but page lock is held by all page_add_file_rmap() compound + * callers, and SetPageDoubleMap below warns if !PageLocked: + * so here is a place that DoubleMap can be safely cleared. + */ + VM_WARN_ON_ONCE(!PageLocked(page)); + if (nr == nr_pages && PageDoubleMap(page)) + ClearPageDoubleMap(page); + if (PageSwapBacked(page)) __mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED, nr_pages); From patchwork Tue Mar 22 21:47:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789265 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA647C433F5 for ; Tue, 22 Mar 2022 21:47:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 621DD6B0190; Tue, 22 Mar 2022 17:47:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5CF996B018F; Tue, 22 Mar 2022 17:47:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4701F6B0190; Tue, 22 Mar 2022 17:47:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0218.hostedemail.com [216.40.44.218]) by kanga.kvack.org (Postfix) with ESMTP id 380216B018E for ; Tue, 22 Mar 2022 17:47:47 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id EF0AAA4DBC for ; Tue, 22 Mar 2022 21:47:46 +0000 (UTC) X-FDA: 79273359732.17.E33DA16 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf09.hostedemail.com (Postfix) with ESMTP id 53870140020 for ; Tue, 22 Mar 2022 21:47:46 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 41712B81DB0; Tue, 22 Mar 2022 21:47:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F3CCCC340EE; Tue, 22 Mar 2022 21:47:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985664; bh=WCsIdgvm4p3k+5Cv/PlRL2Jz2UslW1DVV3Oh54qbpSc=; h=Date:To:From:In-Reply-To:Subject:From; b=Yg0t/pwjbmCAlOjlL9bpiOQwUD8OX99GKGBEEK6Pa9+GjWrv/2Mv60/IPlQtYRD/m TzXlOW2wO6+KOima4Ibt8TUVeXISFou4nyMbQSVT9kpDsJnCZ2AhWOI4KhRxD1FOFk 50uu28z6c45CsI8rHwu8pnOISXdLhZgj+hUlioFw= Date: Tue, 22 Mar 2022 14:47:43 -0700 To: vitaly.wool@konsulko.com,sjenning@redhat.com,ddstreet@ieee.org,maciej.szmigiero@oracle.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 181/227] mm/zswap.c: allow handling just same-value filled pages Message-Id: <20220322214743.F3CCCC340EE@smtp.kernel.org> X-Stat-Signature: dyet9zfro8zu8r788k1z6k4tx5aow3fb 
X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 53870140020 Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="Yg0t/pwj"; dmarc=none; spf=pass (imf09.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985666-815972 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: "Maciej S. Szmigiero" Subject: mm/zswap.c: allow handling just same-value filled pages Zswap has an ability to efficiently store same-value filled pages, which can be turned on and off using the "same_filled_pages_enabled" parameter. However, there is currently no way to enable just this (lightweight) functionality, while not making use of the whole compressed page storage machinery. Add a "non_same_filled_pages_enabled" parameter which allows disabling handling of pages that aren't same-value filled. This way zswap can be run in such lightweight same-value filled pages only mode. Link: https://lkml.kernel.org/r/7dbafa963e8bab43608189abbe2067f4b9287831.1641247624.git.maciej.szmigiero@oracle.com Signed-off-by: Maciej S. Szmigiero Cc: Seth Jennings Cc: Dan Streetman Cc: Vitaly Wool Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/zswap.rst | 22 +++++++++++++++++++--- mm/zswap.c | 15 ++++++++++++++- 2 files changed, 33 insertions(+), 4 deletions(-) --- a/Documentation/admin-guide/mm/zswap.rst~mm-zswapc-allow-handling-just-same-value-filled-pages +++ a/Documentation/admin-guide/mm/zswap.rst @@ -130,9 +130,25 @@ attribute, e.g.:: echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled When zswap same-filled page identification is disabled at runtime, it will stop -checking for the same-value filled pages during store operation. However, the -existing pages which are marked as same-value filled pages remain stored -unchanged in zswap until they are either loaded or invalidated. +checking for the same-value filled pages during store operation. +In other words, every page will be then considered non-same-value filled. +However, the existing pages which are marked as same-value filled pages remain +stored unchanged in zswap until they are either loaded or invalidated. + +In some circumstances it might be advantageous to make use of just the zswap +ability to efficiently store same-filled pages without enabling the whole +compressed page storage. +In this case the handling of non-same-value pages by zswap (enabled by default) +can be disabled by setting the ``non_same_filled_pages_enabled`` attribute +to 0, e.g. ``zswap.non_same_filled_pages_enabled=0``. +It can also be enabled and disabled at runtime using the sysfs +``non_same_filled_pages_enabled`` attribute, e.g.:: + + echo 1 > /sys/module/zswap/parameters/non_same_filled_pages_enabled + +Disabling both ``zswap.same_filled_pages_enabled`` and +``zswap.non_same_filled_pages_enabled`` effectively disables accepting any new +pages by zswap. 
To prevent zswap from shrinking pool when zswap is full and there's a high pressure on swap (this will result in flipping pages in and out zswap pool --- a/mm/zswap.c~mm-zswapc-allow-handling-just-same-value-filled-pages +++ a/mm/zswap.c @@ -120,11 +120,19 @@ static unsigned int zswap_accept_thr_per module_param_named(accept_threshold_percent, zswap_accept_thr_percent, uint, 0644); -/* Enable/disable handling same-value filled pages (enabled by default) */ +/* + * Enable/disable handling same-value filled pages (enabled by default). + * If disabled every page is considered non-same-value filled. + */ static bool zswap_same_filled_pages_enabled = true; module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled, bool, 0644); +/* Enable/disable handling non-same-value filled pages (enabled by default) */ +static bool zswap_non_same_filled_pages_enabled = true; +module_param_named(non_same_filled_pages_enabled, zswap_non_same_filled_pages_enabled, + bool, 0644); + /********************************* * data structures **********************************/ @@ -1147,6 +1155,11 @@ static int zswap_frontswap_store(unsigne kunmap_atomic(src); } + if (!zswap_non_same_filled_pages_enabled) { + ret = -EINVAL; + goto freepage; + } + /* if entry is successfully added, it keeps the reference */ entry->pool = zswap_pool_current_get(); if (!entry->pool) { From patchwork Tue Mar 22 21:47:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789266 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF5D2C433EF for ; Tue, 22 Mar 2022 21:47:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 44BA56B018F; Tue, 22 Mar 2022 17:47:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3F9F56B0191; Tue, 22 Mar 2022 17:47:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2C2356B0192; Tue, 22 Mar 2022 17:47:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0234.hostedemail.com [216.40.44.234]) by kanga.kvack.org (Postfix) with ESMTP id 1AC0A6B018F for ; Tue, 22 Mar 2022 17:47:50 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D80AEA4A62 for ; Tue, 22 Mar 2022 21:47:49 +0000 (UTC) X-FDA: 79273359858.20.11287FB Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf28.hostedemail.com (Postfix) with ESMTP id 7953CC002A for ; Tue, 22 Mar 2022 21:47:49 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 666E1B81DB7; Tue, 22 Mar 2022 21:47:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F3E62C340EC; Tue, 22 Mar 2022 21:47:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985667; bh=wSq2BlYXr458jeO5QBHmxSyrfGF3SPE45BMzWNlzMko=; h=Date:To:From:In-Reply-To:Subject:From; b=isH+xXdff08DAlM+70Z0mlENxEN3gISUMF4TQjVsHMYWY4mC2H5RCsPXWYS3yfVYq m4ng2ogZHAvjdsSrdAgXxMeRSE2rbZYVaVifykHrhvs4cDFp4JU5EpYrCLQSLdrkPz 
PcG3Q13WtnEMf0gHzebp6HDgoYmlfogqEgtFChUQ= Date: Tue, 22 Mar 2022 14:47:46 -0700 To: steve@sk2.org,songmuchun@bytedance.com,linmiaohe@huawei.com,keescook@chromium.org,christophe.leroy@csgroup.eu,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 182/227] mm: remove usercopy_warn() Message-Id: <20220322214746.F3E62C340EC@smtp.kernel.org> X-Stat-Signature: jo19hfzzp31kstfjci5fm6enr5ma46ti Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=isH+xXdf; spf=pass (imf28.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 7953CC002A X-HE-Tag: 1647985669-216648 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Christophe Leroy Subject: mm: remove usercopy_warn() Users of usercopy_warn() were removed by commit 53944f171a89 ("mm: remove HARDENED_USERCOPY_FALLBACK") Remove it. Link: https://lkml.kernel.org/r/5f26643fc70b05f8455b60b99c30c17d635fa640.1644231910.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Reviewed-by: Miaohe Lin Reviewed-by: Stephen Kitt Reviewed-by: Muchun Song Cc: Kees Cook Signed-off-by: Andrew Morton --- include/linux/uaccess.h | 2 -- mm/usercopy.c | 11 ----------- 2 files changed, 13 deletions(-) --- a/include/linux/uaccess.h~mm-remove-usercopy_warn +++ a/include/linux/uaccess.h @@ -401,8 +401,6 @@ static inline void user_access_restore(u #endif #ifdef CONFIG_HARDENED_USERCOPY -void usercopy_warn(const char *name, const char *detail, bool to_user, - unsigned long offset, unsigned long len); void __noreturn usercopy_abort(const char *name, const char *detail, bool to_user, unsigned long offset, unsigned long len); --- a/mm/usercopy.c~mm-remove-usercopy_warn +++ a/mm/usercopy.c @@ -70,17 +70,6 @@ static noinline int check_stack_object(c * kmem_cache_create_usercopy() function to create the cache (and * carefully audit the whitelist range). */ -void usercopy_warn(const char *name, const char *detail, bool to_user, - unsigned long offset, unsigned long len) -{ - WARN_ONCE(1, "Bad or missing usercopy whitelist? Kernel memory %s attempt detected %s %s%s%s%s (offset %lu, size %lu)!\n", - to_user ? "exposure" : "overwrite", - to_user ? "from" : "to", - name ? : "unknown?!", - detail ? " '" : "", detail ? : "", detail ? 
"'" : "", - offset, len); -} - void __noreturn usercopy_abort(const char *name, const char *detail, bool to_user, unsigned long offset, unsigned long len) From patchwork Tue Mar 22 21:47:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789267 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9964C43217 for ; Tue, 22 Mar 2022 21:47:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 55E836B0192; Tue, 22 Mar 2022 17:47:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 50BD36B0193; Tue, 22 Mar 2022 17:47:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3FC646B0194; Tue, 22 Mar 2022 17:47:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0079.hostedemail.com [216.40.44.79]) by kanga.kvack.org (Postfix) with ESMTP id 312406B0192 for ; Tue, 22 Mar 2022 17:47:53 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E51D48249980 for ; Tue, 22 Mar 2022 21:47:52 +0000 (UTC) X-FDA: 79273359984.29.F2EA485 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf16.hostedemail.com (Postfix) with ESMTP id 74A2D18002C for ; Tue, 22 Mar 2022 21:47:52 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 60CD8B81DC3; Tue, 22 Mar 2022 21:47:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0C9DCC340EC; Tue, 22 Mar 2022 21:47:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985670; bh=CBQauugBlttK82R1G0LQ5qfgK60FXc1ArnvAVJh/Nlo=; h=Date:To:From:In-Reply-To:Subject:From; b=wUnE7ZW4psb92RHZU7k9l7oXy0IGNqBDq3gX+PX37JhWmKqnrE6//obsRVWr6FMwf Rz2c4JWgewz86wC1cBaWxxlS9gXRUW64s1np/c1d22ooG3DLZNEC91JSu6oJKo9Pzn M8wwQDKyIOB+1ob/xiiAgA8HozEAWGslPYBPrG40= Date: Tue, 22 Mar 2022 14:47:49 -0700 To: David.Laight@ACULAB.COM,anshuman.khandual@arm.com,christophe.leroy@csgroup.eu,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 183/227] mm: uninline copy_overflow() Message-Id: <20220322214750.0C9DCC340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 74A2D18002C X-Stat-Signature: uskqumqn9whg85ps9dr8fsthtqj5f68h Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=wUnE7ZW4; dmarc=none; spf=pass (imf16.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985672-475855 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Christophe Leroy Subject: mm: uninline copy_overflow() While building a small config with 
CONFIG_CC_OPTIMISE_FOR_SIZE, I ended up with more than 50 times the following function in vmlinux because GCC doesn't honor the 'inline' keyword: c00243bc : c00243bc: 94 21 ff f0 stwu r1,-16(r1) c00243c0: 7c 85 23 78 mr r5,r4 c00243c4: 7c 64 1b 78 mr r4,r3 c00243c8: 3c 60 c0 62 lis r3,-16286 c00243cc: 7c 08 02 a6 mflr r0 c00243d0: 38 63 5e e5 addi r3,r3,24293 c00243d4: 90 01 00 14 stw r0,20(r1) c00243d8: 4b ff 82 45 bl c001c61c <__warn_printk> c00243dc: 0f e0 00 00 twui r0,0 c00243e0: 80 01 00 14 lwz r0,20(r1) c00243e4: 38 21 00 10 addi r1,r1,16 c00243e8: 7c 08 03 a6 mtlr r0 c00243ec: 4e 80 00 20 blr With -Winline, GCC says: /include/linux/thread_info.h:212:20: warning: inlining failed in call to 'copy_overflow': call is unlikely and code size would grow [-Winline] copy_overflow() is an unconditional warning called by check_copy_size() on an error path. check_copy_size() has to remain inlined in order to benefit from constant folding, but copy_overflow() is not worth inlining. Uninline the warning when CONFIG_BUG is selected. When CONFIG_BUG is not selected, WARN() does nothing so skip it. This reduces the size of vmlinux by almost 4kbytes. Link: https://lkml.kernel.org/r/e1723b9cfa924bcefcd41f69d0025b38e4c9364e.1644819985.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Cc: David Laight Cc: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/thread_info.h | 5 ++++- mm/maccess.c | 6 ++++++ 2 files changed, 10 insertions(+), 1 deletion(-) --- a/include/linux/thread_info.h~mm-uninline-copy_overflow +++ a/include/linux/thread_info.h @@ -209,9 +209,12 @@ __bad_copy_from(void); extern void __compiletime_error("copy destination size is too small") __bad_copy_to(void); +void __copy_overflow(int size, unsigned long count); + static inline void copy_overflow(int size, unsigned long count) { - WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); + if (IS_ENABLED(CONFIG_BUG)) + __copy_overflow(size, count); } static __always_inline __must_check bool --- a/mm/maccess.c~mm-uninline-copy_overflow +++ a/mm/maccess.c @@ -335,3 +335,9 @@ long strnlen_user_nofault(const void __u return ret; } + +void __copy_overflow(int size, unsigned long count) +{ + WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); +} +EXPORT_SYMBOL(__copy_overflow); From patchwork Tue Mar 22 21:47:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789268 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id ECC11C4332F for ; Tue, 22 Mar 2022 21:47:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7A69E6B0194; Tue, 22 Mar 2022 17:47:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 757086B0195; Tue, 22 Mar 2022 17:47:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5D1DF6B0196; Tue, 22 Mar 2022 17:47:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 4C2DD6B0194 for ; Tue, 22 Mar 2022 17:47:56 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 2D70921D64 for ; Tue, 22 Mar 2022 21:47:56 +0000 (UTC) X-FDA: 
From patchwork Tue Mar 22 21:47:52 2022
X-Patchwork-Id: 12789268
Date: Tue, 22 Mar 2022 14:47:52 -0700
From: Andrew Morton
Subject: [patch 184/227] mm/usercopy: return 1 from hardened_usercopy __setup() handler
Message-Id: <20220322214753.11685C340EC@smtp.kernel.org>

From: Randy Dunlap
Subject: mm/usercopy: return 1 from hardened_usercopy __setup() handler

__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; it just pollutes init's
environment).  This prevents:

  Unknown kernel command line parameters \
  "BOOT_IMAGE=/boot/bzImage-517rc5 hardened_usercopy=off", will be \
  passed to user space.

  Run /sbin/init as init process
    with arguments:
      /sbin/init
    with environment:
      HOME=/
      TERM=linux
      BOOT_IMAGE=/boot/bzImage-517rc5
      hardened_usercopy=off

or hardened_usercopy=on.  When "hardened_usercopy=foo" is used, however,
there is no "Unknown kernel command line parameter" message.

Return 1 to indicate that the boot option has been handled.  Print a
warning if strtobool() returns an error on the option string, but do not
mark this as an unknown command line option and do not cause init's
environment to be polluted with this string.

Link: https://lkml.kernel.org/r/20220222034249.14795-1-rdunlap@infradead.org
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Fixes: b5cb15d9372ab ("usercopy: Allow boot cmdline disabling of hardening")
Signed-off-by: Randy Dunlap
Reported-by: Igor Zhbanov
Acked-by: Chris von Recklinghausen
Cc: Kees Cook
Signed-off-by: Andrew Morton
---

 mm/usercopy.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/usercopy.c~mm-usercopy-return-1-from-hardened_usercopy-__setup-handler
+++ a/mm/usercopy.c
@@ -284,7 +284,10 @@ static bool enable_checks __initdata = t
 
 static int __init parse_hardened_usercopy(char *str)
 {
-        return strtobool(str, &enable_checks);
+        if (strtobool(str, &enable_checks))
+                pr_warn("Invalid option string for hardened_usercopy: '%s'\n",
+                        str);
+        return 1;
 }
 
 __setup("hardened_usercopy=", parse_hardened_usercopy);
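A minimal sketch of the __setup() convention described above; the option
name and variable here are hypothetical, only the return-value semantics
follow the patch:

  #include <linux/init.h>
  #include <linux/printk.h>
  #include <linux/string.h>

  static bool myfeature_enabled __initdata = true;

  static int __init parse_myfeature(char *str)
  {
          if (strtobool(str, &myfeature_enabled))
                  pr_warn("Invalid option string for myfeature: '%s'\n", str);
          /*
           * Return 1: the option was consumed, so it is neither reported
           * as unknown nor passed on to init's environment.
           */
          return 1;
  }
  __setup("myfeature=", parse_myfeature);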
From patchwork Tue Mar 22 21:47:55 2022
X-Patchwork-Id: 12789269
Date: Tue, 22 Mar 2022 14:47:55 -0700
From: Andrew Morton
Subject: [patch 185/227] mm/early_ioremap: declare early_memremap_pgprot_adjust()
Message-Id: <20220322214756.0F77BC340F5@smtp.kernel.org>

From: Vlastimil Babka
Subject: mm/early_ioremap: declare early_memremap_pgprot_adjust()

The mm/ directory can almost fully be built with W=1, which would help in
local development.  One remaining issue is the missing prototype for
early_memremap_pgprot_adjust(), so add a declaration for this function.
Use mm/internal.h instead of asm/early_ioremap.h to avoid missing type
definitions and unnecessary exposure.

Link: https://lkml.kernel.org/r/20220314165724.16071-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Matthew Wilcox
Cc: David Hildenbrand
Signed-off-by: Andrew Morton
---

 mm/early_ioremap.c |    1 +
 mm/internal.h      |    6 ++++++
 2 files changed, 7 insertions(+)

--- a/mm/early_ioremap.c~mm-early_ioremap-declare-early_memremap_pgprot_adjust
+++ a/mm/early_ioremap.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include "internal.h"
 
 #ifdef CONFIG_MMU
 static int early_ioremap_debug __initdata;
--- a/mm/internal.h~mm-early_ioremap-declare-early_memremap_pgprot_adjust
+++ a/mm/internal.h
@@ -155,6 +155,12 @@ extern unsigned long highest_memmap_pfn;
 #define MAX_RECLAIM_RETRIES 16
 
 /*
+ * in mm/early_ioremap.c
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+                                        unsigned long size, pgprot_t prot);
+
+/*
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
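The W=1 fix follows the usual missing-prototype pattern: the definition
picks up its own declaration from a header that both sites include.  A
generic sketch (file and function names are hypothetical):

  /* internal.h */
  int internal_helper(int x);

  /* some_file.c */
  #include "internal.h"

  int internal_helper(int x)      /* -Wmissing-prototypes now satisfied */
  {
          return x * 2;
  }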
From patchwork Tue Mar 22 21:47:58 2022
X-Patchwork-Id: 12789270
Date: Tue, 22 Mar 2022 14:47:58 -0700
From: Andrew Morton
Subject: [patch 186/227] highmem: document kunmap_local()
Message-Id: <20220322214759.0A924C340EC@smtp.kernel.org>

From: Ira Weiny
Subject: highmem: document kunmap_local()

Some users of kmap() add an offset to the kmap() address to be used
during the mapping.  When converting to kmap_local_page(), the base
address does not need to be stored, because any address within the page
can be passed to kunmap_local().  However, this was not clear from the
documentation and caused some questions.[1]

Document that any address in the page can be used in kunmap_local() to
clarify this for future users.

[1] https://lore.kernel.org/lkml/20211213154543.GM3538886@iweiny-DESK2.sc.intel.com/

[ira.weiny@intel.com: updates per Christoph]
Link: https://lkml.kernel.org/r/20220124182138.816693-1-ira.weiny@intel.com
Link: https://lkml.kernel.org/r/20220124013045.806718-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny
Signed-off-by: Andrew Morton
---

 include/linux/highmem-internal.h |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/include/linux/highmem-internal.h~highmem-document-kunmap_local
+++ a/include/linux/highmem-internal.h
@@ -246,6 +246,16 @@ do {                                                            \
         __kunmap_atomic(__addr);                                \
 } while (0)
 
+/**
+ * kunmap_local - Unmap a page mapped via kmap_local_page().
+ * @__addr: An address within the page mapped
+ *
+ * @__addr can be any address within the mapped page.  Commonly it is the
+ * address return from kmap_local_page(), but it can also include offsets.
+ *
+ * Unmapping should be done in the reverse order of the mapping.  See
+ * kmap_local_page() for details.
+ */
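A sketch of the usage pattern the new kernel-doc blesses.  Kernel-style
and illustrative only; the helper name is made up:

  #include <linux/highmem.h>
  #include <linux/string.h>

  /* Copy 'len' bytes starting at 'offset' inside 'page'. */
  static void copy_from_page_offset(struct page *page, size_t offset,
                                    void *dst, size_t len)
  {
          char *base = kmap_local_page(page);

          memcpy(dst, base + offset, len);
          kunmap_local(base + offset);    /* any address in the page is fine */
  }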
From patchwork Tue Mar 22 21:48:01 2022
X-Patchwork-Id: 12789271
Date: Tue, 22 Mar 2022 14:48:01 -0700
From: Andrew Morton
Subject: [patch 187/227] mm/highmem: remove unnecessary done label
Message-Id: <20220322214801.EBCC0C340EC@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/highmem: remove unnecessary done label

Remove the unnecessary done label to simplify the code.

Link: https://lkml.kernel.org/r/20220126092542.64659-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin
Reviewed-by: Muchun Song
Reviewed-by: David Hildenbrand
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
---

 mm/highmem.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/mm/highmem.c~mm-highmem-remove-unnecessary-done-label
+++ a/mm/highmem.c
@@ -736,11 +736,11 @@ void *page_address(const struct page *pa
                 list_for_each_entry(pam, &pas->lh, list) {
                         if (pam->page == page) {
                                 ret = pam->virtual;
-                                goto done;
+                                break;
                         }
                 }
         }
-done:
+
         spin_unlock_irqrestore(&pas->lock, flags);
         return ret;
 }
@@ -773,13 +773,12 @@ void set_page_address(struct page *page,
                 list_for_each_entry(pam, &pas->lh, list) {
                         if (pam->page == page) {
                                 list_del(&pam->list);
-                                spin_unlock_irqrestore(&pas->lock, flags);
-                                goto done;
+                                break;
                         }
                 }
                 spin_unlock_irqrestore(&pas->lock, flags);
         }
-done:
+
         return;
 }
 #define kunmap_local(__addr)                                    \
 do {                                                            \
         BUILD_BUG_ON(__same_type((__addr), struct page *));     \

From patchwork Tue Mar 22 21:48:04 2022
X-Patchwork-Id: 12789272
Date: Tue, 22 Mar 2022 14:48:04 -0700
From: Andrew Morton
Subject: [patch 188/227] mm/page_table_check.c: use strtobool for param parsing
Message-Id: <20220322214804.F2891C340EC@smtp.kernel.org>

From: "Dr. David Alan Gilbert"
Subject: mm/page_table_check.c: use strtobool for param parsing

Use strtobool() rather than open coding "on" and "off" parsing.

Link: https://lkml.kernel.org/r/20220227181038.126926-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert
Signed-off-by: Andrew Morton
---

 mm/page_table_check.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

--- a/mm/page_table_check.c~mm-use-strtobool-for-param-parsing
+++ a/mm/page_table_check.c
@@ -23,15 +23,7 @@ EXPORT_SYMBOL(page_table_check_disabled)
 
 static int __init early_page_table_check_param(char *buf)
 {
-        if (!buf)
-                return -EINVAL;
-
-        if (strcmp(buf, "on") == 0)
-                __page_table_check_enabled = true;
-        else if (strcmp(buf, "off") == 0)
-                __page_table_check_enabled = false;
-
-        return 0;
+        return strtobool(buf, &__page_table_check_enabled);
 }
 
 early_param("page_table_check", early_page_table_check_param);
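Note the contrast with the __setup() handler earlier in this series:
early_param() handlers return 0 on success and non-zero on error, so
strtobool()'s 0/-EINVAL result can be returned directly.  A sketch with a
hypothetical parameter name:

  #include <linux/init.h>
  #include <linux/string.h>

  static bool my_check_enabled __initdata;

  static int __init early_my_check_param(char *buf)
  {
          /* 0 on success, -EINVAL on a malformed value. */
          return strtobool(buf, &my_check_enabled);
  }
  early_param("my_check", early_my_check_param);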
From patchwork Tue Mar 22 21:48:07 2022
X-Patchwork-Id: 12789273
Date: Tue, 22 Mar 2022 14:48:07 -0700
From: Andrew Morton
Subject: [patch 189/227] mm/kfence: remove unnecessary CONFIG_KFENCE option
Message-Id: <20220322214807.DF59EC340EE@smtp.kernel.org>

From: tangmeng
Subject: mm/kfence: remove unnecessary CONFIG_KFENCE option

mm/Makefile already has:

  obj-$(CONFIG_KFENCE) += kfence/

so the kfence/ directory is only entered when CONFIG_KFENCE is set, and
the 'obj-$(CONFIG_KFENCE) :=' in mm/kfence/Makefile is redundant.  Delete
it from mm/kfence/Makefile.

Link: https://lkml.kernel.org/r/20220221065525.21344-1-tangmeng@uniontech.com
Signed-off-by: tangmeng
Reviewed-by: Marco Elver
Cc: Alexander Potapenko
Cc: Dmitriy Vyukov
Signed-off-by: Andrew Morton
---

 mm/kfence/Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/kfence/Makefile~mm-kfence-remove-unnecessary-config_kfence-option
+++ a/mm/kfence/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_KFENCE) := core.o report.o
+obj-y := core.o report.o
 
 CFLAGS_kfence_test.o := -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
 obj-$(CONFIG_KFENCE_KUNIT_TEST) += kfence_test.o
From patchwork Tue Mar 22 21:48:10 2022
X-Patchwork-Id: 12789274
Date: Tue, 22 Mar 2022 14:48:10 -0700
From: Andrew Morton
Subject: [patch 190/227] kfence: allow re-enabling KFENCE after system startup
Message-Id: <20220322214810.EAFBDC340EC@smtp.kernel.org>

From: Tianchen Ding
Subject: kfence: allow re-enabling KFENCE after system startup

Patch series "provide the flexibility to enable KFENCE", v3.

If CONFIG_CONTIG_ALLOC is not supported, we fall back to trying
alloc_pages_exact().  Allocating pages this way is limited by MAX_ORDER
(default 11), so we will not support allocating the kfence pool after
system startup with a large KFENCE_NUM_OBJECTS.  When handling failures
in kfence_init_pool_late(), we pair free_pages_exact() with
alloc_pages_exact() for compatibility, though it actually does the same
as free_contig_range().

This patch (of 2):

Once KFENCE is disabled by:

  echo 0 > /sys/module/kfence/parameters/sample_interval

it can never be re-enabled until the next reboot.  Allow re-enabling it by
writing a positive number to sample_interval.

Link: https://lkml.kernel.org/r/20220307074516.6920-1-dtcccc@linux.alibaba.com
Link: https://lkml.kernel.org/r/20220307074516.6920-2-dtcccc@linux.alibaba.com
Signed-off-by: Tianchen Ding
Reviewed-by: Marco Elver
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Signed-off-by: Andrew Morton
---

 mm/kfence/core.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

--- a/mm/kfence/core.c~kfence-allow-re-enabling-kfence-after-system-startup
+++ a/mm/kfence/core.c
@@ -38,14 +38,17 @@
 #define KFENCE_WARN_ON(cond)                                            \
         ({                                                              \
                 const bool __cond = WARN_ON(cond);                      \
-                if (unlikely(__cond))                                   \
+                if (unlikely(__cond)) {                                 \
                         WRITE_ONCE(kfence_enabled, false);              \
+                        disabled_by_warn = true;                        \
+                }                                                       \
                 __cond;                                                 \
         })
 
 /* === Data ================================================================= */
 
 static bool kfence_enabled __read_mostly;
+static bool disabled_by_warn __read_mostly;
 
 unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
 EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */
@@ -55,6 +58,7 @@ EXPORT_SYMBOL_GPL(kfence_sample_interval
 #endif
 #define MODULE_PARAM_PREFIX "kfence."
 
+static int kfence_enable_late(void);
 static int param_set_sample_interval(const char *val, const struct kernel_param *kp)
 {
         unsigned long num;
@@ -65,10 +69,11 @@ static int param_set_sample_interval(con
         if (!num) /* Using 0 to indicate KFENCE is disabled. */
                 WRITE_ONCE(kfence_enabled, false);
-        else if (!READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING)
-                return -EINVAL; /* Cannot (re-)enable KFENCE on-the-fly. */
 
         *((unsigned long *)kp->arg) = num;
+
+        if (num && !READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING)
+                return disabled_by_warn ? -EINVAL : kfence_enable_late();
         return 0;
 }
@@ -787,6 +792,16 @@ void __init kfence_init(void)
                 (void *)(__kfence_pool + KFENCE_POOL_SIZE));
 }
 
+static int kfence_enable_late(void)
+{
+        if (!__kfence_pool)
+                return -EINVAL;
+
+        WRITE_ONCE(kfence_enabled, true);
+        queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+        return 0;
+}
+
 void kfence_shutdown_cache(struct kmem_cache *s)
 {
         unsigned long flags;
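For reference, the plumbing that routes a write to
/sys/module/.../parameters/... into custom logic like the setter above is
module_param_cb().  A pared-down sketch with hypothetical names (the patch
extends kfence's existing sample_interval ops rather than adding new ones):

  #include <linux/kernel.h>
  #include <linux/moduleparam.h>

  static unsigned long my_interval = 100;

  static int my_interval_set(const char *val, const struct kernel_param *kp)
  {
          unsigned long num;
          int ret = kstrtoul(val, 0, &num);

          if (ret < 0)
                  return ret;
          *((unsigned long *)kp->arg) = num;
          /*
           * A real setter reacts to the new value here, as the patch
           * does by calling kfence_enable_late().
           */
          return 0;
  }

  static const struct kernel_param_ops my_interval_ops = {
          .set = my_interval_set,
          .get = param_get_ulong,
  };
  module_param_cb(interval, &my_interval_ops, &my_interval, 0600);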
From patchwork Tue Mar 22 21:48:13 2022
X-Patchwork-Id: 12789275
Date: Tue, 22 Mar 2022 14:48:13 -0700
From: Andrew Morton
Subject: [patch 191/227] kfence: alloc kfence_pool after system startup
Message-Id: <20220322214813.E58EBC340EE@smtp.kernel.org>

From: Tianchen Ding
Subject: kfence: alloc kfence_pool after system startup

Allow enabling KFENCE after system startup by allocating its pool via the
page allocator.  This provides the flexibility to enable KFENCE even if it
wasn't enabled at boot time.

Link: https://lkml.kernel.org/r/20220307074516.6920-3-dtcccc@linux.alibaba.com
Signed-off-by: Tianchen Ding
Reviewed-by: Marco Elver
Tested-by: Peng Liu
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Signed-off-by: Andrew Morton
---

 mm/kfence/core.c |  111 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 90 insertions(+), 21 deletions(-)

--- a/mm/kfence/core.c~kfence-alloc-kfence_pool-after-system-startup
+++ a/mm/kfence/core.c
@@ -96,7 +96,7 @@ static unsigned long kfence_skip_covered
 module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644);
 
 /* The pool of pages used for guard pages and objects. */
-char *__kfence_pool __ro_after_init;
+char *__kfence_pool __read_mostly;
 EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
 
 /*
@@ -537,17 +537,19 @@ static void rcu_guarded_free(struct rcu_
         kfence_guarded_free((void *)meta->addr, meta, false);
 }
 
-static bool __init kfence_init_pool(void)
+/*
+ * Initialization of the KFENCE pool after its allocation.
+ * Returns 0 on success; otherwise returns the address up to
+ * which partial initialization succeeded.
+ */
+static unsigned long kfence_init_pool(void)
 {
         unsigned long addr = (unsigned long)__kfence_pool;
         struct page *pages;
         int i;
 
-        if (!__kfence_pool)
-                return false;
-
         if (!arch_kfence_init_pool())
-                goto err;
+                return addr;
 
         pages = virt_to_page(addr);
@@ -565,7 +567,7 @@ static bool __init kfence_init_pool(void
                 /* Verify we do not have a compound head page. */
                 if (WARN_ON(compound_head(&pages[i]) != &pages[i]))
-                        goto err;
+                        return addr;
 
                 __SetPageSlab(&pages[i]);
         }
@@ -578,7 +580,7 @@ static bool __init kfence_init_pool(void
          */
         for (i = 0; i < 2; i++) {
                 if (unlikely(!kfence_protect(addr)))
-                        goto err;
+                        return addr;
 
                 addr += PAGE_SIZE;
         }
@@ -595,7 +597,7 @@ static bool __init kfence_init_pool(void
                 /* Protect the right redzone. */
                 if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
-                        goto err;
+                        return addr;
 
                 addr += 2 * PAGE_SIZE;
         }
@@ -608,9 +610,21 @@ static bool __init kfence_init_pool(void
          */
         kmemleak_free(__kfence_pool);
 
-        return true;
+        return 0;
+}
+
+static bool __init kfence_init_pool_early(void)
+{
+        unsigned long addr;
+
+        if (!__kfence_pool)
+                return false;
+
+        addr = kfence_init_pool();
+
+        if (!addr)
+                return true;
 
-err:
         /*
          * Only release unprotected pages, and do not try to go back and change
          * page attributes due to risk of failing to do so as well. If changing
@@ -623,6 +637,26 @@ err:
         return false;
 }
 
+static bool kfence_init_pool_late(void)
+{
+        unsigned long addr, free_size;
+
+        addr = kfence_init_pool();
+
+        if (!addr)
+                return true;
+
+        /* Same as above. */
+        free_size = KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool);
+#ifdef CONFIG_CONTIG_ALLOC
+        free_contig_range(page_to_pfn(virt_to_page(addr)), free_size / PAGE_SIZE);
+#else
+        free_pages_exact((void *)addr, free_size);
+#endif
+        __kfence_pool = NULL;
+        return false;
+}
+
 /* === DebugFS Interface ==================================================== */
 
 static int stats_show(struct seq_file *seq, void *v)
@@ -771,31 +805,66 @@ void __init kfence_alloc_pool(void)
                 pr_err("failed to allocate pool\n");
 }
 
+static void kfence_init_enable(void)
+{
+        if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS))
+                static_branch_enable(&kfence_allocation_key);
+        WRITE_ONCE(kfence_enabled, true);
+        queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+        pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
+                CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
+                (void *)(__kfence_pool + KFENCE_POOL_SIZE));
+}
+
 void __init kfence_init(void)
 {
+        stack_hash_seed = (u32)random_get_entropy();
+
         /* Setting kfence_sample_interval to 0 on boot disables KFENCE. */
         if (!kfence_sample_interval)
                 return;
 
-        stack_hash_seed = (u32)random_get_entropy();
-        if (!kfence_init_pool()) {
+        if (!kfence_init_pool_early()) {
                 pr_err("%s failed\n", __func__);
                 return;
         }
 
-        if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS))
-                static_branch_enable(&kfence_allocation_key);
-        WRITE_ONCE(kfence_enabled, true);
-        queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
-        pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
-                CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
-                (void *)(__kfence_pool + KFENCE_POOL_SIZE));
+        kfence_init_enable();
+}
+
+static int kfence_init_late(void)
+{
+        const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE;
+#ifdef CONFIG_CONTIG_ALLOC
+        struct page *pages;
+
+        pages = alloc_contig_pages(nr_pages, GFP_KERNEL, first_online_node, NULL);
+        if (!pages)
+                return -ENOMEM;
+        __kfence_pool = page_to_virt(pages);
+#else
+        if (nr_pages > MAX_ORDER_NR_PAGES) {
+                pr_warn("KFENCE_NUM_OBJECTS too large for buddy allocator\n");
+                return -EINVAL;
+        }
+        __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL);
+        if (!__kfence_pool)
+                return -ENOMEM;
+#endif
+
+        if (!kfence_init_pool_late()) {
+                pr_err("%s failed\n", __func__);
+                return -EBUSY;
+        }
+
+        kfence_init_enable();
+        return 0;
 }
 
 static int kfence_enable_late(void)
 {
         if (!__kfence_pool)
-                return -EINVAL;
+                return kfence_init_late();
 
         WRITE_ONCE(kfence_enabled, true);
         queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
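The allocation strategy condenses to the following sketch (illustrative
only, with a made-up helper name; the patch additionally tears the pool
down again if initialization fails):

  #include <linux/gfp.h>
  #include <linux/mm.h>

  static void *alloc_big_pool(unsigned long nr_pages)
  {
  #ifdef CONFIG_CONTIG_ALLOC
          struct page *pages;

          /* Can assemble a range larger than a single buddy allocation. */
          pages = alloc_contig_pages(nr_pages, GFP_KERNEL,
                                     first_online_node, NULL);
          return pages ? page_to_virt(pages) : NULL;
  #else
          /*
           * alloc_pages_exact() starts from one buddy allocation, so it
           * is capped at MAX_ORDER_NR_PAGES.
           */
          if (nr_pages > MAX_ORDER_NR_PAGES)
                  return NULL;
          return alloc_pages_exact(nr_pages * PAGE_SIZE, GFP_KERNEL);
  #endif
  }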
From patchwork Tue Mar 22 21:48:16 2022
X-Patchwork-Id: 12789276
Date: Tue, 22 Mar 2022 14:48:16 -0700
From: Andrew Morton
Subject: [patch 192/227] kunit: fix UAF when run kfence test case test_gfpzero
Message-Id: <20220322214817.09BBDC340EC@smtp.kernel.org>

From: Peng Liu
Subject: kunit: fix UAF when run kfence test case test_gfpzero

Patch series "kunit: fix a UAF bug and do some optimization", v2.

This series fixes a UAF (use-after-free) when running the kfence test
case test_gfpzero, which is time-costly.  The UAF can easily be triggered
by setting CONFIG_KFENCE_NUM_OBJECTS = 65535.  Furthermore, some
optimization of the kunit tests has been done.

This patch (of 3):

Kunit creates a new thread to run the actual test case, and the main
process waits for the completion of that test thread, up to a timeout.
The "struct kunit test" is local to the function kunit_try_catch_run but
is also used by the test-case thread.  kunit_try_catch_run frees the
"struct kunit test" when the timeout expires, but the actual test case
may still be running, so a UAF bug is triggered.

The above problem has been observed both on a physical machine and on a
qemu platform when running the kfence kunit tests.  It can be triggered
with CONFIG_KFENCE_NUM_OBJECTS = 65535: under this setting, the test case
test_gfpzero takes hours and kunit times out.  The following shows the
panic log.

  BUG: unable to handle page fault for address: ffffffff82d882e9

  Call Trace:
   kunit_log_append+0x58/0xd0
   ...
   test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test]
   test_gfpzero.cold+0x61/0x8ab [kfence_test]
   kunit_try_run_case+0x4c/0x70
   kunit_generic_run_threadfn_adapter+0x11/0x20
   kthread+0x166/0x190
   ret_from_fork+0x22/0x30
  Kernel panic - not syncing: Fatal exception
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014

To solve this problem, the test-case thread should be stopped when the
kunit framework times out.  The stop signal is sent in
kunit_try_catch_run, and test_gfpzero handles it.

Link: https://lkml.kernel.org/r/20220309083753.1561921-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220309083753.1561921-2-liupeng256@huawei.com
Signed-off-by: Peng Liu
Reviewed-by: Marco Elver
Reviewed-by: Brendan Higgins
Tested-by: Brendan Higgins
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Wang Kefeng
Cc: Daniel Latypov
Cc: David Gow
Signed-off-by: Andrew Morton
---

 lib/kunit/try-catch.c   |    1 +
 mm/kfence/kfence_test.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/lib/kunit/try-catch.c~kunit-fix-uaf-when-run-kfence-test-case-test_gfpzero
+++ a/lib/kunit/try-catch.c
@@ -78,6 +78,7 @@ void kunit_try_catch_run(struct kunit_tr
         if (time_remaining == 0) {
                 kunit_err(test, "try timed out\n");
                 try_catch->try_result = -ETIMEDOUT;
+                kthread_stop(task_struct);
         }
 
         exit_code = try_catch->try_result;
--- a/mm/kfence/kfence_test.c~kunit-fix-uaf-when-run-kfence-test-case-test_gfpzero
+++ a/mm/kfence/kfence_test.c
@@ -623,7 +623,7 @@ static void test_gfpzero(struct kunit *t
                         break;
                 test_free(buf2);
 
-                if (i == CONFIG_KFENCE_NUM_OBJECTS) {
+                if (kthread_should_stop() || (i == CONFIG_KFENCE_NUM_OBJECTS)) {
                         kunit_warn(test, "giving up ... cannot get same object back\n");
                         return;
                 }
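The fix relies on the standard cooperative-stop contract between
kthread_stop() and the thread itself.  A sketch of that contract, with
hypothetical names:

  #include <linux/kthread.h>
  #include <linux/sched.h>

  static int my_worker(void *data)
  {
          /* kthread_stop() only takes effect when the thread polls for it. */
          while (!kthread_should_stop()) {
                  /* ... one bounded unit of work ... */
                  cond_resched();
          }
          return 0;
  }

  static void my_start_stop(void)
  {
          struct task_struct *t = kthread_run(my_worker, NULL, "my_worker");

          if (!IS_ERR(t))
                  kthread_stop(t);        /* blocks until my_worker() returns */
  }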
From patchwork Tue Mar 22 21:48:19 2022
X-Patchwork-Id: 12789277
Date: Tue, 22 Mar 2022 14:48:19 -0700
From: Andrew Morton
Subject: [patch 193/227] kunit: make kunit_test_timeout compatible with comment
Message-Id: <20220322214820.1E539C340EE@smtp.kernel.org>

From: Peng Liu
Subject: kunit: make kunit_test_timeout compatible with comment

In the function kunit_test_timeout, the comment declares that "300 *
MSEC_PER_SEC" represents 5 minutes.  However, the value is returned in
jiffies, so it is only correct when HZ = 1000; it is wrong on arm64, for
example, whose default HZ = 250, and in other such configurations.  Use
msecs_to_jiffies to fix this, so that kunit_test_timeout works as
intended.

Link: https://lkml.kernel.org/r/20220309083753.1561921-3-liupeng256@huawei.com
Fixes: 5f3e06208920 ("kunit: test: add support for test abort")
Signed-off-by: Peng Liu
Reviewed-by: Marco Elver
Reviewed-by: Daniel Latypov
Reviewed-by: Brendan Higgins
Tested-by: Brendan Higgins
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Wang Kefeng
Cc: David Gow
Signed-off-by: Andrew Morton
---

 lib/kunit/try-catch.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/kunit/try-catch.c~kunit-make-kunit_test_timeout-compatible-with-comment
+++ a/lib/kunit/try-catch.c
@@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(
          * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
          * the task will be killed and an oops generated.
          */
-        return 300 * MSEC_PER_SEC; /* 5 min */
+        return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */
 }
 
 void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
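The unit bug is easy to see with concrete numbers.  A kernel-style sketch;
the values in the comments assume HZ = 250, as on default arm64 configs:

  #include <linux/jiffies.h>
  #include <linux/time64.h>

  static unsigned long five_minutes_wrong(void)
  {
          /* 300000, silently interpreted as jiffies: 1200 s at HZ=250. */
          return 300 * MSEC_PER_SEC;
  }

  static unsigned long five_minutes_right(void)
  {
          /* msecs_to_jiffies() scales by HZ: 300 s on any configuration. */
          return 300 * msecs_to_jiffies(MSEC_PER_SEC);
  }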
From patchwork Tue Mar 22 21:48:22 2022
X-Patchwork-Id: 12789278
Date: Tue, 22 Mar 2022 14:48:22 -0700
From: Andrew Morton
Subject: [patch 194/227] kfence: test: try to avoid test_gfpzero trigger rcu_stall
Message-Id: <20220322214823.4868CC340EE@smtp.kernel.org>

From: Peng Liu
Subject: kfence: test: try to avoid test_gfpzero trigger rcu_stall

When CONFIG_KFENCE_NUM_OBJECTS is set to a big number, the kfence
kunit test case test_gfpzero eats up nearly all of the CPU's resources,
and an rcu_stall is reported as in the following log, cut from a physical
server:

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002 softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019)
  Task dump for CPU 68:
  task:kunit_try_catch state:R running task
  stack: 0 pid: 9728 ppid: 2 flags:0x0000020a
  Call trace:
   dump_backtrace+0x0/0x1e4
   show_stack+0x20/0x2c
   sched_show_task+0x148/0x170
   ...
   rcu_sched_clock_irq+0x70/0x180
   update_process_times+0x68/0xb0
   tick_sched_handle+0x38/0x74
   ...
   gic_handle_irq+0x78/0x2c0
   el1_irq+0xb8/0x140
   kfree+0xd8/0x53c
   test_alloc+0x264/0x310 [kfence_test]
   test_gfpzero+0xf4/0x840 [kfence_test]
   kunit_try_run_case+0x48/0x20c
   kunit_generic_run_threadfn_adapter+0x28/0x34
   kthread+0x108/0x13c
   ret_from_fork+0x10/0x18

To avoid the rcu_stall and unacceptable latency, add a scheduling point
to test_gfpzero.

Link: https://lkml.kernel.org/r/20220309083753.1561921-4-liupeng256@huawei.com
Signed-off-by: Peng Liu
Reviewed-by: Marco Elver
Tested-by: Brendan Higgins
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Wang Kefeng
Cc: Daniel Latypov
Cc: David Gow
Signed-off-by: Andrew Morton
---

 mm/kfence/kfence_test.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/kfence/kfence_test.c~kfence-test-try-to-avoid-test_gfpzero-trigger-rcu_stall
+++ a/mm/kfence/kfence_test.c
@@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *t
                         kunit_warn(test, "giving up ... cannot get same object back\n");
                         return;
                 }
+                cond_resched();
         }
 
         for (i = 0; i < size; i++)
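The one-line fix is the canonical pattern for any long-running kernel
loop; sketched generically (hypothetical names):

  #include <linux/sched.h>

  static void long_running_loop(unsigned long n)
  {
          unsigned long i;

          for (i = 0; i < n; i++) {
                  /* ... expensive work ... */

                  /*
                   * Scheduling point: lets the tick, RCU and other tasks
                   * make progress, avoiding stall and lockup reports.
                   */
                  cond_resched();
          }
  }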
+ */ +static unsigned long kfence_init_pool(void) { unsigned long addr = (unsigned long)__kfence_pool; struct page *pages; int i; - if (!__kfence_pool) - return false; - if (!arch_kfence_init_pool()) - goto err; + return addr; pages = virt_to_page(addr); @@ -565,7 +567,7 @@ static bool __init kfence_init_pool(void /* Verify we do not have a compound head page. */ if (WARN_ON(compound_head(&pages[i]) != &pages[i])) - goto err; + return addr; __SetPageSlab(&pages[i]); } @@ -578,7 +580,7 @@ static bool __init kfence_init_pool(void */ for (i = 0; i < 2; i++) { if (unlikely(!kfence_protect(addr))) - goto err; + return addr; addr += PAGE_SIZE; } @@ -595,7 +597,7 @@ static bool __init kfence_init_pool(void /* Protect the right redzone. */ if (unlikely(!kfence_protect(addr + PAGE_SIZE))) - goto err; + return addr; addr += 2 * PAGE_SIZE; } @@ -608,9 +610,21 @@ static bool __init kfence_init_pool(void */ kmemleak_free(__kfence_pool); - return true; + return 0; +} + +static bool __init kfence_init_pool_early(void) +{ + unsigned long addr; + + if (!__kfence_pool) + return false; + + addr = kfence_init_pool(); + + if (!addr) + return true; -err: /* * Only release unprotected pages, and do not try to go back and change * page attributes due to risk of failing to do so as well. If changing @@ -623,6 +637,26 @@ err: return false; } +static bool kfence_init_pool_late(void) +{ + unsigned long addr, free_size; + + addr = kfence_init_pool(); + + if (!addr) + return true; + + /* Same as above. */ + free_size = KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool); +#ifdef CONFIG_CONTIG_ALLOC + free_contig_range(page_to_pfn(virt_to_page(addr)), free_size / PAGE_SIZE); +#else + free_pages_exact((void *)addr, free_size); +#endif + __kfence_pool = NULL; + return false; +} + /* === DebugFS Interface ==================================================== */ static int stats_show(struct seq_file *seq, void *v) @@ -771,31 +805,66 @@ void __init kfence_alloc_pool(void) pr_err("failed to allocate pool\n"); } +static void kfence_init_enable(void) +{ + if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS)) + static_branch_enable(&kfence_allocation_key); + WRITE_ONCE(kfence_enabled, true); + queue_delayed_work(system_unbound_wq, &kfence_timer, 0); + pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, + CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, + (void *)(__kfence_pool + KFENCE_POOL_SIZE)); +} + void __init kfence_init(void) { + stack_hash_seed = (u32)random_get_entropy(); + /* Setting kfence_sample_interval to 0 on boot disables KFENCE. 
*/ if (!kfence_sample_interval) return; - stack_hash_seed = (u32)random_get_entropy(); - if (!kfence_init_pool()) { + if (!kfence_init_pool_early()) { pr_err("%s failed\n", __func__); return; } - if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS)) - static_branch_enable(&kfence_allocation_key); - WRITE_ONCE(kfence_enabled, true); - queue_delayed_work(system_unbound_wq, &kfence_timer, 0); - pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, - CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, - (void *)(__kfence_pool + KFENCE_POOL_SIZE)); + kfence_init_enable(); +} + +static int kfence_init_late(void) +{ + const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE; +#ifdef CONFIG_CONTIG_ALLOC + struct page *pages; + + pages = alloc_contig_pages(nr_pages, GFP_KERNEL, first_online_node, NULL); + if (!pages) + return -ENOMEM; + __kfence_pool = page_to_virt(pages); +#else + if (nr_pages > MAX_ORDER_NR_PAGES) { + pr_warn("KFENCE_NUM_OBJECTS too large for buddy allocator\n"); + return -EINVAL; + } + __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL); + if (!__kfence_pool) + return -ENOMEM; +#endif + + if (!kfence_init_pool_late()) { + pr_err("%s failed\n", __func__); + return -EBUSY; + } + + kfence_init_enable(); + return 0; } static int kfence_enable_late(void) { if (!__kfence_pool) - return -EINVAL; + return kfence_init_late(); WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); From patchwork Tue Mar 22 21:48:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789276 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6B1FC433F5 for ; Tue, 22 Mar 2022 21:48:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5D0E36B01A4; Tue, 22 Mar 2022 17:48:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 580426B01A5; Tue, 22 Mar 2022 17:48:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46F476B01A6; Tue, 22 Mar 2022 17:48:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 37DEF6B01A4 for ; Tue, 22 Mar 2022 17:48:20 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id 0EA6781A2D for ; Tue, 22 Mar 2022 21:48:20 +0000 (UTC) X-FDA: 79273361160.09.9E2DB3D Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf19.hostedemail.com (Postfix) with ESMTP id 74E551A001E for ; Tue, 22 Mar 2022 21:48:19 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 71A7DB81D59; Tue, 22 Mar 2022 21:48:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 09BBDC340EC; Tue, 22 Mar 2022 21:48:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985697; bh=1SzSqyKKcHEJ9DMHDkyZiaGv+c9q1oyFNVD1PLarfDM=; h=Date:To:From:In-Reply-To:Subject:From; 
b=CsXzAp9QkWuWMMTF8MEOWhixZZ9yOHmwaBa6N24TV6SJkLBLKpfan2NlDDrP//W47 OLZgj4nckZR+gT01XaU6pE2kzhqQ5x3Ln1nDCQrU0BZAq5zbckuGXV3atfHrun0NA1 1lGBLW7id6mWU3juGtfBcYjUjUhvJxogfPLB/0JE= Date: Tue, 22 Mar 2022 14:48:16 -0700 To: wangkefeng.wang@huawei.com,glider@google.com,elver@google.com,dvyukov@google.com,dlatypov@google.com,davidgow@google.com,brendanhiggins@google.com,liupeng256@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 192/227] kunit: fix UAF when run kfence test case test_gfpzero Message-Id: <20220322214817.09BBDC340EC@smtp.kernel.org> X-Stat-Signature: kaierjdauzyna8eyg3gxwgdyfo4qpxyo Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=CsXzAp9Q; spf=pass (imf19.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 74E551A001E X-HE-Tag: 1647985699-352343 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peng Liu Subject: kunit: fix UAF when run kfence test case test_gfpzero Patch series "kunit: fix a UAF bug and do some optimization", v2. This series is to fix UAF (use after free) when running kfence test case test_gfpzero, which is time costly. This UAF bug can be easily triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Furthermore, some optimization for kunit tests has been done. This patch (of 3): Kunit will create a new thread to run an actual test case, and the main process will wait for the completion of the actual test thread until overtime. The variable "struct kunit test" has local property in function kunit_try_catch_run, and will be used in the test case thread. Task kunit_try_catch_run will free "struct kunit test" when kunit runs overtime, but the actual test case is still run and an UAF bug will be triggered. The above problem has been both observed in a physical machine and qemu platform when running kfence kunit tests. The problem can be triggered when setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Under this setting, the test case test_gfpzero will cost hours and kunit will run to overtime. The follows show the panic log. BUG: unable to handle page fault for address: ffffffff82d882e9 Call Trace: kunit_log_append+0x58/0xd0 ... test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test] test_gfpzero.cold+0x61/0x8ab [kfence_test] kunit_try_run_case+0x4c/0x70 kunit_generic_run_threadfn_adapter+0x11/0x20 kthread+0x166/0x190 ret_from_fork+0x22/0x30 Kernel panic - not syncing: Fatal exception Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 To solve this problem, the test case thread should be stopped when the kunit frame runs overtime. The stop signal will send in function kunit_try_catch_run, and test_gfpzero will handle it. 
Link: https://lkml.kernel.org/r/20220309083753.1561921-1-liupeng256@huawei.com Link: https://lkml.kernel.org/r/20220309083753.1561921-2-liupeng256@huawei.com Signed-off-by: Peng Liu Reviewed-by: Marco Elver Reviewed-by: Brendan Higgins Tested-by: Brendan Higgins Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Wang Kefeng Cc: Daniel Latypov Cc: David Gow Signed-off-by: Andrew Morton --- lib/kunit/try-catch.c | 1 + mm/kfence/kfence_test.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) --- a/lib/kunit/try-catch.c~kunit-fix-uaf-when-run-kfence-test-case-test_gfpzero +++ a/lib/kunit/try-catch.c @@ -78,6 +78,7 @@ void kunit_try_catch_run(struct kunit_tr if (time_remaining == 0) { kunit_err(test, "try timed out\n"); try_catch->try_result = -ETIMEDOUT; + kthread_stop(task_struct); } exit_code = try_catch->try_result; --- a/mm/kfence/kfence_test.c~kunit-fix-uaf-when-run-kfence-test-case-test_gfpzero +++ a/mm/kfence/kfence_test.c @@ -623,7 +623,7 @@ static void test_gfpzero(struct kunit *t break; test_free(buf2); - if (i == CONFIG_KFENCE_NUM_OBJECTS) { + if (kthread_should_stop() || (i == CONFIG_KFENCE_NUM_OBJECTS)) { kunit_warn(test, "giving up ... cannot get same object back\n"); return; } From patchwork Tue Mar 22 21:48:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789277 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7027C433FE for ; Tue, 22 Mar 2022 21:48:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6951C6B01A6; Tue, 22 Mar 2022 17:48:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6440C6B01A7; Tue, 22 Mar 2022 17:48:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 533006B01A8; Tue, 22 Mar 2022 17:48:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id 424206B01A6 for ; Tue, 22 Mar 2022 17:48:23 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 075A1121952 for ; Tue, 22 Mar 2022 21:48:23 +0000 (UTC) X-FDA: 79273361286.14.CAADFED Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf27.hostedemail.com (Postfix) with ESMTP id 8938940035 for ; Tue, 22 Mar 2022 21:48:22 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 7118DB81DC6; Tue, 22 Mar 2022 21:48:21 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1E539C340EE; Tue, 22 Mar 2022 21:48:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985700; bh=PTY5AQR0Uv4zkrNMjuB72K6AdNno877uzvOUdRK72r8=; h=Date:To:From:In-Reply-To:Subject:From; b=YvRIVBtrFKPNv6lR2NDtOLLO8TpAgI73EEU34kkodSQOtq42UDmcr0KnIxnthq57N We+U4riAf3MBL+amxmwmgFimpq4Og8Sb7/TCed45zaO1AoJxLzjxeSWRzE/yy8OFXD EMUC6D9kZRJh7mzgJtM1wj7WZCCHJKJkcbc709fM= Date: Tue, 22 Mar 2022 14:48:19 -0700 To: 
wangkefeng.wang@huawei.com,glider@google.com,elver@google.com,dvyukov@google.com,dlatypov@google.com,davidgow@google.com,brendanhiggins@google.com,liupeng256@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 193/227] kunit: make kunit_test_timeout compatible with comment Message-Id: <20220322214820.1E539C340EE@smtp.kernel.org> X-Stat-Signature: 5efskaoxde6zmmt4btefahbxadwpeabx Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=YvRIVBtr; spf=pass (imf27.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 8938940035 X-HE-Tag: 1647985702-470532 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peng Liu Subject: kunit: make kunit_test_timeout compatible with comment In function kunit_test_timeout, it is declared "300 * MSEC_PER_SEC" represent 5min. However, it is wrong when dealing with arm64 whose default HZ = 250, or some other situations. Use msecs_to_jiffies to fix this, and kunit_test_timeout will work as desired. Link: https://lkml.kernel.org/r/20220309083753.1561921-3-liupeng256@huawei.com Fixes: 5f3e06208920 ("kunit: test: add support for test abort") Signed-off-by: Peng Liu Reviewed-by: Marco Elver Reviewed-by: Daniel Latypov Reviewed-by: Brendan Higgins Tested-by: Brendan Higgins Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Wang Kefeng Cc: David Gow Signed-off-by: Andrew Morton --- lib/kunit/try-catch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/lib/kunit/try-catch.c~kunit-make-kunit_test_timeout-compatible-with-comment +++ a/lib/kunit/try-catch.c @@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout( * If tests timeout due to exceeding sysctl_hung_task_timeout_secs, * the task will be killed and an oops generated. 
*/ - return 300 * MSEC_PER_SEC; /* 5 min */ + return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */ } void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context) From patchwork Tue Mar 22 21:48:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789278 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5EB74C43219 for ; Tue, 22 Mar 2022 21:48:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E877B6B01A8; Tue, 22 Mar 2022 17:48:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DBE996B01A9; Tue, 22 Mar 2022 17:48:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C85E76B01AA; Tue, 22 Mar 2022 17:48:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0152.hostedemail.com [216.40.44.152]) by kanga.kvack.org (Postfix) with ESMTP id B55B16B01A8 for ; Tue, 22 Mar 2022 17:48:24 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 7DB861828AE2E for ; Tue, 22 Mar 2022 21:48:24 +0000 (UTC) X-FDA: 79273361328.16.9AA71C5 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf06.hostedemail.com (Postfix) with ESMTP id 1E123180010 for ; Tue, 22 Mar 2022 21:48:23 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 841436174F; Tue, 22 Mar 2022 21:48:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4868CC340EE; Tue, 22 Mar 2022 21:48:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985703; bh=Z+in6z7I+vTIn+ymFgxwN19wkjdS6edefn4YcYFMOZM=; h=Date:To:From:In-Reply-To:Subject:From; b=x8QvN7mdjlevad1xqvmGgN/eB9zeQoVTS9s1Z7AVaQqbFj/9iLBHFMtOy2pOuZA65 NlcCvRkkNzx+kPjjflLl4jn4/0jOKzMBVW/5G7XcrSOfrHFyiFL4hTJ+7ZytPguLcC 4Q7CQIIfdGN4tsT5muOCY8rPUFEkAKwOrVWO3B08= Date: Tue, 22 Mar 2022 14:48:22 -0700 To: wangkefeng.wang@huawei.com,glider@google.com,elver@google.com,dvyukov@google.com,dlatypov@google.com,davidgow@google.com,brendanhiggins@google.com,liupeng256@huawei.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 194/227] kfence: test: try to avoid test_gfpzero trigger rcu_stall Message-Id: <20220322214823.4868CC340EE@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: j8ypkwfhhiypt8xx7fdzk5odcgdc93ta Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=x8QvN7md; spf=pass (imf06.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1E123180010 X-HE-Tag: 1647985703-60597 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: 
owner-majordomo@kvack.org List-ID: From: Peng Liu Subject: kfence: test: try to avoid test_gfpzero trigger rcu_stall When CONFIG_KFENCE_NUM_OBJECTS is set to a big number, kfence kunit-test-case test_gfpzero will eat up nearly all the CPU's resources and rcu_stall is reported as the following log which is cut from a physical server. rcu: INFO: rcu_sched self-detected stall on CPU rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002 softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019) Task dump for CPU 68: task:kunit_try_catch state:R running task stack: 0 pid: 9728 ppid: 2 flags:0x0000020a Call trace: dump_backtrace+0x0/0x1e4 show_stack+0x20/0x2c sched_show_task+0x148/0x170 ... rcu_sched_clock_irq+0x70/0x180 update_process_times+0x68/0xb0 tick_sched_handle+0x38/0x74 ... gic_handle_irq+0x78/0x2c0 el1_irq+0xb8/0x140 kfree+0xd8/0x53c test_alloc+0x264/0x310 [kfence_test] test_gfpzero+0xf4/0x840 [kfence_test] kunit_try_run_case+0x48/0x20c kunit_generic_run_threadfn_adapter+0x28/0x34 kthread+0x108/0x13c ret_from_fork+0x10/0x18 To avoid rcu_stall and unacceptable latency, a schedule point is added to test_gfpzero. Link: https://lkml.kernel.org/r/20220309083753.1561921-4-liupeng256@huawei.com Signed-off-by: Peng Liu Reviewed-by: Marco Elver Tested-by: Brendan Higgins Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Wang Kefeng Cc: Daniel Latypov Cc: David Gow Signed-off-by: Andrew Morton --- mm/kfence/kfence_test.c | 1 + 1 file changed, 1 insertion(+) --- a/mm/kfence/kfence_test.c~kfence-test-try-to-avoid-test_gfpzero-trigger-rcu_stall +++ a/mm/kfence/kfence_test.c @@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *t kunit_warn(test, "giving up ... cannot get same object back\n"); return; } + cond_resched(); } for (i = 0; i < size; i++) From patchwork Tue Mar 22 21:48:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789279 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89FF9C433EF for ; Tue, 22 Mar 2022 21:48:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 164DE6B01AA; Tue, 22 Mar 2022 17:48:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 114BB6B01AB; Tue, 22 Mar 2022 17:48:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EF6CA6B01AE; Tue, 22 Mar 2022 17:48:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id DD2516B01AA for ; Tue, 22 Mar 2022 17:48:29 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 9856D1828AE4E for ; Tue, 22 Mar 2022 21:48:29 +0000 (UTC) X-FDA: 79273361538.19.133E483 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf10.hostedemail.com (Postfix) with ESMTP id B585EC003C for ; Tue, 22 Mar 2022 21:48:28 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 9C945B81DB7; Tue, 22 Mar 2022 21:48:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) 
From patchwork Tue Mar 22 21:48:25 2022
Date: Tue, 22 Mar 2022 14:48:25 -0700
From: Andrew Morton
Subject: [patch 195/227] kfence: allow use of a deferrable timer
Message-Id: <20220322214826.3224BC340EC@smtp.kernel.org>

From: Marco Elver
Subject: kfence: allow use of a deferrable timer

Allow the use of a deferrable timer, which does not force CPU wake-ups when the system is idle. A consequence is that the sample interval becomes very unpredictable, to the point that it is not guaranteed that the KFENCE KUnit test still passes. Nevertheless, on power-constrained systems this may be preferable, so let's give the user the option should they accept the above trade-off.

Link: https://lkml.kernel.org/r/20220308141415.3168078-1-elver@google.com Signed-off-by: Marco Elver Reviewed-by: Alexander Potapenko Cc: Dmitry Vyukov Signed-off-by: Andrew Morton --- Documentation/dev-tools/kfence.rst | 12 ++++++++++++ lib/Kconfig.kfence | 12 ++++++++++++ mm/kfence/core.c | 15 +++++++++++++-- 3 files changed, 37 insertions(+), 2 deletions(-) --- a/Documentation/dev-tools/kfence.rst~kfence-allow-use-of-a-deferrable-timer +++ a/Documentation/dev-tools/kfence.rst @@ -41,6 +41,18 @@ guarded by KFENCE. The default is config ``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0`` disables KFENCE. +The sample interval controls a timer that sets up KFENCE allocations. By +default, to keep the real sample interval predictable, the normal timer also +causes CPU wake-ups when the system is completely idle. This may be undesirable +on power-constrained systems. The boot parameter ``kfence.deferrable=1`` +instead switches to a "deferrable" timer which does not force CPU wake-ups on +idle systems, at the risk of unpredictable sample intervals. The default is +configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``. + +.. warning:: + The KUnit test suite is very likely to fail when using a deferrable timer + since it currently causes very unpredictable sample intervals. + The KFENCE memory pool is of fixed size, and if the pool is exhausted, no further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255), the number of available guarded objects can be controlled.
Each object --- a/lib/Kconfig.kfence~kfence-allow-use-of-a-deferrable-timer +++ a/lib/Kconfig.kfence @@ -45,6 +45,18 @@ config KFENCE_NUM_OBJECTS pages are required; with one containing the object and two adjacent ones used as guard pages. +config KFENCE_DEFERRABLE + bool "Use a deferrable timer to trigger allocations" + help + Use a deferrable timer to trigger allocations. This avoids forcing + CPU wake-ups if the system is idle, at the risk of a less predictable + sample interval. + + Warning: The KUnit test suite fails with this option enabled - due to + the unpredictability of the sample interval! + + Say N if you are unsure. + config KFENCE_STATIC_KEYS bool "Use static keys to set up allocations" if EXPERT depends on JUMP_LABEL --- a/mm/kfence/core.c~kfence-allow-use-of-a-deferrable-timer +++ a/mm/kfence/core.c @@ -95,6 +95,10 @@ module_param_cb(sample_interval, &sample static unsigned long kfence_skip_covered_thresh __read_mostly = 75; module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644); +/* If true, use a deferrable timer. */ +static bool kfence_deferrable __read_mostly = IS_ENABLED(CONFIG_KFENCE_DEFERRABLE); +module_param_named(deferrable, kfence_deferrable, bool, 0444); + /* The pool of pages used for guard pages and objects. */ char *__kfence_pool __read_mostly; EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */ @@ -740,6 +744,8 @@ late_initcall(kfence_debugfs_init); /* === Allocation Gate Timer ================================================ */ +static struct delayed_work kfence_timer; + #ifdef CONFIG_KFENCE_STATIC_KEYS /* Wait queue to wake up allocation-gate timer task. */ static DECLARE_WAIT_QUEUE_HEAD(allocation_wait); @@ -762,7 +768,6 @@ static DEFINE_IRQ_WORK(wake_up_kfence_ti * avoids IPIs, at the cost of not immediately capturing allocations if the * instructions remain cached. 
*/ -static struct delayed_work kfence_timer; static void toggle_allocation_gate(struct work_struct *work) { if (!READ_ONCE(kfence_enabled)) @@ -790,7 +795,6 @@ static void toggle_allocation_gate(struc queue_delayed_work(system_unbound_wq, &kfence_timer, msecs_to_jiffies(kfence_sample_interval)); } -static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate); /* === Public interface ===================================================== */ @@ -809,8 +813,15 @@ static void kfence_init_enable(void) { if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS)) static_branch_enable(&kfence_allocation_key); + + if (kfence_deferrable) + INIT_DEFERRABLE_WORK(&kfence_timer, toggle_allocation_gate); + else + INIT_DELAYED_WORK(&kfence_timer, toggle_allocation_gate); + WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); + pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, (void *)(__kfence_pool + KFENCE_POOL_SIZE));
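The core of the patch is the run-time choice between a normal and a deferrable delayed work: a deferrable timer fires together with other wake-ups rather than forcing an idle CPU out of its sleep state, which is exactly why the sample interval becomes less predictable. Below is a minimal, self-contained sketch of the same pattern outside of KFENCE; my_work, my_work_fn and use_deferrable are illustrative names, not part of the patch.

	#include <linux/workqueue.h>

	static struct delayed_work my_work;
	static bool use_deferrable;	/* e.g. set from a module parameter */

	static void my_work_fn(struct work_struct *work)
	{
		/* ... periodic work ... */
		/* re-arm; on an idle system a deferrable work may fire late */
		queue_delayed_work(system_unbound_wq, &my_work, HZ);
	}

	static void my_work_setup(void)
	{
		if (use_deferrable)
			INIT_DEFERRABLE_WORK(&my_work, my_work_fn);
		else
			INIT_DELAYED_WORK(&my_work, my_work_fn);
		queue_delayed_work(system_unbound_wq, &my_work, 0);
	}

Note the design choice the patch makes: because the work item must be initialized one way or the other at init time, the static DECLARE_DELAYED_WORK() definition is replaced by a plain struct plus an INIT_*_WORK() call in kfence_init_enable().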
From patchwork Tue Mar 22 21:48:28 2022
Date: Tue, 22 Mar 2022 14:48:28 -0700
From: Andrew Morton
Subject: [patch 196/227] mm/hmm.c: remove unneeded local variable ret
Message-Id: <20220322214829.257E0C340EC@smtp.kernel.org>

From: Miaohe Lin
Subject: mm/hmm.c: remove unneeded local variable ret

The local variable ret is always 0. Remove it to make the code tighter.

Link: https://lkml.kernel.org/r/20220125124833.39718-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Signed-off-by: Andrew Morton --- mm/hmm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/mm/hmm.c~mm-hmmc-remove-unneeded-local-variable-ret +++ a/mm/hmm.c @@ -417,7 +417,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, struct hmm_range *range = hmm_vma_walk->range; unsigned long addr = start; pud_t pud; - int ret = 0; spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma); if (!ptl) @@ -466,7 +465,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, out_unlock: spin_unlock(ptl); - return ret; + return 0; } #else #define hmm_vma_walk_pud NULL
From patchwork Tue Mar 22 21:48:31 2022
Date: Tue, 22 Mar 2022 14:48:31 -0700
From: Andrew Morton
Subject: [patch 197/227] mm/damon/dbgfs/init_regions: use target index instead of target id
Message-Id: <20220322214832.246F9C340EE@smtp.kernel.org>

From: SeongJae Park
Subject: mm/damon/dbgfs/init_regions: use target index instead of target id

Patch series "Remove the type-unclear target id concept".

DAMON asks each monitoring target ('struct damon_target') to have one 'unsigned long' integer called 'id', which should be unique among the targets of the same monitoring context. Its meaning, however, is entirely up to the monitoring primitives registered to the monitoring context. For example, the virtual address spaces monitoring primitives treat the id as a 'struct pid' pointer. This makes the code flexible but ugly, not well-documented, and type-unsafe[1]. Also, each target can be identified via its index. For this reason, this patchset removes the concept and uses a clear type definition.

[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/

This patch (of 4):

The target id is an 'unsigned long' value that can be interpreted differently by each set of monitoring primitives. For example, it means 'struct pid *' for the virtual address spaces monitoring, while it means nothing but an integer to be displayed to debugfs interface users for the physical address space monitoring. This is flexible but makes the code ugly and type-unsafe[1]. To prepare for the eventual removal of the concept, this commit removes one use case of it in the 'init_regions' debugfs file handling. In detail, it replaces use of the id with the index of each target in the context's targets list.
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/ Link: https://lkml.kernel.org/r/20211230100723.2238-1-sj@kernel.org Link: https://lkml.kernel.org/r/20211230100723.2238-2-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- mm/damon/dbgfs-test.h | 20 ++++++++++---------- mm/damon/dbgfs.c | 25 ++++++++++++------------- 2 files changed, 22 insertions(+), 23 deletions(-) --- a/mm/damon/dbgfs.c~mm-damon-dbgfs-init_regions-use-target-index-instead-of-target-id +++ a/mm/damon/dbgfs.c @@ -440,18 +440,20 @@ static ssize_t sprint_init_regions(struc { struct damon_target *t; struct damon_region *r; + int target_idx = 0; int written = 0; int rc; damon_for_each_target(t, c) { damon_for_each_region(r, t) { rc = scnprintf(&buf[written], len - written, - "%lu %lu %lu\n", - t->id, r->ar.start, r->ar.end); + "%d %lu %lu\n", + target_idx, r->ar.start, r->ar.end); if (!rc) return -ENOMEM; written += rc; } + target_idx++; } return written; } @@ -485,22 +487,19 @@ out: return len; } -static int add_init_region(struct damon_ctx *c, - unsigned long target_id, struct damon_addr_range *ar) +static int add_init_region(struct damon_ctx *c, int target_idx, + struct damon_addr_range *ar) { struct damon_target *t; struct damon_region *r, *prev; - unsigned long id; + unsigned long idx = 0; int rc = -EINVAL; if (ar->start >= ar->end) return -EINVAL; damon_for_each_target(t, c) { - id = t->id; - if (targetid_is_pid(c)) - id = (unsigned long)pid_vnr((struct pid *)id); - if (id == target_id) { + if (idx++ == target_idx) { r = damon_new_region(ar->start, ar->end); if (!r) return -ENOMEM; @@ -523,7 +522,7 @@ static int set_init_regions(struct damon struct damon_target *t; struct damon_region *r, *next; int pos = 0, parsed, ret; - unsigned long target_id; + int target_idx; struct damon_addr_range ar; int err; @@ -533,11 +532,11 @@ static int set_init_regions(struct damon } while (pos < len) { - ret = sscanf(&str[pos], "%lu %lu %lu%n", - &target_id, &ar.start, &ar.end, &parsed); + ret = sscanf(&str[pos], "%d %lu %lu%n", + &target_idx, &ar.start, &ar.end, &parsed); if (ret != 3) break; - err = add_init_region(c, target_id, &ar); + err = add_init_region(c, target_idx, &ar); if (err) goto fail; pos += parsed; --- a/mm/damon/dbgfs-test.h~mm-damon-dbgfs-init_regions-use-target-index-instead-of-target-id +++ a/mm/damon/dbgfs-test.h @@ -113,19 +113,19 @@ static void damon_dbgfs_test_set_init_re { struct damon_ctx *ctx = damon_new_ctx(); unsigned long ids[] = {1, 2, 3}; - /* Each line represents one region in `` `` */ - char * const valid_inputs[] = {"2 10 20\n 2 20 30\n2 35 45", - "2 10 20\n", - "2 10 20\n1 39 59\n1 70 134\n 2 20 25\n", + /* Each line represents one region in `` `` */ + char * const valid_inputs[] = {"1 10 20\n 1 20 30\n1 35 45", + "1 10 20\n", + "1 10 20\n0 39 59\n0 70 134\n 1 20 25\n", ""}; /* Reading the file again will show sorted, clean output */ - char * const valid_expects[] = {"2 10 20\n2 20 30\n2 35 45\n", - "2 10 20\n", - "1 39 59\n1 70 134\n2 10 20\n2 20 25\n", + char * const valid_expects[] = {"1 10 20\n1 20 30\n1 35 45\n", + "1 10 20\n", + "0 39 59\n0 70 134\n1 10 20\n1 20 25\n", ""}; - char * const invalid_inputs[] = {"4 10 20\n", /* target not exists */ - "2 10 20\n 2 14 26\n", /* regions overlap */ - "1 10 20\n2 30 40\n 1 5 8"}; /* not sorted by address */ + char * const invalid_inputs[] = {"3 10 20\n", /* target not exists */ + "1 10 20\n 1 14 26\n", /* regions overlap */ + "0 10 20\n1 30 40\n 0 5 8"}; /* not sorted by address */ 
char *input, *expect; int i, rc; char buf[256];

From patchwork Tue Mar 22 21:48:34 2022
Date: Tue, 22 Mar 2022 14:48:34 -0700
From: Andrew Morton
Subject: [patch 198/227] Docs/admin-guide/mm/damon/usage: update for changed init_regions file input
Message-Id: <20220322214835.1030EC340EE@smtp.kernel.org>

From: SeongJae Park
Subject: Docs/admin-guide/mm/damon/usage: update for changed init_regions file input

A previous commit made the init_regions debugfs file use the target index instead of the target id for specifying the target of the init regions.
This commit updates the usage document to reflect the change. Link: https://lkml.kernel.org/r/20211230100723.2238-3-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 24 +++++++++-------- 1 file changed, 14 insertions(+), 10 deletions(-) --- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-update-for-changed-initail_regions-file-input +++ a/Documentation/admin-guide/mm/damon/usage.rst @@ -108,19 +108,23 @@ In such cases, users can explicitly set as they want, by writing proper values to the ``init_regions`` file. Each line of the input should represent one region in below form.:: - + -The ``target id`` should already in ``target_ids`` file, and the regions should -be passed in address order. For example, below commands will set a couple of -address ranges, ``1-100`` and ``100-200`` as the initial monitoring target -region of process 42, and another couple of address ranges, ``20-40`` and -``50-100`` as that of process 4242.:: +The ``target idx`` should be the index of the target in ``target_ids`` file, +starting from ``0``, and the regions should be passed in address order. For +example, below commands will set a couple of address ranges, ``1-100`` and +``100-200`` as the initial monitoring target region of pid 42, which is the +first one (index ``0``) in ``target_ids``, and another couple of address +ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one +(index ``1``) in ``target_ids``.:: # cd /damon - # echo "42 1 100 - 42 100 200 - 4242 20 40 - 4242 50 100" > init_regions + # cat target_ids + 42 4242 + # echo "0 1 100 + 0 100 200 + 1 20 40 + 1 50 100" > init_regions Note that this sets the initial monitoring target regions only. 
In case of virtual memory monitoring, DAMON will automatically update the boundary of the

From patchwork Tue Mar 22 21:48:37 2022
Date: Tue, 22 Mar 2022 14:48:37 -0700
From: Andrew Morton
Subject: [patch 199/227] mm/damon/core: move damon_set_targets() into dbgfs
Message-Id: <20220322214837.EC583C340EC@smtp.kernel.org>

From: SeongJae Park
Subject: mm/damon/core: move damon_set_targets() into dbgfs

The damon_set_targets() function is defined in the core for general use cases, but it is called only from dbgfs.
Also, because the function is for general use cases, dbgfs does additional handling of the pid-type target id case. To make the situation simpler, this commit moves the function into dbgfs and makes it do the pid-type case handling on its own.

Link: https://lkml.kernel.org/r/20211230100723.2238-4-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 - mm/damon/core-test.h | 5 +++ mm/damon/core.c | 32 ------------------------ mm/damon/dbgfs-test.h | 14 +++++----- mm/damon/dbgfs.c | 53 ++++++++++++++++++++++++++++++---------- 5 files changed, 52 insertions(+), 54 deletions(-) --- a/include/linux/damon.h~mm-damon-core-move-damon_set_targets-into-dbgfs +++ a/include/linux/damon.h @@ -484,8 +484,6 @@ unsigned int damon_nr_regions(struct dam struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); -int damon_set_targets(struct damon_ctx *ctx, - unsigned long *ids, ssize_t nr_ids); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long primitive_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg); --- a/mm/damon/core.c~mm-damon-core-move-damon_set_targets-into-dbgfs +++ a/mm/damon/core.c @@ -246,38 +246,6 @@ void damon_destroy_ctx(struct damon_ctx } /** - * damon_set_targets() - Set monitoring targets. - * @ctx: monitoring context - * @ids: array of target ids - * @nr_ids: number of entries in @ids - * - * This function should not be called while the kdamond is running. - * - * Return: 0 on success, negative error code otherwise. - */ -int damon_set_targets(struct damon_ctx *ctx, - unsigned long *ids, ssize_t nr_ids) -{ - ssize_t i; - struct damon_target *t, *next; - - damon_destroy_targets(ctx); - - for (i = 0; i < nr_ids; i++) { - t = damon_new_target(ids[i]); - if (!t) { - /* The caller should do cleanup of the ids itself */ - damon_for_each_target_safe(t, next, ctx) - damon_destroy_target(t); - return -ENOMEM; - } - damon_add_target(ctx, t); - } - - return 0; -} - -/** * damon_set_attrs() - Set attributes for the monitoring. * @ctx: monitoring context * @sample_int: time interval between samplings --- a/mm/damon/core-test.h~mm-damon-core-move-damon_set_targets-into-dbgfs +++ a/mm/damon/core-test.h @@ -86,7 +86,10 @@ static void damon_test_aggregate(struct struct damon_region *r; int it, ir; - damon_set_targets(ctx, target_ids, 3); + for (it = 0; it < 3; it++) { + t = damon_new_target(target_ids[it]); + damon_add_target(ctx, t); + } it = 0; damon_for_each_target(t, ctx) { --- a/mm/damon/dbgfs.c~mm-damon-core-move-damon_set_targets-into-dbgfs +++ a/mm/damon/dbgfs.c @@ -358,11 +358,48 @@ static void dbgfs_put_pids(unsigned long put_pid((struct pid *)ids[i]); } +/* + * dbgfs_set_targets() - Set monitoring targets. + * @ctx: monitoring context + * @ids: array of target ids + * @nr_ids: number of entries in @ids + * + * This function should not be called while the kdamond is running. + * + * Return: 0 on success, negative error code otherwise.
+ */ +static int dbgfs_set_targets(struct damon_ctx *ctx, + unsigned long *ids, ssize_t nr_ids) +{ + ssize_t i; + struct damon_target *t, *next; + + damon_for_each_target_safe(t, next, ctx) { + if (targetid_is_pid(ctx)) + put_pid((struct pid *)t->id); + damon_destroy_target(t); + } + + for (i = 0; i < nr_ids; i++) { + t = damon_new_target(ids[i]); + if (!t) { + /* The caller should do cleanup of the ids itself */ + damon_for_each_target_safe(t, next, ctx) + damon_destroy_target(t); + if (targetid_is_pid(ctx)) + dbgfs_put_pids(ids, nr_ids); + return -ENOMEM; + } + damon_add_target(ctx, t); + } + + return 0; +} + static ssize_t dbgfs_target_ids_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { struct damon_ctx *ctx = file->private_data; - struct damon_target *t, *next_t; bool id_is_pid = true; char *kbuf; unsigned long *targets; @@ -407,11 +444,7 @@ static ssize_t dbgfs_target_ids_write(st } /* remove previously set targets */ - damon_for_each_target_safe(t, next_t, ctx) { - if (targetid_is_pid(ctx)) - put_pid((struct pid *)t->id); - damon_destroy_target(t); - } + dbgfs_set_targets(ctx, NULL, 0); /* Configure the context for the address space type */ if (id_is_pid) @@ -419,13 +452,9 @@ static ssize_t dbgfs_target_ids_write(st else damon_pa_set_primitives(ctx); - ret = damon_set_targets(ctx, targets, nr_targets); - if (ret) { - if (id_is_pid) - dbgfs_put_pids(targets, nr_targets); - } else { + ret = dbgfs_set_targets(ctx, targets, nr_targets); + if (!ret) ret = count; - } unlock_out: mutex_unlock(&ctx->kdamond_lock); --- a/mm/damon/dbgfs-test.h~mm-damon-core-move-damon_set_targets-into-dbgfs +++ a/mm/damon/dbgfs-test.h @@ -86,23 +86,23 @@ static void damon_dbgfs_test_set_targets ctx->primitive.target_valid = NULL; ctx->primitive.cleanup = NULL; - damon_set_targets(ctx, ids, 3); + dbgfs_set_targets(ctx, ids, 3); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2 3\n"); - damon_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, NULL, 0); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "\n"); - damon_set_targets(ctx, (unsigned long []){1, 2}, 2); + dbgfs_set_targets(ctx, (unsigned long []){1, 2}, 2); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2\n"); - damon_set_targets(ctx, (unsigned long []){2}, 1); + dbgfs_set_targets(ctx, (unsigned long []){2}, 1); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "2\n"); - damon_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, NULL, 0); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "\n"); @@ -130,7 +130,7 @@ static void damon_dbgfs_test_set_init_re int i, rc; char buf[256]; - damon_set_targets(ctx, ids, 3); + dbgfs_set_targets(ctx, ids, 3); /* Put valid inputs and check the results */ for (i = 0; i < ARRAY_SIZE(valid_inputs); i++) { @@ -158,7 +158,7 @@ static void damon_dbgfs_test_set_init_re KUNIT_EXPECT_STREQ(test, (char *)buf, ""); } - damon_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, NULL, 0); damon_destroy_ctx(ctx); }
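Both this patch and the next one revolve around 'struct pid' reference counting: every pid looked up with find_get_pid() must eventually be released with put_pid(), on every path including errors, which is what dbgfs_put_pids() and the -ENOMEM branch above enforce. A minimal sketch of that discipline under simplified assumptions (illustrative names, not the DAMON code):

	#include <linux/errno.h>
	#include <linux/pid.h>

	/* Release an array of referenced pids, e.g. on teardown or error. */
	static void drop_pids(struct pid **pids, int nr)
	{
		int i;

		for (i = 0; i < nr; i++)
			put_pid(pids[i]);	/* balances find_get_pid() */
	}

	/* Resolve numeric pids into referenced 'struct pid' pointers. */
	static int resolve_pids(const pid_t *nrs, struct pid **pids, int nr)
	{
		int i;

		for (i = 0; i < nr; i++) {
			pids[i] = find_get_pid(nrs[i]);	/* takes a reference */
			if (!pids[i]) {
				drop_pids(pids, i);	/* undo partial work */
				return -EINVAL;
			}
		}
		return 0;
	}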
From patchwork Tue Mar 22 21:48:40 2022
Date: Tue, 22 Mar 2022 14:48:40 -0700
From: Andrew Morton
Subject: [patch 200/227] mm/damon: remove the target id concept
Message-Id: <20220322214840.DC28DC340EC@smtp.kernel.org>

From: SeongJae Park
Subject: mm/damon: remove the target id concept

DAMON asks each monitoring target ('struct damon_target') to have one 'unsigned long' integer called 'id', which should be unique among the targets of the same monitoring context. Its meaning, however, is entirely up to the monitoring primitives registered to the monitoring context. For example, the virtual address spaces monitoring primitives treat the id as a 'struct pid' pointer. This makes the code flexible, but ugly, not well-documented, and type-unsafe[1]. Also, each target can be identified via its index. For this reason, this commit removes the concept and uses a clear type definition. For now, only a 'struct pid' pointer is used for the virtual address spaces monitoring.
If DAMON is extended in future so that we need to put another identifier field in the struct, we will use a union for such primitives-dependent fields and document which primitives are using which type. [1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/ Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 11 +- mm/damon/core-test.h | 18 ++-- mm/damon/core.c | 4 - mm/damon/dbgfs-test.h | 63 +++++----------- mm/damon/dbgfs.c | 152 +++++++++++++++++++++++----------------- mm/damon/reclaim.c | 3 mm/damon/vaddr-test.h | 6 - mm/damon/vaddr.c | 4 - 8 files changed, 133 insertions(+), 128 deletions(-) --- a/include/linux/damon.h~mm-damon-remove-the-target-id-concept +++ a/include/linux/damon.h @@ -60,19 +60,18 @@ struct damon_region { /** * struct damon_target - Represents a monitoring target. - * @id: Unique identifier for this target. + * @pid: The PID of the virtual address space to monitor. * @nr_regions: Number of monitoring target regions of this target. * @regions_list: Head of the monitoring target regions of this target. * @list: List head for siblings. * * Each monitoring context could have multiple targets. For example, a context * for virtual memory address spaces could have multiple target processes. The - * @id of each target should be unique among the targets of the context. For - * example, in the virtual address monitoring context, it could be a pidfd or - * an address of an mm_struct. + * @pid should be set for appropriate address space monitoring primitives + * including the virtual address spaces monitoring primitives. */ struct damon_target { - unsigned long id; + struct pid *pid; unsigned int nr_regions; struct list_head regions_list; struct list_head list; @@ -475,7 +474,7 @@ struct damos *damon_new_scheme( void damon_add_scheme(struct damon_ctx *ctx, struct damos *s); void damon_destroy_scheme(struct damos *s); -struct damon_target *damon_new_target(unsigned long id); +struct damon_target *damon_new_target(void); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); bool damon_targets_empty(struct damon_ctx *ctx); void damon_free_target(struct damon_target *t); --- a/mm/damon/core.c~mm-damon-remove-the-target-id-concept +++ a/mm/damon/core.c @@ -144,7 +144,7 @@ void damon_destroy_scheme(struct damos * * * Returns the pointer to the new struct if success, or NULL otherwise */ -struct damon_target *damon_new_target(unsigned long id) +struct damon_target *damon_new_target(void) { struct damon_target *t; @@ -152,7 +152,7 @@ struct damon_target *damon_new_target(un if (!t) return NULL; - t->id = id; + t->pid = NULL; t->nr_regions = 0; INIT_LIST_HEAD(&t->regions_list); --- a/mm/damon/core-test.h~mm-damon-remove-the-target-id-concept +++ a/mm/damon/core-test.h @@ -24,7 +24,7 @@ static void damon_test_regions(struct ku KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); - t = damon_new_target(42); + t = damon_new_target(); KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); damon_add_region(r, t); @@ -52,8 +52,7 @@ static void damon_test_target(struct kun struct damon_ctx *c = damon_new_ctx(); struct damon_target *t; - t = damon_new_target(42); - KUNIT_EXPECT_EQ(test, 42ul, t->id); + t = damon_new_target(); KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); damon_add_target(c, t); @@ -78,7 +77,6 @@ static void damon_test_target(struct kun static void damon_test_aggregate(struct kunit *test) { 
struct damon_ctx *ctx = damon_new_ctx(); - unsigned long target_ids[] = {1, 2, 3}; unsigned long saddr[][3] = {{10, 20, 30}, {5, 42, 49}, {13, 33, 55} }; unsigned long eaddr[][3] = {{15, 27, 40}, {31, 45, 55}, {23, 44, 66} }; unsigned long accesses[][3] = {{42, 95, 84}, {10, 20, 30}, {0, 1, 2} }; @@ -87,7 +85,7 @@ static void damon_test_aggregate(struct int it, ir; for (it = 0; it < 3; it++) { - t = damon_new_target(target_ids[it]); + t = damon_new_target(); damon_add_target(ctx, t); } @@ -125,7 +123,7 @@ static void damon_test_split_at(struct k struct damon_target *t; struct damon_region *r; - t = damon_new_target(42); + t = damon_new_target(); r = damon_new_region(0, 100); damon_add_region(r, t); damon_split_region_at(c, t, r, 25); @@ -146,7 +144,7 @@ static void damon_test_merge_two(struct struct damon_region *r, *r2, *r3; int i; - t = damon_new_target(42); + t = damon_new_target(); r = damon_new_region(0, 100); r->nr_accesses = 10; damon_add_region(r, t); @@ -194,7 +192,7 @@ static void damon_test_merge_regions_of( unsigned long eaddrs[] = {112, 130, 156, 170, 230}; int i; - t = damon_new_target(42); + t = damon_new_target(); for (i = 0; i < ARRAY_SIZE(sa); i++) { r = damon_new_region(sa[i], ea[i]); r->nr_accesses = nrs[i]; @@ -218,14 +216,14 @@ static void damon_test_split_regions_of( struct damon_target *t; struct damon_region *r; - t = damon_new_target(42); + t = damon_new_target(); r = damon_new_region(0, 22); damon_add_region(r, t); damon_split_regions_of(c, t, 2); KUNIT_EXPECT_LE(test, damon_nr_regions(t), 2u); damon_free_target(t); - t = damon_new_target(42); + t = damon_new_target(); r = damon_new_region(0, 220); damon_add_region(r, t); damon_split_regions_of(c, t, 4); --- a/mm/damon/dbgfs.c~mm-damon-remove-the-target-id-concept +++ a/mm/damon/dbgfs.c @@ -275,7 +275,7 @@ out: return ret; } -static inline bool targetid_is_pid(const struct damon_ctx *ctx) +static inline bool target_has_pid(const struct damon_ctx *ctx) { return ctx->primitive.target_valid == damon_va_target_valid; } @@ -283,17 +283,19 @@ static inline bool targetid_is_pid(const static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len) { struct damon_target *t; - unsigned long id; + int id; int written = 0; int rc; damon_for_each_target(t, ctx) { - id = t->id; - if (targetid_is_pid(ctx)) + if (target_has_pid(ctx)) /* Show pid numbers to debugfs users */ - id = (unsigned long)pid_vnr((struct pid *)id); + id = pid_vnr(t->pid); + else + /* Show 42 for physical address space, just for fun */ + id = 42; - rc = scnprintf(&buf[written], len - written, "%lu ", id); + rc = scnprintf(&buf[written], len - written, "%d ", id); if (!rc) return -ENOMEM; written += rc; @@ -321,75 +323,114 @@ static ssize_t dbgfs_target_ids_read(str } /* - * Converts a string into an array of unsigned long integers + * Converts a string into an integers array * - * Returns an array of unsigned long integers if the conversion success, or - * NULL otherwise. + * Returns an array of integers array if the conversion success, or NULL + * otherwise. 
*/ -static unsigned long *str_to_target_ids(const char *str, ssize_t len, - ssize_t *nr_ids) +static int *str_to_ints(const char *str, ssize_t len, ssize_t *nr_ints) { - unsigned long *ids; - const int max_nr_ids = 32; - unsigned long id; + int *array; + const int max_nr_ints = 32; + int nr; int pos = 0, parsed, ret; - *nr_ids = 0; - ids = kmalloc_array(max_nr_ids, sizeof(id), GFP_KERNEL); - if (!ids) + *nr_ints = 0; + array = kmalloc_array(max_nr_ints, sizeof(*array), GFP_KERNEL); + if (!array) return NULL; - while (*nr_ids < max_nr_ids && pos < len) { - ret = sscanf(&str[pos], "%lu%n", &id, &parsed); + while (*nr_ints < max_nr_ints && pos < len) { + ret = sscanf(&str[pos], "%d%n", &nr, &parsed); pos += parsed; if (ret != 1) break; - ids[*nr_ids] = id; - *nr_ids += 1; + array[*nr_ints] = nr; + *nr_ints += 1; } - return ids; + return array; } -static void dbgfs_put_pids(unsigned long *ids, int nr_ids) +static void dbgfs_put_pids(struct pid **pids, int nr_pids) { int i; - for (i = 0; i < nr_ids; i++) - put_pid((struct pid *)ids[i]); + for (i = 0; i < nr_pids; i++) + put_pid(pids[i]); +} + +/* + * Converts a string into an struct pid pointers array + * + * Returns an array of struct pid pointers if the conversion success, or NULL + * otherwise. + */ +static struct pid **str_to_pids(const char *str, ssize_t len, ssize_t *nr_pids) +{ + int *ints; + ssize_t nr_ints; + struct pid **pids; + + *nr_pids = 0; + + ints = str_to_ints(str, len, &nr_ints); + if (!ints) + return NULL; + + pids = kmalloc_array(nr_ints, sizeof(*pids), GFP_KERNEL); + if (!pids) + goto out; + + for (; *nr_pids < nr_ints; (*nr_pids)++) { + pids[*nr_pids] = find_get_pid(ints[*nr_pids]); + if (!pids[*nr_pids]) { + dbgfs_put_pids(pids, *nr_pids); + kfree(ints); + kfree(pids); + return NULL; + } + } + +out: + kfree(ints); + return pids; } /* * dbgfs_set_targets() - Set monitoring targets. * @ctx: monitoring context - * @ids: array of target ids - * @nr_ids: number of entries in @ids + * @nr_targets: number of targets + * @pids: array of target pids (size is same to @nr_targets) * - * This function should not be called while the kdamond is running. + * This function should not be called while the kdamond is running. @pids is + * ignored if the context is not configured to have pid in each target. On + * failure, reference counts of all pids in @pids are decremented. * * Return: 0 on success, negative error code otherwise. 
*/ -static int dbgfs_set_targets(struct damon_ctx *ctx, - unsigned long *ids, ssize_t nr_ids) +static int dbgfs_set_targets(struct damon_ctx *ctx, ssize_t nr_targets, + struct pid **pids) { ssize_t i; struct damon_target *t, *next; damon_for_each_target_safe(t, next, ctx) { - if (targetid_is_pid(ctx)) - put_pid((struct pid *)t->id); + if (target_has_pid(ctx)) + put_pid(t->pid); damon_destroy_target(t); } - for (i = 0; i < nr_ids; i++) { - t = damon_new_target(ids[i]); + for (i = 0; i < nr_targets; i++) { + t = damon_new_target(); if (!t) { - /* The caller should do cleanup of the ids itself */ damon_for_each_target_safe(t, next, ctx) damon_destroy_target(t); - if (targetid_is_pid(ctx)) - dbgfs_put_pids(ids, nr_ids); + if (target_has_pid(ctx)) + dbgfs_put_pids(pids, nr_targets); return -ENOMEM; } + if (target_has_pid(ctx)) + t->pid = pids[i]; damon_add_target(ctx, t); } @@ -402,10 +443,9 @@ static ssize_t dbgfs_target_ids_write(st struct damon_ctx *ctx = file->private_data; bool id_is_pid = true; char *kbuf; - unsigned long *targets; + struct pid **target_pids = NULL; ssize_t nr_targets; ssize_t ret; - int i; kbuf = user_input_str(buf, count, ppos); if (IS_ERR(kbuf)) @@ -413,38 +453,27 @@ static ssize_t dbgfs_target_ids_write(st if (!strncmp(kbuf, "paddr\n", count)) { id_is_pid = false; - /* target id is meaningless here, but we set it just for fun */ - scnprintf(kbuf, count, "42 "); - } - - targets = str_to_target_ids(kbuf, count, &nr_targets); - if (!targets) { - ret = -ENOMEM; - goto out; + nr_targets = 1; } if (id_is_pid) { - for (i = 0; i < nr_targets; i++) { - targets[i] = (unsigned long)find_get_pid( - (int)targets[i]); - if (!targets[i]) { - dbgfs_put_pids(targets, i); - ret = -EINVAL; - goto free_targets_out; - } + target_pids = str_to_pids(kbuf, count, &nr_targets); + if (!target_pids) { + ret = -ENOMEM; + goto out; } } mutex_lock(&ctx->kdamond_lock); if (ctx->kdamond) { if (id_is_pid) - dbgfs_put_pids(targets, nr_targets); + dbgfs_put_pids(target_pids, nr_targets); ret = -EBUSY; goto unlock_out; } /* remove previously set targets */ - dbgfs_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, 0, NULL); /* Configure the context for the address space type */ if (id_is_pid) @@ -452,14 +481,13 @@ static ssize_t dbgfs_target_ids_write(st else damon_pa_set_primitives(ctx); - ret = dbgfs_set_targets(ctx, targets, nr_targets); + ret = dbgfs_set_targets(ctx, nr_targets, target_pids); if (!ret) ret = count; unlock_out: mutex_unlock(&ctx->kdamond_lock); -free_targets_out: - kfree(targets); + kfree(target_pids); out: kfree(kbuf); return ret; @@ -688,12 +716,12 @@ static void dbgfs_before_terminate(struc { struct damon_target *t, *next; - if (!targetid_is_pid(ctx)) + if (!target_has_pid(ctx)) return; mutex_lock(&ctx->kdamond_lock); damon_for_each_target_safe(t, next, ctx) { - put_pid((struct pid *)t->id); + put_pid(t->pid); damon_destroy_target(t); } mutex_unlock(&ctx->kdamond_lock); --- a/mm/damon/dbgfs-test.h~mm-damon-remove-the-target-id-concept +++ a/mm/damon/dbgfs-test.h @@ -12,66 +12,58 @@ #include -static void damon_dbgfs_test_str_to_target_ids(struct kunit *test) +static void damon_dbgfs_test_str_to_ints(struct kunit *test) { char *question; - unsigned long *answers; - unsigned long expected[] = {12, 35, 46}; + int *answers; + int expected[] = {12, 35, 46}; ssize_t nr_integers = 0, i; question = "123"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)1, 
nr_integers); - KUNIT_EXPECT_EQ(test, 123ul, answers[0]); + KUNIT_EXPECT_EQ(test, 123, answers[0]); kfree(answers); question = "123abc"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)1, nr_integers); - KUNIT_EXPECT_EQ(test, 123ul, answers[0]); + KUNIT_EXPECT_EQ(test, 123, answers[0]); kfree(answers); question = "a123"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers); kfree(answers); question = "12 35"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)2, nr_integers); for (i = 0; i < nr_integers; i++) KUNIT_EXPECT_EQ(test, expected[i], answers[i]); kfree(answers); question = "12 35 46"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)3, nr_integers); for (i = 0; i < nr_integers; i++) KUNIT_EXPECT_EQ(test, expected[i], answers[i]); kfree(answers); question = "12 35 abc 46"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)2, nr_integers); for (i = 0; i < 2; i++) KUNIT_EXPECT_EQ(test, expected[i], answers[i]); kfree(answers); question = ""; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers); kfree(answers); question = "\n"; - answers = str_to_target_ids(question, strlen(question), - &nr_integers); + answers = str_to_ints(question, strlen(question), &nr_integers); KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers); kfree(answers); } @@ -79,30 +71,20 @@ static void damon_dbgfs_test_str_to_targ static void damon_dbgfs_test_set_targets(struct kunit *test) { struct damon_ctx *ctx = dbgfs_new_ctx(); - unsigned long ids[] = {1, 2, 3}; char buf[64]; - /* Make DAMON consider target id as plain number */ - ctx->primitive.target_valid = NULL; - ctx->primitive.cleanup = NULL; + /* Make DAMON consider target has no pid */ + ctx->primitive = (struct damon_primitive){}; - dbgfs_set_targets(ctx, ids, 3); - sprint_target_ids(ctx, buf, 64); - KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2 3\n"); - - dbgfs_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, 0, NULL); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "\n"); - dbgfs_set_targets(ctx, (unsigned long []){1, 2}, 2); - sprint_target_ids(ctx, buf, 64); - KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2\n"); - - dbgfs_set_targets(ctx, (unsigned long []){2}, 1); + dbgfs_set_targets(ctx, 1, NULL); sprint_target_ids(ctx, buf, 64); - KUNIT_EXPECT_STREQ(test, (char *)buf, "2\n"); + KUNIT_EXPECT_STREQ(test, (char *)buf, "42\n"); - dbgfs_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, 0, NULL); sprint_target_ids(ctx, buf, 64); KUNIT_EXPECT_STREQ(test, (char *)buf, "\n"); @@ -112,7 +94,6 @@ static void damon_dbgfs_test_set_targets static void damon_dbgfs_test_set_init_regions(struct kunit *test) { struct damon_ctx *ctx = damon_new_ctx(); - unsigned long ids[] = {1, 2, 3}; /* Each line represents one region in `` `` */ char * const valid_inputs[] = {"1 10 20\n 1 20 30\n1 35 45", "1 10 
20\n", @@ -130,7 +111,7 @@ static void damon_dbgfs_test_set_init_re int i, rc; char buf[256]; - dbgfs_set_targets(ctx, ids, 3); + dbgfs_set_targets(ctx, 3, NULL); /* Put valid inputs and check the results */ for (i = 0; i < ARRAY_SIZE(valid_inputs); i++) { @@ -158,12 +139,12 @@ static void damon_dbgfs_test_set_init_re KUNIT_EXPECT_STREQ(test, (char *)buf, ""); } - dbgfs_set_targets(ctx, NULL, 0); + dbgfs_set_targets(ctx, 0, NULL); damon_destroy_ctx(ctx); } static struct kunit_case damon_test_cases[] = { - KUNIT_CASE(damon_dbgfs_test_str_to_target_ids), + KUNIT_CASE(damon_dbgfs_test_str_to_ints), KUNIT_CASE(damon_dbgfs_test_set_targets), KUNIT_CASE(damon_dbgfs_test_set_init_regions), {}, --- a/mm/damon/reclaim.c~mm-damon-remove-the-target-id-concept +++ a/mm/damon/reclaim.c @@ -387,8 +387,7 @@ static int __init damon_reclaim_init(voi damon_pa_set_primitives(ctx); ctx->callback.after_aggregation = damon_reclaim_after_aggregation; - /* 4242 means nothing but fun */ - target = damon_new_target(4242); + target = damon_new_target(); if (!target) { damon_destroy_ctx(ctx); return -ENOMEM; --- a/mm/damon/vaddr.c~mm-damon-remove-the-target-id-concept +++ a/mm/damon/vaddr.c @@ -23,12 +23,12 @@ #endif /* - * 't->id' should be the pointer to the relevant 'struct pid' having reference + * 't->pid' should be the pointer to the relevant 'struct pid' having reference * count. Caller must put the returned task, unless it is NULL. */ static inline struct task_struct *damon_get_task_struct(struct damon_target *t) { - return get_pid_task((struct pid *)t->id, PIDTYPE_PID); + return get_pid_task(t->pid, PIDTYPE_PID); } /* --- a/mm/damon/vaddr-test.h~mm-damon-remove-the-target-id-concept +++ a/mm/damon/vaddr-test.h @@ -139,7 +139,7 @@ static void damon_do_test_apply_three_re struct damon_region *r; int i; - t = damon_new_target(42); + t = damon_new_target(); for (i = 0; i < nr_regions / 2; i++) { r = damon_new_region(regions[i * 2], regions[i * 2 + 1]); damon_add_region(r, t); @@ -251,7 +251,7 @@ static void damon_test_apply_three_regio static void damon_test_split_evenly_fail(struct kunit *test, unsigned long start, unsigned long end, unsigned int nr_pieces) { - struct damon_target *t = damon_new_target(42); + struct damon_target *t = damon_new_target(); struct damon_region *r = damon_new_region(start, end); damon_add_region(r, t); @@ -270,7 +270,7 @@ static void damon_test_split_evenly_fail static void damon_test_split_evenly_succ(struct kunit *test, unsigned long start, unsigned long end, unsigned int nr_pieces) { - struct damon_target *t = damon_new_target(42); + struct damon_target *t = damon_new_target(); struct damon_region *r = damon_new_region(start, end); unsigned long expected_width = (end - start) / nr_pieces; unsigned long i = 0; From patchwork Tue Mar 22 21:48:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789297 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29C03C433EF for ; Tue, 22 Mar 2022 21:49:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AB3B86B0213; Tue, 22 Mar 2022 17:49:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A61CF6B0246; Tue, 22 Mar 2022 17:49:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) 
id 83DE86B0248; Tue, 22 Mar 2022 17:49:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0024.hostedemail.com [216.40.44.24]) by kanga.kvack.org (Postfix) with ESMTP id 7215E6B0213 for ; Tue, 22 Mar 2022 17:49:27 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 40E20A4DD9 for ; Tue, 22 Mar 2022 21:49:27 +0000 (UTC) X-FDA: 79273363974.22.F801C68 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf07.hostedemail.com (Postfix) with ESMTP id 3D1BC4001E for ; Tue, 22 Mar 2022 21:48:46 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 29E8DB81D59; Tue, 22 Mar 2022 21:48:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DD6A9C340EC; Tue, 22 Mar 2022 21:48:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985724; bh=mUuROq10ahLPR03SHPG1MdCOWBxv+599rSB3EdvybYM=; h=Date:To:From:In-Reply-To:Subject:From; b=lKyB3klfkRYBZsMx+kfzEWYz1fAOXkpTXebbCnMOSwo8Jx9TH/dONH3VWnAkXwPe+ R2zuCsNhgJOSLNpFeSb2eXQlbVbb6d/2+kD2+C7W6+N19YiPRzVJ/oHyXjRAHYFv0B J3ChoBMbZU4J04CKaH+UFZ303yfDkgAYeuqdsuJQ= Date: Tue, 22 Mar 2022 14:48:43 -0700 To: sj@kernel.org,rientjes@google.com,linmiaohe@huawei.com,jrdr.linux@gmail.com,dan.carpenter@oracle.com,baolin.wang@linux.alibaba.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 201/227] mm/damon: remove redundant page validation Message-Id: <20220322214843.DD6A9C340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 3D1BC4001E X-Stat-Signature: 1d8c5kppmu954qmq1e6f4szefnccee55 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=lKyB3klf; dmarc=none; spf=pass (imf07.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985726-559716 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Baolin Wang Subject: mm/damon: remove redundant page validation pte_page() can never return a NULL page, as discussed in thread [1], so remove the redundant page validation to fix the below Smatch static checker warning. mm/damon/vaddr.c:405 damon_hugetlb_mkold() warn: 'page' can't be NULL.
[1] https://lore.kernel.org/linux-mm/20220106091200.GA14564@kili/ Link: https://lkml.kernel.org/r/6d32f7d201b8970d53f51b6c5717d472aed2987c.1642386715.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang Reported-by: Dan Carpenter Reviewed-by: SeongJae Park Acked-by: David Rientjes Acked-by: Souptick Joarder Reviewed-by: Miaohe Lin Signed-off-by: Andrew Morton --- mm/damon/vaddr.c | 6 ------ 1 file changed, 6 deletions(-) --- a/mm/damon/vaddr.c~mm-damon-remove-redundant-page-validation +++ a/mm/damon/vaddr.c @@ -402,9 +402,6 @@ static void damon_hugetlb_mkold(pte_t *p pte_t entry = huge_ptep_get(pte); struct page *page = pte_page(entry); - if (!page) - return; - get_page(page); if (pte_young(entry)) { @@ -564,9 +561,6 @@ static int damon_young_hugetlb_entry(pte goto out; page = pte_page(entry); - if (!page) - goto out; - get_page(page); if (pte_young(entry) || !page_is_idle(page) || From patchwork Tue Mar 22 21:48:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789284 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FB0FC433F5 for ; Tue, 22 Mar 2022 21:48:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BC7DC6B01F5; Tue, 22 Mar 2022 17:48:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B73316B01F6; Tue, 22 Mar 2022 17:48:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9ECFA6B01F7; Tue, 22 Mar 2022 17:48:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0018.hostedemail.com [216.40.44.18]) by kanga.kvack.org (Postfix) with ESMTP id 8EDE66B01F5 for ; Tue, 22 Mar 2022 17:48:50 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 558ACA4DD9 for ; Tue, 22 Mar 2022 21:48:50 +0000 (UTC) X-FDA: 79273362420.24.AA7DB23 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf13.hostedemail.com (Postfix) with ESMTP id 953FB20026 for ; Tue, 22 Mar 2022 21:48:49 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 47249B81DC3; Tue, 22 Mar 2022 21:48:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D6952C340EE; Tue, 22 Mar 2022 21:48:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985727; bh=9AMztZDwYA/GPqgogGvDHmHxliV1tL6fZ+OLkXmJ4RY=; h=Date:To:From:In-Reply-To:Subject:From; b=TezCyqTgkCzfZqPByCFbQbDnTjMZ0kaI66h1dQgLaa6DCo1nbt94nS4EK96MD65Zd PqOmyk8xk2x9YURmyU4lvX7bhvde7Q+h43bjKysf/Z4lBV5ZvQJ++6onYPbfxYku9q bVHO+F6IIGZch9ePItaEjJcrQpH02iVLZvVtKV2w= Date: Tue, 22 Mar 2022 14:48:46 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 202/227] mm/damon: rename damon_primitives to damon_operations Message-Id: 
<20220322214846.D6952C340EE@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 953FB20026 X-Rspam-User: Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=TezCyqTg; dmarc=none; spf=pass (imf13.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: rj3smjifdhmxdsrom1ffr8bnr61rmrpm X-HE-Tag: 1647985729-877170 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon: rename damon_primitives to damon_operations Patch series "Allow DAMON user code independent of monitoring primitives". In-kernel DAMON user code is required to configure the monitoring context (struct damon_ctx) with proper monitoring primitives (struct damon_primitive). This makes the user code dependent on all supported monitoring primitives. For example, the DAMON debugfs interface depends on both DAMON_VADDR and DAMON_PADDR, though some users are interested in only one of the use cases. As more monitoring primitives are introduced, the problem will only grow. To minimize such unnecessary dependencies, this patchset lets monitoring primitives be registered by the implementing code and later dynamically searched and selected by the user code. In addition to that, this patchset renames monitoring primitives to monitoring operations, which makes it easier to intuitively understand what they mean and how they are structured. This patch (of 8): DAMON has a set of callback functions called monitoring primitives, which let it be configured with various implementations for easy extension to different address spaces and usages. However, the word 'primitive' is not very explicit. Meanwhile, many other structs serving a similar purpose call themselves 'operations'. To make the code easier to understand, this commit renames 'damon_primitives' to 'damon_operations' before it is too late to rename. Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Xin Hao Cc: David Rientjes Signed-off-by: Andrew Morton --- include/linux/damon.h | 48 ++++++------- mm/damon/Kconfig | 12 +-- mm/damon/Makefile | 4 - mm/damon/core.c | 65 +++++++++--------- mm/damon/dbgfs-test.h | 2 mm/damon/dbgfs.c | 10 +- mm/damon/{prmtv-common.c => ops-common.c} | 2 +- mm/damon/{prmtv-common.h => ops-common.h} | 0 mm/damon/paddr.c | 22 +++--- mm/damon/reclaim.c | 2 mm/damon/vaddr-test.h | 2 mm/damon/vaddr.c | 22 +++--- 14 files changed, 244 insertions(+), 243 deletions(-) --- a/include/linux/damon.h~mm-damon-rename-damon_primitives-to-damon_operations +++ a/include/linux/damon.h @@ -67,8 +67,8 @@ struct damon_region { * * Each monitoring context could have multiple targets. For example, a context * for virtual memory address spaces could have multiple target processes. The - * @pid should be set for appropriate address space monitoring primitives - * including the virtual address spaces monitoring primitives. + * @pid should be set for appropriate &struct damon_operations including the + * virtual address spaces monitoring operations. */ struct damon_target { struct pid *pid; @@ -120,9 +120,9 @@ enum damos_action { * uses smaller one as the effective quota.
* * For selecting regions within the quota, DAMON prioritizes current scheme's - * target memory regions using the &struct damon_primitive->get_scheme_score. + * target memory regions using the &struct damon_operations->get_scheme_score. * You could customize the prioritization logic by setting &weight_sz, - * &weight_nr_accesses, and &weight_age, because monitoring primitives are + * &weight_nr_accesses, and &weight_age, because monitoring operations are * encouraged to respect those. */ struct damos_quota { @@ -256,10 +256,10 @@ struct damos { struct damon_ctx; /** - * struct damon_primitive - Monitoring primitives for given use cases. + * struct damon_operations - Monitoring operations for given use cases. * - * @init: Initialize primitive-internal data structures. - * @update: Update primitive-internal data structures. + * @init: Initialize operations-related data structures. + * @update: Update operations-related data structures. * @prepare_access_checks: Prepare next access check of target regions. * @check_accesses: Check the accesses to target regions. * @reset_aggregated: Reset aggregated accesses monitoring results. @@ -269,18 +269,18 @@ struct damon_ctx; * @cleanup: Clean up the context. * * DAMON can be extended for various address spaces and usages. For this, - * users should register the low level primitives for their target address - * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread + * users should register the low level operations for their target address + * space and usecase via the &damon_ctx.ops. Then, the monitoring thread * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting - * the monitoring, @update after each &damon_ctx.primitive_update_interval, and + * the monitoring, @update after each &damon_ctx.ops_update_interval, and * @check_accesses, @target_valid and @prepare_access_checks after each * &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each * &damon_ctx.aggr_interval. * - * @init should initialize primitive-internal data structures. For example, + * @init should initialize operations-related data structures. For example, * this could be used to construct proper monitoring target regions and link * those to @damon_ctx.adaptive_targets. - * @update should update the primitive-internal data structures. For example, + * @update should update the operations-related data structures. For example, * this could be used to update monitoring target regions for current status. * @prepare_access_checks should manipulate the monitoring regions to be * prepared for the next access check. @@ -300,7 +300,7 @@ struct damon_ctx; * monitoring. * @cleanup is called from @kdamond just before its termination. */ -struct damon_primitive { +struct damon_operations { void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); @@ -354,15 +354,15 @@ struct damon_callback { * * @sample_interval: The time between access samplings. * @aggr_interval: The time between monitor results aggregations. - * @primitive_update_interval: The time between monitoring primitive updates. + * @ops_update_interval: The time between monitoring operations updates. * * For each @sample_interval, DAMON checks whether each region is accessed or * not. It aggregates and keeps the access information (number of accesses to * each region) for @aggr_interval time. 
DAMON also checks whether the target * memory regions need update (e.g., by ``mmap()`` calls from the application, * in case of virtual memory monitoring) and applies the changes for each - * @primitive_update_interval. All time intervals are in micro-seconds. - * Please refer to &struct damon_primitive and &struct damon_callback for more + * @ops_update_interval. All time intervals are in micro-seconds. + * Please refer to &struct damon_operations and &struct damon_callback for more * detail. * * @kdamond: Kernel thread who does the monitoring. @@ -374,7 +374,7 @@ struct damon_callback { * * Once started, the monitoring thread runs until explicitly required to be * terminated or every monitoring target is invalid. The validity of the - * targets is checked via the &damon_primitive.target_valid of @primitive. The + * targets is checked via the &damon_operations.target_valid of @ops. The * termination can also be explicitly requested by writing non-zero to * @kdamond_stop. The thread sets @kdamond to NULL when it terminates. * Therefore, users can know whether the monitoring is ongoing or terminated by @@ -384,7 +384,7 @@ struct damon_callback { * Note that the monitoring thread protects only @kdamond and @kdamond_stop via * @kdamond_lock. Accesses to other fields must be protected by themselves. * - * @primitive: Set of monitoring primitives for given use cases. + * @ops: Set of monitoring operations for given use cases. * @callback: Set of callbacks for monitoring events notifications. * * @min_nr_regions: The minimum number of adaptive monitoring regions. @@ -395,17 +395,17 @@ struct damon_callback { struct damon_ctx { unsigned long sample_interval; unsigned long aggr_interval; - unsigned long primitive_update_interval; + unsigned long ops_update_interval; /* private: internal use only */ struct timespec64 last_aggregation; - struct timespec64 last_primitive_update; + struct timespec64 last_ops_update; /* public: */ struct task_struct *kdamond; struct mutex kdamond_lock; - struct damon_primitive primitive; + struct damon_operations ops; struct damon_callback callback; unsigned long min_nr_regions; @@ -484,7 +484,7 @@ unsigned int damon_nr_regions(struct dam struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long aggr_int, unsigned long ops_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); @@ -497,12 +497,12 @@ int damon_stop(struct damon_ctx **ctxs, #ifdef CONFIG_DAMON_VADDR bool damon_va_target_valid(void *t); -void damon_va_set_primitives(struct damon_ctx *ctx); +void damon_va_set_operations(struct damon_ctx *ctx); #endif /* CONFIG_DAMON_VADDR */ #ifdef CONFIG_DAMON_PADDR bool damon_pa_target_valid(void *t); -void damon_pa_set_primitives(struct damon_ctx *ctx); +void damon_pa_set_operations(struct damon_ctx *ctx); #endif /* CONFIG_DAMON_PADDR */ #endif /* _DAMON_H */ --- a/mm/damon/core.c~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/core.c @@ -204,10 +204,10 @@ struct damon_ctx *damon_new_ctx(void) ctx->sample_interval = 5 * 1000; ctx->aggr_interval = 100 * 1000; - ctx->primitive_update_interval = 60 * 1000 * 1000; + ctx->ops_update_interval = 60 * 1000 * 1000; ktime_get_coarse_ts64(&ctx->last_aggregation); - ctx->last_primitive_update = ctx->last_aggregation; + ctx->last_ops_update = 
ctx->last_aggregation; mutex_init(&ctx->kdamond_lock); @@ -224,8 +224,8 @@ static void damon_destroy_targets(struct { struct damon_target *t, *next_t; - if (ctx->primitive.cleanup) { - ctx->primitive.cleanup(ctx); + if (ctx->ops.cleanup) { + ctx->ops.cleanup(ctx); return; } @@ -250,7 +250,7 @@ void damon_destroy_ctx(struct damon_ctx * @ctx: monitoring context * @sample_int: time interval between samplings * @aggr_int: time interval between aggregations - * @primitive_upd_int: time interval between monitoring primitive updates + * @ops_upd_int: time interval between monitoring operations updates * @min_nr_reg: minimal number of regions * @max_nr_reg: maximum number of regions * @@ -260,7 +260,7 @@ void damon_destroy_ctx(struct damon_ctx * Return: 0 on success, negative error code otherwise. */ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long aggr_int, unsigned long ops_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg) { if (min_nr_reg < 3) @@ -270,7 +270,7 @@ int damon_set_attrs(struct damon_ctx *ct ctx->sample_interval = sample_int; ctx->aggr_interval = aggr_int; - ctx->primitive_update_interval = primitive_upd_int; + ctx->ops_update_interval = ops_upd_int; ctx->min_nr_regions = min_nr_reg; ctx->max_nr_regions = max_nr_reg; @@ -516,10 +516,10 @@ static bool damos_valid_target(struct da { bool ret = __damos_valid_target(r, s); - if (!ret || !s->quota.esz || !c->primitive.get_scheme_score) + if (!ret || !s->quota.esz || !c->ops.get_scheme_score) return ret; - return c->primitive.get_scheme_score(c, t, r, s) >= s->quota.min_score; + return c->ops.get_scheme_score(c, t, r, s) >= s->quota.min_score; } static void damon_do_apply_schemes(struct damon_ctx *c, @@ -576,7 +576,7 @@ static void damon_do_apply_schemes(struc continue; /* Apply the scheme */ - if (c->primitive.apply_scheme) { + if (c->ops.apply_scheme) { if (quota->esz && quota->charged_sz + sz > quota->esz) { sz = ALIGN_DOWN(quota->esz - quota->charged_sz, @@ -586,7 +586,7 @@ static void damon_do_apply_schemes(struc damon_split_region_at(c, t, r, sz); } ktime_get_coarse_ts64(&begin); - sz_applied = c->primitive.apply_scheme(c, t, r, s); + sz_applied = c->ops.apply_scheme(c, t, r, s); ktime_get_coarse_ts64(&end); quota->total_charged_ns += timespec64_to_ns(&end) - timespec64_to_ns(&begin); @@ -660,7 +660,7 @@ static void kdamond_apply_schemes(struct damos_set_effective_quota(quota); } - if (!c->primitive.get_scheme_score) + if (!c->ops.get_scheme_score) continue; /* Fill up the score histogram */ @@ -669,7 +669,7 @@ static void kdamond_apply_schemes(struct damon_for_each_region(r, t) { if (!__damos_valid_target(r, s)) continue; - score = c->primitive.get_scheme_score( + score = c->ops.get_scheme_score( c, t, r, s); quota->histogram[score] += r->ar.end - r->ar.start; @@ -848,14 +848,15 @@ static void kdamond_split_regions(struct } /* - * Check whether it is time to check and apply the target monitoring regions + * Check whether it is time to check and apply the operations-related data + * structures. * * Returns true if it is. 
*/ -static bool kdamond_need_update_primitive(struct damon_ctx *ctx) +static bool kdamond_need_update_operations(struct damon_ctx *ctx) { - return damon_check_reset_time_interval(&ctx->last_primitive_update, - ctx->primitive_update_interval); + return damon_check_reset_time_interval(&ctx->last_ops_update, + ctx->ops_update_interval); } /* @@ -873,11 +874,11 @@ static bool kdamond_need_stop(struct dam if (kthread_should_stop()) return true; - if (!ctx->primitive.target_valid) + if (!ctx->ops.target_valid) return false; damon_for_each_target(t, ctx) { - if (ctx->primitive.target_valid(t)) + if (ctx->ops.target_valid(t)) return false; } @@ -976,8 +977,8 @@ static int kdamond_fn(void *data) pr_debug("kdamond (%d) starts\n", current->pid); - if (ctx->primitive.init) - ctx->primitive.init(ctx); + if (ctx->ops.init) + ctx->ops.init(ctx); if (ctx->callback.before_start && ctx->callback.before_start(ctx)) done = true; @@ -987,16 +988,16 @@ static int kdamond_fn(void *data) if (kdamond_wait_activation(ctx)) continue; - if (ctx->primitive.prepare_access_checks) - ctx->primitive.prepare_access_checks(ctx); + if (ctx->ops.prepare_access_checks) + ctx->ops.prepare_access_checks(ctx); if (ctx->callback.after_sampling && ctx->callback.after_sampling(ctx)) done = true; kdamond_usleep(ctx->sample_interval); - if (ctx->primitive.check_accesses) - max_nr_accesses = ctx->primitive.check_accesses(ctx); + if (ctx->ops.check_accesses) + max_nr_accesses = ctx->ops.check_accesses(ctx); if (kdamond_aggregate_interval_passed(ctx)) { kdamond_merge_regions(ctx, @@ -1008,13 +1009,13 @@ static int kdamond_fn(void *data) kdamond_apply_schemes(ctx); kdamond_reset_aggregated(ctx); kdamond_split_regions(ctx); - if (ctx->primitive.reset_aggregated) - ctx->primitive.reset_aggregated(ctx); + if (ctx->ops.reset_aggregated) + ctx->ops.reset_aggregated(ctx); } - if (kdamond_need_update_primitive(ctx)) { - if (ctx->primitive.update) - ctx->primitive.update(ctx); + if (kdamond_need_update_operations(ctx)) { + if (ctx->ops.update) + ctx->ops.update(ctx); sz_limit = damon_region_sz_limit(ctx); } } @@ -1025,8 +1026,8 @@ static int kdamond_fn(void *data) if (ctx->callback.before_terminate) ctx->callback.before_terminate(ctx); - if (ctx->primitive.cleanup) - ctx->primitive.cleanup(ctx); + if (ctx->ops.cleanup) + ctx->ops.cleanup(ctx); pr_debug("kdamond (%d) finishes\n", current->pid); mutex_lock(&ctx->kdamond_lock); --- a/mm/damon/dbgfs.c~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/dbgfs.c @@ -56,7 +56,7 @@ static ssize_t dbgfs_attrs_read(struct f mutex_lock(&ctx->kdamond_lock); ret = scnprintf(kbuf, ARRAY_SIZE(kbuf), "%lu %lu %lu %lu %lu\n", ctx->sample_interval, ctx->aggr_interval, - ctx->primitive_update_interval, ctx->min_nr_regions, + ctx->ops_update_interval, ctx->min_nr_regions, ctx->max_nr_regions); mutex_unlock(&ctx->kdamond_lock); @@ -277,7 +277,7 @@ out: static inline bool target_has_pid(const struct damon_ctx *ctx) { - return ctx->primitive.target_valid == damon_va_target_valid; + return ctx->ops.target_valid == damon_va_target_valid; } static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len) @@ -477,9 +477,9 @@ static ssize_t dbgfs_target_ids_write(st /* Configure the context for the address space type */ if (id_is_pid) - damon_va_set_primitives(ctx); + damon_va_set_operations(ctx); else - damon_pa_set_primitives(ctx); + damon_pa_set_operations(ctx); ret = dbgfs_set_targets(ctx, nr_targets, target_pids); if (!ret) @@ -735,7 +735,7 @@ static struct damon_ctx *dbgfs_new_ctx(v if 
(!ctx) return NULL; - damon_va_set_primitives(ctx); + damon_va_set_operations(ctx); ctx->callback.before_terminate = dbgfs_before_terminate; return ctx; } --- a/mm/damon/dbgfs-test.h~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/dbgfs-test.h @@ -74,7 +74,7 @@ static void damon_dbgfs_test_set_targets char buf[64]; /* Make DAMON consider target has no pid */ - ctx->primitive = (struct damon_primitive){}; + ctx->ops = (struct damon_operations){}; dbgfs_set_targets(ctx, 0, NULL); sprint_target_ids(ctx, buf, 64); --- a/mm/damon/Kconfig~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/Kconfig @@ -25,27 +25,27 @@ config DAMON_KUNIT_TEST If unsure, say N. config DAMON_VADDR - bool "Data access monitoring primitives for virtual address spaces" + bool "Data access monitoring operations for virtual address spaces" depends on DAMON && MMU select PAGE_IDLE_FLAG help - This builds the default data access monitoring primitives for DAMON + This builds the default data access monitoring operations for DAMON that work for virtual address spaces. config DAMON_PADDR - bool "Data access monitoring primitives for the physical address space" + bool "Data access monitoring operations for the physical address space" depends on DAMON && MMU select PAGE_IDLE_FLAG help - This builds the default data access monitoring primitives for DAMON + This builds the default data access monitoring operations for DAMON that works for the physical address space. config DAMON_VADDR_KUNIT_TEST - bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + bool "Test for DAMON operations" if !KUNIT_ALL_TESTS depends on DAMON_VADDR && KUNIT=y default KUNIT_ALL_TESTS help - This builds the DAMON virtual addresses primitives Kunit test suite. + This builds the DAMON virtual addresses operations Kunit test suite. For more information on KUnit and unit tests in general, please refer to the KUnit documentation. --- a/mm/damon/Makefile~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON) := core.o -obj-$(CONFIG_DAMON_VADDR) += prmtv-common.o vaddr.o -obj-$(CONFIG_DAMON_PADDR) += prmtv-common.o paddr.o +obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o +obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o --- /dev/null +++ a/mm/damon/ops-common.c @@ -0,0 +1,133 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Common Primitives for Data Access Monitoring + * + * Author: SeongJae Park + */ + +#include +#include +#include +#include + +#include "ops-common.h" + +/* + * Get an online page for a pfn if it's in the LRU list. Otherwise, returns + * NULL. + * + * The body of this function is stolen from the 'page_idle_get_page()'. We + * steal rather than reuse it because the code is quite simple. 
+ */ +struct page *damon_get_page(unsigned long pfn) +{ + struct page *page = pfn_to_online_page(pfn); + + if (!page || !PageLRU(page) || !get_page_unless_zero(page)) + return NULL; + + if (unlikely(!PageLRU(page))) { + put_page(page); + page = NULL; + } + return page; +} + +void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr) +{ + bool referenced = false; + struct page *page = damon_get_page(pte_pfn(*pte)); + + if (!page) + return; + + if (pte_young(*pte)) { + referenced = true; + *pte = pte_mkold(*pte); + } + +#ifdef CONFIG_MMU_NOTIFIER + if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE)) + referenced = true; +#endif /* CONFIG_MMU_NOTIFIER */ + + if (referenced) + set_page_young(page); + + set_page_idle(page); + put_page(page); +} + +void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + bool referenced = false; + struct page *page = damon_get_page(pmd_pfn(*pmd)); + + if (!page) + return; + + if (pmd_young(*pmd)) { + referenced = true; + *pmd = pmd_mkold(*pmd); + } + +#ifdef CONFIG_MMU_NOTIFIER + if (mmu_notifier_clear_young(mm, addr, + addr + ((1UL) << HPAGE_PMD_SHIFT))) + referenced = true; +#endif /* CONFIG_MMU_NOTIFIER */ + + if (referenced) + set_page_young(page); + + set_page_idle(page); + put_page(page); +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +} + +#define DAMON_MAX_SUBSCORE (100) +#define DAMON_MAX_AGE_IN_LOG (32) + +int damon_pageout_score(struct damon_ctx *c, struct damon_region *r, + struct damos *s) +{ + unsigned int max_nr_accesses; + int freq_subscore; + unsigned int age_in_sec; + int age_in_log, age_subscore; + unsigned int freq_weight = s->quota.weight_nr_accesses; + unsigned int age_weight = s->quota.weight_age; + int hotness; + + max_nr_accesses = c->aggr_interval / c->sample_interval; + freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses; + + age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000; + for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec; + age_in_log++, age_in_sec >>= 1) + ; + + /* If frequency is 0, higher age means it's colder */ + if (freq_subscore == 0) + age_in_log *= -1; + + /* + * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG]. + * Scale it to be in [0, 100] and set it as age subscore. 
+ */ + age_in_log += DAMON_MAX_AGE_IN_LOG; + age_subscore = age_in_log * DAMON_MAX_SUBSCORE / + DAMON_MAX_AGE_IN_LOG / 2; + + hotness = (freq_weight * freq_subscore + age_weight * age_subscore); + if (freq_weight + age_weight) + hotness /= freq_weight + age_weight; + /* + * Transform it to fit in [0, DAMOS_MAX_SCORE] + */ + hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE; + + /* Return coldness of the region */ + return DAMOS_MAX_SCORE - hotness; +} --- /dev/null +++ a/mm/damon/ops-common.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Common Primitives for Data Access Monitoring + * + * Author: SeongJae Park + */ + +#include + +struct page *damon_get_page(unsigned long pfn); + +void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr); +void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr); + +int damon_pageout_score(struct damon_ctx *c, struct damon_region *r, + struct damos *s); --- a/mm/damon/paddr.c~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/paddr.c @@ -14,7 +14,7 @@ #include #include "../internal.h" -#include "prmtv-common.h" +#include "ops-common.h" static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma, unsigned long addr, void *arg) @@ -261,15 +261,15 @@ static int damon_pa_scheme_score(struct return DAMOS_MAX_SCORE; } -void damon_pa_set_primitives(struct damon_ctx *ctx) +void damon_pa_set_operations(struct damon_ctx *ctx) { - ctx->primitive.init = NULL; - ctx->primitive.update = NULL; - ctx->primitive.prepare_access_checks = damon_pa_prepare_access_checks; - ctx->primitive.check_accesses = damon_pa_check_accesses; - ctx->primitive.reset_aggregated = NULL; - ctx->primitive.target_valid = damon_pa_target_valid; - ctx->primitive.cleanup = NULL; - ctx->primitive.apply_scheme = damon_pa_apply_scheme; - ctx->primitive.get_scheme_score = damon_pa_scheme_score; + ctx->ops.init = NULL; + ctx->ops.update = NULL; + ctx->ops.prepare_access_checks = damon_pa_prepare_access_checks; + ctx->ops.check_accesses = damon_pa_check_accesses; + ctx->ops.reset_aggregated = NULL; + ctx->ops.target_valid = damon_pa_target_valid; + ctx->ops.cleanup = NULL; + ctx->ops.apply_scheme = damon_pa_apply_scheme; + ctx->ops.get_scheme_score = damon_pa_scheme_score; } --- a/mm/damon/prmtv-common.c +++ /dev/null @@ -1,133 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Common Primitives for Data Access Monitoring - * - * Author: SeongJae Park - */ - -#include -#include -#include -#include - -#include "prmtv-common.h" - -/* - * Get an online page for a pfn if it's in the LRU list. Otherwise, returns - * NULL. - * - * The body of this function is stolen from the 'page_idle_get_page()'. We - * steal rather than reuse it because the code is quite simple. 
- */ -struct page *damon_get_page(unsigned long pfn) -{ - struct page *page = pfn_to_online_page(pfn); - - if (!page || !PageLRU(page) || !get_page_unless_zero(page)) - return NULL; - - if (unlikely(!PageLRU(page))) { - put_page(page); - page = NULL; - } - return page; -} - -void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr) -{ - bool referenced = false; - struct page *page = damon_get_page(pte_pfn(*pte)); - - if (!page) - return; - - if (pte_young(*pte)) { - referenced = true; - *pte = pte_mkold(*pte); - } - -#ifdef CONFIG_MMU_NOTIFIER - if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE)) - referenced = true; -#endif /* CONFIG_MMU_NOTIFIER */ - - if (referenced) - set_page_young(page); - - set_page_idle(page); - put_page(page); -} - -void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr) -{ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - bool referenced = false; - struct page *page = damon_get_page(pmd_pfn(*pmd)); - - if (!page) - return; - - if (pmd_young(*pmd)) { - referenced = true; - *pmd = pmd_mkold(*pmd); - } - -#ifdef CONFIG_MMU_NOTIFIER - if (mmu_notifier_clear_young(mm, addr, - addr + ((1UL) << HPAGE_PMD_SHIFT))) - referenced = true; -#endif /* CONFIG_MMU_NOTIFIER */ - - if (referenced) - set_page_young(page); - - set_page_idle(page); - put_page(page); -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -} - -#define DAMON_MAX_SUBSCORE (100) -#define DAMON_MAX_AGE_IN_LOG (32) - -int damon_pageout_score(struct damon_ctx *c, struct damon_region *r, - struct damos *s) -{ - unsigned int max_nr_accesses; - int freq_subscore; - unsigned int age_in_sec; - int age_in_log, age_subscore; - unsigned int freq_weight = s->quota.weight_nr_accesses; - unsigned int age_weight = s->quota.weight_age; - int hotness; - - max_nr_accesses = c->aggr_interval / c->sample_interval; - freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses; - - age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000; - for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec; - age_in_log++, age_in_sec >>= 1) - ; - - /* If frequency is 0, higher age means it's colder */ - if (freq_subscore == 0) - age_in_log *= -1; - - /* - * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG]. - * Scale it to be in [0, 100] and set it as age subscore. 
- */ - age_in_log += DAMON_MAX_AGE_IN_LOG; - age_subscore = age_in_log * DAMON_MAX_SUBSCORE / - DAMON_MAX_AGE_IN_LOG / 2; - - hotness = (freq_weight * freq_subscore + age_weight * age_subscore); - if (freq_weight + age_weight) - hotness /= freq_weight + age_weight; - /* - * Transform it to fit in [0, DAMOS_MAX_SCORE] - */ - hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE; - - /* Return coldness of the region */ - return DAMOS_MAX_SCORE - hotness; -} --- a/mm/damon/prmtv-common.h +++ /dev/null @@ -1,16 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Common Primitives for Data Access Monitoring - * - * Author: SeongJae Park - */ - -#include - -struct page *damon_get_page(unsigned long pfn); - -void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr); -void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr); - -int damon_pageout_score(struct damon_ctx *c, struct damon_region *r, - struct damos *s); --- a/mm/damon/reclaim.c~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/reclaim.c @@ -384,7 +384,7 @@ static int __init damon_reclaim_init(voi if (!ctx) return -ENOMEM; - damon_pa_set_primitives(ctx); + damon_pa_set_operations(ctx); ctx->callback.after_aggregation = damon_reclaim_after_aggregation; target = damon_new_target(); --- a/mm/damon/vaddr.c~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/vaddr.c @@ -15,7 +15,7 @@ #include #include -#include "prmtv-common.h" +#include "ops-common.h" #ifdef CONFIG_DAMON_VADDR_KUNIT_TEST #undef DAMON_MIN_REGION @@ -739,17 +739,17 @@ static int damon_va_scheme_score(struct return DAMOS_MAX_SCORE; } -void damon_va_set_primitives(struct damon_ctx *ctx) +void damon_va_set_operations(struct damon_ctx *ctx) { - ctx->primitive.init = damon_va_init; - ctx->primitive.update = damon_va_update; - ctx->primitive.prepare_access_checks = damon_va_prepare_access_checks; - ctx->primitive.check_accesses = damon_va_check_accesses; - ctx->primitive.reset_aggregated = NULL; - ctx->primitive.target_valid = damon_va_target_valid; - ctx->primitive.cleanup = NULL; - ctx->primitive.apply_scheme = damon_va_apply_scheme; - ctx->primitive.get_scheme_score = damon_va_scheme_score; + ctx->ops.init = damon_va_init; + ctx->ops.update = damon_va_update; + ctx->ops.prepare_access_checks = damon_va_prepare_access_checks; + ctx->ops.check_accesses = damon_va_check_accesses; + ctx->ops.reset_aggregated = NULL; + ctx->ops.target_valid = damon_va_target_valid; + ctx->ops.cleanup = NULL; + ctx->ops.apply_scheme = damon_va_apply_scheme; + ctx->ops.get_scheme_score = damon_va_scheme_score; } #include "vaddr-test.h" --- a/mm/damon/vaddr-test.h~mm-damon-rename-damon_primitives-to-damon_operations +++ a/mm/damon/vaddr-test.h @@ -314,7 +314,7 @@ static struct kunit_case damon_test_case }; static struct kunit_suite damon_test_suite = { - .name = "damon-primitives", + .name = "damon-operations", .test_cases = damon_test_cases, }; kunit_test_suite(damon_test_suite); From patchwork Tue Mar 22 21:48:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789285 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72470C433EF for ; Tue, 22 Mar 2022 21:48:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 006076B01F7; Tue, 22 Mar 2022 
17:48:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EF7096B01F9; Tue, 22 Mar 2022 17:48:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFB166B01F8; Tue, 22 Mar 2022 17:48:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id C1A196B01F6 for ; Tue, 22 Mar 2022 17:48:51 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 9ACF3161A for ; Tue, 22 Mar 2022 21:48:51 +0000 (UTC) X-FDA: 79273362462.04.2EFDDCF Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf30.hostedemail.com (Postfix) with ESMTP id 230EA80027 for ; Tue, 22 Mar 2022 21:48:50 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8EB3B6175D; Tue, 22 Mar 2022 21:48:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E5CEBC340EC; Tue, 22 Mar 2022 21:48:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985730; bh=LgGIcu5sZLkQjeeebEvOWxn84gg2PYy3JGDCogtNaww=; h=Date:To:From:In-Reply-To:Subject:From; b=M8y7oWejqdkmyZZF71TECrjMnk/5yPTufUEPuCqmPWFwarszoIqSWwtaPUW9R8z1W 10kRzOJOf1ISkjEOyxMcrh+Q9vRr42hdaOhttqmtTE05LRAy8UDeLXB9O1Q1PJ/mLQ FJ0rDvmkYmifgFMGK6sGYF/qDu0f/Df7yXW1numw= Date: Tue, 22 Mar 2022 14:48:49 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 203/227] mm/damon: let monitoring operations can be registered and selected Message-Id: <20220322214849.E5CEBC340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: kdr16ub4s3gegqzkcxzcec97tigf76ht Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=M8y7oWej; spf=pass (imf30.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 230EA80027 X-HE-Tag: 1647985730-888890 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon: let monitoring operations can be registered and selected In-kernel DAMON user code like the DAMON debugfs interface should set the 'struct damon_operations' of its 'struct damon_ctx' on its own. As a result, the client code has to depend on every supported monitoring operations implementation it could use. For example, the DAMON debugfs interface depends on both vaddr and paddr, while some of its users are not always interested in both. To minimize such unnecessary dependencies, this commit lets the monitoring operations be registered by the implementing code and then dynamically selected by the user code, without a build-time dependency.
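For illustration, here is a minimal sketch of both sides of this interface, built only from the damon_register_ops() and damon_select_ops() calls introduced below; the my_ops_* names are hypothetical placeholders rather than part of the patch, and a real third implementation would also need its own enum damon_ops_id entry:

#include <linux/damon.h>
#include <linux/init.h>

/* Hypothetical callbacks; real bodies would implement the access checks. */
static void my_ops_init(struct damon_ctx *ctx);
static void my_ops_prepare_access_checks(struct damon_ctx *ctx);
static unsigned int my_ops_check_accesses(struct damon_ctx *ctx);
static bool my_ops_target_valid(void *target);

/* Implementing side: fill an operations set with a valid id, register once. */
static struct damon_operations my_ops = {
	.id = DAMON_OPS_VADDR,	/* must be smaller than NR_DAMON_OPS */
	.init = my_ops_init,
	.prepare_access_checks = my_ops_prepare_access_checks,
	.check_accesses = my_ops_check_accesses,
	.target_valid = my_ops_target_valid,
};

static int __init my_ops_initcall(void)
{
	/* Returns -EINVAL for an invalid or already-registered id. */
	return damon_register_ops(&my_ops);
}
subsys_initcall(my_ops_initcall);

/* User side: select a registered set without a build-time dependency. */
static int my_user_setup(struct damon_ctx *ctx)
{
	/* Returns -EINVAL if nothing is registered under the id. */
	return damon_select_ops(ctx, DAMON_OPS_VADDR);
}

Keeping the id inside the operations struct lets the registry reject duplicate registrations with a simple array lookup, as the core.c hunk below implements.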
Link: https://lkml.kernel.org/r/20220215184603.1479-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- include/linux/damon.h | 18 ++++++++++ mm/damon/core.c | 66 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 84 insertions(+) --- a/include/linux/damon.h~mm-damon-let-monitoring-operations-can-be-registered-and-selected +++ a/include/linux/damon.h @@ -253,11 +253,24 @@ struct damos { struct list_head list; }; +/** + * enum damon_ops_id - Identifier for each monitoring operations implementation + * + * @DAMON_OPS_VADDR: Monitoring operations for virtual address spaces + * @DAMON_OPS_PADDR: Monitoring operations for the physical address space + */ +enum damon_ops_id { + DAMON_OPS_VADDR, + DAMON_OPS_PADDR, + NR_DAMON_OPS, +}; + struct damon_ctx; /** * struct damon_operations - Monitoring operations for given use cases. * + * @id: Identifier of this operations set. * @init: Initialize operations-related data structures. * @update: Update operations-related data structures. * @prepare_access_checks: Prepare next access check of target regions. @@ -277,6 +290,8 @@ struct damon_ctx; * &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each * &damon_ctx.aggr_interval. * + * Each &struct damon_operations instance having valid @id can be registered + * via damon_register_ops() and selected by damon_select_ops() later. * @init should initialize operations-related data structures. For example, * this could be used to construct proper monitoring target regions and link * those to @damon_ctx.adaptive_targets. @@ -301,6 +316,7 @@ struct damon_ctx; * @cleanup is called from @kdamond just before its termination. */ struct damon_operations { + enum damon_ops_id id; void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); @@ -489,6 +505,8 @@ int damon_set_attrs(struct damon_ctx *ct int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); int damon_nr_running_ctxs(void); +int damon_register_ops(struct damon_operations *ops); +int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); --- a/mm/damon/core.c~mm-damon-let-monitoring-operations-can-be-registered-and-selected +++ a/mm/damon/core.c @@ -25,6 +25,72 @@ static DEFINE_MUTEX(damon_lock); static int nr_running_ctxs; +static DEFINE_MUTEX(damon_ops_lock); +static struct damon_operations damon_registered_ops[NR_DAMON_OPS]; + +/* Should be called under damon_ops_lock with id smaller than NR_DAMON_OPS */ +static bool damon_registered_ops_id(enum damon_ops_id id) +{ + struct damon_operations empty_ops = {}; + + if (!memcmp(&empty_ops, &damon_registered_ops[id], sizeof(empty_ops))) + return false; + return true; +} + +/** + * damon_register_ops() - Register a monitoring operations set to DAMON. + * @ops: monitoring operations set to register. + * + * This function registers a monitoring operations set of valid &struct + * damon_operations->id so that others can find and use them later. + * + * Return: 0 on success, negative error code otherwise. 
+ */ +int damon_register_ops(struct damon_operations *ops) +{ + int err = 0; + + if (ops->id >= NR_DAMON_OPS) + return -EINVAL; + mutex_lock(&damon_ops_lock); + /* Fail for already registered ops */ + if (damon_registered_ops_id(ops->id)) { + err = -EINVAL; + goto out; + } + damon_registered_ops[ops->id] = *ops; +out: + mutex_unlock(&damon_ops_lock); + return err; +} + +/** + * damon_select_ops() - Select a monitoring operations to use with the context. + * @ctx: monitoring context to use the operations. + * @id: id of the registered monitoring operations to select. + * + * This function finds registered monitoring operations set of @id and make + * @ctx to use it. + * + * Return: 0 on success, negative error code otherwise. + */ +int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id) +{ + int err = 0; + + if (id >= NR_DAMON_OPS) + return -EINVAL; + + mutex_lock(&damon_ops_lock); + if (!damon_registered_ops_id(id)) + err = -EINVAL; + else + ctx->ops = damon_registered_ops[id]; + mutex_unlock(&damon_ops_lock); + return err; +} + /* * Construct a damon_region struct * From patchwork Tue Mar 22 21:48:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789286 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0765AC433EF for ; Tue, 22 Mar 2022 21:48:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 904766B01F9; Tue, 22 Mar 2022 17:48:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8B54B6B01FA; Tue, 22 Mar 2022 17:48:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7A3466B01FB; Tue, 22 Mar 2022 17:48:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 6F15E6B01F9 for ; Tue, 22 Mar 2022 17:48:56 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 44BB521DBD for ; Tue, 22 Mar 2022 21:48:56 +0000 (UTC) X-FDA: 79273362672.15.BDE5B3A Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf18.hostedemail.com (Postfix) with ESMTP id C06C41C0032 for ; Tue, 22 Mar 2022 21:48:55 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5A816B81DB7; Tue, 22 Mar 2022 21:48:54 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E3696C340EE; Tue, 22 Mar 2022 21:48:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985733; bh=tOQnzJ88rpOy/7gqFaTnVjvoh2gUiMPmtDcK0bJDp/E=; h=Date:To:From:In-Reply-To:Subject:From; b=jr9JdEtgvbT762Anr/iQDJ2MKgpLMVIBSbvkitCkSJLoG9oe6jJLOE4jJsSJl8vZJ B9t/3s90d1qpvNLEH6kmRnuDxJRc4dNAC6lbxXKX90YLlUnl39SA36Bjs5Q8Pi7Vlt dwCxTybF+82YJA/EKu/IvDXc6NGiHDSjnj2UyQKY= Date: Tue, 22 Mar 2022 14:48:52 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton 
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 204/227] mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall Message-Id: <20220322214852.E3696C340EE@smtp.kernel.org> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: C06C41C0032 X-Rspam-User: Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=jr9JdEtg; dmarc=none; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Stat-Signature: mwarp5gz5tefkc69d95cy6xfm3cb8nji X-HE-Tag: 1647985735-267400 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall This commit makes the monitoring operations for the physical address space and virtual address spaces register themselves to DAMON in the subsys_initcall step. Later, in-kernel DAMON user code can use them via damon_select_ops() without having to unnecessarily depend on all possible monitoring operations implementations. Link: https://lkml.kernel.org/r/20220215184603.1479-4-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/paddr.c | 20 ++++++++++++++++++++ mm/damon/vaddr.c | 20 ++++++++++++++++++++ 2 files changed, 40 insertions(+) --- a/mm/damon/paddr.c~mm-damon-paddrvaddr-register-themselves-to-damon-in-subsys_initcall +++ a/mm/damon/paddr.c @@ -273,3 +273,23 @@ void damon_pa_set_operations(struct damo ctx->ops.apply_scheme = damon_pa_apply_scheme; ctx->ops.get_scheme_score = damon_pa_scheme_score; } + +static int __init damon_pa_initcall(void) +{ + struct damon_operations ops = { + .id = DAMON_OPS_PADDR, + .init = NULL, + .update = NULL, + .prepare_access_checks = damon_pa_prepare_access_checks, + .check_accesses = damon_pa_check_accesses, + .reset_aggregated = NULL, + .target_valid = damon_pa_target_valid, + .cleanup = NULL, + .apply_scheme = damon_pa_apply_scheme, + .get_scheme_score = damon_pa_scheme_score, + }; + + return damon_register_ops(&ops); +}; + +subsys_initcall(damon_pa_initcall); --- a/mm/damon/vaddr.c~mm-damon-paddrvaddr-register-themselves-to-damon-in-subsys_initcall +++ a/mm/damon/vaddr.c @@ -752,4 +752,24 @@ void damon_va_set_operations(struct damo ctx->ops.get_scheme_score = damon_va_scheme_score; } +static int __init damon_va_initcall(void) +{ + struct damon_operations ops = { + .id = DAMON_OPS_VADDR, + .init = damon_va_init, + .update = damon_va_update, + .prepare_access_checks = damon_va_prepare_access_checks, + .check_accesses = damon_va_check_accesses, + .reset_aggregated = NULL, + .target_valid = damon_va_target_valid, + .cleanup = NULL, + .apply_scheme = damon_va_apply_scheme, + .get_scheme_score = damon_va_scheme_score, + }; + + return damon_register_ops(&ops); +}; + +subsys_initcall(damon_va_initcall); + #include "vaddr-test.h" From patchwork Tue Mar 22 21:48:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789287 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6417DC433F5 for ; Tue, 22 Mar 2022 21:48:58
+0000 (UTC) Received: by kanga.kvack.org (Postfix) id DE7F56B01FB; Tue, 22 Mar 2022 17:48:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CAD956B01FC; Tue, 22 Mar 2022 17:48:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B73B66B01FE; Tue, 22 Mar 2022 17:48:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0244.hostedemail.com [216.40.44.244]) by kanga.kvack.org (Postfix) with ESMTP id 9AF366B01FB for ; Tue, 22 Mar 2022 17:48:57 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 5446118289DE1 for ; Tue, 22 Mar 2022 21:48:57 +0000 (UTC) X-FDA: 79273362714.30.4BF6440 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf31.hostedemail.com (Postfix) with ESMTP id F109D20010 for ; Tue, 22 Mar 2022 21:48:56 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 753F96149C; Tue, 22 Mar 2022 21:48:56 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD3A7C340EC; Tue, 22 Mar 2022 21:48:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985735; bh=SkZx00q+H6WrsqNnmJVvB41rO8MrGlg/DGNSTgWKqr4=; h=Date:To:From:In-Reply-To:Subject:From; b=JNjE5yN9q6vkWbdtDCRLpiVfJOVzGXT/4TmYtaeqhrPvIK0+IkLO8T6d2P9qIQPNm 0I3xC3FFGvpROwMl1BsgKidWpd2WQ2GnWsECuVkYq7Ehxba1sA9hmFydeF8tRiDRf6 eaVO6pyPpilqYVqjpqBoHhWyENG5Ssy1yHmMRsO8= Date: Tue, 22 Mar 2022 14:48:55 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 205/227] mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations() Message-Id: <20220322214855.CD3A7C340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F109D20010 X-Stat-Signature: yqs7yp5x4n7d9bjou6xsisx96q6t1irm Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=JNjE5yN9; dmarc=none; spf=pass (imf31.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985736-992838 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations() This commit makes DAMON_RECLAIM select the registered monitoring operations for the physical address space instead of setting them on its own. This allows DAMON_RECLAIM to be independent of DAMON_PADDR, but leaves the dependency as is, because paddr is the only monitoring operations set it uses, so it makes no sense to build DAMON_RECLAIM without DAMON_PADDR.
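To make the resulting pattern concrete, here is a minimal, hypothetical sketch of an in-kernel client init modeled on the damon_reclaim_init() change below; the names are placeholders, and the damon_destroy_ctx() call on a failed selection is added here for completeness rather than taken from the patch:

#include <linux/damon.h>
#include <linux/init.h>

static struct damon_ctx *my_ctx;

static int __init my_client_init(void)
{
	struct damon_target *target;

	my_ctx = damon_new_ctx();
	if (!my_ctx)
		return -ENOMEM;

	/* Pick the paddr operations registered in the previous patch. */
	if (damon_select_ops(my_ctx, DAMON_OPS_PADDR)) {
		damon_destroy_ctx(my_ctx);
		return -EINVAL;
	}

	target = damon_new_target();
	if (!target) {
		damon_destroy_ctx(my_ctx);
		return -ENOMEM;
	}

	/* Remaining setup (target linking, schemes, damon_start()) omitted. */
	return 0;
}
subsys_initcall(my_client_init);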
Link: https://lkml.kernel.org/r/20220215184603.1479-5-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/reclaim.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/mm/damon/reclaim.c~mm-damon-reclaim-use-damon_select_ops-instead-of-damon_vpa_set_operations +++ a/mm/damon/reclaim.c @@ -384,7 +384,9 @@ static int __init damon_reclaim_init(voi if (!ctx) return -ENOMEM; - damon_pa_set_operations(ctx); + if (damon_select_ops(ctx, DAMON_OPS_PADDR)) + return -EINVAL; + ctx->callback.after_aggregation = damon_reclaim_after_aggregation; target = damon_new_target(); From patchwork Tue Mar 22 21:48:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789288 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EE0EC4321E for ; Tue, 22 Mar 2022 21:49:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0045F6B01FE; Tue, 22 Mar 2022 17:49:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EF80C6B01FF; Tue, 22 Mar 2022 17:49:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DBE876B0200; Tue, 22 Mar 2022 17:49:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id CC6FF6B01FE for ; Tue, 22 Mar 2022 17:49:01 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id 747F76098F for ; Tue, 22 Mar 2022 21:49:01 +0000 (UTC) X-FDA: 79273362882.07.E2E2F43 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf05.hostedemail.com (Postfix) with ESMTP id 25AD3100028 for ; Tue, 22 Mar 2022 21:49:01 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 233DCB81DAB; Tue, 22 Mar 2022 21:49:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B6FD9C340EC; Tue, 22 Mar 2022 21:48:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985738; bh=e18K+wm4UBcrwGriRTqtVE5w6iKB6Kmcvk6mybDaurQ=; h=Date:To:From:In-Reply-To:Subject:From; b=BQsMkZSvlGwTxyAu1inbWaexq1ztJIrBQo2LUGStVUKdqc9Lcww3Hu2WssrrHTb1q gzyCM9HJi7e3GvxOHwlagKr4j9K02l0MnaLzO4RCiUcOHbEpEQAEbajrVSfJVvtAL7 n4nEVdwSjKLENh92pziyVeCVg9zj9bDorg2V73g4= Date: Tue, 22 Mar 2022 14:48:58 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 206/227] mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations() Message-Id: <20220322214858.B6FD9C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: 3cjbthr8anq5oztreb7bpr93n6ejawad Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=BQsMkZSv; spf=pass 
(imf05.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 25AD3100028 X-HE-Tag: 1647985741-153136 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations() This commit makes the DAMON debugfs interface select the registered monitoring operations for the physical address space or virtual address spaces, depending on user requests, instead of setting them on its own. Note that the DAMON debugfs interface is still dependent on DAMON_VADDR with this change, because it is also using its symbol, 'damon_va_target_valid'. Link: https://lkml.kernel.org/r/20220215184603.1479-6-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/dbgfs.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) --- a/mm/damon/dbgfs.c~mm-damon-dbgfs-use-damon_select_ops-instead-of-damon_vpa_set_operations +++ a/mm/damon/dbgfs.c @@ -474,12 +474,18 @@ static ssize_t dbgfs_target_ids_write(st /* remove previously set targets */ dbgfs_set_targets(ctx, 0, NULL); + if (!nr_targets) { + ret = count; + goto unlock_out; + } /* Configure the context for the address space type */ if (id_is_pid) - damon_va_set_operations(ctx); + ret = damon_select_ops(ctx, DAMON_OPS_VADDR); else - damon_pa_set_operations(ctx); + ret = damon_select_ops(ctx, DAMON_OPS_PADDR); + if (ret) + goto unlock_out; ret = dbgfs_set_targets(ctx, nr_targets, target_pids); if (!ret) @@ -735,7 +741,11 @@ static struct damon_ctx *dbgfs_new_ctx(v if (!ctx) return NULL; - damon_va_set_operations(ctx); + if (damon_select_ops(ctx, DAMON_OPS_VADDR) && damon_select_ops(ctx, + DAMON_OPS_PADDR)) { + damon_destroy_ctx(ctx); + return NULL; + } ctx->callback.before_terminate = dbgfs_before_terminate; return ctx; } From patchwork Tue Mar 22 21:49:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789289 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD82BC433FE for ; Tue, 22 Mar 2022 21:49:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4C09F6B0200; Tue, 22 Mar 2022 17:49:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 46F0D6B0201; Tue, 22 Mar 2022 17:49:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 35D5C6B0202; Tue, 22 Mar 2022 17:49:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0139.hostedemail.com [216.40.44.139]) by kanga.kvack.org (Postfix) with ESMTP id 20E016B0200 for ; Tue, 22 Mar 2022 17:49:05 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id D02CD18222A0D for ; Tue, 22 Mar 2022 21:49:04 +0000 (UTC) X-FDA: 79273363008.29.A5EBC07 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf10.hostedemail.com (Postfix) with ESMTP id 5E8EEC003C for ; Tue, 22 Mar 2022 21:49:04 +0000 (UTC) Received: from
smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 17922B81DC7; Tue, 22 Mar 2022 21:49:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id AB2F3C340EC; Tue, 22 Mar 2022 21:49:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985741; bh=DRr0rWYhaEq6qsecV5yL2psCKpJTreoqVr9aKJ7yFXQ=; h=Date:To:From:In-Reply-To:Subject:From; b=Mcboy7eVMW1p1tbUmogolhgHFfkDxNs6flSV4BFJUSBUv1SQSVcrkBnnG6xDunAH2 hC8mGvtrVGmgWbTG63EGCIECHTjj4R7Aw1jeJuC+6bwA8wtvniCVIy3pyXofvFY97J KAHaapVZ9qWsUoEdktNFRWYVMyYjjBYotOz7V+is= Date: Tue, 22 Mar 2022 14:49:01 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 207/227] mm/damon/dbgfs: use operations id for knowing if the target has pid Message-Id: <20220322214901.AB2F3C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: fnh7xu1qeeycjp59mwnkcyewyurcbyj5 Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Mcboy7eV; spf=pass (imf10.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 5E8EEC003C X-HE-Tag: 1647985744-891568 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/dbgfs: use operations id for knowing if the target has pid The DAMON debugfs interface depends on the monitoring operations for virtual address spaces, because it knows whether the target has a pid by checking whether the context is configured to use one of the virtual address space monitoring operation functions. We can now replace that check with 'enum damon_ops_id', to make the interface independent. This commit makes the change.
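Condensed from the hunk that follows, the check changes as below; comparing the operations id instead of a function pointer is what removes the need for dbgfs to reference any DAMON_VADDR symbol:

	/* before: identity check via a DAMON_VADDR function pointer */
	return ctx->ops.target_valid == damon_va_target_valid;

	/* after: identity check via the operations id alone */
	return ctx->ops.id == DAMON_OPS_VADDR;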
Link: https://lkml.kernel.org/r/20220215184603.1479-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/dbgfs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/damon/dbgfs.c~mm-damon-dbgfs-use-operations-id-for-knowing-if-the-target-has-pid +++ a/mm/damon/dbgfs.c @@ -277,7 +277,7 @@ out: static inline bool target_has_pid(const struct damon_ctx *ctx) { - return ctx->ops.target_valid == damon_va_target_valid; + return ctx->ops.id == DAMON_OPS_VADDR; } static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len) @@ -741,8 +741,8 @@ static struct damon_ctx *dbgfs_new_ctx(v if (!ctx) return NULL; - if (damon_select_ops(ctx, DAMON_OPS_VADDR) && damon_select_ops(ctx, - DAMON_OPS_PADDR)) { + if (damon_select_ops(ctx, DAMON_OPS_VADDR) && + damon_select_ops(ctx, DAMON_OPS_PADDR)) { damon_destroy_ctx(ctx); return NULL; } From patchwork Tue Mar 22 21:49:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789290 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82226C433F5 for ; Tue, 22 Mar 2022 21:49:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1FEDC6B0202; Tue, 22 Mar 2022 17:49:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1AF2E6B0203; Tue, 22 Mar 2022 17:49:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A0A96B0204; Tue, 22 Mar 2022 17:49:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id EAC316B0202 for ; Tue, 22 Mar 2022 17:49:07 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id C75BE23116 for ; Tue, 22 Mar 2022 21:49:07 +0000 (UTC) X-FDA: 79273363134.15.5B47ADD Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf22.hostedemail.com (Postfix) with ESMTP id 2D23FC0030 for ; Tue, 22 Mar 2022 21:49:07 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 20EAAB81DB7; Tue, 22 Mar 2022 21:49:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id ADEB7C340EC; Tue, 22 Mar 2022 21:49:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985744; bh=0WpgQobUTiVwtxV7NFf+Y6SWIqqNGRsmY/UVDIUiHD8=; h=Date:To:From:In-Reply-To:Subject:From; b=FKNNtWmWsjsos854FKoq8mDUk0ZwAPy16h2NL5sRpeUeI6Yy70TfY0sKD/2K8R5w/ 4FlyX6wTlQnRu5N5Kyix/JGOBq+e1nLSwBLRIYYE+W7egLF5Rpi6OkBkiohcUdNuji I6HU/DTJP0l7HrnR5vOvf+LMowiPJo97ZBXOHh8k= Date: Tue, 22 Mar 2022 14:49:04 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 208/227] mm/damon/dbgfs-test: fix is_target_id() change Message-Id: 
<20220322214904.ADEB7C340EC@smtp.kernel.org> X-Stat-Signature: ko3wkmx4eex5rfmn8rqjabc7j3x437rx X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 2D23FC0030 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=FKNNtWmW; dmarc=none; spf=pass (imf22.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985747-970126 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/dbgfs-test: fix is_target_id() change The DAMON kunit tests for the DAMON debugfs interface fail because they still assume that setting empty monitoring operations makes the DAMON debugfs interface believe the target of the context doesn't have a pid. This commit fixes the kunit test failures by explicitly setting the context's monitoring operations to the operations for the physical address space, which lets debugfs know the target will not have a pid. Link: https://lkml.kernel.org/r/20220215184603.1479-8-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/dbgfs-test.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/mm/damon/dbgfs-test.h~mm-damon-dbgfs-test-fix-is_target_id-change +++ a/mm/damon/dbgfs-test.h @@ -74,7 +74,7 @@ static void damon_dbgfs_test_set_targets char buf[64]; /* Make DAMON consider target has no pid */ - ctx->ops = (struct damon_operations){}; + damon_select_ops(ctx, DAMON_OPS_PADDR); dbgfs_set_targets(ctx, 0, NULL); sprint_target_ids(ctx, buf, 64); @@ -111,6 +111,8 @@ static void damon_dbgfs_test_set_init_re int i, rc; char buf[256]; + damon_select_ops(ctx, DAMON_OPS_PADDR); + dbgfs_set_targets(ctx, 3, NULL); /* Put valid inputs and check the results */ From patchwork Tue Mar 22 21:49:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789291 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F2AFC433FE for ; Tue, 22 Mar 2022 21:49:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B75CD6B0204; Tue, 22 Mar 2022 17:49:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B24BF6B0205; Tue, 22 Mar 2022 17:49:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 99E126B0206; Tue, 22 Mar 2022 17:49:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0046.hostedemail.com [216.40.44.46]) by kanga.kvack.org (Postfix) with ESMTP id 8C0156B0204 for ; Tue, 22 Mar 2022 17:49:09 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 439BB8248D52 for ; Tue, 22 Mar 2022 21:49:09 +0000 (UTC) X-FDA: 79273363218.17.F0B3E71 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf28.hostedemail.com (Postfix) with ESMTP id C7AFAC001A for ; Tue, 22 Mar 2022 21:49:08 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate
requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 487AB6149C; Tue, 22 Mar 2022 21:49:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9E3E2C340EC; Tue, 22 Mar 2022 21:49:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985747; bh=OE8JHBdEbctidvhLgNap8i04dQeZ+vM3fUWBBaF+yX8=; h=Date:To:From:In-Reply-To:Subject:From; b=YnLZ+J3QAVrGNatDBKV4mM7ZTldAcckRty2DPW44nNgdPZ2kWhXxCdyIDj1nAkunu 7Vme1AVPoBaz9TB7bo063rBa6NwdLIpgYmsfN8fsKBLYj0J5dn5zMD/QAZK3Twtwjg JcLI4qq1J5IhGwXSczPNm3kPCV0987+ZaPrE8qRc= Date: Tue, 22 Mar 2022 14:49:07 -0700 To: xhao@linux.alibaba.com,rientjes@google.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 209/227] mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() Message-Id: <20220322214907.9E3E2C340EC@smtp.kernel.org> X-Rspam-User: Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=YnLZ+J3Q; spf=pass (imf28.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: C7AFAC001A X-Stat-Signature: ykzp5k3d686i5jimf3jghikpytftrk3a X-HE-Tag: 1647985748-7000 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() Because the DAMON debugfs interface and DAMON-based proactive reclaim now use monitoring operations via the registration mechanism, the damon_{p,v}a_{target_valid,set_operations}() functions have no users. This commit cleans them up.
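After this cleanup, each backend is wired up only through its initcall, roughly as in the abridged sketch of damon_pa_initcall() below. The .id assignment and the final damon_register_ops() call are inferred from the rest of the series rather than visible in the hunks that follow, so treat this as a sketch:

	static int __init damon_pa_initcall(void)
	{
		struct damon_operations ops = {
			.id = DAMON_OPS_PADDR,	/* assumed field, per patch 207's ops.id checks */
			.prepare_access_checks = damon_pa_prepare_access_checks,
			.check_accesses = damon_pa_check_accesses,
			.target_valid = NULL,	/* every paddr target is valid */
			.apply_scheme = damon_pa_apply_scheme,
			.get_scheme_score = damon_pa_scheme_score,
		};

		return damon_register_ops(&ops);
	}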
Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton --- include/linux/damon.h | 10 ---------- mm/damon/paddr.c | 20 +------------------- mm/damon/vaddr.c | 15 +-------------- 3 files changed, 2 insertions(+), 43 deletions(-) --- a/include/linux/damon.h~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations +++ a/include/linux/damon.h @@ -513,14 +513,4 @@ int damon_stop(struct damon_ctx **ctxs, #endif /* CONFIG_DAMON */ -#ifdef CONFIG_DAMON_VADDR -bool damon_va_target_valid(void *t); -void damon_va_set_operations(struct damon_ctx *ctx); -#endif /* CONFIG_DAMON_VADDR */ - -#ifdef CONFIG_DAMON_PADDR -bool damon_pa_target_valid(void *t); -void damon_pa_set_operations(struct damon_ctx *ctx); -#endif /* CONFIG_DAMON_PADDR */ - #endif /* _DAMON_H */ --- a/mm/damon/paddr.c~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations +++ a/mm/damon/paddr.c @@ -208,11 +208,6 @@ static unsigned int damon_pa_check_acces return max_nr_accesses; } -bool damon_pa_target_valid(void *t) -{ - return true; -} - static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx, struct damon_target *t, struct damon_region *r, struct damos *scheme) @@ -261,19 +256,6 @@ static int damon_pa_scheme_score(struct return DAMOS_MAX_SCORE; } -void damon_pa_set_operations(struct damon_ctx *ctx) -{ - ctx->ops.init = NULL; - ctx->ops.update = NULL; - ctx->ops.prepare_access_checks = damon_pa_prepare_access_checks; - ctx->ops.check_accesses = damon_pa_check_accesses; - ctx->ops.reset_aggregated = NULL; - ctx->ops.target_valid = damon_pa_target_valid; - ctx->ops.cleanup = NULL; - ctx->ops.apply_scheme = damon_pa_apply_scheme; - ctx->ops.get_scheme_score = damon_pa_scheme_score; -} - static int __init damon_pa_initcall(void) { struct damon_operations ops = { @@ -283,7 +265,7 @@ static int __init damon_pa_initcall(void .prepare_access_checks = damon_pa_prepare_access_checks, .check_accesses = damon_pa_check_accesses, .reset_aggregated = NULL, - .target_valid = damon_pa_target_valid, + .target_valid = NULL, .cleanup = NULL, .apply_scheme = damon_pa_apply_scheme, .get_scheme_score = damon_pa_scheme_score, --- a/mm/damon/vaddr.c~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations +++ a/mm/damon/vaddr.c @@ -653,7 +653,7 @@ static unsigned int damon_va_check_acces * Functions for the target validity check and cleanup */ -bool damon_va_target_valid(void *target) +static bool damon_va_target_valid(void *target) { struct damon_target *t = target; struct task_struct *task; @@ -739,19 +739,6 @@ static int damon_va_scheme_score(struct return DAMOS_MAX_SCORE; } -void damon_va_set_operations(struct damon_ctx *ctx) -{ - ctx->ops.init = damon_va_init; - ctx->ops.update = damon_va_update; - ctx->ops.prepare_access_checks = damon_va_prepare_access_checks; - ctx->ops.check_accesses = damon_va_check_accesses; - ctx->ops.reset_aggregated = NULL; - ctx->ops.target_valid = damon_va_target_valid; - ctx->ops.cleanup = NULL; - ctx->ops.apply_scheme = damon_va_apply_scheme; - ctx->ops.get_scheme_score = damon_va_scheme_score; -} - static int __init damon_va_initcall(void) { struct damon_operations ops = { From patchwork Tue Mar 22 21:49:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789292 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34594C433FE for ; Tue, 22 Mar 2022 21:49:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BEA066B0206; Tue, 22 Mar 2022 17:49:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B9B486B0207; Tue, 22 Mar 2022 17:49:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A890D6B0208; Tue, 22 Mar 2022 17:49:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id 984F26B0206 for ; Tue, 22 Mar 2022 17:49:13 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 633341828AE47 for ; Tue, 22 Mar 2022 21:49:13 +0000 (UTC) X-FDA: 79273363386.17.928C313 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf28.hostedemail.com (Postfix) with ESMTP id DBE6CC001E for ; Tue, 22 Mar 2022 21:49:12 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id C9DFAB81DC6; Tue, 22 Mar 2022 21:49:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7C665C340EC; Tue, 22 Mar 2022 21:49:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985750; bh=vGh/cA9ETOpoqUVsqzO3raaf979clIRiwEjeafO+Rhk=; h=Date:To:From:In-Reply-To:Subject:From; b=iFla/ExxmqochfChrM+GpP2ZDNZ8/Ih6VEml0O5WqSj53lcViEFLJPCKDoK+SCZ3j 1OAy+1tIkN0yIZqoh7I9Hv6vBVLi/o+XpFbz3nqBt0c8XAr0VGytOxw4bwhgVsU7xx iusNP0Fr64aaJrRwraJxXN8IK4PHVlBkkYNokNvM= Date: Tue, 22 Mar 2022 14:49:09 -0700 To: sj@kernel.org,tangmeng@uniontech.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 210/227] mm/damon: remove unnecessary CONFIG_DAMON option Message-Id: <20220322214910.7C665C340EC@smtp.kernel.org> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: DBE6CC001E X-Stat-Signature: mggwsdm11q3dcb65ewa6zotrtpjz44ud Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="iFla/Exx"; dmarc=none; spf=pass (imf28.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985752-107114 X-Bogosity: Ham, tests=bogofilter, spamicity=0.044482, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: tangmeng Subject: mm/damon: remove unnecessary CONFIG_DAMON option mm/Makefile already has: obj-$(CONFIG_DAMON) += damon/ so 'obj-$(CONFIG_DAMON) :=' is not needed in mm/damon/Makefile; delete it from mm/damon/Makefile.
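Spelled out, the resulting kbuild arrangement is the one below: kbuild only descends into mm/damon/ when CONFIG_DAMON=y, so the unconditional obj-y inside the directory cannot pull core.o into a !CONFIG_DAMON build:

	# mm/Makefile gates the whole directory:
	obj-$(CONFIG_DAMON)	+= damon/

	# so mm/damon/Makefile may list core.o unconditionally:
	obj-y			:= core.o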
Link: https://lkml.kernel.org/r/20220221065255.19991-1-tangmeng@uniontech.com Signed-off-by: tangmeng Cc: SeongJae Park Signed-off-by: Andrew Morton --- mm/damon/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/damon/Makefile~mm-damon-remove-unnecessary-config_damon-option +++ a/mm/damon/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 -obj-$(CONFIG_DAMON) := core.o +obj-y := core.o obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o From patchwork Tue Mar 22 21:49:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789293 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2AAD3C433FE for ; Tue, 22 Mar 2022 21:49:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B50836B0208; Tue, 22 Mar 2022 17:49:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AFF7E6B0209; Tue, 22 Mar 2022 17:49:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9C6306B020A; Tue, 22 Mar 2022 17:49:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 8EF206B0208 for ; Tue, 22 Mar 2022 17:49:16 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 5A75E18222A0D for ; Tue, 22 Mar 2022 21:49:16 +0000 (UTC) X-FDA: 79273363512.23.2CA0EA5 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf20.hostedemail.com (Postfix) with ESMTP id C94901C002F for ; Tue, 22 Mar 2022 21:49:15 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B2FB9B81DAB; Tue, 22 Mar 2022 21:49:14 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 68633C340EE; Tue, 22 Mar 2022 21:49:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985753; bh=cXpd7DykzEkja9QGRrIaORbf8fpG6oaM+sXBMDgTcsk=; h=Date:To:From:In-Reply-To:Subject:From; b=iXLPovVvl4gDmEKbtHq1+ls5jPw4S8x5FGJ2g1LI37mMbUyWg/ZuF5sa6TqHKUw2Z lKQKs59qtGtSUmGfLpO99FT60I9Nt4ZS7b3rbTud8cUbg/6GB1sIRgNgd/FtELXa5Y t1FuSE8tUFR0MJC5IOIwFA+aBMlBE4138j+ZXdy4= Date: Tue, 22 Mar 2022 14:49:12 -0700 To: corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 211/227] Docs/vm/damon: call low level monitoring primitives the operations Message-Id: <20220322214913.68633C340EE@smtp.kernel.org> X-Stat-Signature: pwpcdutyz7k37hgkbmc51u8ne1bhuof6 Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=iXLPovVv; spf=pass (imf20.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; 
dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: C94901C002F X-HE-Tag: 1647985755-926678 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: Docs/vm/damon: call low level monitoring primitives the operations Patch series "Docs/damon: Update documents for better consistency". Some of the DAMON documents are not properly updated for the latest version. This patchset updates such parts. This patch (of 3): The DAMON code calls the low level monitoring primitives implementations the monitoring operations. The documentation could keep calling those primitives implementations without real harm, because there is no difference in the concepts, but making it consistent with the code is better. This commit therefore converts the sentences in the docs that specifically point at the implementations of the primitives to call them monitoring operations. Link: https://lkml.kernel.org/r/20220222170100.17068-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220222170100.17068-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/vm/damon/design.rst | 24 ++++++++++++------------ Documentation/vm/damon/faq.rst | 2 +- 2 files changed, 13 insertions(+), 13 deletions(-) --- a/Documentation/vm/damon/design.rst~docs-vm-damon-call-low-level-monitoring-primitives-the-operations +++ a/Documentation/vm/damon/design.rst @@ -13,12 +13,13 @@ primitives that dependent on and optimiz the other hand, the accuracy and overhead tradeoff mechanism, which is the core of DAMON, is in the pure logic space. DAMON separates the two parts in different layers and defines its interface to allow various low level -primitives implementations configurable with the core logic. +primitives implementations configurable with the core logic. We call the low +level primitives implementations monitoring operations. Due to this separated design and the configurable interface, users can extend -DAMON for any address space by configuring the core logics with appropriate low -level primitive implementations. If appropriate one is not provided, users can -implement the primitives on their own. +DAMON for any address space by configuring the core logics with appropriate +monitoring operations. If appropriate one is not provided, users can implement +the operations on their own. For example, physical memory, virtual memory, swap space, those for specific processes, NUMA nodes, files, and backing memory devices would be supportable. @@ -26,25 +27,24 @@ Also, if some architectures or devices s primitives, those will be easily configurable. -Reference Implementations of Address Space Specific Primitives -============================================================== +Reference Implementations of Address Space Specific Monitoring Operations +========================================================================= -The low level primitives for the fundamental access monitoring are defined in -two parts: +The monitoring operations are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. -DAMON currently provides the implementations of the primitives for the physical +DAMON currently provides the implementations of the operations for the physical and virtual address spaces. Below two subsections describe how those work.
VMA-based Target Address Range Construction ------------------------------------------- -This is only for the virtual address space primitives implementation. That for -the physical address space simply asks users to manually set the monitoring -target address ranges. +This is only for the virtual address space monitoring operations +implementation. That for the physical address space simply asks users to +manually set the monitoring target address ranges. Only small parts in the super-huge virtual address space of the processes are mapped to the physical memory and accessed. Thus, tracking the unmapped --- a/Documentation/vm/damon/faq.rst~docs-vm-damon-call-low-level-monitoring-primitives-the-operations +++ a/Documentation/vm/damon/faq.rst @@ -31,7 +31,7 @@ Does DAMON support virtual memory only? ======================================= No. The core of the DAMON is address space independent. The address space -specific low level primitive parts including monitoring target regions +specific monitoring operations including monitoring target regions constructions and actual access checks can be implemented and configured on the DAMON core by the users. In this way, DAMON users can monitor any address space with any access check technique. From patchwork Tue Mar 22 21:49:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789294 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74A54C433EF for ; Tue, 22 Mar 2022 21:49:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0787F6B020A; Tue, 22 Mar 2022 17:49:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 028916B020B; Tue, 22 Mar 2022 17:49:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E32F86B020C; Tue, 22 Mar 2022 17:49:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0089.hostedemail.com [216.40.44.89]) by kanga.kvack.org (Postfix) with ESMTP id D482E6B020A for ; Tue, 22 Mar 2022 17:49:21 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 967E11809B0BB for ; Tue, 22 Mar 2022 21:49:21 +0000 (UTC) X-FDA: 79273363722.25.BA96488 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by imf26.hostedemail.com (Postfix) with ESMTP id D97BF14003A for ; Tue, 22 Mar 2022 21:49:20 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 255E5CE1E03; Tue, 22 Mar 2022 21:49:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5BAA9C340EE; Tue, 22 Mar 2022 21:49:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985756; bh=ZPovejT+mS4ShXN/nfO853TOrgfFMjd8KIsrmY37H88=; h=Date:To:From:In-Reply-To:Subject:From; b=VQ+FPq88EYuoYlE4CD6Gb4EagG+YE9N176cXUfB1eWQrJJP+2z/Pk+JqnUqov6ReR nFIt7jrfF/WlWPA1hPlnITFGn8WxXhmrkAlOHGVuEX/xKsXCXfnDof9eS0CutGwUUr G5YEZswwJevnFuBv6345j6oMKgVjI4SC2PMdAmEg= Date: Tue, 22 Mar 2022 14:49:15 -0700 To: 
corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 212/227] Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Message-Id: <20220322214916.5BAA9C340EE@smtp.kernel.org> X-Stat-Signature: po5kgkpnne7uxcfswihukpbmw6xb3wta Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=VQ+FPq88; spf=pass (imf26.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: D97BF14003A X-HE-Tag: 1647985760-168234 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000062, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling In DAMON's early development stage, before it was merged into the mainline, it was designed to be mutually exclusive with Idle page tracking so that the two could not interfere with each other. Later, but still before the mainline merge, because Idle page tracking is fully under the control of sysadmins, we made resolving the conflict the responsibility of sysadmins. The document was not updated for the change, though. This commit updates the document for that. Link: https://lkml.kernel.org/r/20220222170100.17068-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/vm/damon/design.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) --- a/Documentation/vm/damon/design.rst~docs-vm-damon-design-update-damon-idle-page-tracking-interference-handling +++ a/Documentation/vm/damon/design.rst @@ -84,9 +84,10 @@ table having a mapping to the address. and clear the bit(s) for next sampling target address and checks whether the bit(s) set again after one sampling period. This could disturb other kernel subsystems using the Accessed bits, namely Idle page tracking and the reclaim -logic. To avoid such disturbances, DAMON makes it mutually exclusive with Idle -page tracking and uses ``PG_idle`` and ``PG_young`` page flags to solve the -conflict with the reclaim logic, as Idle page tracking does. +logic. DAMON does nothing to avoid disturbing Idle page tracking, so handling +the interference is the responsibility of sysadmins. However, it solves the +conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags, +as Idle page tracking does.
Address Space Independent Core Mechanisms From patchwork Tue Mar 22 21:49:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789295 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60C3BC433FE for ; Tue, 22 Mar 2022 21:49:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C87A16B020B; Tue, 22 Mar 2022 17:49:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C3D236B020C; Tue, 22 Mar 2022 17:49:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AAFC66B020D; Tue, 22 Mar 2022 17:49:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id 968E86B020B for ; Tue, 22 Mar 2022 17:49:22 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 53AC4180FF4A6 for ; Tue, 22 Mar 2022 21:49:22 +0000 (UTC) X-FDA: 79273363764.30.F5E9043 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf18.hostedemail.com (Postfix) with ESMTP id B8F6E1C000B for ; Tue, 22 Mar 2022 21:49:21 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id AA3EFB81DC3; Tue, 22 Mar 2022 21:49:20 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48C1CC340EE; Tue, 22 Mar 2022 21:49:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985759; bh=nCid8sVTqW2FafLyabZL96EzaHpXLv8FFjNEEnTHbGg=; h=Date:To:From:In-Reply-To:Subject:From; b=MC/ymDRvPoczwIF5EvSzu3FcUhSM5DGyjrZvSsk9zJkkcsO/Sk8s/wRA3eTAOVm5e TZ32YmSoxG2f+PfhHuFzCL7YWFddAUVCWP6OgP40Amjz3Rvzqf+r6xU5m40hwDNuco 7hqwl2QGFoEgD8vv3TQGwfNquY120JY9aQ2tybjw= Date: Tue, 22 Mar 2022 14:49:18 -0700 To: corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 213/227] Docs/damon: update outdated term 'regions update interval' Message-Id: <20220322214919.48C1CC340EE@smtp.kernel.org> X-Stat-Signature: tbuckeurtqyq8gaw5dent4h89khr9d5g Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="MC/ymDRv"; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: B8F6E1C000B X-HE-Tag: 1647985761-432621 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: Docs/damon: update outdated term 'regions update interval' Before DAMON is merged in the mainline, the concept of 'regions update interval' has generalized to be used as the time interval for update of any monitoring 
operations related data structure, but the document has not updated properly. This commit updates the document for better consistency. Link: https://lkml.kernel.org/r/20220222170100.17068-4-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/damon/usage.rst | 6 +++--- Documentation/vm/damon/design.rst | 12 +++++++----- 2 files changed, 10 insertions(+), 8 deletions(-) --- a/Documentation/admin-guide/mm/damon/usage.rst~docs-damon-update-outdated-term-regions-update-interval +++ a/Documentation/admin-guide/mm/damon/usage.rst @@ -47,7 +47,7 @@ Attributes ---------- Users can get and set the ``sampling interval``, ``aggregation interval``, -``regions update interval``, and min/max number of monitoring target regions by +``update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. To know about the monitoring attributes in detail, please refer to the :doc:`/vm/damon/design`. For example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and @@ -128,8 +128,8 @@ ranges, ``20-40`` and ``50-100`` as that Note that this sets the initial monitoring target regions only. In case of virtual memory monitoring, DAMON will automatically updates the boundary of the -regions after one ``regions update interval``. Therefore, users should set the -``regions update interval`` large enough in this case, if they don't want the +regions after one ``update interval``. Therefore, users should set the +``update interval`` large enough in this case, if they don't want the update. --- a/Documentation/vm/damon/design.rst~docs-damon-update-outdated-term-regions-update-interval +++ a/Documentation/vm/damon/design.rst @@ -95,8 +95,8 @@ Address Space Independent Core Mechanism Below four sections describe each of the DAMON core mechanisms and the five monitoring attributes, ``sampling interval``, ``aggregation interval``, -``regions update interval``, ``minimum number of regions``, and ``maximum -number of regions``. +``update interval``, ``minimum number of regions``, and ``maximum number of +regions``. Access Frequency Monitoring @@ -169,6 +169,8 @@ The monitoring target address range coul virtual memory could be dynamically mapped and unmapped. Physical memory could be hot-plugged. -As the changes could be quite frequent in some cases, DAMON checks the dynamic -memory mapping changes and applies it to the abstracted target area only for -each of a user-specified time interval (``regions update interval``). +As the changes could be quite frequent in some cases, DAMON allows the +monitoring operations to check dynamic changes including memory mapping changes +and applies it to monitoring operations-related data structures such as the +abstracted monitoring target memory area only for each of a user-specified time +interval (``update interval``). 
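For a concrete picture of the renamed knob: the five attributes land in the debugfs ``attrs`` file in the order sampling interval, aggregation interval, update interval, min nr_regions, max nr_regions. The command below is a sketch assuming the debugfs layout the usage document describes; interval values are in microseconds and the max-regions value is illustrative:

	# 5 ms sampling, 100 ms aggregation, 1,000 ms update, 10/1000 regions
	cd /sys/kernel/debug/damon
	echo 5000 100000 1000000 10 1000 > attrs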
From patchwork Tue Mar 22 21:49:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789296 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8AA0FC433F5 for ; Tue, 22 Mar 2022 21:49:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 16A286B020E; Tue, 22 Mar 2022 17:49:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 114206B020F; Tue, 22 Mar 2022 17:49:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EF5AA6B0213; Tue, 22 Mar 2022 17:49:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0104.hostedemail.com [216.40.44.104]) by kanga.kvack.org (Postfix) with ESMTP id E09306B020E for ; Tue, 22 Mar 2022 17:49:25 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A9861A3251 for ; Tue, 22 Mar 2022 21:49:25 +0000 (UTC) X-FDA: 79273363890.18.CD99FF3 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf15.hostedemail.com (Postfix) with ESMTP id 1F0F0A0019 for ; Tue, 22 Mar 2022 21:49:24 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id D316BB81DC3; Tue, 22 Mar 2022 21:49:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 70360C340EC; Tue, 22 Mar 2022 21:49:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985762; bh=bms1jO3YWO796QBjUaMuUjMdCkDJkc6CTtsFHeu48Ag=; h=Date:To:From:In-Reply-To:Subject:From; b=q2EaFnErLWtn0kz230rpTTfIz3wC8AmIfdXLJQ2MG5WbQRvPT+PffxjKrNQ63D+pl lQRoQQy+n8QxxFQgpyoEFueIgl7pt/GLER7PPX/qqCuE981HZiDPG0nbnrBr1SepAK tLUWOwldBTAWenmh+HMr8ZDcSefeVwFV+hbPf0DU= Date: Tue, 22 Mar 2022 14:49:21 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 214/227] mm/damon/core: allow non-exclusive DAMON start/stop Message-Id: <20220322214922.70360C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: gm1o9rux3i6b6g4tah33acdz59hbgxoi Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=q2EaFnEr; spf=pass (imf15.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1F0F0A0019 X-HE-Tag: 1647985764-914294 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/core: allow non-exclusive DAMON start/stop Patch series "Introduce DAMON sysfs interface", v3. 
Introduction ============ DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so far. However, it unnecessarily depends on debugfs, while DAMON is not aimed to be used for only debugging. Also, the interface receives multiple values via one file. For example, schemes file receives 18 values. As a result, it is inefficient, hard to be used, and difficult to be extended. Especially, keeping backward compatibility of user space tools is getting only challenging. It would be better to implement another reliable and flexible interface and deprecate DAMON_DBGFS in long term. For the reason, this patchset introduces a sysfs-based new user interface of DAMON. The idea of the new interface is, using directory hierarchies and having one dedicated file for each value. For a short example, users can do the virtual address monitoring via the interface as below: # cd /sys/kernel/mm/damon/admin/ # echo 1 > kdamonds/nr_kdamonds # echo 1 > kdamonds/0/contexts/nr_contexts # echo vaddr > kdamonds/0/contexts/0/operations # echo 1 > kdamonds/0/contexts/0/targets/nr_targets # echo $(pidof ) > kdamonds/0/contexts/0/targets/0/pid_target # echo on > kdamonds/0/state A brief representation of the files hierarchy of DAMON sysfs interface is as below. Childs are represented with indentation, directories are having '/' suffix, and files in each directory are separated by comma. /sys/kernel/mm/damon/admin │ kdamonds/nr_kdamonds │ │ 0/state,pid │ │ │ contexts/nr_contexts │ │ │ │ 0/operations │ │ │ │ │ monitoring_attrs/ │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us │ │ │ │ │ │ nr_regions/min,max │ │ │ │ │ targets/nr_targets │ │ │ │ │ │ 0/pid_target │ │ │ │ │ │ │ regions/nr_regions │ │ │ │ │ │ │ │ 0/start,end │ │ │ │ │ │ │ │ ... │ │ │ │ │ │ ... │ │ │ │ │ schemes/nr_schemes │ │ │ │ │ │ 0/action │ │ │ │ │ │ │ access_pattern/ │ │ │ │ │ │ │ │ sz/min,max │ │ │ │ │ │ │ │ nr_accesses/min,max │ │ │ │ │ │ │ │ age/min,max │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds │ │ │ │ │ │ ... │ │ │ │ ... │ │ ... Detailed usage of the files will be described in the final Documentation patch of this patchset. Main Difference Between DAMON_DBGFS and DAMON_SYSFS --------------------------------------------------- At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One important difference between them is their exclusiveness. DAMON_DBGFS works in an exclusive manner, so that no DAMON worker thread (kdamond) in the system can run concurrently and interfere somehow. For the reason, DAMON_DBGFS asks users to construct all monitoring contexts and start them at once. It's not a big problem but makes the operation a little bit complex and unflexible. For more flexible usage, DAMON_SYSFS moves the responsibility of preventing any possible interference to the admins and work in a non-exclusive manner. That is, users can configure and start contexts one by one. Note that DAMON respects both exclusive groups and non-exclusive groups of contexts, in a manner similar to that of reader-writer locks. That is, if any exclusive monitoring contexts (e.g., contexts that started via DAMON_DBGFS) are running, DAMON_SYSFS does not start new contexts, and vice versa. Future Plan of DAMON_DBGFS Deprecation ====================================== Once this patchset is merged, DAMON_DBGFS development will be frozen. 
That is, we will maintain it so that it keeps working as it does now, so that no users will be broken. But, it will not be extended to provide any new DAMON features. The support will be continued only until the next LTS release. After that, we will drop DAMON_DBGFS. User-space Tooling Compatibility -------------------------------- As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space tooling can move to DAMON_SYSFS. As we will continue supporting DAMON_DBGFS until the next LTS kernel release, user space tools would have enough time to move to DAMON_SYSFS. The official user space tool, damo[1], already supports both DAMON_SYSFS and DAMON_DBGFS. Both correctness tests[2] and performance tests[3] of DAMON using DAMON_SYSFS also passed. [1] https://github.com/awslabs/damo [2] https://github.com/awslabs/damon-tests/tree/master/corr [3] https://github.com/awslabs/damon-tests/tree/master/perf Sequence of Patches =================== The first two patches (patches 1-2) make core changes for DAMON_SYSFS. The first one (patch 1) allows non-exclusive DAMON contexts so that DAMON_SYSFS can work in non-exclusive mode, while the second one (patch 2) adds the sizes of the DAMON enum types so that DAMON API users can safely iterate the enums. The third patch (patch 3) implements a basic sysfs stub for virtual address spaces monitoring. Note that this implements only the sysfs files and DAMON is not linked. The fourth patch (patch 4) links DAMON_SYSFS to DAMON so that users can control DAMON using the sysfs files. The following six patches (patches 5-10) implement the other DAMON features that DAMON_DBGFS supports, one by one (physical address space monitoring, DAMON-based operation schemes, schemes quotas, schemes prioritization weights, schemes watermarks, and schemes stats). The following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the final one (patch 12) documents DAMON_SYSFS. This patch (of 13): To avoid interference between DAMON contexts monitoring overlapping memory regions, damon_start() works in an exclusive manner. That is, damon_start() does nothing but fails if any context started by another instance of the function is still running. This makes its usage a little bit restrictive. However, in some cases admins could be aware of each DAMON usage and address such interference on their own. This commit hence implements a non-exclusive mode of the function and allows the callers to select the mode. Note that the exclusive groups and non-exclusive groups of contexts will respect each other in a manner similar to that of reader-writer locks. Therefore, this commit will not cause any behavioral change to the exclusive groups.
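Condensed, the admission rule that the hunks below add to damon_start() and kdamond_fn() is the following; damon_may_start() is an illustrative helper for this note, not a mainline function:

	/* exclusive contexts act like writers, non-exclusive ones like readers */
	static bool damon_may_start(bool exclusive)
	{
		if (exclusive && nr_running_ctxs)
			return false;	/* a 'writer' requires no running contexts */
		if (!exclusive && running_exclusive_ctxs)
			return false;	/* 'readers' must wait for the 'writer' */
		return true;
	}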
Link: https://lkml.kernel.org/r/20220228081314.5770-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220228081314.5770-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Cc: Shuah Khan Cc: David Rientjes Cc: Xin Hao Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 +- mm/damon/core.c | 23 +++++++++++++++-------- mm/damon/dbgfs.c | 2 +- mm/damon/reclaim.c | 2 +- 4 files changed, 18 insertions(+), 11 deletions(-) --- a/include/linux/damon.h~mm-damon-core-allow-non-exclusive-damon-start-stop +++ a/include/linux/damon.h @@ -508,7 +508,7 @@ int damon_nr_running_ctxs(void); int damon_register_ops(struct damon_operations *ops); int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id); -int damon_start(struct damon_ctx **ctxs, int nr_ctxs); +int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ --- a/mm/damon/core.c~mm-damon-core-allow-non-exclusive-damon-start-stop +++ a/mm/damon/core.c @@ -24,6 +24,7 @@ static DEFINE_MUTEX(damon_lock); static int nr_running_ctxs; +static bool running_exclusive_ctxs; static DEFINE_MUTEX(damon_ops_lock); static struct damon_operations damon_registered_ops[NR_DAMON_OPS]; @@ -434,22 +435,25 @@ static int __damon_start(struct damon_ct * damon_start() - Starts the monitorings for a given group of contexts. * @ctxs: an array of the pointers for contexts to start monitoring * @nr_ctxs: size of @ctxs + * @exclusive: exclusiveness of this contexts group * * This function starts a group of monitoring threads for a group of monitoring * contexts. One thread per each context is created and run in parallel. The - * caller should handle synchronization between the threads by itself. If a - * group of threads that created by other 'damon_start()' call is currently - * running, this function does nothing but returns -EBUSY. + * caller should handle synchronization between the threads by itself. If + * @exclusive is true and a group of threads that created by other + * 'damon_start()' call is currently running, this function does nothing but + * returns -EBUSY. * * Return: 0 on success, negative error code otherwise. */ -int damon_start(struct damon_ctx **ctxs, int nr_ctxs) +int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive) { int i; int err = 0; mutex_lock(&damon_lock); - if (nr_running_ctxs) { + if ((exclusive && nr_running_ctxs) || + (!exclusive && running_exclusive_ctxs)) { mutex_unlock(&damon_lock); return -EBUSY; } @@ -460,13 +464,15 @@ int damon_start(struct damon_ctx **ctxs, break; nr_running_ctxs++; } + if (exclusive && nr_running_ctxs) + running_exclusive_ctxs = true; mutex_unlock(&damon_lock); return err; } /* - * __damon_stop() - Stops monitoring of given context. + * __damon_stop() - Stops monitoring of a given context. * @ctx: monitoring context * * Return: 0 on success, negative error code otherwise. 
@@ -504,9 +510,8 @@ int damon_stop(struct damon_ctx **ctxs, /* nr_running_ctxs is decremented in kdamond_fn */ err = __damon_stop(ctxs[i]); if (err) - return err; + break; } - return err; } @@ -1102,6 +1107,8 @@ static int kdamond_fn(void *data) mutex_lock(&damon_lock); nr_running_ctxs--; + if (!nr_running_ctxs && running_exclusive_ctxs) + running_exclusive_ctxs = false; mutex_unlock(&damon_lock); return 0; --- a/mm/damon/dbgfs.c~mm-damon-core-allow-non-exclusive-damon-start-stop +++ a/mm/damon/dbgfs.c @@ -967,7 +967,7 @@ static ssize_t dbgfs_monitor_on_write(st return -EINVAL; } } - ret = damon_start(dbgfs_ctxs, dbgfs_nr_ctxs); + ret = damon_start(dbgfs_ctxs, dbgfs_nr_ctxs, true); } else if (!strncmp(kbuf, "off", count)) { ret = damon_stop(dbgfs_ctxs, dbgfs_nr_ctxs); } else { --- a/mm/damon/reclaim.c~mm-damon-core-allow-non-exclusive-damon-start-stop +++ a/mm/damon/reclaim.c @@ -330,7 +330,7 @@ static int damon_reclaim_turn(bool on) if (err) goto free_scheme_out; - err = damon_start(&ctx, 1); + err = damon_start(&ctx, 1, true); if (!err) { kdamond_pid = ctx->kdamond->pid; return 0; From patchwork Tue Mar 22 21:49:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789298 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7531C4332F for ; Tue, 22 Mar 2022 21:49:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 117056B0246; Tue, 22 Mar 2022 17:49:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C8BB6B0248; Tue, 22 Mar 2022 17:49:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED2246B0249; Tue, 22 Mar 2022 17:49:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id E052E6B0246 for ; Tue, 22 Mar 2022 17:49:27 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id A781712194F for ; Tue, 22 Mar 2022 21:49:27 +0000 (UTC) X-FDA: 79273363974.11.9CAC248 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf15.hostedemail.com (Postfix) with ESMTP id 35EC0A0022 for ; Tue, 22 Mar 2022 21:49:27 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id C3B60B81DB7; Tue, 22 Mar 2022 21:49:26 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6BCCAC36AE5; Tue, 22 Mar 2022 21:49:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985765; bh=Ucyl7dsrthNgSkBKW6/KVlGj1sShxc2guQAd937y+l4=; h=Date:To:From:In-Reply-To:Subject:From; b=KJ/OXuOaFoDC4kbsXeRocUQRavyYPTxFb9aLn9eNcU/97ufpLiJ2tUT3OObdVmLvO T8g9ZE635QWa6u/UWPhVkiJ1pQQWBl8dXZz51/Gf7hQ4H4Wz+LeqxLX27qiN0Oy6NH CAgUJ5kQ8qQrL0na73FLUdgs6KoUjRAd5ayQmJN0= Date: Tue, 22 Mar 2022 14:49:24 -0700 To: 
xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 215/227] mm/damon/core: add number of each enum type values Message-Id: <20220322214925.6BCCAC36AE5@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: 5wug5bz5rjcyrjc3qyodde65wh81jw1u Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="KJ/OXuOa"; spf=pass (imf15.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 35EC0A0022 X-HE-Tag: 1647985767-396825 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/core: add number of each enum type values This commit declares the number of legal values for each DAMON enum type, to make traversals of such DAMON enum types easy and safe. Link: https://lkml.kernel.org/r/20220228081314.5770-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- include/linux/damon.h | 4 ++++ 1 file changed, 4 insertions(+) --- a/include/linux/damon.h~mm-damon-core-add-number-of-each-enum-type-values +++ a/include/linux/damon.h @@ -87,6 +87,7 @@ struct damon_target { * @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE. * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE. * @DAMOS_STAT: Do nothing but count the stat. + * @NR_DAMOS_ACTIONS: Total number of DAMOS actions */ enum damos_action { DAMOS_WILLNEED, @@ -95,6 +96,7 @@ enum damos_action { DAMOS_HUGEPAGE, DAMOS_NOHUGEPAGE, DAMOS_STAT, /* Do nothing but only record the stat */ + NR_DAMOS_ACTIONS, }; /** @@ -157,10 +159,12 @@ struct damos_quota { * * @DAMOS_WMARK_NONE: Ignore the watermarks of the given scheme. * @DAMOS_WMARK_FREE_MEM_RATE: Free memory rate of the system in [0,1000].
+ * @NR_DAMOS_WMARK_METRICS: Total number of DAMOS watermark metrics */ enum damos_wmark_metric { DAMOS_WMARK_NONE, DAMOS_WMARK_FREE_MEM_RATE, + NR_DAMOS_WMARK_METRICS, }; /** From patchwork Tue Mar 22 21:49:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789299 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83E30C433FE for ; Tue, 22 Mar 2022 21:49:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1B32A6B024A; Tue, 22 Mar 2022 17:49:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 161596B024B; Tue, 22 Mar 2022 17:49:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 001346B024C; Tue, 22 Mar 2022 17:49:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id E2EFE6B024A for ; Tue, 22 Mar 2022 17:49:31 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A4346A30A0 for ; Tue, 22 Mar 2022 21:49:31 +0000 (UTC) X-FDA: 79273364142.18.E2845EA Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf21.hostedemail.com (Postfix) with ESMTP id 151B21C002C for ; Tue, 22 Mar 2022 21:49:30 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id D508BB81DAB; Tue, 22 Mar 2022 21:49:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 80196C340F2; Tue, 22 Mar 2022 21:49:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985768; bh=nYZCD13pdpsqZAxonVlK2+NFPurVcZvRB6OqV/mXjhE=; h=Date:To:From:In-Reply-To:Subject:From; b=GjlV/2iSbQNiGyd5j9ck7a+5+B7/NgkPDMLiuBNqEe9Bk0sRdmSX0y5/P0EyfFwWG pIkX8B5TRzIB39eVsqGlj/EMxnSyuHy5lgHEeOA0B8SZPdTM+6eCcMJ4zgqqp5Z0lT ZGs/bSgEfOHafMl+u/LUmmTYMrtaddy5X379cWq4= Date: Tue, 22 Mar 2022 14:49:27 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,jiapeng.chong@linux.alibaba.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 216/227] mm/damon: implement a minimal stub for sysfs-based DAMON interface Message-Id: <20220322214928.80196C340F2@smtp.kernel.org> X-Stat-Signature: ansi3yztioewni88sjpwuubz4kj6jf6y Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="GjlV/2iS"; spf=pass (imf21.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 151B21C002C X-HE-Tag: 1647985770-325929 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: 
owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon: implement a minimal stub for sysfs-based DAMON interface

DAMON's debugfs-based user interface has served very well, so far. However, it unnecessarily depends on debugfs, while DAMON is not aimed at debugging only. Also, the interface receives multiple values via one file. For example, the 'schemes' file receives 18 values separated by white spaces. As a result, it is inefficient, hard to use, and difficult to extend. In particular, keeping backward compatibility for user space tools is becoming ever more challenging. It would be better to implement another reliable and flexible interface and deprecate the debugfs interface in the long term.

To this end, this commit implements a stub of a part of the new user interface of DAMON, using sysfs. Specifically, this commit implements the sysfs control parts for virtual address space monitoring.

More specifically, the idea of the new interface is to use directory hierarchies and to have one file for one value. The hierarchy that this commit introduces is as below. In the below figure, parent-child relations are represented with indentations, each directory has a ``/`` suffix, and files in each directory are separated by commas (","):

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...

Writing a number ('N') to each 'nr' file makes directories named '0' to 'N-1' in the directory of the 'nr' file. That's all this commit does. Writing proper values to the relevant files will construct the DAMON contexts, and writing a special keyword, 'on', to the 'state' file of each kdamond will ask DAMON to start the constructed contexts.

For a short example, using the below commands for monitoring the virtual address space of a given workload is imaginable:

# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state

Please note that this commit implements only the sysfs part stub, as mentioned above. This commit doesn't implement the special keywords for 'state' files. Following commits will do that.

[jiapeng.chong@linux.alibaba.com: fix missing error code in damon_sysfs_attrs_add_dirs()] Link: https://lkml.kernel.org/r/20220302111120.24984-1-jiapeng.chong@linux.alibaba.com Link: https://lkml.kernel.org/r/20220228081314.5770-4-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Jiapeng Chong Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/Kconfig | 7 mm/damon/Makefile | 1 mm/damon/sysfs.c | 1084 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 1092 insertions(+) --- a/mm/damon/Kconfig~mm-damon-implement-a-minimal-stub-for-sysfs-based-damon-interface +++ a/mm/damon/Kconfig @@ -52,6 +52,13 @@ config DAMON_VADDR_KUNIT_TEST If unsure, say N. +config DAMON_SYSFS + bool "DAMON sysfs interface" + depends on DAMON && SYSFS + help + This builds the sysfs interface for DAMON. The user space can use + the interface for arbitrary data access monitoring.
+ config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DAMON_PADDR && DEBUG_FS --- a/mm/damon/Makefile~mm-damon-implement-a-minimal-stub-for-sysfs-based-damon-interface +++ a/mm/damon/Makefile @@ -3,5 +3,6 @@ obj-y := core.o obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o +obj-$(CONFIG_DAMON_SYSFS) += sysfs.o obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o --- /dev/null +++ a/mm/damon/sysfs.c @@ -0,0 +1,1084 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON sysfs Interface + * + * Copyright (c) 2022 SeongJae Park + */ + +#include +#include +#include +#include +#include + +static DEFINE_MUTEX(damon_sysfs_lock); + +/* + * unsigned long range directory + */ + +struct damon_sysfs_ul_range { + struct kobject kobj; + unsigned long min; + unsigned long max; +}; + +static struct damon_sysfs_ul_range *damon_sysfs_ul_range_alloc( + unsigned long min, + unsigned long max) +{ + struct damon_sysfs_ul_range *range = kmalloc(sizeof(*range), + GFP_KERNEL); + + if (!range) + return NULL; + range->kobj = (struct kobject){}; + range->min = min; + range->max = max; + + return range; +} + +static ssize_t min_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_ul_range *range = container_of(kobj, + struct damon_sysfs_ul_range, kobj); + + return sysfs_emit(buf, "%lu\n", range->min); +} + +static ssize_t min_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_ul_range *range = container_of(kobj, + struct damon_sysfs_ul_range, kobj); + unsigned long min; + int err; + + err = kstrtoul(buf, 0, &min); + if (err) + return -EINVAL; + + range->min = min; + return count; +} + +static ssize_t max_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_ul_range *range = container_of(kobj, + struct damon_sysfs_ul_range, kobj); + + return sysfs_emit(buf, "%lu\n", range->max); +} + +static ssize_t max_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_ul_range *range = container_of(kobj, + struct damon_sysfs_ul_range, kobj); + unsigned long max; + int err; + + err = kstrtoul(buf, 0, &max); + if (err) + return -EINVAL; + + range->max = max; + return count; +} + +static void damon_sysfs_ul_range_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_ul_range, kobj)); +} + +static struct kobj_attribute damon_sysfs_ul_range_min_attr = + __ATTR_RW_MODE(min, 0600); + +static struct kobj_attribute damon_sysfs_ul_range_max_attr = + __ATTR_RW_MODE(max, 0600); + +static struct attribute *damon_sysfs_ul_range_attrs[] = { + &damon_sysfs_ul_range_min_attr.attr, + &damon_sysfs_ul_range_max_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_ul_range); + +static struct kobj_type damon_sysfs_ul_range_ktype = { + .release = damon_sysfs_ul_range_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_ul_range_groups, +}; + +/* + * target directory + */ + +struct damon_sysfs_target { + struct kobject kobj; + int pid; +}; + +static struct damon_sysfs_target *damon_sysfs_target_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_target), GFP_KERNEL); +} + +static ssize_t pid_target_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_target *target = container_of(kobj, + struct damon_sysfs_target, kobj); + + return sysfs_emit(buf, "%d\n", target->pid); 
+} + +static ssize_t pid_target_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_target *target = container_of(kobj, + struct damon_sysfs_target, kobj); + int err = kstrtoint(buf, 0, &target->pid); + + if (err) + return -EINVAL; + return count; +} + +static void damon_sysfs_target_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_target, kobj)); +} + +static struct kobj_attribute damon_sysfs_target_pid_attr = + __ATTR_RW_MODE(pid_target, 0600); + +static struct attribute *damon_sysfs_target_attrs[] = { + &damon_sysfs_target_pid_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_target); + +static struct kobj_type damon_sysfs_target_ktype = { + .release = damon_sysfs_target_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_target_groups, +}; + +/* + * targets directory + */ + +struct damon_sysfs_targets { + struct kobject kobj; + struct damon_sysfs_target **targets_arr; + int nr; +}; + +static struct damon_sysfs_targets *damon_sysfs_targets_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_targets), GFP_KERNEL); +} + +static void damon_sysfs_targets_rm_dirs(struct damon_sysfs_targets *targets) +{ + struct damon_sysfs_target **targets_arr = targets->targets_arr; + int i; + + for (i = 0; i < targets->nr; i++) + kobject_put(&targets_arr[i]->kobj); + targets->nr = 0; + kfree(targets_arr); + targets->targets_arr = NULL; +} + +static int damon_sysfs_targets_add_dirs(struct damon_sysfs_targets *targets, + int nr_targets) +{ + struct damon_sysfs_target **targets_arr, *target; + int err, i; + + damon_sysfs_targets_rm_dirs(targets); + if (!nr_targets) + return 0; + + targets_arr = kmalloc_array(nr_targets, sizeof(*targets_arr), + GFP_KERNEL | __GFP_NOWARN); + if (!targets_arr) + return -ENOMEM; + targets->targets_arr = targets_arr; + + for (i = 0; i < nr_targets; i++) { + target = damon_sysfs_target_alloc(); + if (!target) { + damon_sysfs_targets_rm_dirs(targets); + return -ENOMEM; + } + + err = kobject_init_and_add(&target->kobj, + &damon_sysfs_target_ktype, &targets->kobj, + "%d", i); + if (err) + goto out; + + targets_arr[i] = target; + targets->nr++; + } + return 0; + +out: + damon_sysfs_targets_rm_dirs(targets); + kobject_put(&target->kobj); + return err; +} + +static ssize_t nr_targets_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_targets *targets = container_of(kobj, + struct damon_sysfs_targets, kobj); + + return sysfs_emit(buf, "%d\n", targets->nr); +} + +static ssize_t nr_targets_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_targets *targets = container_of(kobj, + struct damon_sysfs_targets, kobj); + int nr, err = kstrtoint(buf, 0, &nr); + + if (err) + return err; + if (nr < 0) + return -EINVAL; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + err = damon_sysfs_targets_add_dirs(targets, nr); + mutex_unlock(&damon_sysfs_lock); + if (err) + return err; + + return count; +} + +static void damon_sysfs_targets_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_targets, kobj)); +} + +static struct kobj_attribute damon_sysfs_targets_nr_attr = + __ATTR_RW_MODE(nr_targets, 0600); + +static struct attribute *damon_sysfs_targets_attrs[] = { + &damon_sysfs_targets_nr_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_targets); + +static struct kobj_type damon_sysfs_targets_ktype = { + .release = 
damon_sysfs_targets_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_targets_groups, +}; + +/* + * intervals directory + */ + +struct damon_sysfs_intervals { + struct kobject kobj; + unsigned long sample_us; + unsigned long aggr_us; + unsigned long update_us; +}; + +static struct damon_sysfs_intervals *damon_sysfs_intervals_alloc( + unsigned long sample_us, unsigned long aggr_us, + unsigned long update_us) +{ + struct damon_sysfs_intervals *intervals = kmalloc(sizeof(*intervals), + GFP_KERNEL); + + if (!intervals) + return NULL; + + intervals->kobj = (struct kobject){}; + intervals->sample_us = sample_us; + intervals->aggr_us = aggr_us; + intervals->update_us = update_us; + return intervals; +} + +static ssize_t sample_us_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_intervals *intervals = container_of(kobj, + struct damon_sysfs_intervals, kobj); + + return sysfs_emit(buf, "%lu\n", intervals->sample_us); +} + +static ssize_t sample_us_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_intervals *intervals = container_of(kobj, + struct damon_sysfs_intervals, kobj); + unsigned long us; + int err = kstrtoul(buf, 0, &us); + + if (err) + return -EINVAL; + + intervals->sample_us = us; + return count; +} + +static ssize_t aggr_us_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_intervals *intervals = container_of(kobj, + struct damon_sysfs_intervals, kobj); + + return sysfs_emit(buf, "%lu\n", intervals->aggr_us); +} + +static ssize_t aggr_us_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_intervals *intervals = container_of(kobj, + struct damon_sysfs_intervals, kobj); + unsigned long us; + int err = kstrtoul(buf, 0, &us); + + if (err) + return -EINVAL; + + intervals->aggr_us = us; + return count; +} + +static ssize_t update_us_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_intervals *intervals = container_of(kobj, + struct damon_sysfs_intervals, kobj); + + return sysfs_emit(buf, "%lu\n", intervals->update_us); +} + +static ssize_t update_us_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_intervals *intervals = container_of(kobj, + struct damon_sysfs_intervals, kobj); + unsigned long us; + int err = kstrtoul(buf, 0, &us); + + if (err) + return -EINVAL; + + intervals->update_us = us; + return count; +} + +static void damon_sysfs_intervals_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_intervals, kobj)); +} + +static struct kobj_attribute damon_sysfs_intervals_sample_us_attr = + __ATTR_RW_MODE(sample_us, 0600); + +static struct kobj_attribute damon_sysfs_intervals_aggr_us_attr = + __ATTR_RW_MODE(aggr_us, 0600); + +static struct kobj_attribute damon_sysfs_intervals_update_us_attr = + __ATTR_RW_MODE(update_us, 0600); + +static struct attribute *damon_sysfs_intervals_attrs[] = { + &damon_sysfs_intervals_sample_us_attr.attr, + &damon_sysfs_intervals_aggr_us_attr.attr, + &damon_sysfs_intervals_update_us_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_intervals); + +static struct kobj_type damon_sysfs_intervals_ktype = { + .release = damon_sysfs_intervals_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_intervals_groups, +}; + +/* + * monitoring_attrs directory + */ + +struct damon_sysfs_attrs { 
+ struct kobject kobj; + struct damon_sysfs_intervals *intervals; + struct damon_sysfs_ul_range *nr_regions_range; +}; + +static struct damon_sysfs_attrs *damon_sysfs_attrs_alloc(void) +{ + struct damon_sysfs_attrs *attrs = kmalloc(sizeof(*attrs), GFP_KERNEL); + + if (!attrs) + return NULL; + attrs->kobj = (struct kobject){}; + return attrs; +} + +static int damon_sysfs_attrs_add_dirs(struct damon_sysfs_attrs *attrs) +{ + struct damon_sysfs_intervals *intervals; + struct damon_sysfs_ul_range *nr_regions_range; + int err; + + intervals = damon_sysfs_intervals_alloc(5000, 100000, 60000000); + if (!intervals) + return -ENOMEM; + + err = kobject_init_and_add(&intervals->kobj, + &damon_sysfs_intervals_ktype, &attrs->kobj, + "intervals"); + if (err) + goto put_intervals_out; + attrs->intervals = intervals; + + nr_regions_range = damon_sysfs_ul_range_alloc(10, 1000); + if (!nr_regions_range) { + err = -ENOMEM; + goto put_intervals_out; + } + + err = kobject_init_and_add(&nr_regions_range->kobj, + &damon_sysfs_ul_range_ktype, &attrs->kobj, + "nr_regions"); + if (err) + goto put_nr_regions_intervals_out; + attrs->nr_regions_range = nr_regions_range; + return 0; + +put_nr_regions_intervals_out: + kobject_put(&nr_regions_range->kobj); + attrs->nr_regions_range = NULL; +put_intervals_out: + kobject_put(&intervals->kobj); + attrs->intervals = NULL; + return err; +} + +static void damon_sysfs_attrs_rm_dirs(struct damon_sysfs_attrs *attrs) +{ + kobject_put(&attrs->nr_regions_range->kobj); + kobject_put(&attrs->intervals->kobj); +} + +static void damon_sysfs_attrs_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_attrs, kobj)); +} + +static struct attribute *damon_sysfs_attrs_attrs[] = { + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_attrs); + +static struct kobj_type damon_sysfs_attrs_ktype = { + .release = damon_sysfs_attrs_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_attrs_groups, +}; + +/* + * context directory + */ + +/* This should match with enum damon_ops_id */ +static const char * const damon_sysfs_ops_strs[] = { + "vaddr", + "paddr", +}; + +struct damon_sysfs_context { + struct kobject kobj; + enum damon_ops_id ops_id; + struct damon_sysfs_attrs *attrs; + struct damon_sysfs_targets *targets; +}; + +static struct damon_sysfs_context *damon_sysfs_context_alloc( + enum damon_ops_id ops_id) +{ + struct damon_sysfs_context *context = kmalloc(sizeof(*context), + GFP_KERNEL); + + if (!context) + return NULL; + context->kobj = (struct kobject){}; + context->ops_id = ops_id; + return context; +} + +static int damon_sysfs_context_set_attrs(struct damon_sysfs_context *context) +{ + struct damon_sysfs_attrs *attrs = damon_sysfs_attrs_alloc(); + int err; + + if (!attrs) + return -ENOMEM; + err = kobject_init_and_add(&attrs->kobj, &damon_sysfs_attrs_ktype, + &context->kobj, "monitoring_attrs"); + if (err) + goto out; + err = damon_sysfs_attrs_add_dirs(attrs); + if (err) + goto out; + context->attrs = attrs; + return 0; + +out: + kobject_put(&attrs->kobj); + return err; +} + +static int damon_sysfs_context_set_targets(struct damon_sysfs_context *context) +{ + struct damon_sysfs_targets *targets = damon_sysfs_targets_alloc(); + int err; + + if (!targets) + return -ENOMEM; + err = kobject_init_and_add(&targets->kobj, &damon_sysfs_targets_ktype, + &context->kobj, "targets"); + if (err) { + kobject_put(&targets->kobj); + return err; + } + context->targets = targets; + return 0; +} + +static int damon_sysfs_context_add_dirs(struct damon_sysfs_context *context) +{ 
+ int err; + + err = damon_sysfs_context_set_attrs(context); + if (err) + return err; + + err = damon_sysfs_context_set_targets(context); + if (err) + goto put_attrs_out; + return 0; + +put_attrs_out: + kobject_put(&context->attrs->kobj); + context->attrs = NULL; + return err; +} + +static void damon_sysfs_context_rm_dirs(struct damon_sysfs_context *context) +{ + damon_sysfs_attrs_rm_dirs(context->attrs); + kobject_put(&context->attrs->kobj); + damon_sysfs_targets_rm_dirs(context->targets); + kobject_put(&context->targets->kobj); +} + +static ssize_t operations_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_context *context = container_of(kobj, + struct damon_sysfs_context, kobj); + + return sysfs_emit(buf, "%s\n", damon_sysfs_ops_strs[context->ops_id]); +} + +static ssize_t operations_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_context *context = container_of(kobj, + struct damon_sysfs_context, kobj); + enum damon_ops_id id; + + for (id = 0; id < NR_DAMON_OPS; id++) { + if (sysfs_streq(buf, damon_sysfs_ops_strs[id])) { + /* Support only vaddr */ + if (id != DAMON_OPS_VADDR) + return -EINVAL; + context->ops_id = id; + return count; + } + } + return -EINVAL; +} + +static void damon_sysfs_context_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_context, kobj)); +} + +static struct kobj_attribute damon_sysfs_context_operations_attr = + __ATTR_RW_MODE(operations, 0600); + +static struct attribute *damon_sysfs_context_attrs[] = { + &damon_sysfs_context_operations_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_context); + +static struct kobj_type damon_sysfs_context_ktype = { + .release = damon_sysfs_context_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_context_groups, +}; + +/* + * contexts directory + */ + +struct damon_sysfs_contexts { + struct kobject kobj; + struct damon_sysfs_context **contexts_arr; + int nr; +}; + +static struct damon_sysfs_contexts *damon_sysfs_contexts_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_contexts), GFP_KERNEL); +} + +static void damon_sysfs_contexts_rm_dirs(struct damon_sysfs_contexts *contexts) +{ + struct damon_sysfs_context **contexts_arr = contexts->contexts_arr; + int i; + + for (i = 0; i < contexts->nr; i++) { + damon_sysfs_context_rm_dirs(contexts_arr[i]); + kobject_put(&contexts_arr[i]->kobj); + } + contexts->nr = 0; + kfree(contexts_arr); + contexts->contexts_arr = NULL; +} + +static int damon_sysfs_contexts_add_dirs(struct damon_sysfs_contexts *contexts, + int nr_contexts) +{ + struct damon_sysfs_context **contexts_arr, *context; + int err, i; + + damon_sysfs_contexts_rm_dirs(contexts); + if (!nr_contexts) + return 0; + + contexts_arr = kmalloc_array(nr_contexts, sizeof(*contexts_arr), + GFP_KERNEL | __GFP_NOWARN); + if (!contexts_arr) + return -ENOMEM; + contexts->contexts_arr = contexts_arr; + + for (i = 0; i < nr_contexts; i++) { + context = damon_sysfs_context_alloc(DAMON_OPS_VADDR); + if (!context) { + damon_sysfs_contexts_rm_dirs(contexts); + return -ENOMEM; + } + + err = kobject_init_and_add(&context->kobj, + &damon_sysfs_context_ktype, &contexts->kobj, + "%d", i); + if (err) + goto out; + + err = damon_sysfs_context_add_dirs(context); + if (err) + goto out; + + contexts_arr[i] = context; + contexts->nr++; + } + return 0; + +out: + damon_sysfs_contexts_rm_dirs(contexts); + kobject_put(&context->kobj); + return err; +} + +static ssize_t 
nr_contexts_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_contexts *contexts = container_of(kobj, + struct damon_sysfs_contexts, kobj); + + return sysfs_emit(buf, "%d\n", contexts->nr); +} + +static ssize_t nr_contexts_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_contexts *contexts = container_of(kobj, + struct damon_sysfs_contexts, kobj); + int nr, err; + + err = kstrtoint(buf, 0, &nr); + if (err) + return err; + /* TODO: support multiple contexts per kdamond */ + if (nr < 0 || 1 < nr) + return -EINVAL; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + err = damon_sysfs_contexts_add_dirs(contexts, nr); + mutex_unlock(&damon_sysfs_lock); + if (err) + return err; + + return count; +} + +static void damon_sysfs_contexts_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_contexts, kobj)); +} + +static struct kobj_attribute damon_sysfs_contexts_nr_attr + = __ATTR_RW_MODE(nr_contexts, 0600); + +static struct attribute *damon_sysfs_contexts_attrs[] = { + &damon_sysfs_contexts_nr_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_contexts); + +static struct kobj_type damon_sysfs_contexts_ktype = { + .release = damon_sysfs_contexts_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_contexts_groups, +}; + +/* + * kdamond directory + */ + +struct damon_sysfs_kdamond { + struct kobject kobj; + struct damon_sysfs_contexts *contexts; + struct damon_ctx *damon_ctx; +}; + +static struct damon_sysfs_kdamond *damon_sysfs_kdamond_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_kdamond), GFP_KERNEL); +} + +static int damon_sysfs_kdamond_add_dirs(struct damon_sysfs_kdamond *kdamond) +{ + struct damon_sysfs_contexts *contexts; + int err; + + contexts = damon_sysfs_contexts_alloc(); + if (!contexts) + return -ENOMEM; + + err = kobject_init_and_add(&contexts->kobj, + &damon_sysfs_contexts_ktype, &kdamond->kobj, + "contexts"); + if (err) { + kobject_put(&contexts->kobj); + return err; + } + kdamond->contexts = contexts; + + return err; +} + +static void damon_sysfs_kdamond_rm_dirs(struct damon_sysfs_kdamond *kdamond) +{ + damon_sysfs_contexts_rm_dirs(kdamond->contexts); + kobject_put(&kdamond->contexts->kobj); +} + +static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + return -EINVAL; +} + +static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + return -EINVAL; +} + +static ssize_t pid_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return -EINVAL; +} + +static void damon_sysfs_kdamond_release(struct kobject *kobj) +{ + struct damon_sysfs_kdamond *kdamond = container_of(kobj, + struct damon_sysfs_kdamond, kobj); + + if (kdamond->damon_ctx) + damon_destroy_ctx(kdamond->damon_ctx); + kfree(container_of(kobj, struct damon_sysfs_kdamond, kobj)); +} + +static struct kobj_attribute damon_sysfs_kdamond_state_attr = + __ATTR_RW_MODE(state, 0600); + +static struct kobj_attribute damon_sysfs_kdamond_pid_attr = + __ATTR_RO_MODE(pid, 0400); + +static struct attribute *damon_sysfs_kdamond_attrs[] = { + &damon_sysfs_kdamond_state_attr.attr, + &damon_sysfs_kdamond_pid_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_kdamond); + +static struct kobj_type damon_sysfs_kdamond_ktype = { + .release = damon_sysfs_kdamond_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_kdamond_groups, +}; + +/* + 
* kdamonds directory + */ + +struct damon_sysfs_kdamonds { + struct kobject kobj; + struct damon_sysfs_kdamond **kdamonds_arr; + int nr; +}; + +static struct damon_sysfs_kdamonds *damon_sysfs_kdamonds_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_kdamonds), GFP_KERNEL); +} + +static void damon_sysfs_kdamonds_rm_dirs(struct damon_sysfs_kdamonds *kdamonds) +{ + struct damon_sysfs_kdamond **kdamonds_arr = kdamonds->kdamonds_arr; + int i; + + for (i = 0; i < kdamonds->nr; i++) { + damon_sysfs_kdamond_rm_dirs(kdamonds_arr[i]); + kobject_put(&kdamonds_arr[i]->kobj); + } + kdamonds->nr = 0; + kfree(kdamonds_arr); + kdamonds->kdamonds_arr = NULL; +} + +static int damon_sysfs_nr_running_ctxs(struct damon_sysfs_kdamond **kdamonds, + int nr_kdamonds) +{ + int nr_running_ctxs = 0; + int i; + + for (i = 0; i < nr_kdamonds; i++) { + struct damon_ctx *ctx = kdamonds[i]->damon_ctx; + + if (!ctx) + continue; + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) + nr_running_ctxs++; + mutex_unlock(&ctx->kdamond_lock); + } + return nr_running_ctxs; +} + +static int damon_sysfs_kdamonds_add_dirs(struct damon_sysfs_kdamonds *kdamonds, + int nr_kdamonds) +{ + struct damon_sysfs_kdamond **kdamonds_arr, *kdamond; + int err, i; + + if (damon_sysfs_nr_running_ctxs(kdamonds->kdamonds_arr, kdamonds->nr)) + return -EBUSY; + + damon_sysfs_kdamonds_rm_dirs(kdamonds); + if (!nr_kdamonds) + return 0; + + kdamonds_arr = kmalloc_array(nr_kdamonds, sizeof(*kdamonds_arr), + GFP_KERNEL | __GFP_NOWARN); + if (!kdamonds_arr) + return -ENOMEM; + kdamonds->kdamonds_arr = kdamonds_arr; + + for (i = 0; i < nr_kdamonds; i++) { + kdamond = damon_sysfs_kdamond_alloc(); + if (!kdamond) { + damon_sysfs_kdamonds_rm_dirs(kdamonds); + return -ENOMEM; + } + + err = kobject_init_and_add(&kdamond->kobj, + &damon_sysfs_kdamond_ktype, &kdamonds->kobj, + "%d", i); + if (err) + goto out; + + err = damon_sysfs_kdamond_add_dirs(kdamond); + if (err) + goto out; + + kdamonds_arr[i] = kdamond; + kdamonds->nr++; + } + return 0; + +out: + damon_sysfs_kdamonds_rm_dirs(kdamonds); + kobject_put(&kdamond->kobj); + return err; +} + +static ssize_t nr_kdamonds_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_kdamonds *kdamonds = container_of(kobj, + struct damon_sysfs_kdamonds, kobj); + + return sysfs_emit(buf, "%d\n", kdamonds->nr); +} + +static ssize_t nr_kdamonds_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_kdamonds *kdamonds = container_of(kobj, + struct damon_sysfs_kdamonds, kobj); + int nr, err; + + err = kstrtoint(buf, 0, &nr); + if (err) + return err; + if (nr < 0) + return -EINVAL; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + err = damon_sysfs_kdamonds_add_dirs(kdamonds, nr); + mutex_unlock(&damon_sysfs_lock); + if (err) + return err; + + return count; +} + +static void damon_sysfs_kdamonds_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_kdamonds, kobj)); +} + +static struct kobj_attribute damon_sysfs_kdamonds_nr_attr = + __ATTR_RW_MODE(nr_kdamonds, 0600); + +static struct attribute *damon_sysfs_kdamonds_attrs[] = { + &damon_sysfs_kdamonds_nr_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_kdamonds); + +static struct kobj_type damon_sysfs_kdamonds_ktype = { + .release = damon_sysfs_kdamonds_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_kdamonds_groups, +}; + +/* + * damon user interface directory + */ + +struct damon_sysfs_ui_dir { + struct kobject 
kobj; + struct damon_sysfs_kdamonds *kdamonds; +}; + +static struct damon_sysfs_ui_dir *damon_sysfs_ui_dir_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_ui_dir), GFP_KERNEL); +} + +static int damon_sysfs_ui_dir_add_dirs(struct damon_sysfs_ui_dir *ui_dir) +{ + struct damon_sysfs_kdamonds *kdamonds; + int err; + + kdamonds = damon_sysfs_kdamonds_alloc(); + if (!kdamonds) + return -ENOMEM; + + err = kobject_init_and_add(&kdamonds->kobj, + &damon_sysfs_kdamonds_ktype, &ui_dir->kobj, + "kdamonds"); + if (err) { + kobject_put(&kdamonds->kobj); + return err; + } + ui_dir->kdamonds = kdamonds; + return err; +} + +static void damon_sysfs_ui_dir_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_ui_dir, kobj)); +} + +static struct attribute *damon_sysfs_ui_dir_attrs[] = { + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_ui_dir); + +static struct kobj_type damon_sysfs_ui_dir_ktype = { + .release = damon_sysfs_ui_dir_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_ui_dir_groups, +}; + +static int __init damon_sysfs_init(void) +{ + struct kobject *damon_sysfs_root; + struct damon_sysfs_ui_dir *admin; + int err; + + damon_sysfs_root = kobject_create_and_add("damon", mm_kobj); + if (!damon_sysfs_root) + return -ENOMEM; + + admin = damon_sysfs_ui_dir_alloc(); + if (!admin) { + kobject_put(damon_sysfs_root); + return -ENOMEM; + } + err = kobject_init_and_add(&admin->kobj, &damon_sysfs_ui_dir_ktype, + damon_sysfs_root, "admin"); + if (err) + goto out; + err = damon_sysfs_ui_dir_add_dirs(admin); + if (err) + goto out; + return 0; + +out: + kobject_put(&admin->kobj); + kobject_put(damon_sysfs_root); + return err; +} +subsys_initcall(damon_sysfs_init); From patchwork Tue Mar 22 21:49:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789300 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3615C433F5 for ; Tue, 22 Mar 2022 21:49:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5832A6B024C; Tue, 22 Mar 2022 17:49:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 531076B024D; Tue, 22 Mar 2022 17:49:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3F9ED6B024E; Tue, 22 Mar 2022 17:49:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 2AAAA6B024C for ; Tue, 22 Mar 2022 17:49:34 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id DCB9E121951 for ; Tue, 22 Mar 2022 21:49:33 +0000 (UTC) X-FDA: 79273364226.12.F6940E3 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf09.hostedemail.com (Postfix) with ESMTP id 5774414000D for ; Tue, 22 Mar 2022 21:49:33 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F1D8BB81D59; Tue, 22 Mar 2022 21:49:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 94629C340F3; Tue, 22 Mar 2022 21:49:31 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985771; bh=xchtpzypoJhLrFD0F18zrUgkVb6ZuqxFpl7RFND0jfs=; h=Date:To:From:In-Reply-To:Subject:From; b=cfJHzZgKoqtYeyV8dRgpa8kOHYyjVmOpjT4X8jSRC5+wS+VgsLa2nnxVvRrQ8yTTv rLUZuzCn6Z8xSpcqqSPqIs3oGgYXr/Ay/WwSjREVo5GbxUxgK2gAdPEfRG+tKD9jr0 yHXoaddQz0bwaFI+JNWYoEBo45dVtGvKXaeXQZz0= Date: Tue, 22 Mar 2022 14:49:30 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 217/227] mm/damon/sysfs: link DAMON for virtual address spaces monitoring Message-Id: <20220322214931.94629C340F3@smtp.kernel.org> X-Stat-Signature: 59eb6ubnoczttnr9a4bhns5wf93s6ihk X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 5774414000D Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=cfJHzZgK; dmarc=none; spf=pass (imf09.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-HE-Tag: 1647985773-143880 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/sysfs: link DAMON for virtual address spaces monitoring

This commit links the DAMON sysfs interface to DAMON so that users can control DAMON via the interface. In detail, this commit makes writing 'on' to the 'state' file construct a DAMON context based on the values that users have written to the relevant sysfs files, and start the context. It supports only virtual address space monitoring at the moment, though.

The file hierarchy of the DAMON sysfs interface after this commit is shown below. In the below figure, parent-child relations are represented with indentations, each directory has a ``/`` suffix, and files in each directory are separated by commas (","):

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...

The usage is straightforward. Writing a number ('N') to each 'nr_*' file makes directories named '0' to 'N-1'. Users can construct DAMON contexts by writing proper values to the files in this straightforward manner, and start each kdamond by writing 'on' to 'kdamonds/<N>/state'.
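For a short, concrete example, a full session for virtual address space monitoring could be imagined as below. This is an illustrative sketch only; the PID value (1234) is an arbitrary placeholder for $(pidof <workload>). Reading 'state' back reports 'on' or 'off' as implemented by state_show() in this patch, and reading 'pid' reports the PID of the kdamond thread via pid_show(), or -1 if it is not running:

# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo 1234 > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
# cat kdamonds/0/state
# cat kdamonds/0/pid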
Link: https://lkml.kernel.org/r/20220228081314.5770-5-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 192 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 189 insertions(+), 3 deletions(-) --- a/mm/damon/sysfs.c~mm-damon-sysfs-link-damon-for-virtual-address-spaces-monitoring +++ a/mm/damon/sysfs.c @@ -808,22 +808,208 @@ static void damon_sysfs_kdamond_rm_dirs( kobject_put(&kdamond->contexts->kobj); } +static bool damon_sysfs_ctx_running(struct damon_ctx *ctx) +{ + bool running; + + mutex_lock(&ctx->kdamond_lock); + running = ctx->kdamond != NULL; + mutex_unlock(&ctx->kdamond_lock); + return running; +} + static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { - return -EINVAL; + struct damon_sysfs_kdamond *kdamond = container_of(kobj, + struct damon_sysfs_kdamond, kobj); + struct damon_ctx *ctx = kdamond->damon_ctx; + bool running; + + if (!ctx) + running = false; + else + running = damon_sysfs_ctx_running(ctx); + + return sysfs_emit(buf, "%s\n", running ? "on" : "off"); +} + +static int damon_sysfs_set_attrs(struct damon_ctx *ctx, + struct damon_sysfs_attrs *sys_attrs) +{ + struct damon_sysfs_intervals *sys_intervals = sys_attrs->intervals; + struct damon_sysfs_ul_range *sys_nr_regions = + sys_attrs->nr_regions_range; + + return damon_set_attrs(ctx, sys_intervals->sample_us, + sys_intervals->aggr_us, sys_intervals->update_us, + sys_nr_regions->min, sys_nr_regions->max); +} + +static void damon_sysfs_destroy_targets(struct damon_ctx *ctx) +{ + struct damon_target *t, *next; + + damon_for_each_target_safe(t, next, ctx) { + if (ctx->ops.id == DAMON_OPS_VADDR) + put_pid(t->pid); + damon_destroy_target(t); + } +} + +static int damon_sysfs_set_targets(struct damon_ctx *ctx, + struct damon_sysfs_targets *sysfs_targets) +{ + int i; + + for (i = 0; i < sysfs_targets->nr; i++) { + struct damon_sysfs_target *sys_target = + sysfs_targets->targets_arr[i]; + struct damon_target *t = damon_new_target(); + + if (!t) { + damon_sysfs_destroy_targets(ctx); + return -ENOMEM; + } + if (ctx->ops.id == DAMON_OPS_VADDR) { + t->pid = find_get_pid(sys_target->pid); + if (!t->pid) { + damon_sysfs_destroy_targets(ctx); + return -EINVAL; + } + } + damon_add_target(ctx, t); + } + return 0; +} + +static void damon_sysfs_before_terminate(struct damon_ctx *ctx) +{ + struct damon_target *t, *next; + + if (ctx->ops.id != DAMON_OPS_VADDR) + return; + + mutex_lock(&ctx->kdamond_lock); + damon_for_each_target_safe(t, next, ctx) { + put_pid(t->pid); + damon_destroy_target(t); + } + mutex_unlock(&ctx->kdamond_lock); +} + +static struct damon_ctx *damon_sysfs_build_ctx( + struct damon_sysfs_context *sys_ctx) +{ + struct damon_ctx *ctx = damon_new_ctx(); + int err; + + if (!ctx) + return ERR_PTR(-ENOMEM); + + err = damon_select_ops(ctx, sys_ctx->ops_id); + if (err) + goto out; + err = damon_sysfs_set_attrs(ctx, sys_ctx->attrs); + if (err) + goto out; + err = damon_sysfs_set_targets(ctx, sys_ctx->targets); + if (err) + goto out; + + ctx->callback.before_terminate = damon_sysfs_before_terminate; + return ctx; + +out: + damon_destroy_ctx(ctx); + return ERR_PTR(err); +} + +static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond) +{ + struct damon_ctx *ctx; + int err; + + if (kdamond->damon_ctx && + damon_sysfs_ctx_running(kdamond->damon_ctx)) + return -EBUSY; + /* TODO: support multiple contexts per kdamond */ + if (kdamond->contexts->nr 
!= 1) + return -EINVAL; + + if (kdamond->damon_ctx) + damon_destroy_ctx(kdamond->damon_ctx); + kdamond->damon_ctx = NULL; + + ctx = damon_sysfs_build_ctx(kdamond->contexts->contexts_arr[0]); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + err = damon_start(&ctx, 1, false); + if (err) { + damon_destroy_ctx(ctx); + return err; + } + kdamond->damon_ctx = ctx; + return err; +} + +static int damon_sysfs_turn_damon_off(struct damon_sysfs_kdamond *kdamond) +{ + if (!kdamond->damon_ctx) + return -EINVAL; + return damon_stop(&kdamond->damon_ctx, 1); + /* + * To allow users show final monitoring results of already turned-off + * DAMON, we free kdamond->damon_ctx in next + * damon_sysfs_turn_damon_on(), or kdamonds_nr_store() + */ } static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { - return -EINVAL; + struct damon_sysfs_kdamond *kdamond = container_of(kobj, + struct damon_sysfs_kdamond, kobj); + ssize_t ret; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + if (sysfs_streq(buf, "on")) + ret = damon_sysfs_turn_damon_on(kdamond); + else if (sysfs_streq(buf, "off")) + ret = damon_sysfs_turn_damon_off(kdamond); + else + ret = -EINVAL; + mutex_unlock(&damon_sysfs_lock); + if (!ret) + ret = count; + return ret; } static ssize_t pid_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { - return -EINVAL; + struct damon_sysfs_kdamond *kdamond = container_of(kobj, + struct damon_sysfs_kdamond, kobj); + struct damon_ctx *ctx; + int pid; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + ctx = kdamond->damon_ctx; + if (!ctx) { + pid = -1; + goto out; + } + mutex_lock(&ctx->kdamond_lock); + if (!ctx->kdamond) + pid = -1; + else + pid = ctx->kdamond->pid; + mutex_unlock(&ctx->kdamond_lock); +out: + mutex_unlock(&damon_sysfs_lock); + return sysfs_emit(buf, "%d\n", pid); } static void damon_sysfs_kdamond_release(struct kobject *kobj) From patchwork Tue Mar 22 21:49:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789302 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F8B8C433EF for ; Tue, 22 Mar 2022 21:49:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2C21C6B024F; Tue, 22 Mar 2022 17:49:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2710A6B0250; Tue, 22 Mar 2022 17:49:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0D2526B0251; Tue, 22 Mar 2022 17:49:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id F05126B024F for ; Tue, 22 Mar 2022 17:49:37 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id BC9F323027 for ; Tue, 22 Mar 2022 21:49:37 +0000 (UTC) X-FDA: 79273364394.01.56054D8 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf29.hostedemail.com (Postfix) with ESMTP id 1DA15120012 for ; Tue, 22 Mar 2022 21:49:37 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
ams.source.kernel.org (Postfix) with ESMTPS id 0CC14B81D59; Tue, 22 Mar 2022 21:49:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9BC12C340EC; Tue, 22 Mar 2022 21:49:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985774; bh=B2B2Gq/oepJcT2O12H2ZX5bTF6CzCRMyQZz3waor/vs=; h=Date:To:From:In-Reply-To:Subject:From; b=g82+kE7uwnfZu9xVyAGNYlYSR3YY+zAfObQyMHnM3IEmUeAjjYYwwHhDff/hYyHvY Imp/nhX6Q/k6oiePthh+e9vqLeShvrsyy56vNk6AiuEIFHtrJgCWGeO2PzTzCPgASW L4OY9Rbheo2ACCc2QDhKnUkyWfON8yrjrx+T/BUg= Date: Tue, 22 Mar 2022 14:49:34 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 218/227] mm/damon/sysfs: support the physical address space monitoring Message-Id: <20220322214934.9BC12C340EC@smtp.kernel.org> X-Rspam-User: X-Stat-Signature: q4cjty5zaqi7zt3afp5onpfhagkmayof Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=g82+kE7u; spf=pass (imf29.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1DA15120012 X-HE-Tag: 1647985777-974986 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: SeongJae Park Subject: mm/damon/sysfs: support the physical address space monitoring

This commit makes the DAMON sysfs interface support physical address space monitoring. Specifically, this commit adds support for setting the initial monitoring regions, by adding a 'regions' directory under each target directory, and makes the context 'operations' file receive 'paddr' in addition to 'vaddr'. As a result, the file hierarchy becomes as below:

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions <- NEW DIRECTORY
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
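For a short, concrete example, monitoring a specific physical address region could be imagined as below. This is an illustrative sketch only; the address range (4 GiB starting at 0x100000000) is an arbitrary assumption. As can be seen in the diff below, the 'start' and 'end' files are parsed with kstrtoul() using base 0, so hexadecimal values are accepted, and damon_sysfs_set_regions() rejects a region whose 'start' is larger than its 'end' or which overlaps the preceding region. For 'paddr', the 'pid_target' file is simply ignored:

# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo paddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo 1 > kdamonds/0/contexts/0/targets/0/regions/nr_regions
# echo 0x100000000 > kdamonds/0/contexts/0/targets/0/regions/0/start
# echo 0x200000000 > kdamonds/0/contexts/0/targets/0/regions/0/end
# echo on > kdamonds/0/state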
Link: https://lkml.kernel.org/r/20220228081314.5770-6-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 276 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 271 insertions(+), 5 deletions(-) --- a/mm/damon/sysfs.c~mm-damon-sysfs-support-the-physical-address-space-monitoring +++ a/mm/damon/sysfs.c @@ -114,11 +114,219 @@ static struct kobj_type damon_sysfs_ul_r }; /* + * init region directory + */ + +struct damon_sysfs_region { + struct kobject kobj; + unsigned long start; + unsigned long end; +}; + +static struct damon_sysfs_region *damon_sysfs_region_alloc( + unsigned long start, + unsigned long end) +{ + struct damon_sysfs_region *region = kmalloc(sizeof(*region), + GFP_KERNEL); + + if (!region) + return NULL; + region->kobj = (struct kobject){}; + region->start = start; + region->end = end; + return region; +} + +static ssize_t start_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_region *region = container_of(kobj, + struct damon_sysfs_region, kobj); + + return sysfs_emit(buf, "%lu\n", region->start); +} + +static ssize_t start_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_region *region = container_of(kobj, + struct damon_sysfs_region, kobj); + int err = kstrtoul(buf, 0, ®ion->start); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t end_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_region *region = container_of(kobj, + struct damon_sysfs_region, kobj); + + return sysfs_emit(buf, "%lu\n", region->end); +} + +static ssize_t end_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_region *region = container_of(kobj, + struct damon_sysfs_region, kobj); + int err = kstrtoul(buf, 0, ®ion->end); + + if (err) + return -EINVAL; + return count; +} + +static void damon_sysfs_region_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_region, kobj)); +} + +static struct kobj_attribute damon_sysfs_region_start_attr = + __ATTR_RW_MODE(start, 0600); + +static struct kobj_attribute damon_sysfs_region_end_attr = + __ATTR_RW_MODE(end, 0600); + +static struct attribute *damon_sysfs_region_attrs[] = { + &damon_sysfs_region_start_attr.attr, + &damon_sysfs_region_end_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_region); + +static struct kobj_type damon_sysfs_region_ktype = { + .release = damon_sysfs_region_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_region_groups, +}; + +/* + * init_regions directory + */ + +struct damon_sysfs_regions { + struct kobject kobj; + struct damon_sysfs_region **regions_arr; + int nr; +}; + +static struct damon_sysfs_regions *damon_sysfs_regions_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_regions), GFP_KERNEL); +} + +static void damon_sysfs_regions_rm_dirs(struct damon_sysfs_regions *regions) +{ + struct damon_sysfs_region **regions_arr = regions->regions_arr; + int i; + + for (i = 0; i < regions->nr; i++) + kobject_put(®ions_arr[i]->kobj); + regions->nr = 0; + kfree(regions_arr); + regions->regions_arr = NULL; +} + +static int damon_sysfs_regions_add_dirs(struct damon_sysfs_regions *regions, + int nr_regions) +{ + struct damon_sysfs_region **regions_arr, *region; + int err, i; + + damon_sysfs_regions_rm_dirs(regions); + if 
(!nr_regions) + return 0; + + regions_arr = kmalloc_array(nr_regions, sizeof(*regions_arr), + GFP_KERNEL | __GFP_NOWARN); + if (!regions_arr) + return -ENOMEM; + regions->regions_arr = regions_arr; + + for (i = 0; i < nr_regions; i++) { + region = damon_sysfs_region_alloc(0, 0); + if (!region) { + damon_sysfs_regions_rm_dirs(regions); + return -ENOMEM; + } + + err = kobject_init_and_add(®ion->kobj, + &damon_sysfs_region_ktype, ®ions->kobj, + "%d", i); + if (err) { + kobject_put(®ion->kobj); + damon_sysfs_regions_rm_dirs(regions); + return err; + } + + regions_arr[i] = region; + regions->nr++; + } + return 0; +} + +static ssize_t nr_regions_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_regions *regions = container_of(kobj, + struct damon_sysfs_regions, kobj); + + return sysfs_emit(buf, "%d\n", regions->nr); +} + +static ssize_t nr_regions_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_regions *regions = container_of(kobj, + struct damon_sysfs_regions, kobj); + int nr, err = kstrtoint(buf, 0, &nr); + + if (err) + return err; + if (nr < 0) + return -EINVAL; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + err = damon_sysfs_regions_add_dirs(regions, nr); + mutex_unlock(&damon_sysfs_lock); + if (err) + return err; + + return count; +} + +static void damon_sysfs_regions_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_regions, kobj)); +} + +static struct kobj_attribute damon_sysfs_regions_nr_attr = + __ATTR_RW_MODE(nr_regions, 0600); + +static struct attribute *damon_sysfs_regions_attrs[] = { + &damon_sysfs_regions_nr_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_regions); + +static struct kobj_type damon_sysfs_regions_ktype = { + .release = damon_sysfs_regions_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_regions_groups, +}; + +/* * target directory */ struct damon_sysfs_target { struct kobject kobj; + struct damon_sysfs_regions *regions; int pid; }; @@ -127,6 +335,29 @@ static struct damon_sysfs_target *damon_ return kzalloc(sizeof(struct damon_sysfs_target), GFP_KERNEL); } +static int damon_sysfs_target_add_dirs(struct damon_sysfs_target *target) +{ + struct damon_sysfs_regions *regions = damon_sysfs_regions_alloc(); + int err; + + if (!regions) + return -ENOMEM; + + err = kobject_init_and_add(®ions->kobj, &damon_sysfs_regions_ktype, + &target->kobj, "regions"); + if (err) + kobject_put(®ions->kobj); + else + target->regions = regions; + return err; +} + +static void damon_sysfs_target_rm_dirs(struct damon_sysfs_target *target) +{ + damon_sysfs_regions_rm_dirs(target->regions); + kobject_put(&target->regions->kobj); +} + static ssize_t pid_target_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -188,8 +419,10 @@ static void damon_sysfs_targets_rm_dirs( struct damon_sysfs_target **targets_arr = targets->targets_arr; int i; - for (i = 0; i < targets->nr; i++) + for (i = 0; i < targets->nr; i++) { + damon_sysfs_target_rm_dirs(targets_arr[i]); kobject_put(&targets_arr[i]->kobj); + } targets->nr = 0; kfree(targets_arr); targets->targets_arr = NULL; @@ -224,6 +457,10 @@ static int damon_sysfs_targets_add_dirs( if (err) goto out; + err = damon_sysfs_target_add_dirs(target); + if (err) + goto out; + targets_arr[i] = target; targets->nr++; } @@ -610,9 +847,6 @@ static ssize_t operations_store(struct k for (id = 0; id < NR_DAMON_OPS; id++) { if (sysfs_streq(buf, damon_sysfs_ops_strs[id])) { - 
/* Support only vaddr */ - if (id != DAMON_OPS_VADDR) - return -EINVAL; context->ops_id = id; return count; } @@ -857,10 +1091,37 @@ static void damon_sysfs_destroy_targets( } } +static int damon_sysfs_set_regions(struct damon_target *t, + struct damon_sysfs_regions *sysfs_regions) +{ + int i; + + for (i = 0; i < sysfs_regions->nr; i++) { + struct damon_sysfs_region *sys_region = + sysfs_regions->regions_arr[i]; + struct damon_region *prev, *r; + + if (sys_region->start > sys_region->end) + return -EINVAL; + r = damon_new_region(sys_region->start, sys_region->end); + if (!r) + return -ENOMEM; + damon_add_region(r, t); + if (damon_nr_regions(t) > 1) { + prev = damon_prev_region(r); + if (prev->ar.end > r->ar.start) { + damon_destroy_region(r, t); + return -EINVAL; + } + } + } + return 0; +} + static int damon_sysfs_set_targets(struct damon_ctx *ctx, struct damon_sysfs_targets *sysfs_targets) { - int i; + int i, err; for (i = 0; i < sysfs_targets->nr; i++) { struct damon_sysfs_target *sys_target = @@ -879,6 +1140,11 @@ static int damon_sysfs_set_targets(struc } } damon_add_target(ctx, t); + err = damon_sysfs_set_regions(t, sys_target->regions); + if (err) { + damon_sysfs_destroy_targets(ctx); + return err; + } } return 0; } From patchwork Tue Mar 22 21:49:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789303 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78B07C433FE for ; Tue, 22 Mar 2022 21:49:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 24DD86B0251; Tue, 22 Mar 2022 17:49:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1FEEF6B0252; Tue, 22 Mar 2022 17:49:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0294A6B0253; Tue, 22 Mar 2022 17:49:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id E8AA96B0251 for ; Tue, 22 Mar 2022 17:49:39 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id BFD1F161A for ; Tue, 22 Mar 2022 21:49:39 +0000 (UTC) X-FDA: 79273364478.08.53AA43F Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf12.hostedemail.com (Postfix) with ESMTP id 31E754003F for ; Tue, 22 Mar 2022 21:49:39 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 07C84B81DAB; Tue, 22 Mar 2022 21:49:38 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9C841C340EC; Tue, 22 Mar 2022 21:49:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985777; bh=ezzqprl0GyYYDHPIbfibQWYTPVxulWE9ITSX+sl3rdw=; h=Date:To:From:In-Reply-To:Subject:From; b=r6F8WzCHPEoCuldD3GITaEOFlIsLLsTCsZUO0KtzmAYANplju5rC6Dgck5qpoS1FG +d8VZDf8ovrONrsznxuZNCmriFYQYZn5i7kAbfTZmqSgnP3OeDEv8VysQwqaPxD4UX 22+sX2tT0tQyO8Ob7bE4gcL23ZeQ63e6Fwy3F//0= Date: Tue, 22 Mar 2022 14:49:37 -0700 To: 
xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 219/227] mm/damon/sysfs: support DAMON-based Operation Schemes Message-Id: <20220322214937.9C841C340EC@smtp.kernel.org> X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: 3hcx9wq46jhxphcsxs98ad8ehr7ahg4a Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=r6F8WzCH; dmarc=none; spf=pass (imf12.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Queue-Id: 31E754003F X-HE-Tag: 1647985779-660003 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park
Subject: mm/damon/sysfs: support DAMON-based Operation Schemes

This commit makes the DAMON sysfs interface support the DAMON-based Operation Schemes (DAMOS) feature. Specifically, it adds a 'schemes' directory under each context directory, and makes kdamond 'state' file writing respect the contents of the directory. Note that this commit doesn't support all features of DAMOS, but only the target access pattern and action. Support for quotas, prioritization, and watermarks will follow.

As a result, the files hierarchy becomes as below:

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes <- NEW DIRECTORY
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
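For illustration only (this is not part of the patch), the new files could be driven from user space roughly as in the sketch below. The paths assume the hierarchy above with one kdamond and one context already created through the 'nr_kdamonds' and 'nr_contexts' files, the program runs as root, and write_str() is a hypothetical helper:

/* set_scheme.c - illustrative sketch, build with: cc -o set_scheme set_scheme.c */
#include <stdio.h>

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s", val);
	return fclose(f);
}

int main(void)
{
	const char *schemes =
		"/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes";
	char path[128];

	/* Writing 1 to nr_schemes makes the '0' scheme directory appear. */
	snprintf(path, sizeof(path), "%s/nr_schemes", schemes);
	if (write_str(path, "1"))
		return 1;

	/* Pick one of the action keywords, e.g. the no-op 'stat' action. */
	snprintf(path, sizeof(path), "%s/0/action", schemes);
	return write_str(path, "stat") ? 1 : 0;
}

Writing a larger number to 'nr_schemes' creates that many numbered scheme directories, and writing 0 removes them again.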
Link: https://lkml.kernel.org/r/20220228081314.5770-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 410 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 410 insertions(+) --- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damon-based-operation-schemes +++ a/mm/damon/sysfs.c @@ -114,6 +114,347 @@ static struct kobj_type damon_sysfs_ul_r }; /* + * access_pattern directory + */ + +struct damon_sysfs_access_pattern { + struct kobject kobj; + struct damon_sysfs_ul_range *sz; + struct damon_sysfs_ul_range *nr_accesses; + struct damon_sysfs_ul_range *age; +}; + +static +struct damon_sysfs_access_pattern *damon_sysfs_access_pattern_alloc(void) +{ + struct damon_sysfs_access_pattern *access_pattern = + kmalloc(sizeof(*access_pattern), GFP_KERNEL); + + if (!access_pattern) + return NULL; + access_pattern->kobj = (struct kobject){}; + return access_pattern; +} + +static int damon_sysfs_access_pattern_add_range_dir( + struct damon_sysfs_access_pattern *access_pattern, + struct damon_sysfs_ul_range **range_dir_ptr, + char *name) +{ + struct damon_sysfs_ul_range *range = damon_sysfs_ul_range_alloc(0, 0); + int err; + + if (!range) + return -ENOMEM; + err = kobject_init_and_add(&range->kobj, &damon_sysfs_ul_range_ktype, + &access_pattern->kobj, name); + if (err) + kobject_put(&range->kobj); + else + *range_dir_ptr = range; + return err; +} + +static int damon_sysfs_access_pattern_add_dirs( + struct damon_sysfs_access_pattern *access_pattern) +{ + int err; + + err = damon_sysfs_access_pattern_add_range_dir(access_pattern, + &access_pattern->sz, "sz"); + if (err) + goto put_sz_out; + + err = damon_sysfs_access_pattern_add_range_dir(access_pattern, + &access_pattern->nr_accesses, "nr_accesses"); + if (err) + goto put_nr_accesses_sz_out; + + err = damon_sysfs_access_pattern_add_range_dir(access_pattern, + &access_pattern->age, "age"); + if (err) + goto put_age_nr_accesses_sz_out; + return 0; + +put_age_nr_accesses_sz_out: + kobject_put(&access_pattern->age->kobj); + access_pattern->age = NULL; +put_nr_accesses_sz_out: + kobject_put(&access_pattern->nr_accesses->kobj); + access_pattern->nr_accesses = NULL; +put_sz_out: + kobject_put(&access_pattern->sz->kobj); + access_pattern->sz = NULL; + return err; +} + +static void damon_sysfs_access_pattern_rm_dirs( + struct damon_sysfs_access_pattern *access_pattern) +{ + kobject_put(&access_pattern->sz->kobj); + kobject_put(&access_pattern->nr_accesses->kobj); + kobject_put(&access_pattern->age->kobj); +} + +static void damon_sysfs_access_pattern_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_access_pattern, kobj)); +} + +static struct attribute *damon_sysfs_access_pattern_attrs[] = { + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_access_pattern); + +static struct kobj_type damon_sysfs_access_pattern_ktype = { + .release = damon_sysfs_access_pattern_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_access_pattern_groups, +}; + +/* + * scheme directory + */ + +struct damon_sysfs_scheme { + struct kobject kobj; + enum damos_action action; + struct damon_sysfs_access_pattern *access_pattern; +}; + +/* This should match with enum damos_action */ +static const char * const damon_sysfs_damos_action_strs[] = { + "willneed", + "cold", + "pageout", + "hugepage", + "nohugepage", + "stat", +}; + +static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc( + enum damos_action action) +{ 
+ struct damon_sysfs_scheme *scheme = kmalloc(sizeof(*scheme), + GFP_KERNEL); + + if (!scheme) + return NULL; + scheme->kobj = (struct kobject){}; + scheme->action = action; + return scheme; +} + +static int damon_sysfs_scheme_set_access_pattern( + struct damon_sysfs_scheme *scheme) +{ + struct damon_sysfs_access_pattern *access_pattern; + int err; + + access_pattern = damon_sysfs_access_pattern_alloc(); + if (!access_pattern) + return -ENOMEM; + err = kobject_init_and_add(&access_pattern->kobj, + &damon_sysfs_access_pattern_ktype, &scheme->kobj, + "access_pattern"); + if (err) + goto out; + err = damon_sysfs_access_pattern_add_dirs(access_pattern); + if (err) + goto out; + scheme->access_pattern = access_pattern; + return 0; + +out: + kobject_put(&access_pattern->kobj); + return err; +} + +static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme) +{ + int err; + + err = damon_sysfs_scheme_set_access_pattern(scheme); + if (err) + return err; + return 0; +} + +static void damon_sysfs_scheme_rm_dirs(struct damon_sysfs_scheme *scheme) +{ + damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern); + kobject_put(&scheme->access_pattern->kobj); +} + +static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_scheme *scheme = container_of(kobj, + struct damon_sysfs_scheme, kobj); + + return sysfs_emit(buf, "%s\n", + damon_sysfs_damos_action_strs[scheme->action]); +} + +static ssize_t action_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_scheme *scheme = container_of(kobj, + struct damon_sysfs_scheme, kobj); + enum damos_action action; + + for (action = 0; action < NR_DAMOS_ACTIONS; action++) { + if (sysfs_streq(buf, damon_sysfs_damos_action_strs[action])) { + scheme->action = action; + return count; + } + } + return -EINVAL; +} + +static void damon_sysfs_scheme_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_scheme, kobj)); +} + +static struct kobj_attribute damon_sysfs_scheme_action_attr = + __ATTR_RW_MODE(action, 0600); + +static struct attribute *damon_sysfs_scheme_attrs[] = { + &damon_sysfs_scheme_action_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_scheme); + +static struct kobj_type damon_sysfs_scheme_ktype = { + .release = damon_sysfs_scheme_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_scheme_groups, +}; + +/* + * schemes directory + */ + +struct damon_sysfs_schemes { + struct kobject kobj; + struct damon_sysfs_scheme **schemes_arr; + int nr; +}; + +static struct damon_sysfs_schemes *damon_sysfs_schemes_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_schemes), GFP_KERNEL); +} + +static void damon_sysfs_schemes_rm_dirs(struct damon_sysfs_schemes *schemes) +{ + struct damon_sysfs_scheme **schemes_arr = schemes->schemes_arr; + int i; + + for (i = 0; i < schemes->nr; i++) { + damon_sysfs_scheme_rm_dirs(schemes_arr[i]); + kobject_put(&schemes_arr[i]->kobj); + } + schemes->nr = 0; + kfree(schemes_arr); + schemes->schemes_arr = NULL; +} + +static int damon_sysfs_schemes_add_dirs(struct damon_sysfs_schemes *schemes, + int nr_schemes) +{ + struct damon_sysfs_scheme **schemes_arr, *scheme; + int err, i; + + damon_sysfs_schemes_rm_dirs(schemes); + if (!nr_schemes) + return 0; + + schemes_arr = kmalloc_array(nr_schemes, sizeof(*schemes_arr), + GFP_KERNEL | __GFP_NOWARN); + if (!schemes_arr) + return -ENOMEM; + schemes->schemes_arr = schemes_arr; + + for (i = 0; i < nr_schemes; i++) { + 
scheme = damon_sysfs_scheme_alloc(DAMOS_STAT); + if (!scheme) { + damon_sysfs_schemes_rm_dirs(schemes); + return -ENOMEM; + } + + err = kobject_init_and_add(&scheme->kobj, + &damon_sysfs_scheme_ktype, &schemes->kobj, + "%d", i); + if (err) + goto out; + err = damon_sysfs_scheme_add_dirs(scheme); + if (err) + goto out; + + schemes_arr[i] = scheme; + schemes->nr++; + } + return 0; + +out: + damon_sysfs_schemes_rm_dirs(schemes); + kobject_put(&scheme->kobj); + return err; +} + +static ssize_t nr_schemes_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_schemes *schemes = container_of(kobj, + struct damon_sysfs_schemes, kobj); + + return sysfs_emit(buf, "%d\n", schemes->nr); +} + +static ssize_t nr_schemes_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_schemes *schemes = container_of(kobj, + struct damon_sysfs_schemes, kobj); + int nr, err = kstrtoint(buf, 0, &nr); + + if (err) + return err; + if (nr < 0) + return -EINVAL; + + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + err = damon_sysfs_schemes_add_dirs(schemes, nr); + mutex_unlock(&damon_sysfs_lock); + if (err) + return err; + return count; +} + +static void damon_sysfs_schemes_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_schemes, kobj)); +} + +static struct kobj_attribute damon_sysfs_schemes_nr_attr = + __ATTR_RW_MODE(nr_schemes, 0600); + +static struct attribute *damon_sysfs_schemes_attrs[] = { + &damon_sysfs_schemes_nr_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_schemes); + +static struct kobj_type damon_sysfs_schemes_ktype = { + .release = damon_sysfs_schemes_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_schemes_groups, +}; + +/* * init region directory */ @@ -748,6 +1089,7 @@ struct damon_sysfs_context { enum damon_ops_id ops_id; struct damon_sysfs_attrs *attrs; struct damon_sysfs_targets *targets; + struct damon_sysfs_schemes *schemes; }; static struct damon_sysfs_context *damon_sysfs_context_alloc( @@ -802,6 +1144,23 @@ static int damon_sysfs_context_set_targe return 0; } +static int damon_sysfs_context_set_schemes(struct damon_sysfs_context *context) +{ + struct damon_sysfs_schemes *schemes = damon_sysfs_schemes_alloc(); + int err; + + if (!schemes) + return -ENOMEM; + err = kobject_init_and_add(&schemes->kobj, &damon_sysfs_schemes_ktype, + &context->kobj, "schemes"); + if (err) { + kobject_put(&schemes->kobj); + return err; + } + context->schemes = schemes; + return 0; +} + static int damon_sysfs_context_add_dirs(struct damon_sysfs_context *context) { int err; @@ -813,8 +1172,15 @@ static int damon_sysfs_context_add_dirs( err = damon_sysfs_context_set_targets(context); if (err) goto put_attrs_out; + + err = damon_sysfs_context_set_schemes(context); + if (err) + goto put_targets_attrs_out; return 0; +put_targets_attrs_out: + kobject_put(&context->targets->kobj); + context->targets = NULL; put_attrs_out: kobject_put(&context->attrs->kobj); context->attrs = NULL; @@ -827,6 +1193,8 @@ static void damon_sysfs_context_rm_dirs( kobject_put(&context->attrs->kobj); damon_sysfs_targets_rm_dirs(context->targets); kobject_put(&context->targets->kobj); + damon_sysfs_schemes_rm_dirs(context->schemes); + kobject_put(&context->schemes->kobj); } static ssize_t operations_show(struct kobject *kobj, @@ -1149,6 +1517,45 @@ static int damon_sysfs_set_targets(struc return 0; } +static struct damos *damon_sysfs_mk_scheme( + struct damon_sysfs_scheme *sysfs_scheme) 
+{ + struct damon_sysfs_access_pattern *pattern = + sysfs_scheme->access_pattern; + struct damos_quota quota = (struct damos_quota){}; + struct damos_watermarks wmarks = { + .metric = DAMOS_WMARK_NONE, + .interval = 0, + .high = 0, + .mid = 0, + .low = 0, + }; + + return damon_new_scheme(pattern->sz->min, pattern->sz->max, + pattern->nr_accesses->min, pattern->nr_accesses->max, + pattern->age->min, pattern->age->max, + sysfs_scheme->action, &quota, &wmarks); +} + +static int damon_sysfs_set_schemes(struct damon_ctx *ctx, + struct damon_sysfs_schemes *sysfs_schemes) +{ + int i; + + for (i = 0; i < sysfs_schemes->nr; i++) { + struct damos *scheme, *next; + + scheme = damon_sysfs_mk_scheme(sysfs_schemes->schemes_arr[i]); + if (!scheme) { + damon_for_each_scheme_safe(scheme, next, ctx) + damon_destroy_scheme(scheme); + return -ENOMEM; + } + damon_add_scheme(ctx, scheme); + } + return 0; +} + static void damon_sysfs_before_terminate(struct damon_ctx *ctx) { struct damon_target *t, *next; @@ -1182,6 +1589,9 @@ static struct damon_ctx *damon_sysfs_bui err = damon_sysfs_set_targets(ctx, sys_ctx->targets); if (err) goto out; + err = damon_sysfs_set_schemes(ctx, sys_ctx->schemes); + if (err) + goto out; ctx->callback.before_terminate = damon_sysfs_before_terminate; return ctx;

From patchwork Tue Mar 22 21:49:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789304 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BFE1C4332F for ; Tue, 22 Mar 2022 21:49:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 229ED6B0253; Tue, 22 Mar 2022 17:49:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 13B646B0255; Tue, 22 Mar 2022 17:49:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BEAE86B0256; Tue, 22 Mar 2022 17:49:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0063.hostedemail.com [216.40.44.63]) by kanga.kvack.org (Postfix) with ESMTP id 882D16B0253 for ; Tue, 22 Mar 2022 17:49:42 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 46849A4DD9 for ; Tue, 22 Mar 2022 21:49:42 +0000 (UTC) X-FDA: 79273364604.17.8401F46 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf05.hostedemail.com (Postfix) with ESMTP id C2907100028 for ; Tue, 22 Mar 2022 21:49:41 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4C6A961667; Tue, 22 Mar 2022 21:49:41 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A3247C340EE; Tue, 22 Mar 2022 21:49:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985780; bh=a4+hQi8kKkhAlPzDmb9wzBha0TSCNKx0gwdBaPmnNxA=; h=Date:To:From:In-Reply-To:Subject:From; b=XVWKa4xpM8SxDtmJEJUO3aqu0suaVfwTCoo2lHkUZZtNn36R8yshfI6pdpgbVCI0N ywvOV2Tw7QfLQWPEaXp7uz+aEWunMmH0SC4BixpKnwGf0ruI4LbRgQHNMjSxgGZ6Tv flJRn/RKsSTuCAyvL2iU9m5z7RescTqlQODrP0HA= Date: Tue, 22 Mar 2022 14:49:40 -0700 To:
xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 220/227] mm/damon/sysfs: support DAMOS quotas Message-Id: <20220322214940.A3247C340EE@smtp.kernel.org> X-Stat-Signature: he74z5hnjijhud5to4hidz6jmqqxshe1 Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=XVWKa4xp; spf=pass (imf05.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: C2907100028 X-HE-Tag: 1647985781-862379 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park
Subject: mm/damon/sysfs: support DAMOS quotas

This commit makes the DAMON sysfs interface support the DAMOS quotas feature. Specifically, it adds a 'quotas' directory under each scheme directory and makes kdamond 'state' file writing respect the contents of the directory.

As a result, the files hierarchy becomes as below:

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms <- NEW DIRECTORY
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
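As a rough usage sketch (again, not part of the patch; the paths assume the hierarchy above and the numbers are arbitrary examples), the quota files could be set from user space like this:

/* quotas.c - illustrative sketch: cap scheme 0 at 10ms of apply time and
 * 64 MiB of applied memory per 1s reset interval. */
#include <stdio.h>

#define QUOTAS \
	"/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/quotas"

static void put(const char *file, unsigned long val)
{
	char path[160];
	FILE *f;

	snprintf(path, sizeof(path), QUOTAS "/%s", file);
	f = fopen(path, "w");
	if (!f)
		return;		/* error handling elided for brevity */
	fprintf(f, "%lu", val);
	fclose(f);
}

int main(void)
{
	put("ms", 10);			/* time quota, in milliseconds */
	put("bytes", 64 << 20);		/* size quota, in bytes */
	put("reset_interval_ms", 1000);	/* quota charging window */
	return 0;
}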
Link: https://lkml.kernel.org/r/20220228081314.5770-8-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 145 insertions(+), 1 deletion(-) --- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-quotas +++ a/mm/damon/sysfs.c @@ -114,6 +114,113 @@ static struct kobj_type damon_sysfs_ul_r }; /* + * quotas directory + */ + +struct damon_sysfs_quotas { + struct kobject kobj; + unsigned long ms; + unsigned long sz; + unsigned long reset_interval_ms; +}; + +static struct damon_sysfs_quotas *damon_sysfs_quotas_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_quotas), GFP_KERNEL); +} + +static ssize_t ms_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_quotas *quotas = container_of(kobj, + struct damon_sysfs_quotas, kobj); + + return sysfs_emit(buf, "%lu\n", quotas->ms); +} + +static ssize_t ms_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_quotas *quotas = container_of(kobj, + struct damon_sysfs_quotas, kobj); + int err = kstrtoul(buf, 0, &quotas->ms); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t bytes_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_quotas *quotas = container_of(kobj, + struct damon_sysfs_quotas, kobj); + + return sysfs_emit(buf, "%lu\n", quotas->sz); +} + +static ssize_t bytes_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_quotas *quotas = container_of(kobj, + struct damon_sysfs_quotas, kobj); + int err = kstrtoul(buf, 0, &quotas->sz); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t reset_interval_ms_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_quotas *quotas = container_of(kobj, + struct damon_sysfs_quotas, kobj); + + return sysfs_emit(buf, "%lu\n", quotas->reset_interval_ms); +} + +static ssize_t reset_interval_ms_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_quotas *quotas = container_of(kobj, + struct damon_sysfs_quotas, kobj); + int err = kstrtoul(buf, 0, &quotas->reset_interval_ms); + + if (err) + return -EINVAL; + return count; +} + +static void damon_sysfs_quotas_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_quotas, kobj)); +} + +static struct kobj_attribute damon_sysfs_quotas_ms_attr = + __ATTR_RW_MODE(ms, 0600); + +static struct kobj_attribute damon_sysfs_quotas_sz_attr = + __ATTR_RW_MODE(bytes, 0600); + +static struct kobj_attribute damon_sysfs_quotas_reset_interval_ms_attr = + __ATTR_RW_MODE(reset_interval_ms, 0600); + +static struct attribute *damon_sysfs_quotas_attrs[] = { + &damon_sysfs_quotas_ms_attr.attr, + &damon_sysfs_quotas_sz_attr.attr, + &damon_sysfs_quotas_reset_interval_ms_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_quotas); + +static struct kobj_type damon_sysfs_quotas_ktype = { + .release = damon_sysfs_quotas_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_quotas_groups, +}; + +/* * access_pattern directory */ @@ -220,6 +327,7 @@ struct damon_sysfs_scheme { struct kobject kobj; enum damos_action action; struct damon_sysfs_access_pattern *access_pattern; + struct damon_sysfs_quotas *quotas; }; /* This should match with enum
damos_action */ @@ -270,6 +378,25 @@ out: return err; } +static int damon_sysfs_scheme_set_quotas(struct damon_sysfs_scheme *scheme) +{ + struct damon_sysfs_quotas *quotas = damon_sysfs_quotas_alloc(); + int err; + + if (!quotas) + return -ENOMEM; + err = kobject_init_and_add(&quotas->kobj, &damon_sysfs_quotas_ktype, + &scheme->kobj, "quotas"); + if (err) + goto out; + scheme->quotas = quotas; + return 0; + +out: + kobject_put(&quotas->kobj); + return err; +} + static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme) { int err; @@ -277,13 +404,22 @@ static int damon_sysfs_scheme_add_dirs(s err = damon_sysfs_scheme_set_access_pattern(scheme); if (err) return err; + err = damon_sysfs_scheme_set_quotas(scheme); + if (err) + goto put_access_pattern_out; return 0; + +put_access_pattern_out: + kobject_put(&scheme->access_pattern->kobj); + scheme->access_pattern = NULL; + return err; } static void damon_sysfs_scheme_rm_dirs(struct damon_sysfs_scheme *scheme) { damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern); kobject_put(&scheme->access_pattern->kobj); + kobject_put(&scheme->quotas->kobj); } static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr, @@ -1522,7 +1658,15 @@ static struct damos *damon_sysfs_mk_sche { struct damon_sysfs_access_pattern *pattern = sysfs_scheme->access_pattern; - struct damos_quota quota = (struct damos_quota){}; + struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas; + struct damos_quota quota = { + .ms = sysfs_quotas->ms, + .sz = sysfs_quotas->sz, + .reset_interval = sysfs_quotas->reset_interval_ms, + .weight_sz = 1000, + .weight_nr_accesses = 1000, + .weight_age = 1000, + }; struct damos_watermarks wmarks = { .metric = DAMOS_WMARK_NONE, .interval = 0,

From patchwork Tue Mar 22 21:49:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789305 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 752E0C433EF for ; Tue, 22 Mar 2022 21:49:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 13CFC6B0256; Tue, 22 Mar 2022 17:49:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0798C6B0257; Tue, 22 Mar 2022 17:49:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E34076B0258; Tue, 22 Mar 2022 17:49:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0154.hostedemail.com [216.40.44.154]) by kanga.kvack.org (Postfix) with ESMTP id CD0AA6B0256 for ; Tue, 22 Mar 2022 17:49:46 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 9C488A010A for ; Tue, 22 Mar 2022 21:49:46 +0000 (UTC) X-FDA: 79273364772.21.B59B8DC Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf17.hostedemail.com (Postfix) with ESMTP id 0724340028 for ; Tue, 22 Mar 2022 21:49:45 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id E8FC3B81DAB; Tue, 22 Mar 2022 21:49:44 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9F855C340EC; Tue,
22 Mar 2022 21:49:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985783; bh=lschNhLU+b5N/RiUwXI2t+VkkMPcWJpfwU1Xg1Ykfxg=; h=Date:To:From:In-Reply-To:Subject:From; b=X9JaV09lOK1o3ndFRsMdzwQPsGUDli5YqTFmV+93qG5W/MJvqZHxP094KexmHVk7g 6qPEjBxU3gx28ytoCsD42jI/R0ncYeU/E5zd2HNs59Xs7cx64nyJN9xuzpLD+2GEb4 pjUjJ71EwAX/JV1KFquCDG5czSYUMY96mQTRpUdg= Date: Tue, 22 Mar 2022 14:49:43 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 221/227] mm/damon/sysfs: support schemes prioritization Message-Id: <20220322214943.9F855C340EC@smtp.kernel.org> Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=X9JaV09l; spf=pass (imf17.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: 7d4iz7ax6ztmdgqyrbnmegx39ezwafzw X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 0724340028 X-HE-Tag: 1647985785-100617 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park
Subject: mm/damon/sysfs: support schemes prioritization

This commit makes the DAMON sysfs interface support DAMOS' regions prioritization weights feature under quota limitation. Specifically, it adds a 'weights' directory under each scheme's quotas directory and makes kdamond 'state' file writing respect the contents of the directory.

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ │ sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
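The weights are given in per-mil (thousandths) and decide how much each region property contributes to the priority used for picking regions when the quota does not allow applying the action to every matching region. A hypothetical user-space sketch, not part of the patch, with paths assuming the hierarchy above:

/* weights.c - illustrative sketch: weight size 60%, access frequency 30%,
 * and age 10% when prioritizing regions under the quota of scheme 0. */
#include <stdio.h>

#define WEIGHTS "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0" \
	"/schemes/0/quotas/weights"

int main(void)
{
	static const struct { const char *path, *permil; } w[] = {
		{ WEIGHTS "/sz_permil", "600" },
		{ WEIGHTS "/nr_accesses_permil", "300" },
		{ WEIGHTS "/age_permil", "100" },
	};
	unsigned int i;

	for (i = 0; i < sizeof(w) / sizeof(w[0]); i++) {
		FILE *f = fopen(w[i].path, "w");

		if (!f)
			return 1;
		fprintf(f, "%s", w[i].permil);
		fclose(f);
	}
	return 0;
}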
Link: https://lkml.kernel.org/r/20220228081314.5770-9-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 152 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 149 insertions(+), 3 deletions(-) --- a/mm/damon/sysfs.c~mm-damon-sysfs-support-schemes-prioritization +++ a/mm/damon/sysfs.c @@ -114,11 +114,129 @@ static struct kobj_type damon_sysfs_ul_r }; /* + * scheme/weights directory + */ + +struct damon_sysfs_weights { + struct kobject kobj; + unsigned int sz; + unsigned int nr_accesses; + unsigned int age; +}; + +static struct damon_sysfs_weights *damon_sysfs_weights_alloc(unsigned int sz, + unsigned int nr_accesses, unsigned int age) +{ + struct damon_sysfs_weights *weights = kmalloc(sizeof(*weights), + GFP_KERNEL); + + if (!weights) + return NULL; + weights->kobj = (struct kobject){}; + weights->sz = sz; + weights->nr_accesses = nr_accesses; + weights->age = age; + return weights; +} + +static ssize_t sz_permil_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_weights *weights = container_of(kobj, + struct damon_sysfs_weights, kobj); + + return sysfs_emit(buf, "%u\n", weights->sz); +} + +static ssize_t sz_permil_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_weights *weights = container_of(kobj, + struct damon_sysfs_weights, kobj); + int err = kstrtouint(buf, 0, &weights->sz); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t nr_accesses_permil_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_weights *weights = container_of(kobj, + struct damon_sysfs_weights, kobj); + + return sysfs_emit(buf, "%u\n", weights->nr_accesses); +} + +static ssize_t nr_accesses_permil_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_weights *weights = container_of(kobj, + struct damon_sysfs_weights, kobj); + int err = kstrtouint(buf, 0, &weights->nr_accesses); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t age_permil_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_weights *weights = container_of(kobj, + struct damon_sysfs_weights, kobj); + + return sysfs_emit(buf, "%u\n", weights->age); +} + +static ssize_t age_permil_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_weights *weights = container_of(kobj, + struct damon_sysfs_weights, kobj); + int err = kstrtouint(buf, 0, &weights->age); + + if (err) + return -EINVAL; + return count; +} + +static void damon_sysfs_weights_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_weights, kobj)); +} + +static struct kobj_attribute damon_sysfs_weights_sz_attr = + __ATTR_RW_MODE(sz_permil, 0600); + +static struct kobj_attribute damon_sysfs_weights_nr_accesses_attr = + __ATTR_RW_MODE(nr_accesses_permil, 0600); + +static struct kobj_attribute damon_sysfs_weights_age_attr = + __ATTR_RW_MODE(age_permil, 0600); + +static struct attribute *damon_sysfs_weights_attrs[] = { + &damon_sysfs_weights_sz_attr.attr, + &damon_sysfs_weights_nr_accesses_attr.attr, + &damon_sysfs_weights_age_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_weights); + +static struct kobj_type damon_sysfs_weights_ktype = { + .release = damon_sysfs_weights_release, + 
.sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_weights_groups, +}; + +/* * quotas directory */ struct damon_sysfs_quotas { struct kobject kobj; + struct damon_sysfs_weights *weights; unsigned long ms; unsigned long sz; unsigned long reset_interval_ms; @@ -129,6 +247,29 @@ static struct damon_sysfs_quotas *damon_ return kzalloc(sizeof(struct damon_sysfs_quotas), GFP_KERNEL); } +static int damon_sysfs_quotas_add_dirs(struct damon_sysfs_quotas *quotas) +{ + struct damon_sysfs_weights *weights; + int err; + + weights = damon_sysfs_weights_alloc(0, 0, 0); + if (!weights) + return -ENOMEM; + + err = kobject_init_and_add(&weights->kobj, &damon_sysfs_weights_ktype, + &quotas->kobj, "weights"); + if (err) + kobject_put(&weights->kobj); + else + quotas->weights = weights; + return err; +} + +static void damon_sysfs_quotas_rm_dirs(struct damon_sysfs_quotas *quotas) +{ + kobject_put(&quotas->weights->kobj); +} + static ssize_t ms_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -389,6 +530,9 @@ static int damon_sysfs_scheme_set_quotas &scheme->kobj, "quotas"); if (err) goto out; + err = damon_sysfs_quotas_add_dirs(quotas); + if (err) + goto out; scheme->quotas = quotas; return 0; @@ -419,6 +563,7 @@ static void damon_sysfs_scheme_rm_dirs(s { damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern); kobject_put(&scheme->access_pattern->kobj); + damon_sysfs_quotas_rm_dirs(scheme->quotas); kobject_put(&scheme->quotas->kobj); } @@ -1659,13 +1804,14 @@ static struct damos *damon_sysfs_mk_sche struct damon_sysfs_access_pattern *pattern = sysfs_scheme->access_pattern; struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas; + struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights; struct damos_quota quota = { .ms = sysfs_quotas->ms, .sz = sysfs_quotas->sz, .reset_interval = sysfs_quotas->reset_interval_ms, - .weight_sz = 1000, - .weight_nr_accesses = 1000, - .weight_age = 1000, + .weight_sz = sysfs_weights->sz, + .weight_nr_accesses = sysfs_weights->nr_accesses, + .weight_age = sysfs_weights->age, }; struct damos_watermarks wmarks = { .metric = DAMOS_WMARK_NONE,

From patchwork Tue Mar 22 21:49:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789306 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49C5EC433F5 for ; Tue, 22 Mar 2022 21:49:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D70546B0258; Tue, 22 Mar 2022 17:49:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C83716B0259; Tue, 22 Mar 2022 17:49:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AD6BB6B025A; Tue, 22 Mar 2022 17:49:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0008.hostedemail.com [216.40.44.8]) by kanga.kvack.org (Postfix) with ESMTP id 985F56B0258 for ; Tue, 22 Mar 2022 17:49:48 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 606858249980 for ; Tue, 22 Mar 2022 21:49:48 +0000 (UTC) X-FDA: 79273364856.25.B44B68B Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf22.hostedemail.com (Postfix) with ESMTP id DC2A0C001F for ;
Tue, 22 Mar 2022 21:49:47 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4EF006149C; Tue, 22 Mar 2022 21:49:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A3CB1C340EC; Tue, 22 Mar 2022 21:49:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985786; bh=h4ZseHfI2TYSWVEl5kfd8F5mdwFujHQ0EozM0ntQqmM=; h=Date:To:From:In-Reply-To:Subject:From; b=aui1aJ81gLS78jkomtZYL1xBwAE4HZuPlIUzmeqYGcO8H8UJTqtJakEaOk8RKXKsh PN66G6Ser45xyajmctHlbt7x6nCJGPB3J2VVdAivpSWlLMOJBrZ6lYtxbCpOTU0283 XYw9J1WqVfjQvhu9BPXYolrrcNFOi29Sqeafy2NI= Date: Tue, 22 Mar 2022 14:49:46 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,colin.i.king@gmail.com,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 222/227] mm/damon/sysfs: support DAMOS watermarks Message-Id: <20220322214946.A3CB1C340EC@smtp.kernel.org> X-Stat-Signature: kxnt3fibap56ndgd1mfx3i4ssto353ka Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=aui1aJ81; dmarc=none; spf=pass (imf22.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: DC2A0C001F X-HE-Tag: 1647985787-298165 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park
Subject: mm/damon/sysfs: support DAMOS watermarks

This commit makes the DAMON sysfs interface support the DAMOS watermarks feature. Specifically, it adds a 'watermarks' directory under each scheme directory and makes kdamond 'state' file writing respect the contents of the directory.

As a result, the files hierarchy becomes as below:

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ metric,interval_us,high,mid,low
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
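Roughly speaking, with the 'free_mem_rate' metric the scheme is deactivated while the metric is above 'high' or below 'low', and activated once it drops below 'mid' (values in per-mil). A hypothetical user-space sketch, not part of the patch; the thresholds are example numbers and the paths assume the hierarchy above:

/* wmarks.c - illustrative sketch: run scheme 0 only under memory pressure. */
#include <stdio.h>

#define WM "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/watermarks"

static int put(const char *file, const char *val)
{
	char path[160];
	FILE *f;

	snprintf(path, sizeof(path), WM "/%s", file);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s", val);
	return fclose(f);
}

int main(void)
{
	return put("metric", "free_mem_rate") ||
		put("interval_us", "1000000") ||	/* re-check once per second */
		put("high", "500") ||	/* deactivate while >50% of memory is free */
		put("mid", "100") ||	/* activate once free memory falls below 10% */
		put("low", "50");	/* deactivate again below 5% free memory */
}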
[sj@kernel.org: fix out-of-bound array access for wmark_metric_strs[]] Link: https://lkml.kernel.org/r/20220301185619.2904-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220228081314.5770-10-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Cc: Colin Ian King Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 220 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 215 insertions(+), 5 deletions(-) --- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-watermarks +++ a/mm/damon/sysfs.c @@ -114,6 +114,189 @@ static struct kobj_type damon_sysfs_ul_r }; /* + * watermarks directory + */ + +struct damon_sysfs_watermarks { + struct kobject kobj; + enum damos_wmark_metric metric; + unsigned long interval_us; + unsigned long high; + unsigned long mid; + unsigned long low; +}; + +static struct damon_sysfs_watermarks *damon_sysfs_watermarks_alloc( + enum damos_wmark_metric metric, unsigned long interval_us, + unsigned long high, unsigned long mid, unsigned long low) +{ + struct damon_sysfs_watermarks *watermarks = kmalloc( + sizeof(*watermarks), GFP_KERNEL); + + if (!watermarks) + return NULL; + watermarks->kobj = (struct kobject){}; + watermarks->metric = metric; + watermarks->interval_us = interval_us; + watermarks->high = high; + watermarks->mid = mid; + watermarks->low = low; + return watermarks; +} + +/* Should match with enum damos_wmark_metric */ +static const char * const damon_sysfs_wmark_metric_strs[] = { + "none", + "free_mem_rate", +}; + +static ssize_t metric_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + + return sysfs_emit(buf, "%s\n", + damon_sysfs_wmark_metric_strs[watermarks->metric]); +} + +static ssize_t metric_store(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + enum damos_wmark_metric metric; + + for (metric = 0; metric < NR_DAMOS_WMARK_METRICS; metric++) { + if (sysfs_streq(buf, damon_sysfs_wmark_metric_strs[metric])) { + watermarks->metric = metric; + return count; + } + } + return -EINVAL; +} + +static ssize_t interval_us_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + + return sysfs_emit(buf, "%lu\n", watermarks->interval_us); +} + +static ssize_t interval_us_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + int err = kstrtoul(buf, 0, &watermarks->interval_us); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t high_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + + return sysfs_emit(buf, "%lu\n", watermarks->high); +} + +static ssize_t high_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + int err = kstrtoul(buf, 0, &watermarks->high); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t mid_show(struct kobject *kobj, + struct 
kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + + return sysfs_emit(buf, "%lu\n", watermarks->mid); +} + +static ssize_t mid_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + int err = kstrtoul(buf, 0, &watermarks->mid); + + if (err) + return -EINVAL; + return count; +} + +static ssize_t low_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + + return sysfs_emit(buf, "%lu\n", watermarks->low); +} + +static ssize_t low_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + struct damon_sysfs_watermarks *watermarks = container_of(kobj, + struct damon_sysfs_watermarks, kobj); + int err = kstrtoul(buf, 0, &watermarks->low); + + if (err) + return -EINVAL; + return count; +} + +static void damon_sysfs_watermarks_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_watermarks, kobj)); +} + +static struct kobj_attribute damon_sysfs_watermarks_metric_attr = + __ATTR_RW_MODE(metric, 0600); + +static struct kobj_attribute damon_sysfs_watermarks_interval_us_attr = + __ATTR_RW_MODE(interval_us, 0600); + +static struct kobj_attribute damon_sysfs_watermarks_high_attr = + __ATTR_RW_MODE(high, 0600); + +static struct kobj_attribute damon_sysfs_watermarks_mid_attr = + __ATTR_RW_MODE(mid, 0600); + +static struct kobj_attribute damon_sysfs_watermarks_low_attr = + __ATTR_RW_MODE(low, 0600); + +static struct attribute *damon_sysfs_watermarks_attrs[] = { + &damon_sysfs_watermarks_metric_attr.attr, + &damon_sysfs_watermarks_interval_us_attr.attr, + &damon_sysfs_watermarks_high_attr.attr, + &damon_sysfs_watermarks_mid_attr.attr, + &damon_sysfs_watermarks_low_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_watermarks); + +static struct kobj_type damon_sysfs_watermarks_ktype = { + .release = damon_sysfs_watermarks_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_watermarks_groups, +}; + +/* * scheme/weights directory */ @@ -469,6 +652,7 @@ struct damon_sysfs_scheme { enum damos_action action; struct damon_sysfs_access_pattern *access_pattern; struct damon_sysfs_quotas *quotas; + struct damon_sysfs_watermarks *watermarks; }; /* This should match with enum damos_action */ @@ -541,6 +725,24 @@ out: return err; } +static int damon_sysfs_scheme_set_watermarks(struct damon_sysfs_scheme *scheme) +{ + struct damon_sysfs_watermarks *watermarks = + damon_sysfs_watermarks_alloc(DAMOS_WMARK_NONE, 0, 0, 0, 0); + int err; + + if (!watermarks) + return -ENOMEM; + err = kobject_init_and_add(&watermarks->kobj, + &damon_sysfs_watermarks_ktype, &scheme->kobj, + "watermarks"); + if (err) + kobject_put(&watermarks->kobj); + else + scheme->watermarks = watermarks; + return err; +} + static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme) { int err; @@ -551,8 +753,14 @@ static int damon_sysfs_scheme_add_dirs(s err = damon_sysfs_scheme_set_quotas(scheme); if (err) goto put_access_pattern_out; + err = damon_sysfs_scheme_set_watermarks(scheme); + if (err) + goto put_quotas_access_pattern_out; return 0; +put_quotas_access_pattern_out: + kobject_put(&scheme->quotas->kobj); + scheme->quotas = NULL; put_access_pattern_out: kobject_put(&scheme->access_pattern->kobj); 
scheme->access_pattern = NULL; @@ -565,6 +773,7 @@ static void damon_sysfs_scheme_rm_dirs(s kobject_put(&scheme->access_pattern->kobj); damon_sysfs_quotas_rm_dirs(scheme->quotas); kobject_put(&scheme->quotas->kobj); + kobject_put(&scheme->watermarks->kobj); } static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr, @@ -1805,6 +2014,7 @@ static struct damos *damon_sysfs_mk_sche sysfs_scheme->access_pattern; struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas; struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights; + struct damon_sysfs_watermarks *sysfs_wmarks = sysfs_scheme->watermarks; struct damos_quota quota = { .ms = sysfs_quotas->ms, .sz = sysfs_quotas->sz, @@ -1814,11 +2024,11 @@ static struct damos *damon_sysfs_mk_sche .weight_age = sysfs_weights->age, }; struct damos_watermarks wmarks = { - .metric = DAMOS_WMARK_NONE, - .interval = 0, - .high = 0, - .mid = 0, - .low = 0, + .metric = sysfs_wmarks->metric, + .interval = sysfs_wmarks->interval_us, + .high = sysfs_wmarks->high, + .mid = sysfs_wmarks->mid, + .low = sysfs_wmarks->low, }; return damon_new_scheme(pattern->sz->min, pattern->sz->max, From patchwork Tue Mar 22 21:49:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789307 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5335EC433F5 for ; Tue, 22 Mar 2022 21:49:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D98536B025A; Tue, 22 Mar 2022 17:49:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CF8AA6B025B; Tue, 22 Mar 2022 17:49:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B26076B025C; Tue, 22 Mar 2022 17:49:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0042.hostedemail.com [216.40.44.42]) by kanga.kvack.org (Postfix) with ESMTP id 9D9916B025A for ; Tue, 22 Mar 2022 17:49:51 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 5EFFC8249980 for ; Tue, 22 Mar 2022 21:49:51 +0000 (UTC) X-FDA: 79273364982.19.0F9B21E Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf19.hostedemail.com (Postfix) with ESMTP id DE8701A0033 for ; Tue, 22 Mar 2022 21:49:50 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 50767615BD; Tue, 22 Mar 2022 21:49:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A1125C340EE; Tue, 22 Mar 2022 21:49:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985789; bh=3yTsixFqPaVKhl6xjPL9cDXevmSKT3jMGXb/cyHaWgM=; h=Date:To:From:In-Reply-To:Subject:From; b=Wa6L3UXGjrbVQXUKbVsK50d1gkaSYjSndBJL5GlrKL0JWp0SvDfXKaeULwVSsb+vr Z2b2wQAzxaMKAQKNl6UHZ4R3Jiwd9nRW9TyVx/sZJLEw7vu4DCzGPdi2jTYN3xy6lI jcMhTj/996tzvmQqTQKLQ70NagBpU+I3SF3KfLxc= Date: Tue, 22 Mar 2022 14:49:49 -0700 To: 
xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 223/227] mm/damon/sysfs: support DAMOS stats Message-Id: <20220322214949.A1125C340EE@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DE8701A0033 X-Stat-Signature: oytjs4ha7acokefhr1q41mcjeyfjpye6 X-Rspam-User: Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=Wa6L3UXG; dmarc=none; spf=pass (imf19.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1647985790-100886 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: SeongJae Park
Subject: mm/damon/sysfs: support DAMOS stats

This commit makes the DAMON sysfs interface support the DAMOS stats feature. Specifically, it adds a 'stats' directory under each scheme directory, and updates the contents of the files under the directory according to the latest monitoring results when the user writes the special keyword 'update_schemes_stats' to the 'state' file of the kdamond.

As a result, the files hierarchy becomes as below:

/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ stats/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
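The stats files are snapshots rather than live counters: they only change when 'update_schemes_stats' is written while the kdamond is running. A hypothetical reader sketch, not part of the patch, with paths assuming the hierarchy above:

/* stats.c - illustrative sketch: refresh and read back two DAMOS counters. */
#include <stdio.h>

#define KD "/sys/kernel/mm/damon/admin/kdamonds/0"

int main(void)
{
	unsigned long nr_tried, sz_tried;
	FILE *f;

	/* Ask the running kdamond to copy its per-scheme counters out. */
	f = fopen(KD "/state", "w");
	if (!f)
		return 1;
	fprintf(f, "update_schemes_stats");
	fclose(f);

	f = fopen(KD "/contexts/0/schemes/0/stats/nr_tried", "r");
	if (!f || fscanf(f, "%lu", &nr_tried) != 1)
		return 1;
	fclose(f);

	f = fopen(KD "/contexts/0/schemes/0/stats/sz_tried", "r");
	if (!f || fscanf(f, "%lu", &sz_tried) != 1)
		return 1;
	fclose(f);

	printf("scheme 0 tried %lu regions (%lu bytes)\n", nr_tried, sz_tried);
	return 0;
}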
Link: https://lkml.kernel.org/r/20220228081314.5770-11-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton --- mm/damon/sysfs.c | 150 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 150 insertions(+) --- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-stats +++ a/mm/damon/sysfs.c @@ -114,6 +114,105 @@ static struct kobj_type damon_sysfs_ul_r }; /* + * schemes/stats directory + */ + +struct damon_sysfs_stats { + struct kobject kobj; + unsigned long nr_tried; + unsigned long sz_tried; + unsigned long nr_applied; + unsigned long sz_applied; + unsigned long qt_exceeds; +}; + +static struct damon_sysfs_stats *damon_sysfs_stats_alloc(void) +{ + return kzalloc(sizeof(struct damon_sysfs_stats), GFP_KERNEL); +} + +static ssize_t nr_tried_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_stats *stats = container_of(kobj, + struct damon_sysfs_stats, kobj); + + return sysfs_emit(buf, "%lu\n", stats->nr_tried); +} + +static ssize_t sz_tried_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct damon_sysfs_stats *stats = container_of(kobj, + struct damon_sysfs_stats, kobj); + + return sysfs_emit(buf, "%lu\n", stats->sz_tried); +} + +static ssize_t nr_applied_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_stats *stats = container_of(kobj, + struct damon_sysfs_stats, kobj); + + return sysfs_emit(buf, "%lu\n", stats->nr_applied); +} + +static ssize_t sz_applied_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_stats *stats = container_of(kobj, + struct damon_sysfs_stats, kobj); + + return sysfs_emit(buf, "%lu\n", stats->sz_applied); +} + +static ssize_t qt_exceeds_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct damon_sysfs_stats *stats = container_of(kobj, + struct damon_sysfs_stats, kobj); + + return sysfs_emit(buf, "%lu\n", stats->qt_exceeds); +} + +static void damon_sysfs_stats_release(struct kobject *kobj) +{ + kfree(container_of(kobj, struct damon_sysfs_stats, kobj)); +} + +static struct kobj_attribute damon_sysfs_stats_nr_tried_attr = + __ATTR_RO_MODE(nr_tried, 0400); + +static struct kobj_attribute damon_sysfs_stats_sz_tried_attr = + __ATTR_RO_MODE(sz_tried, 0400); + +static struct kobj_attribute damon_sysfs_stats_nr_applied_attr = + __ATTR_RO_MODE(nr_applied, 0400); + +static struct kobj_attribute damon_sysfs_stats_sz_applied_attr = + __ATTR_RO_MODE(sz_applied, 0400); + +static struct kobj_attribute damon_sysfs_stats_qt_exceeds_attr = + __ATTR_RO_MODE(qt_exceeds, 0400); + +static struct attribute *damon_sysfs_stats_attrs[] = { + &damon_sysfs_stats_nr_tried_attr.attr, + &damon_sysfs_stats_sz_tried_attr.attr, + &damon_sysfs_stats_nr_applied_attr.attr, + &damon_sysfs_stats_sz_applied_attr.attr, + &damon_sysfs_stats_qt_exceeds_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(damon_sysfs_stats); + +static struct kobj_type damon_sysfs_stats_ktype = { + .release = damon_sysfs_stats_release, + .sysfs_ops = &kobj_sysfs_ops, + .default_groups = damon_sysfs_stats_groups, +}; + +/* * watermarks directory */ @@ -653,6 +752,7 @@ struct damon_sysfs_scheme { struct damon_sysfs_access_pattern *access_pattern; struct damon_sysfs_quotas *quotas; struct damon_sysfs_watermarks *watermarks; + struct damon_sysfs_stats *stats; }; /* This should match with enum damos_action */ @@ -743,6 +843,22 @@ static int 
 	return err;
 }
 
+static int damon_sysfs_scheme_set_stats(struct damon_sysfs_scheme *scheme)
+{
+	struct damon_sysfs_stats *stats = damon_sysfs_stats_alloc();
+	int err;
+
+	if (!stats)
+		return -ENOMEM;
+	err = kobject_init_and_add(&stats->kobj, &damon_sysfs_stats_ktype,
+			&scheme->kobj, "stats");
+	if (err)
+		kobject_put(&stats->kobj);
+	else
+		scheme->stats = stats;
+	return err;
+}
+
 static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
 {
 	int err;
@@ -756,8 +872,14 @@ static int damon_sysfs_scheme_add_dirs(s
 	err = damon_sysfs_scheme_set_watermarks(scheme);
 	if (err)
 		goto put_quotas_access_pattern_out;
+	err = damon_sysfs_scheme_set_stats(scheme);
+	if (err)
+		goto put_watermarks_quotas_access_pattern_out;
 	return 0;
 
+put_watermarks_quotas_access_pattern_out:
+	kobject_put(&scheme->watermarks->kobj);
+	scheme->watermarks = NULL;
 put_quotas_access_pattern_out:
 	kobject_put(&scheme->quotas->kobj);
 	scheme->quotas = NULL;
@@ -774,6 +896,7 @@ static void damon_sysfs_scheme_rm_dirs(s
 	damon_sysfs_quotas_rm_dirs(scheme->quotas);
 	kobject_put(&scheme->quotas->kobj);
 	kobject_put(&scheme->watermarks->kobj);
+	kobject_put(&scheme->stats->kobj);
 }
 
 static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
@@ -2141,6 +2264,31 @@ static int damon_sysfs_turn_damon_off(st
 	 */
 }
 
+static int damon_sysfs_update_schemes_stats(struct damon_sysfs_kdamond *kdamond)
+{
+	struct damon_ctx *ctx = kdamond->damon_ctx;
+	struct damos *scheme;
+	int schemes_idx = 0;
+
+	if (!ctx)
+		return -EINVAL;
+	mutex_lock(&ctx->kdamond_lock);
+	damon_for_each_scheme(scheme, ctx) {
+		struct damon_sysfs_schemes *sysfs_schemes;
+		struct damon_sysfs_stats *sysfs_stats;
+
+		sysfs_schemes = kdamond->contexts->contexts_arr[0]->schemes;
+		sysfs_stats = sysfs_schemes->schemes_arr[schemes_idx++]->stats;
+		sysfs_stats->nr_tried = scheme->stat.nr_tried;
+		sysfs_stats->sz_tried = scheme->stat.sz_tried;
+		sysfs_stats->nr_applied = scheme->stat.nr_applied;
+		sysfs_stats->sz_applied = scheme->stat.sz_applied;
+		sysfs_stats->qt_exceeds = scheme->stat.qt_exceeds;
+	}
+	mutex_unlock(&ctx->kdamond_lock);
+	return 0;
+}
+
 static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
 		const char *buf, size_t count)
 {
@@ -2154,6 +2302,8 @@ static ssize_t state_store(struct kobjec
 		ret = damon_sysfs_turn_damon_on(kdamond);
 	else if (sysfs_streq(buf, "off"))
 		ret = damon_sysfs_turn_damon_off(kdamond);
+	else if (sysfs_streq(buf, "update_schemes_stats"))
+		ret = damon_sysfs_update_schemes_stats(kdamond);
 	else
 		ret = -EINVAL;
 	mutex_unlock(&damon_sysfs_lock);

From patchwork Tue Mar 22 21:49:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789308
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9709C433EF for ; Tue, 22 Mar 2022 21:49:56 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 566AB6B025C; Tue, 22 Mar 2022 17:49:56 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 4E35D6B025E; Tue, 22 Mar 2022 17:49:56 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 35DBF6B025F; Tue, 22 Mar 2022 17:49:56 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0220.hostedemail.com [216.40.44.220]) by kanga.kvack.org (Postfix) with ESMTP id 1EE776B025C for ; Tue, 22 Mar 2022 17:49:56 -0400 (EDT)
Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D47F4A30A0 for ; Tue, 22 Mar 2022 21:49:55 +0000 (UTC)
X-FDA: 79273365150.26.5E6B227
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf18.hostedemail.com (Postfix) with ESMTP id 59DF81C000B for ; Tue, 22 Mar 2022 21:49:55 +0000 (UTC)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 199D4B81DAB; Tue, 22 Mar 2022 21:49:54 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id AAC83C340EE; Tue, 22 Mar 2022 21:49:52 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985792; bh=eVKmc8EDif3sWA6QQTHPM0SL4QRKdyJ4XV4xokzGKCA=; h=Date:To:From:In-Reply-To:Subject:From; b=cIK3gLKLuj3O4ef7s44H9KCb780jSobzTD8K7Z2Lf52QoULKg/0EBszDzK/4Ny+RG QP3mFSloTFdovi109nnxUrd3kDCjRxItUsT/S3Qb5e73w2x7UQMQZ+CachOn4ertJP mRyo/1U7xegpLOSz9dOBN9E+udrJwgzEdLyRbKas=
Date: Tue, 22 Mar 2022 14:49:52 -0700
To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 224/227] selftests/damon: add a test for DAMON sysfs interface
Message-Id: <20220322214952.AAC83C340EE@smtp.kernel.org>
X-Stat-Signature: z65xkj83gtritdnkg1yqt7d8qsnbgu3x
Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=cIK3gLKL; spf=pass (imf18.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none
X-Rspam-User:
X-Rspamd-Server: rspam02
X-Rspamd-Queue-Id: 59DF81C000B
X-HE-Tag: 1647985795-776789
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: SeongJae Park
Subject: selftests/damon: add a test for DAMON sysfs interface

This commit adds a selftest for the DAMON sysfs interface.  It tests the
functionality of the 'nr' files and the existence of the files in each
directory of the hierarchy.
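[Editorial note, not part of the patch: the test is typically run as root, either directly or through the kselftest framework; a sketch:

    # cd tools/testing/selftests/damon
    # ./sysfs.sh

or, from the kernel source tree root:

    # make -C tools/testing/selftests TARGETS=damon run_tests

The script exits with the kselftest SKIP code (4) when run without root or when the DAMON sysfs directory is absent.]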
Link: https://lkml.kernel.org/r/20220228081314.5770-12-sj@kernel.org
Signed-off-by: SeongJae Park
Cc: David Rientjes
Cc: Greg Kroah-Hartman
Cc: Jonathan Corbet
Cc: Shuah Khan
Cc: Xin Hao
Signed-off-by: Andrew Morton
---

 tools/testing/selftests/damon/Makefile |    1 
 tools/testing/selftests/damon/sysfs.sh |  306 +++++++++++++++++++++++
 2 files changed, 307 insertions(+)

--- a/tools/testing/selftests/damon/Makefile~selftests-damon-add-a-test-for-damon-sysfs-interface
+++ a/tools/testing/selftests/damon/Makefile
@@ -6,5 +6,6 @@ TEST_GEN_FILES += huge_count_read_write
 TEST_FILES = _chk_dependency.sh _debugfs_common.sh
 TEST_PROGS = debugfs_attrs.sh debugfs_schemes.sh debugfs_target_ids.sh
 TEST_PROGS += debugfs_empty_targets.sh debugfs_huge_count_read_write.sh
+TEST_PROGS += sysfs.sh
 
 include ../lib.mk
--- /dev/null
+++ a/tools/testing/selftests/damon/sysfs.sh
@@ -0,0 +1,306 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+ensure_write_succ()
+{
+	file=$1
+	content=$2
+	reason=$3
+
+	if ! echo "$content" > "$file"
+	then
+		echo "writing $content to $file failed"
+		echo "expected success because $reason"
+		exit 1
+	fi
+}
+
+ensure_write_fail()
+{
+	file=$1
+	content=$2
+	reason=$3
+
+	if echo "$content" > "$file"
+	then
+		echo "writing $content to $file succeeded"
+		echo "expected failure because $reason"
+		exit 1
+	fi
+}
+
+ensure_dir()
+{
+	dir=$1
+	to_ensure=$2
+	if [ "$to_ensure" = "exist" ] && [ ! -d "$dir" ]
+	then
+		echo "$dir dir is expected but not found"
+		exit 1
+	elif [ "$to_ensure" = "not_exist" ] && [ -d "$dir" ]
+	then
+		echo "$dir dir is not expected but found"
+		exit 1
+	fi
+}
+
+ensure_file()
+{
+	file=$1
+	to_ensure=$2
+	permission=$3
+	if [ "$to_ensure" = "exist" ]
+	then
+		if [ ! -f "$file" ]
+		then
+			echo "$file is expected but not found"
+			exit 1
+		fi
+		perm=$(stat -c "%a" "$file")
+		if [ ! "$perm" = "$permission" ]
"$perm" = "$permission" ] + then + echo "$file permission: expected $permission but $perm" + exit 1 + fi + elif [ "$to_ensure" = "not_exist" ] && [ -f "$dir" ] + then + echo "$file is not expected but found" + exit 1 + fi +} + +test_range() +{ + range_dir=$1 + ensure_dir "$range_dir" "exist" + ensure_file "$range_dir/min" "exist" 600 + ensure_file "$range_dir/max" "exist" 600 +} + +test_stats() +{ + stats_dir=$1 + ensure_dir "$stats_dir" "exist" + for f in nr_tried sz_tried nr_applied sz_applied qt_exceeds + do + ensure_file "$stats_dir/$f" "exist" "400" + done +} + +test_watermarks() +{ + watermarks_dir=$1 + ensure_dir "$watermarks_dir" "exist" + ensure_file "$watermarks_dir/metric" "exist" "600" + ensure_file "$watermarks_dir/interval_us" "exist" "600" + ensure_file "$watermarks_dir/high" "exist" "600" + ensure_file "$watermarks_dir/mid" "exist" "600" + ensure_file "$watermarks_dir/low" "exist" "600" +} + +test_weights() +{ + weights_dir=$1 + ensure_dir "$weights_dir" "exist" + ensure_file "$weights_dir/sz_permil" "exist" "600" + ensure_file "$weights_dir/nr_accesses_permil" "exist" "600" + ensure_file "$weights_dir/age_permil" "exist" "600" +} + +test_quotas() +{ + quotas_dir=$1 + ensure_dir "$quotas_dir" "exist" + ensure_file "$quotas_dir/ms" "exist" 600 + ensure_file "$quotas_dir/bytes" "exist" 600 + ensure_file "$quotas_dir/reset_interval_ms" "exist" 600 + test_weights "$quotas_dir/weights" +} + +test_access_pattern() +{ + access_pattern_dir=$1 + ensure_dir "$access_pattern_dir" "exist" + test_range "$access_pattern_dir/age" + test_range "$access_pattern_dir/nr_accesses" + test_range "$access_pattern_dir/sz" +} + +test_scheme() +{ + scheme_dir=$1 + ensure_dir "$scheme_dir" "exist" + ensure_file "$scheme_dir/action" "exist" "600" + test_access_pattern "$scheme_dir/access_pattern" + test_quotas "$scheme_dir/quotas" + test_watermarks "$scheme_dir/watermarks" + test_stats "$scheme_dir/stats" +} + +test_schemes() +{ + schemes_dir=$1 + ensure_dir "$schemes_dir" "exist" + ensure_file "$schemes_dir/nr_schemes" "exist" 600 + + ensure_write_succ "$schemes_dir/nr_schemes" "1" "valid input" + test_scheme "$schemes_dir/0" + + ensure_write_succ "$schemes_dir/nr_schemes" "2" "valid input" + test_scheme "$schemes_dir/0" + test_scheme "$schemes_dir/1" + + ensure_write_succ "$schemes_dir/nr_schemes" "0" "valid input" + ensure_dir "$schemes_dir/0" "not_exist" + ensure_dir "$schemes_dir/1" "not_exist" +} + +test_region() +{ + region_dir=$1 + ensure_dir "$region_dir" "exist" + ensure_file "$region_dir/start" "exist" 600 + ensure_file "$region_dir/end" "exist" 600 +} + +test_regions() +{ + regions_dir=$1 + ensure_dir "$regions_dir" "exist" + ensure_file "$regions_dir/nr_regions" "exist" 600 + + ensure_write_succ "$regions_dir/nr_regions" "1" "valid input" + test_region "$regions_dir/0" + + ensure_write_succ "$regions_dir/nr_regions" "2" "valid input" + test_region "$regions_dir/0" + test_region "$regions_dir/1" + + ensure_write_succ "$regions_dir/nr_regions" "0" "valid input" + ensure_dir "$regions_dir/0" "not_exist" + ensure_dir "$regions_dir/1" "not_exist" +} + +test_target() +{ + target_dir=$1 + ensure_dir "$target_dir" "exist" + ensure_file "$target_dir/pid_target" "exist" "600" + test_regions "$target_dir/regions" +} + +test_targets() +{ + targets_dir=$1 + ensure_dir "$targets_dir" "exist" + ensure_file "$targets_dir/nr_targets" "exist" 600 + + ensure_write_succ "$targets_dir/nr_targets" "1" "valid input" + test_target "$targets_dir/0" + + ensure_write_succ "$targets_dir/nr_targets" "2" "valid input" 
+ test_target "$targets_dir/0" + test_target "$targets_dir/1" + + ensure_write_succ "$targets_dir/nr_targets" "0" "valid input" + ensure_dir "$targets_dir/0" "not_exist" + ensure_dir "$targets_dir/1" "not_exist" +} + +test_intervals() +{ + intervals_dir=$1 + ensure_dir "$intervals_dir" "exist" + ensure_file "$intervals_dir/aggr_us" "exist" "600" + ensure_file "$intervals_dir/sample_us" "exist" "600" + ensure_file "$intervals_dir/update_us" "exist" "600" +} + +test_monitoring_attrs() +{ + monitoring_attrs_dir=$1 + ensure_dir "$monitoring_attrs_dir" "exist" + test_intervals "$monitoring_attrs_dir/intervals" + test_range "$monitoring_attrs_dir/nr_regions" +} + +test_context() +{ + context_dir=$1 + ensure_dir "$context_dir" "exist" + ensure_file "$context_dir/operations" "exist" 600 + test_monitoring_attrs "$context_dir/monitoring_attrs" + test_targets "$context_dir/targets" + test_schemes "$context_dir/schemes" +} + +test_contexts() +{ + contexts_dir=$1 + ensure_dir "$contexts_dir" "exist" + ensure_file "$contexts_dir/nr_contexts" "exist" 600 + + ensure_write_succ "$contexts_dir/nr_contexts" "1" "valid input" + test_context "$contexts_dir/0" + + ensure_write_fail "$contexts_dir/nr_contexts" "2" "only 0/1 are supported" + test_context "$contexts_dir/0" + + ensure_write_succ "$contexts_dir/nr_contexts" "0" "valid input" + ensure_dir "$contexts_dir/0" "not_exist" +} + +test_kdamond() +{ + kdamond_dir=$1 + ensure_dir "$kdamond_dir" "exist" + ensure_file "$kdamond_dir/state" "exist" "600" + ensure_file "$kdamond_dir/pid" "exist" 400 + test_contexts "$kdamond_dir/contexts" +} + +test_kdamonds() +{ + kdamonds_dir=$1 + ensure_dir "$kdamonds_dir" "exist" + + ensure_file "$kdamonds_dir/nr_kdamonds" "exist" "600" + + ensure_write_succ "$kdamonds_dir/nr_kdamonds" "1" "valid input" + test_kdamond "$kdamonds_dir/0" + + ensure_write_succ "$kdamonds_dir/nr_kdamonds" "2" "valid input" + test_kdamond "$kdamonds_dir/0" + test_kdamond "$kdamonds_dir/1" + + ensure_write_succ "$kdamonds_dir/nr_kdamonds" "0" "valid input" + ensure_dir "$kdamonds_dir/0" "not_exist" + ensure_dir "$kdamonds_dir/1" "not_exist" +} + +test_damon_sysfs() +{ + damon_sysfs=$1 + if [ ! 
-d "$damon_sysfs" ] + then + echo "$damon_sysfs not found" + exit $ksft_skip + fi + + test_kdamonds "$damon_sysfs/kdamonds" +} + +check_dependencies() +{ + if [ $EUID -ne 0 ] + then + echo "Run as root" + exit $ksft_skip + fi +} + +check_dependencies +test_damon_sysfs "/sys/kernel/mm/damon/admin" From patchwork Tue Mar 22 21:49:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12789309 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A79ACC433F5 for ; Tue, 22 Mar 2022 21:49:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3FD986B025F; Tue, 22 Mar 2022 17:49:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 35BAE6B0260; Tue, 22 Mar 2022 17:49:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1D6676B0261; Tue, 22 Mar 2022 17:49:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id 024226B025F for ; Tue, 22 Mar 2022 17:49:59 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id C6F7F16E6 for ; Tue, 22 Mar 2022 21:49:58 +0000 (UTC) X-FDA: 79273365276.04.D16DBE8 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf12.hostedemail.com (Postfix) with ESMTP id 249084003C for ; Tue, 22 Mar 2022 21:49:57 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id E3279B81DB7; Tue, 22 Mar 2022 21:49:56 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9E460C340EC; Tue, 22 Mar 2022 21:49:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985795; bh=yudckYbSmHuj3jJoUFQrJIVHdZk0aFgQ77QdZIFQDDQ=; h=Date:To:From:In-Reply-To:Subject:From; b=eo8l004Z6O6gmhw108a/BuHGEKWoHdMT4dkIk5miZyUEx5/gbGYSJe+g4L2hhm+ME eIYGwOIoGaAXtYFOA4lgJv/+cJUrpZRe2CZQfTrudYIcm4Ym/jxCAk9qX+TZfZ1Wuc FVqxpucqL2hSKTHHOh/L03zJqVo4tK5NNLNwNFH8= Date: Tue, 22 Mar 2022 14:49:55 -0700 To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org> Subject: [patch 225/227] Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface Message-Id: <20220322214955.9E460C340EC@smtp.kernel.org> X-Stat-Signature: zjerw8hiz1i8ij6zzhyjypf7doo1m67q Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=eo8l004Z; spf=pass (imf12.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 249084003C X-HE-Tag: 1647985797-458984 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: SeongJae Park
Subject: Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface

This commit adds the detailed usage of the DAMON sysfs interface to the
admin-guide document for DAMON.

Link: https://lkml.kernel.org/r/20220228081314.5770-13-sj@kernel.org
Signed-off-by: SeongJae Park
Cc: David Rientjes
Cc: Greg Kroah-Hartman
Cc: Jonathan Corbet
Cc: Shuah Khan
Cc: Xin Hao
Signed-off-by: Andrew Morton
---

 Documentation/admin-guide/mm/damon/usage.rst |  350 ++++++++++++++++-
 1 file changed, 344 insertions(+), 6 deletions(-)

--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-document-damon-sysfs-interface
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -4,7 +4,7 @@
 Detailed Usages
 ===============
 
-DAMON provides below three interfaces for different users.
+DAMON provides below interfaces for different users.
 
 - *DAMON user space tool.*
   `This <https://github.com/awslabs/damo>`_ is for privileged people such as
@@ -14,17 +14,21 @@ DAMON provides below three interfaces fo
   virtual and physical address spaces monitoring.  For more detail, please
   refer to its `usage document <https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
-- *debugfs interface.*
-  :ref:`This <debugfs_interface>` is for privileged user space programmers who
+- *sysfs interface.*
+  :ref:`This <sysfs_interface>` is for privileged user space programmers who
   want more optimized use of DAMON.  Using this, users can use DAMON’s major
-  features by reading from and writing to special debugfs files.  Therefore,
-  you can write and use your personalized DAMON debugfs wrapper programs that
-  reads/writes the debugfs files instead of you.  The `DAMON user space tool
+  features by reading from and writing to special sysfs files.  Therefore,
+  you can write and use your personalized DAMON sysfs wrapper programs that
+  reads/writes the sysfs files instead of you.  The `DAMON user space tool
   <https://github.com/awslabs/damo>`_ is one example of such programs.  It
   supports both virtual and physical address spaces monitoring.  Note that this
   interface provides only simple :ref:`statistics <damos_stats>` for the monitoring
   results.  For detailed monitoring results, DAMON provides a
   :ref:`tracepoint <tracepoint>`.
+- *debugfs interface.*
+  :ref:`This <debugfs_interface>` is almost identical to the :ref:`sysfs interface
+  <sysfs_interface>`.  This will be removed after the next LTS kernel is released,
+  so users should move to the :ref:`sysfs interface <sysfs_interface>`.
 - *Kernel Space Programming Interface.*
   :doc:`This </vm/damon/api>` is for kernel space programmers.  Using this, users
   can utilize every feature of DAMON most flexibly and efficiently by
@@ -32,6 +36,340 @@ DAMON provides below three interfaces fo
   DAMON for various address spaces.  For detail, please refer to the interface
   :doc:`document </vm/damon/api>`.
 
+.. _sysfs_interface:
+
+sysfs Interface
+===============
+
+The DAMON sysfs interface is built when ``CONFIG_DAMON_SYSFS`` is defined.  It
+creates multiple directories and files under its sysfs directory,
+``<sysfs>/kernel/mm/damon/``.  You can control DAMON by writing to and reading
+from the files under the directory.
+
+For a short example, users can monitor the virtual address space of a given
+workload as below. ::
+
+    # cd /sys/kernel/mm/damon/admin/
+    # echo 1 > kdamonds/nr_kdamonds && echo 1 > kdamonds/0/contexts/nr_contexts
+    # echo vaddr > kdamonds/0/contexts/0/operations
+    # echo 1 > kdamonds/0/contexts/0/targets/nr_targets
+    # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
+    # echo on > kdamonds/0/state
+
+Files Hierarchy
+---------------
+
+The files hierarchy of the DAMON sysfs interface is shown below.
+In the below figure, parents-children relations are represented with
+indentations, each directory has the ``/`` suffix, and files in each directory
+are separated by comma (","). ::
+
+    /sys/kernel/mm/damon/admin
+    │ kdamonds/nr_kdamonds
+    │ │ 0/state,pid
+    │ │ │ contexts/nr_contexts
+    │ │ │ │ 0/operations
+    │ │ │ │ │ monitoring_attrs/
+    │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
+    │ │ │ │ │ │ nr_regions/min,max
+    │ │ │ │ │ targets/nr_targets
+    │ │ │ │ │ │ 0/pid_target
+    │ │ │ │ │ │ │ regions/nr_regions
+    │ │ │ │ │ │ │ │ 0/start,end
+    │ │ │ │ │ │ │ │ ...
+    │ │ │ │ │ │ ...
+    │ │ │ │ │ schemes/nr_schemes
+    │ │ │ │ │ │ 0/action
+    │ │ │ │ │ │ │ access_pattern/
+    │ │ │ │ │ │ │ │ sz/min,max
+    │ │ │ │ │ │ │ │ nr_accesses/min,max
+    │ │ │ │ │ │ │ │ age/min,max
+    │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
+    │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
+    │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
+    │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
+    │ │ │ │ │ │ ...
+    │ │ │ │ ...
+    │ │ ...
+
+Root
+----
+
+The root of the DAMON sysfs interface is ``<sysfs>/kernel/mm/damon/``, and it
+has one directory named ``admin``.  The directory contains the files for
+privileged user space programs' control of DAMON.  User space tools or daemons
+having the root permission could use this directory.
+
+kdamonds/
+---------
+
+The monitoring-related information including request specifications and results
+is called a DAMON context.  DAMON executes each context with a kernel thread
+called a kdamond, and multiple kdamonds could run in parallel.
+
+Under the ``admin`` directory, one directory, ``kdamonds``, which has files for
+controlling the kdamonds, exists.  In the beginning, this directory has only one
+file, ``nr_kdamonds``.  Writing a number (``N``) to the file creates ``N``
+child directories named ``0`` to ``N-1``.  Each directory represents one
+kdamond.
+
+kdamonds/<N>/
+-------------
+
+In each kdamond directory, two files (``state`` and ``pid``) and one directory
+(``contexts``) exist.
+
+Reading ``state`` returns ``on`` if the kdamond is currently running, or
+``off`` if it is not running.  Writing ``on`` or ``off`` puts the kdamond in
+the corresponding state.  Writing ``update_schemes_stats`` to the ``state``
+file updates the contents of the stats files for each DAMON-based operation
+scheme of the kdamond.  For details of the stats, please refer to the
+:ref:`stats section <sysfs_schemes_stats>`.
+
+If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
+
+The ``contexts`` directory contains files for controlling the monitoring
+contexts that this kdamond will execute.
+
+kdamonds/<N>/contexts/
+----------------------
+
+In the beginning, this directory has only one file, ``nr_contexts``.  Writing a
+number (``N``) to the file creates ``N`` child directories named
+``0`` to ``N-1``.  Each directory represents one monitoring context.  At the
+moment, only one context per kdamond is supported, so only ``0`` or ``1`` can
+be written to the file.
+
+contexts/<N>/
+-------------
+
+In each context directory, one file (``operations``) and three directories
+(``monitoring_attrs``, ``targets``, and ``schemes``) exist.
+
+DAMON supports multiple types of monitoring operations, including those for
+virtual address spaces and the physical address space.  You can set and get
+what type of monitoring operations DAMON will use for the context by writing
+one of the below keywords to, and reading from, the file.
+
+ - vaddr: Monitor virtual address spaces of specific processes
+ - paddr: Monitor the physical address space of the system
+
+contexts/<N>/monitoring_attrs/
+------------------------------
+
+Files for specifying attributes of the monitoring, including the required
+quality and efficiency of the monitoring, are in the ``monitoring_attrs``
+directory.  Specifically, two directories, ``intervals`` and ``nr_regions``,
+exist in this directory.
+
+Under the ``intervals`` directory, three files for DAMON's sampling interval
+(``sample_us``), aggregation interval (``aggr_us``), and update interval
+(``update_us``) exist.  You can set and get the values in microseconds by
+writing to and reading from the files.
+
+Under the ``nr_regions`` directory, two files for the lower-bound and
+upper-bound of DAMON's monitoring regions (``min`` and ``max``, respectively),
+which control the monitoring overhead, exist.  You can set and get the values
+by writing to and reading from the files.
+
+For more details about the intervals and monitoring regions range, please
+refer to the Design document (:doc:`/vm/damon/design`).
+
+contexts/<N>/targets/
+---------------------
+
+In the beginning, this directory has only one file, ``nr_targets``.  Writing a
+number (``N``) to the file creates ``N`` child directories named ``0``
+to ``N-1``.  Each directory represents one monitoring target.
+
+targets/<N>/
+------------
+
+In each target directory, one file (``pid_target``) and one directory
+(``regions``) exist.
+
+If you wrote ``vaddr`` to the ``contexts/<N>/operations`` file, each target
+should be a process.  You can specify the process to DAMON by writing the pid
+of the process to the ``pid_target`` file.
+
+targets/<N>/regions
+-------------------
+
+When the ``vaddr`` monitoring operations set is being used (``vaddr`` is
+written to the ``contexts/<N>/operations`` file), DAMON automatically sets and
+updates the monitoring target regions so that entire memory mappings of the
+target processes can be covered.  However, users could want to set the initial
+monitoring regions to specific address ranges.
+
+In contrast, DAMON does not automatically set and update the monitoring target
+regions when the ``paddr`` monitoring operations set is being used (``paddr``
+is written to the ``contexts/<N>/operations`` file).  Therefore, users should
+set the monitoring target regions by themselves in that case.
+
+For such cases, users can explicitly set the initial monitoring target regions
+as they want, by writing proper values to the files under this directory.
+
+In the beginning, this directory has only one file, ``nr_regions``.  Writing a
+number (``N``) to the file creates ``N`` child directories named ``0``
+to ``N-1``.  Each directory represents one initial monitoring target region.
+
+regions/<N>/
+------------
+
+In each region directory, you will find two files (``start`` and ``end``).  You
+can set and get the start and end addresses of the initial monitoring target
+region by writing to and reading from the files, respectively.
+
+contexts/<N>/schemes/
+---------------------
+
+For usual DAMON-based data access aware memory management optimizations, users
+would normally want the system to apply a memory management action to a memory
+region of a specific access pattern.  DAMON receives such formalized operation
+schemes from the user and applies those to the target memory regions.  Users
+can get and set the schemes by reading from and writing to files under this
+directory.
+
+In the beginning, this directory has only one file, ``nr_schemes``.
+Writing a number (``N``) to the file creates ``N`` child directories named
+``0`` to ``N-1``.  Each directory represents one DAMON-based operation scheme.
+
+schemes/<N>/
+------------
+
+In each scheme directory, four directories (``access_pattern``, ``quotas``,
+``watermarks``, and ``stats``) and one file (``action``) exist.
+
+The ``action`` file is for setting and getting the action you want to apply to
+memory regions having a specific access pattern of interest.  The keywords
+that can be written to and read from the file and their meaning are as below.
+
+ - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``
+ - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``
+ - ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
+ - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
+ - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+ - ``stat``: Do nothing but count the statistics
+
+schemes/<N>/access_pattern/
+---------------------------
+
+The target access pattern of each DAMON-based operation scheme is constructed
+with three ranges: the size of the region in bytes, the number of monitored
+accesses per aggregation interval, and the number of aggregation intervals for
+the age of the region.
+
+Under the ``access_pattern`` directory, three directories (``sz``,
+``nr_accesses``, and ``age``), each having two files (``min`` and ``max``),
+exist.  You can set and get the access pattern for the given scheme by writing
+to and reading from the ``min`` and ``max`` files under the ``sz``,
+``nr_accesses``, and ``age`` directories, respectively.
+
+schemes/<N>/quotas/
+-------------------
+
+The optimal ``target access pattern`` for each ``action`` is workload
+dependent, and thus not easy to find.  Worse yet, setting a scheme of some
+action too aggressively can cause severe overhead.  To avoid such overhead,
+users can limit the time and size quota for each scheme.  In detail, users can
+ask DAMON to try to use only up to a specific amount of time (``time quota``)
+for applying the action, and to apply the action to only up to a specific
+amount (``size quota``) of memory regions having the target access pattern
+within a given time interval (``reset interval``).
+
+When the quota limit is expected to be exceeded, DAMON prioritizes the found
+memory regions of the ``target access pattern`` based on their size, access
+frequency, and age.  For personalized prioritization, users can set the
+weights for the three properties.
+
+Under the ``quotas`` directory, three files (``ms``, ``bytes``,
+``reset_interval_ms``) and one directory (``weights``) having three files
+(``sz_permil``, ``nr_accesses_permil``, and ``age_permil``) in it exist.
+
+You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
+``reset interval`` in milliseconds by writing the values to the three files,
+respectively.  You can also set the prioritization weights for size, access
+frequency, and age in per-thousand unit by writing the values to the three
+files under the ``weights`` directory.
+
+schemes/<N>/watermarks/
+-----------------------
+
+To allow easy activation and deactivation of each scheme based on system
+status, DAMON provides a feature called watermarks.  The feature receives five
+values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``.  The
+``metric`` is a system metric, such as the free memory ratio, that can be
+measured.
+If the metric value of the system is higher than the value in ``high`` or
+lower than ``low`` at the moment, the scheme is deactivated.  If the value is
+lower than ``mid``, the scheme is activated.
+
+Under the ``watermarks`` directory, five files (``metric``, ``interval_us``,
+``high``, ``mid``, and ``low``) for setting each value exist.  You can set and
+get the five values by writing to and reading from the files, respectively.
+
+Keywords and meanings of those that can be written to the ``metric`` file are
+as below.
+
+ - none: Ignore the watermarks
+ - free_mem_rate: System's free memory rate (per thousand)
+
+The ``interval`` should be written in microseconds.
+
+.. _sysfs_schemes_stats:
+
+schemes/<N>/stats/
+------------------
+
+DAMON counts the total number and bytes of the regions that each scheme was
+tried to be applied to, the two numbers for the regions that each scheme was
+successfully applied to, and the total number of the quota limit exceed
+events.  These statistics can be used for online analysis or tuning of the
+schemes.
+
+The statistics can be retrieved by reading the files under the ``stats``
+directory (``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, and
+``qt_exceeds``), respectively.  The files are not updated in real time, so you
+should ask the DAMON sysfs interface to update the content of the files for
+the stats by writing a special keyword, ``update_schemes_stats``, to the
+relevant ``kdamonds/<N>/state`` file.
+
+Example
+~~~~~~~
+
+The below commands apply a scheme saying "If a memory region of size in [4KiB,
+8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
+interval in [10, 20], page out the region.  For the paging out, use only up to
+10ms per second, and also don't page out more than 1GiB per second.  Under the
+limitation, page out memory regions having longer age first.  Also, check the
+free memory rate of the system every 5 seconds, start the monitoring and
+paging out when the free memory rate becomes lower than 50%, but stop it if
+the free memory rate becomes larger than 60%, or lower than 30%". ::
+
+    # cd /sys/kernel/mm/damon/admin
+    # # populate directories
+    # echo 1 > kdamonds/nr_kdamonds; echo 1 > kdamonds/0/contexts/nr_contexts;
+    # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes
+    # cd kdamonds/0/contexts/0/schemes/0
+    # # set the basic access pattern and the action
+    # echo 4096 > access_pattern/sz/min
+    # echo 8192 > access_pattern/sz/max
+    # echo 0 > access_pattern/nr_accesses/min
+    # echo 5 > access_pattern/nr_accesses/max
+    # echo 10 > access_pattern/age/min
+    # echo 20 > access_pattern/age/max
+    # echo pageout > action
+    # # set quotas
+    # echo 10 > quotas/ms
+    # echo $((1024*1024*1024)) > quotas/bytes
+    # echo 1000 > quotas/reset_interval_ms
+    # # set watermarks
+    # echo free_mem_rate > watermarks/metric
+    # echo 5000000 > watermarks/interval_us
+    # echo 600 > watermarks/high
+    # echo 500 > watermarks/mid
+    # echo 300 > watermarks/low
+
+Please note that it's highly recommended to use user space tools like `damo
+<https://github.com/awslabs/damo>`_ rather than manually reading and writing
+the files as above.  The above is only an example.
 
 .. _debugfs_interface:

From patchwork Tue Mar 22 21:49:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789310
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A74AAC433F5 for ; Tue, 22 Mar 2022 21:50:02 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 423546B0261; Tue, 22 Mar 2022 17:50:02 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 3AAAD6B0266; Tue, 22 Mar 2022 17:50:02 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 2247D6B0269; Tue, 22 Mar 2022 17:50:02 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 0C5CB6B0261 for ; Tue, 22 Mar 2022 17:50:02 -0400 (EDT)
Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id DC59081A41 for ; Tue, 22 Mar 2022 21:50:01 +0000 (UTC)
X-FDA: 79273365402.02.00F665B
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf31.hostedemail.com (Postfix) with ESMTP id 3BFF720010 for ; Tue, 22 Mar 2022 21:50:01 +0000 (UTC)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 04615B81DC6; Tue, 22 Mar 2022 21:50:00 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99224C340EE; Tue, 22 Mar 2022 21:49:58 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985798; bh=bQqOYlZD+42SAaMW44eQLy21UU5942AlHX38ItcKB3U=; h=Date:To:From:In-Reply-To:Subject:From; b=WvznL9GjmdwXwaf3YvL4Bqo1eNUn7RY2smZazRtDSFmD0yAcb32O7YQtjAyDUQmV9 eyHyYoTdYwR8EXTAugCDyI+3pFvXaRJlwa8GM0l/6oP7D72ly0BIicEOieu8u4ugLT 8E2MuFgEAJYga704aSVe5JrjGsby90sDvvX/YGFQ=
Date: Tue, 22 Mar 2022 14:49:58 -0700
To: xhao@linux.alibaba.com,skhan@linuxfoundation.org,rientjes@google.com,gregkh@linuxfoundation.org,corbet@lwn.net,sj@kernel.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 226/227] Docs/ABI/testing: add DAMON sysfs interface ABI document
Message-Id: <20220322214958.99224C340EE@smtp.kernel.org>
X-Rspam-User:
X-Stat-Signature: 51356yu4aqczqt83r5jjpbyuc3yoo6ki
Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=WvznL9Gj; spf=pass (imf31.hostedemail.com: domain of akpm@linux-foundation.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none
X-Rspamd-Server: rspam01
X-Rspamd-Queue-Id: 3BFF720010
X-HE-Tag: 1647985801-693297
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: SeongJae Park
Subject: Docs/ABI/testing: add DAMON sysfs interface ABI document

This commit adds a DAMON sysfs interface ABI document under
Documentation/ABI/testing.
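[Editorial note, not part of the patch: once a kernel is built with ``CONFIG_DAMON_SYSFS``, the files this document describes can be enumerated and inspected directly; a sketch, assuming the default sysfs mount point:

    # find /sys/kernel/mm/damon/admin -type f | sort
    # cat /sys/kernel/mm/damon/admin/kdamonds/nr_kdamonds
    0

Reading ``nr_kdamonds`` is expected to return 0 on a freshly booted system, since no kdamond directories exist until a number is written to the file.]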
Link: https://lkml.kernel.org/r/20220228081314.5770-14-sj@kernel.org
Signed-off-by: SeongJae Park
Cc: David Rientjes
Cc: Greg Kroah-Hartman
Cc: Jonathan Corbet
Cc: Shuah Khan
Cc: Xin Hao
Signed-off-by: Andrew Morton
---

 Documentation/ABI/testing/sysfs-kernel-mm-damon |  274 ++++++++++++++
 MAINTAINERS                                     |    1 
 2 files changed, 275 insertions(+)

--- /dev/null
+++ a/Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -0,0 +1,274 @@
+What:		/sys/kernel/mm/damon/
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Interface for Data Access MONitoring (DAMON).  Contains files
+		for controlling DAMON.  For more details on DAMON itself,
+		please refer to Documentation/admin-guide/mm/damon/index.rst.
+
+What:		/sys/kernel/mm/damon/admin/
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Interface for privileged users of DAMON.  Contains files for
+		controlling DAMON that are aimed to be used by privileged
+		users.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/nr_kdamonds
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a number 'N' to this file creates 'N' directories for
+		controlling each DAMON worker thread (kdamond), named '0' to
+		'N-1', under the kdamonds/ directory.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/state
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing 'on' or 'off' to this file makes the kdamond start or
+		stop, respectively.  Reading the file returns the keyword
+		based on the current status.  Writing 'update_schemes_stats'
+		to the file updates the contents of the schemes' stats files
+		of the kdamond.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/pid
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Reading this file returns the pid of the kdamond if it is
+		running.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/nr_contexts
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a number 'N' to this file creates 'N' directories for
+		controlling each DAMON context, named '0' to 'N-1', under the
+		contexts/ directory.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/operations
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a keyword for a monitoring operations set ('vaddr' for
+		virtual address spaces monitoring, and 'paddr' for the physical
+		address space monitoring) to this file makes the context use
+		the operations set.  Reading the file returns the keyword for
+		the operations set the context is set to use.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/sample_us
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a value to this file sets the sampling interval of the
+		DAMON context in microseconds as the value.  Reading this file
+		returns the value.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/aggr_us
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a value to this file sets the aggregation interval of
+		the DAMON context in microseconds as the value.  Reading this
+		file returns the value.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/update_us
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a value to this file sets the update interval of the
+		DAMON context in microseconds as the value.  Reading this file
+		returns the value.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/nr_regions/min
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a value to this file sets the minimum number of
+		monitoring regions of the DAMON context as the value.  Reading
+		this file returns the value.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/nr_regions/max
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a value to this file sets the maximum number of
+		monitoring regions of the DAMON context as the value.  Reading
+		this file returns the value.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/nr_targets
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a number 'N' to this file creates 'N' directories for
+		controlling each DAMON target of the context, named '0' to
+		'N-1', under the targets/ directory.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/pid_target
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the pid of
+		the target process if the context is for virtual address spaces
+		monitoring, respectively.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/nr_regions
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a number 'N' to this file creates 'N' directories for
+		setting each DAMON target memory region of the context, named
+		'0' to 'N-1', under the regions/ directory.  In case of the
+		virtual address space monitoring, DAMON automatically sets the
+		target memory region based on the target processes' mappings.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/<R>/start
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the start
+		address of the monitoring region.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/<R>/end
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the end
+		address of the monitoring region.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/nr_schemes
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing a number 'N' to this file creates 'N' directories for
+		controlling each DAMON-based operation scheme of the context,
+		named '0' to 'N-1', under the schemes/ directory.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/action
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the action
+		of the scheme.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/min
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the minimum
+		size of the scheme's target regions in bytes.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/max
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the maximum
+		size of the scheme's target regions in bytes.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/nr_accesses/min
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the minimum
+		'nr_accesses' of the scheme's target regions.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/nr_accesses/max
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the maximum
+		'nr_accesses' of the scheme's target regions.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/age/min
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the minimum
+		'age' of the scheme's target regions.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/age/max
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the maximum
+		'age' of the scheme's target regions.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/ms
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the time
+		quota of the scheme in milliseconds.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/bytes
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the size
+		quota of the scheme in bytes.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/reset_interval_ms
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the quotas
+		charge reset interval of the scheme in milliseconds.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the
+		under-quota limit regions prioritization weight for 'size' in
+		permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/nr_accesses_permil
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the
+		under-quota limit regions prioritization weight for
+		'nr_accesses' in permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/age_permil
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the
+		under-quota limit regions prioritization weight for 'age' in
+		permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/metric
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the metric
+		of the watermarks for the scheme.  The writable/readable
+		keywords for this file are 'none' for disabling the watermarks
+		feature, or 'free_mem_rate' for the system's global free memory
+		rate in permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/interval_us
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the metric
+		check interval of the watermarks for the scheme in
+		microseconds.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/high
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the high
+		watermark of the scheme in permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/mid
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the mid
+		watermark of the scheme in permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/low
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Writing to and reading from this file sets and gets the low
+		watermark of the scheme in permil.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_tried
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Reading this file returns the number of regions that the action
+		of the scheme was tried to be applied to.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/sz_tried
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Reading this file returns the total size of the regions that
+		the action of the scheme was tried to be applied to, in bytes.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_applied
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Reading this file returns the number of regions that the action
+		of the scheme was successfully applied to.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/sz_applied
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Reading this file returns the total size of the regions that
+		the action of the scheme was successfully applied to, in bytes.
+
+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/qt_exceeds
+Date:		Mar 2022
+Contact:	SeongJae Park
+Description:	Reading this file returns the number of the exceed events of
+		the scheme's quotas.
--- a/MAINTAINERS~docs-abi-testing-add-damon-sysfs-interface-abi-document
+++ a/MAINTAINERS
@@ -5317,6 +5317,7 @@ DATA ACCESS MONITOR
 M:	SeongJae Park
 L:	linux-mm@kvack.org
 S:	Maintained
+F:	Documentation/ABI/testing/sysfs-kernel-mm-damon
 F:	Documentation/admin-guide/mm/damon/
 F:	Documentation/vm/damon/
 F:	include/linux/damon.h

From patchwork Tue Mar 22 21:50:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12789311
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22649C433EF for ; Tue, 22 Mar 2022 21:50:04 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id A15C16B0269; Tue, 22 Mar 2022 17:50:03 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 9A0166B026A; Tue, 22 Mar 2022 17:50:03 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 817256B026B; Tue, 22 Mar 2022 17:50:03 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0246.hostedemail.com [216.40.44.246]) by kanga.kvack.org (Postfix) with ESMTP id 6A3AB6B0269 for ; Tue, 22 Mar 2022 17:50:03 -0400 (EDT)
Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 296A18249980 for ; Tue, 22 Mar 2022 21:50:03 +0000 (UTC)
X-FDA: 79273365486.26.C2FA857
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf29.hostedemail.com (Postfix) with ESMTP id BA0F912001A for ; Tue, 22 Mar 2022 21:50:02 +0000 (UTC)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 2E8296149C; Tue, 22 Mar 2022 21:50:02 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 817B4C340EE; Tue, 22 Mar 2022 21:50:01 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1647985801; bh=mFHlkOPpf1q4Sl8uzN0iF58khGGdZSfydcW9vvO2qTM=; h=Date:To:From:In-Reply-To:Subject:From; b=t+NOshuP+FYdWBEymMllldithW5NymHxe8NTuoyrHRr5Ls2HCqsmFotQdwuLlE1Px rzfePvWfkwcpL34gvbcoyO3eRVvcYTSZREefVaBdpfg/RVr05KneMefDmvL2/cBO4a 99MsL8vyX4bGj/cizu6Dc6ZmQPwEmeQA4g2B0M7Q=
Date: Tue, 22 Mar 2022 14:50:00 -0700
To: sj@kernel.org,xhao@linux.alibaba.com,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 227/227] mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
Message-Id: <20220322215001.817B4C340EE@smtp.kernel.org>
X-Stat-Signature: qmizrjctfpqn7gmn3mrzu8piwbrbwhma
X-Rspamd-Server: rspam07
X-Rspamd-Queue-Id: BA0F912001A
Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=t+NOshuP; dmarc=none; spf=pass (imf29.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
X-Rspam-User:
X-HE-Tag: 1647985802-191176
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Xin Hao
Subject: mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()

In damon_sysfs_kdamond_release(), we have used container_of() to get the
"kdamond" pointer, so there is no need to get it once again.

Link: https://lkml.kernel.org/r/20220303075314.22502-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao
Reviewed-by: SeongJae Park
Signed-off-by: Andrew Morton
---

 mm/damon/sysfs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/damon/sysfs.c~mm-damon-sysfs-remove-repeat-container_of-in-damon_sysfs_kdamond_release
+++ a/mm/damon/sysfs.c
@@ -2345,7 +2345,7 @@ static void damon_sysfs_kdamond_release(
 	if (kdamond->damon_ctx)
 		damon_destroy_ctx(kdamond->damon_ctx);
-	kfree(container_of(kobj, struct damon_sysfs_kdamond, kobj));
+	kfree(kdamond);
 }
 
 static struct kobj_attribute damon_sysfs_kdamond_state_attr =