From patchwork Wed Mar 16 02:26:49 2022
Submitter: luofei <luofei@unicloud.com>
From: luofei <luofei@unicloud.com>
To: Mike Kravetz, Muchun Song
CC: Andrew Morton, Linux Memory Management List, LKML
Subject: Re: [PATCH] hugetlbfs: fix description about atomic allocation of vmemmap pages when free huge page
Date: Wed, 16 Mar 2022 02:26:49 +0000
Message-ID: <380ea251d3084917aaaf6cc973a857e1@unicloud.com>
References: <20220315042355.362810-1-luofei@unicloud.com>

>>> No matter what context update_and_free_page() is called in,
>>> the flags used to allocate the vmemmap pages are fixed
>>> (GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE), and no atomic
>>> allocation is involved, so the description of atomicity here
>>> is somewhat inappropriate.
>>>
>>> Also, the "atomic" parameter name of update_and_free_page() is
>>> somewhat misleading.
>>> Signed-off-by: luofei
>>> ---
>>>  mm/hugetlb.c | 10 ++++------
>>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index f8ca7cca3c1a..239ef82b7897 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1570,8 +1570,8 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>>>
>>>  /*
>>>   * As update_and_free_page() can be called under any context, so we cannot
>>> - * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
>>> - * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
>>> + * use GFP_ATOMIC to allocate vmemmap pages. However, we can defer the
>>> + * actual freeing in a workqueue to prevent waits caused by allocating
>>>   * the vmemmap pages.
>>>   *
>>>   * free_hpage_workfn() locklessly retrieves the linked list of pages to be
>>> @@ -1617,16 +1617,14 @@ static inline void flush_free_hpage_work(struct hstate *h)
>>>  }
>>>
>>>  static void update_and_free_page(struct hstate *h, struct page *page,
>>> -				 bool atomic)
>>> +				 bool delay)
>>
>> Hi luofei,
>>
>> At least, I don't agree with this change. "atomic" means that the
>> caller is in atomic context, not that an atomic GFP mask is used.
>> "delay" suggests the caller could choose not to delay the freeing
>> even in atomic context (in fact, it has no choice), whereas "atomic"
>> makes clear that the caller is being asked to tell us whether it is
>> in atomic context.
>
> There may be some confusion, since GFP_ATOMIC is mentioned in the
> comments but is not actually used to allocate the vmemmap pages. IIRC,
> the use of GFP_ATOMIC was discussed at one time but dismissed because
> of undesired side effects such as dipping into atomic reserves.
>
> How about an update to the comments as follows (sorry, my mailer may
> mess up the formatting)?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f8ca7cca3c1a..6a4d27e24b21 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1569,10 +1569,12 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>  }
>
>  /*
> - * As update_and_free_page() can be called under any context, so we cannot
> - * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
> - * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
> - * the vmemmap pages.
> + * Freeing hugetlb pages is done in update_and_free_page(). When freeing a
> + * hugetlb page, vmemmap pages may need to be allocated. The routine
> + * alloc_huge_page_vmemmap() can possibly sleep as it uses GFP_KERNEL.
> + * However, update_and_free_page() can be called under any context. To
> + * avoid the possibility of sleeping in a context where sleeping is not
> + * allowed, defer the actual freeing in a workqueue where sleeping is allowed.
>  *
>  * free_hpage_workfn() locklessly retrieves the linked list of pages to be
>  * freed and frees them one-by-one. As the page->mapping pointer is going
> @@ -1616,6 +1618,10 @@ static inline void flush_free_hpage_work(struct hstate *h)
>  	flush_work(&free_hpage_work);
>  }
>
> +/*
> + * atomic == true indicates called from a context where sleeping is
> + * not allowed.
> + */
>  static void update_and_free_page(struct hstate *h, struct page *page,
>  				 bool atomic)
>  {
> @@ -1625,7 +1631,8 @@ static void update_and_free_page(struct hstate *h, struct page *page,
>  	}
>
>  	/*
> -	 * Defer freeing to avoid using GFP_ATOMIC to allocate vmemmap pages.
> +	 * Defer freeing to avoid possible sleeping when allocating
> +	 * vmemmap pages.
>  	 *
>  	 * Only call schedule_work() if hpage_freelist is previously
>  	 * empty. Otherwise, schedule_work() had been called but the workfn

The comments made it clearer for me. I will revise the patch.
Thanks :)