Message ID: 20191212190427.ouyohviijf5inhur@linux-p48b (mailing list archive)
State: New, archived
Series: [v2] mm/hugetlb: defer free_huge_page() to a workqueue
On 12/12/19 11:04 AM, Davidlohr Bueso wrote: > There have been deadlock reports[1, 2] where put_page is called > from softirq context and this causes trouble with the hugetlb_lock, > as well as potentially the subpool lock. > > For such an unlikely scenario, lets not add irq dancing overhead > to the lock+unlock operations, which could incur in expensive > instruction dependencies, particularly when considering hard-irq > safety. For example PUSHF+POPF on x86. > > Instead, just use a workqueue and do the free_huge_page() in regular > task context. > > [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ > [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/ > > Reported-by: Waiman Long <longman@redhat.com> > Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Thank you Davidlohr. The patch does seem fairly simple and straight forward. I need to brush up on my workqueue knowledge to provide a full review. Longman, Do you have a test to reproduce the issue? If so, can you try running with this patch.
On Thu, 12 Dec 2019, Mike Kravetz wrote: >The patch does seem fairly simple and straight forward. I need to brush up >on my workqueue knowledge to provide a full review. The flush_work() call is actually bogus as we obviously cannot block in softirq...
On 12/12/19 2:22 PM, Mike Kravetz wrote: > On 12/12/19 11:04 AM, Davidlohr Bueso wrote: >> There have been deadlock reports[1, 2] where put_page is called >> from softirq context and this causes trouble with the hugetlb_lock, >> as well as potentially the subpool lock. >> >> For such an unlikely scenario, lets not add irq dancing overhead >> to the lock+unlock operations, which could incur in expensive >> instruction dependencies, particularly when considering hard-irq >> safety. For example PUSHF+POPF on x86. >> >> Instead, just use a workqueue and do the free_huge_page() in regular >> task context. >> >> [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ >> [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/ >> >> Reported-by: Waiman Long <longman@redhat.com> >> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > Thank you Davidlohr. > > The patch does seem fairly simple and straight forward. I need to brush up > on my workqueue knowledge to provide a full review. > > Longman, > Do you have a test to reproduce the issue? If so, can you try running with > this patch. Yes, I do have a test that can reproduce the issue. I will run it with the patch and report the status tomorrow. -Longman
On 12/12/19 2:04 PM, Davidlohr Bueso wrote:
> There have been deadlock reports[1, 2] where put_page is called
> from softirq context and this causes trouble with the hugetlb_lock,
> as well as potentially the subpool lock.
>
> For such an unlikely scenario, lets not add irq dancing overhead
> to the lock+unlock operations, which could incur in expensive
> instruction dependencies, particularly when considering hard-irq
> safety. For example PUSHF+POPF on x86.
>
> Instead, just use a workqueue and do the free_huge_page() in regular
> task context.
>
> [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/
> [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/
>
> Reported-by: Waiman Long <longman@redhat.com>
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>
> - Changes from v1: Only use wq when in_interrupt(), otherwise business
>   as usual. Also include the proper header file.
>
> - While I have not reproduced this issue, the v1 using wq passes all
>   hugetlb related tests in ltp.
>
>  mm/hugetlb.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ac65bb5e38ac..f28cf601938d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -27,6 +27,7 @@
>  #include <linux/swapops.h>
>  #include <linux/jhash.h>
>  #include <linux/numa.h>
> +#include <linux/workqueue.h>
>
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
> @@ -1136,7 +1137,13 @@ static inline void ClearPageHugeTemporary(struct page *page)
>  	page[2].mapping = NULL;
>  }
>
> -void free_huge_page(struct page *page)
> +static struct workqueue_struct *hugetlb_free_page_wq;
> +struct hugetlb_free_page_work {
> +	struct page *page;
> +	struct work_struct work;
> +};
> +
> +static void __free_huge_page(struct page *page)
>  {
>  	/*
>  	 * Can't pass hstate in here because it is called from the
> @@ -1199,6 +1206,36 @@ void free_huge_page(struct page *page)
>  	spin_unlock(&hugetlb_lock);
>  }
>
> +static void free_huge_page_workfn(struct work_struct *work)
> +{
> +	struct page *page;
> +
> +	page = container_of(work, struct hugetlb_free_page_work, work)->page;
> +	__free_huge_page(page);
> +}
> +
> +void free_huge_page(struct page *page)
> +{
> +	if (unlikely(in_interrupt())) {

in_interrupt() also includes contexts where softIRQ is disabled. So maybe !in_task() is a better fit here.

> +		/*
> +		 * While uncommon, free_huge_page() can be at least
> +		 * called from softirq context, defer freeing such
> +		 * that the hugetlb_lock and spool->lock need not have
> +		 * to deal with irq dances just for this.
> +		 */
> +		struct hugetlb_free_page_work work;
> +
> +		work.page = page;
> +		INIT_WORK_ONSTACK(&work.work, free_huge_page_workfn);
> +		queue_work(hugetlb_free_page_wq, &work.work);
> +
> +		/* wait until the huge page freeing is done */
> +		flush_work(&work.work);
> +		destroy_work_on_stack(&work.work);

The problem I see is that you don't want to wait too long while in the hardirq context. However, the latency for the work to finish is indeterminate.

Cheers,
Longman
On 12/12/19 3:52 PM, Waiman Long wrote: > On 12/12/19 2:22 PM, Mike Kravetz wrote: >> On 12/12/19 11:04 AM, Davidlohr Bueso wrote: >>> There have been deadlock reports[1, 2] where put_page is called >>> from softirq context and this causes trouble with the hugetlb_lock, >>> as well as potentially the subpool lock. >>> >>> For such an unlikely scenario, lets not add irq dancing overhead >>> to the lock+unlock operations, which could incur in expensive >>> instruction dependencies, particularly when considering hard-irq >>> safety. For example PUSHF+POPF on x86. >>> >>> Instead, just use a workqueue and do the free_huge_page() in regular >>> task context. >>> >>> [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ >>> [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/ >>> >>> Reported-by: Waiman Long <longman@redhat.com> >>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> >> Thank you Davidlohr. >> >> The patch does seem fairly simple and straight forward. I need to brush up >> on my workqueue knowledge to provide a full review. >> >> Longman, >> Do you have a test to reproduce the issue? If so, can you try running with >> this patch. > Yes, I do have a test that can reproduce the issue. I will run it with > the patch and report the status tomorrow. I don't think Davidlohr's patch is ready for prime time yet. So I will wait until a better version is available. Cheers, Longman
On Thu 12-12-19 15:52:20, Waiman Long wrote: > On 12/12/19 2:22 PM, Mike Kravetz wrote: > > On 12/12/19 11:04 AM, Davidlohr Bueso wrote: > >> There have been deadlock reports[1, 2] where put_page is called > >> from softirq context and this causes trouble with the hugetlb_lock, > >> as well as potentially the subpool lock. > >> > >> For such an unlikely scenario, lets not add irq dancing overhead > >> to the lock+unlock operations, which could incur in expensive > >> instruction dependencies, particularly when considering hard-irq > >> safety. For example PUSHF+POPF on x86. > >> > >> Instead, just use a workqueue and do the free_huge_page() in regular > >> task context. > >> > >> [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ > >> [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/ > >> > >> Reported-by: Waiman Long <longman@redhat.com> > >> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > >> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > > Thank you Davidlohr. > > > > The patch does seem fairly simple and straight forward. I need to brush up > > on my workqueue knowledge to provide a full review. > > > > Longman, > > Do you have a test to reproduce the issue? If so, can you try running with > > this patch. > > Yes, I do have a test that can reproduce the issue. I will run it with > the patch and report the status tomorrow. Can you extract guts of the testcase and integrate them into hugetlb test suite?
On Thu 12-12-19 11:04:27, Davidlohr Bueso wrote: > There have been deadlock reports[1, 2] where put_page is called > from softirq context and this causes trouble with the hugetlb_lock, > as well as potentially the subpool lock. > > For such an unlikely scenario, lets not add irq dancing overhead > to the lock+unlock operations, which could incur in expensive > instruction dependencies, particularly when considering hard-irq > safety. For example PUSHF+POPF on x86. > > Instead, just use a workqueue and do the free_huge_page() in regular > task context. I am afraid that work_struct is too large to be stuffed into the struct page array (because of the lockdep part). I think that it would be just safer to make hugetlb_lock irq safe. Are there any other locks that would require the same?
On 12/16/19 8:26 AM, Michal Hocko wrote: > On Thu 12-12-19 15:52:20, Waiman Long wrote: >> On 12/12/19 2:22 PM, Mike Kravetz wrote: >>> On 12/12/19 11:04 AM, Davidlohr Bueso wrote: >>>> There have been deadlock reports[1, 2] where put_page is called >>>> from softirq context and this causes trouble with the hugetlb_lock, >>>> as well as potentially the subpool lock. >>>> >>>> For such an unlikely scenario, lets not add irq dancing overhead >>>> to the lock+unlock operations, which could incur in expensive >>>> instruction dependencies, particularly when considering hard-irq >>>> safety. For example PUSHF+POPF on x86. >>>> >>>> Instead, just use a workqueue and do the free_huge_page() in regular >>>> task context. >>>> >>>> [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ >>>> [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/ >>>> >>>> Reported-by: Waiman Long <longman@redhat.com> >>>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> >>> Thank you Davidlohr. >>> >>> The patch does seem fairly simple and straight forward. I need to brush up >>> on my workqueue knowledge to provide a full review. >>> >>> Longman, >>> Do you have a test to reproduce the issue? If so, can you try running with >>> this patch. >> Yes, I do have a test that can reproduce the issue. I will run it with >> the patch and report the status tomorrow. > Can you extract guts of the testcase and integrate them into hugetlb > test suite? The test case that I used is the Red Hat internal "Fork vs. fast GUP race test" written by Jarod Wilson. I would have to ask him if he is OK with that. Cheers, Longman
On 12/16/19 8:37 AM, Michal Hocko wrote: > On Thu 12-12-19 11:04:27, Davidlohr Bueso wrote: >> There have been deadlock reports[1, 2] where put_page is called >> from softirq context and this causes trouble with the hugetlb_lock, >> as well as potentially the subpool lock. >> >> For such an unlikely scenario, lets not add irq dancing overhead >> to the lock+unlock operations, which could incur in expensive >> instruction dependencies, particularly when considering hard-irq >> safety. For example PUSHF+POPF on x86. >> >> Instead, just use a workqueue and do the free_huge_page() in regular >> task context. > I am afraid that work_struct is too large to be stuffed into the struct > page array (because of the lockdep part). > > I think that it would be just safer to make hugetlb_lock irq safe. Are > there any other locks that would require the same? Currently, free_huge_page() can be called from the softIRQ context. The hugetlb_lock will be acquired during that call. The subpool lock may conditionally be acquired as well. I am still torn between converting both locks to be irq-safe and deferring the freeing to a workqueue. Cheers, Longman
On Mon, 16 Dec 2019, Michal Hocko wrote: >I am afraid that work_struct is too large to be stuffed into the struct >page array (because of the lockdep part). Yeah, this needs to be done without touching struct page. Which is why I had done the stack allocated way in this patch, but we cannot wait for it to complete in irq, so that's out the window. Andi had suggested percpu allocated work items, but having played with the idea over the weekend, I don't see how we can prevent another page being freed on the same cpu before previous work on the same cpu is complete (cpu0 wants to free pageA, schedules the work, in the mean time cpu0 wants to free pageB and workerfn for pageA still hasn't been called). >I think that it would be just safer to make hugetlb_lock irq safe. Are >there any other locks that would require the same? It would be simpler. Any performance issues that arise would probably be only seen in microbenchmarks, assuming we want to have full irq safety. If we don't need to worry about hardirq, then even better. The subpool lock would also need to be irq safe. Thanks, Davidlohr
On Mon, Dec 16, 2019 at 08:17:48AM -0800, Davidlohr Bueso wrote: > On Mon, 16 Dec 2019, Michal Hocko wrote: > > I am afraid that work_struct is too large to be stuffed into the struct > > page array (because of the lockdep part). > > Yeah, this needs to be done without touching struct page. > > Which is why I had done the stack allocated way in this patch, but we > cannot wait for it to complete in irq, so that's out the window. Andi > had suggested percpu allocated work items, but having played with the > idea over the weekend, I don't see how we can prevent another page being > freed on the same cpu before previous work on the same cpu is complete > (cpu0 wants to free pageA, schedules the work, in the mean time cpu0 > wants to free pageB and workerfn for pageA still hasn't been called). Why is it that we can call functions after-an-RCU-period-has-elapsed time, at returning-to-userspace time and after-exiting-hardirq-handler time easily, but the mechanism for calling a function after-we've-finished-handling-softirqs is so bloody hard to use? That's surely the major problem we need to fix.
On 12/16/19 10:38 AM, Waiman Long wrote: > On 12/16/19 8:26 AM, Michal Hocko wrote: >> On Thu 12-12-19 15:52:20, Waiman Long wrote: >>> On 12/12/19 2:22 PM, Mike Kravetz wrote: >>>> On 12/12/19 11:04 AM, Davidlohr Bueso wrote: >>>>> There have been deadlock reports[1, 2] where put_page is called >>>>> from softirq context and this causes trouble with the hugetlb_lock, >>>>> as well as potentially the subpool lock. >>>>> >>>>> For such an unlikely scenario, lets not add irq dancing overhead >>>>> to the lock+unlock operations, which could incur in expensive >>>>> instruction dependencies, particularly when considering hard-irq >>>>> safety. For example PUSHF+POPF on x86. >>>>> >>>>> Instead, just use a workqueue and do the free_huge_page() in regular >>>>> task context. >>>>> >>>>> [1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/ >>>>> [2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/ >>>>> >>>>> Reported-by: Waiman Long <longman@redhat.com> >>>>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>>> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> >>>> Thank you Davidlohr. >>>> >>>> The patch does seem fairly simple and straight forward. I need to brush up >>>> on my workqueue knowledge to provide a full review. >>>> >>>> Longman, >>>> Do you have a test to reproduce the issue? If so, can you try running with >>>> this patch. >>> Yes, I do have a test that can reproduce the issue. I will run it with >>> the patch and report the status tomorrow. >> Can you extract guts of the testcase and integrate them into hugetlb >> test suite? BTW, what hugetlb test suite are you talking about? Cheers, Longman
On 12/16/19 8:17 AM, Davidlohr Bueso wrote: > On Mon, 16 Dec 2019, Michal Hocko wrote: >> I am afraid that work_struct is too large to be stuffed into the struct >> page array (because of the lockdep part). > > Yeah, this needs to be done without touching struct page. > > Which is why I had done the stack allocated way in this patch, but we > cannot wait for it to complete in irq, so that's out the window. Andi > had suggested percpu allocated work items, but having played with the > idea over the weekend, I don't see how we can prevent another page being > freed on the same cpu before previous work on the same cpu is complete > (cpu0 wants to free pageA, schedules the work, in the mean time cpu0 > wants to free pageB and workerfn for pageA still hasn't been called). > >> I think that it would be just safer to make hugetlb_lock irq safe. Are >> there any other locks that would require the same? > > It would be simpler. Any performance issues that arise would probably > be only seen in microbenchmarks, assuming we want to have full irq safety. > If we don't need to worry about hardirq, then even better. > > The subpool lock would also need to be irq safe. I do think we need to worry about hardirq. There are no restrictions preventing put_page from being called from hardirq context. I am concerned about the latency of making hugetlb_lock (and potentially subpool lock) hardirq safe. When these locks were introduced (before my time) the concept of making them irq safe was not considered. Recently, I learned that the hugetlb_lock is held for a linear scan of ALL hugetlb pages during a cgroup reparenting operation. That is just too long. If there is no viable work queue solution, then I think we would like to restructure the hugetlb locking before a change to just make hugetlb_lock irq safe. The idea would be to split the scope of what is done under hugetlb_lock. Most of it would never be executed in irq context. 
Then have a small/limited set of functionality that really needs to be irq safe protected by an irq safe lock.
On 12/16/19 2:08 PM, Mike Kravetz wrote: > On 12/16/19 8:17 AM, Davidlohr Bueso wrote: >> On Mon, 16 Dec 2019, Michal Hocko wrote: >>> I am afraid that work_struct is too large to be stuffed into the struct >>> page array (because of the lockdep part). >> Yeah, this needs to be done without touching struct page. >> >> Which is why I had done the stack allocated way in this patch, but we >> cannot wait for it to complete in irq, so that's out the window. Andi >> had suggested percpu allocated work items, but having played with the >> idea over the weekend, I don't see how we can prevent another page being >> freed on the same cpu before previous work on the same cpu is complete >> (cpu0 wants to free pageA, schedules the work, in the mean time cpu0 >> wants to free pageB and workerfn for pageA still hasn't been called). >> >>> I think that it would be just safer to make hugetlb_lock irq safe. Are >>> there any other locks that would require the same? >> It would be simpler. Any performance issues that arise would probably >> be only seen in microbenchmarks, assuming we want to have full irq safety. >> If we don't need to worry about hardirq, then even better. >> >> The subpool lock would also need to be irq safe. > I do think we need to worry about hardirq. There are no restruictions that > put_page can not be called from hardirq context. > > I am concerned about the latency of making hugetlb_lock (and potentially > subpool lock) hardirq safe. When these locks were introduced (before my > time) the concept of making them irq safe was not considered. Recently, > I learned that the hugetlb_lock is held for a linear scan of ALL hugetlb > pages during a cgroup reparentling operation. That is just too long. > > If there is no viable work queue solution, then I think we would like to > restructure the hugetlb locking before a change to just make hugetlb_lock > irq safe. The idea would be to split the scope of what is done under > hugetlb_lock. 
Most of it would never be executed in irq context. Then > have a small/limited set of functionality that really needs to be irq > safe protected by an irq safe lock. > Please take a look at my recently posted patch to see if that is an acceptable workqueue based solution. Thanks, Longman
On Mon 16-12-19 13:44:53, Waiman Long wrote: > On 12/16/19 10:38 AM, Waiman Long wrote: [...] > >> Can you extract guts of the testcase and integrate them into hugetlb > >> test suite? > > BTW, what hugetlb test suite are you talking about? I was using tests from the libhugetlbfs package in the past. There are a few tests in the LTP project but the libhugetlbfs coverage used to cover the largest part of the functionality. Is there any newer home for the package than [1], Mike? Btw. would it make sense to migrate those tests to a more common place, LTP or kernel selftests? [1] https://github.com/libhugetlbfs/libhugetlbfs/tree/master/tests
Cc: Eric On 12/17/19 1:00 AM, Michal Hocko wrote: > On Mon 16-12-19 13:44:53, Waiman Long wrote: >> On 12/16/19 10:38 AM, Waiman Long wrote: > [...] >>>> Can you extract guts of the testcase and integrate them into hugetlb >>>> test suite? >> >> BTW, what hugetlb test suite are you talking about? > > I was using tests from libhugetlbfs package in the past. There are few > tests in LTP project but the libhugetlbfs coverage used to cover the > largest part of the functionality. > > Is there any newer home for the package than [1], Mike? Btw. would it > mak sense to migrate those tests to a more common place, LTP or kernel > selftests? That is the latest home/release for libhugetlbfs. The libhugetlbfs test suite is somewhat strange in that I suspect it started as testing for libhugetlbfs itself. When it was written, the thought may have been that people would use libhugetlbfs as the primary interface to hugetlb pages. That is not the case today. Over time, hugetlbfs tests not associated with libhugetlbfs were added. If we want to migrate libhugetlbfs tests, then I think we would only want to migrate the non-libhugetlbfs test cases. Although, the libhugetlbfs specific tests are useful as they 'could' point out regressions. Added Eric (libhugetlbfs maintainer) on Cc for another opinion. > > [1] https://github.com/libhugetlbfs/libhugetlbfs/tree/master/tests >
On Tue 17-12-19 10:05:06, Mike Kravetz wrote: > Cc: Eric > > On 12/17/19 1:00 AM, Michal Hocko wrote: > > On Mon 16-12-19 13:44:53, Waiman Long wrote: > >> On 12/16/19 10:38 AM, Waiman Long wrote: > > [...] > >>>> Can you extract guts of the testcase and integrate them into hugetlb > >>>> test suite? > >> > >> BTW, what hugetlb test suite are you talking about? > > > > I was using tests from libhugetlbfs package in the past. There are few > > tests in LTP project but the libhugetlbfs coverage used to cover the > > largest part of the functionality. > > > > Is there any newer home for the package than [1], Mike? Btw. would it > > mak sense to migrate those tests to a more common place, LTP or kernel > > selftests? > > That is the latest home/release for libhugetlbfs. > > The libhugetlbfs test suite is somewhat strange in that I suspect it started > as testing for libhugetlbfs itself. When it was written, the thought may have > been that people would use libhugetlfs as the primary interface to hugetlb > pages. That is not the case today. Over time, hugetlbfs tests not associated > with libhugetlbfs were added. > > If we want to migrate libhugetlbfs tests, then I think we would only want to > migrate the non-libhugetlbfs test cases. Although, the libhugetlbfs specific > tests are useful as they 'could' point out regressions. Yeah, I can second that. I remember using the suite and it pointed to real issues when I was touching the area in the past. So if we can get as many tests to be independent on the library and integrate it to some existing testing project - be it kernel selftest or LTP - then it would be really great and I assume the testing coverage of the hugetlb functionality would increase dramatically.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ac65bb5e38ac..f28cf601938d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -27,6 +27,7 @@
 #include <linux/swapops.h>
 #include <linux/jhash.h>
 #include <linux/numa.h>
+#include <linux/workqueue.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -1136,7 +1137,13 @@ static inline void ClearPageHugeTemporary(struct page *page)
 	page[2].mapping = NULL;
 }
 
-void free_huge_page(struct page *page)
+static struct workqueue_struct *hugetlb_free_page_wq;
+struct hugetlb_free_page_work {
+	struct page *page;
+	struct work_struct work;
+};
+
+static void __free_huge_page(struct page *page)
 {
 	/*
 	 * Can't pass hstate in here because it is called from the
@@ -1199,6 +1206,36 @@ void free_huge_page(struct page *page)
 	spin_unlock(&hugetlb_lock);
 }
 
+static void free_huge_page_workfn(struct work_struct *work)
+{
+	struct page *page;
+
+	page = container_of(work, struct hugetlb_free_page_work, work)->page;
+	__free_huge_page(page);
+}
+
+void free_huge_page(struct page *page)
+{
+	if (unlikely(in_interrupt())) {
+		/*
+		 * While uncommon, free_huge_page() can be at least
+		 * called from softirq context, defer freeing such
+		 * that the hugetlb_lock and spool->lock need not have
+		 * to deal with irq dances just for this.
+		 */
+		struct hugetlb_free_page_work work;
+
+		work.page = page;
+		INIT_WORK_ONSTACK(&work.work, free_huge_page_workfn);
+		queue_work(hugetlb_free_page_wq, &work.work);
+
+		/* wait until the huge page freeing is done */
+		flush_work(&work.work);
+		destroy_work_on_stack(&work.work);
+	} else
+		__free_huge_page(page);
+}
+
 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
@@ -2816,6 +2853,12 @@ static int __init hugetlb_init(void)
 	for (i = 0; i < num_fault_mutexes; i++)
 		mutex_init(&hugetlb_fault_mutex_table[i]);
 
+	hugetlb_free_page_wq = alloc_workqueue("hugetlb_free_page_wq",
+					       WQ_MEM_RECLAIM, 0);
+	if (!hugetlb_free_page_wq)
+		return -ENOMEM;
+
 	return 0;
 }
 subsys_initcall(hugetlb_init);
There have been deadlock reports[1, 2] where put_page is called
from softirq context and this causes trouble with the hugetlb_lock,
as well as potentially the subpool lock.

For such an unlikely scenario, lets not add irq dancing overhead
to the lock+unlock operations, which could incur in expensive
instruction dependencies, particularly when considering hard-irq
safety. For example PUSHF+POPF on x86.

Instead, just use a workqueue and do the free_huge_page() in regular
task context.

[1] https://lore.kernel.org/lkml/20191211194615.18502-1-longman@redhat.com/
[2] https://lore.kernel.org/lkml/20180905112341.21355-1-aneesh.kumar@linux.ibm.com/

Reported-by: Waiman Long <longman@redhat.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---

- Changes from v1: Only use wq when in_interrupt(), otherwise business
  as usual. Also include the proper header file.

- While I have not reproduced this issue, the v1 using wq passes all
  hugetlb related tests in ltp.

 mm/hugetlb.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

-- 
2.16.4