mbox series

[0/2] Track reserve map changes to restore on error

Message ID 20210525233134.246444-1-mike.kravetz@oracle.com (mailing list archive)
Headers show
Series Track reserve map changes to restore on error | expand

Message

Mike Kravetz May 25, 2021, 11:31 p.m. UTC
Here is a modification to the reservation tracking for fixup on errors.
It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
case as well.

Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
will eliminate the need for the call to hugetlb_unreserve_pages.

Mike Kravetz (2):
  hugetlb: rename HPageRestoreReserve flag to HPageRestoreRsvCnt
  hugetlb: add new hugetlb specific flag HPG_restore_rsv_map

 fs/hugetlbfs/inode.c    |   3 +-
 include/linux/hugetlb.h |  17 +++++--
 mm/hugetlb.c            | 108 ++++++++++++++++++++++++++++------------
 mm/userfaultfd.c        |  14 +++---
 4 files changed, 99 insertions(+), 43 deletions(-)

Comments

Muchun Song May 26, 2021, 3:19 a.m. UTC | #1
On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Here is a modification to the reservation tracking for fixup on errors.
> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
> case as well.
>
> Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
> will eliminate the need for the call to hugetlb_unreserve_pages.

Hi Mike,

It seems like someone is fixing a bug, right? Maybe a link should be
placed in the cover letter so that someone can know what issue
we are facing.

Thanks.

>
> Mike Kravetz (2):
>   hugetlb: rename HPageRestoreReserve flag to HPageRestoreRsvCnt
>   hugetlb: add new hugetlb specific flag HPG_restore_rsv_map
>
>  fs/hugetlbfs/inode.c    |   3 +-
>  include/linux/hugetlb.h |  17 +++++--
>  mm/hugetlb.c            | 108 ++++++++++++++++++++++++++++------------
>  mm/userfaultfd.c        |  14 +++---
>  4 files changed, 99 insertions(+), 43 deletions(-)
>
> --
> 2.31.1
>
Mike Kravetz May 26, 2021, 5:17 p.m. UTC | #2
On 5/25/21 8:19 PM, Muchun Song wrote:
> On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> Here is a modification to the reservation tracking for fixup on errors.
>> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
>> case as well.
>>
>> Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
>> will eliminate the need for the call to hugetlb_unreserve_pages.
> 
> Hi Mike,
> 
> It seems like someone is fixing a bug, right? Maybe a link should be
> placed in the cover letter so that someone can know what issue
> we are facing.
> 

Thanks Muchun,

I wanted to first see if these patches would work in the code Mina is
modifying.  If this works for Mina, then a more formal patch and request
for inclusion will be sent.

I believe this issue has existed since the introduction of hugetlb
reservations in v2.6.18.  Since the bug only shows up when we take error
paths, the issue may not have been observed.  Mina found a similar issue
in an error path which could also expose this issue.
Mina Almasry May 26, 2021, 11:19 p.m. UTC | #3
On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 5/25/21 8:19 PM, Muchun Song wrote:
> > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >>
> >> Here is a modification to the reservation tracking for fixup on errors.
> >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
> >> case as well.
> >>
> >> Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
> >> will eliminate the need for the call to hugetlb_unreserve_pages.
> >
> > Hi Mike,
> >
> > It seems like someone is fixing a bug, right? Maybe a link should be
> > placed in the cover letter so that someone can know what issue
> > we are facing.
> >
>
> Thanks Muchun,
>
> I wanted to first see if these patches would work in the code Mina is
> modifying.  If this works for Mina, then a more formal patch and request
> for inclusion will be sent.
>

So a quick test: I apply my patche and yours on top of linus/master,
and I remove the hugetlb_unreserve_pages() call that triggered this
conversation, and run the userfaultfd test, resv_huge_pages underflows
again, so it seems on the surface this doesn't quite work as is.

Not quite sure what to do off the top of my head. I think I will try
to debug why the 3 patches don't work together and I will fix either
your patch or mine. I haven't taken a deep look yet; I just ran a
quick test.

> I believe this issue has existed since the introduction of hugetlb
> reservations in v2.6.18.  Since the bug only shows up when we take error
> paths, the issue may not have been observed.  Mina found a similar issue
> in an error path which could also expose this issue.
> --
> Mike Kravetz
Mina Almasry May 27, 2021, 2:48 a.m. UTC | #4
On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 5/25/21 8:19 PM, Muchun Song wrote:
> > > On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >>
> > >> Here is a modification to the reservation tracking for fixup on errors.
> > >> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
> > >> case as well.
> > >>
> > >> Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
> > >> will eliminate the need for the call to hugetlb_unreserve_pages.
> > >
> > > Hi Mike,
> > >
> > > It seems like someone is fixing a bug, right? Maybe a link should be
> > > placed in the cover letter so that someone can know what issue
> > > we are facing.
> > >
> >
> > Thanks Muchun,
> >
> > I wanted to first see if these patches would work in the code Mina is
> > modifying.  If this works for Mina, then a more formal patch and request
> > for inclusion will be sent.
> >
>
> So a quick test: I apply my patche and yours on top of linus/master,
> and I remove the hugetlb_unreserve_pages() call that triggered this
> conversation, and run the userfaultfd test, resv_huge_pages underflows
> again, so it seems on the surface this doesn't quite work as is.
>
> Not quite sure what to do off the top of my head. I think I will try
> to debug why the 3 patches don't work together and I will fix either
> your patch or mine. I haven't taken a deep look yet; I just ran a
> quick test.
>

Ok found the issue. With the setup I described above, the
hugetlb_shared test case passes:

./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2
/tmp/kokonut_test/huge/userfaultfd_test && echo test success

The non-shared test case is the one that underflows:

./tools/testing/selftests/vm/userfaultfd hugetlb 10 2
/tmp/kokonut_test/huge/userfaultfd_test && echo test success

I've debugged a bit, and this messy hunk 'fixes' the underflow with
the non-shared case. (Sorry for the messiness).

@@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate
*h, struct vm_area_struct *vma,
                                 */
                                SetHPageRestoreRsvCnt(page);
                } else {
-                       rc = vma_needs_reservation(h, vma, address);
-                       if (rc < 0)
-                               /*
-                                * See above comment about rare out of
-                                * memory condition.
-                                */
-                               SetHPageRestoreRsvCnt(page);
-                       else if (rc)
-                               vma_add_reservation(h, vma, address);
-                       else
-                               vma_end_reservation(h, vma, address);
+                       resv = inode_resv_map(vma->vm_file->f_mapping->host);
+                       if (resv) {
+                               int chg = region_del(resv, idx, idx+1);
+                               VM_BUG_ON(chg);
+                       }

The reason being is that on page allocation we region_add() an entry
into the resv_map regardless of whether this is a shared mapping or
not (vma_needs_reservation() + vma_commit_reservation(), which amounts
to region_add() at the end of the day).

To unroll back this change on error, we need to region_del() the region_add().

The code removed above doesn't end up calling region_del(), because
vma_needs_reservation() returns 0, because region_chg() sees there is
an entry in the resv_map, and returns 0.

The VM_BUG_ON() is just because I'm not sure how to handle that error.

> > I believe this issue has existed since the introduction of hugetlb
> > reservations in v2.6.18.  Since the bug only shows up when we take error
> > paths, the issue may not have been observed.  Mina found a similar issue
> > in an error path which could also expose this issue.
> > --
> > Mike Kravetz
Mike Kravetz May 27, 2021, 4:08 p.m. UTC | #5
On 5/26/21 7:48 PM, Mina Almasry wrote:
> On Wed, May 26, 2021 at 4:19 PM Mina Almasry <almasrymina@google.com> wrote:
>>
>> On Wed, May 26, 2021 at 10:17 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>
>>> On 5/25/21 8:19 PM, Muchun Song wrote:
>>>> On Wed, May 26, 2021 at 7:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>>
>>>>> Here is a modification to the reservation tracking for fixup on errors.
>>>>> It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
>>>>> case as well.
>>>>>
>>>>> Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
>>>>> will eliminate the need for the call to hugetlb_unreserve_pages.
>>>>
>>>> Hi Mike,
>>>>
>>>> It seems like someone is fixing a bug, right? Maybe a link should be
>>>> placed in the cover letter so that someone can know what issue
>>>> we are facing.
>>>>
>>>
>>> Thanks Muchun,
>>>
>>> I wanted to first see if these patches would work in the code Mina is
>>> modifying.  If this works for Mina, then a more formal patch and request
>>> for inclusion will be sent.
>>>
>>
>> So a quick test: I apply my patche and yours on top of linus/master,
>> and I remove the hugetlb_unreserve_pages() call that triggered this
>> conversation, and run the userfaultfd test, resv_huge_pages underflows
>> again, so it seems on the surface this doesn't quite work as is.
>>
>> Not quite sure what to do off the top of my head. I think I will try
>> to debug why the 3 patches don't work together and I will fix either
>> your patch or mine. I haven't taken a deep look yet; I just ran a
>> quick test.
>>
> 
> Ok found the issue. With the setup I described above, the
> hugetlb_shared test case passes:
> 
> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2
> /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> 
> The non-shared test case is the one that underflows:
> 
> ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2
> /tmp/kokonut_test/huge/userfaultfd_test && echo test success
> 
> I've debugged a bit, and this messy hunk 'fixes' the underflow with
> the non-shared case. (Sorry for the messiness).
> 
> @@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate
> *h, struct vm_area_struct *vma,
>                                  */
>                                 SetHPageRestoreRsvCnt(page);
>                 } else {
> -                       rc = vma_needs_reservation(h, vma, address);
> -                       if (rc < 0)
> -                               /*
> -                                * See above comment about rare out of
> -                                * memory condition.
> -                                */
> -                               SetHPageRestoreRsvCnt(page);
> -                       else if (rc)
> -                               vma_add_reservation(h, vma, address);
> -                       else
> -                               vma_end_reservation(h, vma, address);
> +                       resv = inode_resv_map(vma->vm_file->f_mapping->host);
> +                       if (resv) {
> +                               int chg = region_del(resv, idx, idx+1);
> +                               VM_BUG_ON(chg);
> +                       }
> 
> The reason being is that on page allocation we region_add() an entry
> into the resv_map regardless of whether this is a shared mapping or
> not (vma_needs_reservation() + vma_commit_reservation(), which amounts
> to region_add() at the end of the day).
> 
> To unroll back this change on error, we need to region_del() the region_add().
> 
> The code removed above doesn't end up calling region_del(), because
> vma_needs_reservation() returns 0, because region_chg() sees there is
> an entry in the resv_map, and returns 0.
> 
> The VM_BUG_ON() is just because I'm not sure how to handle that error.
> 

Thanks Mina!

Yes, that new block of code in restore_reserve_on_error is incorrect for
the private mapping case.  Since alloc_huge_page does the region_add for
both shared and private mappings, it seems we should just do the
region_del for both.  I'll update this patch to fix this and take your
other comments into account.