diff mbox series

mm: madvise(MADV_DODUMP) allow hugetlbfs pages

Message ID 20180930054629.29150-1-daniel@linux.ibm.com (mailing list archive)
State Mainlined, archived
Headers show
Series mm: madvise(MADV_DODUMP) allow hugetlbfs pages | expand

Commit Message

Daniel Black Sept. 30, 2018, 5:46 a.m. UTC
Reproducer, assuming 2M of hugetlbfs available:

Hugetlbfs mounted, size=2M and option user=testuser

  # mount | grep ^hugetlbfs
  hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
  # sysctl vm.nr_hugepages=1
  vm.nr_hugepages = 1
  # grep Huge /proc/meminfo
  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  HugePages_Total:       1
  HugePages_Free:        1
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB
  Hugetlb:            2048 kB

Code:

  #include <sys/mman.h>
  #include <stddef.h>
  #define SIZE 2*1024*1024
  int main()
  {
    void *ptr;
    ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);
    madvise(ptr, SIZE, MADV_DONTDUMP);
    madvise(ptr, SIZE, MADV_DODUMP);
  }

Compile and strace:

  mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
  madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
  madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)

hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].

The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination
was a consequence of the large useage of VM_DONTEXPAND in device
drivers.

A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.

A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the
same memory. We correct this omission by allowing madvice(MADV_DODUMP)
on hugetlbfs pages.

[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")

Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <kpenza@gmail.com>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Buglink: https://lists.launchpad.net/maria-discuss/msg05245.html
Cc: linux-mm@kvack.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mike Kravetz Oct. 1, 2018, 10:11 p.m. UTC | #1
On 9/29/18 10:46 PM, Daniel Black wrote:
<snip>
> hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
> author testing with analysis from Florian Weimer[1].
> 
> The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination
> was a consequence of the large useage of VM_DONTEXPAND in device
> drivers.
> 
> A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
> marked DODUMP.
> 
> A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
> memory for a while and later request that madvise(MADV_DODUMP) on the
> same memory. We correct this omission by allowing madvice(MADV_DODUMP)
> on hugetlbfs pages.
> 
> [1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
> [2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
> 
> Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
> Reported-by: Kenneth Penza <kpenza@gmail.com>
> Signed-off-by: Daniel Black <daniel@linux.ibm.com>
> Buglink: https://lists.launchpad.net/maria-discuss/msg05245.html
> Cc: linux-mm@kvack.org
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 972a9eaa898b..71d21df2a3f3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -96,7 +96,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		new_flags |= VM_DONTDUMP;
>  		break;
>  	case MADV_DODUMP:
> -		if (new_flags & VM_SPECIAL) {
> +		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {

Thanks Daniel,

This is certainly a regression.  My only question is whether this condition
should be more specific and test the default hugetlb vma flags
(VM_DONTEXPAND | VM_HUGETLB).  Or, whether simply checking VM_HUGETLB as you
have done above is sufficient.  Only reason for concern is that I am not
100% certain other VM_SPECIAL flags could not be set in VM_HUGETLB vma.

Perhaps Konstantin has an opinion as he did a bunch of the vm_flag reorg.
Daniel Black Oct. 3, 2018, 5:47 a.m. UTC | #2
On Mon, 1 Oct 2018 15:11:32 -0700
Mike Kravetz <mike.kravetz@oracle.com> wrote:

> On 9/29/18 10:46 PM, Daniel Black wrote:
> <snip>
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 972a9eaa898b..71d21df2a3f3 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -96,7 +96,7 @@ static long madvise_behavior(struct
> > vm_area_struct *vma, new_flags |= VM_DONTDUMP;
> >  		break;
> >  	case MADV_DODUMP:
> > -		if (new_flags & VM_SPECIAL) {
> > +		if (!is_vm_hugetlb_page(vma) && new_flags &
> > VM_SPECIAL) {
> 
> Thanks Daniel,
> 
> This is certainly a regression.  My only question is whether this
> condition should be more specific and test the default hugetlb vma
> flags (VM_DONTEXPAND | VM_HUGETLB).

>  Or, whether simply checking
> VM_HUGETLB as you have done above is sufficient.

The is_vm_hugetlb_page() function seems widely used elsewhere for that
single purpose.

> Only reason for
> concern is that I am not 100% certain other VM_SPECIAL flags could
> not be set in VM_HUGETLB vma.

They might be, but being a VM_HUGETLB flag is the main criteria for
being able to madvise(DODUMP) on the memory. It highlight its user
memory for the user to do as they wish.

When 314e51b9851b was added, it seemed the direction was to kill of the
VM_RESERVED, now a few years later VM_SPECIAL seems to be replacing
this. I think it would be better to preserve the original goal and
keep flags having a single meaning.

The purpose in 0103bd16fb90 as I surmise it, is that VM_IO | VM_PFNMAP
| VM_MIXEDMAP are the true things that want to be prevented from having
madvise(DO_DUMP) on them, based on frequent use of DONT_DUMP with those
memory pages. Was VM_DONTEXPAND an intentional inclusion there it did it
just get included with VM_SPECIAL?

Either way, I've tried to keep to the principles of the
is_vm_hugetlb_page function being the authoritative source of a HUGETLB
page.

> Perhaps Konstantin has an opinion as he did a bunch of the vm_flag
> reorg.
> 

Thanks for the review.
Mike Kravetz Oct. 4, 2018, 9:47 p.m. UTC | #3
On 10/2/18 10:47 PM, Daniel Black wrote:
> On Mon, 1 Oct 2018 15:11:32 -0700
> Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> On 9/29/18 10:46 PM, Daniel Black wrote:
>> <snip>
>> > diff --git a/mm/madvise.c b/mm/madvise.c
>> > index 972a9eaa898b..71d21df2a3f3 100644
>> > --- a/mm/madvise.c
>> > +++ b/mm/madvise.c
>> > @@ -96,7 +96,7 @@ static long madvise_behavior(struct
>> > vm_area_struct *vma, new_flags |= VM_DONTDUMP;
>> >          break;
>> >      case MADV_DODUMP:
>> > -        if (new_flags & VM_SPECIAL) {
>> > +        if (!is_vm_hugetlb_page(vma) && new_flags &
>> > VM_SPECIAL) {
>>
>> Thanks Daniel,
>>
>> This is certainly a regression.  My only question is whether this
>> condition should be more specific and test the default hugetlb vma
>> flags (VM_DONTEXPAND | VM_HUGETLB).
> 
>>  Or, whether simply checking
>> VM_HUGETLB as you have done above is sufficient.
> 
> The is_vm_hugetlb_page() function seems widely used elsewhere for that
> single purpose.
> 
>> Only reason for
>> concern is that I am not 100% certain other VM_SPECIAL flags could
>> not be set in VM_HUGETLB vma.
> 
> They might be, but being a VM_HUGETLB flag is the main criteria for
> being able to madvise(DODUMP) on the memory. It highlight its user
> memory for the user to do as they wish.
> 
> When 314e51b9851b was added, it seemed the direction was to kill of the
> VM_RESERVED, now a few years later VM_SPECIAL seems to be replacing
> this. I think it would be better to preserve the original goal and
> keep flags having a single meaning.
> 
> The purpose in 0103bd16fb90 as I surmise it, is that VM_IO | VM_PFNMAP
> | VM_MIXEDMAP are the true things that want to be prevented from having
> madvise(DO_DUMP) on them, based on frequent use of DONT_DUMP with those
> memory pages. Was VM_DONTEXPAND an intentional inclusion there it did it
> just get included with VM_SPECIAL?
> 
> Either way, I've tried to keep to the principles of the
> is_vm_hugetlb_page function being the authoritative source of a HUGETLB
> page.
> 
>> Perhaps Konstantin has an opinion as he did a bunch of the vm_flag
>> reorg.
>>
> 
> Thanks for the review.
> 

Thanks for the explanation.  I was thinking of the case where hugetlb pages
could be used for RDMA, and wanted to make sure those drivers were not doing
anything to vm_flags.  A quick look reveals nothing special happening.

You can add,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Adding Andrew on Cc:
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 972a9eaa898b..71d21df2a3f3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -96,7 +96,7 @@  static long madvise_behavior(struct vm_area_struct *vma,
 		new_flags |= VM_DONTDUMP;
 		break;
 	case MADV_DODUMP:
-		if (new_flags & VM_SPECIAL) {
+		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) {
 			error = -EINVAL;
 			goto out;
 		}