diff mbox series

[V1,2/5] mm/hugetlb: fix memfd_pin_folios free_huge_pages leak

Message ID 1725373521-451395-3-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New
Headers show
Series memfd-pin huge page fixes | expand

Commit Message

Steven Sistare Sept. 3, 2024, 2:25 p.m. UTC
memfd_pin_folios followed by unpin_folios fails to restore
free_huge_pages if the pages were not already faulted in, because the
folio refcount for pages created by memfd_alloc_folio never goes to 0.
memfd_pin_folios needs another folio_put to undo the folio_try_get
below:

memfd_alloc_folio()
  alloc_hugetlb_folio_nodemask()
    dequeue_hugetlb_folio_nodemask()
      dequeue_hugetlb_folio_node_exact()
        folio_ref_unfreeze(folio, 1);    ; adds 1 refcount
  folio_try_get()                        ; adds 1 refcount
  hugetlb_add_to_page_cache()            ; adds 512 refcount (on x86)

With the fix, after memfd_pin_folios + unpin_folios, the refcount for
the (unfaulted) page is 512, which is correct, as the refcount for a
faulted unpinned page is 513.

Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 mm/gup.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Kasireddy, Vivek Sept. 4, 2024, 12:45 a.m. UTC | #1
Hi Steve,

> Subject: [PATCH V1 2/5] mm/hugetlb: fix memfd_pin_folios free_huge_pages
> leak
> 
> memfd_pin_folios followed by unpin_folios fails to restore
> free_huge_pages if the pages were not already faulted in, because the
> folio refcount for pages created by memfd_alloc_folio never goes to 0.
> memfd_pin_folios needs another folio_put to undo the folio_try_get
> below:
> 
> memfd_alloc_folio()
>   alloc_hugetlb_folio_nodemask()
>     dequeue_hugetlb_folio_nodemask()
>       dequeue_hugetlb_folio_node_exact()
>         folio_ref_unfreeze(folio, 1);    ; adds 1 refcount
>   folio_try_get()                        ; adds 1 refcount
I wonder if it is more optimal to skip the folio_try_get() above.
I think it (folio_try_get) was probably added because I didn't realize that
alloc_hugetlb_folio_nodemask() already adds a reference.

>   hugetlb_add_to_page_cache()            ; adds 512 refcount (on x86)
> 
> With the fix, after memfd_pin_folios + unpin_folios, the refcount for
> the (unfaulted) page is 512, which is correct, as the refcount for a
> faulted unpinned page is 513.
> 
> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning
> memfd folios")
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  mm/gup.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 54d0dc3..5b92f1d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t
> start, loff_t end,
>  	pgoff_t start_idx, end_idx, next_idx;
>  	struct folio *folio = NULL;
>  	struct folio_batch fbatch;
> -	struct hstate *h;
> +	struct hstate *h = NULL;
>  	long ret = -EINVAL;
> 
>  	if (start < 0 || start > end || !max_folios)
> @@ -3662,6 +3662,8 @@ long memfd_pin_folios(struct file *memfd, loff_t
> start, loff_t end,
>  							     &fbatch);
>  			if (folio) {
>  				folio_put(folio);
> +				if (h)
> +					folio_put(folio);
If we stick with this change, I guess there needs to be an additional
folio_put() in memfd_alloc_folio() as well if adding the folio to the
page cache fails.

Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

Thanks,
Vivek
>  				folio = NULL;
>  			}
> 
> --
> 1.8.3.1
Steven Sistare Sept. 4, 2024, 2:52 p.m. UTC | #2
On 9/3/2024 8:45 PM, Kasireddy, Vivek wrote:
> Hi Steve,
> 
>> Subject: [PATCH V1 2/5] mm/hugetlb: fix memfd_pin_folios free_huge_pages
>> leak
>>
>> memfd_pin_folios followed by unpin_folios fails to restore
>> free_huge_pages if the pages were not already faulted in, because the
>> folio refcount for pages created by memfd_alloc_folio never goes to 0.
>> memfd_pin_folios needs another folio_put to undo the folio_try_get
>> below:
>>
>> memfd_alloc_folio()
>>    alloc_hugetlb_folio_nodemask()
>>      dequeue_hugetlb_folio_nodemask()
>>        dequeue_hugetlb_folio_node_exact()
>>          folio_ref_unfreeze(folio, 1);    ; adds 1 refcount
>>    folio_try_get()                        ; adds 1 refcount
> I wonder if it is more optimal to skip the folio_try_get() above.
> I think it (folio_try_get) was probably added because I didn't realize that
> alloc_hugetlb_folio_nodemask() already adds a reference.

Agreed, that is a simpler fix.  I tried it and my tests pass.

>>    hugetlb_add_to_page_cache()            ; adds 512 refcount (on x86)
>>
>> With the fix, after memfd_pin_folios + unpin_folios, the refcount for
>> the (unfaulted) page is 512, which is correct, as the refcount for a
>> faulted unpinned page is 513.
>>
>> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning
>> memfd folios")
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   mm/gup.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 54d0dc3..5b92f1d 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t
>> start, loff_t end,
>>   	pgoff_t start_idx, end_idx, next_idx;
>>   	struct folio *folio = NULL;
>>   	struct folio_batch fbatch;
>> -	struct hstate *h;
>> +	struct hstate *h = NULL;
>>   	long ret = -EINVAL;
>>
>>   	if (start < 0 || start > end || !max_folios)
>> @@ -3662,6 +3662,8 @@ long memfd_pin_folios(struct file *memfd, loff_t
>> start, loff_t end,
>>   							     &fbatch);
>>   			if (folio) {
>>   				folio_put(folio);
>> +				if (h)
>> +					folio_put(folio);
> If we stick with this change, I guess there needs to be an additional
> folio_put() in memfd_alloc_folio() as well if adding the folio to the
> page cache fails.

Indeed, another reason why just deleting the folio_try_get is better.  Thanks!

I will submit a V2 of this patch.

- Steve
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 54d0dc3..5b92f1d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3618,7 +3618,7 @@  long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 	pgoff_t start_idx, end_idx, next_idx;
 	struct folio *folio = NULL;
 	struct folio_batch fbatch;
-	struct hstate *h;
+	struct hstate *h = NULL;
 	long ret = -EINVAL;
 
 	if (start < 0 || start > end || !max_folios)
@@ -3662,6 +3662,8 @@  long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 							     &fbatch);
 			if (folio) {
 				folio_put(folio);
+				if (h)
+					folio_put(folio);
 				folio = NULL;
 			}