diff mbox series

[163/227] mm: madvise: skip unmapped vma holes passed to process_madvise

Message ID 20220322214648.AB7A1C340EC@smtp.kernel.org (mailing list archive)
State New
Headers show
Series [001/227] linux/kthread.h: remove unused macros | expand

Commit Message

Andrew Morton March 22, 2022, 9:46 p.m. UTC
From: Charan Teja Kalla <quic_charante@quicinc.com>
Subject: mm: madvise: skip unmapped vma holes passed to process_madvise

The process_madvise() system call is expected to skip holes in vma passed
through 'struct iovec' vector list.  But do_madvise, which
process_madvise() calls for each vma, returns ENOMEM in case of unmapped
holes, despite the VMA is processed.

Thus process_madvise() should treat ENOMEM as expected and consider the
VMA passed to as processed and continue processing other vma's in the
vector list.  Returning -ENOMEM to user, despite the VMA is processed,
will be unable to figure out where to start the next madvise.

Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Minchan Kim March 23, 2022, 12:24 a.m. UTC | #1
On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
> From: Charan Teja Kalla <quic_charante@quicinc.com>
> Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
> 
> The process_madvise() system call is expected to skip holes in vma passed
> through 'struct iovec' vector list.  But do_madvise, which
> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
> holes, despite the VMA is processed.
> 
> Thus process_madvise() should treat ENOMEM as expected and consider the
> VMA passed to as processed and continue processing other vma's in the
> vector list.  Returning -ENOMEM to user, despite the VMA is processed,
> will be unable to figure out where to start the next madvise.
> 
> Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com

I thought it was still under discussion and Charan will post next
version along with previous patch
"mm: madvise: return correct bytes advised with process_madvise"

https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/


> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/madvise.c |    9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> --- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
> +++ a/mm/madvise.c
> @@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi
>  
>  	while (iov_iter_count(&iter)) {
>  		iovec = iov_iter_iovec(&iter);
> +		/*
> +		 * do_madvise returns ENOMEM if unmapped holes are present
> +		 * in the passed VMA. process_madvise() is expected to skip
> +		 * unmapped holes passed to it in the 'struct iovec' list
> +		 * and not fail because of them. Thus treat -ENOMEM return
> +		 * from do_madvise as valid and continue processing.
> +		 */
>  		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
>  					iovec.iov_len, behavior);
> -		if (ret < 0)
> +		if (ret < 0 && ret != -ENOMEM)
>  			break;
>  		iov_iter_advance(&iter, iovec.iov_len);
>  	}
> _
Michal Hocko March 23, 2022, 8:28 a.m. UTC | #2
On Tue 22-03-22 17:24:58, Minchan Kim wrote:
> On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
> > From: Charan Teja Kalla <quic_charante@quicinc.com>
> > Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
> > 
> > The process_madvise() system call is expected to skip holes in vma passed
> > through 'struct iovec' vector list.  But do_madvise, which
> > process_madvise() calls for each vma, returns ENOMEM in case of unmapped
> > holes, despite the VMA is processed.
> > 
> > Thus process_madvise() should treat ENOMEM as expected and consider the
> > VMA passed to as processed and continue processing other vma's in the
> > vector list.  Returning -ENOMEM to user, despite the VMA is processed,
> > will be unable to figure out where to start the next madvise.
> > 
> > Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
> 
> I thought it was still under discussion and Charan will post next
> version along with previous patch
> "mm: madvise: return correct bytes advised with process_madvise"
> 
> https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/

Yes, I am not even sure the new semantic is sensible[1]. We should discuss
that and see all the consequences. Changing the semantic of an existing
syscall is always tricky going back and forth is even worse.
Charan Teja Kalla March 23, 2022, 3:47 p.m. UTC | #3
On 3/23/2022 1:58 PM, Michal Hocko wrote:
> On Tue 22-03-22 17:24:58, Minchan Kim wrote:
>> On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
>>> From: Charan Teja Kalla <quic_charante@quicinc.com>
>>> Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
>>>
>>> The process_madvise() system call is expected to skip holes in vma passed
>>> through 'struct iovec' vector list.  But do_madvise, which
>>> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
>>> holes, despite the VMA is processed.
>>>
>>> Thus process_madvise() should treat ENOMEM as expected and consider the
>>> VMA passed to as processed and continue processing other vma's in the
>>> vector list.  Returning -ENOMEM to user, despite the VMA is processed,
>>> will be unable to figure out where to start the next madvise.
>>>
>>> Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
>> I thought it was still under discussion and Charan will post next
>> version along with previous patch
>> "mm: madvise: return correct bytes advised with process_madvise"
>>
>> https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/
> Yes, I am not even sure the new semantic is sensible[1]. We should discuss
> that and see all the consequences. Changing the semantic of an existing
> syscall is always tricky going back and forth is even worse.

Starting the discussion @
https://lore.kernel.org/linux-mm/cover.1648046642.git.quic_charante@quicinc.com/

Thanks,
Charan
diff mbox series

Patch

--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
+++ a/mm/madvise.c
@@ -1428,9 +1428,16 @@  SYSCALL_DEFINE5(process_madvise, int, pi
 
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
+		/*
+		 * do_madvise returns ENOMEM if unmapped holes are present
+		 * in the passed VMA. process_madvise() is expected to skip
+		 * unmapped holes passed to it in the 'struct iovec' list
+		 * and not fail because of them. Thus treat -ENOMEM return
+		 * from do_madvise as valid and continue processing.
+		 */
 		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
 					iovec.iov_len, behavior);
-		if (ret < 0)
+		if (ret < 0 && ret != -ENOMEM)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);
 	}