Message ID | 20220322214648.AB7A1C340EC@smtp.kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [001/227] linux/kthread.h: remove unused macros | expand |
On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote: > From: Charan Teja Kalla <quic_charante@quicinc.com> > Subject: mm: madvise: skip unmapped vma holes passed to process_madvise > > The process_madvise() system call is expected to skip holes in vma passed > through 'struct iovec' vector list. But do_madvise, which > process_madvise() calls for each vma, returns ENOMEM in case of unmapped > holes, despite the VMA is processed. > > Thus process_madvise() should treat ENOMEM as expected and consider the > VMA passed to as processed and continue processing other vma's in the > vector list. Returning -ENOMEM to user, despite the VMA is processed, > will be unable to figure out where to start the next madvise. > > Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com I thought it was still under discussion and Charan will post next version along with previous patch "mm: madvise: return correct bytes advised with process_madvise" https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/ > Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API") > Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com> > Cc: David Rientjes <rientjes@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: Nadav Amit <nadav.amit@gmail.com> > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: <stable@vger.kernel.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > mm/madvise.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > --- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise > +++ a/mm/madvise.c > @@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi > > while (iov_iter_count(&iter)) { > iovec = iov_iter_iovec(&iter); > + /* > + * do_madvise returns ENOMEM if unmapped holes are present > + * in the passed VMA. process_madvise() is expected to skip > + * unmapped holes passed to it in the 'struct iovec' list > + * and not fail because of them. Thus treat -ENOMEM return > + * from do_madvise as valid and continue processing. > + */ > ret = do_madvise(mm, (unsigned long)iovec.iov_base, > iovec.iov_len, behavior); > - if (ret < 0) > + if (ret < 0 && ret != -ENOMEM) > break; > iov_iter_advance(&iter, iovec.iov_len); > } > _
On Tue 22-03-22 17:24:58, Minchan Kim wrote: > On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote: > > From: Charan Teja Kalla <quic_charante@quicinc.com> > > Subject: mm: madvise: skip unmapped vma holes passed to process_madvise > > > > The process_madvise() system call is expected to skip holes in vma passed > > through 'struct iovec' vector list. But do_madvise, which > > process_madvise() calls for each vma, returns ENOMEM in case of unmapped > > holes, despite the VMA is processed. > > > > Thus process_madvise() should treat ENOMEM as expected and consider the > > VMA passed to as processed and continue processing other vma's in the > > vector list. Returning -ENOMEM to user, despite the VMA is processed, > > will be unable to figure out where to start the next madvise. > > > > Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com > > I thought it was still under discussion and Charan will post next > version along with previous patch > "mm: madvise: return correct bytes advised with process_madvise" > > https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/ Yes, I am not even sure the new semantic is sensible[1]. We should discuss that and see all the consequences. Changing the semantic of an existing syscall is always tricky going back and forth is even worse.
On 3/23/2022 1:58 PM, Michal Hocko wrote: > On Tue 22-03-22 17:24:58, Minchan Kim wrote: >> On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote: >>> From: Charan Teja Kalla <quic_charante@quicinc.com> >>> Subject: mm: madvise: skip unmapped vma holes passed to process_madvise >>> >>> The process_madvise() system call is expected to skip holes in vma passed >>> through 'struct iovec' vector list. But do_madvise, which >>> process_madvise() calls for each vma, returns ENOMEM in case of unmapped >>> holes, despite the VMA is processed. >>> >>> Thus process_madvise() should treat ENOMEM as expected and consider the >>> VMA passed to as processed and continue processing other vma's in the >>> vector list. Returning -ENOMEM to user, despite the VMA is processed, >>> will be unable to figure out where to start the next madvise. >>> >>> Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com >> I thought it was still under discussion and Charan will post next >> version along with previous patch >> "mm: madvise: return correct bytes advised with process_madvise" >> >> https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/ > Yes, I am not even sure the new semantic is sensible[1]. We should discuss > that and see all the consequences. Changing the semantic of an existing > syscall is always tricky going back and forth is even worse. Starting the discussion @ https://lore.kernel.org/linux-mm/cover.1648046642.git.quic_charante@quicinc.com/ Thanks, Charan
--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise +++ a/mm/madvise.c @@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi while (iov_iter_count(&iter)) { iovec = iov_iter_iovec(&iter); + /* + * do_madvise returns ENOMEM if unmapped holes are present + * in the passed VMA. process_madvise() is expected to skip + * unmapped holes passed to it in the 'struct iovec' list + * and not fail because of them. Thus treat -ENOMEM return + * from do_madvise as valid and continue processing. + */ ret = do_madvise(mm, (unsigned long)iovec.iov_base, iovec.iov_len, behavior); - if (ret < 0) + if (ret < 0 && ret != -ENOMEM) break; iov_iter_advance(&iter, iovec.iov_len); }