diff mbox series

[1/1] mm: lock VMAs skipped by a failed queue_pages_range()

Message ID 20230918211608.3580629-1-surenb@google.com (mailing list archive)
State New
Headers show
Series [1/1] mm: lock VMAs skipped by a failed queue_pages_range() | expand

Commit Message

Suren Baghdasaryan Sept. 18, 2023, 9:16 p.m. UTC
When queue_pages_range() encounters an unmovable page, it terminates
its page walk. This walk, among other things, locks the VMAs in the range.
This termination might result in some VMAs being left unlock after
queue_pages_range() completes. Since do_mbind() continues to operate on
these VMAs despite the failure from queue_pages_range(), it will encounter
an unlocked VMA.
This mbind() behavior has been modified several times before and might
need some changes to either finish the page walk even in the presence
of unmovable pages or to error out immediately after the failure to
queue_pages_range(). However that requires more discussions, so to
fix the immediate issue, explicitly lock the VMAs in the range if
queue_pages_range() failed. The added condition does not save much
but is added for documentation purposes to understand when this extra
locking is needed.

Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
Reported-by: syzbot+b591856e0f0139f83023@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/mempolicy.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Hugh Dickins Sept. 18, 2023, 11:30 p.m. UTC | #1
On Mon, 18 Sep 2023, Suren Baghdasaryan wrote:

> When queue_pages_range() encounters an unmovable page, it terminates
> its page walk. This walk, among other things, locks the VMAs in the range.
> This termination might result in some VMAs being left unlock after
> queue_pages_range() completes. Since do_mbind() continues to operate on
> these VMAs despite the failure from queue_pages_range(), it will encounter
> an unlocked VMA.
> This mbind() behavior has been modified several times before and might
> need some changes to either finish the page walk even in the presence
> of unmovable pages or to error out immediately after the failure to
> queue_pages_range(). However that requires more discussions, so to
> fix the immediate issue, explicitly lock the VMAs in the range if
> queue_pages_range() failed. The added condition does not save much
> but is added for documentation purposes to understand when this extra
> locking is needed.
> 
> Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> Reported-by: syzbot+b591856e0f0139f83023@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  mm/mempolicy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 42b5567e3773..cbc584e9b6ca 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
>  	vma_iter_init(&vmi, mm, start);
>  	prev = vma_prev(&vmi);
>  	for_each_vma_range(vmi, vma, end) {
> +		/* If queue_pages_range failed then not all VMAs might be locked */
> +		if (ret)
> +			vma_start_write(vma);
>  		err = mbind_range(&vmi, vma, &prev, start, end, new);
>  		if (err)
>  			break;
> -- 
> 2.42.0.459.ge4e396fd5e-goog
Michal Hocko Sept. 19, 2023, 8:52 a.m. UTC | #2
On Mon 18-09-23 14:16:08, Suren Baghdasaryan wrote:
> When queue_pages_range() encounters an unmovable page, it terminates
> its page walk. This walk, among other things, locks the VMAs in the range.
> This termination might result in some VMAs being left unlock after
> queue_pages_range() completes. Since do_mbind() continues to operate on
> these VMAs despite the failure from queue_pages_range(), it will encounter
> an unlocked VMA.
> This mbind() behavior has been modified several times before and might
> need some changes to either finish the page walk even in the presence
> of unmovable pages or to error out immediately after the failure to
> queue_pages_range(). However that requires more discussions, so to
> fix the immediate issue, explicitly lock the VMAs in the range if
> queue_pages_range() failed. The added condition does not save much
> but is added for documentation purposes to understand when this extra
> locking is needed.

The semantic of the walk in this case is really clear as mud. I was
trying to reconstruct the whole picture and it really hurts... Then I
found http://lkml.kernel.org/r/CAHbLzkrmTaqBRmHVdE2kyW57Uoghqd_E+jAXC9cB5ofkhL-uvw@mail.gmail.com
and that helped a lot. Let's keep it a reference at least in the email
thread here for future.

> Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> Reported-by: syzbot+b591856e0f0139f83023@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

I cannot say I like the patch (it looks like a potential double locking
unless you realize this lock is special) but considering this might be just
temporal I do not mind. 

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/mempolicy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 42b5567e3773..cbc584e9b6ca 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
>  	vma_iter_init(&vmi, mm, start);
>  	prev = vma_prev(&vmi);
>  	for_each_vma_range(vmi, vma, end) {
> +		/* If queue_pages_range failed then not all VMAs might be locked */
> +		if (ret)
> +			vma_start_write(vma);
>  		err = mbind_range(&vmi, vma, &prev, start, end, new);
>  		if (err)
>  			break;
> -- 
> 2.42.0.459.ge4e396fd5e-goog
Yang Shi Sept. 19, 2023, 9:09 p.m. UTC | #3
On Tue, Sep 19, 2023 at 1:53 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 18-09-23 14:16:08, Suren Baghdasaryan wrote:
> > When queue_pages_range() encounters an unmovable page, it terminates
> > its page walk. This walk, among other things, locks the VMAs in the range.
> > This termination might result in some VMAs being left unlock after
> > queue_pages_range() completes. Since do_mbind() continues to operate on
> > these VMAs despite the failure from queue_pages_range(), it will encounter
> > an unlocked VMA.
> > This mbind() behavior has been modified several times before and might
> > need some changes to either finish the page walk even in the presence
> > of unmovable pages or to error out immediately after the failure to
> > queue_pages_range(). However that requires more discussions, so to
> > fix the immediate issue, explicitly lock the VMAs in the range if
> > queue_pages_range() failed. The added condition does not save much
> > but is added for documentation purposes to understand when this extra
> > locking is needed.
>
> The semantic of the walk in this case is really clear as mud. I was
> trying to reconstruct the whole picture and it really hurts... Then I
> found http://lkml.kernel.org/r/CAHbLzkrmTaqBRmHVdE2kyW57Uoghqd_E+jAXC9cB5ofkhL-uvw@mail.gmail.com
> and that helped a lot. Let's keep it a reference at least in the email
> thread here for future.

FYI, I'm working on a fix for the regression mentioned in that series,
and Hugh has some clean up and enhancement for that too.

>
> > Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> > Reported-by: syzbot+b591856e0f0139f83023@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> I cannot say I like the patch (it looks like a potential double locking
> unless you realize this lock is special) but considering this might be just
> temporal I do not mind.
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks!
>
> > ---
> >  mm/mempolicy.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 42b5567e3773..cbc584e9b6ca 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> >       vma_iter_init(&vmi, mm, start);
> >       prev = vma_prev(&vmi);
> >       for_each_vma_range(vmi, vma, end) {
> > +             /* If queue_pages_range failed then not all VMAs might be locked */
> > +             if (ret)
> > +                     vma_start_write(vma);
> >               err = mbind_range(&vmi, vma, &prev, start, end, new);
> >               if (err)
> >                       break;
> > --
> > 2.42.0.459.ge4e396fd5e-goog
>
> --
> Michal Hocko
> SUSE Labs
diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..cbc584e9b6ca 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,9 @@  static long do_mbind(unsigned long start, unsigned long len,
 	vma_iter_init(&vmi, mm, start);
 	prev = vma_prev(&vmi);
 	for_each_vma_range(vmi, vma, end) {
+		/* If queue_pages_range failed then not all VMAs might be locked */
+		if (ret)
+			vma_start_write(vma);
 		err = mbind_range(&vmi, vma, &prev, start, end, new);
 		if (err)
 			break;