
[RFC,RESEND,07/28] kernel/fork: mark VMAs as locked before copying pages during fork

Message ID: 20220901173516.702122-8-surenb@google.com (mailing list archive)
State: New
Series: per-VMA locks proposal

Commit Message

Suren Baghdasaryan Sept. 1, 2022, 5:34 p.m. UTC
Protect VMAs from the concurrent page fault handler while performing
copy_page_range() for VMAs that do not have the VM_WIPEONFORK flag set.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 kernel/fork.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Laurent Dufour Sept. 6, 2022, 2:37 p.m. UTC | #1
On 01/09/2022 at 19:34, Suren Baghdasaryan wrote:
> Protect VMAs from the concurrent page fault handler while performing
> copy_page_range() for VMAs that do not have the VM_WIPEONFORK flag set.

I'm wondering why that is necessary.
The copied mm is write-locked, and the destination one is not reachable yet.
The only other readers of the VMA would be the page fault handlers.
I must have missed something, because I can't see any need to mark the
VMA locked here.

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  kernel/fork.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bfab31ecd11e..1872ad549fed 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -709,8 +709,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		rb_parent = &tmp->vm_rb;
>  
>  		mm->map_count++;
> -		if (!(tmp->vm_flags & VM_WIPEONFORK))
> +		if (!(tmp->vm_flags & VM_WIPEONFORK)) {
> +			vma_mark_locked(mpnt);
>  			retval = copy_page_range(tmp, mpnt);
> +		}
>  
>  		if (tmp->vm_ops && tmp->vm_ops->open)
>  			tmp->vm_ops->open(tmp);
Suren Baghdasaryan Sept. 8, 2022, 11:57 p.m. UTC | #2
On Tue, Sep 6, 2022 at 7:38 AM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>
> On 01/09/2022 at 19:34, Suren Baghdasaryan wrote:
> > Protect VMAs from the concurrent page fault handler while performing
> > copy_page_range() for VMAs that do not have the VM_WIPEONFORK flag set.
>
> I'm wondering why that is necessary.
> The copied mm is write-locked, and the destination one is not reachable yet.
> The only other readers of the VMA would be the page fault handlers.

Correct, this is done to prevent page faulting in the VMA being
duplicated. I assume we want to prevent the pages in that VMA from
changing when we are calling copy_page_range(). Am I wrong?

> I must have missed something, because I can't see any need to mark the
> VMA locked here.
>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  kernel/fork.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index bfab31ecd11e..1872ad549fed 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -709,8 +709,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> >               rb_parent = &tmp->vm_rb;
> >
> >               mm->map_count++;
> > -             if (!(tmp->vm_flags & VM_WIPEONFORK))
> > +             if (!(tmp->vm_flags & VM_WIPEONFORK)) {
> > +                     vma_mark_locked(mpnt);
> >                       retval = copy_page_range(tmp, mpnt);
> > +             }
> >
> >               if (tmp->vm_ops && tmp->vm_ops->open)
> >                       tmp->vm_ops->open(tmp);
>
Laurent Dufour Sept. 9, 2022, 1:27 p.m. UTC | #3
On 09/09/2022 at 01:57, Suren Baghdasaryan wrote:
> On Tue, Sep 6, 2022 at 7:38 AM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>>
>> On 01/09/2022 at 19:34, Suren Baghdasaryan wrote:
>>> Protect VMAs from the concurrent page fault handler while performing
>>> copy_page_range() for VMAs that do not have the VM_WIPEONFORK flag set.
>>
>> I'm wondering why that is necessary.
>> The copied mm is write-locked, and the destination one is not reachable yet.
>> The only other readers of the VMA would be the page fault handlers.
> 
> Correct, this is done to prevent page faulting in the VMA being
> duplicated. I assume we want to prevent the pages in that VMA from
> changing when we are calling copy_page_range(). Am I wrong?

If a page is faulted while copy_page_range() is in progress, that page may
not be backed on the child side (the PTE lock should protect the copy,
shouldn't it?).
Is that a real problem? The page will be backed later if it is accessed on
the child side.
Maybe the per-process page accounting could end up incorrect...
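The interleaving described above can be replayed deterministically in a user-space sketch. All names (`toy_mm`, `copy_ptes`, `child_fault`) are invented for illustration and not kernel API; the point is that each entry copy is atomic under its own "PTE lock", so a parent fault that lands after a given entry was copied is simply absent from the child, and the child refaults that page on first access:

```c
#include <assert.h>

enum { NPTES = 4 };

/* Toy page table: 0 means "not backed". */
struct toy_mm { int pte[NPTES]; };

/* Copy entries [from, to); each single-entry copy models one
 * PTE-lock-protected step of copy_page_range(). */
static void copy_ptes(struct toy_mm *child, struct toy_mm *parent,
		      int from, int to)
{
	for (int i = from; i < to; i++)
		child->pte[i] = parent->pte[i];
}

/* A later "fault" in the child backs the page on demand. */
static void child_fault(struct toy_mm *child, int i)
{
	if (!child->pte[i])
		child->pte[i] = 1;	/* allocate and back the page */
}
```

Replaying the race: a parent fault that completes mid-copy but before its entry is reached is seen by the child; one that completes after the copy is missed, yet the child recovers by faulting the page in later, which is Laurent's point that correctness is preserved (modulo accounting).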

> 
>> I must have missed something, because I can't see any need to mark the
>> VMA locked here.
>>
>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>> ---
>>>  kernel/fork.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>> index bfab31ecd11e..1872ad549fed 100644
>>> --- a/kernel/fork.c
>>> +++ b/kernel/fork.c
>>> @@ -709,8 +709,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>               rb_parent = &tmp->vm_rb;
>>>
>>>               mm->map_count++;
>>> -             if (!(tmp->vm_flags & VM_WIPEONFORK))
>>> +             if (!(tmp->vm_flags & VM_WIPEONFORK)) {
>>> +                     vma_mark_locked(mpnt);
>>>                       retval = copy_page_range(tmp, mpnt);
>>> +             }
>>>
>>>               if (tmp->vm_ops && tmp->vm_ops->open)
>>>                       tmp->vm_ops->open(tmp);
>>
Suren Baghdasaryan Sept. 9, 2022, 4:29 p.m. UTC | #4
On Fri, Sep 9, 2022 at 6:27 AM Laurent Dufour <ldufour@linux.ibm.com> wrote:
>
> On 09/09/2022 at 01:57, Suren Baghdasaryan wrote:
> > On Tue, Sep 6, 2022 at 7:38 AM Laurent Dufour <ldufour@linux.ibm.com> wrote:
> >>
> >> On 01/09/2022 at 19:34, Suren Baghdasaryan wrote:
> >>> Protect VMAs from the concurrent page fault handler while performing
> >>> copy_page_range() for VMAs that do not have the VM_WIPEONFORK flag set.
> >>
> >> I'm wondering why that is necessary.
> >> The copied mm is write-locked, and the destination one is not reachable yet.
> >> The only other readers of the VMA would be the page fault handlers.
> >
> > Correct, this is done to prevent page faulting in the VMA being
> > duplicated. I assume we want to prevent the pages in that VMA from
> > changing when we are calling copy_page_range(). Am I wrong?
>
> If a page is faulted while copy_page_range() is in progress, that page may
> not be backed on the child side (the PTE lock should protect the copy,
> shouldn't it?).
> Is that a real problem? The page will be backed later if it is accessed on
> the child side.
> Maybe the per-process page accounting could end up incorrect...

This feels to me like walking on the edge. Maybe we can discuss this
with more people at LPC before trying it?

>
> >
> >> I must have missed something, because I can't see any need to mark the
> >> VMA locked here.
> >>
> >>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>> ---
> >>>  kernel/fork.c | 4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/fork.c b/kernel/fork.c
> >>> index bfab31ecd11e..1872ad549fed 100644
> >>> --- a/kernel/fork.c
> >>> +++ b/kernel/fork.c
> >>> @@ -709,8 +709,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> >>>               rb_parent = &tmp->vm_rb;
> >>>
> >>>               mm->map_count++;
> >>> -             if (!(tmp->vm_flags & VM_WIPEONFORK))
> >>> +             if (!(tmp->vm_flags & VM_WIPEONFORK)) {
> >>> +                     vma_mark_locked(mpnt);
> >>>                       retval = copy_page_range(tmp, mpnt);
> >>> +             }
> >>>
> >>>               if (tmp->vm_ops && tmp->vm_ops->open)
> >>>                       tmp->vm_ops->open(tmp);
> >>
>

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index bfab31ecd11e..1872ad549fed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -709,8 +709,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		rb_parent = &tmp->vm_rb;
 
 		mm->map_count++;
-		if (!(tmp->vm_flags & VM_WIPEONFORK))
+		if (!(tmp->vm_flags & VM_WIPEONFORK)) {
+			vma_mark_locked(mpnt);
 			retval = copy_page_range(tmp, mpnt);
+		}
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);