
[1/1] mm: do not increment pgfault stats when page fault handler retries

Message ID 20230414175444.1837474-1-surenb@google.com (mailing list archive)
State New
Series [1/1] mm: do not increment pgfault stats when page fault handler retries

Commit Message

Suren Baghdasaryan April 14, 2023, 5:54 p.m. UTC
If the page fault handler requests a retry, we will count the fault
multiple times.  This is a relatively harmless problem as the retry paths
are not often requested, and the only user-visible problem is that the
fault counter will be slightly higher than it should be.  Nevertheless,
userspace only took one fault, and should not see the fact that the
kernel had to retry the fault multiple times.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
Patch applies cleanly over linux-next and mm-unstable

 mm/memory.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
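
The double counting described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names model_handle_fault(), model_take_one_fault() and MODEL_FAULT_RETRY are hypothetical stand-ins, not kernel symbols, and the loop merely imitates a caller that re-invokes the handler while it asks for a retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's retry bit; not the real definition. */
#define MODEL_FAULT_RETRY 0x1u

struct fault_model {
	unsigned int retries_left;	/* passes that will still ask for a retry */
	unsigned long pgfault;		/* models the PGFAULT counter */
};

/* One handler pass: returns MODEL_FAULT_RETRY until retries are exhausted. */
static unsigned int model_handle_fault(struct fault_model *m, bool count_at_entry)
{
	unsigned int ret = 0;

	assert(m != NULL);

	if (count_at_entry)
		m->pgfault++;		/* old placement: every pass is counted */

	if (m->retries_left > 0) {
		m->retries_left--;
		ret = MODEL_FAULT_RETRY;
	}

	if (!count_at_entry && !(ret & MODEL_FAULT_RETRY))
		m->pgfault++;		/* patched placement: only the final pass */

	return ret;
}

/* Models a caller that re-invokes the handler while it requests a retry. */
static unsigned long model_take_one_fault(unsigned int retries, bool count_at_entry)
{
	struct fault_model m = { .retries_left = retries, .pgfault = 0 };

	while (model_handle_fault(&m, count_at_entry) & MODEL_FAULT_RETRY)
		;
	return m.pgfault;
}
```

In this model, one fault that needs two retries is counted three times when counting happens at entry, and once when the counter is bumped only on a pass that does not return the retry bit.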

Comments

Matthew Wilcox April 14, 2023, 6:11 p.m. UTC | #1
On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> If the page fault handler requests a retry, we will count the fault
> multiple times.  This is a relatively harmless problem as the retry paths
> are not often requested, and the only user-visible problem is that the
> fault counter will be slightly higher than it should be.  Nevertheless,
> userspace only took one fault, and should not see the fact that the
> kernel had to retry the fault multiple times.
> 
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")

I know I suggested this fixes line, but I think it's actually been
here much longer, perhaps since

Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")

Michel, what do you think?

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Patch applies cleanly over linux-next and mm-unstable
> 
>  mm/memory.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c5b231fe6e3..d88f370eacd1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  
>  	__set_current_state(TASK_RUNNING);
>  
> -	count_vm_event(PGFAULT);
> -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> -
>  	ret = sanitize_fault_flags(vma, &flags);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>  					    flags & FAULT_FLAG_INSTRUCTION,
> -					    flags & FAULT_FLAG_REMOTE))
> -		return VM_FAULT_SIGSEGV;
> +					    flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out;
> +	}
>  
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
> @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	}
>  
>  	mm_account_fault(regs, address, flags, ret);
> +out:
> +	if (!(ret & VM_FAULT_RETRY)) {
> +		count_vm_event(PGFAULT);
> +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
> +	}
>  
>  	return ret;
>  }
> -- 
> 2.40.0.634.g4ca3ef3211-goog
>
Peter Xu April 14, 2023, 9:47 p.m. UTC | #2
Hi, Suren,

On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> If the page fault handler requests a retry, we will count the fault
> multiple times.  This is a relatively harmless problem as the retry paths
> are not often requested, and the only user-visible problem is that the
> fault counter will be slightly higher than it should be.  Nevertheless,
> userspace only took one fault, and should not see the fact that the
> kernel had to retry the fault multiple times.
> 
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Patch applies cleanly over linux-next and mm-unstable
> 
>  mm/memory.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c5b231fe6e3..d88f370eacd1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  
>  	__set_current_state(TASK_RUNNING);
>  
> -	count_vm_event(PGFAULT);
> -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> -
>  	ret = sanitize_fault_flags(vma, &flags);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>  					    flags & FAULT_FLAG_INSTRUCTION,
> -					    flags & FAULT_FLAG_REMOTE))
> -		return VM_FAULT_SIGSEGV;
> +					    flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out;
> +	}
>  
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
> @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	}
>  
>  	mm_account_fault(regs, address, flags, ret);

Here is the mm_account_fault() function taking care of some other
accountings.  Perhaps good to put things into it?

It also already ignores invalid faults:

	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
		return;

I see that you may also want to account for sigbus, however I really don't
know why.  An explanation of when it would matter would be great.  So far it
makes sense to me if we skip both the RETRY and ERROR cases.

> +out:
> +	if (!(ret & VM_FAULT_RETRY)) {
> +		count_vm_event(PGFAULT);
> +		count_memcg_event_mm(vma->vm_mm, PGFAULT);

One thing worth noticing here is that vma may or may not be valid,
depending on the retval of the fault.

RETRY is exactly one of the cases where accessing vma may be unsafe, because
the mmap read lock has been released.  The other one is the recently added
VM_FAULT_COMPLETED.  So if we want to move this chunk (or any vma reference)
to be later, we first need to make sure a valid vma / mm is still there, or
we're prone to accessing a vma that has already been released, I think.

> +	}
>  
>  	return ret;
>  }
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 
> 

Thanks,
Suren Baghdasaryan April 14, 2023, 10:14 p.m. UTC | #3
On Fri, Apr 14, 2023 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,

Hi Peter,

>
> On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> > If the page fault handler requests a retry, we will count the fault
> > multiple times.  This is a relatively harmless problem as the retry paths
> > are not often requested, and the only user-visible problem is that the
> > fault counter will be slightly higher than it should be.  Nevertheless,
> > userspace only took one fault, and should not see the fact that the
> > kernel had to retry the fault multiple times.
> >
> > Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > Patch applies cleanly over linux-next and mm-unstable
> >
> >  mm/memory.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1c5b231fe6e3..d88f370eacd1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >
> >       __set_current_state(TASK_RUNNING);
> >
> > -     count_vm_event(PGFAULT);
> > -     count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > -
> >       ret = sanitize_fault_flags(vma, &flags);
> >       if (ret)
> > -             return ret;
> > +             goto out;
> >
> >       if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> >                                           flags & FAULT_FLAG_INSTRUCTION,
> > -                                         flags & FAULT_FLAG_REMOTE))
> > -             return VM_FAULT_SIGSEGV;
> > +                                         flags & FAULT_FLAG_REMOTE)) {
> > +             ret = VM_FAULT_SIGSEGV;
> > +             goto out;
> > +     }
> >
> >       /*
> >        * Enable the memcg OOM handling for faults triggered in user
> > @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >       }
> >
> >       mm_account_fault(regs, address, flags, ret);
>
> Here is the mm_account_fault() function taking care of some other
> accountings.  Perhaps good to put things into it?

That seems appropriate. Let me take a closer look.

>
> It also already ignores invalid faults:
>
>         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
>                 return;

Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
we need to retry but no errors happened? If so then this condition
would double-count pagefaults in such cases. If such return code is
impossible then it's the same as checking for VM_FAULT_RETRY.

>
> I see that you may also want to account for sigbus, however I really don't
> know why.  An explanation of when it would matter would be great.  So far it
> makes sense to me if we skip both the RETRY and ERROR cases.

Accounting in case of a sigbus is not affected by this patch I think.
We account for sigbus or any other error cases because there was a
pagefault and we need to account for it. Whether we failed to handle
it or not should not affect the count. We skip the retry case because
we know the same fault will be retried. If we don't skip then we will
double-count this fault.

>
> > +out:
> > +     if (!(ret & VM_FAULT_RETRY)) {
> > +             count_vm_event(PGFAULT);
> > +             count_memcg_event_mm(vma->vm_mm, PGFAULT);
>
> One thing worth noticing here is that vma may or may not be valid,
> depending on the retval of the fault.
>
> RETRY is exactly one of the cases where accessing vma may be unsafe, because
> the mmap read lock has been released.  The other one is the recently added
> VM_FAULT_COMPLETED.  So if we want to move this chunk (or any vma reference)
> to be later, we first need to make sure a valid vma / mm is still there, or
> we're prone to accessing a vma that has already been released, I think.

Good catch! I think you are right and I should have stored vma->vm_mm
in the beginning and used it when calling count_memcg_event_mm().
I'll prepare a new patch which handles this correctly.
Thanks,
Suren.
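
A minimal userspace sketch of the "store vma->vm_mm in the beginning" idea follows. Everything here is hypothetical: struct model_vma, struct model_mm, model_do_fault() and MODEL_FAULT_RETRY are stand-ins invented for illustration, free() merely imitates "the vma may be gone once the lock is dropped", and the model assumes the mm outlives the call.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the kernel's retry bit; not the real definition. */
#define MODEL_FAULT_RETRY 0x1u

struct model_mm { unsigned long memcg_pgfault; };
struct model_vma { struct model_mm *mm; };

/* Models a handler pass that may invalidate the vma when it asks for a retry. */
static unsigned int model_do_fault(struct model_vma *vma, int want_retry)
{
	if (want_retry) {
		free(vma);	/* stands in for "lock dropped, vma may be released" */
		return MODEL_FAULT_RETRY;
	}
	return 0;
}

static unsigned int model_handle_mm_fault(struct model_vma *vma, int want_retry)
{
	struct model_mm *mm;
	unsigned int ret;

	assert(vma != NULL);
	mm = vma->mm;		/* read while vma is still known to be valid */

	ret = model_do_fault(vma, want_retry);

	/* vma must not be dereferenced past this point; use the cached mm. */
	if (!(ret & MODEL_FAULT_RETRY))
		mm->memcg_pgfault++;

	return ret;
}
```

The point of the sketch is only the ordering: the pointer needed for accounting is read before the call that may release the vma, so the late accounting never touches vma itself.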

>
> > +     }
> >
> >       return ret;
> >  }
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >
> >
>
> Thanks,
>
> --
> Peter Xu
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
Suren Baghdasaryan April 14, 2023, 10:26 p.m. UTC | #4
On Fri, Apr 14, 2023 at 3:14 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 14, 2023 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
>
> Hi Peter,
>
> >
> > On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> > > If the page fault handler requests a retry, we will count the fault
> > > multiple times.  This is a relatively harmless problem as the retry paths
> > > are not often requested, and the only user-visible problem is that the
> > > fault counter will be slightly higher than it should be.  Nevertheless,
> > > userspace only took one fault, and should not see the fact that the
> > > kernel had to retry the fault multiple times.
> > >
> > > Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > > Patch applies cleanly over linux-next and mm-unstable
> > >
> > >  mm/memory.c | 16 ++++++++++------
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 1c5b231fe6e3..d88f370eacd1 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >
> > >       __set_current_state(TASK_RUNNING);
> > >
> > > -     count_vm_event(PGFAULT);
> > > -     count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > > -
> > >       ret = sanitize_fault_flags(vma, &flags);
> > >       if (ret)
> > > -             return ret;
> > > +             goto out;
> > >
> > >       if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> > >                                           flags & FAULT_FLAG_INSTRUCTION,
> > > -                                         flags & FAULT_FLAG_REMOTE))
> > > -             return VM_FAULT_SIGSEGV;
> > > +                                         flags & FAULT_FLAG_REMOTE)) {
> > > +             ret = VM_FAULT_SIGSEGV;
> > > +             goto out;
> > > +     }
> > >
> > >       /*
> > >        * Enable the memcg OOM handling for faults triggered in user
> > > @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >       }
> > >
> > >       mm_account_fault(regs, address, flags, ret);
> >
> > Here is the mm_account_fault() function taking care of some other
> > accountings.  Perhaps good to put things into it?
>
> That seems appropriate. Let me take a closer look.
>
> >
> > It also already ignores invalid faults:
> >
> >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> >                 return;
>
> Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> we need to retry but no errors happened? If so then this condition
> would double-count pagefaults in such cases. If such return code is
> impossible then it's the same as checking for VM_FAULT_RETRY.
>
> >
> > I see that you may also want to account for sigbus, however I really don't
> > know why.  An explanation of when it would matter would be great.  So far it
> > makes sense to me if we skip both the RETRY and ERROR cases.
>
> Accounting in case of a sigbus is not affected by this patch I think.
> We account for sigbus or any other error cases because there was a
> pagefault and we need to account for it. Whether we failed to handle
> it or not should not affect the count. We skip the retry case because
> we know the same fault will be retried. If we don't skip then we will
> double-count this fault.

mm_account_fault() has a nice comment explaining why it skips errors
and that now makes sense to me. Let me move the accounting there and
see if others agree that's the right place.

>
> >
> > > +out:
> > > +     if (!(ret & VM_FAULT_RETRY)) {
> > > +             count_vm_event(PGFAULT);
> > > +             count_memcg_event_mm(vma->vm_mm, PGFAULT);
> >
> > One thing worth noticing here is that vma may or may not be valid,
> > depending on the retval of the fault.
> >
> > RETRY is exactly one of the cases where accessing vma may be unsafe, because
> > the mmap read lock has been released.  The other one is the recently added
> > VM_FAULT_COMPLETED.  So if we want to move this chunk (or any vma reference)
> > to be later, we first need to make sure a valid vma / mm is still there, or
> > we're prone to accessing a vma that has already been released, I think.
>
> Good catch! I think you are right and I should have stored vma->vm_mm
> in the beginning and used it when calling count_memcg_event_mm().
> I'll prepare a new patch which handles this correctly.
> Thanks,
> Suren.
>
> >
> > > +     }
> > >
> > >       return ret;
> > >  }
> > > --
> > > 2.40.0.634.g4ca3ef3211-goog
> > >
> > >
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
> >
Peter Xu April 14, 2023, 10:34 p.m. UTC | #5
Hi, Suren,

On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > It also already ignores invalid faults:
> >
> >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> >                 return;
> 
> Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> we need to retry but no errors happened? If so then this condition
> would double-count pagefaults in such cases.

If ret==VM_FAULT_RETRY it should return here already, so I assume
mm_account_fault() itself is fine regarding fault retries?

Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
"either ERROR or RETRY we'll skip the accounting".

IMHO we should have 3 cases here:

  - ERROR && !RETRY
    error triggered of any kind

  - RETRY && !ERROR
    we need to try one more time

  - !RETRY && !ERROR
    we finished the fault

I don't think ERROR & RETRY can even be set at the same time so I assume
there's no option 4) - a RETRY should imply no ERROR already, even though
it's still incomplete so need another attempt.
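
The three cases can be written down as a small classifier. This is a sketch only: the MODEL_FAULT_* bit values are illustrative (the kernel's error mask is a combination of several bits defined in its own headers), and model_classify() and model_skips_accounting() are hypothetical helpers. The assertion encodes the assumption made above that ERROR and RETRY are never set together.

```c
#include <assert.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define MODEL_FAULT_ERROR 0x1u
#define MODEL_FAULT_RETRY 0x2u

enum model_outcome { MODEL_DONE, MODEL_ERROR, MODEL_RETRY };

static enum model_outcome model_classify(unsigned int ret)
{
	const unsigned int both = MODEL_FAULT_ERROR | MODEL_FAULT_RETRY;

	/* Assumption from the discussion: there is no "option 4". */
	assert((ret & both) != both);

	if (ret & MODEL_FAULT_ERROR)
		return MODEL_ERROR;	/* ERROR && !RETRY */
	if (ret & MODEL_FAULT_RETRY)
		return MODEL_RETRY;	/* RETRY && !ERROR */
	return MODEL_DONE;		/* !RETRY && !ERROR */
}

/* Mirrors the quoted check: skip accounting if either bit is set. */
static int model_skips_accounting(unsigned int ret)
{
	return (ret & (MODEL_FAULT_ERROR | MODEL_FAULT_RETRY)) != 0;
}
```

With this shape, a plain retry result is skipped by the combined mask just as an error result is, so only the "finished" case reaches the accounting.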

Thanks,
Suren Baghdasaryan April 14, 2023, 11:49 p.m. UTC | #6
On Fri, Apr 14, 2023 at 3:35 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > > It also already ignores invalid faults:
> > >
> > >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > >                 return;
> >
> > Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> > we need to retry but no errors happened? If so then this condition
> > would double-count pagefaults in such cases.
>
> If ret==VM_FAULT_RETRY it should return here already, so I assume
> mm_account_fault() itself is fine regarding fault retries?
>
> Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
> "either ERROR or RETRY we'll skip the accounting".
>
> IMHO we should have 3 cases here:
>
>   - ERROR && !RETRY
>     error triggered of any kind
>
>   - RETRY && !ERROR
>     we need to try one more time
>
>   - !RETRY && !ERROR
>     we finished the fault

After looking some more into mm_account_fault(), I think it would be
fine to count the faults which produced errors. IIUC these counters
represent the total number of faults, not the number of valid and
successful faults. If so then I think simply using VM_FAULT_RETRY
should be ok without considering all possible combinations. WDYT?
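
To make the difference between the two policies concrete, here is a rough model that counts handler passes whose result has none of the bits in a given skip mask. The bit values and model_count_faults() are hypothetical and only illustrate the arithmetic; they are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define MODEL_FAULT_ERROR 0x1u
#define MODEL_FAULT_RETRY 0x2u

/* Counts passes whose result has none of the bits in skip_mask set. */
static unsigned long model_count_faults(const unsigned int *rets, size_t n,
					unsigned int skip_mask)
{
	unsigned long count = 0;
	size_t i;

	assert(rets != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!(rets[i] & skip_mask))
			count++;
	}
	return count;
}
```

For a trace of three faults (one retried once, one that ends in an error, one retried twice), skipping only the retry bit yields three counted faults, skipping error and retry yields two, and skipping nothing yields one count per pass.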

>
> I don't think ERROR & RETRY can even be set at the same time so I assume
> there's no option 4) - a RETRY should imply no ERROR already, even though
> it's still incomplete so need another attempt.
>
> Thanks,
>
> --
> Peter Xu
>
Suren Baghdasaryan April 15, 2023, 12:11 a.m. UTC | #7
On Fri, Apr 14, 2023 at 4:49 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 14, 2023 at 3:35 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
> >
> > On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > > > It also already ignores invalid faults:
> > > >
> > > >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > > >                 return;
> > >
> > > Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> > > we need to retry but no errors happened? If so then this condition
> > > would double-count pagefaults in such cases.
> >
> > If ret==VM_FAULT_RETRY it should return here already, so I assume
> > mm_account_fault() itself is fine regarding fault retries?
> >
> > Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
> > "either ERROR or RETRY we'll skip the accounting".
> >
> > IMHO we should have 3 cases here:
> >
> >   - ERROR && !RETRY
> >     error triggered of any kind
> >
> >   - RETRY && !ERROR
> >     we need to try one more time
> >
> >   - !RETRY && !ERROR
> >     we finished the fault
>
> After looking some more into mm_account_fault(), I think it would be
> fine to count the faults which produced errors. IIUC these counters
> represent the total number of faults, not the number of valid and
> successful faults. If so then I think simply using VM_FAULT_RETRY
> should be ok without considering all possible combinations. WDYT?

I posted v2 at https://lore.kernel.org/all/20230415000818.1955007-1-surenb@google.com/
Hopefully it's closer to what we want it to be.

>
> >
> > I don't think ERROR & RETRY can even be set at the same time so I assume
> > there's no option 4) - a RETRY should imply no ERROR already, even though
> > it's still incomplete so need another attempt.
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 1c5b231fe6e3..d88f370eacd1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5212,17 +5212,16 @@  vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 
 	__set_current_state(TASK_RUNNING);
 
-	count_vm_event(PGFAULT);
-	count_memcg_event_mm(vma->vm_mm, PGFAULT);
-
 	ret = sanitize_fault_flags(vma, &flags);
 	if (ret)
-		return ret;
+		goto out;
 
 	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
 					    flags & FAULT_FLAG_INSTRUCTION,
-					    flags & FAULT_FLAG_REMOTE))
-		return VM_FAULT_SIGSEGV;
+					    flags & FAULT_FLAG_REMOTE)) {
+		ret = VM_FAULT_SIGSEGV;
+		goto out;
+	}
 
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
@@ -5253,6 +5252,11 @@  vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	}
 
 	mm_account_fault(regs, address, flags, ret);
+out:
+	if (!(ret & VM_FAULT_RETRY)) {
+		count_vm_event(PGFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGFAULT);
+	}
 
 	return ret;
 }