diff mbox series

[06/25] mm/arm64: Use mm_fault_accounting()

Message ID 20200615221607.7764-7-peterx@redhat.com (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Peter Xu June 15, 2020, 10:15 p.m. UTC
Use the new mm_fault_accounting() helper for page fault accounting.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will@kernel.org>
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/mm/fault.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

Comments

Will Deacon June 16, 2020, 7:43 a.m. UTC | #1
On Mon, Jun 15, 2020 at 06:15:48PM -0400, Peter Xu wrote:
> Use the new mm_fault_accounting() helper for page fault accounting.
> 
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will@kernel.org>
> CC: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/arm64/mm/fault.c | 17 ++---------------
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index c9cedc0432d2..09af7d7a60ec 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -484,8 +484,6 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  					 addr, esr, regs);
>  	}
>  
> -	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
> -
>  	/*
>  	 * As per x86, we may deadlock here. However, since the kernel only
>  	 * validly references user space from well defined areas of the code,
> @@ -535,20 +533,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  			      VM_FAULT_BADACCESS)))) {
>  		/*
>  		 * Major/minor page fault accounting is only done
> -		 * once. If we go through a retry, it is extremely
> -		 * likely that the page will be found in page cache at
> -		 * that point.
> +		 * once.
>  		 */
> -		if (major) {
> -			current->maj_flt++;
> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
> -				      addr);
> -		} else {
> -			current->min_flt++;
> -			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
> -				      addr);
> -		}
> -
> +		mm_fault_accounting(current, regs, address, major);

Please can you explain why it's ok to move the PERF_COUNT_SW_PAGE_FAULTS
update like this? Seems like a user-visible change to me, so some
justification would really help.

Will
Peter Xu June 16, 2020, 3:59 p.m. UTC | #2
Hi, Will,

On Tue, Jun 16, 2020 at 08:43:08AM +0100, Will Deacon wrote:
> Please can you explain why it's ok to move the PERF_COUNT_SW_PAGE_FAULTS
> update like this? Seems like a user-visible change to me, so some
> justification would really help.

Indeed this could be a functional change for PERF_COUNT_SW_PAGE_FAULTS on some
archs, e.g., for arm64, PERF_COUNT_SW_PAGE_FAULTS previously will also contain
accounting of severe errors where we go into the "no_context" path.  However if
you see the other archs, it's not always true, for example, the xtensa arch
only accounts the correctly handled faults (arch/xtensa/mm/fault.c).

After I thought about this, I don't think it's extremely useful (or please
correct me if I missed something important) to use PERF_COUNT_SW_PAGE_FAULTS
for fatal error accountings among all those correct ones.  After all they are
really extremely rare cases, and even if we got a sigbus for a process, we'll
normally got something dumped in dmesg so if we really want to capture the
error cases there should always be a better way (because by following things
like dmesg we can not only know how many error faults triggered, but also on
the details of the errors).

IOW, my understanding of users of PERF_COUNT_SW_PAGE_FAULTS is that they want
to trap normal/correct page faults, not really care about rare errors.

Then when I went back to think PERF_COUNT_SW_PAGE_FAULTS, it's really about:

  A=PERF_COUNT_SW_PAGE_FAULTS 
  B=PERF_COUNT_SW_PAGE_FAULTS_MAJ
  C=PERF_COUNT_SW_PAGE_FAULTS_MIN

And:

  A=B+C

If that's the case (which is simple and clear), it's really helpful too that we
unify this definition across all the architectures, then it'll also be easier
for us to provide some helper like mm_fault_accounting() so that the accounting
can be managed in the general code rather than in arch-specific ways.

Thanks,
diff mbox series

Patch

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c9cedc0432d2..09af7d7a60ec 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -484,8 +484,6 @@  static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 					 addr, esr, regs);
 	}
 
-	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
-
 	/*
 	 * As per x86, we may deadlock here. However, since the kernel only
 	 * validly references user space from well defined areas of the code,
@@ -535,20 +533,9 @@  static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			      VM_FAULT_BADACCESS)))) {
 		/*
 		 * Major/minor page fault accounting is only done
-		 * once. If we go through a retry, it is extremely
-		 * likely that the page will be found in page cache at
-		 * that point.
+		 * once.
 		 */
-		if (major) {
-			current->maj_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
-				      addr);
-		} else {
-			current->min_flt++;
-			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
-				      addr);
-		}
-
+		mm_fault_accounting(current, regs, address, major);
 		return 0;
 	}