
[v3,1/3] RISC-V: Issue a local tlbflush if possible.

Message ID 20190822004644.25829-2-atish.patra@wdc.com (mailing list archive)
State New, archived
Series Optimize tlbflush path

Commit Message

Atish Patra Aug. 22, 2019, 12:46 a.m. UTC
In RISC-V, a tlb flush happens via SBI, which is expensive. If the local
cpu is the only cpu in the cpumask, there is no need to invoke an SBI call.

Just do a local flush and return.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
---
 arch/riscv/mm/tlbflush.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Christoph Hellwig Aug. 22, 2019, 1:46 a.m. UTC | #1
On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> In RISC-V, tlb flush happens via SBI which is expensive. If the local
> cpu is the only cpu in cpumask, there is no need to invoke a SBI call.
> 
> Just do a local flush and return.
> 
> Signed-off-by: Atish Patra <atish.patra@wdc.com>
> ---
>  arch/riscv/mm/tlbflush.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index df93b26f1b9d..36430ee3bed9 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -2,6 +2,7 @@
>  
>  #include <linux/mm.h>
>  #include <linux/smp.h>
> +#include <linux/sched.h>
>  #include <asm/sbi.h>
>  
>  void flush_tlb_all(void)
> @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
>  		unsigned long size)
>  {
>  	struct cpumask hmask;
> +	unsigned int cpuid = get_cpu();
>  
> +	if (!cmask) {
> +		riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
> +		goto issue_sfence;
> +	}
> +
> +	if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
> +		local_flush_tlb_all();
> +		goto done;
> +	}

I think a single core on an SMP kernel is a valid enough use case given
how few distros still ship UP kernels.  So maybe this should rather be:

	if (!cmask)
		cmask = cpu_online_mask;

	if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
		local_flush_tlb_all();
	} else {
		riscv_cpuid_to_hartid_mask(cmask, &hmask);
		sbi_remote_sfence_vma(hmask.bits, start, size);
	}
Atish Patra Aug. 22, 2019, 4:01 a.m. UTC | #2
On Thu, 2019-08-22 at 03:46 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> > In RISC-V, tlb flush happens via SBI which is expensive. If the
> > local
> > cpu is the only cpu in cpumask, there is no need to invoke a SBI
> > call.
> > 
> > Just do a local flush and return.
> > 
> > Signed-off-by: Atish Patra <atish.patra@wdc.com>
> > ---
> >  arch/riscv/mm/tlbflush.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index df93b26f1b9d..36430ee3bed9 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -2,6 +2,7 @@
> >  
> >  #include <linux/mm.h>
> >  #include <linux/smp.h>
> > +#include <linux/sched.h>
> >  #include <asm/sbi.h>
> >  
> >  void flush_tlb_all(void)
> > @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask
> > *cmask, unsigned long start,
> >  		unsigned long size)
> >  {
> >  	struct cpumask hmask;
> > +	unsigned int cpuid = get_cpu();
> >  
> > +	if (!cmask) {
> > +		riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
> > +		goto issue_sfence;
> > +	}
> > +
> > +	if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> > 1) {
> > +		local_flush_tlb_all();
> > +		goto done;
> > +	}
> 
> I think a single core on an SMP kernel is a valid enough use case
> given
> how few distros still ship UP kernels.  So maybe this should rather
> be:
> 
> 	if (!cmask)
> 		cmask = cpu_online_mask;
> 
> 	if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> 1) {
> 		local_flush_tlb_all();
> 	} else {
> 	 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
> 	  	sbi_remote_sfence_vma(hmask.bits, start, size);
> 	}

The downside of this is that for every !cmask case in true SMP (probably
the more common configuration) it will execute two extra cpumask
operations. As the tlbflush path is performance critical, I think we
should favor the more common case (SMP with more than one core).

Thoughts?
Christoph Hellwig Aug. 22, 2019, 4:27 a.m. UTC | #3
On Thu, Aug 22, 2019 at 04:01:24AM +0000, Atish Patra wrote:
> The downside of this is that for every !cmask case in true SMP (more
> common probably) it will execute 2 extra cpumask instructions. As
> tlbflush path is in performance critical path, I think we should favor
> more common case (SMP with more than 1 core).

Actually, looking at both the current mainline code and the code from my
cleanups tree, I don't think remote_sfence_vma / __sbi_tlb_flush_range
can ever be called with a NULL cpumask, as we always have a valid mm.

So this is a bit of a moot point, and we can drop handling that case
entirely.  With that we can also use a simple if / else for the local
cpu only vs remote case.  Btw, what was the reason you didn't like
using cpumask_any_but like x86, which should be more efficient than
cpumask_test_cpu + hweight?
Atish Patra Aug. 22, 2019, 5:39 a.m. UTC | #4
On Thu, 2019-08-22 at 06:27 +0200, hch@lst.de wrote:
> On Thu, Aug 22, 2019 at 04:01:24AM +0000, Atish Patra wrote:
> > The downside of this is that for every !cmask case in true SMP
> > (more
> > common probably) it will execute 2 extra cpumask instructions. As
> > tlbflush path is in performance critical path, I think we should
> > favor
> > more common case (SMP with more than 1 core).
> 
> Actually, looking at both the current mainline code, and the code
> from my
> cleanups tree I don't think remote_sfence_vma / __sbi_tlb_flush_range
> can ever be called with a NULL cpumask, as we always have a valid mm.
> 

Yes. You are correct.

As both cpumask functions here will crash if the cpumask is NULL, we
should probably leave a comment warning about the consequence of the
cpumask being NULL.

> So this is a bit of a moot point, and we can drop handling that case
> entirely.  With that we can also use a simple if / else for the local
> cpu only vs remote case. 

Done.

>  Btw, what was the reason you didn't like
> using cpumask_any_but like x86, which should be more efficient than
> cpumask_test_cpu + hweight?

I had it in the v2 patch but removed it, as it can potentially return a
garbage value if the cpumask is empty.

However, we are already checking for an empty cpumask before the local
cpu check. I will replace cpumask_test_cpu + hweight with
cpumask_any_but().
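For reference, the cpumask_any_but()-style check discussed above can be modeled in plain userspace C, with a fixed-width bitmask standing in for struct cpumask. The names simple_any_but, needs_remote_flush, and NR_CPUS_MODEL below are illustrative stand-ins for this sketch, not kernel API:

```c
#include <stdint.h>

#define NR_CPUS_MODEL 64	/* stand-in for nr_cpu_ids */

/*
 * Model of cpumask_any_but(): return the lowest set bit in mask other
 * than 'cpu', or NR_CPUS_MODEL if no such bit exists.
 */
static unsigned int simple_any_but(uint64_t mask, unsigned int cpu)
{
	mask &= ~(1ULL << cpu);		/* drop the local cpu */
	for (unsigned int i = 0; i < NR_CPUS_MODEL; i++)
		if (mask & (1ULL << i))
			return i;
	return NR_CPUS_MODEL;		/* empty, or local cpu only */
}

/*
 * Decide the flush path with a single scan of the mask, instead of the
 * two scans done by cpumask_test_cpu() + cpumask_weight().
 */
static int needs_remote_flush(uint64_t cmask, unsigned int cpuid)
{
	return simple_any_but(cmask, cpuid) < NR_CPUS_MODEL;
}
```

The point of the x86-style check is that one walk of the mask answers "is any cpu other than mine present?", whereas the cpumask_test_cpu() + cpumask_weight() pair walks the mask twice to reach the same decision.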

Patch

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index df93b26f1b9d..36430ee3bed9 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -2,6 +2,7 @@ 
 
 #include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/sched.h>
 #include <asm/sbi.h>
 
 void flush_tlb_all(void)
@@ -13,9 +14,23 @@  static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
 		unsigned long size)
 {
 	struct cpumask hmask;
+	unsigned int cpuid = get_cpu();
 
+	if (!cmask) {
+		riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
+		goto issue_sfence;
+	}
+
+	if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
+		local_flush_tlb_all();
+		goto done;
+	}
 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
+
+issue_sfence:
 	sbi_remote_sfence_vma(hmask.bits, start, size);
+done:
+	put_cpu();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)