Message ID | 20190820004735.18518-1-atish.patra@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] RISC-V: Optimize tlb flush path. | expand |
On Mon, Aug 19, 2019 at 05:47:35PM -0700, Atish Patra wrote: > In RISC-V, tlb flush happens via SBI which is expensive. > If the target cpumask contains a local hartid, some cost > can be saved by issuing a local tlb flush as we do that > in OpenSBI anyways. There is also no need of SBI call if > cpumask is empty. > > Do a local flush first if current cpu is present in cpumask. > Invoke SBI call only if target cpumask contains any cpus > other than local cpu. Btw, you can use up your 70-ish chars per line for commit logs.. > + if (cpumask_test_cpu(cpuid, cmask)) { > + /* Save trap cost by issuing a local tlb flush here */ > + if ((start == 0 && size == -1) || (size > PAGE_SIZE)) > + local_flush_tlb_all(); > + else if (size == PAGE_SIZE) > + local_flush_tlb_page(start); > + } This looks a little odd to me and assumes we never pass a size smaller than PAGE_SIZE. While that is probably true, why not something like: if (size < PAGE_SIZE && size != -1) local_flush_tlb_page(start); else local_flush_tlb_all(); ?
On Aug 19 2019, "hch@infradead.org" <hch@infradead.org> wrote: > This looks a little odd to me and assumes we never pass a size smaller > than PAGE_SIZE. While that is probably true, why not something like: > > if (size < PAGE_SIZE && size != -1) ITYM size <= PAGE_SIZE. And since size is unsigned it cannot be == -1 at the same time. > local_flush_tlb_page(start); > else > local_flush_tlb_all(); > > ? Andreas.
On Tue, Aug 20, 2019 at 09:14:58AM +0200, Andreas Schwab wrote: > On Aug 19 2019, "hch@infradead.org" <hch@infradead.org> wrote: > > > This looks a little odd to me and assumes we never pass a size smaller > > than PAGE_SIZE. While that is probably true, why not something like: > > > > if (size < PAGE_SIZE && size != -1) > > ITYM size <= PAGE_SIZE. And since size is unsigned it cannot be == -1 > at the same time. Yes, the <= was obvious, that's what you get for hacking up a demo patch on the plane. And true for the -1. That being said, I find the -1 convention rather annoying, a ULONG_MAX in the callers would be a lot more obvious.
On Aug 19 2019, Atish Patra <atish.patra@wdc.com> wrote: > @@ -42,20 +43,44 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, > > #include <asm/sbi.h> > > -static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start, > - unsigned long size) > +static void __riscv_flush_tlb(struct cpumask *cmask, unsigned long start, > + unsigned long size) cmask can be const. > { > struct cpumask hmask; > + unsigned int hartid; > + unsigned int cpuid; > > cpumask_clear(&hmask); > + > + if (!cmask) { > + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); > + goto issue_sfence; > + } Wouldn't it make sense to fall through to doing a local flush here? if (!cmask) cmask = cpu_online_mask; Andreas.
On Tue, 2019-08-20 at 09:46 +0200, Andreas Schwab wrote: > On Aug 19 2019, Atish Patra <atish.patra@wdc.com> wrote: > > > @@ -42,20 +43,44 @@ static inline void flush_tlb_range(struct > > vm_area_struct *vma, > > > > #include <asm/sbi.h> > > > > -static inline void remote_sfence_vma(struct cpumask *cmask, > > unsigned long start, > > - unsigned long size) > > +static void __riscv_flush_tlb(struct cpumask *cmask, unsigned long > > start, > > + unsigned long size) > > cmask can be const. > Sure. Will update it. > > { > > struct cpumask hmask; > > + unsigned int hartid; > > + unsigned int cpuid; > > > > cpumask_clear(&hmask); > > + > > + if (!cmask) { > > + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); > > + goto issue_sfence; > > + } > > Wouldn't it make sense to fall through to doing a local flush here? > > if (!cmask) > cmask = cpu_online_mask; > A NULL cmask is a pretty common case and we would be unnecessarily executing a bunch of instructions every time while not saving much. The kernel still has to make an SBI call and OpenSBI is doing a local flush anyways. Looking at the code again, I think we can just use cpumask_weight and do a local tlb flush only if the local cpu is the only cpu present. Otherwise, it will just fall through and call sbi_remote_sfence_vma(). .... .... + + cpuid = get_cpu(); + if (!cmask) { + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); + goto issue_sfence; + } + + + if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) { + /* Save trap cost by issuing a local tlb flush here */ + if ((start == 0 && size == -1) || (size > PAGE_SIZE)) + local_flush_tlb_all(); + else if (size == PAGE_SIZE) + local_flush_tlb_page(start); + goto done; + } + riscv_cpuid_to_hartid_mask(cmask, &hmask); + +issue_sfence: sbi_remote_sfence_vma(hmask.bits, start, size); +done: + put_cpu(); } This is much simpler than what I had done in v2. I will address the if condition around size as well. Regards, Atish > Andreas. >
On Aug 20 2019, Atish Patra <Atish.Patra@wdc.com> wrote: > + > + cpuid = get_cpu(); > + if (!cmask) { > + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); > + goto issue_sfence; > + } > + > + > + if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == > 1) { > + /* Save trap cost by issuing a local tlb flush here */ > + if ((start == 0 && size == -1) || (size > PAGE_SIZE)) > + local_flush_tlb_all(); > + else if (size == PAGE_SIZE) > + local_flush_tlb_page(start); > + goto done; > + } > + > riscv_cpuid_to_hartid_mask(cmask, &hmask); > + > +issue_sfence: > sbi_remote_sfence_vma(hmask.bits, start, size); > +done: > + put_cpu(); > } > > This is much simpler than what I had done in v2. I will address the if > condition around size as well. I still think that this function should be moved out of the header. Andreas.
On Tue, Aug 20, 2019 at 6:17 AM Atish Patra <atish.patra@wdc.com> wrote: > > In RISC-V, tlb flush happens via SBI which is expensive. > If the target cpumask contains a local hartid, some cost > can be saved by issuing a local tlb flush as we do that > in OpenSBI anyways. There is also no need of SBI call if > cpumask is empty. > > Do a local flush first if current cpu is present in cpumask. > Invoke SBI call only if target cpumask contains any cpus > other than local cpu. > > Signed-off-by: Atish Patra <atish.patra@wdc.com> > --- > arch/riscv/include/asm/tlbflush.h | 37 ++++++++++++++++++++++++++----- > 1 file changed, 31 insertions(+), 6 deletions(-) > > diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h > index b5e64dc19b9e..3f9cd17b5402 100644 > --- a/arch/riscv/include/asm/tlbflush.h > +++ b/arch/riscv/include/asm/tlbflush.h > @@ -8,6 +8,7 @@ > #define _ASM_RISCV_TLBFLUSH_H > > #include <linux/mm_types.h> > +#include <linux/sched.h> > #include <asm/smp.h> > > /* > @@ -42,20 +43,44 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, > > #include <asm/sbi.h> > > -static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start, > - unsigned long size) > +static void __riscv_flush_tlb(struct cpumask *cmask, unsigned long start, > + unsigned long size) > { > struct cpumask hmask; > + unsigned int hartid; > + unsigned int cpuid; > > cpumask_clear(&hmask); > + > + if (!cmask) { > + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask); > + goto issue_sfence; > + } > + > + cpuid = get_cpu(); > + if (cpumask_test_cpu(cpuid, cmask)) { > + /* Save trap cost by issuing a local tlb flush here */ > + if ((start == 0 && size == -1) || (size > PAGE_SIZE)) > + local_flush_tlb_all(); > + else if (size == PAGE_SIZE) > + local_flush_tlb_page(start); > + } > + if (cpumask_any_but(cmask, cpuid) >= nr_cpu_ids) > + goto done; > + > riscv_cpuid_to_hartid_mask(cmask, &hmask); > + hartid = cpuid_to_hartid_map(cpuid); > + 
> + cpumask_clear_cpu(hartid, &hmask); > + > +issue_sfence: > sbi_remote_sfence_vma(hmask.bits, start, size); > +done: > + put_cpu(); > } > > -#define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1) > - > +#define flush_tlb_all() __riscv_flush_tlb(NULL, 0, -1) > #define flush_tlb_range(vma, start, end) \ > - remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start)) > + __riscv_flush_tlb(mm_cpumask((vma)->vm_mm), start, (end) - (start)) > > static inline void flush_tlb_page(struct vm_area_struct *vma, > unsigned long addr) { > @@ -63,7 +88,7 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, > } > > #define flush_tlb_mm(mm) \ > - remote_sfence_vma(mm_cpumask(mm), 0, -1) > + __riscv_flush_tlb(mm_cpumask(mm), 0, -1) > > #endif /* CONFIG_SMP */ > > -- > 2.21.0 > I think we should move __riscv_flush_tlb() to mm/tlbflush.c because it's quite big now. In future, we will also have __riscv_flush_tlb_asid() which will flush TLB based on ASID. Regards, Anup
On Tue, Aug 20, 2019 at 08:42:19AM +0000, Atish Patra wrote: > A NULL cmask is a pretty common case and we would be unnecessarily > executing a bunch of instructions every time while not saving much. The kernel > still has to make an SBI call and OpenSBI is doing a local flush > anyways. > > Looking at the code again, I think we can just use cpumask_weight and > do a local tlb flush only if the local cpu is the only cpu present. > > Otherwise, it will just fall through and call sbi_remote_sfence_vma(). Maybe it is just time to split the different cases at a higher level. The idea to multiplex everything onto a single function always seemed odd to me. FYI, here is what I do for the IPI based tlbflush for the native S-mode clint prototype, which seems much easier to understand: http://git.infradead.org/users/hch/riscv.git/commitdiff/ea4067ae61e20fcfcf46a6f6bd1cc25710ce3afe
On Tue, 2019-08-20 at 02:22 -0700, hch@infradead.org wrote: > On Tue, Aug 20, 2019 at 08:42:19AM +0000, Atish Patra wrote: > > A NULL cmask is a pretty common case and we would be unnecessarily > > executing a bunch of instructions every time while not saving much. > > The kernel > > still has to make an SBI call and OpenSBI is doing a local flush > > anyways. > > > > Looking at the code again, I think we can just use cpumask_weight > > and > > do a local tlb flush only if the local cpu is the only cpu present. > > > > Otherwise, it will just fall through and call > > sbi_remote_sfence_vma(). > > Maybe it is just time to split the different cases at a higher level. > The idea to multiplex everything onto a single function always seemed > odd to me. > > FYI, here is what I do for the IPI based tlbflush for the native S- > mode > clint prototype, which seems much easier to understand: > > http://git.infradead.org/users/hch/riscv.git/commitdiff/ea4067ae61e20fcfcf46a6f6bd1cc25710ce3afe This does seem a lot cleaner to me. We can reuse some of the code for this patch as well. Based on NATIVE_CLINT configuration, it will send an IPI or SBI call. I can rebase my patch on top of yours and I can send it together or you can include in your series. Let me know your preference.
On Tue, 2019-08-20 at 14:21 +0530, Anup Patel wrote: > On Tue, Aug 20, 2019 at 6:17 AM Atish Patra <atish.patra@wdc.com> > wrote: > > In RISC-V, tlb flush happens via SBI which is expensive. > > If the target cpumask contains a local hartid, some cost > > can be saved by issuing a local tlb flush as we do that > > in OpenSBI anyways. There is also no need of SBI call if > > cpumask is empty. > > > > Do a local flush first if current cpu is present in cpumask. > > Invoke SBI call only if target cpumask contains any cpus > > other than local cpu. > > > > Signed-off-by: Atish Patra <atish.patra@wdc.com> > > --- > > arch/riscv/include/asm/tlbflush.h | 37 ++++++++++++++++++++++++++- > > ---- > > 1 file changed, 31 insertions(+), 6 deletions(-) > > > > diff --git a/arch/riscv/include/asm/tlbflush.h > > b/arch/riscv/include/asm/tlbflush.h > > index b5e64dc19b9e..3f9cd17b5402 100644 > > --- a/arch/riscv/include/asm/tlbflush.h > > +++ b/arch/riscv/include/asm/tlbflush.h > > @@ -8,6 +8,7 @@ > > #define _ASM_RISCV_TLBFLUSH_H > > > > #include <linux/mm_types.h> > > +#include <linux/sched.h> > > #include <asm/smp.h> > > > > /* > > @@ -42,20 +43,44 @@ static inline void flush_tlb_range(struct > > vm_area_struct *vma, > > > > #include <asm/sbi.h> > > > > -static inline void remote_sfence_vma(struct cpumask *cmask, > > unsigned long start, > > - unsigned long size) > > +static void __riscv_flush_tlb(struct cpumask *cmask, unsigned long > > start, > > + unsigned long size) > > { > > struct cpumask hmask; > > + unsigned int hartid; > > + unsigned int cpuid; > > > > cpumask_clear(&hmask); > > + > > + if (!cmask) { > > + riscv_cpuid_to_hartid_mask(cpu_online_mask, > > &hmask); > > + goto issue_sfence; > > + } > > + > > + cpuid = get_cpu(); > > + if (cpumask_test_cpu(cpuid, cmask)) { > > + /* Save trap cost by issuing a local tlb flush here > > */ > > + if ((start == 0 && size == -1) || (size > > > PAGE_SIZE)) > > + local_flush_tlb_all(); > > + else if (size == PAGE_SIZE) > > + 
> > + local_flush_tlb_page(start); > > + } > > + if (cpumask_any_but(cmask, cpuid) >= nr_cpu_ids) > > + goto done; > > + > > riscv_cpuid_to_hartid_mask(cmask, &hmask); > > + hartid = cpuid_to_hartid_map(cpuid); > > + cpumask_clear_cpu(hartid, &hmask); > > + > > +issue_sfence: > > sbi_remote_sfence_vma(hmask.bits, start, size); > > +done: > > + put_cpu(); > > } > > > > -#define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1) > > - > > +#define flush_tlb_all() __riscv_flush_tlb(NULL, 0, -1) > > #define flush_tlb_range(vma, start, end) \ > > - remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - > > (start)) > > + __riscv_flush_tlb(mm_cpumask((vma)->vm_mm), start, (end) - > > (start)) > > > > static inline void flush_tlb_page(struct vm_area_struct *vma, > > unsigned long addr) { > > @@ -63,7 +88,7 @@ static inline void flush_tlb_page(struct > > vm_area_struct *vma, > > } > > > > #define flush_tlb_mm(mm) \ > > - remote_sfence_vma(mm_cpumask(mm), 0, -1) > > + __riscv_flush_tlb(mm_cpumask(mm), 0, -1) > > > > #endif /* CONFIG_SMP */ > > > > -- > > 2.21.0 > > > > I think we should move __riscv_flush_tlb() to mm/tlbflush.c because > it's quite > big now. > > In future, we will also have __riscv_flush_tlb_asid() which will > flush TLB based > on ASID. > Sounds good to me. Christoph already has mm/tlbflush.c in his mmu series. I will rebase on top of it. > Regards, > Anup
On Tue, Aug 20, 2019 at 08:28:36PM +0000, Atish Patra wrote: > > http://git.infradead.org/users/hch/riscv.git/commitdiff/ea4067ae61e20fcfcf46a6f6bd1cc25710ce3afe > > This does seem a lot cleaner to me. We can reuse some of the code for > this patch as well. Based on NATIVE_CLINT configuration, it will send > an IPI or SBI call. > > I can rebase my patch on top of yours and I can send it together or you > can include in your series. > > Let me know your preference. I think the native clint for S-mode will need more discussion, so you should not wait for it.
On Tue, 2019-08-20 at 15:18 -0700, hch@infradead.org wrote: > On Tue, Aug 20, 2019 at 08:28:36PM +0000, Atish Patra wrote: > > > http://git.infradead.org/users/hch/riscv.git/commitdiff/ea4067ae61e20fcfcf46a6f6bd1cc25710ce3afe > > > > This does seem a lot cleaner to me. We can reuse some of the code > > for > > this patch as well. Based on NATIVE_CLINT configuration, it will > > send > > an IPI or SBI call. > > > > I can rebase my patch on top of yours and I can send it together or > > you > > can include in your series. > > > > Let me know your preference. > > I think the native clint for S-mode will need more discussion, so you > should not wait for it. Ok, sure. I will move the code to tlbflush.c and refactor the tlb flush functions similarly to what you have in your patch.
On Tue, Aug 20, 2019 at 08:28:36PM +0000, Atish Patra wrote: > On Tue, 2019-08-20 at 02:22 -0700, hch@infradead.org wrote: > > On Tue, Aug 20, 2019 at 08:42:19AM +0000, Atish Patra wrote: > > > A NULL cmask is a pretty common case and we would be unnecessarily > > > executing a bunch of instructions every time while not saving much. > > > The kernel > > > still has to make an SBI call and OpenSBI is doing a local flush > > > anyways. > > > > > > Looking at the code again, I think we can just use cpumask_weight > > > and > > > do a local tlb flush only if the local cpu is the only cpu present. > > > > > > Otherwise, it will just fall through and call > > > sbi_remote_sfence_vma(). > > > > Maybe it is just time to split the different cases at a higher level. > > The idea to multiplex everything onto a single function always seemed > > odd to me. > > > > FYI, here is what I do for the IPI based tlbflush for the native S- > > mode > > clint prototype, which seems much easier to understand: > > > > http://git.infradead.org/users/hch/riscv.git/commitdiff/ea4067ae61e20fcfcf46a6f6bd1cc25710ce3afe > > This does seem a lot cleaner to me. We can reuse some of the code for > this patch as well. Based on NATIVE_CLINT configuration, it will send > an IPI or SBI call. IMHO, this approach should be avoided because CLINT is compatible with but not mandatory in the privileged spec. In other words, it is possible that a Linux-capable RISC-V platform does not contain a CLINT component but relies on some other mechanism to deal with SW/timer interrupts. > > I can rebase my patch on top of yours and I can send it together or you > can include in your series. > > Let me know your preference. > > -- > Regards, > Atish Best, Alan
On Wed, Aug 21, 2019 at 09:29:22AM +0800, Alan Kao wrote: > IMHO, this approach should be avoided because CLINT is compatible with but > not mandatory in the privileged spec. In other words, it is possible that > a Linux-capable RISC-V platform does not contain a CLINT component but > relies on some other mechanism to deal with SW/timer interrupts. Hi Alan, at this point the above is just a prototype showing the performance improvement if we can inject IPIs and timer interrupts directly from S-mode and have them delivered directly to S-mode. It is based on a copy of the clint IPI block currently used by SiFive, qemu, Ariane and Kendryte. If the experiment works out (which I think it does), I'd like to define interfaces for the unix platform spec to make something like this available. My current plan for that is to have one DT node each for the IPI registers, the timer compare register, and the time value register, as MMIO regions. This would fit the current clint block but also allow other register layouts. Is that something you'd be fine with? If not, do you have another proposal? (Note that eventually the discussion should move to the unix platform spec list, but now that I have you here we can at least brainstorm a bit.)
On Wed, Aug 21, 2019 at 7:10 AM hch@infradead.org <hch@infradead.org> wrote: > > On Wed, Aug 21, 2019 at 09:29:22AM +0800, Alan Kao wrote: > > IMHO, this approach should be avoided because CLINT is compatible with but > > not mandatory in the privileged spec. In other words, it is possible that > > a Linux-capable RISC-V platform does not contain a CLINT component but > > relies on some other mechanism to deal with SW/timer interrupts. > > Hi Alan, > > at this point the above is just a prototype showing the performance > improvement if we can inject IPIs and timer interrupts directly from > S-mode and have them delivered directly to S-mode. It is based on a copy of > the clint IPI block currently used by SiFive, qemu, Ariane and Kendryte. > > If the experiment works out (which I think it does), I'd like to > define interfaces for the unix platform spec to make something like > this available. My current plan for that is to have one DT node > each for the IPI registers, the timer compare register, and the time > value register, as MMIO regions. This would fit the current clint block but also > allow other register layouts. Is that something you'd be fine with? > If not, do you have another proposal? (Note that eventually the > discussion should move to the unix platform spec list, but now that > I have you here we can at least brainstorm a bit.) I agree that the IPI mechanism should be standardized for RISC-V but I don't support the idea of mandating the CLINT as part of the UNIX platform spec. For example, the AndesTech SOC does not use the CLINT; instead it has PLMT for the per-HART timer and PLICSW for per-HART IPIs. IMHO, we can also think of: RISC-V Timer Extension - for per-HART timer access from M-mode and S-mode; RISC-V IPI Extension - for HART IPI injection. Regards, Anup
On Wed, Aug 21, 2019 at 09:22:48AM +0530, Anup Patel wrote: > I agree that the IPI mechanism should be standardized for RISC-V but I > don't support the idea of mandating the CLINT as part of the UNIX > platform spec. For example, the AndesTech SOC does not use the CLINT; > instead it has PLMT for the per-HART timer and PLICSW for per-HART > IPIs. The point is not really mandating a CLINT as we know it right now. The point is to mandate one way to issue IPIs from S-mode to S-mode, one way to read the time counter, and one way to write the timer threshold.
On Tue, Aug 20, 2019 at 08:29:47PM +0000, Atish Patra wrote: > Sounds good to me. Christoph already has mm/tlbflush.c in his mmu > series. I will rebase on top of it. It wasn't really intended for the nommu series but for the native clint prototype. But the nommu series grew so many cleanups and micro-optimizations by now that I think I'll split those into another prep series and will include a version of this.
Btw, for the next version it also might make sense to do one optimization at a time. E.g. the empty cpumask one as the first patch, the direct local cpu flush next, and the threshold-based full flush as a third.
On Wed, 2019-08-21 at 07:45 -0700, hch@infradead.org wrote: > Btw, for the next version it also might make sense to do one > optimization at a time. E.g. the empty cpumask one as the > first patch, the direct local cpu flush next, and the > threshold-based full flush as a third. Ok, sure. I will refactor my optimization patch and remove the base patch (moving the functions from the header to tlbflush.c) as you have already sent it out.
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index b5e64dc19b9e..3f9cd17b5402 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -8,6 +8,7 @@
 #define _ASM_RISCV_TLBFLUSH_H
 
 #include <linux/mm_types.h>
+#include <linux/sched.h>
 #include <asm/smp.h>
 
 /*
@@ -42,20 +43,44 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 
 #include <asm/sbi.h>
 
-static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
-				     unsigned long size)
+static void __riscv_flush_tlb(struct cpumask *cmask, unsigned long start,
+			      unsigned long size)
 {
 	struct cpumask hmask;
+	unsigned int hartid;
+	unsigned int cpuid;
 
 	cpumask_clear(&hmask);
+
+	if (!cmask) {
+		riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
+		goto issue_sfence;
+	}
+
+	cpuid = get_cpu();
+	if (cpumask_test_cpu(cpuid, cmask)) {
+		/* Save trap cost by issuing a local tlb flush here */
+		if ((start == 0 && size == -1) || (size > PAGE_SIZE))
+			local_flush_tlb_all();
+		else if (size == PAGE_SIZE)
+			local_flush_tlb_page(start);
+	}
+	if (cpumask_any_but(cmask, cpuid) >= nr_cpu_ids)
+		goto done;
+
 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
+	hartid = cpuid_to_hartid_map(cpuid);
+	cpumask_clear_cpu(hartid, &hmask);
+
+issue_sfence:
 	sbi_remote_sfence_vma(hmask.bits, start, size);
+done:
+	put_cpu();
 }
 
-#define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
-
+#define flush_tlb_all() __riscv_flush_tlb(NULL, 0, -1)
 #define flush_tlb_range(vma, start, end) \
-	remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start))
+	__riscv_flush_tlb(mm_cpumask((vma)->vm_mm), start, (end) - (start))
 
 static inline void flush_tlb_page(struct vm_area_struct *vma,
 				  unsigned long addr) {
@@ -63,7 +88,7 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 }
 
 #define flush_tlb_mm(mm) \
-	remote_sfence_vma(mm_cpumask(mm), 0, -1)
+	__riscv_flush_tlb(mm_cpumask(mm), 0, -1)
 
 #endif /* CONFIG_SMP */
In RISC-V, tlb flush happens via SBI which is expensive. If the target cpumask contains a local hartid, some cost can be saved by issuing a local tlb flush as we do that in OpenSBI anyways. There is also no need of SBI call if cpumask is empty.

Do a local flush first if current cpu is present in cpumask. Invoke SBI call only if target cpumask contains any cpus other than local cpu.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
---
 arch/riscv/include/asm/tlbflush.h | 37 ++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 6 deletions(-)