[v4,0/3] Optimize tlbflush path

Message ID 20190822075151.24838-1-atish.patra@wdc.com (mailing list archive)

Message

Atish Patra Aug. 22, 2019, 7:51 a.m. UTC
This series adds a few optimizations to reduce the trap cost in the TLB
flush path. We should make SBI calls for a remote TLB flush only if
absolutely required.

This series is based on Christoph's series:

http://lists.infradead.org/pipermail/linux-riscv/2019-August/006148.html

Changes from v3->v4:
1. Simplify the local vs. remote use case.
2. Reorder the patches in the series.

Changes from v2->v3:
1. Split the patches into smaller ones, one per optimization.

Atish Patra (3):
RISC-V: Do not invoke SBI call if cpumask is empty
RISC-V: Issue a local tlbflush if possible.
RISC-V: Issue a tlb page flush if possible

arch/riscv/mm/tlbflush.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
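
Editorial sketch (not taken from the patches themselves): the three patch
titles above suggest a decision of roughly the following shape. This is a
user-space model only. The names choose_flush, flush_action and
SKETCH_PAGE_SIZE are hypothetical, the CPU mask is modeled as a 64-bit
bitmap, and the "one page or less" cutoff for a page flush is an assumption
rather than something stated in the cover letter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB base page size; the kernel uses its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical labels for the possible outcomes of a flush request. */
enum flush_action {
    FLUSH_NONE,        /* empty mask: nothing to flush, no SBI call */
    FLUSH_LOCAL_PAGE,  /* only the current CPU, range fits in one page */
    FLUSH_LOCAL_ALL,   /* only the current CPU, larger range */
    FLUSH_REMOTE_SBI   /* some other CPU is in the mask: SBI call needed */
};

/* Model of the decision: cpumask is a bitmap, bit n set means CPU n. */
static enum flush_action choose_flush(uint64_t cpumask, unsigned int this_cpu,
                                      unsigned long size)
{
    assert(this_cpu < 64);

    /* Patch 1: skip the SBI call entirely when no CPU is targeted. */
    if (cpumask == 0)
        return FLUSH_NONE;

    /* Patch 2: if no CPU other than the current one is set, stay local. */
    if ((cpumask & ~(UINT64_C(1) << this_cpu)) == 0) {
        /* Patch 3: a small range needs only a single-page flush. */
        return size <= SKETCH_PAGE_SIZE ? FLUSH_LOCAL_PAGE : FLUSH_LOCAL_ALL;
    }

    return FLUSH_REMOTE_SBI;
}
```

For example, choose_flush(0x1, 0, 4096) models a request that targets only
the calling CPU and one page, so no trap into the SBI firmware is needed.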

--
2.21.0

Comments

Paul Walmsley Aug. 31, 2019, 2:50 a.m. UTC | #1
Hi Atish,

On Thu, 22 Aug 2019, Atish Patra wrote:

> This series adds few optimizations to reduce the trap cost in the tlb
> flush path. We should only make SBI calls to remote tlb flush only if
> absolutely required. 

The patches look great.  My understanding is that these optimization 
patches may actually be a partial workaround for the TLB flushing bug that 
we've been looking at for the last month or so, which can corrupt memory 
or crash the system.

If that's the case, let's first root-cause the underlying bug.  Otherwise 
we'll just be papering over the actual issue, which probably could still 
occur even with this series, correct?  Since it contains no explicit 
fixes?


- Paul
Atish Patra Sept. 28, 2019, 4:23 a.m. UTC | #2
On Fri, 2019-08-30 at 19:50 -0700, Paul Walmsley wrote:
> Hi Atish,
> 
> On Thu, 22 Aug 2019, Atish Patra wrote:
> 
> > This series adds few optimizations to reduce the trap cost in the
> > tlb
> > flush path. We should only make SBI calls to remote tlb flush only
> > if
> > absolutely required. 
> 
> The patches look great.  My understanding is that these optimization 
> patches may actually be a partial workaround for the TLB flushing bug
> that 
> we've been looking at for the last month or so, which can corrupt
> memory 
> or crash the system.
> 
> If that's the case, let's first root-cause the underlying
> bug.  Otherwise 
> we'll just be papering over the actual issue, which probably could
> still 
> occur even with this series, correct?  Since it contains no explicit 
> fixes?
> 
> 
I have tested for the glibc locale install issue on both QEMU and the
Unleashed board. I don't see any issue with OpenSBI master + Linux v5.3
kernel.

As per our investigation, it looks like a hardware erratum on the
Unleashed board, as the memory corruption only happens in the case of a
TLB range flush. In RISC-V, an address-based sfence.vma operates at page
granularity. If the range is larger than that, OpenSBI has to issue
multiple sfence.vma instructions back to back, leading to possible memory
corruption.
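
Editorial sketch: the per-page behavior described above can be modeled in
user space as a loop that issues one flush per page in the range. Nothing
here is OpenSBI code. fake_sfence_vma_page is a hypothetical stand-in that
only counts calls, and a 4 KiB page size is assumed.

```c
#include <assert.h>

/* Assumed 4 KiB page size for this model. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of simulated single-page flushes issued so far. */
static unsigned long flushes_issued;

/* Hypothetical stand-in for one address-based sfence.vma; it only counts. */
static void fake_sfence_vma_page(unsigned long addr)
{
    (void)addr;
    flushes_issued++;
}

/* Flush [start, start + size) one page at a time; returns flushes issued. */
static unsigned long flush_range_by_page(unsigned long start,
                                         unsigned long size)
{
    unsigned long before = flushes_issued;

    for (unsigned long off = 0; off < size; off += SKETCH_PAGE_SIZE)
        fake_sfence_vma_page(start + off);

    return flushes_issued - before;
}
```

Under these assumptions a 1 GiB range turns into 262144 back-to-back
single-page flushes, while a one-page range needs exactly one.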

Currently, OpenSBI has a platform feature, "tlb_range_flush_limit",
that allows the TLB flush threshold to be configured per platform. Any
TLB range flush request larger than this threshold is converted to a full
flush. It is currently set to the default value of 4K for every
platform[1]. The glibc locale install memory corruption only happens if
this threshold is changed to a higher value, e.g. 1G. This doesn't
change anything in the OpenSBI code path except that it will issue
many sfence.vma instructions back to back instead of one.
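
Editorial sketch: a minimal model of the threshold behavior described above,
assuming the comparison is "request size greater than limit". The struct
flush_plan and the function plan_flush are hypothetical names for
illustration, not OpenSBI's API. Only the 4K default and the 1G experiment
come from the message.

```c
#include <assert.h>

/* Assumed 4 KiB page size for this model. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical result type: either one full flush or N page flushes. */
struct flush_plan {
    int full_flush;             /* 1 if the request becomes a full flush */
    unsigned long page_flushes; /* per-page flushes issued otherwise */
};

/* Decide how a range flush of `size` bytes is handled for a given limit. */
static struct flush_plan plan_flush(unsigned long size, unsigned long limit)
{
    struct flush_plan plan = { 0, 0 };

    if (size > limit) {
        /* Larger than the threshold: a single full TLB flush. */
        plan.full_flush = 1;
    } else {
        /* Within the threshold: one flush per page, rounded up. */
        plan.page_flushes = size / SKETCH_PAGE_SIZE +
                            (size % SKETCH_PAGE_SIZE != 0);
    }
    return plan;
}
```

With the 4K default, an 8K request becomes one full flush. With the limit
raised to 1G, the same request becomes two page flushes, and a full 1G
request becomes 262144 of them.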

If the hardware team at SiFive can look into this as well, it would be
great.

To conclude, we think this issue needs to be investigated by the hardware
team, and the kernel patches can be merged to get the performance benefit.

[1] https://github.com/riscv/opensbi/blob/master/include/sbi/sbi_platform.h#L40



> - Paul
> 
Paul Walmsley Oct. 25, 2019, 3:09 a.m. UTC | #3
On Fri, 30 Aug 2019, Paul Walmsley wrote:

> On Thu, 22 Aug 2019, Atish Patra wrote:
> 
> > This series adds few optimizations to reduce the trap cost in the tlb
> > flush path. We should only make SBI calls to remote tlb flush only if
> > absolutely required. 
> 
> The patches look great.  My understanding is that these optimization 
> patches may actually be a partial workaround for the TLB flushing bug that 
> we've been looking at for the last month or so, which can corrupt memory 
> or crash the system.

I don't think we're any closer to root-causing this issue.  Meanwhile, 
OpenSBI has merged patches to work around it.  So since many of us have 
looked at Atish's TLB optimization patches, and we all think they are 
useful optimizations, let's plan to merge them for v5.5-rc1.


- Paul
Atish Patra Oct. 31, 2019, 3:13 p.m. UTC | #4
On Thu, 2019-10-24 at 20:09 -0700, Paul Walmsley wrote:
> On Fri, 30 Aug 2019, Paul Walmsley wrote:
> 
> > On Thu, 22 Aug 2019, Atish Patra wrote:
> > 
> > > This series adds few optimizations to reduce the trap cost in the
> > > tlb
> > > flush path. We should only make SBI calls to remote tlb flush
> > > only if
> > > absolutely required. 
> > 
> > The patches look great.  My understanding is that these
> > optimization 
> > patches may actually be a partial workaround for the TLB flushing
> > bug that 
> > we've been looking at for the last month or so, which can corrupt
> > memory 
> > or crash the system.
> 
> I don't think we're any closer to root-causing this
> issue.  Meanwhile, 
> OpenSBI has merged patches to work around it.  So since many of us
> have 
> looked at Atish's TLB optimization patches, and we all think they
> are 
> useful optimizations, let's plan to merge it for v5.5-rc1.
> 
> 

Thanks. These patches still apply cleanly on 5.4-rc5.
Let me know if you need anything from my side.

> - Paul