Message ID | 20190822075151.24838-1-atish.patra@wdc.com (mailing list archive) |
---|---|
Series | Optimize tlbflush path |

Hi Atish,

On Thu, 22 Aug 2019, Atish Patra wrote:

> This series adds few optimizations to reduce the trap cost in the tlb
> flush path. We should only make SBI calls to remote tlb flush only if
> absolutely required.

The patches look great. My understanding is that these optimization patches may actually be a partial workaround for the TLB flushing bug that we've been looking at for the last month or so, which can corrupt memory or crash the system.

If that's the case, let's first root-cause the underlying bug. Otherwise we'll just be papering over the actual issue, which probably could still occur even with this series, correct? Since it contains no explicit fixes?

- Paul

On Fri, 2019-08-30 at 19:50 -0700, Paul Walmsley wrote:
> Hi Atish,
>
> On Thu, 22 Aug 2019, Atish Patra wrote:
>
> > This series adds few optimizations to reduce the trap cost in the
> > tlb flush path. We should only make SBI calls to remote tlb flush
> > only if absolutely required.
>
> The patches look great. My understanding is that these optimization
> patches may actually be a partial workaround for the TLB flushing bug
> that we've been looking at for the last month or so, which can corrupt
> memory or crash the system.
>
> If that's the case, let's first root-cause the underlying bug.
> Otherwise we'll just be papering over the actual issue, which probably
> could still occur even with this series, correct? Since it contains no
> explicit fixes?
>

I have verified the glibc locale install issue on both QEMU and the Unleashed board. I don't see any issue with OpenSBI master + Linux v5.3 kernel.

As per our investigation, it looks like a hardware erratum on the Unleashed board, as the memory corruption only happens in the case of a TLB range flush. In RISC-V, sfence.vma can only be issued at page granularity. If the range is larger than that, OpenSBI has to issue multiple sfence.vma instructions back to back, leading to possible memory corruption.

Currently, OpenSBI has a platform feature, "tlb_range_flush_limit", that allows the TLB flush threshold to be configured per platform. Any TLB flush range request greater than this threshold is converted to a full flush. Currently, it is set to the default value of 4K for every platform[1]. The glibc locale install memory corruption only happens if this threshold is changed to a higher value, e.g. 1G. This doesn't change anything in the OpenSBI code path except that it will issue many sfence.vma instructions back to back instead of one.

If the hardware team at SiFive can look into this as well, it would be great.

To conclude, we think this issue needs to be investigated by the hardware team, and the kernel patches can be merged to get the performance benefit.

[1] https://github.com/riscv/opensbi/blob/master/include/sbi/sbi_platform.h#L40

> - Paul

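The threshold behaviour described in this message can be sketched in C as follows. This is a minimal illustration under stated assumptions, not OpenSBI's actual code: `tlb_flush_range`, `flush_one_page`, and `flush_all` are hypothetical names, the two flush helpers only count calls instead of executing sfence.vma, and `limit` stands in for the per-platform "tlb_range_flush_limit" value.

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-ins for sfence.vma: they only count how often
 * each kind of flush would have been issued. */
static unsigned long page_flushes;
static unsigned long full_flushes;

static void flush_one_page(uintptr_t addr)
{
    (void)addr;            /* a real version would flush this one page */
    page_flushes++;
}

static void flush_all(void)
{
    full_flushes++;        /* a real version would flush the whole TLB */
}

/*
 * Flush the range [start, start + size), with start assumed page-aligned.
 * A request larger than `limit` bytes is converted to one full flush;
 * anything else becomes one per-page flush after another.
 */
void tlb_flush_range(uintptr_t start, unsigned long size, unsigned long limit)
{
    if (size > limit) {
        flush_all();
        return;
    }

    for (unsigned long off = 0; off < size; off += PAGE_SIZE)
        flush_one_page(start + off);
}
```

With `limit` set to 4K, any request larger than a single page takes the full-flush path, so back-to-back per-page flushes never occur. With `limit` raised to 1G, a 2 MiB request instead runs the loop 512 times, which is the back-to-back pattern the message associates with the corruption.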
On Fri, 30 Aug 2019, Paul Walmsley wrote:

> On Thu, 22 Aug 2019, Atish Patra wrote:
>
> > This series adds few optimizations to reduce the trap cost in the tlb
> > flush path. We should only make SBI calls to remote tlb flush only if
> > absolutely required.
>
> The patches look great. My understanding is that these optimization
> patches may actually be a partial workaround for the TLB flushing bug that
> we've been looking at for the last month or so, which can corrupt memory
> or crash the system.

I don't think we're any closer to root-causing this issue. Meanwhile, OpenSBI has merged patches to work around it. So since many of us have looked at Atish's TLB optimization patches, and we all think they are useful optimizations, let's plan to merge it for v5.5-rc1.

- Paul

On Thu, 2019-10-24 at 20:09 -0700, Paul Walmsley wrote:
> On Fri, 30 Aug 2019, Paul Walmsley wrote:
>
> > On Thu, 22 Aug 2019, Atish Patra wrote:
> >
> > > This series adds few optimizations to reduce the trap cost in the
> > > tlb flush path. We should only make SBI calls to remote tlb flush
> > > only if absolutely required.
> >
> > The patches look great. My understanding is that these optimization
> > patches may actually be a partial workaround for the TLB flushing
> > bug that we've been looking at for the last month or so, which can
> > corrupt memory or crash the system.
>
> I don't think we're any closer to root-causing this issue. Meanwhile,
> OpenSBI has merged patches to work around it. So since many of us have
> looked at Atish's TLB optimization patches, and we all think they are
> useful optimizations, let's plan to merge it for v5.5-rc1.
>

Thanks. These patches still apply cleanly on 5.4-rc5. Let me know if you need anything from my side.

> - Paul