Message ID | 1621400656-25678-1-git-send-email-guoren@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | riscv: Add DMA_COHERENT support | expand |
On Wed, May 19, 2021 at 05:04:13AM +0000, guoren@kernel.org wrote: > From: Guo Ren <guoren@linux.alibaba.com> > > The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let > vendors define the custom properties of memory regions in PTE. Err, hell no. The ISA needs to gets this fixed first. Then we can talk about alternatives patching things in or trapping in the SBI. But if the RISC-V ISA can't get these basic done after years we can't support it in Linux at all.
On Wed, May 19, 2021 at 1:20 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, May 19, 2021 at 05:04:13AM +0000, guoren@kernel.org wrote: > > From: Guo Ren <guoren@linux.alibaba.com> > > > > The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let > > vendors define the custom properties of memory regions in PTE. > > Err, hell no. The ISA needs to gets this fixed first. Then we can > talk about alternatives patching things in or trapping in the SBI. > But if the RISC-V ISA can't get these basic done after years we can't > support it in Linux at all. The patchset just leaves a configuration chance for vendors. Before RISC-V ISA fixes it, we should give the chance to let vendor solve their real chip issues.
On Wed, May 19, 2021 at 01:48:23PM +0800, Guo Ren wrote: > The patchset just leaves a configuration chance for vendors. Before > RISC-V ISA fixes it, we should give the chance to let vendor solve > their real chip issues. No. The vendors need to work to get a feature standardized before implementing it. There is other way to have a sane kernel build that supports all the different SOCs.
On Wed, May 19, 2021 at 1:20 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, May 19, 2021 at 05:04:13AM +0000, guoren@kernel.org wrote: > > From: Guo Ren <guoren@linux.alibaba.com> > > > > The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let > > vendors define the custom properties of memory regions in PTE. > > Err, hell no. The ISA needs to gets this fixed first. Then we can > talk about alternatives patching things in or trapping in the SBI. > But if the RISC-V ISA can't get these basic done after years we can't > support it in Linux at all. This is the lightest solution I could imagine, it avoids conflicts with RISC-V ISA. Since the existing RISC-V ISA cannot solve this problem, it is better to provide some configuration for the SOC vendor to customize. -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > Since the existing RISC-V ISA cannot solve this problem, it is better > to provide some configuration for the SOC vendor to customize. We've been talking about this problem for close to five years. So no, if you don't manage to get the feature into the ISA it can't be supported.
On Wed, May 19, 2021 at 1:55 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, May 19, 2021 at 01:48:23PM +0800, Guo Ren wrote: > > The patchset just leaves a configuration chance for vendors. Before > > RISC-V ISA fixes it, we should give the chance to let vendor solve > > their real chip issues. > > No. The vendors need to work to get a feature standardized before > implementing it. There is other way to have a sane kernel build that > supports all the different SOCs. I've said the patchset doesn't define any features, It just leaves the chance for vendors. It's not in conflict with any standardized riscv ISA.
On Wed, May 19, 2021 at 2:06 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > Since the existing RISC-V ISA cannot solve this problem, it is better > > to provide some configuration for the SOC vendor to customize. > > We've been talking about this problem for close to five years. So no, > if you don't manage to get the feature into the ISA it can't be > supported. arch/riscv/errata/ is also defined in riscv ISA?
On Wed, May 19, 2021 at 01:48:23PM +0800, Guo Ren wrote: > On Wed, May 19, 2021 at 1:20 PM Christoph Hellwig <hch@lst.de> wrote: > > > > On Wed, May 19, 2021 at 05:04:13AM +0000, guoren@kernel.org wrote: > > > From: Guo Ren <guoren@linux.alibaba.com> > > > > > > The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let > > > vendors define the custom properties of memory regions in PTE. > > > > Err, hell no. The ISA needs to gets this fixed first. Then we can > > talk about alternatives patching things in or trapping in the SBI. > > But if the RISC-V ISA can't get these basic done after years we can't > > support it in Linux at all. > > The patchset just leaves a configuration chance for vendors. Before > RISC-V ISA fixes it, we should give the chance to let vendor solve > their real chip issues. This patch series looks like it might be useful for the StarFive JH7100 [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, USB and SDIO require that the L2 cache must be manually flushed after DMA operations if the data is intended to be shared with U74 cores [2]. There is the RISC-V Cache Management Operation, or CMO, task group [3] but I am not sure if that can help the SoC's that have already been fabbed like the the D1 and the JH7100. thanks, drew [1] https://github.com/starfive-tech/beaglev_doc/blob/main/JH7100%20Data%20Sheet%20V01.01.04-EN%20(4-21-2021).pdf [2] https://github.com/starfive-tech/beaglev_doc/blob/main/JH7100%20Cache%20Coherence%20V1.0.pdf [3] https://github.com/riscv/riscv-CMOs
On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > This patch series looks like it might be useful for the StarFive JH7100 > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > USB and SDIO require that the L2 cache must be manually flushed after > DMA operations if the data is intended to be shared with U74 cores [2]. Not too much, given that the SiFive lineage CPUs have an uncached window, that is a totally different way to allocate uncached memory. > There is the RISC-V Cache Management Operation, or CMO, task group [3] > but I am not sure if that can help the SoC's that have already been > fabbed like the the D1 and the JH7100. It does, because unimplemented instructions trap into M-mode, where they can be emulated. Or to summarize things. Non-coherent DMA (and not coherent as title in this series) requires two things: 1) allocating chunks of memory that is marked as not cachable 2) instructions to invalidate and/or writeback cache lines none of which currently exists in RISV-V. Hacking vendor specific cruft into the kernel doesn't scale, as shown perfectly by this series which requires to hard code vendor specific non-standardized extensions in a kernel that makes it specific to that implementation. What we need to do is to standardize a way to do this properly, and then after that figure out a way to quirk in non-compliant implementations in a way that does not harm the general kernel.
On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > Since the existing RISC-V ISA cannot solve this problem, it is better > > to provide some configuration for the SOC vendor to customize. > > We've been talking about this problem for close to five years. So no, > if you don't manage to get the feature into the ISA it can't be > supported. Isn't it a good goal for Linux to support the capabilities present in the SoC that a currently being fab'd? I believe the CMO group only started last year [1] so the RV64GC SoCs that are going into mass production this year would not have had the opporuntiy of utilizing any RISC-V ISA extension for handling cache management. Thanks, Drew [1] https://github.com/riscv/riscv-CMOs
On Tue, May 18, 2021 at 11:54:31PM -0700, Drew Fustini wrote: > Isn't it a good goal for Linux to support the capabilities present in > the SoC that a currently being fab'd? > > I believe the CMO group only started last year [1] so the RV64GC SoCs > that are going into mass production this year would not have had the > opporuntiy of utilizing any RISC-V ISA extension for handling cache > management. Then the vendors need to push harder. This problem has been known for years but ignored by the vendors.
On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: > > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > > Since the existing RISC-V ISA cannot solve this problem, it is better > > > to provide some configuration for the SOC vendor to customize. > > > > We've been talking about this problem for close to five years. So no, > > if you don't manage to get the feature into the ISA it can't be > > supported. > > Isn't it a good goal for Linux to support the capabilities present in > the SoC that a currently being fab'd? > > I believe the CMO group only started last year [1] so the RV64GC SoCs > that are going into mass production this year would not have had the > opporuntiy of utilizing any RISC-V ISA extension for handling cache > management. The current Linux RISC-V policy is to only accept patches for frozen or ratified ISA specs. (Refer, Documentation/riscv/patch-acceptance.rst) This means even if emulate CMO instructions in OpenSBI, the Linux patches won't be taken by Palmer because CMO specification is still in draft stage. Also, we all know how much time it takes for RISCV international to freeze some spec. Judging by that we are looking at another 3-4 years at minimum. Regards, Anup
On 2021/05/19 16:16, Anup Patel wrote: > On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: >> >> On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: >>> On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: >>>> Since the existing RISC-V ISA cannot solve this problem, it is better >>>> to provide some configuration for the SOC vendor to customize. >>> >>> We've been talking about this problem for close to five years. So no, >>> if you don't manage to get the feature into the ISA it can't be >>> supported. >> >> Isn't it a good goal for Linux to support the capabilities present in >> the SoC that a currently being fab'd? >> >> I believe the CMO group only started last year [1] so the RV64GC SoCs >> that are going into mass production this year would not have had the >> opporuntiy of utilizing any RISC-V ISA extension for handling cache >> management. > > The current Linux RISC-V policy is to only accept patches for frozen or > ratified ISA specs. > (Refer, Documentation/riscv/patch-acceptance.rst) > > This means even if emulate CMO instructions in OpenSBI, the Linux > patches won't be taken by Palmer because CMO specification is > still in draft stage. > > Also, we all know how much time it takes for RISCV international > to freeze some spec. Judging by that we are looking at another > 3-4 years at minimum. Which is the root cause of most problems with riscv extension support in Linux. All RISC-V foundation members need to apply pressure on the foundation and these standard groups to deliver frozen specifications with an acceptable schedule. c.f. the H extensions specs which are not yet frozen despite not having been changed for months if not years.
On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: > > On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > > This patch series looks like it might be useful for the StarFive JH7100 > > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > > USB and SDIO require that the L2 cache must be manually flushed after > > DMA operations if the data is intended to be shared with U74 cores [2]. > > Not too much, given that the SiFive lineage CPUs have an uncached > window, that is a totally different way to allocate uncached memory. It's a very big MIPS smell. What's the attribute of the uncached window? (uncached + strong-order/ uncached + weak, most vendors still use AXI interconnect, how to deal with a bufferable attribute?) In fact, customers' drivers use different ways to deal with DMA memory in non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them the same way in DMA memory is a smart choice. So using PTE attributes is more suitable. See: https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf 4.4.1 The draft supports custom attribute bits in PTE. Although I do not agree with uncached windows, this patchset does not conflict with SiFive uncached windows. > > > There is the RISC-V Cache Management Operation, or CMO, task group [3] > > but I am not sure if that can help the SoC's that have already been > > fabbed like the the D1 and the JH7100. > > It does, because unimplemented instructions trap into M-mode, where they > can be emulated. > > Or to summarize things. Non-coherent DMA (and not coherent as title in > this series) requires two things: > > 1) allocating chunks of memory that is marked as not cachable > 2) instructions to invalidate and/or writeback cache lines Maybe sbi_ecall (dma_sync) is enough and let the vendor deal with it in opensbi. From a hardware view, CMO instruction only could deal with one cache line, then CMO-trap is not a good idea. > > none of which currently exists in RISV-V. Hacking vendor specific > cruft into the kernel doesn't scale, as shown perfectly by this > series which requires to hard code vendor-specific non-standardized > extensions in a kernel that makes it specific to that implementation. > > What we need to do is to standardize a way to do this properly, and then > after that figure out a way to quirk in non-compliant implementations > in a way that does not harm the general kernel.
On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> wrote: > > On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: > > > > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: > > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > > > Since the existing RISC-V ISA cannot solve this problem, it is better > > > > to provide some configuration for the SOC vendor to customize. > > > > > > We've been talking about this problem for close to five years. So no, > > > if you don't manage to get the feature into the ISA it can't be > > > supported. > > > > Isn't it a good goal for Linux to support the capabilities present in > > the SoC that a currently being fab'd? > > > > I believe the CMO group only started last year [1] so the RV64GC SoCs > > that are going into mass production this year would not have had the > > opporuntiy of utilizing any RISC-V ISA extension for handling cache > > management. > > The current Linux RISC-V policy is to only accept patches for frozen or > ratified ISA specs. > (Refer, Documentation/riscv/patch-acceptance.rst) > > This means even if emulate CMO instructions in OpenSBI, the Linux > patches won't be taken by Palmer because CMO specification is > still in draft stage. How do you think about sbi_ecall(SBI_EXT_DMA, SBI_DMA_SYNC, start, size, dir, 0, 0, 0); ? thx > > Also, we all know how much time it takes for RISCV international > to freeze some spec. Judging by that we are looking at another > 3-4 years at minimum. > > Regards, > Anup
On Thu, May 20, 2021 at 9:47 AM Guo Ren <guoren@kernel.org> wrote: > > On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> wrote: > > > > On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: > > > > > > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: > > > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > > > > Since the existing RISC-V ISA cannot solve this problem, it is better > > > > > to provide some configuration for the SOC vendor to customize. > > > > > > > > We've been talking about this problem for close to five years. So no, > > > > if you don't manage to get the feature into the ISA it can't be > > > > supported. > > > > > > Isn't it a good goal for Linux to support the capabilities present in > > > the SoC that a currently being fab'd? > > > > > > I believe the CMO group only started last year [1] so the RV64GC SoCs > > > that are going into mass production this year would not have had the > > > opporuntiy of utilizing any RISC-V ISA extension for handling cache > > > management. > > > > The current Linux RISC-V policy is to only accept patches for frozen or > > ratified ISA specs. > > (Refer, Documentation/riscv/patch-acceptance.rst) > > > > This means even if emulate CMO instructions in OpenSBI, the Linux I think it's CBO now. https://github.com/riscv/riscv-CMOs/blob/master/discussion-files/RISC_V_range_CMOs_bad_v1.00.pdf > > patches won't be taken by Palmer because CMO specification is > > still in draft stage. > How do you think about > sbi_ecall(SBI_EXT_DMA, SBI_DMA_SYNC, start, size, dir, 0, 0, 0); > ? thx CBO insn trap is okay for me ;-) > > > > Also, we all know how much time it takes for RISCV international > > to freeze some spec. Judging by that we are looking at another > > 3-4 years at minimum. > > > > Regards, > > Anup > > > > -- > Best Regards > Guo Ren > > ML: https://lore.kernel.org/linux-csky/
On Thu, May 20, 2021 at 09:45:45AM +0800, Guo Ren wrote: > It's a very big MIPS smell. What's the attribute of the uncached > window? (uncached + strong-order/ uncached + weak, most vendors still > use AXI interconnect, how to deal with a bufferable attribute?) In > fact, customers' drivers use different ways to deal with DMA memory in > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > the same way in DMA memory is a smart choice. So using PTE attributes > is more suitable. I'm not saying it is a good idea. Just that apparently this exists in the ASICs.
On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> wrote: > > On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: > > > > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: > > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > > > Since the existing RISC-V ISA cannot solve this problem, it is better > > > > to provide some configuration for the SOC vendor to customize. > > > > > > We've been talking about this problem for close to five years. So no, > > > if you don't manage to get the feature into the ISA it can't be > > > supported. > > > > Isn't it a good goal for Linux to support the capabilities present in > > the SoC that a currently being fab'd? > > > > I believe the CMO group only started last year [1] so the RV64GC SoCs > > that are going into mass production this year would not have had the > > opporuntiy of utilizing any RISC-V ISA extension for handling cache > > management. > > The current Linux RISC-V policy is to only accept patches for frozen or > ratified ISA specs. > (Refer, Documentation/riscv/patch-acceptance.rst) > > This means even if emulate CMO instructions in OpenSBI, the Linux > patches won't be taken by Palmer because CMO specification is > still in draft stage. Before CMO specification release, could we use a sbi_ecall to solve the current problem? This is not against the specification, when CMO is ready we could let users choose to use the new CMO in Linux. From a tech view, CMO trap emulation is the same as sbi_ecall. > > Also, we all know how much time it takes for RISCV international > to freeze some spec. Judging by that we are looking at another > 3-4 years at minimum. > > Regards, > Anup
On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: > On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> wrote: >> >> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: >> > >> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: >> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: >> > > > Since the existing RISC-V ISA cannot solve this problem, it is better >> > > > to provide some configuration for the SOC vendor to customize. >> > > >> > > We've been talking about this problem for close to five years. So no, >> > > if you don't manage to get the feature into the ISA it can't be >> > > supported. >> > >> > Isn't it a good goal for Linux to support the capabilities present in >> > the SoC that a currently being fab'd? >> > >> > I believe the CMO group only started last year [1] so the RV64GC SoCs >> > that are going into mass production this year would not have had the >> > opporuntiy of utilizing any RISC-V ISA extension for handling cache >> > management. >> >> The current Linux RISC-V policy is to only accept patches for frozen or >> ratified ISA specs. >> (Refer, Documentation/riscv/patch-acceptance.rst) >> >> This means even if emulate CMO instructions in OpenSBI, the Linux >> patches won't be taken by Palmer because CMO specification is >> still in draft stage. > Before CMO specification release, could we use a sbi_ecall to solve > the current problem? This is not against the specification, when CMO > is ready we could let users choose to use the new CMO in Linux. > > From a tech view, CMO trap emulation is the same as sbi_ecall. > >> >> Also, we all know how much time it takes for RISCV international >> to freeze some spec. Judging by that we are looking at another >> 3-4 years at minimum. Sorry for being slow here, this thread got buried. I've been trying to work with a handful of folks at the RISC-V foundation to try and get a subset of the various in-development specifications (some simple CMOs, something about non-caching in the page tables, and some way to prevent speculative accesse from generating coherence traffic that will break non-coherent systems). I'm not sure we can get this together quickly, but I'd prefer to at least try before we jump to taking vendor-specificed behavior here. It's obviously an up-hill battle to try and get specifications through the process and I'm certainly not going to promise it will work, but I'm hoping that the impending need to avoid forking the ISA will be sufficient to get people behind producing some specifications in a timely fashion. I wasn't aware than this chip had non-coherent devices until I saw this thread, so we'd been mostly focused on the Beagle V chip. That was in a sense an easier problem because the SiFive IP in it was never designed to have non-coherent devices so we'd have to make anything work via a series of slow workarounds, which would make emulating the eventually standardized behavior reasonable in terms of performance (ie, everything would be super slow so who really cares). I don't think relying on some sort of SBI call for the CMOs whould be such a performance hit that it would prevent these systems from being viable, but assuming you have reasonable performance on your non-cached accesses then that's probably not going to be viable to trap and emulate. At that point it really just becomes silly to pretend that we're still making things work by emulating the eventually ratified behavior, as anyone who actually tries to use this thing to do IO would need out of tree patches. I'm not sure exactly what the plan is for the page table bits in the specification right now, but if you can give me a pointer to some documentation then I'm happy to try and push for something compatible. If we can't make the process work at the foundation then I'd be strongly in favor of just biting the bullet and starting to take vendor-specific code that's been implemented in hardware and is necessarry to make things work acceptably. That's obviously a sub-optimal solution as it'll lead to a bunch of ISA fragmentation, but at least we'll be able to keep the software stack together. Can you tell us when these will be in the hands of users? That's pretty important here, as I don't want to be blocking real users from having their hardware work. IIRC there were some plans to distribute early boards, but it looks like the foundation got involved and I guess I lost the thread at that point. Sorry this is all such a headache, but hopefully we can get things sorted out. >> >> Regards, >> Anup
On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> wrote: >>> >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <drew@beagleboard.org> wrote: >>> > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote: >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is better >>> > > > to provide some configuration for the SOC vendor to customize. >>> > > >>> > > We've been talking about this problem for close to five years. So no, >>> > > if you don't manage to get the feature into the ISA it can't be >>> > > supported. >>> > >>> > Isn't it a good goal for Linux to support the capabilities present in >>> > the SoC that a currently being fab'd? >>> > >>> > I believe the CMO group only started last year [1] so the RV64GC SoCs >>> > that are going into mass production this year would not have had the >>> > opporuntiy of utilizing any RISC-V ISA extension for handling cache >>> > management. >>> >>> The current Linux RISC-V policy is to only accept patches for frozen or >>> ratified ISA specs. >>> (Refer, Documentation/riscv/patch-acceptance.rst) >>> >>> This means even if emulate CMO instructions in OpenSBI, the Linux >>> patches won't be taken by Palmer because CMO specification is >>> still in draft stage. >> Before CMO specification release, could we use a sbi_ecall to solve >> the current problem? This is not against the specification, when CMO >> is ready we could let users choose to use the new CMO in Linux. >> >> From a tech view, CMO trap emulation is the same as sbi_ecall. >> >>> >>> Also, we all know how much time it takes for RISCV international >>> to freeze some spec. Judging by that we are looking at another >>> 3-4 years at minimum. > > Sorry for being slow here, this thread got buried. > > I've been trying to work with a handful of folks at the RISC-V > foundation to try and get a subset of the various in-development > specifications (some simple CMOs, something about non-caching in the > page tables, and some way to prevent speculative accesse from generating > coherence traffic that will break non-coherent systems). I'm not sure > we can get this together quickly, but I'd prefer to at least try before > we jump to taking vendor-specificed behavior here. It's obviously an > up-hill battle to try and get specifications through the process and I'm > certainly not going to promise it will work, but I'm hoping that the > impending need to avoid forking the ISA will be sufficient to get people > behind producing some specifications in a timely fashion. > > I wasn't aware than this chip had non-coherent devices until I saw this > thread, so we'd been mostly focused on the Beagle V chip. That was in a > sense an easier problem because the SiFive IP in it was never designed > to have non-coherent devices so we'd have to make anything work via a > series of slow workarounds, which would make emulating the eventually > standardized behavior reasonable in terms of performance (ie, everything > would be super slow so who really cares). > > I don't think relying on some sort of SBI call for the CMOs whould be > such a performance hit that it would prevent these systems from being > viable, but assuming you have reasonable performance on your non-cached > accesses then that's probably not going to be viable to trap and > emulate. At that point it really just becomes silly to pretend that > we're still making things work by emulating the eventually ratified > behavior, as anyone who actually tries to use this thing to do IO would > need out of tree patches. I'm not sure exactly what the plan is for the > page table bits in the specification right now, but if you can give me a > pointer to some documentation then I'm happy to try and push for > something compatible. > > If we can't make the process work at the foundation then I'd be strongly > in favor of just biting the bullet and starting to take vendor-specific > code that's been implemented in hardware and is necessarry to make > things work acceptably. That's obviously a sub-optimal solution as > it'll lead to a bunch of ISA fragmentation, but at least we'll be able > to keep the software stack together. > > Can you tell us when these will be in the hands of users? That's pretty > important here, as I don't want to be blocking real users from having > their hardware work. IIRC there were some plans to distribute early > boards, but it looks like the foundation got involved and I guess I lost > the thread at that point. > > Sorry this is all such a headache, but hopefully we can get things > sorted out. I talked with some of the RISC-V foundation folks, we're not going to have an ISA specification for the non-coherent stuff any time soon. I took a look at this code and I definately don't want to take it as is, but I'm not opposed to taking something that makes the hardware work as long as it's a lot cleaner. We've already got two of these non-coherent chips, I'm sure more will come, and I'd rather have the extra headaches than make everyone fork the software stack. After talking to Atish it looks like there's likely to be an SBI extension to handle the CMOs, which should let us avoid the bulk of the vendor-specific behavior in the kernel. I know some people are worried about adding to the SBI surface. I'm worried about that too, but that's way better than sticking a bunch of vendor-specific instructions into the kernel. The SBI extension should make for a straight-forward cache flush implementation in Linux, so let's just plan on that getting through quickly (as has been done before). Unfortunately we've yet to come up with a way to handle the non-cacheable mappings without introducing a degree of vendor-specific behavior or seriously impacting performance (mark them as not valid and deal with them in the trap handler). I'm not really sure it counts as supporting the hardware if it's massively slow, so that really leaves us with vendor-specific mappings as the only option to make these chips work. This implementation, which adds some Kconfig entries that control page table bits, definately isn't suitable for upstream. Allowing users to set arbitrary page table bits will eventually conflict with the standard, and is just going to be a mess. It'll also lead to kernels that are only compatible with specific designs, which we're trying very hard to avoid. At a bare minimum we'll need some way to detect systems with these page table bits before setting them, and some description of what the bits actually do so we can reason about them.
> -----Original Message----- > From: Palmer Dabbelt <palmer@dabbelt.com> > Sent: 03 June 2021 09:43 > To: guoren@kernel.org > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; wefu@redhat.com; > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > <paul.walmsley@sifive.com> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> > wrote: > >>> > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > <drew@beagleboard.org> wrote: > >>> > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig > wrote: > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is > >>> > > > better to provide some configuration for the SOC vendor to > customize. > >>> > > > >>> > > We've been talking about this problem for close to five years. > >>> > > So no, if you don't manage to get the feature into the ISA it > >>> > > can't be supported. > >>> > > >>> > Isn't it a good goal for Linux to support the capabilities present > >>> > in the SoC that a currently being fab'd? > >>> > > >>> > I believe the CMO group only started last year [1] so the RV64GC > >>> > SoCs that are going into mass production this year would not have > >>> > had the opporuntiy of utilizing any RISC-V ISA extension for > >>> > handling cache management. > >>> > >>> The current Linux RISC-V policy is to only accept patches for frozen > >>> or ratified ISA specs. > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > >>> > >>> This means even if emulate CMO instructions in OpenSBI, the Linux > >>> patches won't be taken by Palmer because CMO specification is still > >>> in draft stage. > >> Before CMO specification release, could we use a sbi_ecall to solve > >> the current problem? This is not against the specification, when CMO > >> is ready we could let users choose to use the new CMO in Linux. > >> > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > >> > >>> > >>> Also, we all know how much time it takes for RISCV international to > >>> freeze some spec. Judging by that we are looking at another > >>> 3-4 years at minimum. > > > > Sorry for being slow here, this thread got buried. > > > > I've been trying to work with a handful of folks at the RISC-V > > foundation to try and get a subset of the various in-development > > specifications (some simple CMOs, something about non-caching in the > > page tables, and some way to prevent speculative accesse from > > generating coherence traffic that will break non-coherent systems). > > I'm not sure we can get this together quickly, but I'd prefer to at > > least try before we jump to taking vendor-specificed behavior here. > > It's obviously an up-hill battle to try and get specifications through > > the process and I'm certainly not going to promise it will work, but > > I'm hoping that the impending need to avoid forking the ISA will be > > sufficient to get people behind producing some specifications in a timely > fashion. > > > > I wasn't aware than this chip had non-coherent devices until I saw > > this thread, so we'd been mostly focused on the Beagle V chip. That > > was in a sense an easier problem because the SiFive IP in it was never > > designed to have non-coherent devices so we'd have to make anything > > work via a series of slow workarounds, which would make emulating the > > eventually standardized behavior reasonable in terms of performance > > (ie, everything would be super slow so who really cares). > > > > I don't think relying on some sort of SBI call for the CMOs whould be > > such a performance hit that it would prevent these systems from being > > viable, but assuming you have reasonable performance on your > > non-cached accesses then that's probably not going to be viable to > > trap and emulate. At that point it really just becomes silly to > > pretend that we're still making things work by emulating the > > eventually ratified behavior, as anyone who actually tries to use this > > thing to do IO would need out of tree patches. I'm not sure exactly > > what the plan is for the page table bits in the specification right > > now, but if you can give me a pointer to some documentation then I'm > > happy to try and push for something compatible. > > > > If we can't make the process work at the foundation then I'd be > > strongly in favor of just biting the bullet and starting to take > > vendor-specific code that's been implemented in hardware and is > > necessarry to make things work acceptably. That's obviously a > > sub-optimal solution as it'll lead to a bunch of ISA fragmentation, > > but at least we'll be able to keep the software stack together. > > > > Can you tell us when these will be in the hands of users? That's > > pretty important here, as I don't want to be blocking real users from > > having their hardware work. IIRC there were some plans to distribute > > early boards, but it looks like the foundation got involved and I > > guess I lost the thread at that point. > > > > Sorry this is all such a headache, but hopefully we can get things > > sorted out. > > I talked with some of the RISC-V foundation folks, we're not going to have an > ISA specification for the non-coherent stuff any time soon. I took a look at > this code and I definately don't want to take it as is, but I'm not opposed to > taking something that makes the hardware work as long as it's a lot cleaner. > We've already got two of these non-coherent chips, I'm sure more will come, > and I'd rather have the extra headaches than make everyone fork the software > stack. Thanks for confirming. The CMO extension is still in early stages so it will certainly take more time for them. After CMO extension is finalized, it will take some more time to have actual RISC-V platforms with CMO implemented. > > After talking to Atish it looks like there's likely to be an SBI extension to > handle the CMOs, which should let us avoid the bulk of the vendor-specific > behavior in the kernel. I know some people are worried about adding to the > SBI surface. I'm worried about that too, but that's way better than sticking a > bunch of vendor-specific instructions into the kernel. The SBI extension > should make for a straight-forward cache flush implementation in Linux, so > let's just plan on that getting through quickly (as has been done before). Yes, I agree. We can have just a single SBI call which is meant for DMA sync purpose only which means it will flush/invalidate pages from all cache levels irrespective of the cache hierarchy (i.e. flush/invalidate to RAM). The CMO extension might more generic cache operations which can target any cache level. I am already preparing a write-up for SBI DMA sync call in SBI spec. I will share it with you separately as well. > > Unfortunately we've yet to come up with a way to handle the non-cacheable > mappings without introducing a degree of vendor-specific behavior or > seriously impacting performance (mark them as not valid and deal with them > in the trap handler). I'm not really sure it counts as supporting the hardware > if it's massively slow, so that really leaves us with vendor-specific mappings as > the only option to make these chips work. A RISC-V platform can have non-cacheable mappings is following possible ways: 1) Fixed physical address range as non-cacheable using PMAs 2) Custom page table attributes 3) Svpmbt extension being defined by RVI Atish and me both think it is possible to have RISC-V specific DMA ops implementation which can handle all above case. Atish is already working on DMA ops implementation for RISC-V. > > This implementation, which adds some Kconfig entries that control page table > bits, definately isn't suitable for upstream. Allowing users to set arbitrary > page table bits will eventually conflict with the standard, and is just going to > be a mess. It'll also lead to kernels that are only compatible with specific > designs, which we're trying very hard to avoid. At a bare minimum we'll need > some way to detect systems with these page table bits before setting them, > and some description of what the bits actually do so we can reason about > them. Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the goal of unified kernel image for all platforms. Regards, Anup
On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote: > > >> -----Original Message----- >> From: Palmer Dabbelt <palmer@dabbelt.com> >> Sent: 03 June 2021 09:43 >> To: guoren@kernel.org >> Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig >> <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; wefu@redhat.com; >> lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- >> kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- >> sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley >> <paul.walmsley@sifive.com> >> Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support >> >> On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: >> > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: >> >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> >> wrote: >> >>> >> >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini >> <drew@beagleboard.org> wrote: >> >>> > >> >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig >> wrote: >> >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: >> >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is >> >>> > > > better to provide some configuration for the SOC vendor to >> customize. >> >>> > > >> >>> > > We've been talking about this problem for close to five years. >> >>> > > So no, if you don't manage to get the feature into the ISA it >> >>> > > can't be supported. >> >>> > >> >>> > Isn't it a good goal for Linux to support the capabilities present >> >>> > in the SoC that a currently being fab'd? >> >>> > >> >>> > I believe the CMO group only started last year [1] so the RV64GC >> >>> > SoCs that are going into mass production this year would not have >> >>> > had the opporuntiy of utilizing any RISC-V ISA extension for >> >>> > handling cache management. >> >>> >> >>> The current Linux RISC-V policy is to only accept patches for frozen >> >>> or ratified ISA specs. >> >>> (Refer, Documentation/riscv/patch-acceptance.rst) >> >>> >> >>> This means even if emulate CMO instructions in OpenSBI, the Linux >> >>> patches won't be taken by Palmer because CMO specification is still >> >>> in draft stage. >> >> Before CMO specification release, could we use a sbi_ecall to solve >> >> the current problem? This is not against the specification, when CMO >> >> is ready we could let users choose to use the new CMO in Linux. >> >> >> >> From a tech view, CMO trap emulation is the same as sbi_ecall. >> >> >> >>> >> >>> Also, we all know how much time it takes for RISCV international to >> >>> freeze some spec. Judging by that we are looking at another >> >>> 3-4 years at minimum. >> > >> > Sorry for being slow here, this thread got buried. >> > >> > I've been trying to work with a handful of folks at the RISC-V >> > foundation to try and get a subset of the various in-development >> > specifications (some simple CMOs, something about non-caching in the >> > page tables, and some way to prevent speculative accesse from >> > generating coherence traffic that will break non-coherent systems). >> > I'm not sure we can get this together quickly, but I'd prefer to at >> > least try before we jump to taking vendor-specificed behavior here. >> > It's obviously an up-hill battle to try and get specifications through >> > the process and I'm certainly not going to promise it will work, but >> > I'm hoping that the impending need to avoid forking the ISA will be >> > sufficient to get people behind producing some specifications in a timely >> fashion. >> > >> > I wasn't aware than this chip had non-coherent devices until I saw >> > this thread, so we'd been mostly focused on the Beagle V chip. That >> > was in a sense an easier problem because the SiFive IP in it was never >> > designed to have non-coherent devices so we'd have to make anything >> > work via a series of slow workarounds, which would make emulating the >> > eventually standardized behavior reasonable in terms of performance >> > (ie, everything would be super slow so who really cares). >> > >> > I don't think relying on some sort of SBI call for the CMOs whould be >> > such a performance hit that it would prevent these systems from being >> > viable, but assuming you have reasonable performance on your >> > non-cached accesses then that's probably not going to be viable to >> > trap and emulate. At that point it really just becomes silly to >> > pretend that we're still making things work by emulating the >> > eventually ratified behavior, as anyone who actually tries to use this >> > thing to do IO would need out of tree patches. I'm not sure exactly >> > what the plan is for the page table bits in the specification right >> > now, but if you can give me a pointer to some documentation then I'm >> > happy to try and push for something compatible. >> > >> > If we can't make the process work at the foundation then I'd be >> > strongly in favor of just biting the bullet and starting to take >> > vendor-specific code that's been implemented in hardware and is >> > necessarry to make things work acceptably. That's obviously a >> > sub-optimal solution as it'll lead to a bunch of ISA fragmentation, >> > but at least we'll be able to keep the software stack together. >> > >> > Can you tell us when these will be in the hands of users? That's >> > pretty important here, as I don't want to be blocking real users from >> > having their hardware work. IIRC there were some plans to distribute >> > early boards, but it looks like the foundation got involved and I >> > guess I lost the thread at that point. >> > >> > Sorry this is all such a headache, but hopefully we can get things >> > sorted out. >> >> I talked with some of the RISC-V foundation folks, we're not going to have an >> ISA specification for the non-coherent stuff any time soon. I took a look at >> this code and I definately don't want to take it as is, but I'm not opposed to >> taking something that makes the hardware work as long as it's a lot cleaner. >> We've already got two of these non-coherent chips, I'm sure more will come, >> and I'd rather have the extra headaches than make everyone fork the software >> stack. > > Thanks for confirming. The CMO extension is still in early stages so it will > certainly take more time for them. After CMO extension is finalized, it will > take some more time to have actual RISC-V platforms with CMO implemented. Agreed. It's going to take two or three years from the standard to get hardware to supporting it, so that means we're three or four years away (at least, there's not even any solid timeline for a spec a year from now) from having hardware. There's just going to be too much non-standard hardware here to try to ignore it all. >> After talking to Atish it looks like there's likely to be an SBI extension to >> handle the CMOs, which should let us avoid the bulk of the vendor-specific >> behavior in the kernel. I know some people are worried about adding to the >> SBI surface. I'm worried about that too, but that's way better than sticking a >> bunch of vendor-specific instructions into the kernel. The SBI extension >> should make for a straight-forward cache flush implementation in Linux, so >> let's just plan on that getting through quickly (as has been done before). > > Yes, I agree. We can have just a single SBI call which is meant for DMA sync > purpose only which means it will flush/invalidate pages from all cache > levels irrespective of the cache hierarchy (i.e. flush/invalidate to RAM). The > CMO extension might more generic cache operations which can target > any cache level. > > I am already preparing a write-up for SBI DMA sync call in SBI spec. I will > share it with you separately as well. Great, thanks. Atish sort of mentioned that, but I didn't want to put words in your mouth (and I assume you were aleep or something, due to time zones). >> Unfortunately we've yet to come up with a way to handle the non-cacheable >> mappings without introducing a degree of vendor-specific behavior or >> seriously impacting performance (mark them as not valid and deal with them >> in the trap handler). I'm not really sure it counts as supporting the hardware >> if it's massively slow, so that really leaves us with vendor-specific mappings as >> the only option to make these chips work. > > A RISC-V platform can have non-cacheable mappings is following possible > ways: > 1) Fixed physical address range as non-cacheable using PMAs > 2) Custom page table attributes > 3) Svpmbt extension being defined by RVI > > Atish and me both think it is possible to have RISC-V specific DMA ops > implementation which can handle all above case. Atish is already working > on DMA ops implementation for RISC-V. Great, thanks. I haven't started writing any code, but I think we're going to be able to get a big chunk of #1 from the "dma-ranges" device tree stuff. I think we still need some arch-specific allocation work to make sure we don't alias, though. The page table attributes are definately going to need dma ops. I'd been assuming we'd have multiple DMA op tables for the multiple flavors, but if they fit into a single op table cleanly that's fine -- that sort of stuff really needs the code here. Since you guys have already started I'll just wait for patches. Thanks! >> This implementation, which adds some Kconfig entries that control page table >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary >> page table bits will eventually conflict with the standard, and is just going to >> be a mess. It'll also lead to kernels that are only compatible with specific >> designs, which we're trying very hard to avoid. At a bare minimum we'll need >> some way to detect systems with these page table bits before setting them, >> and some description of what the bits actually do so we can reason about >> them. > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the > goal of unified kernel image for all platforms. I think this is just a phrasing issue, but just to be sure: IMO it's not that they're vendor-specific Kconfig options, it's that turning them on will conflict with standard systems (and other vendors). We've already got the ability to select sets of Kconfig settings that will only boot on one vendor's system, which is fine, as long as there remains a set of Kconfig settings that will boot on all systems. An example here would be the errata: every system has errata of some sort, so if we start flipping off various vendor's errata Kconfigs you'll end up with kernels that only function properly on some systems. That's fine with me, as long as it's possible to turn on all vendor's errata Kconfigs at the same time and the resulting kernel functions correctly on all systems. > Regards, > Anup
From: Palmer Dabbelt > Sent: 03 June 2021 16:39 ... > An example here would be the errata: every system has errata of some > sort, so if we start flipping off various vendor's errata Kconfigs > you'll end up with kernels that only function properly on some systems. > That's fine with me, as long as it's possible to turn on all vendor's > errata Kconfigs at the same time and the resulting kernel functions > correctly on all systems. ISTM that if you can (easily) detect the errata then the detection should be left it - but the kernel fail to boot if the system needs the errata fixed. The same would be needed for DMA in systems with non-coherent memory. Only a hardware engineer would build a system with non-coherent memory and without the ability to do uncached accesses and flush/invalidate small sections of cache. Mind you we did get a dual-cpu system that didn't have cache-coherency between the cpus! That was singularly useless. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote: > >> This implementation, which adds some Kconfig entries that control page table > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary > >> page table bits will eventually conflict with the standard, and is just going to > >> be a mess. It'll also lead to kernels that are only compatible with specific > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need > >> some way to detect systems with these page table bits before setting them, > >> and some description of what the bits actually do so we can reason about > >> them. > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the > > goal of unified kernel image for all platforms. > > I think this is just a phrasing issue, but just to be sure: > > IMO it's not that they're vendor-specific Kconfig options, it's that > turning them on will conflict with standard systems (and other vendors). > We've already got the ability to select sets of Kconfig settings that > will only boot on one vendor's system, which is fine, as long as there > remains a set of Kconfig settings that will boot on all systems. > > An example here would be the errata: every system has errata of some > sort, so if we start flipping off various vendor's errata Kconfigs > you'll end up with kernels that only function properly on some systems. > That's fine with me, as long as it's possible to turn on all vendor's > errata Kconfigs at the same time and the resulting kernel functions > correctly on all systems. Yes, this is generally the goal, and it would be great to have that working in a way where a 'defconfig' build just turns on all the options that are needed to use any SoC specific features and drivers while still working on all hardware. There are however limits you may run into at some point, and other architectures usually only manage to span some 10 to 15 years of hardware implementations with a single kernel before it get really hard. To give some common examples that make it break down: - 32-bit vs 64-bit already violates that rule on risc-v (as it does on most other architectures) - architectures that support both big-endian and little-endian kernels tend to have platforms that require one or the other (e.g. mips, though not arm). Not an issue for you. - page table formats are the main cause of incompatibility: arm32 and x86-32 require three-level tables for certain features, but those are incompatible with older cores, arm64 supports three different page sizes, but none of them works on all cores (4KB almost works everywhere). - SMP-enabled ARMv7 kernels can be configured to run on either ARMv6 or ARMv8, but not both, in this case because of incompatible barrier instructions. - 32-bit Arm has a couple more remaining features that require building a machine specific kernel if enabled because they hardcode physical addresses: early printk (debug_ll, not the normal earlycon), NOMMU, and XIP. Arnd
Hi Arnd & Palmer, Sorry for the delayed reply, I'm working on the next version of the patch. On Fri, Jun 4, 2021 at 5:56 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote: > > >> This implementation, which adds some Kconfig entries that control page table > > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary > > >> page table bits will eventually conflict with the standard, and is just going to > > >> be a mess. It'll also lead to kernels that are only compatible with specific > > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need > > >> some way to detect systems with these page table bits before setting them, > > >> and some description of what the bits actually do so we can reason about > > >> them. > > > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the > > > goal of unified kernel image for all platforms. Okay, Agree. Please help review the next version of the patch. > > > > I think this is just a phrasing issue, but just to be sure: > > > > IMO it's not that they're vendor-specific Kconfig options, it's that > > turning them on will conflict with standard systems (and other vendors). > > We've already got the ability to select sets of Kconfig settings that > > will only boot on one vendor's system, which is fine, as long as there > > remains a set of Kconfig settings that will boot on all systems. > > > > An example here would be the errata: every system has errata of some > > sort, so if we start flipping off various vendor's errata Kconfigs > > you'll end up with kernels that only function properly on some systems. > > That's fine with me, as long as it's possible to turn on all vendor's > > errata Kconfigs at the same time and the resulting kernel functions > > correctly on all systems. > > Yes, this is generally the goal, and it would be great to have that > working in a way where a 'defconfig' build just turns on all the options > that are needed to use any SoC specific features and drivers while > still working on all hardware. There are however limits you may run > into at some point, and other architectures usually only manage to span > some 10 to 15 years of hardware implementations with a single > kernel before it get really hard. I could follow the goal in the next version of the patchset. Please help review, thx. > > To give some common examples that make it break down: > > - 32-bit vs 64-bit already violates that rule on risc-v (as it does on > most other architectures) > > - architectures that support both big-endian and little-endian kernels > tend to have platforms that require one or the other (e.g. mips, > though not arm). Not an issue for you. > > - page table formats are the main cause of incompatibility: arm32 > and x86-32 require three-level tables for certain features, but those > are incompatible with older cores, arm64 supports three different > page sizes, but none of them works on all cores (4KB almost works > everywhere). > > - SMP-enabled ARMv7 kernels can be configured to run on either > ARMv6 or ARMv8, but not both, in this case because of incompatible > barrier instructions. > > - 32-bit Arm has a couple more remaining features that require building > a machine specific kernel if enabled because they hardcode physical > addresses: early printk (debug_ll, not the normal earlycon), NOMMU, > and XIP. > > Arnd
On Fri, 04 Jun 2021 07:47:22 PDT (-0700), guoren@kernel.org wrote: > Hi Arnd & Palmer, > > Sorry for the delayed reply, I'm working on the next version of the patch. > > On Fri, Jun 4, 2021 at 5:56 PM Arnd Bergmann <arnd@arndb.de> wrote: >> >> On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >> > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote: >> > >> This implementation, which adds some Kconfig entries that control page table >> > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary >> > >> page table bits will eventually conflict with the standard, and is just going to >> > >> be a mess. It'll also lead to kernels that are only compatible with specific >> > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need >> > >> some way to detect systems with these page table bits before setting them, >> > >> and some description of what the bits actually do so we can reason about >> > >> them. >> > > >> > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the >> > > goal of unified kernel image for all platforms. > Okay, Agree. Please help review the next version of the patch. > >> > >> > I think this is just a phrasing issue, but just to be sure: >> > >> > IMO it's not that they're vendor-specific Kconfig options, it's that >> > turning them on will conflict with standard systems (and other vendors). >> > We've already got the ability to select sets of Kconfig settings that >> > will only boot on one vendor's system, which is fine, as long as there >> > remains a set of Kconfig settings that will boot on all systems. >> > >> > An example here would be the errata: every system has errata of some >> > sort, so if we start flipping off various vendor's errata Kconfigs >> > you'll end up with kernels that only function properly on some systems. >> > That's fine with me, as long as it's possible to turn on all vendor's >> > errata Kconfigs at the same time and the resulting kernel functions >> > correctly on all systems. >> >> Yes, this is generally the goal, and it would be great to have that >> working in a way where a 'defconfig' build just turns on all the options >> that are needed to use any SoC specific features and drivers while >> still working on all hardware. There are however limits you may run >> into at some point, and other architectures usually only manage to span >> some 10 to 15 years of hardware implementations with a single >> kernel before it get really hard. > I could follow the goal in the next version of the patchset. Please > help review, thx. IMO we're essentially here now with the RISC-V stuff: defconfig flips on everything necesasry to boot normal-smelling SOCs, with everything being detected as the system boots. We have some wacky configurations like !MMU and XIP that are coupled to the hardware, but (and sorry for crossing the other threads, I missed your pointer as it's early here) as I said in the other thread it might be time to make it explicit that those things are non-portable. The hope here has always been that we'd have enough in the standards that we could avoid a proliferation of vendor-specific code. We've always put a strong "things keep working forever" stake in the ground in RISC-V land, but that's largely been because we were counting on the standards existing that make support easy. In practice we don't have those standards so we're ending up with a fairly large software base that is required to support everything. We don't have all that much hardware right now so we'll have to see how it goes, but for now I'm in favor of keeping defconfig as a "boots on everything" sort of setup -- both because it makes life easier for users, and because it makes issues like the non-portable Kconfigs that showed up here quite explicit. If we get to 10/15 years of hardware then I'm sure we'll be removing old systems from defconfig (or maybe even the kernel entirely, a lot of this stuff isn't in production). I'm just hoping we make it that far ;) >> To give some common examples that make it break down: >> >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on >> most other architectures) Yes, and there's no way around that on RISC-V. They're different base ISAs therefor re-define the same instructions, so we're essentially at two kernel binaries by that point. The platform spec says rv64gc, so we can kind of punt on this one for now. If rv32 hardware shows up we'll probably want a standard system there too, which is why we've avoided coupling kernel portability to XLEN. >> - architectures that support both big-endian and little-endian kernels >> tend to have platforms that require one or the other (e.g. mips, >> though not arm). Not an issue for you. It is now! We've added big-endian to RISC-V. There's no hardware yet and very little software support. IMO the right answer is to ban that from the platform spec, but again it'll depnd on what vendors want to build (though anyone is listening, please don't make my life miserable ;)). >> - page table formats are the main cause of incompatibility: arm32 >> and x86-32 require three-level tables for certain features, but those >> are incompatible with older cores, arm64 supports three different >> page sizes, but none of them works on all cores (4KB almost works >> everywhere). We actually have some support on the works for multiple page table levels in a single binary, which should help with a lot of that incompatibility. I don't know of any plans to couple other page table features to the number of levels, though. >> - SMP-enabled ARMv7 kernels can be configured to run on either >> ARMv6 or ARMv8, but not both, in this case because of incompatible >> barrier instructions. Our barriers aren't quite split the same way, but we do have two memory models (RVWMO and TSO). IIUC we should be able to support both in the same kernels with some patching, but the resulting kernels would be biased towards one memory models over the other WRT performance. Again, we'll have to see what the vendors do and I'm hoping we don't end up with too many headaches. >> - 32-bit Arm has a couple more remaining features that require building >> a machine specific kernel if enabled because they hardcode physical >> addresses: early printk (debug_ll, not the normal earlycon), NOMMU, >> and XIP. We've got NOMMU and XIP as well, but we have some SBI support for early printk. IMO we're not really sure if we've decoupled all the PA layout dependencies yet from Linux, as we really only support one vendor's systems, but we've had a lot of work lately on beefing up our memory layout so with any luck we'll be able to quickly sort out anything that comes up.
On Fri, Jun 4, 2021 at 6:14 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > The hope here has always been that we'd have enough in the standards > that we could avoid a proliferation of vendor-specific code. We've > always put a strong "things keep working forever" stake in the ground in > RISC-V land, but that's largely been because we were counting on the > standards existing that make support easy. In practice we don't have > those standards so we're ending up with a fairly large software base > that is required to support everything. We don't have all that much > hardware right now so we'll have to see how it goes, but for now I'm in > favor of keeping defconfig as a "boots on everything" sort of setup -- > both because it makes life easier for users, and because it makes issues > like the non-portable Kconfigs that showed up here quite explicit. It's obviously easy to take the hard line approach as long as there is so little hardware available. I expect this to be a constant struggle, but it's definitely worth trying as long as you can. > >> To give some common examples that make it break down: > >> > >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on > >> most other architectures) > > Yes, and there's no way around that on RISC-V. They're different base > ISAs therefor re-define the same instructions, so we're essentially at > two kernel binaries by that point. The platform spec says rv64gc, so we > can kind of punt on this one for now. If rv32 hardware shows up > we'll probably want a standard system there too, which is why we've > avoided coupling kernel portability to XLEN. I would actually put 32-bit into the same category as NOMMU, XIP and the built-in DTB: Since it seems unrealistic to expect an rv32 Debian or Fedora build, there is very little to gain by enforcing compatibility between machines. This is different from 32-bit Arm, which is widely used across multiple distros and many SoCs. > >> - architectures that support both big-endian and little-endian kernels > >> tend to have platforms that require one or the other (e.g. mips, > >> though not arm). Not an issue for you. > > It is now! We've added big-endian to RISC-V. There's no hardware yet > and very little software support. IMO the right answer is to ban that > from the platform spec, but again it'll depnd on what vendors want to > build (though anyone is listening, please don't make my life miserable > ;)). I don't see any big-endian support in linux-next. This one does seem worth enforcing to be kept out, as it would double the number of user space ABIs, not just kernel configurations. On arm64, I think the general feeling is now that we would have been better off not merging big-endian support into the kernel as an option, but it still seemed important at the time. Not that there is anything really wrong with big-endian by itself, just that there is no use case that is worth the added complexity of supporting both. Let me know if there are patches you want me to Nak in the future ;-) > >> - SMP-enabled ARMv7 kernels can be configured to run on either > >> ARMv6 or ARMv8, but not both, in this case because of incompatible > >> barrier instructions. > > Our barriers aren't quite split the same way, but we do have two memory > models (RVWMO and TSO). IIUC we should be able to support both in the > same kernels with some patching, but the resulting kernels would be > biased towards one memory models over the other WRT performance. Again, > we'll have to see what the vendors do and I'm hoping we don't end up > with too many headaches. I wouldn't specifically expect the problem to be barriers in the rv64 case, this was just an example of instruction sets slowly changing in incompatible ways over a long time. There might be an important reason for version 3.0 of one of the specifications to drop compatibility with version 1.x. Arnd
On Fri, 04 Jun 2021 14:26:11 PDT (-0700), Arnd Bergmann wrote: > On Fri, Jun 4, 2021 at 6:14 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >> >> To give some common examples that make it break down: >> >> >> >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on >> >> most other architectures) >> >> Yes, and there's no way around that on RISC-V. They're different base >> ISAs therefor re-define the same instructions, so we're essentially at >> two kernel binaries by that point. The platform spec says rv64gc, so we >> can kind of punt on this one for now. If rv32 hardware shows up >> we'll probably want a standard system there too, which is why we've >> avoided coupling kernel portability to XLEN. > > I would actually put 32-bit into the same category as NOMMU, XIP > and the built-in DTB: > Since it seems unrealistic to expect an rv32 Debian or Fedora build, > there is very little to gain by enforcing compatibility between machines. > This is different from 32-bit Arm, which is widely used across multiple > distros and many SoCs. OK, well, that's what the spec says already. Maybe the right answer is to just add that "be compatible with the platform spec" Kconfig and have it also enforce rv64gc like the spec says. > >> >> - architectures that support both big-endian and little-endian kernels >> >> tend to have platforms that require one or the other (e.g. mips, >> >> though not arm). Not an issue for you. >> >> It is now! We've added big-endian to RISC-V. There's no hardware yet >> and very little software support. IMO the right answer is to ban that >> from the platform spec, but again it'll depnd on what vendors want to >> build (though anyone is listening, please don't make my life miserable >> ;)). > > I don't see any big-endian support in linux-next. This one does seem > worth enforcing to be kept out, as it would double the number of user > space ABIs, not just kernel configurations. On arm64, I think the general > feeling is now that we would have been better off not merging big-endian > support into the kernel as an option, but it still seemed important at the > time. Not that there is anything really wrong with big-endian by itself, > just that there is no use case that is worth the added complexity of > supporting both. > > Let me know if there are patches you want me to Nak in the future ;-) Sorry, by "added big-endian to RISC-V" I meant to the ISA, not to Linux. We haven't had any interesting in adding it to Linux. The interest has all been in the embedded space.
Hi Anup and Atish, On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > -----Original Message----- > > From: Palmer Dabbelt <palmer@dabbelt.com> > > Sent: 03 June 2021 09:43 > > To: guoren@kernel.org > > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; wefu@redhat.com; > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > <paul.walmsley@sifive.com> > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> > > wrote: > > >>> > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > > <drew@beagleboard.org> wrote: > > >>> > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig > > wrote: > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is > > >>> > > > better to provide some configuration for the SOC vendor to > > customize. > > >>> > > > > >>> > > We've been talking about this problem for close to five years. > > >>> > > So no, if you don't manage to get the feature into the ISA it > > >>> > > can't be supported. > > >>> > > > >>> > Isn't it a good goal for Linux to support the capabilities present > > >>> > in the SoC that a currently being fab'd? > > >>> > > > >>> > I believe the CMO group only started last year [1] so the RV64GC > > >>> > SoCs that are going into mass production this year would not have > > >>> > had the opporuntiy of utilizing any RISC-V ISA extension for > > >>> > handling cache management. > > >>> > > >>> The current Linux RISC-V policy is to only accept patches for frozen > > >>> or ratified ISA specs. > > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > > >>> > > >>> This means even if emulate CMO instructions in OpenSBI, the Linux > > >>> patches won't be taken by Palmer because CMO specification is still > > >>> in draft stage. > > >> Before CMO specification release, could we use a sbi_ecall to solve > > >> the current problem? This is not against the specification, when CMO > > >> is ready we could let users choose to use the new CMO in Linux. > > >> > > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > > >> > > >>> > > >>> Also, we all know how much time it takes for RISCV international to > > >>> freeze some spec. Judging by that we are looking at another > > >>> 3-4 years at minimum. > > > > > > Sorry for being slow here, this thread got buried. > > > > > > I've been trying to work with a handful of folks at the RISC-V > > > foundation to try and get a subset of the various in-development > > > specifications (some simple CMOs, something about non-caching in the > > > page tables, and some way to prevent speculative accesse from > > > generating coherence traffic that will break non-coherent systems). > > > I'm not sure we can get this together quickly, but I'd prefer to at > > > least try before we jump to taking vendor-specificed behavior here. > > > It's obviously an up-hill battle to try and get specifications through > > > the process and I'm certainly not going to promise it will work, but > > > I'm hoping that the impending need to avoid forking the ISA will be > > > sufficient to get people behind producing some specifications in a timely > > fashion. > > > > > > I wasn't aware than this chip had non-coherent devices until I saw > > > this thread, so we'd been mostly focused on the Beagle V chip. That > > > was in a sense an easier problem because the SiFive IP in it was never > > > designed to have non-coherent devices so we'd have to make anything > > > work via a series of slow workarounds, which would make emulating the > > > eventually standardized behavior reasonable in terms of performance > > > (ie, everything would be super slow so who really cares). > > > > > > I don't think relying on some sort of SBI call for the CMOs whould be > > > such a performance hit that it would prevent these systems from being > > > viable, but assuming you have reasonable performance on your > > > non-cached accesses then that's probably not going to be viable to > > > trap and emulate. At that point it really just becomes silly to > > > pretend that we're still making things work by emulating the > > > eventually ratified behavior, as anyone who actually tries to use this > > > thing to do IO would need out of tree patches. I'm not sure exactly > > > what the plan is for the page table bits in the specification right > > > now, but if you can give me a pointer to some documentation then I'm > > > happy to try and push for something compatible. > > > > > > If we can't make the process work at the foundation then I'd be > > > strongly in favor of just biting the bullet and starting to take > > > vendor-specific code that's been implemented in hardware and is > > > necessarry to make things work acceptably. That's obviously a > > > sub-optimal solution as it'll lead to a bunch of ISA fragmentation, > > > but at least we'll be able to keep the software stack together. > > > > > > Can you tell us when these will be in the hands of users? That's > > > pretty important here, as I don't want to be blocking real users from > > > having their hardware work. IIRC there were some plans to distribute > > > early boards, but it looks like the foundation got involved and I > > > guess I lost the thread at that point. > > > > > > Sorry this is all such a headache, but hopefully we can get things > > > sorted out. > > > > I talked with some of the RISC-V foundation folks, we're not going to have an > > ISA specification for the non-coherent stuff any time soon. I took a look at > > this code and I definately don't want to take it as is, but I'm not opposed to > > taking something that makes the hardware work as long as it's a lot cleaner. > > We've already got two of these non-coherent chips, I'm sure more will come, > > and I'd rather have the extra headaches than make everyone fork the software > > stack. > > Thanks for confirming. The CMO extension is still in early stages so it will > certainly take more time for them. After CMO extension is finalized, it will > take some more time to have actual RISC-V platforms with CMO implemented. > > > > > After talking to Atish it looks like there's likely to be an SBI extension to > > handle the CMOs, which should let us avoid the bulk of the vendor-specific > > behavior in the kernel. I know some people are worried about adding to the > > SBI surface. I'm worried about that too, but that's way better than sticking a > > bunch of vendor-specific instructions into the kernel. The SBI extension > > should make for a straight-forward cache flush implementation in Linux, so > > let's just plan on that getting through quickly (as has been done before). > > Yes, I agree. We can have just a single SBI call which is meant for DMA sync > purpose only which means it will flush/invalidate pages from all cache > levels irrespective of the cache hierarchy (i.e. flush/invalidate to RAM). The > CMO extension might more generic cache operations which can target > any cache level. > > I am already preparing a write-up for SBI DMA sync call in SBI spec. I will > share it with you separately as well. > > > > > Unfortunately we've yet to come up with a way to handle the non-cacheable > > mappings without introducing a degree of vendor-specific behavior or > > seriously impacting performance (mark them as not valid and deal with them > > in the trap handler). I'm not really sure it counts as supporting the hardware > > if it's massively slow, so that really leaves us with vendor-specific mappings as > > the only option to make these chips work. > > A RISC-V platform can have non-cacheable mappings is following possible > ways: > 1) Fixed physical address range as non-cacheable using PMAs > 2) Custom page table attributes > 3) Svpmbt extension being defined by RVI > > Atish and me both think it is possible to have RISC-V specific DMA ops > implementation which can handle all above case. Atish is already working > on DMA ops implementation for RISC-V. Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please have a look at: https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-email-guoren@kernel.org/T/#u > > > > > This implementation, which adds some Kconfig entries that control page table > > bits, definately isn't suitable for upstream. Allowing users to set arbitrary > > page table bits will eventually conflict with the standard, and is just going to > > be a mess. It'll also lead to kernels that are only compatible with specific > > designs, which we're trying very hard to avoid. At a bare minimum we'll need > > some way to detect systems with these page table bits before setting them, > > and some description of what the bits actually do so we can reason about > > them. > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the > goal of unified kernel image for all platforms. > > Regards, > Anup
Στις 2021-05-20 04:45, Guo Ren έγραψε: > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: >> > This patch series looks like it might be useful for the StarFive JH7100 >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, >> > USB and SDIO require that the L2 cache must be manually flushed after >> > DMA operations if the data is intended to be shared with U74 cores [2]. >> >> Not too much, given that the SiFive lineage CPUs have an uncached >> window, that is a totally different way to allocate uncached memory. > It's a very big MIPS smell. What's the attribute of the uncached > window? (uncached + strong-order/ uncached + weak, most vendors still > use AXI interconnect, how to deal with a bufferable attribute?) In > fact, customers' drivers use different ways to deal with DMA memory in > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > the same way in DMA memory is a smart choice. So using PTE attributes > is more suitable. > > See: > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf > 4.4.1 > The draft supports custom attribute bits in PTE. > Not only it doesn't support custom attributes on PTEs: "Bits63–54 are reserved for future standard use and must be zeroed by software for forward compatibility." It also goes further to say that: "if any of these bits are set, a page-fault exception is raised"
On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> wrote: > > Στις 2021-05-20 04:45, Guo Ren έγραψε: > > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: > >> > >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > >> > This patch series looks like it might be useful for the StarFive JH7100 > >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > >> > USB and SDIO require that the L2 cache must be manually flushed after > >> > DMA operations if the data is intended to be shared with U74 cores [2]. > >> > >> Not too much, given that the SiFive lineage CPUs have an uncached > >> window, that is a totally different way to allocate uncached memory. > > It's a very big MIPS smell. What's the attribute of the uncached > > window? (uncached + strong-order/ uncached + weak, most vendors still > > use AXI interconnect, how to deal with a bufferable attribute?) In > > fact, customers' drivers use different ways to deal with DMA memory in > > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > > the same way in DMA memory is a smart choice. So using PTE attributes > > is more suitable. > > > > See: > > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf > > 4.4.1 > > The draft supports custom attribute bits in PTE. > > > > Not only it doesn't support custom attributes on PTEs: > > "Bits63–54 are reserved for future standard use and must be zeroed by > software for forward compatibility." > > It also goes further to say that: > > "if any of these bits are set, a page-fault exception is raised" In RISC-V VM TG, A C-bit discussion is raised. So it's a comm idea to support it. Let Linux support custom PTE attributes won't get any side effect in practice. IMO: We needn't waste a bit in PTE, but the custom idea in PTE reserved bits is necessary. Because Allwinner D1 needs custom PTE bits in reserved bits to work around. So I recommend just remove the "C" bit in PTE, but allow vendors to define their own PTE attributes in reserved bits. I've found a way to compact different PTE attributes of different vendors during the Linux boot stage. That means we still could use One Image for all vendors in Linux
Στις 2021-06-07 03:04, Guo Ren έγραψε: > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> > wrote: >> >> Στις 2021-05-20 04:45, Guo Ren έγραψε: >> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: >> >> >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: >> >> > This patch series looks like it might be useful for the StarFive JH7100 >> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, >> >> > USB and SDIO require that the L2 cache must be manually flushed after >> >> > DMA operations if the data is intended to be shared with U74 cores [2]. >> >> >> >> Not too much, given that the SiFive lineage CPUs have an uncached >> >> window, that is a totally different way to allocate uncached memory. >> > It's a very big MIPS smell. What's the attribute of the uncached >> > window? (uncached + strong-order/ uncached + weak, most vendors still >> > use AXI interconnect, how to deal with a bufferable attribute?) In >> > fact, customers' drivers use different ways to deal with DMA memory in >> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them >> > the same way in DMA memory is a smart choice. So using PTE attributes >> > is more suitable. >> > >> > See: >> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf >> > 4.4.1 >> > The draft supports custom attribute bits in PTE. >> > >> >> Not only it doesn't support custom attributes on PTEs: >> >> "Bits63–54 are reserved for future standard use and must be zeroed by >> software for forward compatibility." >> >> It also goes further to say that: >> >> "if any of these bits are set, a page-fault exception is raised" > > In RISC-V VM TG, A C-bit discussion is raised. So it's a comm idea to > support it. > The C-bit was recently dropped, there is a new proposal for Page Based Memory Attributes (PBMT) that we can work on / push for. > Let Linux support custom PTE attributes won't get any side effect in > practice. > > IMO: > We needn't waste a bit in PTE, but the custom idea in PTE reserved > bits is necessary. Because Allwinner D1 needs custom PTE bits in > reserved bits to work around. > So I recommend just remove the "C" bit in PTE, but allow vendors to > define their own PTE attributes in reserved bits. I've found a way to > compact different PTE attributes of different vendors during the Linux > boot stage. That means we still could use One Image for all vendors in > Linux The spec is clear, those attributes are for standard use only, not for custom/platform use. It's one thing to implement custom CMOs where the ISA doesn't have anything for it and doesn't prevent you to do so (so you are not violating anything, it's just a custom extension), and we can hide them behind SBI calls etc, and another to violate the current Privilege Spec by using bits on PTEs that are reserved for standard use only. The intentions of the VM TG are clear, not only those bits are reserved but if software uses them the hw will result a page fault in future revisions of the spec. What's the idea here, to support non-compliant implementations on mainline ? I'm sure you have a good idea on how to make this work, but as long as it violates the spec it can't go in IMHO.
On Mon, Jun 7, 2021 at 10:16 AM Nick Kossifidis <mick@ics.forth.gr> wrote: > > Στις 2021-06-07 03:04, Guo Ren έγραψε: > > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> > > wrote: > >> > >> Στις 2021-05-20 04:45, Guo Ren έγραψε: > >> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: > >> >> > >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > >> >> > This patch series looks like it might be useful for the StarFive JH7100 > >> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > >> >> > USB and SDIO require that the L2 cache must be manually flushed after > >> >> > DMA operations if the data is intended to be shared with U74 cores [2]. > >> >> > >> >> Not too much, given that the SiFive lineage CPUs have an uncached > >> >> window, that is a totally different way to allocate uncached memory. > >> > It's a very big MIPS smell. What's the attribute of the uncached > >> > window? (uncached + strong-order/ uncached + weak, most vendors still > >> > use AXI interconnect, how to deal with a bufferable attribute?) In > >> > fact, customers' drivers use different ways to deal with DMA memory in > >> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > >> > the same way in DMA memory is a smart choice. So using PTE attributes > >> > is more suitable. > >> > > >> > See: > >> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf > >> > 4.4.1 > >> > The draft supports custom attribute bits in PTE. > >> > > >> > >> Not only it doesn't support custom attributes on PTEs: > >> > >> "Bits63–54 are reserved for future standard use and must be zeroed by > >> software for forward compatibility." > >> > >> It also goes further to say that: > >> > >> "if any of these bits are set, a page-fault exception is raised" > > > > In RISC-V VM TG, A C-bit discussion is raised. So it's a comm idea to > > support it. > > > > The C-bit was recently dropped, there is a new proposal for Page Based > Memory Attributes (PBMT) that we can work on / push for. C-bit still needs discussion, we shouldn't drop it directly. > > > Let Linux support custom PTE attributes won't get any side effect in > > practice. > > > > IMO: > > We needn't waste a bit in PTE, but the custom idea in PTE reserved > > bits is necessary. Because Allwinner D1 needs custom PTE bits in > > reserved bits to work around. > > So I recommend just remove the "C" bit in PTE, but allow vendors to > > define their own PTE attributes in reserved bits. I've found a way to > > compact different PTE attributes of different vendors during the Linux > > boot stage. That means we still could use One Image for all vendors in > > Linux > > The spec is clear, those attributes are for standard use only, not for > custom/platform use. It's one thing to implement custom CMOs where the > ISA doesn't have anything for it and doesn't prevent you to do so (so > you are not violating anything, it's just a custom extension), and we > can hide them behind SBI calls etc, and another to violate the current > Privilege Spec by using bits on PTEs that are reserved for standard use > only. The intentions of the VM TG are clear, not only those bits are > reserved but if software uses them the hw will result a page fault in > future revisions of the spec. What's the idea here, to support > non-compliant implementations on mainline ? Raise a page fault won't solve anything. We still need access to the page with proper performance. > I'm sure you have a good > idea on how to make this work, but as long as it violates the spec it > can't go in IMHO. We need PTEs to provide a non-coherency solution, and only CMO instructions are not enough. We can't modify so many Linux drivers to fit it. From Linux non-coherency view, we need: - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors - Non-cache + weak order to deal with framebuffer drivers - CMO dma_sync to sync cache with DMA devices - Userspace icache_sync solution, which prevents calls to S-mode with IPI fence.i. (Necessary to JIT java scenarios.) All above are not in spec, but the real chips are done. (Actually, these have been talked about for more than five years, we still haven't the uniform idea.) The idea of C-bit is really important for us which prevents our chips violates the spec.
> -----Original Message----- > From: Guo Ren <guoren@kernel.org> > Sent: 06 June 2021 22:42 > To: Anup Patel <Anup.Patel@wdc.com>; Atish Patra <atishp@atishpatra.org> > Cc: Palmer Dabbelt <palmer@dabbelt.com>; anup@brainfault.org; > drew@beagleboard.org; Christoph Hellwig <hch@lst.de>; wefu@redhat.com; > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > <paul.walmsley@sifive.com> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > Hi Anup and Atish, > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > > > > > -----Original Message----- > > > From: Palmer Dabbelt <palmer@dabbelt.com> > > > Sent: 03 June 2021 09:43 > > > To: guoren@kernel.org > > > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > > > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; wefu@redhat.com; > > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > > <paul.walmsley@sifive.com> > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> > > > wrote: > > > >>> > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > > > <drew@beagleboard.org> wrote: > > > >>> > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig > > > wrote: > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > > >>> > > > Since the existing RISC-V ISA cannot solve this problem, > > > >>> > > > it is better to provide some configuration for the SOC > > > >>> > > > vendor to > > > customize. > > > >>> > > > > > >>> > > We've been talking about this problem for close to five years. > > > >>> > > So no, if you don't manage to get the feature into the ISA > > > >>> > > it can't be supported. > > > >>> > > > > >>> > Isn't it a good goal for Linux to support the capabilities > > > >>> > present in the SoC that a currently being fab'd? > > > >>> > > > > >>> > I believe the CMO group only started last year [1] so the > > > >>> > RV64GC SoCs that are going into mass production this year > > > >>> > would not have had the opporuntiy of utilizing any RISC-V ISA > > > >>> > extension for handling cache management. > > > >>> > > > >>> The current Linux RISC-V policy is to only accept patches for > > > >>> frozen or ratified ISA specs. > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > > > >>> > > > >>> This means even if emulate CMO instructions in OpenSBI, the > > > >>> Linux patches won't be taken by Palmer because CMO specification > > > >>> is still in draft stage. > > > >> Before CMO specification release, could we use a sbi_ecall to > > > >> solve the current problem? This is not against the specification, > > > >> when CMO is ready we could let users choose to use the new CMO in > Linux. > > > >> > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > > > >> > > > >>> > > > >>> Also, we all know how much time it takes for RISCV international > > > >>> to freeze some spec. Judging by that we are looking at another > > > >>> 3-4 years at minimum. > > > > > > > > Sorry for being slow here, this thread got buried. > > > > > > > > I've been trying to work with a handful of folks at the RISC-V > > > > foundation to try and get a subset of the various in-development > > > > specifications (some simple CMOs, something about non-caching in > > > > the page tables, and some way to prevent speculative accesse from > > > > generating coherence traffic that will break non-coherent systems). > > > > I'm not sure we can get this together quickly, but I'd prefer to > > > > at least try before we jump to taking vendor-specificed behavior here. > > > > It's obviously an up-hill battle to try and get specifications > > > > through the process and I'm certainly not going to promise it will > > > > work, but I'm hoping that the impending need to avoid forking the > > > > ISA will be sufficient to get people behind producing some > > > > specifications in a timely > > > fashion. > > > > > > > > I wasn't aware than this chip had non-coherent devices until I saw > > > > this thread, so we'd been mostly focused on the Beagle V chip. > > > > That was in a sense an easier problem because the SiFive IP in it > > > > was never designed to have non-coherent devices so we'd have to > > > > make anything work via a series of slow workarounds, which would > > > > make emulating the eventually standardized behavior reasonable in > > > > terms of performance (ie, everything would be super slow so who really > cares). > > > > > > > > I don't think relying on some sort of SBI call for the CMOs whould > > > > be such a performance hit that it would prevent these systems from > > > > being viable, but assuming you have reasonable performance on your > > > > non-cached accesses then that's probably not going to be viable to > > > > trap and emulate. At that point it really just becomes silly to > > > > pretend that we're still making things work by emulating the > > > > eventually ratified behavior, as anyone who actually tries to use > > > > this thing to do IO would need out of tree patches. I'm not sure > > > > exactly what the plan is for the page table bits in the > > > > specification right now, but if you can give me a pointer to some > > > > documentation then I'm happy to try and push for something > compatible. > > > > > > > > If we can't make the process work at the foundation then I'd be > > > > strongly in favor of just biting the bullet and starting to take > > > > vendor-specific code that's been implemented in hardware and is > > > > necessarry to make things work acceptably. That's obviously a > > > > sub-optimal solution as it'll lead to a bunch of ISA > > > > fragmentation, but at least we'll be able to keep the software stack > together. > > > > > > > > Can you tell us when these will be in the hands of users? That's > > > > pretty important here, as I don't want to be blocking real users > > > > from having their hardware work. IIRC there were some plans to > > > > distribute early boards, but it looks like the foundation got > > > > involved and I guess I lost the thread at that point. > > > > > > > > Sorry this is all such a headache, but hopefully we can get things > > > > sorted out. > > > > > > I talked with some of the RISC-V foundation folks, we're not going > > > to have an ISA specification for the non-coherent stuff any time > > > soon. I took a look at this code and I definately don't want to > > > take it as is, but I'm not opposed to taking something that makes the > hardware work as long as it's a lot cleaner. > > > We've already got two of these non-coherent chips, I'm sure more > > > will come, and I'd rather have the extra headaches than make > > > everyone fork the software stack. > > > > Thanks for confirming. The CMO extension is still in early stages so > > it will certainly take more time for them. After CMO extension is > > finalized, it will take some more time to have actual RISC-V platforms with > CMO implemented. > > > > > > > > After talking to Atish it looks like there's likely to be an SBI > > > extension to handle the CMOs, which should let us avoid the bulk of > > > the vendor-specific behavior in the kernel. I know some people are > > > worried about adding to the SBI surface. I'm worried about that > > > too, but that's way better than sticking a bunch of vendor-specific > > > instructions into the kernel. The SBI extension should make for a > > > straight-forward cache flush implementation in Linux, so let's just plan on > that getting through quickly (as has been done before). > > > > Yes, I agree. We can have just a single SBI call which is meant for > > DMA sync purpose only which means it will flush/invalidate pages from > > all cache levels irrespective of the cache hierarchy (i.e. > > flush/invalidate to RAM). The CMO extension might more generic cache > > operations which can target any cache level. > > > > I am already preparing a write-up for SBI DMA sync call in SBI spec. I > > will share it with you separately as well. > > > > > > > > Unfortunately we've yet to come up with a way to handle the > > > non-cacheable mappings without introducing a degree of > > > vendor-specific behavior or seriously impacting performance (mark > > > them as not valid and deal with them in the trap handler). I'm not > > > really sure it counts as supporting the hardware if it's massively > > > slow, so that really leaves us with vendor-specific mappings as the only > option to make these chips work. > > > > A RISC-V platform can have non-cacheable mappings is following > > possible > > ways: > > 1) Fixed physical address range as non-cacheable using PMAs > > 2) Custom page table attributes > > 3) Svpmbt extension being defined by RVI > > > > Atish and me both think it is possible to have RISC-V specific DMA ops > > implementation which can handle all above case. Atish is already > > working on DMA ops implementation for RISC-V. > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please have a > look at: > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-email- > guoren@kernel.org/T/#u The icache_sync and __vdso_icache_sync will have to be addressed differently. The SBI DMA sync extension cannot address this. It seems Allwinner D1 have more non-standard stuff: 1) Custom PTE bits for IO-coherent access 2) Custom data cache flush/invalidate for DMA sync 3) Custom icache flush/invalidate Other hand, BeagleV has only two problems: 1) Custom physical address range for IO-coherent access 2) Custom L2 cache flush/invalidate for DMA sync From above #2, can be solved by SBI DMA sync call and Linux DMA ops for both BeagleV and Allwinner D1 On BeagleV, issue #1 can be solved using "dma-ranges". On Allwinner D1, issues #1 and #3 need to be addressed separately. I think supporting BeagleV in upstream Linux is relatively easy compared to Allwinner D1. @Guo, please check if you can reserve dedicated physical address range for IO-coherent access (just like BeagleV). If yes, then we can tackle issue #1 for Allwinner D1 using "dma-ranges" DT property. Regards, Anup > > > > > > > > > > This implementation, which adds some Kconfig entries that control > > > page table bits, definately isn't suitable for upstream. Allowing > > > users to set arbitrary page table bits will eventually conflict with > > > the standard, and is just going to be a mess. It'll also lead to > > > kernels that are only compatible with specific designs, which we're > > > trying very hard to avoid. At a bare minimum we'll need some way to > > > detect systems with these page table bits before setting them, and > > > some description of what the bits actually do so we can reason about > them. > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't > > give-up the goal of unified kernel image for all platforms. > > > > Regards, > > Anup > > > > -- > Best Regards > Guo Ren > > ML: https://lore.kernel.org/linux-csky/
Hi Anup, On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > -----Original Message----- > > From: Guo Ren <guoren@kernel.org> > > Sent: 06 June 2021 22:42 > > To: Anup Patel <Anup.Patel@wdc.com>; Atish Patra <atishp@atishpatra.org> > > Cc: Palmer Dabbelt <palmer@dabbelt.com>; anup@brainfault.org; > > drew@beagleboard.org; Christoph Hellwig <hch@lst.de>; wefu@redhat.com; > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > <paul.walmsley@sifive.com> > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > Hi Anup and Atish, > > > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > From: Palmer Dabbelt <palmer@dabbelt.com> > > > > Sent: 03 June 2021 09:43 > > > > To: guoren@kernel.org > > > > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > > > > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; wefu@redhat.com; > > > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > > > <paul.walmsley@sifive.com> > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org wrote: > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <anup@brainfault.org> > > > > wrote: > > > > >>> > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > > > > <drew@beagleboard.org> wrote: > > > > >>> > > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig > > > > wrote: > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote: > > > > >>> > > > Since the existing RISC-V ISA cannot solve this problem, > > > > >>> > > > it is better to provide some configuration for the SOC > > > > >>> > > > vendor to > > > > customize. > > > > >>> > > > > > > >>> > > We've been talking about this problem for close to five years. > > > > >>> > > So no, if you don't manage to get the feature into the ISA > > > > >>> > > it can't be supported. > > > > >>> > > > > > >>> > Isn't it a good goal for Linux to support the capabilities > > > > >>> > present in the SoC that a currently being fab'd? > > > > >>> > > > > > >>> > I believe the CMO group only started last year [1] so the > > > > >>> > RV64GC SoCs that are going into mass production this year > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V ISA > > > > >>> > extension for handling cache management. > > > > >>> > > > > >>> The current Linux RISC-V policy is to only accept patches for > > > > >>> frozen or ratified ISA specs. > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > > > > >>> > > > > >>> This means even if emulate CMO instructions in OpenSBI, the > > > > >>> Linux patches won't be taken by Palmer because CMO specification > > > > >>> is still in draft stage. > > > > >> Before CMO specification release, could we use a sbi_ecall to > > > > >> solve the current problem? This is not against the specification, > > > > >> when CMO is ready we could let users choose to use the new CMO in > > Linux. > > > > >> > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > > > > >> > > > > >>> > > > > >>> Also, we all know how much time it takes for RISCV international > > > > >>> to freeze some spec. Judging by that we are looking at another > > > > >>> 3-4 years at minimum. > > > > > > > > > > Sorry for being slow here, this thread got buried. > > > > > > > > > > I've been trying to work with a handful of folks at the RISC-V > > > > > foundation to try and get a subset of the various in-development > > > > > specifications (some simple CMOs, something about non-caching in > > > > > the page tables, and some way to prevent speculative accesse from > > > > > generating coherence traffic that will break non-coherent systems). > > > > > I'm not sure we can get this together quickly, but I'd prefer to > > > > > at least try before we jump to taking vendor-specificed behavior here. > > > > > It's obviously an up-hill battle to try and get specifications > > > > > through the process and I'm certainly not going to promise it will > > > > > work, but I'm hoping that the impending need to avoid forking the > > > > > ISA will be sufficient to get people behind producing some > > > > > specifications in a timely > > > > fashion. > > > > > > > > > > I wasn't aware than this chip had non-coherent devices until I saw > > > > > this thread, so we'd been mostly focused on the Beagle V chip. > > > > > That was in a sense an easier problem because the SiFive IP in it > > > > > was never designed to have non-coherent devices so we'd have to > > > > > make anything work via a series of slow workarounds, which would > > > > > make emulating the eventually standardized behavior reasonable in > > > > > terms of performance (ie, everything would be super slow so who really > > cares). > > > > > > > > > > I don't think relying on some sort of SBI call for the CMOs whould > > > > > be such a performance hit that it would prevent these systems from > > > > > being viable, but assuming you have reasonable performance on your > > > > > non-cached accesses then that's probably not going to be viable to > > > > > trap and emulate. At that point it really just becomes silly to > > > > > pretend that we're still making things work by emulating the > > > > > eventually ratified behavior, as anyone who actually tries to use > > > > > this thing to do IO would need out of tree patches. I'm not sure > > > > > exactly what the plan is for the page table bits in the > > > > > specification right now, but if you can give me a pointer to some > > > > > documentation then I'm happy to try and push for something > > compatible. > > > > > > > > > > If we can't make the process work at the foundation then I'd be > > > > > strongly in favor of just biting the bullet and starting to take > > > > > vendor-specific code that's been implemented in hardware and is > > > > > necessarry to make things work acceptably. That's obviously a > > > > > sub-optimal solution as it'll lead to a bunch of ISA > > > > > fragmentation, but at least we'll be able to keep the software stack > > together. > > > > > > > > > > Can you tell us when these will be in the hands of users? That's > > > > > pretty important here, as I don't want to be blocking real users > > > > > from having their hardware work. IIRC there were some plans to > > > > > distribute early boards, but it looks like the foundation got > > > > > involved and I guess I lost the thread at that point. > > > > > > > > > > Sorry this is all such a headache, but hopefully we can get things > > > > > sorted out. > > > > > > > > I talked with some of the RISC-V foundation folks, we're not going > > > > to have an ISA specification for the non-coherent stuff any time > > > > soon. I took a look at this code and I definately don't want to > > > > take it as is, but I'm not opposed to taking something that makes the > > hardware work as long as it's a lot cleaner. > > > > We've already got two of these non-coherent chips, I'm sure more > > > > will come, and I'd rather have the extra headaches than make > > > > everyone fork the software stack. > > > > > > Thanks for confirming. The CMO extension is still in early stages so > > > it will certainly take more time for them. After CMO extension is > > > finalized, it will take some more time to have actual RISC-V platforms with > > CMO implemented. > > > > > > > > > > > After talking to Atish it looks like there's likely to be an SBI > > > > extension to handle the CMOs, which should let us avoid the bulk of > > > > the vendor-specific behavior in the kernel. I know some people are > > > > worried about adding to the SBI surface. I'm worried about that > > > > too, but that's way better than sticking a bunch of vendor-specific > > > > instructions into the kernel. The SBI extension should make for a > > > > straight-forward cache flush implementation in Linux, so let's just plan on > > that getting through quickly (as has been done before). > > > > > > Yes, I agree. We can have just a single SBI call which is meant for > > > DMA sync purpose only which means it will flush/invalidate pages from > > > all cache levels irrespective of the cache hierarchy (i.e. > > > flush/invalidate to RAM). The CMO extension might more generic cache > > > operations which can target any cache level. > > > > > > I am already preparing a write-up for SBI DMA sync call in SBI spec. I > > > will share it with you separately as well. > > > > > > > > > > > Unfortunately we've yet to come up with a way to handle the > > > > non-cacheable mappings without introducing a degree of > > > > vendor-specific behavior or seriously impacting performance (mark > > > > them as not valid and deal with them in the trap handler). I'm not > > > > really sure it counts as supporting the hardware if it's massively > > > > slow, so that really leaves us with vendor-specific mappings as the only > > option to make these chips work. > > > > > > A RISC-V platform can have non-cacheable mappings is following > > > possible > > > ways: > > > 1) Fixed physical address range as non-cacheable using PMAs > > > 2) Custom page table attributes > > > 3) Svpmbt extension being defined by RVI > > > > > > Atish and me both think it is possible to have RISC-V specific DMA ops > > > implementation which can handle all above case. Atish is already > > > working on DMA ops implementation for RISC-V. > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please have a > > look at: > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-email- > > guoren@kernel.org/T/#u > > The icache_sync and __vdso_icache_sync will have to be addressed > differently. The SBI DMA sync extension cannot address this. Agree > > It seems Allwinner D1 have more non-standard stuff: > 1) Custom PTE bits for IO-coherent access > 2) Custom data cache flush/invalidate for DMA sync > 3) Custom icache flush/invalidate Yes, but 3) is a performance optimization, not critical for running. > > Other hand, BeagleV has only two problems: > 1) Custom physical address range for IO-coherent access > 2) Custom L2 cache flush/invalidate for DMA sync https://github.com/starfive-tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d Currently, they still use DMA sync with DMA descriptor, are you sure they have minor memory physical address. > > From above #2, can be solved by SBI DMA sync call and > Linux DMA ops for both BeagleV and Allwinner D1 > > On BeagleV, issue #1 can be solved using "dma-ranges". > > On Allwinner D1, issues #1 and #3 need to be addressed > separately. > > I think supporting BeagleV in upstream Linux is relatively > easy compared to Allwinner D1. > > @Guo, please check if you can reserve dedicated > physical address range for IO-coherent access (just like > BeagleV). If yes, then we can tackle issue #1 for Allwinner > D1 using "dma-ranges" DT property. There is no dedicated physical address range for IO-coherent access in D1. But the solution you mentioned couldn't solve all requirements. Only one mirror physical address range is not enough, we need at least three (Normal, DMA desc, frame buffer). And that will triple the memory physical address which can't be accepted by our users from the hardware design cost view. "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't accept it. (They just say replace the CPU, but don't touch anything other.) PTE attributes are the non-coherent solution for many years. MIPS also follows that now: ref arch/mips/include/asm/pgtable-bits.h & arch/mips/include/asm/pgtable.h #ifndef _CACHE_CACHABLE_NO_WA #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT) #endif #ifndef _CACHE_CACHABLE_WA #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT) #endif #ifndef _CACHE_UNCACHED #define _CACHE_UNCACHED (2<<_CACHE_SHIFT) #endif #ifndef _CACHE_CACHABLE_NONCOHERENT #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT) #endif #ifndef _CACHE_CACHABLE_CE #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT) #endif #ifndef _CACHE_CACHABLE_COW #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT) #endif #ifndef _CACHE_CACHABLE_CUW #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT) #endif #ifndef _CACHE_UNCACHED_ACCELERATED #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT) We can't force our users to double/triplicate their physical memory regions. > > Regards, > Anup > > > > > > > > > > > > > > > > This implementation, which adds some Kconfig entries that control > > > > page table bits, definately isn't suitable for upstream. Allowing > > > > users to set arbitrary page table bits will eventually conflict with > > > > the standard, and is just going to be a mess. It'll also lead to > > > > kernels that are only compatible with specific designs, which we're > > > > trying very hard to avoid. At a bare minimum we'll need some way to > > > > detect systems with these page table bits before setting them, and > > > > some description of what the bits actually do so we can reason about > > them. > > > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't > > > give-up the goal of unified kernel image for all platforms. > > > > > > Regards, > > > Anup > > > > > > > > -- > > Best Regards > > Guo Ren > > > > ML: https://lore.kernel.org/linux-csky/
> -----Original Message----- > From: Guo Ren <guoren@kernel.org> > Sent: 07 June 2021 09:52 > To: Anup Patel <Anup.Patel@wdc.com> > Cc: Atish Patra <atishp@atishpatra.org>; Palmer Dabbelt > <palmer@dabbelt.com>; anup@brainfault.org; drew@beagleboard.org; > Christoph Hellwig <hch@lst.de>; wefu@redhat.com; lazyparser@gmail.com; > linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > arch@vger.kernel.org; linux-sunxi@lists.linux.dev; guoren@linux.alibaba.com; > Paul Walmsley <paul.walmsley@sifive.com> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > Hi Anup, > > On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > > > > > -----Original Message----- > > > From: Guo Ren <guoren@kernel.org> > > > Sent: 06 June 2021 22:42 > > > To: Anup Patel <Anup.Patel@wdc.com>; Atish Patra > > > <atishp@atishpatra.org> > > > Cc: Palmer Dabbelt <palmer@dabbelt.com>; anup@brainfault.org; > > > drew@beagleboard.org; Christoph Hellwig <hch@lst.de>; > > > wefu@redhat.com; lazyparser@gmail.com; > > > linux-riscv@lists.infradead.org; linux- kernel@vger.kernel.org; > > > linux-arch@vger.kernel.org; linux- sunxi@lists.linux.dev; > > > guoren@linux.alibaba.com; Paul Walmsley <paul.walmsley@sifive.com> > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > Hi Anup and Atish, > > > > > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <Anup.Patel@wdc.com> > wrote: > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Palmer Dabbelt <palmer@dabbelt.com> > > > > > Sent: 03 June 2021 09:43 > > > > > To: guoren@kernel.org > > > > > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > > > > > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; > wefu@redhat.com; > > > > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > > > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > > > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > > > > <paul.walmsley@sifive.com> > > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org > wrote: > > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel > > > > > >> <anup@brainfault.org> > > > > > wrote: > > > > > >>> > > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > > > > > <drew@beagleboard.org> wrote: > > > > > >>> > > > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph > > > > > >>> > Hellwig > > > > > wrote: > > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren > wrote: > > > > > >>> > > > Since the existing RISC-V ISA cannot solve this > > > > > >>> > > > problem, it is better to provide some configuration > > > > > >>> > > > for the SOC vendor to > > > > > customize. > > > > > >>> > > > > > > > >>> > > We've been talking about this problem for close to five years. > > > > > >>> > > So no, if you don't manage to get the feature into the > > > > > >>> > > ISA it can't be supported. > > > > > >>> > > > > > > >>> > Isn't it a good goal for Linux to support the capabilities > > > > > >>> > present in the SoC that a currently being fab'd? > > > > > >>> > > > > > > >>> > I believe the CMO group only started last year [1] so the > > > > > >>> > RV64GC SoCs that are going into mass production this year > > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V > > > > > >>> > ISA extension for handling cache management. > > > > > >>> > > > > > >>> The current Linux RISC-V policy is to only accept patches > > > > > >>> for frozen or ratified ISA specs. > > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > > > > > >>> > > > > > >>> This means even if emulate CMO instructions in OpenSBI, the > > > > > >>> Linux patches won't be taken by Palmer because CMO > > > > > >>> specification is still in draft stage. > > > > > >> Before CMO specification release, could we use a sbi_ecall to > > > > > >> solve the current problem? This is not against the > > > > > >> specification, when CMO is ready we could let users choose to > > > > > >> use the new CMO in > > > Linux. > > > > > >> > > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > > > > > >> > > > > > >>> > > > > > >>> Also, we all know how much time it takes for RISCV > > > > > >>> international to freeze some spec. Judging by that we are > > > > > >>> looking at another > > > > > >>> 3-4 years at minimum. > > > > > > > > > > > > Sorry for being slow here, this thread got buried. > > > > > > > > > > > > I've been trying to work with a handful of folks at the RISC-V > > > > > > foundation to try and get a subset of the various > > > > > > in-development specifications (some simple CMOs, something > > > > > > about non-caching in the page tables, and some way to prevent > > > > > > speculative accesse from generating coherence traffic that will break > non-coherent systems). > > > > > > I'm not sure we can get this together quickly, but I'd prefer > > > > > > to at least try before we jump to taking vendor-specificed behavior > here. > > > > > > It's obviously an up-hill battle to try and get specifications > > > > > > through the process and I'm certainly not going to promise it > > > > > > will work, but I'm hoping that the impending need to avoid > > > > > > forking the ISA will be sufficient to get people behind > > > > > > producing some specifications in a timely > > > > > fashion. > > > > > > > > > > > > I wasn't aware than this chip had non-coherent devices until I > > > > > > saw this thread, so we'd been mostly focused on the Beagle V chip. > > > > > > That was in a sense an easier problem because the SiFive IP in > > > > > > it was never designed to have non-coherent devices so we'd > > > > > > have to make anything work via a series of slow workarounds, > > > > > > which would make emulating the eventually standardized > > > > > > behavior reasonable in terms of performance (ie, everything > > > > > > would be super slow so who really > > > cares). > > > > > > > > > > > > I don't think relying on some sort of SBI call for the CMOs > > > > > > whould be such a performance hit that it would prevent these > > > > > > systems from being viable, but assuming you have reasonable > > > > > > performance on your non-cached accesses then that's probably > > > > > > not going to be viable to trap and emulate. At that point it > > > > > > really just becomes silly to pretend that we're still making > > > > > > things work by emulating the eventually ratified behavior, as > > > > > > anyone who actually tries to use this thing to do IO would > > > > > > need out of tree patches. I'm not sure exactly what the plan > > > > > > is for the page table bits in the specification right now, but > > > > > > if you can give me a pointer to some documentation then I'm > > > > > > happy to try and push for something > > > compatible. > > > > > > > > > > > > If we can't make the process work at the foundation then I'd > > > > > > be strongly in favor of just biting the bullet and starting to > > > > > > take vendor-specific code that's been implemented in hardware > > > > > > and is necessarry to make things work acceptably. That's > > > > > > obviously a sub-optimal solution as it'll lead to a bunch of > > > > > > ISA fragmentation, but at least we'll be able to keep the > > > > > > software stack > > > together. > > > > > > > > > > > > Can you tell us when these will be in the hands of users? > > > > > > That's pretty important here, as I don't want to be blocking > > > > > > real users from having their hardware work. IIRC there were > > > > > > some plans to distribute early boards, but it looks like the > > > > > > foundation got involved and I guess I lost the thread at that point. > > > > > > > > > > > > Sorry this is all such a headache, but hopefully we can get > > > > > > things sorted out. > > > > > > > > > > I talked with some of the RISC-V foundation folks, we're not > > > > > going to have an ISA specification for the non-coherent stuff > > > > > any time soon. I took a look at this code and I definately > > > > > don't want to take it as is, but I'm not opposed to taking > > > > > something that makes the > > > hardware work as long as it's a lot cleaner. > > > > > We've already got two of these non-coherent chips, I'm sure more > > > > > will come, and I'd rather have the extra headaches than make > > > > > everyone fork the software stack. > > > > > > > > Thanks for confirming. The CMO extension is still in early stages > > > > so it will certainly take more time for them. After CMO extension > > > > is finalized, it will take some more time to have actual RISC-V > > > > platforms with > > > CMO implemented. > > > > > > > > > > > > > > After talking to Atish it looks like there's likely to be an SBI > > > > > extension to handle the CMOs, which should let us avoid the bulk > > > > > of the vendor-specific behavior in the kernel. I know some > > > > > people are worried about adding to the SBI surface. I'm worried > > > > > about that too, but that's way better than sticking a bunch of > > > > > vendor-specific instructions into the kernel. The SBI extension > > > > > should make for a straight-forward cache flush implementation in > > > > > Linux, so let's just plan on > > > that getting through quickly (as has been done before). > > > > > > > > Yes, I agree. We can have just a single SBI call which is meant > > > > for DMA sync purpose only which means it will flush/invalidate > > > > pages from all cache levels irrespective of the cache hierarchy (i.e. > > > > flush/invalidate to RAM). The CMO extension might more generic > > > > cache operations which can target any cache level. > > > > > > > > I am already preparing a write-up for SBI DMA sync call in SBI > > > > spec. I will share it with you separately as well. > > > > > > > > > > > > > > Unfortunately we've yet to come up with a way to handle the > > > > > non-cacheable mappings without introducing a degree of > > > > > vendor-specific behavior or seriously impacting performance > > > > > (mark them as not valid and deal with them in the trap handler). > > > > > I'm not really sure it counts as supporting the hardware if it's > > > > > massively slow, so that really leaves us with vendor-specific > > > > > mappings as the only > > > option to make these chips work. > > > > > > > > A RISC-V platform can have non-cacheable mappings is following > > > > possible > > > > ways: > > > > 1) Fixed physical address range as non-cacheable using PMAs > > > > 2) Custom page table attributes > > > > 3) Svpmbt extension being defined by RVI > > > > > > > > Atish and me both think it is possible to have RISC-V specific DMA > > > > ops implementation which can handle all above case. Atish is > > > > already working on DMA ops implementation for RISC-V. > > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please > > > have a look at: > > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-ema > > > il- > > > guoren@kernel.org/T/#u > > > > The icache_sync and __vdso_icache_sync will have to be addressed > > differently. The SBI DMA sync extension cannot address this. > Agree > > > > > It seems Allwinner D1 have more non-standard stuff: > > 1) Custom PTE bits for IO-coherent access > > 2) Custom data cache flush/invalidate for DMA sync > > 3) Custom icache flush/invalidate > Yes, but 3) is a performance optimization, not critical for running. > > > > > Other hand, BeagleV has only two problems: > > 1) Custom physical address range for IO-coherent access > > 2) Custom L2 cache flush/invalidate for DMA sync > https://github.com/starfive- > tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d > > Currently, they still use DMA sync with DMA descriptor, are you sure they > have minor memory physical address. > > > > > From above #2, can be solved by SBI DMA sync call and Linux DMA ops > > for both BeagleV and Allwinner D1 > > > > On BeagleV, issue #1 can be solved using "dma-ranges". > > > > On Allwinner D1, issues #1 and #3 need to be addressed separately. > > > > I think supporting BeagleV in upstream Linux is relatively easy > > compared to Allwinner D1. > > > > @Guo, please check if you can reserve dedicated physical address range > > for IO-coherent access (just like BeagleV). If yes, then we can tackle > > issue #1 for Allwinner > > D1 using "dma-ranges" DT property. > There is no dedicated physical address range for IO-coherent access in D1. But > the solution you mentioned couldn't solve all requirements. > Only one mirror physical address range is not enough, we need at least three > (Normal, DMA desc, frame buffer). How many non-coherent devices you really have? I am guess lot of critical devices on Allwinner D1 are not coherent with CPU. The problem for Allwinner D1 is even worst than I thought. If such critical high through-put devices are not cache coherent with CPU then I am speechless about Allwinner D1 situation. > And that will triple the memory physical address which can't be accepted by > our users from the hardware design cost view. > > "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't > accept it. (They just say replace the CPU, but don't touch anything other.) > > PTE attributes are the non-coherent solution for many years. MIPS also > follows that now: > ref arch/mips/include/asm/pgtable-bits.h & > arch/mips/include/asm/pgtable.h RISC-V is in the process of standardizing Svpmbt extension. Unfortunately, the higher order bits which your implementation uses is not for SoC vendor use as-per the RISC-V privilege spec. > > #ifndef _CACHE_CACHABLE_NO_WA > #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_CACHABLE_WA > #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_UNCACHED > #define _CACHE_UNCACHED (2<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_CACHABLE_NONCOHERENT > #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_CACHABLE_CE > #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_CACHABLE_COW > #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_CACHABLE_CUW > #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT) > #endif > #ifndef _CACHE_UNCACHED_ACCELERATED > #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT) > > We can't force our users to double/triplicate their physical memory regions. We are trying to find a workable solution here so that we don't have to deal with custom PTE attributes which are reserved for RISC-V priv specification only. Regards, Anup > > > > > Regards, > > Anup > > > > > > > > > > > > > > > > > > > > > > This implementation, which adds some Kconfig entries that > > > > > control page table bits, definately isn't suitable for upstream. > > > > > Allowing users to set arbitrary page table bits will eventually > > > > > conflict with the standard, and is just going to be a mess. > > > > > It'll also lead to kernels that are only compatible with > > > > > specific designs, which we're trying very hard to avoid. At a > > > > > bare minimum we'll need some way to detect systems with these > > > > > page table bits before setting them, and some description of > > > > > what the bits actually do so we can reason about > > > them. > > > > > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't > > > > give-up the goal of unified kernel image for all platforms. > > > > > > > > Regards, > > > > Anup > > > > > > > > > > > > -- > > > Best Regards > > > Guo Ren > > > > > > ML: https://lore.kernel.org/linux-csky/ > > > > -- > Best Regards > Guo Ren > > ML: https://lore.kernel.org/linux-csky/
On Mon, Jun 7, 2021 at 12:47 PM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > -----Original Message----- > > From: Guo Ren <guoren@kernel.org> > > Sent: 07 June 2021 09:52 > > To: Anup Patel <Anup.Patel@wdc.com> > > Cc: Atish Patra <atishp@atishpatra.org>; Palmer Dabbelt > > <palmer@dabbelt.com>; anup@brainfault.org; drew@beagleboard.org; > > Christoph Hellwig <hch@lst.de>; wefu@redhat.com; lazyparser@gmail.com; > > linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > arch@vger.kernel.org; linux-sunxi@lists.linux.dev; guoren@linux.alibaba.com; > > Paul Walmsley <paul.walmsley@sifive.com> > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > Hi Anup, > > > > On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > From: Guo Ren <guoren@kernel.org> > > > > Sent: 06 June 2021 22:42 > > > > To: Anup Patel <Anup.Patel@wdc.com>; Atish Patra > > > > <atishp@atishpatra.org> > > > > Cc: Palmer Dabbelt <palmer@dabbelt.com>; anup@brainfault.org; > > > > drew@beagleboard.org; Christoph Hellwig <hch@lst.de>; > > > > wefu@redhat.com; lazyparser@gmail.com; > > > > linux-riscv@lists.infradead.org; linux- kernel@vger.kernel.org; > > > > linux-arch@vger.kernel.org; linux- sunxi@lists.linux.dev; > > > > guoren@linux.alibaba.com; Paul Walmsley <paul.walmsley@sifive.com> > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > > > Hi Anup and Atish, > > > > > > > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <Anup.Patel@wdc.com> > > wrote: > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Palmer Dabbelt <palmer@dabbelt.com> > > > > > > Sent: 03 June 2021 09:43 > > > > > > To: guoren@kernel.org > > > > > > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > > > > > > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; > > wefu@redhat.com; > > > > > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > > > > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > > > > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > > > > > <paul.walmsley@sifive.com> > > > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > > > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org > > wrote: > > > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel > > > > > > >> <anup@brainfault.org> > > > > > > wrote: > > > > > > >>> > > > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > > > > > > <drew@beagleboard.org> wrote: > > > > > > >>> > > > > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph > > > > > > >>> > Hellwig > > > > > > wrote: > > > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren > > wrote: > > > > > > >>> > > > Since the existing RISC-V ISA cannot solve this > > > > > > >>> > > > problem, it is better to provide some configuration > > > > > > >>> > > > for the SOC vendor to > > > > > > customize. > > > > > > >>> > > > > > > > > >>> > > We've been talking about this problem for close to five years. > > > > > > >>> > > So no, if you don't manage to get the feature into the > > > > > > >>> > > ISA it can't be supported. > > > > > > >>> > > > > > > > >>> > Isn't it a good goal for Linux to support the capabilities > > > > > > >>> > present in the SoC that a currently being fab'd? > > > > > > >>> > > > > > > > >>> > I believe the CMO group only started last year [1] so the > > > > > > >>> > RV64GC SoCs that are going into mass production this year > > > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V > > > > > > >>> > ISA extension for handling cache management. > > > > > > >>> > > > > > > >>> The current Linux RISC-V policy is to only accept patches > > > > > > >>> for frozen or ratified ISA specs. > > > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > > > > > > >>> > > > > > > >>> This means even if emulate CMO instructions in OpenSBI, the > > > > > > >>> Linux patches won't be taken by Palmer because CMO > > > > > > >>> specification is still in draft stage. > > > > > > >> Before CMO specification release, could we use a sbi_ecall to > > > > > > >> solve the current problem? This is not against the > > > > > > >> specification, when CMO is ready we could let users choose to > > > > > > >> use the new CMO in > > > > Linux. > > > > > > >> > > > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > > > > > > >> > > > > > > >>> > > > > > > >>> Also, we all know how much time it takes for RISCV > > > > > > >>> international to freeze some spec. Judging by that we are > > > > > > >>> looking at another > > > > > > >>> 3-4 years at minimum. > > > > > > > > > > > > > > Sorry for being slow here, this thread got buried. > > > > > > > > > > > > > > I've been trying to work with a handful of folks at the RISC-V > > > > > > > foundation to try and get a subset of the various > > > > > > > in-development specifications (some simple CMOs, something > > > > > > > about non-caching in the page tables, and some way to prevent > > > > > > > speculative accesse from generating coherence traffic that will break > > non-coherent systems). > > > > > > > I'm not sure we can get this together quickly, but I'd prefer > > > > > > > to at least try before we jump to taking vendor-specificed behavior > > here. > > > > > > > It's obviously an up-hill battle to try and get specifications > > > > > > > through the process and I'm certainly not going to promise it > > > > > > > will work, but I'm hoping that the impending need to avoid > > > > > > > forking the ISA will be sufficient to get people behind > > > > > > > producing some specifications in a timely > > > > > > fashion. > > > > > > > > > > > > > > I wasn't aware than this chip had non-coherent devices until I > > > > > > > saw this thread, so we'd been mostly focused on the Beagle V chip. > > > > > > > That was in a sense an easier problem because the SiFive IP in > > > > > > > it was never designed to have non-coherent devices so we'd > > > > > > > have to make anything work via a series of slow workarounds, > > > > > > > which would make emulating the eventually standardized > > > > > > > behavior reasonable in terms of performance (ie, everything > > > > > > > would be super slow so who really > > > > cares). > > > > > > > > > > > > > > I don't think relying on some sort of SBI call for the CMOs > > > > > > > whould be such a performance hit that it would prevent these > > > > > > > systems from being viable, but assuming you have reasonable > > > > > > > performance on your non-cached accesses then that's probably > > > > > > > not going to be viable to trap and emulate. At that point it > > > > > > > really just becomes silly to pretend that we're still making > > > > > > > things work by emulating the eventually ratified behavior, as > > > > > > > anyone who actually tries to use this thing to do IO would > > > > > > > need out of tree patches. I'm not sure exactly what the plan > > > > > > > is for the page table bits in the specification right now, but > > > > > > > if you can give me a pointer to some documentation then I'm > > > > > > > happy to try and push for something > > > > compatible. > > > > > > > > > > > > > > If we can't make the process work at the foundation then I'd > > > > > > > be strongly in favor of just biting the bullet and starting to > > > > > > > take vendor-specific code that's been implemented in hardware > > > > > > > and is necessarry to make things work acceptably. That's > > > > > > > obviously a sub-optimal solution as it'll lead to a bunch of > > > > > > > ISA fragmentation, but at least we'll be able to keep the > > > > > > > software stack > > > > together. > > > > > > > > > > > > > > Can you tell us when these will be in the hands of users? > > > > > > > That's pretty important here, as I don't want to be blocking > > > > > > > real users from having their hardware work. IIRC there were > > > > > > > some plans to distribute early boards, but it looks like the > > > > > > > foundation got involved and I guess I lost the thread at that point. > > > > > > > > > > > > > > Sorry this is all such a headache, but hopefully we can get > > > > > > > things sorted out. > > > > > > > > > > > > I talked with some of the RISC-V foundation folks, we're not > > > > > > going to have an ISA specification for the non-coherent stuff > > > > > > any time soon. I took a look at this code and I definately > > > > > > don't want to take it as is, but I'm not opposed to taking > > > > > > something that makes the > > > > hardware work as long as it's a lot cleaner. > > > > > > We've already got two of these non-coherent chips, I'm sure more > > > > > > will come, and I'd rather have the extra headaches than make > > > > > > everyone fork the software stack. > > > > > > > > > > Thanks for confirming. The CMO extension is still in early stages > > > > > so it will certainly take more time for them. After CMO extension > > > > > is finalized, it will take some more time to have actual RISC-V > > > > > platforms with > > > > CMO implemented. > > > > > > > > > > > > > > > > > After talking to Atish it looks like there's likely to be an SBI > > > > > > extension to handle the CMOs, which should let us avoid the bulk > > > > > > of the vendor-specific behavior in the kernel. I know some > > > > > > people are worried about adding to the SBI surface. I'm worried > > > > > > about that too, but that's way better than sticking a bunch of > > > > > > vendor-specific instructions into the kernel. The SBI extension > > > > > > should make for a straight-forward cache flush implementation in > > > > > > Linux, so let's just plan on > > > > that getting through quickly (as has been done before). > > > > > > > > > > Yes, I agree. We can have just a single SBI call which is meant > > > > > for DMA sync purpose only which means it will flush/invalidate > > > > > pages from all cache levels irrespective of the cache hierarchy (i.e. > > > > > flush/invalidate to RAM). The CMO extension might more generic > > > > > cache operations which can target any cache level. > > > > > > > > > > I am already preparing a write-up for SBI DMA sync call in SBI > > > > > spec. I will share it with you separately as well. > > > > > > > > > > > > > > > > > Unfortunately we've yet to come up with a way to handle the > > > > > > non-cacheable mappings without introducing a degree of > > > > > > vendor-specific behavior or seriously impacting performance > > > > > > (mark them as not valid and deal with them in the trap handler). > > > > > > I'm not really sure it counts as supporting the hardware if it's > > > > > > massively slow, so that really leaves us with vendor-specific > > > > > > mappings as the only > > > > option to make these chips work. > > > > > > > > > > A RISC-V platform can have non-cacheable mappings is following > > > > > possible > > > > > ways: > > > > > 1) Fixed physical address range as non-cacheable using PMAs > > > > > 2) Custom page table attributes > > > > > 3) Svpmbt extension being defined by RVI > > > > > > > > > > Atish and me both think it is possible to have RISC-V specific DMA > > > > > ops implementation which can handle all above case. Atish is > > > > > already working on DMA ops implementation for RISC-V. > > > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please > > > > have a look at: > > > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-ema > > > > il- > > > > guoren@kernel.org/T/#u > > > > > > The icache_sync and __vdso_icache_sync will have to be addressed > > > differently. The SBI DMA sync extension cannot address this. > > Agree > > > > > > > > It seems Allwinner D1 have more non-standard stuff: > > > 1) Custom PTE bits for IO-coherent access > > > 2) Custom data cache flush/invalidate for DMA sync > > > 3) Custom icache flush/invalidate > > Yes, but 3) is a performance optimization, not critical for running. > > > > > > > > Other hand, BeagleV has only two problems: > > > 1) Custom physical address range for IO-coherent access > > > 2) Custom L2 cache flush/invalidate for DMA sync > > https://github.com/starfive- > > tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d > > > > Currently, they still use DMA sync with DMA descriptor, are you sure they > > have minor memory physical address. > > > > > > > > From above #2, can be solved by SBI DMA sync call and Linux DMA ops > > > for both BeagleV and Allwinner D1 > > > > > > On BeagleV, issue #1 can be solved using "dma-ranges". > > > > > > On Allwinner D1, issues #1 and #3 need to be addressed separately. > > > > > > I think supporting BeagleV in upstream Linux is relatively easy > > > compared to Allwinner D1. > > > > > > @Guo, please check if you can reserve dedicated physical address range > > > for IO-coherent access (just like BeagleV). If yes, then we can tackle > > > issue #1 for Allwinner > > > D1 using "dma-ranges" DT property. > > There is no dedicated physical address range for IO-coherent access in D1. But > > the solution you mentioned couldn't solve all requirements. > > Only one mirror physical address range is not enough, we need at least three > > (Normal, DMA desc, frame buffer). > > How many non-coherent devices you really have? > > I am guess lot of critical devices on Allwinner D1 are not coherent with CPU. > The problem for Allwinner D1 is even worst than I thought. If such critical > high through-put devices are not cache coherent with CPU then I am > speechless about Allwinner D1 situation. Allwinner D1 is a cost-down product and there is no cache coherent device at all. Cache coherent interconnect will increase the chip design cost and the performance is enough in their scenario. So that why we need Srong Order + noncache & Weak Order + noncache to optimization. From T-HEAD side we could privide two kinds of solution of DMA coherent. - Let SOC vendor update coherent interconnect, and our CPU could support coherent protocal. - Let SOC vendor connect their DMA device with our CPU LL cache coherent interface. But we can't force them do that. They want how my origin soc works then make it work with your RV core. They know trade off coherency or non-coherency in their busisness scenario. > > > And that will triple the memory physical address which can't be accepted by > > our users from the hardware design cost view. > > > > "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't > > accept it. (They just say replace the CPU, but don't touch anything other.) > > > > PTE attributes are the non-coherent solution for many years. MIPS also > > follows that now: > > ref arch/mips/include/asm/pgtable-bits.h & > > arch/mips/include/asm/pgtable.h > > RISC-V is in the process of standardizing Svpmbt extension. > > Unfortunately, the higher order bits which your implementation uses is > not for SoC vendor use as-per the RISC-V privilege spec. For a while, I had placed my hopes on C-bit, but my fantasy was disillusioned. -_-! > > > > > #ifndef _CACHE_CACHABLE_NO_WA > > #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_WA > > #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_UNCACHED > > #define _CACHE_UNCACHED (2<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_NONCOHERENT > > #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_CE > > #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_COW > > #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_CUW > > #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_UNCACHED_ACCELERATED > > #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT) > > > > We can't force our users to double/triplicate their physical memory regions. > > We are trying to find a workable solution here so that we don't have > to deal with custom PTE attributes which are reserved for RISC-V priv > specification only. Thank you for your hard work in this regard, sincerely. > > Regards, > Anup > > > > > > > > > Regards, > > > Anup > > > > > > > > > > > > > > > > > > > > > > > > > > > > This implementation, which adds some Kconfig entries that > > > > > > control page table bits, definately isn't suitable for upstream. > > > > > > Allowing users to set arbitrary page table bits will eventually > > > > > > conflict with the standard, and is just going to be a mess. > > > > > > It'll also lead to kernels that are only compatible with > > > > > > specific designs, which we're trying very hard to avoid. At a > > > > > > bare minimum we'll need some way to detect systems with these > > > > > > page table bits before setting them, and some description of > > > > > > what the bits actually do so we can reason about > > > > them. > > > > > > > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't > > > > > give-up the goal of unified kernel image for all platforms. > > > > > > > > > > Regards, > > > > > Anup > > > > > > > > > > > > > > > > -- > > > > Best Regards > > > > Guo Ren > > > > > > > > ML: https://lore.kernel.org/linux-csky/ > > > > > > > > -- > > Best Regards > > Guo Ren > > > > ML: https://lore.kernel.org/linux-csky/
: On Mon, Jun 7, 2021 at 12:47 PM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > -----Original Message----- > > From: Guo Ren <guoren@kernel.org> > > Sent: 07 June 2021 09:52 > > To: Anup Patel <Anup.Patel@wdc.com> > > Cc: Atish Patra <atishp@atishpatra.org>; Palmer Dabbelt > > <palmer@dabbelt.com>; anup@brainfault.org; drew@beagleboard.org; > > Christoph Hellwig <hch@lst.de>; wefu@redhat.com; lazyparser@gmail.com; > > linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > arch@vger.kernel.org; linux-sunxi@lists.linux.dev; guoren@linux.alibaba.com; > > Paul Walmsley <paul.walmsley@sifive.com> > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > Hi Anup, > > > > On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <Anup.Patel@wdc.com> wrote: > > > > > > > > > > > > > -----Original Message----- > > > > From: Guo Ren <guoren@kernel.org> > > > > Sent: 06 June 2021 22:42 > > > > To: Anup Patel <Anup.Patel@wdc.com>; Atish Patra > > > > <atishp@atishpatra.org> > > > > Cc: Palmer Dabbelt <palmer@dabbelt.com>; anup@brainfault.org; > > > > drew@beagleboard.org; Christoph Hellwig <hch@lst.de>; > > > > wefu@redhat.com; lazyparser@gmail.com; > > > > linux-riscv@lists.infradead.org; linux- kernel@vger.kernel.org; > > > > linux-arch@vger.kernel.org; linux- sunxi@lists.linux.dev; > > > > guoren@linux.alibaba.com; Paul Walmsley <paul.walmsley@sifive.com> > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > > > Hi Anup and Atish, > > > > > > > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <Anup.Patel@wdc.com> > > wrote: > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Palmer Dabbelt <palmer@dabbelt.com> > > > > > > Sent: 03 June 2021 09:43 > > > > > > To: guoren@kernel.org > > > > > > Cc: anup@brainfault.org; drew@beagleboard.org; Christoph Hellwig > > > > > > <hch@lst.de>; Anup Patel <Anup.Patel@wdc.com>; > > wefu@redhat.com; > > > > > > lazyparser@gmail.com; linux-riscv@lists.infradead.org; linux- > > > > > > kernel@vger.kernel.org; linux-arch@vger.kernel.org; linux- > > > > > > sunxi@lists.linux.dev; guoren@linux.alibaba.com; Paul Walmsley > > > > > > <paul.walmsley@sifive.com> > > > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support > > > > > > > > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote: > > > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), guoren@kernel.org > > wrote: > > > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel > > > > > > >> <anup@brainfault.org> > > > > > > wrote: > > > > > > >>> > > > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini > > > > > > <drew@beagleboard.org> wrote: > > > > > > >>> > > > > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph > > > > > > >>> > Hellwig > > > > > > wrote: > > > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren > > wrote: > > > > > > >>> > > > Since the existing RISC-V ISA cannot solve this > > > > > > >>> > > > problem, it is better to provide some configuration > > > > > > >>> > > > for the SOC vendor to > > > > > > customize. > > > > > > >>> > > > > > > > > >>> > > We've been talking about this problem for close to five years. > > > > > > >>> > > So no, if you don't manage to get the feature into the > > > > > > >>> > > ISA it can't be supported. > > > > > > >>> > > > > > > > >>> > Isn't it a good goal for Linux to support the capabilities > > > > > > >>> > present in the SoC that a currently being fab'd? > > > > > > >>> > > > > > > > >>> > I believe the CMO group only started last year [1] so the > > > > > > >>> > RV64GC SoCs that are going into mass production this year > > > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V > > > > > > >>> > ISA extension for handling cache management. > > > > > > >>> > > > > > > >>> The current Linux RISC-V policy is to only accept patches > > > > > > >>> for frozen or ratified ISA specs. > > > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst) > > > > > > >>> > > > > > > >>> This means even if emulate CMO instructions in OpenSBI, the > > > > > > >>> Linux patches won't be taken by Palmer because CMO > > > > > > >>> specification is still in draft stage. > > > > > > >> Before CMO specification release, could we use a sbi_ecall to > > > > > > >> solve the current problem? This is not against the > > > > > > >> specification, when CMO is ready we could let users choose to > > > > > > >> use the new CMO in > > > > Linux. > > > > > > >> > > > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall. > > > > > > >> > > > > > > >>> > > > > > > >>> Also, we all know how much time it takes for RISCV > > > > > > >>> international to freeze some spec. Judging by that we are > > > > > > >>> looking at another > > > > > > >>> 3-4 years at minimum. > > > > > > > > > > > > > > Sorry for being slow here, this thread got buried. > > > > > > > > > > > > > > I've been trying to work with a handful of folks at the RISC-V > > > > > > > foundation to try and get a subset of the various > > > > > > > in-development specifications (some simple CMOs, something > > > > > > > about non-caching in the page tables, and some way to prevent > > > > > > > speculative accesse from generating coherence traffic that will break > > non-coherent systems). > > > > > > > I'm not sure we can get this together quickly, but I'd prefer > > > > > > > to at least try before we jump to taking vendor-specificed behavior > > here. > > > > > > > It's obviously an up-hill battle to try and get specifications > > > > > > > through the process and I'm certainly not going to promise it > > > > > > > will work, but I'm hoping that the impending need to avoid > > > > > > > forking the ISA will be sufficient to get people behind > > > > > > > producing some specifications in a timely > > > > > > fashion. > > > > > > > > > > > > > > I wasn't aware than this chip had non-coherent devices until I > > > > > > > saw this thread, so we'd been mostly focused on the Beagle V chip. > > > > > > > That was in a sense an easier problem because the SiFive IP in > > > > > > > it was never designed to have non-coherent devices so we'd > > > > > > > have to make anything work via a series of slow workarounds, > > > > > > > which would make emulating the eventually standardized > > > > > > > behavior reasonable in terms of performance (ie, everything > > > > > > > would be super slow so who really > > > > cares). > > > > > > > > > > > > > > I don't think relying on some sort of SBI call for the CMOs > > > > > > > whould be such a performance hit that it would prevent these > > > > > > > systems from being viable, but assuming you have reasonable > > > > > > > performance on your non-cached accesses then that's probably > > > > > > > not going to be viable to trap and emulate. At that point it > > > > > > > really just becomes silly to pretend that we're still making > > > > > > > things work by emulating the eventually ratified behavior, as > > > > > > > anyone who actually tries to use this thing to do IO would > > > > > > > need out of tree patches. I'm not sure exactly what the plan > > > > > > > is for the page table bits in the specification right now, but > > > > > > > if you can give me a pointer to some documentation then I'm > > > > > > > happy to try and push for something > > > > compatible. > > > > > > > > > > > > > > If we can't make the process work at the foundation then I'd > > > > > > > be strongly in favor of just biting the bullet and starting to > > > > > > > take vendor-specific code that's been implemented in hardware > > > > > > > and is necessarry to make things work acceptably. That's > > > > > > > obviously a sub-optimal solution as it'll lead to a bunch of > > > > > > > ISA fragmentation, but at least we'll be able to keep the > > > > > > > software stack > > > > together. > > > > > > > > > > > > > > Can you tell us when these will be in the hands of users? > > > > > > > That's pretty important here, as I don't want to be blocking > > > > > > > real users from having their hardware work. IIRC there were > > > > > > > some plans to distribute early boards, but it looks like the > > > > > > > foundation got involved and I guess I lost the thread at that point. > > > > > > > > > > > > > > Sorry this is all such a headache, but hopefully we can get > > > > > > > things sorted out. > > > > > > > > > > > > I talked with some of the RISC-V foundation folks, we're not > > > > > > going to have an ISA specification for the non-coherent stuff > > > > > > any time soon. I took a look at this code and I definately > > > > > > don't want to take it as is, but I'm not opposed to taking > > > > > > something that makes the > > > > hardware work as long as it's a lot cleaner. > > > > > > We've already got two of these non-coherent chips, I'm sure more > > > > > > will come, and I'd rather have the extra headaches than make > > > > > > everyone fork the software stack. > > > > > > > > > > Thanks for confirming. The CMO extension is still in early stages > > > > > so it will certainly take more time for them. After CMO extension > > > > > is finalized, it will take some more time to have actual RISC-V > > > > > platforms with > > > > CMO implemented. > > > > > > > > > > > > > > > > > After talking to Atish it looks like there's likely to be an SBI > > > > > > extension to handle the CMOs, which should let us avoid the bulk > > > > > > of the vendor-specific behavior in the kernel. I know some > > > > > > people are worried about adding to the SBI surface. I'm worried > > > > > > about that too, but that's way better than sticking a bunch of > > > > > > vendor-specific instructions into the kernel. The SBI extension > > > > > > should make for a straight-forward cache flush implementation in > > > > > > Linux, so let's just plan on > > > > that getting through quickly (as has been done before). > > > > > > > > > > Yes, I agree. We can have just a single SBI call which is meant > > > > > for DMA sync purpose only which means it will flush/invalidate > > > > > pages from all cache levels irrespective of the cache hierarchy (i.e. > > > > > flush/invalidate to RAM). The CMO extension might more generic > > > > > cache operations which can target any cache level. > > > > > > > > > > I am already preparing a write-up for SBI DMA sync call in SBI > > > > > spec. I will share it with you separately as well. > > > > > > > > > > > > > > > > > Unfortunately we've yet to come up with a way to handle the > > > > > > non-cacheable mappings without introducing a degree of > > > > > > vendor-specific behavior or seriously impacting performance > > > > > > (mark them as not valid and deal with them in the trap handler). > > > > > > I'm not really sure it counts as supporting the hardware if it's > > > > > > massively slow, so that really leaves us with vendor-specific > > > > > > mappings as the only > > > > option to make these chips work. > > > > > > > > > > A RISC-V platform can have non-cacheable mappings is following > > > > > possible > > > > > ways: > > > > > 1) Fixed physical address range as non-cacheable using PMAs > > > > > 2) Custom page table attributes > > > > > 3) Svpmbt extension being defined by RVI > > > > > > > > > > Atish and me both think it is possible to have RISC-V specific DMA > > > > > ops implementation which can handle all above case. Atish is > > > > > already working on DMA ops implementation for RISC-V. > > > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please > > > > have a look at: > > > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-ema > > > > il- > > > > guoren@kernel.org/T/#u > > > > > > The icache_sync and __vdso_icache_sync will have to be addressed > > > differently. The SBI DMA sync extension cannot address this. > > Agree > > > > > > > > It seems Allwinner D1 have more non-standard stuff: > > > 1) Custom PTE bits for IO-coherent access > > > 2) Custom data cache flush/invalidate for DMA sync > > > 3) Custom icache flush/invalidate > > Yes, but 3) is a performance optimization, not critical for running. > > > > > > > > Other hand, BeagleV has only two problems: > > > 1) Custom physical address range for IO-coherent access > > > 2) Custom L2 cache flush/invalidate for DMA sync > > https://github.com/starfive- > > tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d > > > > Currently, they still use DMA sync with DMA descriptor, are you sure they > > have minor memory physical address. > > > > > > > > From above #2, can be solved by SBI DMA sync call and Linux DMA ops > > > for both BeagleV and Allwinner D1 > > > > > > On BeagleV, issue #1 can be solved using "dma-ranges". > > > > > > On Allwinner D1, issues #1 and #3 need to be addressed separately. > > > > > > I think supporting BeagleV in upstream Linux is relatively easy > > > compared to Allwinner D1. > > > > > > @Guo, please check if you can reserve dedicated physical address range > > > for IO-coherent access (just like BeagleV). If yes, then we can tackle > > > issue #1 for Allwinner > > > D1 using "dma-ranges" DT property. > > There is no dedicated physical address range for IO-coherent access in D1. But > > the solution you mentioned couldn't solve all requirements. > > Only one mirror physical address range is not enough, we need at least three > > (Normal, DMA desc, frame buffer). > > How many non-coherent devices you really have? > > I am guess lot of critical devices on Allwinner D1 are not coherent with CPU. > The problem for Allwinner D1 is even worst than I thought. If such critical > high through-put devices are not cache coherent with CPU then I am > speechless about Allwinner D1 situation. > > > And that will triple the memory physical address which can't be accepted by > > our users from the hardware design cost view. > > > > "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't > > accept it. (They just say replace the CPU, but don't touch anything other.) > > > > PTE attributes are the non-coherent solution for many years. MIPS also > > follows that now: > > ref arch/mips/include/asm/pgtable-bits.h & > > arch/mips/include/asm/pgtable.h > > RISC-V is in the process of standardizing Svpmbt extension. > > Unfortunately, the higher order bits which your implementation uses is > not for SoC vendor use as-per the RISC-V privilege spec. > > > > > #ifndef _CACHE_CACHABLE_NO_WA > > #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_WA > > #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_UNCACHED > > #define _CACHE_UNCACHED (2<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_NONCOHERENT > > #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_CE > > #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_COW > > #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_CACHABLE_CUW > > #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT) > > #endif > > #ifndef _CACHE_UNCACHED_ACCELERATED > > #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT) > > > > We can't force our users to double/triplicate their physical memory regions. > > We are trying to find a workable solution here so that we don't have > to deal with custom PTE attributes which are reserved for RISC-V priv > specification only. How do think about my new patch of custom PTE attributes? https://lore.kernel.org/linux-riscv/610849b6f66e8d5a9653c9f62f46c48d@mailhost.ics.forth.gr/T/#mdc0dacba57346b5ac59a01961495c132b93cfcdb > > Regards, > Anup > > > > > > > > > Regards, > > > Anup > > > > > > > > > > > > > > > > > > > > > > > > > > > > This implementation, which adds some Kconfig entries that > > > > > > control page table bits, definately isn't suitable for upstream. > > > > > > Allowing users to set arbitrary page table bits will eventually > > > > > > conflict with the standard, and is just going to be a mess. > > > > > > It'll also lead to kernels that are only compatible with > > > > > > specific designs, which we're trying very hard to avoid. At a > > > > > > bare minimum we'll need some way to detect systems with these > > > > > > page table bits before setting them, and some description of > > > > > > what the bits actually do so we can reason about > > > > them. > > > > > > > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't > > > > > give-up the goal of unified kernel image for all platforms. > > > > > > > > > > Regards, > > > > > Anup > > > > > > > > > > > > > > > > -- > > > > Best Regards > > > > Guo Ren > > > > > > > > ML: https://lore.kernel.org/linux-csky/ > > > > > > > > -- > > Best Regards > > Guo Ren > > > > ML: https://lore.kernel.org/linux-csky/
On Mon, Jun 07, 2021 at 11:19:03AM +0800, Guo Ren wrote: > >From Linux non-coherency view, we need: > - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors > - Non-cache + weak order to deal with framebuffer drivers > - CMO dma_sync to sync cache with DMA devices This is not strictly true. At the very minimum you only need cache invalidation and writeback instructions. For example early parisc CPUs and some m68knommu SOCs have no support for uncached areas at all, and Linux works. But to be fair this is very painful and supports only very limited periphals. So for modern full Linux support some uncahed memory is advisable. But that doesn't have to be using PTE attributes. It could also be physical memory regions that are either totally fixed or somewhat dynamic.
On Mon, Jun 7, 2021 at 2:27 PM Christoph Hellwig <hch@lst.de> wrote: > > On Mon, Jun 07, 2021 at 11:19:03AM +0800, Guo Ren wrote: > > >From Linux non-coherency view, we need: > > - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors > > - Non-cache + weak order to deal with framebuffer drivers > > - CMO dma_sync to sync cache with DMA devices > > This is not strictly true. At the very minimum you only need cache > invalidation and writeback instructions. For example early parisc > CPUs and some m68knommu SOCs have no support for uncached areas at all, > and Linux works. But to be fair this is very painful and supports only > very limited periphals. So for modern full Linux support some uncahed > memory is advisable. But that doesn't have to be using PTE attributes. > It could also be physical memory regions that are either totally fixed Double/Triple the size of physical memory regions can't be accepted by SOC vendors, because it wastes HW resources. Some cost-down soc interconnects only have 32bit~34bit width of physical address, are you sure you could force them to expand it? (I can't) > or somewhat dynamic. How can HW implement with dynamic modifying PMA? What's the granularity?
On Mon, Jun 07, 2021 at 02:41:14PM +0800, Guo Ren wrote: > Double/Triple the size of physical memory regions can't be accepted by > SOC vendors, because it wastes HW resources. > Some cost-down soc interconnects only have 32bit~34bit width of > physical address, are you sure you could force them to expand it? (I > can't) > > > or somewhat dynamic. > How can HW implement with dynamic modifying PMA? What's the granularity? I'm just stating the requirements from the Linux DMA perspective. You also do not need tripple the address space, just double.
On Mon, Jun 7, 2021 at 2:51 PM Christoph Hellwig <hch@lst.de> wrote: > > On Mon, Jun 07, 2021 at 02:41:14PM +0800, Guo Ren wrote: > > Double/Triple the size of physical memory regions can't be accepted by > > SOC vendors, because it wastes HW resources. > > Some cost-down soc interconnects only have 32bit~34bit width of > > physical address, are you sure you could force them to expand it? (I > > can't) > > > > > or somewhat dynamic. > > How can HW implement with dynamic modifying PMA? What's the granularity? > > I'm just stating the requirements from the Linux DMA perspective. You > also do not need tripple the address space, just double. With double, you only got "strong order + non-cache" for the DMA descriptor. How about write-combine scenario? Even, double physical memory address space also wastes HW resources. -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
Στις 2021-06-07 06:19, Guo Ren έγραψε: >> The C-bit was recently dropped, there is a new proposal for Page Based >> Memory Attributes (PBMT) that we can work on / push for. > C-bit still needs discussion, we shouldn't drop it directly. > You can always participate on the discussion on virtmem mailing list. > Raise a page fault won't solve anything. We still need access to the > page with proper performance. > The point is that future hw implementations will be required to return a page fault in case we tamper with those reserved bits, they won't just ignore them. Supporting custom values there means supporting non-compliant implementations. > > We need PTEs to provide a non-coherency solution, and only CMO > instructions are not enough. We can't modify so many Linux drivers to > fit it. > From Linux non-coherency view, we need: > - Non-cache + Strong Order PTE attributes to deal with drivers' DMA > descriptors > - Non-cache + weak order to deal with framebuffer drivers > - CMO dma_sync to sync cache with DMA devices > - Userspace icache_sync solution, which prevents calls to S-mode with > IPI fence.i. (Necessary to JIT java scenarios.) > > All above are not in spec, but the real chips are done. > (Actually, these have been talked about for more than five years, we > still haven't the uniform idea.) > > The idea of C-bit is really important for us which prevents our chips > violates the spec. Have you checked the PBMT proposal ? It defines (so far) the following attributes that can be set on PTEs to override the PMAs of the underlying physical memory: Bits [62:61] 00 (WB) -> Cacheable, default ordering 01 (NC) -> Noncacheable, default ordering 10 (IO) -> Noncacheable, strong ordering So it'll cover the use cases you mention.
On Sat, Jun 5, 2021 at 12:12 AM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > On Fri, 04 Jun 2021 07:47:22 PDT (-0700), guoren@kernel.org wrote: > > Hi Arnd & Palmer, > > > > Sorry for the delayed reply, I'm working on the next version of the patch. > > > > On Fri, Jun 4, 2021 at 5:56 PM Arnd Bergmann <arnd@arndb.de> wrote: > >> > >> On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > >> > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote: > >> > >> This implementation, which adds some Kconfig entries that control page table > >> > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary > >> > >> page table bits will eventually conflict with the standard, and is just going to > >> > >> be a mess. It'll also lead to kernels that are only compatible with specific > >> > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need > >> > >> some way to detect systems with these page table bits before setting them, > >> > >> and some description of what the bits actually do so we can reason about > >> > >> them. > >> > > > >> > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the > >> > > goal of unified kernel image for all platforms. > > Okay, Agree. Please help review the next version of the patch. > > > >> > > >> > I think this is just a phrasing issue, but just to be sure: > >> > > >> > IMO it's not that they're vendor-specific Kconfig options, it's that > >> > turning them on will conflict with standard systems (and other vendors). > >> > We've already got the ability to select sets of Kconfig settings that > >> > will only boot on one vendor's system, which is fine, as long as there > >> > remains a set of Kconfig settings that will boot on all systems. > >> > > >> > An example here would be the errata: every system has errata of some > >> > sort, so if we start flipping off various vendor's errata Kconfigs > >> > you'll end up with kernels that only function properly on some systems. > >> > That's fine with me, as long as it's possible to turn on all vendor's > >> > errata Kconfigs at the same time and the resulting kernel functions > >> > correctly on all systems. > >> > >> Yes, this is generally the goal, and it would be great to have that > >> working in a way where a 'defconfig' build just turns on all the options > >> that are needed to use any SoC specific features and drivers while > >> still working on all hardware. There are however limits you may run > >> into at some point, and other architectures usually only manage to span > >> some 10 to 15 years of hardware implementations with a single > >> kernel before it get really hard. > > I could follow the goal in the next version of the patchset. Please > > help review, thx. > > IMO we're essentially here now with the RISC-V stuff: defconfig flips on > everything necesasry to boot normal-smelling SOCs, with everything being > detected as the system boots. We have some wacky configurations like > !MMU and XIP that are coupled to the hardware, but (and sorry for > crossing the other threads, I missed your pointer as it's early here) as > I said in the other thread it might be time to make it explicit that > those things are non-portable. > > The hope here has always been that we'd have enough in the standards > that we could avoid a proliferation of vendor-specific code. We've > always put a strong "things keep working forever" stake in the ground in > RISC-V land, but that's largely been because we were counting on the > standards existing that make support easy. In practice we don't have > those standards so we're ending up with a fairly large software base > that is required to support everything. We don't have all that much > hardware right now so we'll have to see how it goes, but for now I'm in > favor of keeping defconfig as a "boots on everything" sort of setup -- > both because it makes life easier for users, and because it makes issues > like the non-portable Kconfigs that showed up here quite explicit. I reuse the Image header to pass vendor magic to init PTE attributes' variable before setup_vm. Can you give me some feedback on that patch? https://lore.kernel.org/linux-riscv/1622970249-50770-9-git-send-email-guoren@kernel.org/T/#mdc0dacba57346b5ac59a01961495c132b93cfcdb > > If we get to 10/15 years of hardware then I'm sure we'll be removing old > systems from defconfig (or maybe even the kernel entirely, a lot of this > stuff isn't in production). I'm just hoping we make it that far ;) > > >> To give some common examples that make it break down: > >> > >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on > >> most other architectures) > > Yes, and there's no way around that on RISC-V. They're different base > ISAs therefor re-define the same instructions, so we're essentially at > two kernel binaries by that point. The platform spec says rv64gc, so we > can kind of punt on this one for now. If rv32 hardware shows up > we'll probably want a standard system there too, which is why we've > avoided coupling kernel portability to XLEN. > > >> - architectures that support both big-endian and little-endian kernels > >> tend to have platforms that require one or the other (e.g. mips, > >> though not arm). Not an issue for you. > > It is now! We've added big-endian to RISC-V. There's no hardware yet > and very little software support. IMO the right answer is to ban that > from the platform spec, but again it'll depnd on what vendors want to > build (though anyone is listening, please don't make my life miserable > ;)). > > >> - page table formats are the main cause of incompatibility: arm32 > >> and x86-32 require three-level tables for certain features, but those > >> are incompatible with older cores, arm64 supports three different > >> page sizes, but none of them works on all cores (4KB almost works > >> everywhere). > > We actually have some support on the works for multiple page table > levels in a single binary, which should help with a lot of that > incompatibility. I don't know of any plans to couple other page table > features to the number of levels, though. > > >> - SMP-enabled ARMv7 kernels can be configured to run on either > >> ARMv6 or ARMv8, but not both, in this case because of incompatible > >> barrier instructions. > > Our barriers aren't quite split the same way, but we do have two memory > models (RVWMO and TSO). IIUC we should be able to support both in the > same kernels with some patching, but the resulting kernels would be > biased towards one memory models over the other WRT performance. Again, > we'll have to see what the vendors do and I'm hoping we don't end up > with too many headaches. > > >> - 32-bit Arm has a couple more remaining features that require building > >> a machine specific kernel if enabled because they hardcode physical > >> addresses: early printk (debug_ll, not the normal earlycon), NOMMU, > >> and XIP. > > We've got NOMMU and XIP as well, but we have some SBI support for early > printk. IMO we're not really sure if we've decoupled all the PA layout > dependencies yet from Linux, as we really only support one vendor's > systems, but we've had a lot of work lately on beefing up our memory > layout so with any luck we'll be able to quickly sort out anything that > comes up.
From: Christoph Hellwig > Sent: 07 June 2021 07:27 > > On Mon, Jun 07, 2021 at 11:19:03AM +0800, Guo Ren wrote: > > >From Linux non-coherency view, we need: > > - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors > > - Non-cache + weak order to deal with framebuffer drivers > > - CMO dma_sync to sync cache with DMA devices > > This is not strictly true. At the very minimum you only need cache > invalidation and writeback instructions. For example early parisc > CPUs and some m68knommu SOCs have no support for uncached areas at all, > and Linux works. But to be fair this is very painful and supports only > very limited periphals. So for modern full Linux support some uncahed > memory is advisable. But that doesn't have to be using PTE attributes. > It could also be physical memory regions that are either totally fixed > or somewhat dynamic. It is almost impossible to interface to many ethernet chips without either coherent or uncached memory for the descriptor rings. The status bits on the transmit ring are particularly problematic. The receive ring can be done with writeback+invalidate provided you fill a cache line at a time. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Tue, Jun 08, 2021 at 03:00:17PM +0000, David Laight wrote: > It is almost impossible to interface to many ethernet chips without > either coherent or uncached memory for the descriptor rings. > The status bits on the transmit ring are particularly problematic. > > The receive ring can be done with writeback+invalidate provided you > fill a cache line at a time. It is horrible, but it has been done. Take a look at: drivers/net/ethernet/i825xx/lasi_82596.c and drivers/net/ethernet/seeq/sgiseeq.c
From: 'Christoph Hellwig' > Sent: 08 June 2021 16:32 > > On Tue, Jun 08, 2021 at 03:00:17PM +0000, David Laight wrote: > > It is almost impossible to interface to many ethernet chips without > > either coherent or uncached memory for the descriptor rings. > > The status bits on the transmit ring are particularly problematic. > > > > The receive ring can be done with writeback+invalidate provided you > > fill a cache line at a time. > > It is horrible, but it has been done. Take a look at: > > drivers/net/ethernet/i825xx/lasi_82596.c and > drivers/net/ethernet/seeq/sgiseeq.c I guess that each transmit has to be split into enough fragments that they fill a cache line. That won't work with some (probably old now) devices that require the first fragment to be 64 bytes because it won't back-up the descriptors after a collision. It's all as horrid as a DSP we have that can't receive ethernet frames onto a 4n+2 boundary and doesn't support misaligned accesses. Mind you, Sun's original Sbus ethernet board had to be given a 4n aligned rx buffer and then a misaligned copy done in kernel in order to not drop packets! David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> wrote: > > Στις 2021-05-20 04:45, Guo Ren έγραψε: > > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: > >> > >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > >> > This patch series looks like it might be useful for the StarFive JH7100 > >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > >> > USB and SDIO require that the L2 cache must be manually flushed after > >> > DMA operations if the data is intended to be shared with U74 cores [2]. > >> > >> Not too much, given that the SiFive lineage CPUs have an uncached > >> window, that is a totally different way to allocate uncached memory. > > It's a very big MIPS smell. What's the attribute of the uncached > > window? (uncached + strong-order/ uncached + weak, most vendors still > > use AXI interconnect, how to deal with a bufferable attribute?) In > > fact, customers' drivers use different ways to deal with DMA memory in > > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > > the same way in DMA memory is a smart choice. So using PTE attributes > > is more suitable. > > > > See: > > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf > > 4.4.1 > > The draft supports custom attribute bits in PTE. > > > > Not only it doesn't support custom attributes on PTEs: > > "Bits63–54 are reserved for future standard use and must be zeroed by > software for forward compatibility." > > It also goes further to say that: > > "if any of these bits are set, a page-fault exception is raised" Agree, when our processor's mmu works in compatible mmu, we must keep "Bits63–54 bit" zero in Linux. So, I think this is the first version of the PTE format. If the "PBMT" extension proposal is approved, it will cause the second version of the PTE format. Maybe in the future, we'll get more versions of the PTE formats. So, seems Linux must support multi versions of PTE formats with one Image, right? Okay, we could stop arguing with the D1 PTE format. And talk about how to let Linux support multi versions of PTE formats that come from the future RISC-V privilege spec.
On Wed, 9 Jun 2021 11:28:19 +0800 Guo Ren <guoren@kernel.org> wrote: > > > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> wrote: > > > > Στις 2021-05-20 04:45, Guo Ren έγραψε: > > > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: > > >> > > >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > > >> > This patch series looks like it might be useful for the StarFive JH7100 > > >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > > >> > USB and SDIO require that the L2 cache must be manually flushed after > > >> > DMA operations if the data is intended to be shared with U74 cores [2]. > > >> > > >> Not too much, given that the SiFive lineage CPUs have an uncached > > >> window, that is a totally different way to allocate uncached memory. > > > It's a very big MIPS smell. What's the attribute of the uncached > > > window? (uncached + strong-order/ uncached + weak, most vendors still > > > use AXI interconnect, how to deal with a bufferable attribute?) In > > > fact, customers' drivers use different ways to deal with DMA memory in > > > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > > > the same way in DMA memory is a smart choice. So using PTE attributes > > > is more suitable. > > > <snip> > > > 4.4.1 > > > The draft supports custom attribute bits in PTE. > > > > > > > Not only it doesn't support custom attributes on PTEs: > > > > "Bits63–54 are reserved for future standard use and must be zeroed by > > software for forward compatibility." > > > > It also goes further to say that: > > > > "if any of these bits are set, a page-fault exception is raised" > Agree, when our processor's mmu works in compatible mmu, we must keep > "Bits63–54 bit" zero in Linux. > So, I think this is the first version of the PTE format. > > If the "PBMT" extension proposal is approved, it will cause the second > version of the PTE format. > > Maybe in the future, we'll get more versions of the PTE formats. > > So, seems Linux must support multi versions of PTE formats with one > Image, right? > > Okay, we could stop arguing with the D1 PTE format. And talk about how > to let Linux support multi versions of PTE formats that come from the > future RISC-V privilege spec. > Just my humble opinion: When those bits(63~54) usage are standardized in future RISC-V privilege spec generic Image can still be supported with the following solutions: *alternative patch only fly: If the bit is only need to be set during init, we may insert nop instruction(s) at proper place, then patch the nop into set_the_target_bit instruction(s) by hart's feature. *normal check feature then use: If the feature needs a bit complex code, we could go through the "feature check then use". static key tech can be used here to avoid branches.
Στις 2021-06-09 06:28, Guo Ren έγραψε: > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> > wrote: >> >> Στις 2021-05-20 04:45, Guo Ren έγραψε: >> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: >> >> >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: >> >> > This patch series looks like it might be useful for the StarFive JH7100 >> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, >> >> > USB and SDIO require that the L2 cache must be manually flushed after >> >> > DMA operations if the data is intended to be shared with U74 cores [2]. >> >> >> >> Not too much, given that the SiFive lineage CPUs have an uncached >> >> window, that is a totally different way to allocate uncached memory. >> > It's a very big MIPS smell. What's the attribute of the uncached >> > window? (uncached + strong-order/ uncached + weak, most vendors still >> > use AXI interconnect, how to deal with a bufferable attribute?) In >> > fact, customers' drivers use different ways to deal with DMA memory in >> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them >> > the same way in DMA memory is a smart choice. So using PTE attributes >> > is more suitable. >> > >> > See: >> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf >> > 4.4.1 >> > The draft supports custom attribute bits in PTE. >> > >> >> Not only it doesn't support custom attributes on PTEs: >> >> "Bits63–54 are reserved for future standard use and must be zeroed by >> software for forward compatibility." >> >> It also goes further to say that: >> >> "if any of these bits are set, a page-fault exception is raised" > Agree, when our processor's mmu works in compatible mmu, we must keep > "Bits63–54 bit" zero in Linux. > So, I think this is the first version of the PTE format. > > If the "PBMT" extension proposal is approved, it will cause the second > version of the PTE format. > > Maybe in the future, we'll get more versions of the PTE formats. > > So, seems Linux must support multi versions of PTE formats with one > Image, right? > > Okay, we could stop arguing with the D1 PTE format. And talk about how > to let Linux support multi versions of PTE formats that come from the > future RISC-V privilege spec. The RISC-V ISA specs are meant to be backwards compatible, so newer PTE versions should work on older devices (note that the spec says that software must set those bits to zero for "forward compatibility" and are "reserved for future use" so current implementations must ignore them). Obviously the proposed "if any of these bits are set, a page-fault exception is raised" will break backwards compatibility which is why we need to ask for it to be removed from the draft. As an example the PBMT proposal uses bits 62:61 that on older hw should be ignored ("reserved for future use"), if Linux uses those bits we won't need a different code path for supporting older hw/older PTE versions, we'll just set them and older hw will ignore them. Because of the guarantee that ISA specs maintain backwards compatibility, the functionality of bits 62:61 is guaranteed to remain backwards compatible. In other words we don't need any special handling of multiple PTE formats, we just need to support the latest Priv. Spec and the Spec itself will guarantee backwards compatibility.
On Wed, Jun 9, 2021 at 5:45 PM Nick Kossifidis <mick@ics.forth.gr> wrote: > > Στις 2021-06-09 06:28, Guo Ren έγραψε: > > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <mick@ics.forth.gr> > > wrote: > >> > >> Στις 2021-05-20 04:45, Guo Ren έγραψε: > >> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <hch@lst.de> wrote: > >> >> > >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote: > >> >> > This patch series looks like it might be useful for the StarFive JH7100 > >> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC, > >> >> > USB and SDIO require that the L2 cache must be manually flushed after > >> >> > DMA operations if the data is intended to be shared with U74 cores [2]. > >> >> > >> >> Not too much, given that the SiFive lineage CPUs have an uncached > >> >> window, that is a totally different way to allocate uncached memory. > >> > It's a very big MIPS smell. What's the attribute of the uncached > >> > window? (uncached + strong-order/ uncached + weak, most vendors still > >> > use AXI interconnect, how to deal with a bufferable attribute?) In > >> > fact, customers' drivers use different ways to deal with DMA memory in > >> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them > >> > the same way in DMA memory is a smart choice. So using PTE attributes > >> > is more suitable. > >> > > >> > See: > >> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf > >> > 4.4.1 > >> > The draft supports custom attribute bits in PTE. > >> > > >> > >> Not only it doesn't support custom attributes on PTEs: > >> > >> "Bits63–54 are reserved for future standard use and must be zeroed by > >> software for forward compatibility." > >> > >> It also goes further to say that: > >> > >> "if any of these bits are set, a page-fault exception is raised" > > Agree, when our processor's mmu works in compatible mmu, we must keep > > "Bits63–54 bit" zero in Linux. > > So, I think this is the first version of the PTE format. > > > > If the "PBMT" extension proposal is approved, it will cause the second > > version of the PTE format. > > > > Maybe in the future, we'll get more versions of the PTE formats. > > > > So, seems Linux must support multi versions of PTE formats with one > > Image, right? > > > > Okay, we could stop arguing with the D1 PTE format. And talk about how > > to let Linux support multi versions of PTE formats that come from the > > future RISC-V privilege spec. > > The RISC-V ISA specs are meant to be backwards compatible, so newer PTE > versions should work on older devices (note that the spec says that > software must set those bits to zero for "forward compatibility" and are > "reserved for future use" so current implementations must ignore them). > Obviously the proposed "if any of these bits are set, a page-fault > exception is raised" will break backwards compatibility which is why we > need to ask for it to be removed from the draft. > > As an example the PBMT proposal uses bits 62:61 that on older hw should > be ignored ("reserved for future use"), if Linux uses those bits we > won't need a different code path for supporting older hw/older PTE > versions, we'll just set them and older hw will ignore them. Because of > the guarantee that ISA specs maintain backwards compatibility, the > functionality of bits 62:61 is guaranteed to remain backwards > compatible. the spec says that software must set those bits to zero for "forward compatibility". So how older hw ignore them? If an older hw follow the current spec requires software to set those bits to zero, how we put any PBMT bits without different Linux PTE formats? > > In other words we don't need any special handling of multiple PTE > formats, we just need to support the latest Priv. Spec and the Spec > itself will guarantee backwards compatibility. Nak, totally no Logically self-consistent.
From: Guo Ren <guoren@linux.alibaba.com> The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let vendors define the custom properties of memory regions in PTE. This patchset helps SOC vendors to support their own custom interconnect coherent solution with PTE attributes. For example, allwinner D1[1] uses T-HEAD C906 as main processor, C906 has two modes in MMU: - Compatible mode, the same as the definitions in spec. - Enhanced mode, add custom DMA_COHERENT attribute bits in PTE which not mentioned in spec. Allwinner D1 needs the enhanced mode to support the DMA type device with non-coherent interconnect in its SOC. C906 uses BITS(63 - 59) as custom attribute bits in PTE. Here is the config example for Allwinner D1: CONFIG_RISCV_DMA_COHERENT=y CONFIG_RISCV_PAGE_DMA_MASK=0xf800000000000000 CONFIG_RISCV_PAGE_CACHE=0x7000000000000000 CONFIG_RISCV_PAGE_DMA_NONCACHE=0x8000000000000000 Link: https://linux-sunxi.org/D1 [1] Guo Ren (3): riscv: pgtable.h: Fixup _PAGE_CHG_MASK usage riscv: Add DMA_COHERENT for custom PTE attributes riscv: Add SYNC_DMA_FOR_CPU/DEVICE for DMA_COHERENT arch/riscv/Kconfig | 31 ++++++++++++++++++++++++++ arch/riscv/include/asm/pgtable-64.h | 8 ++++--- arch/riscv/include/asm/pgtable-bits.h | 13 ++++++++++- arch/riscv/include/asm/pgtable.h | 26 +++++++++++++++++----- arch/riscv/include/asm/sbi.h | 16 ++++++++++++++ arch/riscv/kernel/sbi.c | 19 ++++++++++++++++ arch/riscv/mm/Makefile | 4 ++++ arch/riscv/mm/dma-mapping.c | 41 +++++++++++++++++++++++++++++++++++ 8 files changed, 148 insertions(+), 10 deletions(-) create mode 100644 arch/riscv/mm/dma-mapping.c