Message ID | 20210320151903.60759-1-sven@svenpeter.dev (mailing list archive) |
---|---|
Headers | show |
Series | Apple M1 DART IOMMU driver | expand |
> Date: Sat, 20 Mar 2021 15:19:33 +0000 > From: Sven Peter <sven@svenpeter.dev> > > Hi, > > After Hector's initial work [1] to bring up Linux on Apple's M1 it's time to > bring up more devices. Most peripherals connected to the SoC are behind a iommu > which Apple calls "Device Address Resolution Table", or DART for short [2]. > Unfortunately, it only shares the name with PowerPC's DART. > Configuring this iommu is mandatory if these peripherals require DMA access. > > This patchset implements initial support for this iommu. The hardware itself > uses a pagetable format that's very similar to the one already implement in > io-pgtable.c. There are some minor modifications, namely some details of the > PTE format and that there are always three pagetable levels, which I've > implement as a new format variant. > > I have mainly tested this with the USB controller in device mode which is > compatible with Linux's dwc3 driver. Some custom PHY initialization (which is > not yet ready or fully understood) is required though to bring up the ports, > see e.g. my patches to our m1n1 bootloader [3,4]. If you want to test the same > setup you will probably need that branch for now and add the nodes from > the DT binding specification example to your device tree. > > Even though each DART instances could support up to 16 devices usually only > a single device is actually connected. Different devices generally just use > an entirely separate DART instance with a seperate MMIO range, IRQ, etc. > > I have just noticed today though that at least the USB DWC3 controller in host > mode uses *two* darts at the same time. I'm not sure yet which parts seem to > require which DART instance. > > This means that we might need to support devices attached to two iommus > simultaneously and just create the same iova mappings. Currently this only > seems to be required for USB according to Apple's Device Tree. > > I see two options for this and would like to get feedback before > I implement either one: > > 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell > to identify the DART and the second one to identify the master. > The DART DT node would then also take two register ranges that would > correspond to the two DARTs. Both instances use the same IRQ and the > same clocks according to Apple's device tree and my experiments. > This would keep a single device node and the DART driver would then > simply map iovas in both DARTs if required. > > 2) Keep #iommu-cells as-is but support > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > instead. > This would then require two devices nodes for the two DART instances and > some housekeeping in the DART driver to support mapping iovas in both > DARTs. > I believe omap-iommu.c supports this setup but I will have to read > more code to understand the details there and figure out how to implement > this in a sane way. > > I currently prefer the first option but I don't understand enough details of > the iommu system to actually make an informed decision. > I'm obviously also open to more options :-) Hi Sven, I don't think the first option is going to work for PCIe. PCIe devices will have to use "iommu-map" properties to map PCI devices to the right iommu, and the currently implementation seems to assume that #iommu-cells = <1>. The devictree binding[1] doesn't explicitly state that it relies on #iommu-cells = <1>, but it isn't clear how the rid-base to iommu-base mapping mechanism would work when that isn't the case. Now the PCIe DARTs are simpler and seem to have only one "instance" per DART. So if we keep #iommu-cells = <1> for those, you'd still be fine using the first approach. As I mentioned before, not all DARTs support the full 32-bit aperture. In particular the PCIe DARTs support a smaller address-space. It is not clear whether this is a restriction of the PCIe host controller or the DART, but the Apple Device Tree has "vm-base" and "vm-size" properties that encode the base address and size of the aperture. These single-cell properties which is probably why for the USB DARTs only "vm-base" is given; since "vm-base" is 0, a 32-bit number wouldn't be able to encode the full aperture size. We could make them 64-bit numbers in the Linux device tree though and always be explicit about the size. Older Sun SPARC machines used a single "virtual-dma" property to encode the aperture. We could do someting similar. You would use this property to initialize domain->geometry.aperture_start and domain->geometry.aperture_end in diff 3/3 of this series. I think it would make sense to include this in this series, as this would make adding support for PCIe very easy, and PCIe gives you aupport for network (both wired and wireless) and the type-A USB ports on the mini. Cheers, Mark [1] Documentation/devicetree/bindings/pci/pci-iommu.txt
Hi Mark, Sorry for the spam if you get this message twice. This is pretty embarrassing but I've just switched mail providers after ProtonMail messed up yesterday and it looks like my new one defaulted to sending HTML messages even though I only typed plaintext. This shouldn't have happened in the first place but it certainly shouldn't happen again now :-( > On 21. Mar 2021, at 17:00, Mark Kettenis <mark.kettenis@xs4all.nl> wrote: > > I don't think the first option is going to work for PCIe. PCIe > devices will have to use "iommu-map" properties to map PCI devices to > the right iommu, and the currently implementation seems to assume that > #iommu-cells = <1>. The devictree binding[1] doesn't explicitly state > that it relies on #iommu-cells = <1>, but it isn't clear how the > rid-base to iommu-base mapping mechanism would work when that isn't > the case. > > Now the PCIe DARTs are simpler and seem to have only one "instance" > per DART. So if we keep #iommu-cells = <1> for those, you'd still be > fine using the first approach. Good point, I guess that only leaves option two for now then. Having some DARTs use cells = <1> and others <2> sounds confusing to me. > > As I mentioned before, not all DARTs support the full 32-bit aperture. > In particular the PCIe DARTs support a smaller address-space. It is > not clear whether this is a restriction of the PCIe host controller or > the DART, but the Apple Device Tree has "vm-base" and "vm-size" > properties that encode the base address and size of the aperture. > These single-cell properties which is probably why for the USB DARTs > only "vm-base" is given; since "vm-base" is 0, a 32-bit number > wouldn't be able to encode the full aperture size. We could make them > 64-bit numbers in the Linux device tree though and always be explicit > about the size. Older Sun SPARC machines used a single "virtual-dma" > property to encode the aperture. We could do someting similar. You > would use this property to initialize domain->geometry.aperture_start > and domain->geometry.aperture_end in diff 3/3 of this series. > > I think it would make sense to include this in this series, as this > would make adding support for PCIe very easy, and PCIe gives you > aupport for network (both wired and wireless) and the type-A USB ports > on the mini. Agreed, I'd ideally like to converge on a device tree binding that won't have to change in the near future. I've tried to use an address space larger than 32bit and that seems to work for parts of the dwc3 controller but breaks for the xhci parts because the upper lines don't seem to be connected there (e.g. if xhci tries to use <0x1 0xffff0000> I get a fault for <0 0xffff0000>). Looking at other iommu drivers I have found the following two similar bindings: qcom uses a ranges property with a 64bit address and 32 bit size [1] apps_iommu: iommu@1e20000 { ... ranges = <0 0x1e20000 0x40000>; ... }; and tegra seems to support a dma-window property with 32bit address and size [2] smmu { [...] dma-window = <0 0x40000000>; /* IOVA start & length */ [...] }; I believe there already is of_get_dma_window to handle parsing this in the common iommu code [3] but I can't find any place using it. It's a little bit more complex that we need since it allows to specify the number of cells for both the address and the size but it should allow us to express all possible configurations: usb_dart { [ ... ] #dma-address-cells = <1>; #dma-size-cells = <2>; dma-window = <0 0x1 0x0>; [ ... ] }; pcie_dart { [ ... ] #dma-address-cells = <1>; #dma-size-cells = <1>; dma-window = <0x100000 0x3fe00000>; [ ... ] }; where #dma-address-cells and #dma-size-cells default to #address-cells and #size-cells respectively if I understand the code correctly. That way we could also just always use a 64bit address and size in the DT, e.g. pcie_dart { [ ... ] dma-window = <0 0x100000 0 0x3fe00000>; [ ... ] }; Best, Sven [1] Documentation/devicetree/bindings/iommu/qcom,iommu.txt [2] Documentation/devicetree/bindings/iommu/nvidia,tegra30-smmu.txt [3] drivers/iommu/of_iommu.c
> Date: Sun, 21 Mar 2021 18:22:15 +0100 > From: "Sven Peter" <sven@svenpeter.dev> > > Hi Mark, > > On 21. Mar 2021, at 17:00, Mark Kettenis <mark.kettenis@xs4all.nl> > wrote: > > I don't think the first option is going to work for PCIe. PCIe > devices will have to use "iommu-map" properties to map PCI devices to > the right iommu, and the currently implementation seems to assume that > #iommu-cells = <1>. The devictree binding[1] doesn't explicitly state > that it relies on #iommu-cells = <1>, but it isn't clear how the > rid-base to iommu-base mapping mechanism would work when that isn't > the case. > > Now the PCIe DARTs are simpler and seem to have only one "instance" > per DART. So if we keep #iommu-cells = <1> for those, you'd still be > fine using the first approach. > > Good point, I guess that only leaves option two for now then. > Having some DARTs use cells = <1> and others <2> sounds confusing to me. Guess we do need to understand a little bit better how the USB DART actually works. My hypothesis (based on our discussion on #asahi) is that the XHCI host controller and the peripheral controller of the DWC3 block use different DMA "streams" that are handled by the different sub-DARTs. The Corellium folks use a DART + sub-DART model in their driver and a single node in the device tree that represents both. That might sense since the error registers and interrupts are shared. Maybe it would make sense to select the appropriate sub-DART based on the DMA stream ID? > As I mentioned before, not all DARTs support the full 32-bit aperture. > In particular the PCIe DARTs support a smaller address-space. It is > not clear whether this is a restriction of the PCIe host controller or > the DART, but the Apple Device Tree has "vm-base" and "vm-size" > properties that encode the base address and size of the aperture. > These single-cell properties which is probably why for the USB DARTs > only "vm-base" is given; since "vm-base" is 0, a 32-bit number > wouldn't be able to encode the full aperture size. We could make them > 64-bit numbers in the Linux device tree though and always be explicit > about the size. Older Sun SPARC machines used a single "virtual-dma" > property to encode the aperture. We could do someting similar. You > would use this property to initialize domain->geometry.aperture_start > and domain->geometry.aperture_end in diff 3/3 of this series. > > I think it would make sense to include this in this series, as this > would make adding support for PCIe very easy, and PCIe gives you > aupport for network (both wired and wireless) and the type-A USB ports > on the mini. > > Agreed, I'd ideally like to converge on a device tree binding > that won't have to change in the near future. > > I've tried to use an address space larger than 32bit and that seems to > work for parts of the dwc3 controller but breaks for the xhci parts because > the upper lines don't seem to be connected there (e.g. if xhci tries to > use <0x1 0xffff0000> I get a fault for <0 0xffff0000>). > > Looking at other iommu drivers I have found the following two similar > bindings: Ah, missed those. > qcom uses a ranges property with a 64bit address and 32 bit size [1] > > apps_iommu: iommu@1e20000 { > ... > ranges = <0 0x1e20000 0x40000>; > ... > }; Doesn't sound like a very good idea to me to repurpose "ranges" for this... > > and tegra seems to support a dma-window property with 32bit address > and size [2] > > smmu { > [...] > dma-window = <0 0x40000000>; /* IOVA start & length */ > [...] > }; > > I believe there already is of_get_dma_window to handle parsing this > in the common iommu code [3] but I can't find any place using it. > It's a little bit more complex that we need since it allows to specify the > number of cells for both the address and the size but it should allow us to > express all possible configurations: > > usb_dart { > [ ... ] > #dma-address-cells = <1>; > #dma-size-cells = <2>; > dma-window = <0 0x1 0x0>; > [ ... ] > }; > pcie_dart { > [ ... ] > #dma-address-cells = <1>; > #dma-size-cells = <1>; > dma-window = <0x100000 0x3fe00000>; > [ ... ] > }; > > where #dma-address-cells and #dma-size-cells default to > #address-cells and #size-cells respectively if I understand > the code correctly. That way we could also just always use > a 64bit address and size in the DT, e.g. > > pcie_dart { > [ ... ] > dma-window = <0 0x100000 0 0x3fe00000>; > [ ... ] > }; That sounds like a serious contender to me! Hopefully one of the Linux kernel developers can give this some sort of blessing. I think it would make sense for us to just rely on the #address-cells and #size-cells defaults for the M1 device tree.
Hi Mark, On Sun, Mar 21, 2021, at 19:35, Mark Kettenis wrote: > > Guess we do need to understand a little bit better how the USB DART > actually works. My hypothesis (based on our discussion on #asahi) is > that the XHCI host controller and the peripheral controller of the > DWC3 block use different DMA "streams" that are handled by the > different sub-DARTs. I've done some more experiments and the situation is unfortunately more complicated: Most DMA transfers are translated with the first DART. But sometimes (and I have not been able to figure out the exact conditions) transfers instead have to go through the second DART. This happens e.g. with one of my USB keyboards after a stop EP command is issued: Suddenly the xhci_ep_ctx struct must be translated through the second DART. What this likely means is that we'll need to point both DARTs to the same pagetables and just issue the TLB maintenance operations as a group. > > The Corellium folks use a DART + sub-DART model in their driver and a > single node in the device tree that represents both. That might sense > since the error registers and interrupts are shared. Maybe it would > make sense to select the appropriate sub-DART based on the DMA stream > ID? dwc3 specifically seems to require stream id #1 from the DART at <0x5 0x02f00000> and stream id #0 from the DART at <0x5 0x02f80000>. Both of these only share a IRQ line but are otherwise completely independent. Each has their own error registers, etc. and we need some way to specify these two DARTs + the appropriate stream ID. Essentially we have three options to represent this now: 1) Add both DARTs as separate regs, use #iommu-cells = <2> and have the first cell select the DART and the second one the stream ID. We could allow #iommu-cells = <1> in case only one reg is specified for the PCIe DART: usb_dart1@502f00000 { compatible = "apple,t8103-dart"; reg = <0x5 0x02f00000 0x0 0x4000>, <0x5 0x02f80000 0x0 0x4000>; #iommu-cells = <2>; ... }; usb1 { iommus = <&usb_dart1 0 1>, <&usb_dart1 1 0>; ... }; I prefer this option because we fully describe the DART in a single device node here. It also feels natural to group them like this because they need to share some properties (like dma-window and the interrupt) anyway. 2) Create two DART nodes which share the same IRQ line and attach them both to the master node: usb_dart1a@502f00000 { compatible = "apple,t8103-dart"; reg = <0x5 0x02f00000 0x0 0x4000>; #iommu-cells = <1>; ... }; usb_dart1b@502f80000 { compatible = "apple,t8103-dart"; reg = <0x5 0x02f80000 0x0 0x4000>; #iommu-cells = <1>; ... }; usb1 { iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; ... }; I dislike this one because attaching those two DARTs to a single device seems rather unusual. We'd also have to duplicate the dma-window setting, make sure it's the same for both DARTs and there are probably even more complications I can't think of right now. It seems like this would also make the device tree very verbose and the implementation itself more complicated. 3) Introduce another property and let the DART driver take care of mirroring the pagetables. I believe this would be similar to the sid-remap property: usb_dart1@502f00000 { compatible = "apple,t8103-dart"; reg = <0x5 0x02f00000 0x0 0x4000>, <0x5 0x02f80000 0x0 0x4000>; #iommu-cells = <1>; sid-remap = <0 1>; }; usb1 { iommus = <&usb_dart1 0>; }; I slightly dislike this one because we now specify which stream id to use in two places: Once in the device node and another time in the new property in the DART node. I also don't think the binding is much simpler than the first one. > > where #dma-address-cells and #dma-size-cells default to > > #address-cells and #size-cells respectively if I understand > > the code correctly. That way we could also just always use > > a 64bit address and size in the DT, e.g. > > > > pcie_dart { > > [ ... ] > > dma-window = <0 0x100000 0 0x3fe00000>; > > [ ... ] > > }; > > That sounds like a serious contender to me! Hopefully one of the > Linux kernel developers can give this some sort of blessing. > > I think it would make sense for us to just rely on the #address-cells > and #size-cells defaults for the M1 device tree. > Agreed. Best, Sven
> Date: Mon, 22 Mar 2021 23:17:31 +0100 > From: "Sven Peter" <sven@svenpeter.dev> > > Hi Mark, > > On Sun, Mar 21, 2021, at 19:35, Mark Kettenis wrote: > > > > Guess we do need to understand a little bit better how the USB DART > > actually works. My hypothesis (based on our discussion on #asahi) is > > that the XHCI host controller and the peripheral controller of the > > DWC3 block use different DMA "streams" that are handled by the > > different sub-DARTs. > > I've done some more experiments and the situation is unfortunately more > complicated: Most DMA transfers are translated with the first DART. > But sometimes (and I have not been able to figure out the exact conditions) > transfers instead have to go through the second DART. > This happens e.g. with one of my USB keyboards after a stop EP command > is issued: Suddenly the xhci_ep_ctx struct must be translated through the > second DART. > > What this likely means is that we'll need to point both DARTs > to the same pagetables and just issue the TLB maintenance operations > as a group. > > > > > The Corellium folks use a DART + sub-DART model in their driver and a > > single node in the device tree that represents both. That might sense > > since the error registers and interrupts are shared. Maybe it would > > make sense to select the appropriate sub-DART based on the DMA stream > > ID? > > dwc3 specifically seems to require stream id #1 from the DART > at <0x5 0x02f00000> and stream id #0 from the DART at <0x5 0x02f80000>. > Both of these only share a IRQ line but are otherwise completely independent. > Each has their own error registers, etc. and we need some way to > specify these two DARTs + the appropriate stream ID. > > Essentially we have three options to represent this now: > > 1) Add both DARTs as separate regs, use #iommu-cells = <2> and have the > first cell select the DART and the second one the stream ID. > We could allow #iommu-cells = <1> in case only one reg is specified > for the PCIe DART: > > usb_dart1@502f00000 { > compatible = "apple,t8103-dart"; > reg = <0x5 0x02f00000 0x0 0x4000>, <0x5 0x02f80000 0x0 0x4000>; > #iommu-cells = <2>; > ... > }; > > usb1 { > iommus = <&usb_dart1 0 1>, <&usb_dart1 1 0>; > ... > }; > > I prefer this option because we fully describe the DART in a single > device node here. It also feels natural to group them like this because > they need to share some properties (like dma-window and the interrupt) > anyway. > > 2) Create two DART nodes which share the same IRQ line and attach them > both to the master node: > > usb_dart1a@502f00000 { > compatible = "apple,t8103-dart"; > reg = <0x5 0x02f00000 0x0 0x4000>; > #iommu-cells = <1>; > ... > }; > usb_dart1b@502f80000 { > compatible = "apple,t8103-dart"; > reg = <0x5 0x02f80000 0x0 0x4000>; > #iommu-cells = <1>; > ... > }; > > usb1 { > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > ... > }; > > I dislike this one because attaching those two DARTs to a single device > seems rather unusual. We'd also have to duplicate the dma-window setting, > make sure it's the same for both DARTs and there are probably even more > complications I can't think of right now. It seems like this would also > make the device tree very verbose and the implementation itself more > complicated. > > 3) Introduce another property and let the DART driver take care of > mirroring the pagetables. I believe this would be similar to > the sid-remap property: > > usb_dart1@502f00000 { > compatible = "apple,t8103-dart"; > reg = <0x5 0x02f00000 0x0 0x4000>, <0x5 0x02f80000 0x0 0x4000>; > #iommu-cells = <1>; > sid-remap = <0 1>; > }; > usb1 { > iommus = <&usb_dart1 0>; > }; > > I slightly dislike this one because we now specify which stream id > to use in two places: Once in the device node and another time in the > new property in the DART node. I also don't think the binding is much > simpler than the first one. Hi Sven, The problem with both #1 and #2 is that you end up with two references to (effectively) different iommu's in the dwc3 device node. I don't see how that is compatible with the idea of using a single translation table for both sub-DARTs. If you no longer think that is desirable, you'll still have the problem that you'll need to modify the dwc3 driver code such that it uses the right IOMMU to do its DMA address translation. Given what you write above that sounds really ugly and confusing. I would certainly want to avoid doing that in OpenBSD. So I think #3 is the only realistic option here. In my opinion it is perfectly fine for the devicetree to present a workable model of the hardware instead of a 100% accurate model. > > > where #dma-address-cells and #dma-size-cells default to > > > #address-cells and #size-cells respectively if I understand > > > the code correctly. That way we could also just always use > > > a 64bit address and size in the DT, e.g. > > > > > > pcie_dart { > > > [ ... ] > > > dma-window = <0 0x100000 0 0x3fe00000>; > > > [ ... ] > > > }; > > > > That sounds like a serious contender to me! Hopefully one of the > > Linux kernel developers can give this some sort of blessing. > > > > I think it would make sense for us to just rely on the #address-cells > > and #size-cells defaults for the M1 device tree. > > > > Agreed. > > > Best, > > Sven >
On Sun, Mar 21, 2021 at 05:00:50PM +0100, Mark Kettenis wrote: > > Date: Sat, 20 Mar 2021 15:19:33 +0000 > > From: Sven Peter <sven@svenpeter.dev> > > > > Hi, > > > > After Hector's initial work [1] to bring up Linux on Apple's M1 it's time to > > bring up more devices. Most peripherals connected to the SoC are behind a iommu > > which Apple calls "Device Address Resolution Table", or DART for short [2]. > > Unfortunately, it only shares the name with PowerPC's DART. > > Configuring this iommu is mandatory if these peripherals require DMA access. > > > > This patchset implements initial support for this iommu. The hardware itself > > uses a pagetable format that's very similar to the one already implement in > > io-pgtable.c. There are some minor modifications, namely some details of the > > PTE format and that there are always three pagetable levels, which I've > > implement as a new format variant. > > > > I have mainly tested this with the USB controller in device mode which is > > compatible with Linux's dwc3 driver. Some custom PHY initialization (which is > > not yet ready or fully understood) is required though to bring up the ports, > > see e.g. my patches to our m1n1 bootloader [3,4]. If you want to test the same > > setup you will probably need that branch for now and add the nodes from > > the DT binding specification example to your device tree. > > > > Even though each DART instances could support up to 16 devices usually only > > a single device is actually connected. Different devices generally just use > > an entirely separate DART instance with a seperate MMIO range, IRQ, etc. > > > > I have just noticed today though that at least the USB DWC3 controller in host > > mode uses *two* darts at the same time. I'm not sure yet which parts seem to > > require which DART instance. > > > > This means that we might need to support devices attached to two iommus > > simultaneously and just create the same iova mappings. Currently this only > > seems to be required for USB according to Apple's Device Tree. > > > > I see two options for this and would like to get feedback before > > I implement either one: > > > > 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell > > to identify the DART and the second one to identify the master. > > The DART DT node would then also take two register ranges that would > > correspond to the two DARTs. Both instances use the same IRQ and the > > same clocks according to Apple's device tree and my experiments. > > This would keep a single device node and the DART driver would then > > simply map iovas in both DARTs if required. > > > > 2) Keep #iommu-cells as-is but support > > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > > instead. > > This would then require two devices nodes for the two DART instances and > > some housekeeping in the DART driver to support mapping iovas in both > > DARTs. > > I believe omap-iommu.c supports this setup but I will have to read > > more code to understand the details there and figure out how to implement > > this in a sane way. > > > > I currently prefer the first option but I don't understand enough details of > > the iommu system to actually make an informed decision. Please don't mix what does the h/w look like and what's easy to implement in Linux's IOMMU subsytem. It's pretty clear (at least from the description here) that option 2 reflects the h/w. > > I'm obviously also open to more options :-) > > Hi Sven, > > I don't think the first option is going to work for PCIe. PCIe > devices will have to use "iommu-map" properties to map PCI devices to > the right iommu, and the currently implementation seems to assume that > #iommu-cells = <1>. The devictree binding[1] doesn't explicitly state > that it relies on #iommu-cells = <1>, but it isn't clear how the > rid-base to iommu-base mapping mechanism would work when that isn't > the case. > > Now the PCIe DARTs are simpler and seem to have only one "instance" > per DART. So if we keep #iommu-cells = <1> for those, you'd still be > fine using the first approach. > > As I mentioned before, not all DARTs support the full 32-bit aperture. > In particular the PCIe DARTs support a smaller address-space. It is > not clear whether this is a restriction of the PCIe host controller or > the DART, but the Apple Device Tree has "vm-base" and "vm-size" > properties that encode the base address and size of the aperture. > These single-cell properties which is probably why for the USB DARTs > only "vm-base" is given; since "vm-base" is 0, a 32-bit number > wouldn't be able to encode the full aperture size. We could make them > 64-bit numbers in the Linux device tree though and always be explicit > about the size. Older Sun SPARC machines used a single "virtual-dma" > property to encode the aperture. We could do someting similar. You > would use this property to initialize domain->geometry.aperture_start > and domain->geometry.aperture_end in diff 3/3 of this series. 'dma-ranges' is what should be used here. Rob
Hi Mark, On Tue, Mar 23, 2021, at 21:00, Mark Kettenis wrote: > The problem with both #1 and #2 is that you end up with two references > to (effectively) different iommu's in the dwc3 device node. I don't > see how that is compatible with the idea of using a single translation > table for both sub-DARTs. I don't have a strong opinion about this fwiw. Option #1 and #2 seem to have precedence in the already existing iommu bindings though [1,2,3]. I just want to make sure we've thought through all options and understand their advantages/disadvantages before we take a decision. I've been reading some more Linux iommu code and here's how I understand it now / how I could implement at least #1 with a single shared pagetable. This might also be possible for option #2, but I'll need to think that through in more detail. An iommu domain is a collection of devices that share a virtual address space. During domain allocation I can just allocate a single DART pagetable and not have anything point to it for now. A stream identifies the smallest unit the iommu hardware can differentiate. For the DART we have 16 of these with a single TCR + the four TTBR register for each stream. A device is assigned to individual streams using the "iommus" property. When a device is attached to a domain we now simply setup the TTBR registers to point to the iommu domain pagetable. It doesn't matter here if it's a single stream or multiple ones or even multiple devices sharing a single stream as long as they're attached to the same domain. All operations (map, unmap, etc.) now simply first modify the domain pagetable and then issue the TLB maintenance operations for attached streams. > > If you no longer think that is desirable, you'll still have the > problem that you'll need to modify the dwc3 driver code such that it > uses the right IOMMU to do its DMA address translation. Given what > you write above that sounds really ugly and confusing. I would > certainly want to avoid doing that in OpenBSD. Yeah, I absolutely don't want to hack on the dwc3 code at all. That will end up being *very* ugly. Best, Sven [1] Documentation/devicetree/bindings/iommu/iommu.txt "Multiple-master IOMMU" [2] Documentation/devicetree/bindings/qcom,iommu.txt last example [3] Documentation/devicetree/bindings/arm,smmu.yaml first example
> Date: Tue, 23 Mar 2021 14:53:46 -0600 > From: Rob Herring <robh@kernel.org> > > On Sun, Mar 21, 2021 at 05:00:50PM +0100, Mark Kettenis wrote: > > > Date: Sat, 20 Mar 2021 15:19:33 +0000 > > > From: Sven Peter <sven@svenpeter.dev> > > > > > > Hi, > > > > > > After Hector's initial work [1] to bring up Linux on Apple's M1 it's time to > > > bring up more devices. Most peripherals connected to the SoC are behind a iommu > > > which Apple calls "Device Address Resolution Table", or DART for short [2]. > > > Unfortunately, it only shares the name with PowerPC's DART. > > > Configuring this iommu is mandatory if these peripherals require DMA access. > > > > > > This patchset implements initial support for this iommu. The hardware itself > > > uses a pagetable format that's very similar to the one already implement in > > > io-pgtable.c. There are some minor modifications, namely some details of the > > > PTE format and that there are always three pagetable levels, which I've > > > implement as a new format variant. > > > > > > I have mainly tested this with the USB controller in device mode which is > > > compatible with Linux's dwc3 driver. Some custom PHY initialization (which is > > > not yet ready or fully understood) is required though to bring up the ports, > > > see e.g. my patches to our m1n1 bootloader [3,4]. If you want to test the same > > > setup you will probably need that branch for now and add the nodes from > > > the DT binding specification example to your device tree. > > > > > > Even though each DART instances could support up to 16 devices usually only > > > a single device is actually connected. Different devices generally just use > > > an entirely separate DART instance with a seperate MMIO range, IRQ, etc. > > > > > > I have just noticed today though that at least the USB DWC3 controller in host > > > mode uses *two* darts at the same time. I'm not sure yet which parts seem to > > > require which DART instance. > > > > > > This means that we might need to support devices attached to two iommus > > > simultaneously and just create the same iova mappings. Currently this only > > > seems to be required for USB according to Apple's Device Tree. > > > > > > I see two options for this and would like to get feedback before > > > I implement either one: > > > > > > 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell > > > to identify the DART and the second one to identify the master. > > > The DART DT node would then also take two register ranges that would > > > correspond to the two DARTs. Both instances use the same IRQ and the > > > same clocks according to Apple's device tree and my experiments. > > > This would keep a single device node and the DART driver would then > > > simply map iovas in both DARTs if required. > > > > > > 2) Keep #iommu-cells as-is but support > > > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > > > instead. > > > This would then require two devices nodes for the two DART instances and > > > some housekeeping in the DART driver to support mapping iovas in both > > > DARTs. > > > I believe omap-iommu.c supports this setup but I will have to read > > > more code to understand the details there and figure out how to implement > > > this in a sane way. > > > > > > I currently prefer the first option but I don't understand enough details of > > > the iommu system to actually make an informed decision. > > Please don't mix what does the h/w look like and what's easy to > implement in Linux's IOMMU subsytem. It's pretty clear (at least > from the description here) that option 2 reflects the h/w. Apple does represent these as a single node in their device tree. The two instances share an interrupt and share power/clock gating. So they seem to be part of a single hardware block. > > > I'm obviously also open to more options :-) > > > > Hi Sven, > > > > I don't think the first option is going to work for PCIe. PCIe > > devices will have to use "iommu-map" properties to map PCI devices to > > the right iommu, and the currently implementation seems to assume that > > #iommu-cells = <1>. The devictree binding[1] doesn't explicitly state > > that it relies on #iommu-cells = <1>, but it isn't clear how the > > rid-base to iommu-base mapping mechanism would work when that isn't > > the case. > > > > Now the PCIe DARTs are simpler and seem to have only one "instance" > > per DART. So if we keep #iommu-cells = <1> for those, you'd still be > > fine using the first approach. > > > > As I mentioned before, not all DARTs support the full 32-bit aperture. > > In particular the PCIe DARTs support a smaller address-space. It is > > not clear whether this is a restriction of the PCIe host controller or > > the DART, but the Apple Device Tree has "vm-base" and "vm-size" > > properties that encode the base address and size of the aperture. > > These single-cell properties which is probably why for the USB DARTs > > only "vm-base" is given; since "vm-base" is 0, a 32-bit number > > wouldn't be able to encode the full aperture size. We could make them > > 64-bit numbers in the Linux device tree though and always be explicit > > about the size. Older Sun SPARC machines used a single "virtual-dma" > > property to encode the aperture. We could do someting similar. You > > would use this property to initialize domain->geometry.aperture_start > > and domain->geometry.aperture_end in diff 3/3 of this series. > > 'dma-ranges' is what should be used here. > > Rob >
On 2021-03-20 15:19, Sven Peter wrote: > Hi, > > After Hector's initial work [1] to bring up Linux on Apple's M1 it's time to > bring up more devices. Most peripherals connected to the SoC are behind a iommu > which Apple calls "Device Address Resolution Table", or DART for short [2]. > Unfortunately, it only shares the name with PowerPC's DART. > Configuring this iommu is mandatory if these peripherals require DMA access. > > This patchset implements initial support for this iommu. The hardware itself > uses a pagetable format that's very similar to the one already implement in > io-pgtable.c. There are some minor modifications, namely some details of the > PTE format and that there are always three pagetable levels, which I've > implement as a new format variant. > > I have mainly tested this with the USB controller in device mode which is > compatible with Linux's dwc3 driver. Some custom PHY initialization (which is > not yet ready or fully understood) is required though to bring up the ports, > see e.g. my patches to our m1n1 bootloader [3,4]. If you want to test the same > setup you will probably need that branch for now and add the nodes from > the DT binding specification example to your device tree. > > Even though each DART instances could support up to 16 devices usually only > a single device is actually connected. Different devices generally just use > an entirely separate DART instance with a seperate MMIO range, IRQ, etc. > > I have just noticed today though that at least the USB DWC3 controller in host > mode uses *two* darts at the same time. I'm not sure yet which parts seem to > require which DART instance. > > This means that we might need to support devices attached to two iommus > simultaneously and just create the same iova mappings. Currently this only > seems to be required for USB according to Apple's Device Tree. > > I see two options for this and would like to get feedback before > I implement either one: > > 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell > to identify the DART and the second one to identify the master. > The DART DT node would then also take two register ranges that would > correspond to the two DARTs. Both instances use the same IRQ and the > same clocks according to Apple's device tree and my experiments. > This would keep a single device node and the DART driver would then > simply map iovas in both DARTs if required. This is broadly similar to the approach used by rockchip-iommu and the special arm-smmu-nvidia implementation, where there are multiple instances which require programming identically, that are abstracted behind a single "device". Your case is a little different since you're not programming both *entirely* identically, although maybe that's a possibility if each respective ID isn't used by anything else on the "other" DART? Overall I tend to view this approach as a bit of a hack because it's not really describing the hardware truthfully - just because two distinct functional blocks have their IRQ lines wired together doesn't suddenly make them a single monolithic block with multiple interfaces - and tends to be done for the sake of making the driver implementation easier in terms of the Linux IOMMU API (which, note, hasn't evolved all that far from its PCI-centric origins and isn't exactly great for arbitrary SoC topologies). > 2) Keep #iommu-cells as-is but support > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > instead. > This would then require two devices nodes for the two DART instances and > some housekeeping in the DART driver to support mapping iovas in both > DARTs. > I believe omap-iommu.c supports this setup but I will have to read > more code to understand the details there and figure out how to implement > this in a sane way. This approach is arguably the most honest, and more robust in terms of making fewer assumptions, and is used by at least exynos-iommu and omap-iommu. In Linux it currently takes a little bit more housekeeping to keep track of linked instances within the driver since the IOMMU API holds the notion that any given client device is associated with "an IOMMU", but that's always free to change at any time, unlike the design of a DT binding. There's also the funky "root" and "leaf" devices thing that ipmmu-vmsa does, although I believe that represents genuine hardware differences where the leaves are more like extra TLBs rather than fully-functional IOMMU blocks in their own right, so that may not be a relevant model here. Robin. > I currently prefer the first option but I don't understand enough details of > the iommu system to actually make an informed decision. > I'm obviously also open to more options :-) > > > Best regards, > > > Sven > > [1] https://lore.kernel.org/linux-arch/20210304213902.83903-1-marcan@marcan.st/ > [2] https://developer.apple.com/library/archive/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/DataMgmt/DataMgmt.html > [3] https://github.com/svenpeter42/m1n1/commit/1e2661abf5ea2c820297b3ff591235c408d19a34 > [4] https://github.com/svenpeter42/m1n1/tree/usb-uartproxy-console-wip > > >
On Tue, Mar 23, 2021, at 21:53, Rob Herring wrote: > On Sun, Mar 21, 2021 at 05:00:50PM +0100, Mark Kettenis wrote: > > > Date: Sat, 20 Mar 2021 15:19:33 +0000 > > > From: Sven Peter <sven@svenpeter.dev> > > > I have just noticed today though that at least the USB DWC3 controller in host > > > mode uses *two* darts at the same time. I'm not sure yet which parts seem to > > > require which DART instance. > > > > > > This means that we might need to support devices attached to two iommus > > > simultaneously and just create the same iova mappings. Currently this only > > > seems to be required for USB according to Apple's Device Tree. > > > > > > I see two options for this and would like to get feedback before > > > I implement either one: > > > > > > 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell > > > to identify the DART and the second one to identify the master. > > > The DART DT node would then also take two register ranges that would > > > correspond to the two DARTs. Both instances use the same IRQ and the > > > same clocks according to Apple's device tree and my experiments. > > > This would keep a single device node and the DART driver would then > > > simply map iovas in both DARTs if required. > > > > > > 2) Keep #iommu-cells as-is but support > > > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > > > instead. > > > This would then require two devices nodes for the two DART instances and > > > some housekeeping in the DART driver to support mapping iovas in both > > > DARTs. > > > I believe omap-iommu.c supports this setup but I will have to read > > > more code to understand the details there and figure out how to implement > > > this in a sane way. > > > > > > I currently prefer the first option but I don't understand enough details of > > > the iommu system to actually make an informed decision. > > Please don't mix what does the h/w look like and what's easy to > implement in Linux's IOMMU subsytem. It's pretty clear (at least > from the description here) that option 2 reflects the h/w. > Good point, I'll keep that in mind and give option 2 a try. > > > > As I mentioned before, not all DARTs support the full 32-bit aperture. > > In particular the PCIe DARTs support a smaller address-space. It is > > not clear whether this is a restriction of the PCIe host controller or > > the DART, but the Apple Device Tree has "vm-base" and "vm-size" > > properties that encode the base address and size of the aperture. > > These single-cell properties which is probably why for the USB DARTs > > only "vm-base" is given; since "vm-base" is 0, a 32-bit number > > wouldn't be able to encode the full aperture size. We could make them > > 64-bit numbers in the Linux device tree though and always be explicit > > about the size. Older Sun SPARC machines used a single "virtual-dma" > > property to encode the aperture. We could do someting similar. You > > would use this property to initialize domain->geometry.aperture_start > > and domain->geometry.aperture_end in diff 3/3 of this series. > > 'dma-ranges' is what should be used here. > The iommu binding documentation [1] mentions that The device tree node of the IOMMU device's parent bus must contain a valid "dma-ranges" property that describes how the physical address space of the IOMMU maps to memory. An empty "dma-ranges" property means that there is a 1:1 mapping from IOMMU to memory. which, if I understand this correctly, means that the 'dma-ranges' for the parent bus of the iommu should be empty since the DART hardware can see the full physical address space with a 1:1 mapping. The documentation also mentions that When an "iommus" property is specified in a device tree node, the IOMMU will be used for address translation. If a "dma-ranges" property exists in the device's parent node it will be ignored. which means that specifying a 'dma-ranges' in the parent bus of any devices that use the iommu will just be ignored. As a concrete example, the PCIe DART IOMMU only allows translations from iovas within 0x00100000...0x3ff00000 to the entire physical address space (though realistically it will only map to 16GB RAM starting at 0x800000000 on the M1). I'm probably just confused or maybe the documentation is outdated but I don't see how I could specify "this device can only use DMA addresses from 0x00100000...0x3ff00000 but can map these via the iommu to any physical address" using 'dma-ranges'. Could you maybe point me to the right direction or give me a small example? That would help a lot! Thanks, Sven [1] Documentation/devicetree/bindings/iommu/iommu.txt
Hi Robin, On Wed, Mar 24, 2021, at 16:29, Robin Murphy wrote: > On 2021-03-20 15:19, Sven Peter wrote: > > > > I have just noticed today though that at least the USB DWC3 controller in host > > mode uses *two* darts at the same time. I'm not sure yet which parts seem to > > require which DART instance. > > > > This means that we might need to support devices attached to two iommus > > simultaneously and just create the same iova mappings. Currently this only > > seems to be required for USB according to Apple's Device Tree. > > > > I see two options for this and would like to get feedback before > > I implement either one: > > > > 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell > > to identify the DART and the second one to identify the master. > > The DART DT node would then also take two register ranges that would > > correspond to the two DARTs. Both instances use the same IRQ and the > > same clocks according to Apple's device tree and my experiments. > > This would keep a single device node and the DART driver would then > > simply map iovas in both DARTs if required. > > This is broadly similar to the approach used by rockchip-iommu and the > special arm-smmu-nvidia implementation, where there are multiple > instances which require programming identically, that are abstracted > behind a single "device". Your case is a little different since you're > not programming both *entirely* identically, although maybe that's a > possibility if each respective ID isn't used by anything else on the > "other" DART? That would be possible. The only difference is that I need to program ID 0 of the first DART and ID 1 of the second one. Both of these IDs are only connected to the same USB controller. > > Overall I tend to view this approach as a bit of a hack because it's not > really describing the hardware truthfully - just because two distinct > functional blocks have their IRQ lines wired together doesn't suddenly > make them a single monolithic block with multiple interfaces - and tends > to be done for the sake of making the driver implementation easier in > terms of the Linux IOMMU API (which, note, hasn't evolved all that far > from its PCI-centric origins and isn't exactly great for arbitrary SoC > topologies). Yes, the easier driver implementation was my reason to favour this option. > > > 2) Keep #iommu-cells as-is but support > > iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; > > instead. > > This would then require two devices nodes for the two DART instances and > > some housekeeping in the DART driver to support mapping iovas in both > > DARTs. > > I believe omap-iommu.c supports this setup but I will have to read > > more code to understand the details there and figure out how to implement > > this in a sane way. > > This approach is arguably the most honest, and more robust in terms of > making fewer assumptions, and is used by at least exynos-iommu and > omap-iommu. In Linux it currently takes a little bit more housekeeping > to keep track of linked instances within the driver since the IOMMU API > holds the notion that any given client device is associated with "an > IOMMU", but that's always free to change at any time, unlike the design > of a DT binding. Sounds good. I'll read those drivers and give it a try for v2. Thanks, Sven
On 2021-03-25 07:53, Sven Peter wrote: > > > On Tue, Mar 23, 2021, at 21:53, Rob Herring wrote: >> On Sun, Mar 21, 2021 at 05:00:50PM +0100, Mark Kettenis wrote: >>>> Date: Sat, 20 Mar 2021 15:19:33 +0000 >>>> From: Sven Peter <sven@svenpeter.dev> >>>> I have just noticed today though that at least the USB DWC3 controller in host >>>> mode uses *two* darts at the same time. I'm not sure yet which parts seem to >>>> require which DART instance. >>>> >>>> This means that we might need to support devices attached to two iommus >>>> simultaneously and just create the same iova mappings. Currently this only >>>> seems to be required for USB according to Apple's Device Tree. >>>> >>>> I see two options for this and would like to get feedback before >>>> I implement either one: >>>> >>>> 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell >>>> to identify the DART and the second one to identify the master. >>>> The DART DT node would then also take two register ranges that would >>>> correspond to the two DARTs. Both instances use the same IRQ and the >>>> same clocks according to Apple's device tree and my experiments. >>>> This would keep a single device node and the DART driver would then >>>> simply map iovas in both DARTs if required. >>>> >>>> 2) Keep #iommu-cells as-is but support >>>> iommus = <&usb_dart1a 1>, <&usb_dart1b 0>; >>>> instead. >>>> This would then require two devices nodes for the two DART instances and >>>> some housekeeping in the DART driver to support mapping iovas in both >>>> DARTs. >>>> I believe omap-iommu.c supports this setup but I will have to read >>>> more code to understand the details there and figure out how to implement >>>> this in a sane way. >>>> >>>> I currently prefer the first option but I don't understand enough details of >>>> the iommu system to actually make an informed decision. >> >> Please don't mix what does the h/w look like and what's easy to >> implement in Linux's IOMMU subsytem. It's pretty clear (at least >> from the description here) that option 2 reflects the h/w. >> > > Good point, I'll keep that in mind and give option 2 a try. > >>> >>> As I mentioned before, not all DARTs support the full 32-bit aperture. >>> In particular the PCIe DARTs support a smaller address-space. It is >>> not clear whether this is a restriction of the PCIe host controller or >>> the DART, but the Apple Device Tree has "vm-base" and "vm-size" >>> properties that encode the base address and size of the aperture. >>> These single-cell properties which is probably why for the USB DARTs >>> only "vm-base" is given; since "vm-base" is 0, a 32-bit number >>> wouldn't be able to encode the full aperture size. We could make them >>> 64-bit numbers in the Linux device tree though and always be explicit >>> about the size. Older Sun SPARC machines used a single "virtual-dma" >>> property to encode the aperture. We could do someting similar. You >>> would use this property to initialize domain->geometry.aperture_start >>> and domain->geometry.aperture_end in diff 3/3 of this series. >> >> 'dma-ranges' is what should be used here. >> > > The iommu binding documentation [1] mentions that > > The device tree node of the IOMMU device's parent bus must contain a valid > "dma-ranges" property that describes how the physical address space of the > IOMMU maps to memory. An empty "dma-ranges" property means that there is a > 1:1 mapping from IOMMU to memory. > > which, if I understand this correctly, means that the 'dma-ranges' for the > parent bus of the iommu should be empty since the DART hardware can see the > full physical address space with a 1:1 mapping. > > > The documentation also mentions that > > When an "iommus" property is specified in a device tree node, the IOMMU > will be used for address translation. If a "dma-ranges" property exists > in the device's parent node it will be ignored. > > which means that specifying a 'dma-ranges' in the parent bus of any devices > that use the iommu will just be ignored. I think that's just wrong and wants updating (or at least clarifying). The high-level view now is that we use "dma-ranges" to describe limitations imposed by a bridge or interconnect segment, and that can certainly happen upstream of an IOMMU. As it happens, I've just recently sent a patch for precisely that case[1]. I guess what it might have been trying to say is that "dma-ranges" *does* become irrelevant in terms of constraining what physical memory is usable for DMA, but that shouldn't imply that its meaning doesn't just shift to a different purpose. > As a concrete example, the PCIe DART IOMMU only allows translations from iovas > within 0x00100000...0x3ff00000 to the entire physical address space (though > realistically it will only map to 16GB RAM starting at 0x800000000 on the M1). > > I'm probably just confused or maybe the documentation is outdated but I don't > see how I could specify "this device can only use DMA addresses from > 0x00100000...0x3ff00000 but can map these via the iommu to any physical > address" using 'dma-ranges'. > > Could you maybe point me to the right direction or give me a small example? > That would help a lot! PCI is easy, since it's already standard practice to use "dma-ranges" to describe host bridge inbound windows. Even if the restriction is really out in the host-side interconnect rather than in the bridge itself, to all intents and purposes it's indistinguishable so can still be described the same way. The case of a standalone device having fewer address bits wired up than both its output and the corresponding IOMMU input might expect is a little more awkward, since that often *does* require adding an extra level of "bus" to explicitly represent that interconnect link in the DT model, e.g. [2]. Robin. [1] https://lore.kernel.org/linux-arm-kernel/720d0a9a42e33148fcac45cd39a727093a32bf32.1614965598.git.robin.murphy@arm.com/ [2] https://lore.kernel.org/linux-arm-kernel/20180926132247.10971-23-laurentiu.tudor@nxp.com/
On Thu, Mar 25, 2021, at 12:50, Robin Murphy wrote: > On 2021-03-25 07:53, Sven Peter wrote: > > > > > > On Tue, Mar 23, 2021, at 21:53, Rob Herring wrote: > >> On Sun, Mar 21, 2021 at 05:00:50PM +0100, Mark Kettenis wrote: > >>> > >>> As I mentioned before, not all DARTs support the full 32-bit aperture. > >>> In particular the PCIe DARTs support a smaller address-space. It is > >>> not clear whether this is a restriction of the PCIe host controller or > >>> the DART, but the Apple Device Tree has "vm-base" and "vm-size" > >>> properties that encode the base address and size of the aperture. > >>> These single-cell properties which is probably why for the USB DARTs > >>> only "vm-base" is given; since "vm-base" is 0, a 32-bit number > >>> wouldn't be able to encode the full aperture size. We could make them > >>> 64-bit numbers in the Linux device tree though and always be explicit > >>> about the size. Older Sun SPARC machines used a single "virtual-dma" > >>> property to encode the aperture. We could do someting similar. You > >>> would use this property to initialize domain->geometry.aperture_start > >>> and domain->geometry.aperture_end in diff 3/3 of this series. > >> > >> 'dma-ranges' is what should be used here. > >> > > > > The iommu binding documentation [1] mentions that > > > > The device tree node of the IOMMU device's parent bus must contain a valid > > "dma-ranges" property that describes how the physical address space of the > > IOMMU maps to memory. An empty "dma-ranges" property means that there is a > > 1:1 mapping from IOMMU to memory. > > > > which, if I understand this correctly, means that the 'dma-ranges' for the > > parent bus of the iommu should be empty since the DART hardware can see the > > full physical address space with a 1:1 mapping. > > > > > > The documentation also mentions that > > > > When an "iommus" property is specified in a device tree node, the IOMMU > > will be used for address translation. If a "dma-ranges" property exists > > in the device's parent node it will be ignored. > > > > which means that specifying a 'dma-ranges' in the parent bus of any devices > > that use the iommu will just be ignored. > > I think that's just wrong and wants updating (or at least clarifying). > The high-level view now is that we use "dma-ranges" to describe > limitations imposed by a bridge or interconnect segment, and that can > certainly happen upstream of an IOMMU. As it happens, I've just recently > sent a patch for precisely that case[1]. > > I guess what it might have been trying to say is that "dma-ranges" > *does* become irrelevant in terms of constraining what physical memory > is usable for DMA, but that shouldn't imply that its meaning doesn't > just shift to a different purpose. > Okay, now it makes sense then! > > As a concrete example, the PCIe DART IOMMU only allows translations from iovas > > within 0x00100000...0x3ff00000 to the entire physical address space (though > > realistically it will only map to 16GB RAM starting at 0x800000000 on the M1). > > > > I'm probably just confused or maybe the documentation is outdated but I don't > > see how I could specify "this device can only use DMA addresses from > > 0x00100000...0x3ff00000 but can map these via the iommu to any physical > > address" using 'dma-ranges'. > > > > Could you maybe point me to the right direction or give me a small example? > > That would help a lot! > > PCI is easy, since it's already standard practice to use "dma-ranges" to > describe host bridge inbound windows. Even if the restriction is really > out in the host-side interconnect rather than in the bridge itself, to > all intents and purposes it's indistinguishable so can still be > described the same way. > > The case of a standalone device having fewer address bits wired up than > both its output and the corresponding IOMMU input might expect is a > little more awkward, since that often *does* require adding an extra > level of "bus" to explicitly represent that interconnect link in the DT > model, e.g. [2]. > Nice, thanks! That's exactly what I was looking for :) Best, Sven
On Thu, Mar 25, 2021 at 8:53 AM Sven Peter <sven@svenpeter.dev> wrote: > On Tue, Mar 23, 2021, at 21:53, Rob Herring wrote: > > I'm probably just confused or maybe the documentation is outdated but I don't > see how I could specify "this device can only use DMA addresses from > 0x00100000...0x3ff00000 but can map these via the iommu to any physical > address" using 'dma-ranges'. It sounds like this is a holdover from the original powerpc iommu, which also had a limited set of virtual addresses in the iommu. I would think it's sufficient to describe it in the iommu itself, since the limitation is more "addresses coming into the iommu must be this range" than "this device must use that address range for talking to the iommu". If the addresses are allocated by the iommu driver, and each iommu only has one DMA master attached to it, having a simple range property in the iommu node should do the trick here. If there might be multiple devices on the same iommu but with different address ranges (which I don't think is the case), then it could be part of the reference to the iommu. Arnd
> From: Arnd Bergmann <arnd@kernel.org> > Date: Thu, 25 Mar 2021 22:41:09 +0100 > > On Thu, Mar 25, 2021 at 8:53 AM Sven Peter <sven@svenpeter.dev> wrote: > > On Tue, Mar 23, 2021, at 21:53, Rob Herring wrote: > > > > I'm probably just confused or maybe the documentation is outdated but I don't > > see how I could specify "this device can only use DMA addresses from > > 0x00100000...0x3ff00000 but can map these via the iommu to any physical > > address" using 'dma-ranges'. > > It sounds like this is a holdover from the original powerpc iommu, > which also had a limited set of virtual addresses in the iommu. > > I would think it's sufficient to describe it in the iommu itself, > since the limitation is more "addresses coming into the iommu must > be this range" than "this device must use that address range for > talking to the iommu". > > If the addresses are allocated by the iommu driver, and each iommu > only has one DMA master attached to it, having a simple range > property in the iommu node should do the trick here. If there might > be multiple devices on the same iommu but with different address > ranges (which I don't think is the case), then it could be part of > the reference to the iommu. The ADT has properties on the iommu node that describe the adresses it accepts for translation ("vm-base" and "vm-size"). So I think we can safely assume that the same limits apply to all DMA masters that are attached to it. We don't know if the range limit is baked into the silicon or whether it is related to how the firmware sets things up. Having the properties on the iommu node makes it easy for m1n1 to update the properties with the right values if necessary. Some of the DARTs provide a bypass facility. That code make using the standard "dma-ranges" property tricky. That property would need to contain the bypass address range. But that would mean that if the DART driver needs to look at that property to figure out the address range that supports translation it will need to be able to distinguish between the translatable address range and the bypass address range. Cheers, Mark
On Fri, Mar 26, 2021 at 4:59 PM Mark Kettenis <mark.kettenis@xs4all.nl> wrote: > > > From: Arnd Bergmann <arnd@kernel.org> > > Date: Thu, 25 Mar 2021 22:41:09 +0100 > > > > On Thu, Mar 25, 2021 at 8:53 AM Sven Peter <sven@svenpeter.dev> wrote: > > > On Tue, Mar 23, 2021, at 21:53, Rob Herring wrote: > > > > > > I'm probably just confused or maybe the documentation is outdated but I don't > > > see how I could specify "this device can only use DMA addresses from > > > 0x00100000...0x3ff00000 but can map these via the iommu to any physical > > > address" using 'dma-ranges'. > > > > It sounds like this is a holdover from the original powerpc iommu, > > which also had a limited set of virtual addresses in the iommu. > > > > I would think it's sufficient to describe it in the iommu itself, > > since the limitation is more "addresses coming into the iommu must > > be this range" than "this device must use that address range for > > talking to the iommu". > > > > If the addresses are allocated by the iommu driver, and each iommu > > only has one DMA master attached to it, having a simple range > > property in the iommu node should do the trick here. If there might > > be multiple devices on the same iommu but with different address > > ranges (which I don't think is the case), then it could be part of > > the reference to the iommu. > > The ADT has properties on the iommu node that describe the adresses it > accepts for translation ("vm-base" and "vm-size"). So I think we can > safely assume that the same limits apply to all DMA masters that are > attached to it. We don't know if the range limit is baked into the > silicon or whether it is related to how the firmware sets things up. > Having the properties on the iommu node makes it easy for m1n1 to > update the properties with the right values if necessary. Ok. > Some of the DARTs provide a bypass facility. That code make using the > standard "dma-ranges" property tricky. That property would need to > contain the bypass address range. But that would mean that if the > DART driver needs to look at that property to figure out the address > range that supports translation it will need to be able to distinguish > between the translatable address range and the bypass address range. I'm not following here. Do you mean the bus address used for bypass is different from the upstream address that gets programmed into the iommu when it is active? Arnd
On Fri, Mar 26, 2021, at 16:59, Mark Kettenis wrote: > Some of the DARTs provide a bypass facility. That code make using the > standard "dma-ranges" property tricky. That property would need to > contain the bypass address range. But that would mean that if the > DART driver needs to look at that property to figure out the address > range that supports translation it will need to be able to distinguish > between the translatable address range and the bypass address range. Do we understand if and why we even need to bypass certain streams? And do you have an example for a node in the ADT that contains this bypass range? I've only seen nodes with "bypass" and "bypass-adress" but that could just be some software abstraction Apple uses which doesn't map well to Linux or other OSes and might not even be required here. Sven
On Fri, Mar 26, 2021 at 5:10 PM Sven Peter <sven@svenpeter.dev> wrote: > On Fri, Mar 26, 2021, at 16:59, Mark Kettenis wrote: > > Some of the DARTs provide a bypass facility. That code make using the > > standard "dma-ranges" property tricky. That property would need to > > contain the bypass address range. But that would mean that if the > > DART driver needs to look at that property to figure out the address > > range that supports translation it will need to be able to distinguish > > between the translatable address range and the bypass address range. > > Do we understand if and why we even need to bypass certain streams? My guess is that this is a performance optimization. There are generally three reasons to want an iommu in the first place: - Pass a device down to a guest or user process without giving access to all of memory - Avoid problems with limitations in the device, typically when it only supports 32-bit bus addressing, but the installed memory is larger than 4GB - Protect kernel memory from broken drivers If you care about none of the above, but you do care about data transfer speed, you are better off just leaving the IOMMU in bypass mode. I don't think we have to support it if the IOMMU works reliably, but it's something that users might want. Arnd
On Fri, Mar 26, 2021, at 17:38, Arnd Bergmann wrote: > On Fri, Mar 26, 2021 at 5:10 PM Sven Peter <sven@svenpeter.dev> wrote: > > On Fri, Mar 26, 2021, at 16:59, Mark Kettenis wrote: > > > Some of the DARTs provide a bypass facility. That code make using the > > > standard "dma-ranges" property tricky. That property would need to > > > contain the bypass address range. But that would mean that if the > > > DART driver needs to look at that property to figure out the address > > > range that supports translation it will need to be able to distinguish > > > between the translatable address range and the bypass address range. > > > > Do we understand if and why we even need to bypass certain streams? > > My guess is that this is a performance optimization. Makes sense. > > There are generally three reasons to want an iommu in the first place: > - Pass a device down to a guest or user process without giving > access to all of memory > - Avoid problems with limitations in the device, typically when it > only supports > 32-bit bus addressing, but the installed memory is larger than 4GB > - Protect kernel memory from broken drivers > > If you care about none of the above, but you do care about data transfer > speed, you are better off just leaving the IOMMU in bypass mode. > I don't think we have to support it if the IOMMU works reliably, but it's > something that users might want. Right now the IOMMU works very reliably while bypass mode seems to be tricky at best. I think I partly know how to enable it but it looks like either not every DART or DART/master combination even supports it or that there is some additional configuration required to make it work reliably. I had it working with the USB DART at one point but I needed to enable it in all 16 streams of the IOMMU even though the pagetables only need to be setup in one stream as indicated by the ADT. I couldn't get it to work at all for the framebuffer IOMMU. I think it's fine to skip it for now until it's either actually required due to some hardware quirk or once we have users requesting support. Apple uses almost all IOMMUs without bypass mode if that ADT is to be believed though. Best, Sven
> From: Arnd Bergmann <arnd@kernel.org> > Date: Fri, 26 Mar 2021 17:38:24 +0100 > > On Fri, Mar 26, 2021 at 5:10 PM Sven Peter <sven@svenpeter.dev> wrote: > > On Fri, Mar 26, 2021, at 16:59, Mark Kettenis wrote: > > > Some of the DARTs provide a bypass facility. That code make using the > > > standard "dma-ranges" property tricky. That property would need to > > > contain the bypass address range. But that would mean that if the > > > DART driver needs to look at that property to figure out the address > > > range that supports translation it will need to be able to distinguish > > > between the translatable address range and the bypass address range. > > > > Do we understand if and why we even need to bypass certain streams? > > My guess is that this is a performance optimization. > > There are generally three reasons to want an iommu in the first place: > - Pass a device down to a guest or user process without giving > access to all of memory > - Avoid problems with limitations in the device, typically when it > only supports > 32-bit bus addressing, but the installed memory is larger than 4GB > - Protect kernel memory from broken drivers > > If you care about none of the above, but you do care about data transfer > speed, you are better off just leaving the IOMMU in bypass mode. > I don't think we have to support it if the IOMMU works reliably, but it's > something that users might want. Another reason might be that a device needs access to large amounts of physical memory at the same time and the 32-bit address space that the DART provides is too tight. In U-Boot I might want to use bypass where it works since there is no proper IOMMU support in U-Boot. Generally speaking, the goal is to keep the code size down in U-Boot. In OpenBSD I'll probably avoid bypass mode if I can. I haven't figured out how the bypass stuff really works. Corellium added support for it in their codebase when they added support for Thunderbolt, and some of the DARTs that seem to be related to Thunderbolt do indeed have a "bypass" property. But it is unclear to me how the different puzzle pieces fit together for Thunderbolt. Anyway, from my viewpoint having the information about the IOVA address space sit on the devices makes little sense. This information is needed by the DART driver, and there is no direct cnnection from the DART to the individual devices in the devicetree. The "iommus" property makes a connection in the opposite direction. Cheers, Mark
On 2021-03-26 17:26, Mark Kettenis wrote: >> From: Arnd Bergmann <arnd@kernel.org> >> Date: Fri, 26 Mar 2021 17:38:24 +0100 >> >> On Fri, Mar 26, 2021 at 5:10 PM Sven Peter <sven@svenpeter.dev> wrote: >>> On Fri, Mar 26, 2021, at 16:59, Mark Kettenis wrote: >>>> Some of the DARTs provide a bypass facility. That code make using the >>>> standard "dma-ranges" property tricky. That property would need to >>>> contain the bypass address range. But that would mean that if the >>>> DART driver needs to look at that property to figure out the address >>>> range that supports translation it will need to be able to distinguish >>>> between the translatable address range and the bypass address range. >>> >>> Do we understand if and why we even need to bypass certain streams? >> >> My guess is that this is a performance optimization. >> >> There are generally three reasons to want an iommu in the first place: >> - Pass a device down to a guest or user process without giving >> access to all of memory >> - Avoid problems with limitations in the device, typically when it >> only supports >> 32-bit bus addressing, but the installed memory is larger than 4GB >> - Protect kernel memory from broken drivers >> >> If you care about none of the above, but you do care about data transfer >> speed, you are better off just leaving the IOMMU in bypass mode. >> I don't think we have to support it if the IOMMU works reliably, but it's >> something that users might want. > > Another reason might be that a device needs access to large amounts of > physical memory at the same time and the 32-bit address space that the > DART provides is too tight. > > In U-Boot I might want to use bypass where it works since there is no > proper IOMMU support in U-Boot. Generally speaking, the goal is to > keep the code size down in U-Boot. In OpenBSD I'll probably avoid > bypass mode if I can. > > I haven't figured out how the bypass stuff really works. Corellium > added support for it in their codebase when they added support for > Thunderbolt, and some of the DARTs that seem to be related to > Thunderbolt do indeed have a "bypass" property. But it is unclear to > me how the different puzzle pieces fit together for Thunderbolt. > > Anyway, from my viewpoint having the information about the IOVA > address space sit on the devices makes little sense. This information > is needed by the DART driver, and there is no direct cnnection from > the DART to the individual devices in the devicetree. The "iommus" > property makes a connection in the opposite direction. What still seems unclear is whether these addressing limitations are a property of the DART input interface, the device output interface, or the interconnect between them. Although the observable end result appears more or less the same either way, they are conceptually different things which we have different abstractions to deal with. Robin.
On Fri, Mar 26, 2021, at 18:34, Robin Murphy wrote: > On 2021-03-26 17:26, Mark Kettenis wrote: > > > > Anyway, from my viewpoint having the information about the IOVA > > address space sit on the devices makes little sense. This information > > is needed by the DART driver, and there is no direct cnnection from > > the DART to the individual devices in the devicetree. The "iommus" > > property makes a connection in the opposite direction. > > What still seems unclear is whether these addressing limitations are a > property of the DART input interface, the device output interface, or > the interconnect between them. Although the observable end result > appears more or less the same either way, they are conceptually > different things which we have different abstractions to deal with. > > Robin. > I'm not really sure if there is any way for us to figure out where these limitation comes from though. I've done some more experiments and looked at all DART nodes in Apple's Device Tree though. It seems that most (if not all) masters only connect 32 address lines even though the iommu can handle a much larger address space. I'll therefore remove the code to handle the full space for v2 since it's essentially dead code that can't be tested anyway. There are some exceptions though: There are the PCIe DARTs which have a different limitation which could be encoded as 'dma-ranges' in the pci bus node: name base size dart-apcie1: 00100000 3fe00000 dart-apcie2: 00100000 3fe00000 dart-apcie0: 00100000 3fe00000 dart-apciec0: 00004000 7fffc000 dart-apciec1: 80000000 7fffc000 Then there are also these display controller DARTs. If we wanted to use dma-ranges we could just put them in a single sub bus: name base size dart-disp0: 00000000 fc000000 dart-dcp: 00000000 fc000000 dart-dispext0: 00000000 fc000000 dart-dcpext: 00000000 fc000000 And finally we have these strange ones which might eventually each require another awkward sub-bus if we want to stick to the dma-ranges property. name base size dart-aop: 00030000 ffffffff ("always-on processor") dart-pmp: 00000000 bff00000 (no idea yet) dart-sio: 0021c000 fbde4000 (at least their Secure Enclave/TPM co-processor) dart-ane: 00000000 e0000000 ("Neural Engine", their ML accelerator) For all we know these limitations could even arise for different reasons. (the secure enclave one looks like it might be imposed by the code running on there). Not really sure to proceed from here. I'll give the dma-ranges options a try for v2 and see how that one works out but that's not going to help us understand *why* these limitations exist. At least I won't have to change much code if we agree on a different abstraction :) The important ones for now are probably the USB and the PCIe ones. We'll need the display ones after that and can probably ignore the strange ones for quite a while. Best, Sven
On Fri, Mar 26, 2021 at 6:51 PM Sven Peter <sven@svenpeter.dev> wrote: > On Fri, Mar 26, 2021, at 18:34, Robin Murphy wrote: > > On 2021-03-26 17:26, Mark Kettenis wrote: > > > > > > Anyway, from my viewpoint having the information about the IOVA > > > address space sit on the devices makes little sense. This information > > > is needed by the DART driver, and there is no direct cnnection from > > > the DART to the individual devices in the devicetree. The "iommus" > > > property makes a connection in the opposite direction. > > > > What still seems unclear is whether these addressing limitations are a > > property of the DART input interface, the device output interface, or > > the interconnect between them. Although the observable end result > > appears more or less the same either way, they are conceptually > > different things which we have different abstractions to deal with. > > I'm not really sure if there is any way for us to figure out where these > limitation comes from though. My first guess was that this is done to partition the available address address space in a way that allows one physical IOTLB to handle multiple devices that each have their own page table for a subset of the address space, as was done on old PowerPC IOMMUs. However, the ranges you list don't really support that model. > I've done some more experiments and looked at all DART nodes in Apple's Device > Tree though. It seems that most (if not all) masters only connect 32 address > lines even though the iommu can handle a much larger address space. I'll therefore > remove the code to handle the full space for v2 since it's essentially dead > code that can't be tested anyway. > > > There are some exceptions though: > > There are the PCIe DARTs which have a different limitation which could be > encoded as 'dma-ranges' in the pci bus node: > > name base size > dart-apcie1: 00100000 3fe00000 > dart-apcie2: 00100000 3fe00000 > dart-apcie0: 00100000 3fe00000 > dart-apciec0: 00004000 7fffc000 > dart-apciec1: 80000000 7fffc000 This looks like they are reserving some address space in the beginning and/or the end, and for apciec0, the address space is partitioned into two equal-sized regions. > Then there are also these display controller DARTs. If we wanted to use dma-ranges > we could just put them in a single sub bus: > > name base size > dart-disp0: 00000000 fc000000 > dart-dcp: 00000000 fc000000 > dart-dispext0: 00000000 fc000000 > dart-dcpext: 00000000 fc000000 > > > And finally we have these strange ones which might eventually each require > another awkward sub-bus if we want to stick to the dma-ranges property. > > name base size > dart-aop: 00030000 ffffffff ("always-on processor") > dart-pmp: 00000000 bff00000 (no idea yet) Here I also see a "pio-vm-size" property: dart-pmp { pio-vm-base = <0xc0000000>; pio-vm-size = <0x40000000>; vm-size = <0xbff00000>; ... }; Which seems to give 3GB of address space to the normal iotlb, plus the last 1GB to something else. The vm-base property is also missing rather than zero, but that could just be part of their syntax instead of a real difference. Could it be that there are > dart-sio: 0021c000 fbde4000 (at least their Secure Enclave/TPM co-processor) Same here: dart-sio { vm-base = <0x0>; vm-size = <0xfc000000>; pio-vm-base = <0xfd000000>; pio-vm-size = <0x2000000>; pio-granularity = <0x1000000>; } There are clearly two distinct ranges that split up the 4GB space again, with a small hole of 16MB (==pio-granularity) at the end of each range. The "pio" name might indicate that this is a range of addresses that can be programmed to point at I/O registers in another device, rather than pointing to RAM. Arnd
On Fri, Mar 26, 2021 at 6:28 PM Mark Kettenis <mark.kettenis@xs4all.nl> wrote: > I haven't figured out how the bypass stuff really works. Corellium > added support for it in their codebase when they added support for > Thunderbolt, and some of the DARTs that seem to be related to > Thunderbolt do indeed have a "bypass" property. But it is unclear to > me how the different puzzle pieces fit together for Thunderbolt. As a general observation, bypass mode for Thunderbolt is what enabled the http://thunderclap.io/ attack. This is extremely useful for debugging a running kernel from another machine, but it's also something that should never be done in a production kernel. Arnd
> From: Arnd Bergmann <arnd@kernel.org> > Date: Fri, 26 Mar 2021 21:03:32 +0100 > > On Fri, Mar 26, 2021 at 6:28 PM Mark Kettenis <mark.kettenis@xs4all.nl> wrote: > > > I haven't figured out how the bypass stuff really works. Corellium > > added support for it in their codebase when they added support for > > Thunderbolt, and some of the DARTs that seem to be related to > > Thunderbolt do indeed have a "bypass" property. But it is unclear to > > me how the different puzzle pieces fit together for Thunderbolt. > > As a general observation, bypass mode for Thunderbolt is what enabled > the http://thunderclap.io/ attack. This is extremely useful for debugging > a running kernel from another machine, but it's also something that > should never be done in a production kernel. No kidding! I was surprised to see the bypass support on the Thunderbolt-related nodes.
> From: Arnd Bergmann <arnd@kernel.org> > Date: Fri, 26 Mar 2021 20:59:58 +0100 > > On Fri, Mar 26, 2021 at 6:51 PM Sven Peter <sven@svenpeter.dev> wrote: > > On Fri, Mar 26, 2021, at 18:34, Robin Murphy wrote: > > > On 2021-03-26 17:26, Mark Kettenis wrote: > > > > > > > > Anyway, from my viewpoint having the information about the IOVA > > > > address space sit on the devices makes little sense. This information > > > > is needed by the DART driver, and there is no direct cnnection from > > > > the DART to the individual devices in the devicetree. The "iommus" > > > > property makes a connection in the opposite direction. > > > > > > What still seems unclear is whether these addressing limitations are a > > > property of the DART input interface, the device output interface, or > > > the interconnect between them. Although the observable end result > > > appears more or less the same either way, they are conceptually > > > different things which we have different abstractions to deal with. > > > > I'm not really sure if there is any way for us to figure out where these > > limitation comes from though. > > My first guess was that this is done to partition the available address > address space in a way that allows one physical IOTLB to handle > multiple devices that each have their own page table for a subset > of the address space, as was done on old PowerPC IOMMUs. > However, the ranges you list don't really support that model. > > > I've done some more experiments and looked at all DART nodes in Apple's Device > > Tree though. It seems that most (if not all) masters only connect 32 address > > lines even though the iommu can handle a much larger address space. I'll therefore > > remove the code to handle the full space for v2 since it's essentially dead > > code that can't be tested anyway. > > > > > > There are some exceptions though: > > > > There are the PCIe DARTs which have a different limitation which could be > > encoded as 'dma-ranges' in the pci bus node: > > > > name base size > > dart-apcie1: 00100000 3fe00000 > > dart-apcie2: 00100000 3fe00000 > > dart-apcie0: 00100000 3fe00000 > > dart-apciec0: 00004000 7fffc000 > > dart-apciec1: 80000000 7fffc000 > > This looks like they are reserving some address space in the beginning > and/or the end, and for apciec0, the address space is partitioned into > two equal-sized regions. > > > Then there are also these display controller DARTs. If we wanted to use dma-ranges > > we could just put them in a single sub bus: > > > > name base size > > dart-disp0: 00000000 fc000000 > > dart-dcp: 00000000 fc000000 > > dart-dispext0: 00000000 fc000000 > > dart-dcpext: 00000000 fc000000 > > > > > > And finally we have these strange ones which might eventually each require > > another awkward sub-bus if we want to stick to the dma-ranges property. > > > > name base size > > dart-aop: 00030000 ffffffff ("always-on processor") > > dart-pmp: 00000000 bff00000 (no idea yet) > > Here I also see a "pio-vm-size" property: > > dart-pmp { > pio-vm-base = <0xc0000000>; > pio-vm-size = <0x40000000>; > vm-size = <0xbff00000>; > ... > }; > > Which seems to give 3GB of address space to the normal iotlb, > plus the last 1GB to something else. The vm-base property is also > missing rather than zero, but that could just be part of their syntax > instead of a real difference. Yes. It seems like "vm-base" is omitted when it is 0, and "vm-size" is omitted when the end of the window is at 0xffffffff. > > Could it be that there are > > > dart-sio: 0021c000 fbde4000 (at least their Secure Enclave/TPM co-processor) > > Same here: > dart-sio { > vm-base = <0x0>; > vm-size = <0xfc000000>; > pio-vm-base = <0xfd000000>; > pio-vm-size = <0x2000000>; > pio-granularity = <0x1000000>; > } > > There are clearly two distinct ranges that split up the 4GB space again, > with a small hole of 16MB (==pio-granularity) at the end of each range. > > The "pio" name might indicate that this is a range of addresses that > can be programmed to point at I/O registers in another device, rather > than pointing to RAM. > > Arnd >
On Fri, Mar 26, 2021, at 20:59, Arnd Bergmann wrote: > On Fri, Mar 26, 2021 at 6:51 PM Sven Peter <sven@svenpeter.dev> wrote: > > dart-sio: 0021c000 fbde4000 (at least their Secure Enclave/TPM co-processor) > > Same here: > dart-sio { > vm-base = <0x0>; > vm-size = <0xfc000000>; > pio-vm-base = <0xfd000000>; > pio-vm-size = <0x2000000>; > pio-granularity = <0x1000000>; > } > > There are clearly two distinct ranges that split up the 4GB space again, > with a small hole of 16MB (==pio-granularity) at the end of each range. > > The "pio" name might indicate that this is a range of addresses that > can be programmed to point at I/O registers in another device, rather > than pointing to RAM. > > Arnd > Very interesting observation! Mark and I have discussed this a little bit further on IRC. Mark also successfully used the PCIe DARTs with a DMA window outside of the one specified by vm-base/vm-size in the ADT. I believe that the (pio-)vm-base/size properties merely specify the ranges their allocator uses and do not describe actual hardware limitations. Mark also suggested that they might reserve memory at the beginning to find bugs similar to how one might not allow to map memory at 0x0. I have also done a few more experiments and figured out that if I put the IOMMU into bypass mode (which doesn't seem to work for all IOMMUs/master combinations which is why I'll leave it out of this series for now until I figure out more details) I *can* use the full address space. I think the limitation is therefore imposed by the translation hardware inside the IOMMU and not by the bus/the interconnect. If that's correct I think the right place to enforce this is to just limit the aperture inside the DART driver to a 32bit address space whenever address translation is enabled. Thanks, Sven
Hi Robin, On Thu, Mar 25, 2021, at 12:50, Robin Murphy wrote: > On 2021-03-25 07:53, Sven Peter wrote: > > The iommu binding documentation [1] mentions that > > > > The device tree node of the IOMMU device's parent bus must contain a valid > > "dma-ranges" property that describes how the physical address space of the > > IOMMU maps to memory. An empty "dma-ranges" property means that there is a > > 1:1 mapping from IOMMU to memory. > > > > which, if I understand this correctly, means that the 'dma-ranges' for the > > parent bus of the iommu should be empty since the DART hardware can see the > > full physical address space with a 1:1 mapping. > > > > > > The documentation also mentions that > > > > When an "iommus" property is specified in a device tree node, the IOMMU > > will be used for address translation. If a "dma-ranges" property exists > > in the device's parent node it will be ignored. > > > > which means that specifying a 'dma-ranges' in the parent bus of any devices > > that use the iommu will just be ignored. > > I think that's just wrong and wants updating (or at least clarifying). > The high-level view now is that we use "dma-ranges" to describe > limitations imposed by a bridge or interconnect segment, and that can > certainly happen upstream of an IOMMU. As it happens, I've just recently > sent a patch for precisely that case[1]. > > I guess what it might have been trying to say is that "dma-ranges" > *does* become irrelevant in terms of constraining what physical memory > is usable for DMA, but that shouldn't imply that its meaning doesn't > just shift to a different purpose. Should I add a patch to clarify this paragraph for v2 or submit a separate one-off patch? I'm not entirely sure about the process here but I could add a Suggested-by: to the commit if that's fine with you. Best, Sven