[00/11] of: dma-ranges fixes and improvements

Message ID: 20190927002455.13169-1-robh@kernel.org
From: Rob Herring <robh@kernel.org>
To: devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Cc: Florian Fainelli <f.fainelli@gmail.com>, Arnd Bergmann <arnd@arndb.de>, Frank Rowand <frowand.list@gmail.com>, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Marek Vasut <marek.vasut@gmail.com>, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>, Oza Pawandeep <oza.oza@broadcom.com>, Stefan Wahren <wahrenst@gmx.net>, Simon Horman <horms+renesas@verge.net.au>, Geert Uytterhoeven <geert+renesas@glider.be>, Robin Murphy <robin.murphy@arm.com>, Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Subject: [PATCH 00/11] of: dma-ranges fixes and improvements
Date: Thu, 26 Sep 2019 19:24:44 -0500
Series: of: dma-ranges fixes and improvements
  [00/11] of: dma-ranges fixes and improvements
  [01/11] of: Remove unused of_find_matching_node_by_address()
  [02/11] of: Make of_dma_get_range() private
  [03/11] of: address: Report of_dma_get_range() errors meaningfully
  [04/11] of/unittest: Add dma-ranges address translation tests
  [05/11] of: Ratify of_dma_configure() interface
  [06/11] of/address: Introduce of_get_next_dma_parent() helper
  [07/11] of: address: Follow DMA parent for "dma-coherent"
  [08/11] of: Factor out #{addr,size}-cells parsing
  [09/11] of: Make of_dma_get_range() work on bus nodes
  [10/11] of/address: Translate 'dma-ranges' for parent nodes missing 'dma-ranges'
  [11/11] of/address: Fix of_pci_range_parser_one translation of DMA addresses

Rob Herring Sept. 27, 2019, 12:24 a.m. UTC

This series fixes several issues related to 'dma-ranges'. Primarily,
'dma-ranges' in a PCI bridge node does not correctly set DMA masks for PCI
devices not described in the DT. A common case needing dma-ranges is a
32-bit PCIe bridge on a 64-bit system. This affects several platforms,
including Broadcom, NXP, Renesas, and Arm Juno. There have been several
attempts to fix these issues, most recently earlier this week[1].
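For a 32-bit PCIe bridge, the inbound window described by dma-ranges has to lie within the low 4 GiB of bus address space, and the DMA mask a device may use follows from the last bus address in that window. A minimal userspace sketch of that derivation, loosely modeled on the mask computation in of_dma_configure(); mask_from_dma_range() and limit_mask() are hypothetical names, and DMA_BIT_MASK() is redefined locally to match the kernel macro:

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the kernel's DMA_BIT_MASK(): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* floor(log2(v)) for non-zero v, i.e. the index of the highest set bit. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int bit = 0;

	assert(v != 0);
	while (v >>= 1)
		bit++;
	return bit;
}

/* Hypothetical: mask needed to reach the last bus address of one
 * dma-ranges window starting at dma_addr and spanning size bytes. */
static uint64_t mask_from_dma_range(uint64_t dma_addr, uint64_t size)
{
	uint64_t end;
	unsigned int bits;

	assert(size != 0);
	end = dma_addr + size - 1;
	bits = ilog2_u64(end) + 1;
	return DMA_BIT_MASK(bits);
}

/* Hypothetical: narrow (never widen) a device's default mask to the window. */
static uint64_t limit_mask(uint64_t dev_mask, uint64_t dma_addr, uint64_t size)
{
	return dev_mask & mask_from_dma_range(dma_addr, size);
}
```

For example, a window of size 1ULL << 32 at bus address 0 yields 0xffffffff, so a device that starts out with a 64-bit mask is narrowed to 32 bits.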

In the process, I found several bugs in the address translation. It
appears that things have only worked so far because various DTs happen
to use 1:1 addresses.

The first 3 patches are just clean-ups. The 4th patch adds a unittest
exhibiting the issues. Patches 5-9 rework of_dma_configure() so that it
works on either a struct device child node or a struct device_node parent
node, which makes it work on bus leaf nodes such as PCI bridges. Patches
10 and 11 fix 2 issues with address translation for dma-ranges.
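The distinction between a device node and a bus node matters because of where the applicable dma-ranges lives: for a device described in the DT, its parent bus's dma-ranges applies, while for a PCI host bridge the bridge's own dma-ranges applies to the undescribed devices below it. A simplified sketch of walking up the tree with that distinction; struct node and translate_to_root() are hypothetical, each node carries at most one entry, and a node without dma-ranges is treated as a 1:1 pass-through (an assumption for this sketch, in the spirit of patch 10):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DT node with at most one dma-ranges entry. */
struct node {
	const char *name;
	const struct node *parent;
	int has_dma_ranges;
	uint64_t child_addr;  /* dma-ranges child-bus-address */
	uint64_t parent_addr; /* dma-ranges parent-bus-address */
	uint64_t size;        /* dma-ranges length */
};

/*
 * Translate a DMA address up to the root.
 * is_bus != 0: np is a bus node (e.g. a PCI host bridge) and its own
 *              dma-ranges applies to the devices below it.
 * is_bus == 0: np is a device node, so translation starts at its parent.
 * Returns 0 and fills *out on success, -1 if addr misses a window.
 */
static int translate_to_root(const struct node *np, int is_bus,
			     uint64_t addr, uint64_t *out)
{
	const struct node *n;

	assert(np != NULL && out != NULL);
	for (n = is_bus ? np : np->parent; n != NULL; n = n->parent) {
		if (!n->has_dma_ranges)
			continue; /* assumption: no dma-ranges means 1:1 */
		if (addr < n->child_addr || addr - n->child_addr >= n->size)
			return -1;
		addr = n->parent_addr + (addr - n->child_addr);
	}
	*out = addr;
	return 0;
}
```

Treating the bridge as if it were a device (is_bus == 0) skips its own window entirely, which illustrates how starting from the wrong node can leave undescribed PCI devices without the bridge's constraints.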

My testing on this has been with the QEMU virt machine, hacked up to set
PCI dma-ranges, and with the unittest. Nicolas reports this series
resolves the issues on Rpi4 and NXP Layerscape platforms.

Rob

[1] https://lore.kernel.org/linux-arm-kernel/20190924181244.7159-1-nsaenzjulienne@suse.de/

Rob Herring (5):
  of: Remove unused of_find_matching_node_by_address()
  of: Make of_dma_get_range() private
  of/unittest: Add dma-ranges address translation tests
  of/address: Translate 'dma-ranges' for parent nodes missing 'dma-ranges'
  of/address: Fix of_pci_range_parser_one translation of DMA addresses

Robin Murphy (6):
  of: address: Report of_dma_get_range() errors meaningfully
  of: Ratify of_dma_configure() interface
  of/address: Introduce of_get_next_dma_parent() helper
  of: address: Follow DMA parent for "dma-coherent"
  of: Factor out #{addr,size}-cells parsing
  of: Make of_dma_get_range() work on bus nodes

 drivers/of/address.c                        | 83 +++++++++----------
 drivers/of/base.c                           | 32 ++++---
 drivers/of/device.c                         | 12 ++-
 drivers/of/of_private.h                     | 14 ++++
 drivers/of/unittest-data/testcases.dts      |  1 +
 drivers/of/unittest-data/tests-address.dtsi | 48 +++++++++++
 drivers/of/unittest.c                       | 92 +++++++++++++++++++++
 include/linux/of_address.h                  | 21 +----
 include/linux/of_device.h                   |  4 +-
 9 files changed, 227 insertions(+), 80 deletions(-)
 create mode 100644 drivers/of/unittest-data/tests-address.dtsi

--
2.20.1

Arnd Bergmann Sept. 29, 2019, 11:16 a.m. UTC | #1

On Fri, Sep 27, 2019 at 2:24 AM Rob Herring <robh@kernel.org> wrote:

I've only looked briefly, but this all seems reasonable. Adding Christoph
to Cc here to draw his attention to it as he's done a lot of reworks on
the dma-mapping interfaces recently.

On a semi-related note, Thierry asked about one aspect of the dma-ranges
property recently, which is the behavior of dma_set_mask() and related
functions when a driver sets a mask that is larger than the memory
area in the bus-ranges but smaller than the available physical RAM.
As I understood Thierry's problem and the current code, the generic
dma_set_mask() will either reject the new mask entirely or override
the mask set by of_dma_configure, but it fails to set a correct mask
within the limitations of the parent bus in this case.

We had discussed and proposed patches for this in the past, but
it seems that never got anywhere. Maybe now that a number of
people have looked at this logic, we can figure it out for good.

        Arnd

Christoph Hellwig Sept. 30, 2019, 8:20 a.m. UTC | #2

On Sun, Sep 29, 2019 at 01:16:20PM +0200, Arnd Bergmann wrote:
> On a semi-related note, Thierry asked about one aspect of the dma-ranges
> property recently, which is the behavior of dma_set_mask() and related
> functions when a driver sets a mask that is larger than the memory
> area in the bus-ranges but smaller than the available physical RAM.
> As I understood Thierry's problem and the current code, the generic
> dma_set_mask() will either reject the new mask entirely or override
> the mask set by of_dma_configure, but it fails to set a correct mask
> within the limitations of the parent bus in this case.

These days dma_set_mask will only reject a mask if it is too small
to be supported by the hardware.  Larger than required masks are now
always accepted.

Thierry Reding Sept. 30, 2019, 8:56 a.m. UTC | #3

On Mon, Sep 30, 2019 at 01:20:55AM -0700, Christoph Hellwig wrote:
> On Sun, Sep 29, 2019 at 01:16:20PM +0200, Arnd Bergmann wrote:
> > On a semi-related note, Thierry asked about one aspect of the dma-ranges
> > property recently, which is the behavior of dma_set_mask() and related
> > functions when a driver sets a mask that is larger than the memory
> > area in the bus-ranges but smaller than the available physical RAM.
> > As I understood Thierry's problem and the current code, the generic
> > dma_set_mask() will either reject the new mask entirely or override
> > the mask set by of_dma_configure, but it fails to set a correct mask
> > within the limitations of the parent bus in this case.
> 
> These days dma_set_mask will only reject a mask if it is too small
> to be supported by the hardware.  Larger than required masks are now
> always accepted.

Summarizing why this came up: the memory subsystem on Tegra194 has a
mechanism controlled by bit 39 of physical addresses. This is used to
support two variants of sector ordering for block linear formats. The
GPU uses a slightly different ordering than other MSS clients, so the
drivers have to set this bit depending on who they interoperate with.
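Since bit 39 acts as an ordering selector rather than ordinary address bits below it, a driver that wants the other ordering has to set or clear it explicitly. A small sketch, assuming (as the description implies) that a set bit selects the GPU's ordering and a clear bit the ordering used by the other MSS clients; the macro and helper names are hypothetical, not taken from the Tegra drivers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the MSS sector-ordering selector bit. */
#define MSS_ORDERING_BIT (1ULL << 39)

/* Return addr with the ordering bit set (GPU ordering, assumed) or
 * cleared (ordering used by the other MSS clients). */
static uint64_t mss_select_ordering(uint64_t addr, bool gpu_ordering)
{
	return gpu_ordering ? (addr | MSS_ORDERING_BIT)
			    : (addr & ~MSS_ORDERING_BIT);
}

/* True if an address would make the MSS use the alternate ordering. */
static bool mss_uses_gpu_ordering(uint64_t addr)
{
	return (addr & MSS_ORDERING_BIT) != 0;
}
```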

I was running into this while adding IOMMU support for the Ethernet
controller on Tegra194. The controller has a HW feature register that
reports how many address bits it supports. This is 40 for Tegra194,
corresponding to the number of address bits going into the MSS.
Without IOMMU support that's not a problem, because there are no systems
with 40 bits' worth of system memory, so physical addresses never set
bit 39. However, if we enable IOMMU support, the DMA/IOMMU code will
allocate from the top of a 48-bit input address space (constrained to
40 bits via the DMA mask). This causes bit 39 to be set, which in turn
makes the MSS reorder sectors and breaks network communications.
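The failure mode follows from the arithmetic of allocating from the top: with a 40-bit limit, the highest usable IOVA range necessarily has bit 39 set, while a 39-bit limit keeps it clear. A simplified sketch of a top-down allocator's first allocation; it is not the kernel's IOVA allocator, first_iova_top_down() is hypothetical, and it assumes power-of-two, size-aligned allocations:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: base of the first allocation of `size` bytes from a
 * top-down allocator whose input address space ends at aperture_end
 * (e.g. 2^48 - 1) and which is further limited by the device's DMA mask.
 */
static uint64_t first_iova_top_down(uint64_t aperture_end, uint64_t dma_mask,
				    uint64_t size)
{
	uint64_t limit = dma_mask < aperture_end ? dma_mask : aperture_end;

	assert(size != 0 && (size & (size - 1)) == 0); /* power of two */
	assert(size - 1 <= limit);
	/* Highest size-aligned base whose last byte is still <= limit. */
	return (limit - (size - 1)) & ~(size - 1);
}
```

With a 48-bit aperture and a 40-bit mask, a 4 KiB allocation lands at 0xfffffff000 (bit 39 set); with a 39-bit mask it lands at 0x7ffffff000 (bit 39 clear).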

Since this reordering takes place at the MSS level, this applies to all
MSS clients. Most of these clients always want bit 39 to be 0, whereas
the clients that can and want to make use of the reordering always want
bit 39 to be under their control, so they can control in a fine-grained
way when to set it.

This means that effectively all MSS clients can only address 39 bits, so
instead of hard-coding that for each driver I thought it'd make sense to
have a central place to configure this, so that all devices by default
are restricted to 39-bit addressing. However, with the current DMA API
implementation this causes a problem because the default 39-bit DMA mask
would get overwritten by the driver (as in the example of the Ethernet
controller setting a 40-bit DMA mask because that's what the hardware
supports).

I realize that this is somewhat exotic. On one hand it is correct for a
driver to say that the hardware supports 40-bit addressing (i.e. the
Ethernet controller can address bit 39), but from a system integration
point of view, using bit 39 is wrong, except in a very restricted set of
cases.

If I understand correctly, describing this with a dma-ranges property is
the right thing to do, but it wouldn't work with the current
implementation because drivers can still override a lower DMA mask with
a higher one.

Thierry

Nicolas Saenz Julienne Sept. 30, 2019, 9:20 a.m. UTC | #4

On Thu, 2019-09-26 at 19:24 -0500, Rob Herring wrote:

Re-tested the whole series. Verified both that the unittests run fine and
that PCIe's behaviour is fixed.

Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>

Also for the whole series:

Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>

Regards,
Nicolas

Robin Murphy Sept. 30, 2019, 9:55 a.m. UTC | #5

On 2019-09-30 9:56 am, Thierry Reding wrote:
> If I understand correctly, describing this with a dma-ranges property is
> the right thing to do, but it wouldn't work with the current
> implementation because drivers can still override a lower DMA mask with
> a higher one.

But that sounds like exactly the situation for which we introduced 
bus_dma_mask. If "dma-ranges" is found, then we should initialise that 
to reflect the limitation. Drivers may subsequently set a larger mask 
based on what the device is natively capable of, but the DMA API 
internals should quietly clamp that down to the bus mask wherever it 
matters.

Since that change, the initial value of dma_mask and coherent_dma_mask 
doesn't really matter much, as we expect drivers to reset them anyway 
(and in general they have to be able to enlarge them from a 32-bit 
default value).

As far as I'm aware this has been working fine (albeit in equivalent 
ACPI form) for at least one SoC with 64-bit device masks, a 48-bit 
IOMMU, and a 44-bit interconnect in between - indeed if I avoid 
distraction long enough to set up the big new box under my desk, the 
sending of future emails will depend on it ;)

Robin.

Marek Vasut Sept. 30, 2019, 12:40 p.m. UTC | #6

On 9/27/19 2:24 AM, Rob Herring wrote:

With the following patches applied:
      https://patchwork.ozlabs.org/patch/1144870/
      https://patchwork.ozlabs.org/patch/1144871/
on R8A7795 Salvator-XS
Tested-by: Marek Vasut <marek.vasut+renesas@gmail.com>

Robin Murphy Sept. 30, 2019, 12:52 p.m. UTC | #7

On 30/09/2019 13:40, Marek Vasut wrote:
> With the following patches applied:
>        https://patchwork.ozlabs.org/patch/1144870/
>        https://patchwork.ozlabs.org/patch/1144871/

Can you try it without those additional patches? This series aims to 
make the parsing work properly generically, such that we shouldn't need 
to add an additional PCI-specific version of almost the same code.

Robin.

Marek Vasut Sept. 30, 2019, 12:54 p.m. UTC | #8

On 9/30/19 2:52 PM, Robin Murphy wrote:
> On 30/09/2019 13:40, Marek Vasut wrote:
>> With the following patches applied:
>>        https://patchwork.ozlabs.org/patch/1144870/
>>        https://patchwork.ozlabs.org/patch/1144871/
> 
> Can you try it without those additional patches? This series aims to
> make the parsing work properly generically, such that we shouldn't need
> to add an additional PCI-specific version of almost the same code.

Seems to work even without those.

Robin Murphy Sept. 30, 2019, 1:05 p.m. UTC | #9

On 30/09/2019 13:54, Marek Vasut wrote:
> On 9/30/19 2:52 PM, Robin Murphy wrote:
>> Can you try it without those additional patches? This series aims to
>> make the parsing work properly generically, such that we shouldn't need
>> to add an additional PCI-specific version of almost the same code.
> 
> Seems to work even without those.

Great, thanks for confirming!

Robin.

Thierry Reding Sept. 30, 2019, 1:35 p.m. UTC | #10

On Mon, Sep 30, 2019 at 10:55:15AM +0100, Robin Murphy wrote:
> But that sounds like exactly the situation for which we introduced
> bus_dma_mask. If "dma-ranges" is found, then we should initialise that to
> reflect the limitation. Drivers may subsequently set a larger mask based on
> what the device is natively capable of, but the DMA API internals should
> quietly clamp that down to the bus mask wherever it matters.
> 
> Since that change, the initial value of dma_mask and coherent_dma_mask
> doesn't really matter much, as we expect drivers to reset them anyway (and
> in general they have to be able to enlarge them from a 32-bit default
> value).
> 
> As far as I'm aware this has been working fine (albeit in equivalent ACPI
> form) for at least one SoC with 64-bit device masks, a 48-bit IOMMU, and a
> 44-bit interconnect in between - indeed if I avoid distraction long enough
> to set up the big new box under my desk, the sending of future emails will
> depend on it ;)

After applying this series it does indeed seem to be working. The only
thing I had to do was add a dma-ranges property to the DMA parent. I
ended up doing that via an interconnects property, because the default
DMA parent on Tegra194 is /cbb, which has #address-cells = <1> and
#size-cells = <1>, so it can't actually translate anything beyond 32
bits of system memory.

So I basically ended up making the memory controller an interconnect
provider, setting #address-cells = <2> and #size-cells = <2> again, and
then using a dma-ranges property like this:

	dma-ranges = <0x0 0x0 0x0 0x80 0x0>;

to specify that only 39 bits should be used for addressing, leaving the
special bit 39 up to the driver to set as required.
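The numbers work out as follows: DT values wider than one cell are built from 32-bit cells, most significant first (the kernel's of_read_number() does this), so a length pair of <0x80 0x0> is 0x8000000000 (2^39) bytes, a window whose last address needs exactly 39 bits. The same arithmetic shows why a single address cell, as under /cbb, cannot express anything above 0xffffffff. A small sketch, assuming the trailing two cells of the property are the length with #size-cells = <2>; cells_to_u64() and addr_bits_for_window() are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine up to two 32-bit cells, most significant first, into a u64.
 * (The cells here are already in host byte order.) */
static uint64_t cells_to_u64(const uint32_t *cells, size_t n)
{
	uint64_t v = 0;
	size_t i;

	assert(cells != NULL && n >= 1 && n <= 2);
	for (i = 0; i < n; i++)
		v = (v << 32) | cells[i];
	return v;
}

/* Address bits needed to reach the last byte of [base, base + size). */
static unsigned int addr_bits_for_window(uint64_t base, uint64_t size)
{
	uint64_t end;
	unsigned int bits = 0;

	assert(size != 0);
	end = base + size - 1;
	while (end) {
		bits++;
		end >>= 1;
	}
	return bits;
}
```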

Coincidentally, making the memory controller an interconnect provider is
something that I was planning to do anyway in order to support memory
frequency scaling, so this all actually fits together pretty elegantly.

Thierry
