diff mbox series

[v2] dma-mapping: treat dev->bus_dma_mask as a DMA limit

Message ID 20191121092646.8449-1-nsaenzjulienne@suse.de (mailing list archive)
State Mainlined
Commit a7ba70f1787f977f970cd116076c6fce4b9e01cc
Headers show
Series [v2] dma-mapping: treat dev->bus_dma_mask as a DMA limit | expand

Commit Message

Nicolas Saenz Julienne Nov. 21, 2019, 9:26 a.m. UTC
Using a mask to represent bus DMA constraints has a set of limitations.
The biggest one being it can only hold a power of two (minus one). The
DMA mapping code is already aware of this and treats dev->bus_dma_mask
as a limit. This quirk is already used by some architectures although
still rare.

With the introduction of the Raspberry Pi 4 we've found a new contender
for the use of bus DMA limits, as its PCIe bus can only address the
lower 3GB of memory (of a total of 4GB). This is impossible to represent
with a mask. To make things worse the device-tree code rounds non power
of two bus DMA limits to the next power of two, which is unacceptable in
this case.

In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
over the tree and treat it as such. Note that dev->bus_dma_limit should
contain the higher accesible DMA address.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>

---

Changes since v1:
  - rework ACPI code to avoid divergence with OF's

 arch/mips/pci/fixup-sb1250.c  | 16 ++++++++--------
 arch/powerpc/sysdev/fsl_pci.c |  6 +++---
 arch/x86/kernel/pci-dma.c     |  2 +-
 arch/x86/mm/mem_encrypt.c     |  2 +-
 arch/x86/pci/sta2x11-fixup.c  |  2 +-
 drivers/acpi/arm64/iort.c     | 20 +++++++-------------
 drivers/ata/ahci.c            |  2 +-
 drivers/iommu/dma-iommu.c     |  3 +--
 drivers/of/device.c           |  9 +++++----
 include/linux/device.h        |  6 +++---
 include/linux/dma-direct.h    |  2 +-
 include/linux/dma-mapping.h   |  2 +-
 kernel/dma/direct.c           | 27 +++++++++++++--------------
 13 files changed, 46 insertions(+), 53 deletions(-)

Comments

Christoph Hellwig Nov. 21, 2019, 3:24 p.m. UTC | #1
On Thu, Nov 21, 2019 at 10:26:44AM +0100, Nicolas Saenz Julienne wrote:
> Using a mask to represent bus DMA constraints has a set of limitations.
> The biggest one being it can only hold a power of two (minus one). The
> DMA mapping code is already aware of this and treats dev->bus_dma_mask
> as a limit. This quirk is already used by some architectures although
> still rare.
> 
> With the introduction of the Raspberry Pi 4 we've found a new contender
> for the use of bus DMA limits, as its PCIe bus can only address the
> lower 3GB of memory (of a total of 4GB). This is impossible to represent
> with a mask. To make things worse the device-tree code rounds non power
> of two bus DMA limits to the next power of two, which is unacceptable in
> this case.
> 
> In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> over the tree and treat it as such. Note that dev->bus_dma_limit should
> contain the higher accesible DMA address.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>

I've tentatively added this patch to the dma-mapping tree based on
Robins principal approval of the last version.  That way tomorrows
linux-next run should still pick it up.
Nicolas Saenz Julienne Nov. 21, 2019, 3:26 p.m. UTC | #2
On Thu, 2019-11-21 at 16:24 +0100, Christoph Hellwig wrote:
> On Thu, Nov 21, 2019 at 10:26:44AM +0100, Nicolas Saenz Julienne wrote:
> > Using a mask to represent bus DMA constraints has a set of limitations.
> > The biggest one being it can only hold a power of two (minus one). The
> > DMA mapping code is already aware of this and treats dev->bus_dma_mask
> > as a limit. This quirk is already used by some architectures although
> > still rare.
> > 
> > With the introduction of the Raspberry Pi 4 we've found a new contender
> > for the use of bus DMA limits, as its PCIe bus can only address the
> > lower 3GB of memory (of a total of 4GB). This is impossible to represent
> > with a mask. To make things worse the device-tree code rounds non power
> > of two bus DMA limits to the next power of two, which is unacceptable in
> > this case.
> > 
> > In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> > over the tree and treat it as such. Note that dev->bus_dma_limit should
> > contain the higher accesible DMA address.
> > 
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> 
> I've tentatively added this patch to the dma-mapping tree based on
> Robins principal approval of the last version.  That way tomorrows
> linux-next run should still pick it up.

Thanks!
Christoph Hellwig Nov. 21, 2019, 3:26 p.m. UTC | #3
On Thu, Nov 21, 2019 at 04:24:57PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 21, 2019 at 10:26:44AM +0100, Nicolas Saenz Julienne wrote:
> > Using a mask to represent bus DMA constraints has a set of limitations.
> > The biggest one being it can only hold a power of two (minus one). The
> > DMA mapping code is already aware of this and treats dev->bus_dma_mask
> > as a limit. This quirk is already used by some architectures although
> > still rare.
> > 
> > With the introduction of the Raspberry Pi 4 we've found a new contender
> > for the use of bus DMA limits, as its PCIe bus can only address the
> > lower 3GB of memory (of a total of 4GB). This is impossible to represent
> > with a mask. To make things worse the device-tree code rounds non power
> > of two bus DMA limits to the next power of two, which is unacceptable in
> > this case.
> > 
> > In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> > over the tree and treat it as such. Note that dev->bus_dma_limit should
> > contain the higher accesible DMA address.
> > 
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> 
> I've tentatively added this patch to the dma-mapping tree based on
> Robins principal approval of the last version.  That way tomorrows
> linux-next run should still pick it up.

Actually.  This doesn't apply because the dma-mapping tree doesn't
have you zone_dma_bits change.  I guess we'll need to wait for the
next merge window, or maybe post rc1 if this happens to fix the
powerpc problem that Christian reported.
Robin Murphy Nov. 21, 2019, 3:55 p.m. UTC | #4
On 21/11/2019 3:26 pm, Christoph Hellwig wrote:
> On Thu, Nov 21, 2019 at 04:24:57PM +0100, Christoph Hellwig wrote:
>> On Thu, Nov 21, 2019 at 10:26:44AM +0100, Nicolas Saenz Julienne wrote:
>>> Using a mask to represent bus DMA constraints has a set of limitations.
>>> The biggest one being it can only hold a power of two (minus one). The
>>> DMA mapping code is already aware of this and treats dev->bus_dma_mask
>>> as a limit. This quirk is already used by some architectures although
>>> still rare.
>>>
>>> With the introduction of the Raspberry Pi 4 we've found a new contender
>>> for the use of bus DMA limits, as its PCIe bus can only address the
>>> lower 3GB of memory (of a total of 4GB). This is impossible to represent
>>> with a mask. To make things worse the device-tree code rounds non power
>>> of two bus DMA limits to the next power of two, which is unacceptable in
>>> this case.
>>>
>>> In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
>>> over the tree and treat it as such. Note that dev->bus_dma_limit should
>>> contain the higher accesible DMA address.
>>>
>>> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
>>
>> I've tentatively added this patch to the dma-mapping tree based on
>> Robins principal approval of the last version.  That way tomorrows
>> linux-next run should still pick it up.
> 
> Actually.  This doesn't apply because the dma-mapping tree doesn't
> have you zone_dma_bits change.  I guess we'll need to wait for the
> next merge window, or maybe post rc1 if this happens to fix the
> powerpc problem that Christian reported.

Hmm, there's no functional dependency though, is there? AFAICS it's 
essentially just a context conflict. Is it worth simply dropping (or 
postponing) the local renaming in __dma_direct_optimal_gfp_mask(), or 
perhaps even cross-merging arm64/for-next/zone-dma into dma/for-next?

Robin.
Christoph Hellwig Nov. 21, 2019, 4:02 p.m. UTC | #5
On Thu, Nov 21, 2019 at 03:55:39PM +0000, Robin Murphy wrote:
> Hmm, there's no functional dependency though, is there? AFAICS it's 
> essentially just a context conflict. Is it worth simply dropping (or 
> postponing) the local renaming in __dma_direct_optimal_gfp_mask(), or 
> perhaps even cross-merging arm64/for-next/zone-dma into dma/for-next?

I would have no problem with pulling it in.  I'd kinda hate creating
the conflict, though.  So if the arm64 maintainers are fine with it
I'll pull it in, especially if I get an ACK from Robin.
Will Deacon Nov. 21, 2019, 4:53 p.m. UTC | #6
On Thu, Nov 21, 2019 at 05:02:17PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 21, 2019 at 03:55:39PM +0000, Robin Murphy wrote:
> > Hmm, there's no functional dependency though, is there? AFAICS it's 
> > essentially just a context conflict. Is it worth simply dropping (or 
> > postponing) the local renaming in __dma_direct_optimal_gfp_mask(), or 
> > perhaps even cross-merging arm64/for-next/zone-dma into dma/for-next?
> 
> I would have no problem with pulling it in.  I'd kinda hate creating
> the conflict, though.  So if the arm64 maintainers are fine with it
> I'll pull it in, especially if I get an ACK from Robin.

Please go ahead and pull in our for-next/zone-dma branch if you need it.
We're not going to rebase it, and I suspect we won't even be queueing
anything else there at this stage in the game.

Cheers,

Will
Robin Murphy Nov. 21, 2019, 5:07 p.m. UTC | #7
On 21/11/2019 9:26 am, Nicolas Saenz Julienne wrote:
> Using a mask to represent bus DMA constraints has a set of limitations.
> The biggest one being it can only hold a power of two (minus one). The
> DMA mapping code is already aware of this and treats dev->bus_dma_mask
> as a limit. This quirk is already used by some architectures although
> still rare.
> 
> With the introduction of the Raspberry Pi 4 we've found a new contender
> for the use of bus DMA limits, as its PCIe bus can only address the
> lower 3GB of memory (of a total of 4GB). This is impossible to represent
> with a mask. To make things worse the device-tree code rounds non power
> of two bus DMA limits to the next power of two, which is unacceptable in
> this case.
> 
> In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> over the tree and treat it as such. Note that dev->bus_dma_limit should
> contain the higher accesible DMA address.

^^ super-nit only because I can't not see my editor currently 
highlighting the typo: "accessible"

Regardless of that though,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> 
> ---
> 
> Changes since v1:
>    - rework ACPI code to avoid divergence with OF's
> 
>   arch/mips/pci/fixup-sb1250.c  | 16 ++++++++--------
>   arch/powerpc/sysdev/fsl_pci.c |  6 +++---
>   arch/x86/kernel/pci-dma.c     |  2 +-
>   arch/x86/mm/mem_encrypt.c     |  2 +-
>   arch/x86/pci/sta2x11-fixup.c  |  2 +-
>   drivers/acpi/arm64/iort.c     | 20 +++++++-------------
>   drivers/ata/ahci.c            |  2 +-
>   drivers/iommu/dma-iommu.c     |  3 +--
>   drivers/of/device.c           |  9 +++++----
>   include/linux/device.h        |  6 +++---
>   include/linux/dma-direct.h    |  2 +-
>   include/linux/dma-mapping.h   |  2 +-
>   kernel/dma/direct.c           | 27 +++++++++++++--------------
>   13 files changed, 46 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/mips/pci/fixup-sb1250.c b/arch/mips/pci/fixup-sb1250.c
> index 8a41b359cf90..40efc990cdce 100644
> --- a/arch/mips/pci/fixup-sb1250.c
> +++ b/arch/mips/pci/fixup-sb1250.c
> @@ -21,22 +21,22 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SIBYTE, PCI_DEVICE_ID_BCM1250_PCI,
>   
>   /*
>    * The BCM1250, etc. PCI host bridge does not support DAC on its 32-bit
> - * bus, so we set the bus's DMA mask accordingly.  However the HT link
> + * bus, so we set the bus's DMA limit accordingly.  However the HT link
>    * down the artificial PCI-HT bridge supports 40-bit addressing and the
>    * SP1011 HT-PCI bridge downstream supports both DAC and a 64-bit bus
>    * width, so we record the PCI-HT bridge's secondary and subordinate bus
> - * numbers and do not set the mask for devices present in the inclusive
> + * numbers and do not set the limit for devices present in the inclusive
>    * range of those.
>    */
> -struct sb1250_bus_dma_mask_exclude {
> +struct sb1250_bus_dma_limit_exclude {
>   	bool set;
>   	unsigned char start;
>   	unsigned char end;
>   };
>   
> -static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
> +static int sb1250_bus_dma_limit(struct pci_dev *dev, void *data)
>   {
> -	struct sb1250_bus_dma_mask_exclude *exclude = data;
> +	struct sb1250_bus_dma_limit_exclude *exclude = data;
>   	bool exclude_this;
>   	bool ht_bridge;
>   
> @@ -55,7 +55,7 @@ static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
>   			exclude->start, exclude->end);
>   	} else {
>   		dev_dbg(&dev->dev, "disabling DAC for device");
> -		dev->dev.bus_dma_mask = DMA_BIT_MASK(32);
> +		dev->dev.bus_dma_limit = DMA_BIT_MASK(32);
>   	}
>   
>   	return 0;
> @@ -63,9 +63,9 @@ static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
>   
>   static void quirk_sb1250_pci_dac(struct pci_dev *dev)
>   {
> -	struct sb1250_bus_dma_mask_exclude exclude = { .set = false };
> +	struct sb1250_bus_dma_limit_exclude exclude = { .set = false };
>   
> -	pci_walk_bus(dev->bus, sb1250_bus_dma_mask, &exclude);
> +	pci_walk_bus(dev->bus, sb1250_bus_dma_limit, &exclude);
>   }
>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SIBYTE, PCI_DEVICE_ID_BCM1250_PCI,
>   			quirk_sb1250_pci_dac);
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index ff0e2b156cb5..617a443d673d 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -115,8 +115,8 @@ static void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
>   {
>   	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>   
> -	pdev->dev.bus_dma_mask =
> -		hose->dma_window_base_cur + hose->dma_window_size;
> +	pdev->dev.bus_dma_limit =
> +		hose->dma_window_base_cur + hose->dma_window_size - 1;
>   }
>   
>   static void setup_swiotlb_ops(struct pci_controller *hose)
> @@ -135,7 +135,7 @@ static void fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
>   	 * mapping that allows addressing any RAM address from across PCI.
>   	 */
>   	if (dev_is_pci(dev) && dma_mask >= pci64_dma_offset * 2 - 1) {
> -		dev->bus_dma_mask = 0;
> +		dev->bus_dma_limit = 0;
>   		dev->archdata.dma_offset = pci64_dma_offset;
>   	}
>   }
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 57de2ebff7e2..5dcedad21dff 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -140,7 +140,7 @@ rootfs_initcall(pci_iommu_init);
>   
>   static int via_no_dac_cb(struct pci_dev *pdev, void *data)
>   {
> -	pdev->dev.bus_dma_mask = DMA_BIT_MASK(32);
> +	pdev->dev.bus_dma_limit = DMA_BIT_MASK(32);
>   	return 0;
>   }
>   
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 9268c12458c8..a03614bd3e1a 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -367,7 +367,7 @@ bool force_dma_unencrypted(struct device *dev)
>   	if (sme_active()) {
>   		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>   		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
> -						dev->bus_dma_mask);
> +						dev->bus_dma_limit);
>   
>   		if (dma_dev_mask <= dma_enc_mask)
>   			return true;
> diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
> index 4a631264b809..c313d784efab 100644
> --- a/arch/x86/pci/sta2x11-fixup.c
> +++ b/arch/x86/pci/sta2x11-fixup.c
> @@ -143,7 +143,7 @@ static void sta2x11_map_ep(struct pci_dev *pdev)
>   
>   	dev->dma_pfn_offset = PFN_DOWN(-amba_base);
>   
> -	dev->bus_dma_mask = max_amba_addr;
> +	dev->bus_dma_limit = max_amba_addr;
>   	pci_set_consistent_dma_mask(pdev, max_amba_addr);
>   	pci_set_dma_mask(pdev, max_amba_addr);
>   
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 5a7551d060f2..33f71983e001 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1057,8 +1057,8 @@ static int rc_dma_get_range(struct device *dev, u64 *size)
>    */
>   void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
>   {
> -	u64 mask, dmaaddr = 0, size = 0, offset = 0;
> -	int ret, msb;
> +	u64 end, mask, dmaaddr = 0, size = 0, offset = 0;
> +	int ret;
>   
>   	/*
>   	 * If @dev is expected to be DMA-capable then the bus code that created
> @@ -1085,19 +1085,13 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
>   	}
>   
>   	if (!ret) {
> -		msb = fls64(dmaaddr + size - 1);
>   		/*
> -		 * Round-up to the power-of-two mask or set
> -		 * the mask to the whole 64-bit address space
> -		 * in case the DMA region covers the full
> -		 * memory window.
> +		 * Limit coherent and dma mask based on size retrieved from
> +		 * firmware.
>   		 */
> -		mask = msb == 64 ? U64_MAX : (1ULL << msb) - 1;
> -		/*
> -		 * Limit coherent and dma mask based on size
> -		 * retrieved from firmware.
> -		 */
> -		dev->bus_dma_mask = mask;
> +		end = dmaaddr + size - 1;
> +		mask = DMA_BIT_MASK(ilog2(end) + 1);
> +		dev->bus_dma_limit = end;
>   		dev->coherent_dma_mask = mask;
>   		*dev->dma_mask = mask;
>   	}
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index ec6c64fce74a..4bfd1b14b390 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -910,7 +910,7 @@ static int ahci_configure_dma_masks(struct pci_dev *pdev, int using_dac)
>   	 * value, don't extend it here. This happens on STA2X11, for example.
>   	 *
>   	 * XXX: manipulating the DMA mask from platform code is completely
> -	 * bogus, platform code should use dev->bus_dma_mask instead..
> +	 * bogus, platform code should use dev->bus_dma_limit instead..
>   	 */
>   	if (pdev->dma_mask && pdev->dma_mask < DMA_BIT_MASK(32))
>   		return 0;
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 041066f3ec99..0cc702a70a96 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -421,8 +421,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
>   	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>   		iova_len = roundup_pow_of_two(iova_len);
>   
> -	if (dev->bus_dma_mask)
> -		dma_limit &= dev->bus_dma_mask;
> +	dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>   
>   	if (domain->geometry.force_aperture)
>   		dma_limit = min(dma_limit, domain->geometry.aperture_end);
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index da8158392010..e9127db7b067 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -93,7 +93,7 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
>   	bool coherent;
>   	unsigned long offset;
>   	const struct iommu_ops *iommu;
> -	u64 mask;
> +	u64 mask, end;
>   
>   	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
>   	if (ret < 0) {
> @@ -148,12 +148,13 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
>   	 * Limit coherent and dma mask based on size and default mask
>   	 * set by the driver.
>   	 */
> -	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> +	end = dma_addr + size - 1;
> +	mask = DMA_BIT_MASK(ilog2(end) + 1);
>   	dev->coherent_dma_mask &= mask;
>   	*dev->dma_mask &= mask;
> -	/* ...but only set bus mask if we found valid dma-ranges earlier */
> +	/* ...but only set bus limit if we found valid dma-ranges earlier */
>   	if (!ret)
> -		dev->bus_dma_mask = mask;
> +		dev->bus_dma_limit = end;
>   
>   	coherent = of_dma_is_coherent(np);
>   	dev_dbg(dev, "device is%sdma coherent\n",
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 99af366db50d..dada6b4bd7e0 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1214,8 +1214,8 @@ struct dev_links_info {
>    * @coherent_dma_mask: Like dma_mask, but for alloc_coherent mapping as not all
>    * 		hardware supports 64-bit addresses for consistent allocations
>    * 		such descriptors.
> - * @bus_dma_mask: Mask of an upstream bridge or bus which imposes a smaller DMA
> - *		limit than the device itself supports.
> + * @bus_dma_limit: Limit of an upstream bridge or bus which imposes a smaller
> + *		DMA limit than the device itself supports.
>    * @dma_pfn_offset: offset of DMA memory range relatively of RAM
>    * @dma_parms:	A low level driver may set these to teach IOMMU code about
>    * 		segment limitations.
> @@ -1301,7 +1301,7 @@ struct device {
>   					     not all hardware supports
>   					     64 bit addresses for consistent
>   					     allocations such descriptors. */
> -	u64		bus_dma_mask;	/* upstream dma_mask constraint */
> +	u64		bus_dma_limit;	/* upstream dma constraint */
>   	unsigned long	dma_pfn_offset;
>   
>   	struct device_dma_parameters *dma_parms;
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index c5c5b5ff2371..aa031cb213c3 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -62,7 +62,7 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
>   	    min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
>   		return false;
>   
> -	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
> +	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
>   }
>   
>   u64 dma_direct_get_required_mask(struct device *dev);
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index a4930310d0c7..330ad58fbf4d 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -694,7 +694,7 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
>    */
>   static inline bool dma_addressing_limited(struct device *dev)
>   {
> -	return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
> +	return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
>   			    dma_get_required_mask(dev);
>   }
>   
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 84cf8be65078..998d48ddfb07 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -27,10 +27,10 @@ static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
>   {
>   	if (!dev->dma_mask) {
>   		dev_err_once(dev, "DMA map on device without dma_mask\n");
> -	} else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
> +	} else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_limit) {
>   		dev_err_once(dev,
> -			"overflow %pad+%zu of DMA mask %llx bus mask %llx\n",
> -			&dma_addr, size, *dev->dma_mask, dev->bus_dma_mask);
> +			"overflow %pad+%zu of DMA mask %llx bus limit %llx\n",
> +			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
>   	}
>   	WARN_ON_ONCE(1);
>   }
> @@ -57,15 +57,14 @@ u64 dma_direct_get_required_mask(struct device *dev)
>   }
>   
>   static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
> -		u64 *phys_mask)
> +		u64 *phys_limit)
>   {
> -	if (dev->bus_dma_mask && dev->bus_dma_mask < dma_mask)
> -		dma_mask = dev->bus_dma_mask;
> +	u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
>   
>   	if (force_dma_unencrypted(dev))
> -		*phys_mask = __dma_to_phys(dev, dma_mask);
> +		*phys_limit = __dma_to_phys(dev, dma_limit);
>   	else
> -		*phys_mask = dma_to_phys(dev, dma_mask);
> +		*phys_limit = dma_to_phys(dev, dma_limit);
>   
>   	/*
>   	 * Optimistically try the zone that the physical address mask falls
> @@ -75,9 +74,9 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
>   	 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
>   	 * zones.
>   	 */
> -	if (*phys_mask <= DMA_BIT_MASK(zone_dma_bits))
> +	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
>   		return GFP_DMA;
> -	if (*phys_mask <= DMA_BIT_MASK(32))
> +	if (*phys_limit <= DMA_BIT_MASK(32))
>   		return GFP_DMA32;
>   	return 0;
>   }
> @@ -85,7 +84,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
>   static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>   {
>   	return phys_to_dma_direct(dev, phys) + size - 1 <=
> -			min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);
> +			min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
>   }
>   
>   struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
> @@ -94,7 +93,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   	size_t alloc_size = PAGE_ALIGN(size);
>   	int node = dev_to_node(dev);
>   	struct page *page = NULL;
> -	u64 phys_mask;
> +	u64 phys_limit;
>   
>   	if (attrs & DMA_ATTR_NO_WARN)
>   		gfp |= __GFP_NOWARN;
> @@ -102,7 +101,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   	/* we always manually zero the memory once we are done: */
>   	gfp &= ~__GFP_ZERO;
>   	gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
> -			&phys_mask);
> +			&phys_limit);
>   	page = dma_alloc_contiguous(dev, alloc_size, gfp);
>   	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   		dma_free_contiguous(dev, page, alloc_size);
> @@ -116,7 +115,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   		page = NULL;
>   
>   		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
> -		    phys_mask < DMA_BIT_MASK(64) &&
> +		    phys_limit < DMA_BIT_MASK(64) &&
>   		    !(gfp & (GFP_DMA32 | GFP_DMA))) {
>   			gfp |= GFP_DMA32;
>   			goto again;
>
Christoph Hellwig Nov. 21, 2019, 5:16 p.m. UTC | #8
On Thu, Nov 21, 2019 at 04:53:48PM +0000, Will Deacon wrote:
> Please go ahead and pull in our for-next/zone-dma branch if you need it.
> We're not going to rebase it, and I suspect we won't even be queueing
> anything else there at this stage in the game.

Ok.  I've pulled it in and will wait with sending the dma-mapping
pull request until the arm64 one made it to Linus.
Christoph Hellwig Nov. 21, 2019, 5:17 p.m. UTC | #9
On Thu, Nov 21, 2019 at 05:07:54PM +0000, Robin Murphy wrote:
> ^^ super-nit only because I can't not see my editor currently highlighting 
> the typo: "accessible"
>
> Regardless of that though,
>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>

Applied for real now with that typo fixed and on top of the pulled in
arm64 branch.
Nathan Chancellor Nov. 23, 2019, 4:51 p.m. UTC | #10
On Thu, Nov 21, 2019 at 10:26:44AM +0100, Nicolas Saenz Julienne wrote:
> Using a mask to represent bus DMA constraints has a set of limitations.
> The biggest one being it can only hold a power of two (minus one). The
> DMA mapping code is already aware of this and treats dev->bus_dma_mask
> as a limit. This quirk is already used by some architectures although
> still rare.
> 
> With the introduction of the Raspberry Pi 4 we've found a new contender
> for the use of bus DMA limits, as its PCIe bus can only address the
> lower 3GB of memory (of a total of 4GB). This is impossible to represent
> with a mask. To make things worse the device-tree code rounds non power
> of two bus DMA limits to the next power of two, which is unacceptable in
> this case.
> 
> In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> over the tree and treat it as such. Note that dev->bus_dma_limit should
> contain the higher accesible DMA address.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
<snip>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 041066f3ec99..0cc702a70a96 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -421,8 +421,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
>  	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>  		iova_len = roundup_pow_of_two(iova_len);
>  
> -	if (dev->bus_dma_mask)
> -		dma_limit &= dev->bus_dma_mask;
> +	dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>  
>  	if (domain->geometry.force_aperture)
>  		dma_limit = min(dma_limit, domain->geometry.aperture_end);

Just as an FYI, this introduces a warning on arm32 allyesconfig for me:

In file included from ../include/linux/list.h:9,
                 from ../include/linux/kobject.h:19,
                 from ../include/linux/of.h:17,
                 from ../include/linux/irqdomain.h:35,
                 from ../include/linux/acpi.h:13,
                 from ../include/linux/acpi_iort.h:10,
                 from ../drivers/iommu/dma-iommu.c:11:
../drivers/iommu/dma-iommu.c: In function 'iommu_dma_alloc_iova':
../include/linux/kernel.h:851:29: warning: comparison of distinct pointer types lacks a cast
  851 |   (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
      |                             ^~
../include/linux/kernel.h:865:4: note: in expansion of macro '__typecheck'
  865 |   (__typecheck(x, y) && __no_side_effects(x, y))
      |    ^~~~~~~~~~~
../include/linux/kernel.h:875:24: note: in expansion of macro '__safe_cmp'
  875 |  __builtin_choose_expr(__safe_cmp(x, y), \
      |                        ^~~~~~~~~~
../include/linux/kernel.h:884:19: note: in expansion of macro '__careful_cmp'
  884 | #define min(x, y) __careful_cmp(x, y, <)
      |                   ^~~~~~~~~~~~~
../include/linux/kernel.h:917:39: note: in expansion of macro 'min'
  917 |  __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
      |                                       ^~~
../drivers/iommu/dma-iommu.c:424:14: note: in expansion of macro 'min_not_zero'
  424 |  dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
      |              ^~~~~~~~~~~~

Cheers,
Nathan
Christoph Hellwig Nov. 25, 2019, 7:44 a.m. UTC | #11
On Sat, Nov 23, 2019 at 09:51:08AM -0700, Nathan Chancellor wrote:
> Just as an FYI, this introduces a warning on arm32 allyesconfig for me:

I think the dma_limit argument to iommu_dma_alloc_iova should be a u64
and/or we need to use min_t and open code the zero exception.

Robin, Nicolas - any opinions?

Also I wonder how this file gets compiled on arm32 given that arm32
has its own set of iommu dma ops..
Robin Murphy Nov. 25, 2019, 4:33 p.m. UTC | #12
On 25/11/2019 7:44 am, Christoph Hellwig wrote:
> On Sat, Nov 23, 2019 at 09:51:08AM -0700, Nathan Chancellor wrote:
>> Just as an FYI, this introduces a warning on arm32 allyesconfig for me:
> 
> I think the dma_limit argument to iommu_dma_alloc_iova should be a u64
> and/or we need to use min_t and open code the zero exception.
> 
> Robin, Nicolas - any opinions?

Yeah, given that it's always held a mask I'm not entirely sure why it 
was ever a dma_addr_t rather than a u64. Unless anyone else is desperate 
to do it I'll get a cleanup patch ready for rc1.

> Also I wonder how this file gets compiled on arm32 given that arm32
> has its own set of iommu dma ops..

As long as the dependencies for CONFIG_IOMMU_DMA are met it can be built 
even when it's not actually used. That said, I might have expected that 
arm allyesconfig ends up with CONFIG_ARCH_DMA_ADDR_T_64BIT=y anyway; I 
guess it must pick some of CONFIG_ARM_LPAE's negative dependencies.

(/me doesn't feel like jumping down the all*config rabbit hole today)

Robin.
Nicolas Saenz Julienne Nov. 26, 2019, 6:51 p.m. UTC | #13
On Mon, 2019-11-25 at 16:33 +0000, Robin Murphy wrote:
> On 25/11/2019 7:44 am, Christoph Hellwig wrote:
> > On Sat, Nov 23, 2019 at 09:51:08AM -0700, Nathan Chancellor wrote:
> > > Just as an FYI, this introduces a warning on arm32 allyesconfig for me:
> > 
> > I think the dma_limit argument to iommu_dma_alloc_iova should be a u64
> > and/or we need to use min_t and open code the zero exception.
> > 
> > Robin, Nicolas - any opinions?
> 
> Yeah, given that it's always held a mask I'm not entirely sure why it 
> was ever a dma_addr_t rather than a u64. Unless anyone else is desperate 
> to do it I'll get a cleanup patch ready for rc1.

Sounds good to me too

Robin, since I started the mess, I'll be happy to do it if it helps offloading
some work from you.

Regards,
Nicolas
Robin Murphy Nov. 26, 2019, 9:45 p.m. UTC | #14
On 2019-11-26 6:51 pm, Nicolas Saenz Julienne wrote:
> On Mon, 2019-11-25 at 16:33 +0000, Robin Murphy wrote:
>> On 25/11/2019 7:44 am, Christoph Hellwig wrote:
>>> On Sat, Nov 23, 2019 at 09:51:08AM -0700, Nathan Chancellor wrote:
>>>> Just as an FYI, this introduces a warning on arm32 allyesconfig for me:
>>>
>>> I think the dma_limit argument to iommu_dma_alloc_iova should be a u64
>>> and/or we need to use min_t and open code the zero exception.
>>>
>>> Robin, Nicolas - any opinions?
>>
>> Yeah, given that it's always held a mask I'm not entirely sure why it
>> was ever a dma_addr_t rather than a u64. Unless anyone else is desperate
>> to do it I'll get a cleanup patch ready for rc1.
> 
> Sounds good to me too
> 
> Robin, since I started the mess, I'll be happy to do it if it helps offloading
> some work from you.

No worries - your change only exposed my original weird decision ;)  On 
second look the patch was literally a trivial one-liner, so I've written 
it up already.

Cheers,
Robin.
diff mbox series

Patch

diff --git a/arch/mips/pci/fixup-sb1250.c b/arch/mips/pci/fixup-sb1250.c
index 8a41b359cf90..40efc990cdce 100644
--- a/arch/mips/pci/fixup-sb1250.c
+++ b/arch/mips/pci/fixup-sb1250.c
@@ -21,22 +21,22 @@  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SIBYTE, PCI_DEVICE_ID_BCM1250_PCI,
 
 /*
  * The BCM1250, etc. PCI host bridge does not support DAC on its 32-bit
- * bus, so we set the bus's DMA mask accordingly.  However the HT link
+ * bus, so we set the bus's DMA limit accordingly.  However the HT link
  * down the artificial PCI-HT bridge supports 40-bit addressing and the
  * SP1011 HT-PCI bridge downstream supports both DAC and a 64-bit bus
  * width, so we record the PCI-HT bridge's secondary and subordinate bus
- * numbers and do not set the mask for devices present in the inclusive
+ * numbers and do not set the limit for devices present in the inclusive
  * range of those.
  */
-struct sb1250_bus_dma_mask_exclude {
+struct sb1250_bus_dma_limit_exclude {
 	bool set;
 	unsigned char start;
 	unsigned char end;
 };
 
-static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
+static int sb1250_bus_dma_limit(struct pci_dev *dev, void *data)
 {
-	struct sb1250_bus_dma_mask_exclude *exclude = data;
+	struct sb1250_bus_dma_limit_exclude *exclude = data;
 	bool exclude_this;
 	bool ht_bridge;
 
@@ -55,7 +55,7 @@  static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
 			exclude->start, exclude->end);
 	} else {
 		dev_dbg(&dev->dev, "disabling DAC for device");
-		dev->dev.bus_dma_mask = DMA_BIT_MASK(32);
+		dev->dev.bus_dma_limit = DMA_BIT_MASK(32);
 	}
 
 	return 0;
@@ -63,9 +63,9 @@  static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
 
 static void quirk_sb1250_pci_dac(struct pci_dev *dev)
 {
-	struct sb1250_bus_dma_mask_exclude exclude = { .set = false };
+	struct sb1250_bus_dma_limit_exclude exclude = { .set = false };
 
-	pci_walk_bus(dev->bus, sb1250_bus_dma_mask, &exclude);
+	pci_walk_bus(dev->bus, sb1250_bus_dma_limit, &exclude);
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SIBYTE, PCI_DEVICE_ID_BCM1250_PCI,
 			quirk_sb1250_pci_dac);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index ff0e2b156cb5..617a443d673d 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -115,8 +115,8 @@  static void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 
-	pdev->dev.bus_dma_mask =
-		hose->dma_window_base_cur + hose->dma_window_size;
+	pdev->dev.bus_dma_limit =
+		hose->dma_window_base_cur + hose->dma_window_size - 1;
 }
 
 static void setup_swiotlb_ops(struct pci_controller *hose)
@@ -135,7 +135,7 @@  static void fsl_pci_dma_set_mask(struct device *dev, u64 dma_mask)
 	 * mapping that allows addressing any RAM address from across PCI.
 	 */
 	if (dev_is_pci(dev) && dma_mask >= pci64_dma_offset * 2 - 1) {
-		dev->bus_dma_mask = 0;
+		dev->bus_dma_limit = 0;
 		dev->archdata.dma_offset = pci64_dma_offset;
 	}
 }
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 57de2ebff7e2..5dcedad21dff 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -140,7 +140,7 @@  rootfs_initcall(pci_iommu_init);
 
 static int via_no_dac_cb(struct pci_dev *pdev, void *data)
 {
-	pdev->dev.bus_dma_mask = DMA_BIT_MASK(32);
+	pdev->dev.bus_dma_limit = DMA_BIT_MASK(32);
 	return 0;
 }
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 9268c12458c8..a03614bd3e1a 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -367,7 +367,7 @@  bool force_dma_unencrypted(struct device *dev)
 	if (sme_active()) {
 		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
 		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
-						dev->bus_dma_mask);
+						dev->bus_dma_limit);
 
 		if (dma_dev_mask <= dma_enc_mask)
 			return true;
diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
index 4a631264b809..c313d784efab 100644
--- a/arch/x86/pci/sta2x11-fixup.c
+++ b/arch/x86/pci/sta2x11-fixup.c
@@ -143,7 +143,7 @@  static void sta2x11_map_ep(struct pci_dev *pdev)
 
 	dev->dma_pfn_offset = PFN_DOWN(-amba_base);
 
-	dev->bus_dma_mask = max_amba_addr;
+	dev->bus_dma_limit = max_amba_addr;
 	pci_set_consistent_dma_mask(pdev, max_amba_addr);
 	pci_set_dma_mask(pdev, max_amba_addr);
 
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 5a7551d060f2..33f71983e001 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1057,8 +1057,8 @@  static int rc_dma_get_range(struct device *dev, u64 *size)
  */
 void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
 {
-	u64 mask, dmaaddr = 0, size = 0, offset = 0;
-	int ret, msb;
+	u64 end, mask, dmaaddr = 0, size = 0, offset = 0;
+	int ret;
 
 	/*
 	 * If @dev is expected to be DMA-capable then the bus code that created
@@ -1085,19 +1085,13 @@  void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
 	}
 
 	if (!ret) {
-		msb = fls64(dmaaddr + size - 1);
 		/*
-		 * Round-up to the power-of-two mask or set
-		 * the mask to the whole 64-bit address space
-		 * in case the DMA region covers the full
-		 * memory window.
+		 * Limit coherent and dma mask based on size retrieved from
+		 * firmware.
 		 */
-		mask = msb == 64 ? U64_MAX : (1ULL << msb) - 1;
-		/*
-		 * Limit coherent and dma mask based on size
-		 * retrieved from firmware.
-		 */
-		dev->bus_dma_mask = mask;
+		end = dmaaddr + size - 1;
+		mask = DMA_BIT_MASK(ilog2(end) + 1);
+		dev->bus_dma_limit = end;
 		dev->coherent_dma_mask = mask;
 		*dev->dma_mask = mask;
 	}
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index ec6c64fce74a..4bfd1b14b390 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -910,7 +910,7 @@  static int ahci_configure_dma_masks(struct pci_dev *pdev, int using_dac)
 	 * value, don't extend it here. This happens on STA2X11, for example.
 	 *
 	 * XXX: manipulating the DMA mask from platform code is completely
-	 * bogus, platform code should use dev->bus_dma_mask instead..
+	 * bogus, platform code should use dev->bus_dma_limit instead..
 	 */
 	if (pdev->dma_mask && pdev->dma_mask < DMA_BIT_MASK(32))
 		return 0;
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 041066f3ec99..0cc702a70a96 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -421,8 +421,7 @@  static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
 		iova_len = roundup_pow_of_two(iova_len);
 
-	if (dev->bus_dma_mask)
-		dma_limit &= dev->bus_dma_mask;
+	dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
 
 	if (domain->geometry.force_aperture)
 		dma_limit = min(dma_limit, domain->geometry.aperture_end);
diff --git a/drivers/of/device.c b/drivers/of/device.c
index da8158392010..e9127db7b067 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -93,7 +93,7 @@  int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 	bool coherent;
 	unsigned long offset;
 	const struct iommu_ops *iommu;
-	u64 mask;
+	u64 mask, end;
 
 	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
 	if (ret < 0) {
@@ -148,12 +148,13 @@  int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 	 * Limit coherent and dma mask based on size and default mask
 	 * set by the driver.
 	 */
-	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
+	end = dma_addr + size - 1;
+	mask = DMA_BIT_MASK(ilog2(end) + 1);
 	dev->coherent_dma_mask &= mask;
 	*dev->dma_mask &= mask;
-	/* ...but only set bus mask if we found valid dma-ranges earlier */
+	/* ...but only set bus limit if we found valid dma-ranges earlier */
 	if (!ret)
-		dev->bus_dma_mask = mask;
+		dev->bus_dma_limit = end;
 
 	coherent = of_dma_is_coherent(np);
 	dev_dbg(dev, "device is%sdma coherent\n",
diff --git a/include/linux/device.h b/include/linux/device.h
index 99af366db50d..dada6b4bd7e0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1214,8 +1214,8 @@  struct dev_links_info {
  * @coherent_dma_mask: Like dma_mask, but for alloc_coherent mapping as not all
  * 		hardware supports 64-bit addresses for consistent allocations
  * 		such descriptors.
- * @bus_dma_mask: Mask of an upstream bridge or bus which imposes a smaller DMA
- *		limit than the device itself supports.
+ * @bus_dma_limit: Limit of an upstream bridge or bus which imposes a smaller
+ *		DMA limit than the device itself supports.
  * @dma_pfn_offset: offset of DMA memory range relatively of RAM
  * @dma_parms:	A low level driver may set these to teach IOMMU code about
  * 		segment limitations.
@@ -1301,7 +1301,7 @@  struct device {
 					     not all hardware supports
 					     64 bit addresses for consistent
 					     allocations such descriptors. */
-	u64		bus_dma_mask;	/* upstream dma_mask constraint */
+	u64		bus_dma_limit;	/* upstream dma constraint */
 	unsigned long	dma_pfn_offset;
 
 	struct device_dma_parameters *dma_parms;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index c5c5b5ff2371..aa031cb213c3 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -62,7 +62,7 @@  static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 	    min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
 		return false;
 
-	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a4930310d0c7..330ad58fbf4d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -694,7 +694,7 @@  static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
  */
 static inline bool dma_addressing_limited(struct device *dev)
 {
-	return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
+	return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
 			    dma_get_required_mask(dev);
 }
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84cf8be65078..998d48ddfb07 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -27,10 +27,10 @@  static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
 	if (!dev->dma_mask) {
 		dev_err_once(dev, "DMA map on device without dma_mask\n");
-	} else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
+	} else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_limit) {
 		dev_err_once(dev,
-			"overflow %pad+%zu of DMA mask %llx bus mask %llx\n",
-			&dma_addr, size, *dev->dma_mask, dev->bus_dma_mask);
+			"overflow %pad+%zu of DMA mask %llx bus limit %llx\n",
+			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
 	}
 	WARN_ON_ONCE(1);
 }
@@ -57,15 +57,14 @@  u64 dma_direct_get_required_mask(struct device *dev)
 }
 
 static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
-		u64 *phys_mask)
+		u64 *phys_limit)
 {
-	if (dev->bus_dma_mask && dev->bus_dma_mask < dma_mask)
-		dma_mask = dev->bus_dma_mask;
+	u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
 
 	if (force_dma_unencrypted(dev))
-		*phys_mask = __dma_to_phys(dev, dma_mask);
+		*phys_limit = __dma_to_phys(dev, dma_limit);
 	else
-		*phys_mask = dma_to_phys(dev, dma_mask);
+		*phys_limit = dma_to_phys(dev, dma_limit);
 
 	/*
 	 * Optimistically try the zone that the physical address mask falls
@@ -75,9 +74,9 @@  static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 	 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
 	 * zones.
 	 */
-	if (*phys_mask <= DMA_BIT_MASK(zone_dma_bits))
+	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
 		return GFP_DMA;
-	if (*phys_mask <= DMA_BIT_MASK(32))
+	if (*phys_limit <= DMA_BIT_MASK(32))
 		return GFP_DMA32;
 	return 0;
 }
@@ -85,7 +84,7 @@  static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
 	return phys_to_dma_direct(dev, phys) + size - 1 <=
-			min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);
+			min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -94,7 +93,7 @@  struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 	size_t alloc_size = PAGE_ALIGN(size);
 	int node = dev_to_node(dev);
 	struct page *page = NULL;
-	u64 phys_mask;
+	u64 phys_limit;
 
 	if (attrs & DMA_ATTR_NO_WARN)
 		gfp |= __GFP_NOWARN;
@@ -102,7 +101,7 @@  struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 	/* we always manually zero the memory once we are done: */
 	gfp &= ~__GFP_ZERO;
 	gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
-			&phys_mask);
+			&phys_limit);
 	page = dma_alloc_contiguous(dev, alloc_size, gfp);
 	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
 		dma_free_contiguous(dev, page, alloc_size);
@@ -116,7 +115,7 @@  struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 		page = NULL;
 
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
-		    phys_mask < DMA_BIT_MASK(64) &&
+		    phys_limit < DMA_BIT_MASK(64) &&
 		    !(gfp & (GFP_DMA32 | GFP_DMA))) {
 			gfp |= GFP_DMA32;
 			goto again;