diff mbox

[V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

Message ID 1523394001-4615-1-git-send-email-okaya@codeaurora.org (mailing list archive)
State New, archived
Headers show

Commit Message

Sinan Kaya April 10, 2018, 8:59 p.m. UTC
Code is expecing to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.

IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

Comments

Robin Murphy April 11, 2018, 12:03 p.m. UTC | #1
On 10/04/18 21:59, Sinan Kaya wrote:
> Code is expecing to observe the same number of buffers returned from
> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
> doesn't hold true universally especially for systems with IOMMU.

So why not fix said code? It's clearly not a real hardware limitation, 
and the map_sg() APIs have potentially returned fewer than nents since 
forever, so there's really no excuse.

> IOMMU driver tries to combine buffers into a single DMA address as much
> as it can. The right thing is to tell the DMA layer how much combining
> IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing 
scatterlists into dma_map_sg() which already violate max_seg_size to 
begin with, and I think a conscientious DMA API implementation would be 
at rights to fail the mapping for that reason (I know arm64 happens not 
to, but that was a deliberate design decision to make my life easier at 
the time).

As a short-term fix, at least do something like what i915 does and 
constrain the table allocation to the desired segment size as well, so 
things remain self-consistent. But still never claim that faking a 
hardware constraint as a workaround for a driver shortcoming is "the 
right thing to do" ;)

Robin.

> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
>   4 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index 8e28270..1b031eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
>   		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>   		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
>   	}
> -
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>   	r = gmc_v6_0_init_microcode(adev);
>   	if (r) {
>   		dev_err(adev->dev, "Failed to load mc firmware!\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 86e9d682..0a4b2cc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
>   		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>   		pr_warn("amdgpu: No coherent DMA available\n");
>   	}
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>   
>   	r = gmc_v7_0_init_microcode(adev);
>   	if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 9a813d8..b171529 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
>   		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>   		pr_warn("amdgpu: No coherent DMA available\n");
>   	}
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>   
>   	r = gmc_v8_0_init_microcode(adev);
>   	if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 3b7e7af..36e658ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
>   		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>   		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
>   	}
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>   
>   	r = gmc_v9_0_mc_init(adev);
>   	if (r)
>
Sinan Kaya April 11, 2018, 2:33 p.m. UTC | #2
On 4/11/2018 8:03 AM, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> Code is expecing to observe the same number of buffers returned from
>> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>> doesn't hold true universally especially for systems with IOMMU.
> 
> So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.

Sure, I'll take a better fix if there is one.

> 
>> IOMMU driver tries to combine buffers into a single DMA address as much
>> as it can. The right thing is to tell the DMA layer how much combining
>> IOMMU can do.
> 
> Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be at rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
> 
> As a short-term fix, at least do something like what i915 does and constrain the table allocation to the desired segment size as well, so things remain self-consistent. But still never claim that faking a hardware constraint as a workaround for a driver shortcoming is "the right thing to do" ;)

You are asking for something like this from here, right?

https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58


	ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
	if (ret)
		goto err_free;

	src = obj->mm.pages->sgl;
	dst = st->sgl;
	for (i = 0; i < obj->mm.pages->nents; i++) {
		sg_set_page(dst, sg_page(src), src->length, 0);
		dst = sg_next(dst);
		src = sg_next(src);
	}

This seems to allocate the scatter gather list and fill it in manually before passing it
to dma_map_sg(). I'll give it a try. 

Just double checking.

> 
> Robin.
> 
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Robin Murphy April 11, 2018, 3:32 p.m. UTC | #3
On 11/04/18 15:33, Sinan Kaya wrote:
> On 4/11/2018 8:03 AM, Robin Murphy wrote:
>> On 10/04/18 21:59, Sinan Kaya wrote:
>>> Code is expecing to observe the same number of buffers returned
>>> from dma_map_sg() function compared to
>>> sg_alloc_table_from_pages(). This doesn't hold true universally
>>> especially for systems with IOMMU.
>> 
>> So why not fix said code? It's clearly not a real hardware
>> limitation, and the map_sg() APIs have potentially returned fewer
>> than nents since forever, so there's really no excuse.
> 
> Sure, I'll take a better fix if there is one.
> 
>> 
>>> IOMMU driver tries to combine buffers into a single DMA address
>>> as much as it can. The right thing is to tell the DMA layer how
>>> much combining IOMMU can do.
>> 
>> Disagree; this is a dodgy hack, since you'll now end up passing
>> scatterlists into dma_map_sg() which already violate max_seg_size
>> to begin with, and I think a conscientious DMA API implementation
>> would be at rights to fail the mapping for that reason (I know
>> arm64 happens not to, but that was a deliberate design decision to
>> make my life easier at the time).
>> 
>> As a short-term fix, at least do something like what i915 does and
>> constrain the table allocation to the desired segment size as well,
>> so things remain self-consistent. But still never claim that faking
>> a hardware constraint as a workaround for a driver shortcoming is
>> "the right thing to do" ;)
> 
> You are asking for something like this from here, right?
> 
> https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58

I just looked for callers of __sg_alloc_table_from_pages() with a limit 
other than SCATTERLIST_MAX_SEGMENT, and found 
__i915_gem_userptr_alloc_pages(), which at first glance appears to be 
doing something vaguely similar to amdgpu_ttm_tt_pin_userptr().

Robin.
Christoph Hellwig April 12, 2018, 6:26 a.m. UTC | #4
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> Code is expecing to observe the same number of buffers returned from
>> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>> doesn't hold true universally especially for systems with IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation, and 
> the map_sg() APIs have potentially returned fewer than nents since forever, 
> so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as passed
it is completely bogus.

>> IOMMU driver tries to combine buffers into a single DMA address as much
>> as it can. The right thing is to tell the DMA layer how much combining
>> IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing 
> scatterlists into dma_map_sg() which already violate max_seg_size to begin 
> with, and I think a conscientious DMA API implementation would be at rights 
> to fail the mapping for that reason (I know arm64 happens not to, but that 
> was a deliberate design decision to make my life easier at the time).

Agreed.
Christian König April 12, 2018, 9:42 a.m. UTC | #5
Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:
> On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
>> On 10/04/18 21:59, Sinan Kaya wrote:
>>> Code is expecing to observe the same number of buffers returned from
>>> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>>> doesn't hold true universally especially for systems with IOMMU.
>> So why not fix said code? It's clearly not a real hardware limitation, and
>> the map_sg() APIs have potentially returned fewer than nents since forever,
>> so there's really no excuse.
> Yes, relying on dma_map_sg returning the same number of entries as passed
> it is completely bogus.

I agree that the common DRM functions should be able to deal with both, 
but we should let the driver side decide if it wants multiple addresses 
combined or not.

>
>>> IOMMU driver tries to combine buffers into a single DMA address as much
>>> as it can. The right thing is to tell the DMA layer how much combining
>>> IOMMU can do.
>> Disagree; this is a dodgy hack, since you'll now end up passing
>> scatterlists into dma_map_sg() which already violate max_seg_size to begin
>> with, and I think a conscientious DMA API implementation would be at rights
>> to fail the mapping for that reason (I know arm64 happens not to, but that
>> was a deliberate design decision to make my life easier at the time).
> Agreed.

Sounds like my understanding of max_seg_size is incorrect, what exactly 
should that describe?

Thanks,
Christian.
Robin Murphy April 12, 2018, 1:51 p.m. UTC | #6
On 12/04/18 10:42, Christian König wrote:
> Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:
>> On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
>>> On 10/04/18 21:59, Sinan Kaya wrote:
>>>> Code is expecing to observe the same number of buffers returned from
>>>> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>>>> doesn't hold true universally especially for systems with IOMMU.
>>> So why not fix said code? It's clearly not a real hardware 
>>> limitation, and
>>> the map_sg() APIs have potentially returned fewer than nents since 
>>> forever,
>>> so there's really no excuse.
>> Yes, relying on dma_map_sg returning the same number of entries as passed
>> it is completely bogus.
> 
> I agree that the common DRM functions should be able to deal with both, 
> but we should let the driver side decide if it wants multiple addresses 
> combined or not.
> 
>>
>>>> IOMMU driver tries to combine buffers into a single DMA address as much
>>>> as it can. The right thing is to tell the DMA layer how much combining
>>>> IOMMU can do.
>>> Disagree; this is a dodgy hack, since you'll now end up passing
>>> scatterlists into dma_map_sg() which already violate max_seg_size to 
>>> begin
>>> with, and I think a conscientious DMA API implementation would be at 
>>> rights
>>> to fail the mapping for that reason (I know arm64 happens not to, but 
>>> that
>>> was a deliberate design decision to make my life easier at the time).
>> Agreed.
> 
> Sounds like my understanding of max_seg_size is incorrect, what exactly 
> should that describe?

The segment size and boundary mask are there to represent a device's 
hardware scatter-gather capabilities - a lot of things have limits on 
the size and alignment of the data pointed to by a single descriptor 
(e.g. an xHCI TRB, where the data buffer must not cross a 64KB boundary) 
- although they can also be relevant to non-scatter-gather devices if 
they just have limits on how big/aligned a single DMA transfer can be.

Robin.
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 8e28270..1b031eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -851,7 +851,7 @@  static int gmc_v6_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
 	}
-
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 	r = gmc_v6_0_init_microcode(adev);
 	if (r) {
 		dev_err(adev->dev, "Failed to load mc firmware!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 86e9d682..0a4b2cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -999,6 +999,7 @@  static int gmc_v7_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v7_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 9a813d8..b171529 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1096,6 +1096,7 @@  static int gmc_v8_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v8_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3b7e7af..36e658ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -855,6 +855,7 @@  static int gmc_v9_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v9_0_mc_init(adev);
 	if (r)