diff mbox

drm: modify pages_to_sg prime helper to create optimized SG table

Message ID 1359380328-23058-1-git-send-email-rahul.sharma@samsung.com (mailing list archive)
State New, archived
Headers show

Commit Message

Rahul Sharma Jan. 28, 2013, 1:38 p.m. UTC
It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to
sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure in
creating SG table for the buffers having more than 204 physical pages i.e.
equal to SG_MAX_SINGLE_ALLOC.

When using sg_alloc_table_from_pages interface, in place of sg_alloc_table,
page list will be passes to get each contiguous section which is represented
by a single entry in the table. For a Contiguous Buffer, number of entries
should be equal to 1.

Following check is causing the failure which is not applicable for Non-Contig
buffers:

	if (WARN_ON_ONCE(nents > max_ents))
		return -EINVAL;

Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU
supprot. NOUVEAU and RADEON platforms also depends on drm_prime_pages_to_sg
helper function.

This set is base on "exynos-drm-fixes" branch at
http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git

Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
---
 drivers/gpu/drm/drm_prime.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Aaron Plattner Jan. 29, 2013, 5:10 p.m. UTC | #1
On 01/28/2013 05:38 AM, Rahul Sharma wrote:
> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to
> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure in
> creating SG table for the buffers having more than 204 physical pages i.e.
> equal to SG_MAX_SINGLE_ALLOC.
>
> When using sg_alloc_table_from_pages interface, in place of sg_alloc_table,
> page list will be passes to get each contiguous section which is represented
> by a single entry in the table. For a Contiguous Buffer, number of entries
> should be equal to 1.
>
> Following check is causing the failure which is not applicable for Non-Contig
> buffers:
>
> 	if (WARN_ON_ONCE(nents > max_ents))
> 		return -EINVAL;
>
> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU
> supprot. NOUVEAU and RADEON platforms also depends on drm_prime_pages_to_sg
> helper function.
>
> This set is base on "exynos-drm-fixes" branch at
> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git
>
> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>

Reviewed-by: Aaron Plattner <aplattner@nvidia.com>

I also verified that this reduces my 2025-entry sg_table to 6 entries, so

Tested-by: Aaron Plattner <aplattner@nvidia.com>
Rahul Sharma Jan. 31, 2013, 8:38 a.m. UTC | #2
On Tue, Jan 29, 2013 at 10:40 PM, Aaron Plattner <aplattner@nvidia.com> wrote:
> On 01/28/2013 05:38 AM, Rahul Sharma wrote:
>>
>> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to
>> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure
>> in
>> creating SG table for the buffers having more than 204 physical pages i.e.
>> equal to SG_MAX_SINGLE_ALLOC.
>>
>> When using sg_alloc_table_from_pages interface, in place of
>> sg_alloc_table,
>> page list will be passes to get each contiguous section which is
>> represented
>> by a single entry in the table. For a Contiguous Buffer, number of entries
>> should be equal to 1.
>>
>> Following check is causing the failure which is not applicable for
>> Non-Contig
>> buffers:
>>
>>         if (WARN_ON_ONCE(nents > max_ents))
>>                 return -EINVAL;
>>
>> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU
>> supprot. NOUVEAU and RADEON platforms also depends on
>> drm_prime_pages_to_sg
>> helper function.
>>
>> This set is base on "exynos-drm-fixes" branch at
>> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git
>>
>> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
>
>
> Reviewed-by: Aaron Plattner <aplattner@nvidia.com>
>
> I also verified that this reduces my 2025-entry sg_table to 6 entries, so
>
> Tested-by: Aaron Plattner <aplattner@nvidia.com>
>
> --
> Aaron
>

Thanks Aaron,

I want to request stake-holders to review and test this patch for
other platforms.

regards,
Rahul Sharma.

>
>> ---
>>   drivers/gpu/drm/drm_prime.c | 8 ++------
>>   1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 7f12573..072ee08 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -217,21 +217,17 @@ int drm_prime_fd_to_handle_ioctl(struct drm_device
>> *dev, void *data,
>>   struct sg_table *drm_prime_pages_to_sg(struct page **pages, int
>> nr_pages)
>>   {
>>         struct sg_table *sg = NULL;
>> -       struct scatterlist *iter;
>> -       int i;
>>         int ret;
>>
>>         sg = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
>>         if (!sg)
>>                 goto out;
>>
>> -       ret = sg_alloc_table(sg, nr_pages, GFP_KERNEL);
>> +       ret = sg_alloc_table_from_pages(sg, pages, nr_pages, 0,
>> +                               nr_pages << PAGE_SHIFT, GFP_KERNEL);
>>         if (ret)
>>                 goto out;
>>
>> -       for_each_sg(sg->sgl, iter, nr_pages, i)
>> -               sg_set_page(iter, pages[i], PAGE_SIZE, 0);
>> -
>>         return sg;
>>   out:
>>         kfree(sg);
>>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter Jan. 31, 2013, 8:53 a.m. UTC | #3
On Thu, Jan 31, 2013 at 9:38 AM, Rahul Sharma <r.sh.open@gmail.com> wrote:
> On Tue, Jan 29, 2013 at 10:40 PM, Aaron Plattner <aplattner@nvidia.com> wrote:
>> On 01/28/2013 05:38 AM, Rahul Sharma wrote:
>>>
>>> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to
>>> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure
>>> in
>>> creating SG table for the buffers having more than 204 physical pages i.e.
>>> equal to SG_MAX_SINGLE_ALLOC.
>>>
>>> When using sg_alloc_table_from_pages interface, in place of
>>> sg_alloc_table,
>>> page list will be passes to get each contiguous section which is
>>> represented
>>> by a single entry in the table. For a Contiguous Buffer, number of entries
>>> should be equal to 1.
>>>
>>> Following check is causing the failure which is not applicable for
>>> Non-Contig
>>> buffers:
>>>
>>>         if (WARN_ON_ONCE(nents > max_ents))
>>>                 return -EINVAL;
>>>
>>> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU
>>> supprot. NOUVEAU and RADEON platforms also depends on
>>> drm_prime_pages_to_sg
>>> helper function.
>>>
>>> This set is base on "exynos-drm-fixes" branch at
>>> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git
>>>
>>> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
>>
>>
>> Reviewed-by: Aaron Plattner <aplattner@nvidia.com>
>>
>> I also verified that this reduces my 2025-entry sg_table to 6 entries, so
>>
>> Tested-by: Aaron Plattner <aplattner@nvidia.com>
>>
>> --
>> Aaron
>>
>
> Thanks Aaron,
>
> I want to request stake-holders to review and test this patch for
> other platforms.

Iirc the old i915 dma_buf import code presumed that each sg entry
points to exactly one page. Yeah, Dave cut a few corners in the
initial implementation. Since Chris' rework to use sg_tables
internally in i915.ko I think we should be safe, but would need to do
a full audit of the code. No idea what the exact situation in ttm is.

So I think step one is for you to crawl through the existing drm prime
drivers and check that you don't break any hidden assumptions about
this.

For the patch itself I think there's now a generic pages_to_sg helper
in the dma core which does exactly what you want it to do. I can dig
it out if you can't find it.

Cheers, Daniel
Rahul Sharma Jan. 31, 2013, 11:54 a.m. UTC | #4
On Thu, Jan 31, 2013 at 2:23 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jan 31, 2013 at 9:38 AM, Rahul Sharma <r.sh.open@gmail.com> wrote:
>> On Tue, Jan 29, 2013 at 10:40 PM, Aaron Plattner <aplattner@nvidia.com> wrote:
>>> On 01/28/2013 05:38 AM, Rahul Sharma wrote:
>>>>
>>>> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to
>>>> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure
>>>> in
>>>> creating SG table for the buffers having more than 204 physical pages i.e.
>>>> equal to SG_MAX_SINGLE_ALLOC.
>>>>
>>>> When using sg_alloc_table_from_pages interface, in place of
>>>> sg_alloc_table,
>>>> page list will be passes to get each contiguous section which is
>>>> represented
>>>> by a single entry in the table. For a Contiguous Buffer, number of entries
>>>> should be equal to 1.
>>>>
>>>> Following check is causing the failure which is not applicable for
>>>> Non-Contig
>>>> buffers:
>>>>
>>>>         if (WARN_ON_ONCE(nents > max_ents))
>>>>                 return -EINVAL;
>>>>
>>>> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU
>>>> supprot. NOUVEAU and RADEON platforms also depends on
>>>> drm_prime_pages_to_sg
>>>> helper function.
>>>>
>>>> This set is base on "exynos-drm-fixes" branch at
>>>> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git
>>>>
>>>> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
>>>
>>>
>>> Reviewed-by: Aaron Plattner <aplattner@nvidia.com>
>>>
>>> I also verified that this reduces my 2025-entry sg_table to 6 entries, so
>>>
>>> Tested-by: Aaron Plattner <aplattner@nvidia.com>
>>>
>>> --
>>> Aaron
>>>
>>
>> Thanks Aaron,
>>
>> I want to request stake-holders to review and test this patch for
>> other platforms.
>
> Iirc the old i915 dma_buf import code presumed that each sg entry
> points to exactly one page. Yeah, Dave cut a few corners in the
> initial implementation. Since Chris' rework to use sg_tables
> internally in i915.ko I think we should be safe, but would need to do
> a full audit of the code. No idea what the exact situation in ttm is.
>
> So I think step one is for you to crawl through the existing drm prime
> drivers and check that you don't break any hidden assumptions about
> this.

Thanks Daniel,

I have parsed the related code and it looks fine to me. I couldn't find
any code section, expecting sg-tables with single-page sgl entries. I
just want to ensure again that it doesn't cause any side effects on
various platforms.

>
> For the patch itself I think there's now a generic pages_to_sg helper
> in the dma core which does exactly what you want it to do. I can dig
> it out if you can't find it.
>

Sorry, I din't get this part. Please elaborate a bit.

regards,
Rahul Sharma.

> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Jan. 31, 2013, 12:17 p.m. UTC | #5
On Thu, Jan 31, 2013 at 12:54 PM, Rahul Sharma <r.sh.open@gmail.com> wrote:
> I have parsed the related code and it looks fine to me. I couldn't find
> any code section, expecting sg-tables with single-page sgl entries. I
> just want to ensure again that it doesn't cause any side effects on
> various platforms.

Just chatted with Chris Wilson on our team, and at last i915.ko _has_
code paths which rely on this. See the reloc handling offset
computations in i915_gem_execbuf.c. I haven't checked
ttm/radeon/nouveau/vmwgfx simply because I don't have the time right
now, and I know that doing this carefully will blow through a few
days. So this patch will break prime buffer sharing on the desktop.

The other thing is that only doing sg entry coalescing for
prime/dma_buf will lead to a QA problem, since all the normal sg
tables still have a 1:1 relationship. So imo the right way to move
forward with this is to convert the various sg table constructors in
i915/ttm/radeon/nouveau/... over to coalesce pages. This allows us to
fix any fallout step-by-step.

Then, once each driver is ready for this, we can merge your patch.

>> For the patch itself I think there's now a generic pages_to_sg helper
>> in the dma core which does exactly what you want it to do. I can dig
>> it out if you can't find it.
>>
>
> Sorry, I din't get this part. Please elaborate a bit.

See sg_alloc_table_from_pages in lib/scatterlist.c introduced with:

commit efc42bc98058a36d761b16a114823db1a902ed05
Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
Date:   Mon Jun 18 09:25:01 2012 +0200

    scatterlist: add sg_alloc_table_from_pages function

Cheers, Daniel
Rahul Sharma Jan. 31, 2013, 1:26 p.m. UTC | #6
On Thu, Jan 31, 2013 at 5:47 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jan 31, 2013 at 12:54 PM, Rahul Sharma <r.sh.open@gmail.com> wrote:
>> I have parsed the related code and it looks fine to me. I couldn't find
>> any code section, expecting sg-tables with single-page sgl entries. I
>> just want to ensure again that it doesn't cause any side effects on
>> various platforms.
>
> Just chatted with Chris Wilson on our team, and at last i915.ko _has_
> code paths which rely on this. See the reloc handling offset
> computations in i915_gem_execbuf.c. I haven't checked
> ttm/radeon/nouveau/vmwgfx simply because I don't have the time right
> now, and I know that doing this carefully will blow through a few
> days. So this patch will break prime buffer sharing on the desktop.
>
> The other thing is that only doing sg entry coalescing for
> prime/dma_buf will lead to a QA problem, since all the normal sg
> tables still have a 1:1 relationship. So imo the right way to move
> forward with this is to convert the various sg table constructors in
> i915/ttm/radeon/nouveau/... over to coalesce pages. This allows us to
> fix any fallout step-by-step.
>
> Then, once each driver is ready for this, we can merge your patch.
>

Looks fine to me.

>>> For the patch itself I think there's now a generic pages_to_sg helper
>>> in the dma core which does exactly what you want it to do. I can dig
>>> it out if you can't find it.
>>>
>>
>> Sorry, I din't get this part. Please elaborate a bit.
>
> See sg_alloc_table_from_pages in lib/scatterlist.c introduced with:
>
> commit efc42bc98058a36d761b16a114823db1a902ed05
> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
> Date:   Mon Jun 18 09:25:01 2012 +0200
>
>     scatterlist: add sg_alloc_table_from_pages function
>

Yes, my patch is using sg_alloc_table_from_pages.

regards,
Rahul Sharma.

> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Jan. 31, 2013, 2:24 p.m. UTC | #7
On Thu, Jan 31, 2013 at 2:26 PM, Rahul Sharma <r.sh.open@gmail.com> wrote:
>> The other thing is that only doing sg entry coalescing for
>> prime/dma_buf will lead to a QA problem, since all the normal sg
>> tables still have a 1:1 relationship. So imo the right way to move
>> forward with this is to convert the various sg table constructors in
>> i915/ttm/radeon/nouveau/... over to coalesce pages. This allows us to
>> fix any fallout step-by-step.
>>
>> Then, once each driver is ready for this, we can merge your patch.
>
> Looks fine to me.

I've looked some more at the drivers, and it seems like ttm uses the
sg tables only for dma_buf imported objects. Also, all drivers already
use drm_prime_sg_to_page_addr_arrays, which does the right thing. So
ttm drivers should indeed be all fine (I've stopped chaising
callchains before hitting ttm_tt_populate, where the magic happens).

So that seems to only leave i915, which has a few backing storage
walkers in i915_gem_execbuffer.c and the sg constructor
i915_gem_object_get_pages_gtt in i915_gem.c. That one is a bit a pain
since it doesn't start out with a simple struct page * array. But I
think we could just coalesce sg entries anyway without reaping any
size benefits for the allocate sg table, to ensure that i915 doesn't
break accidentally when facing such an sg table.

I'm rather busy the next few days, but should be able to look into
patches later on.

>>>> For the patch itself I think there's now a generic pages_to_sg helper
>>>> in the dma core which does exactly what you want it to do. I can dig
>>>> it out if you can't find it.
>>>>
>>>
>>> Sorry, I din't get this part. Please elaborate a bit.
>>
>> See sg_alloc_table_from_pages in lib/scatterlist.c introduced with:
>>
>> commit efc42bc98058a36d761b16a114823db1a902ed05
>> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
>> Date:   Mon Jun 18 09:25:01 2012 +0200
>>
>>     scatterlist: add sg_alloc_table_from_pages function
>>
>
> Yes, my patch is using sg_alloc_table_from_pages

Oops, I've totally missed that. My apologies for the confusion.

Thanks, Daniel
Daniel Vetter March 19, 2013, 9:09 a.m. UTC | #8
On Tue, Jan 29, 2013 at 09:10:40AM -0800, Aaron Plattner wrote:
> On 01/28/2013 05:38 AM, Rahul Sharma wrote:
> >It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to
> >sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure in
> >creating SG table for the buffers having more than 204 physical pages i.e.
> >equal to SG_MAX_SINGLE_ALLOC.
> >
> >When using sg_alloc_table_from_pages interface, in place of sg_alloc_table,
> >page list will be passes to get each contiguous section which is represented
> >by a single entry in the table. For a Contiguous Buffer, number of entries
> >should be equal to 1.
> >
> >Following check is causing the failure which is not applicable for Non-Contig
> >buffers:
> >
> >	if (WARN_ON_ONCE(nents > max_ents))
> >		return -EINVAL;
> >
> >Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU
> >supprot. NOUVEAU and RADEON platforms also depends on drm_prime_pages_to_sg
> >helper function.
> >
> >This set is base on "exynos-drm-fixes" branch at
> >http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git
> >
> >Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
> 
> Reviewed-by: Aaron Plattner <aplattner@nvidia.com>
> 
> I also verified that this reduces my 2025-entry sg_table to 6 entries, so
> 
> Tested-by: Aaron Plattner <aplattner@nvidia.com>

I've just merged the drm/i915 fixes and this patch on top to
drm-intel-next-queued. I'll send the pull request to Dave somewhen next
week, presuming nothing unexpected blows up. Thanks for the patch&review.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7f12573..072ee08 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -217,21 +217,17 @@  int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
 struct sg_table *drm_prime_pages_to_sg(struct page **pages, int nr_pages)
 {
 	struct sg_table *sg = NULL;
-	struct scatterlist *iter;
-	int i;
 	int ret;
 
 	sg = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
 	if (!sg)
 		goto out;
 
-	ret = sg_alloc_table(sg, nr_pages, GFP_KERNEL);
+	ret = sg_alloc_table_from_pages(sg, pages, nr_pages, 0,
+				nr_pages << PAGE_SHIFT, GFP_KERNEL);
 	if (ret)
 		goto out;
 
-	for_each_sg(sg->sgl, iter, nr_pages, i)
-		sg_set_page(iter, pages[i], PAGE_SIZE, 0);
-
 	return sg;
 out:
 	kfree(sg);