Message ID | 1359380328-23058-1-git-send-email-rahul.sharma@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 01/28/2013 05:38 AM, Rahul Sharma wrote: > It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to > sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure in > creating SG table for the buffers having more than 204 physical pages i.e. > equal to SG_MAX_SINGLE_ALLOC. > > When using sg_alloc_table_from_pages interface, in place of sg_alloc_table, > page list will be passes to get each contiguous section which is represented > by a single entry in the table. For a Contiguous Buffer, number of entries > should be equal to 1. > > Following check is causing the failure which is not applicable for Non-Contig > buffers: > > if (WARN_ON_ONCE(nents > max_ents)) > return -EINVAL; > > Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU > supprot. NOUVEAU and RADEON platforms also depends on drm_prime_pages_to_sg > helper function. > > This set is base on "exynos-drm-fixes" branch at > http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git > > Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com> Reviewed-by: Aaron Plattner <aplattner@nvidia.com> I also verified that this reduces my 2025-entry sg_table to 6 entries, so Tested-by: Aaron Plattner <aplattner@nvidia.com>
On Tue, Jan 29, 2013 at 10:40 PM, Aaron Plattner <aplattner@nvidia.com> wrote: > On 01/28/2013 05:38 AM, Rahul Sharma wrote: >> >> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to >> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure >> in >> creating SG table for the buffers having more than 204 physical pages i.e. >> equal to SG_MAX_SINGLE_ALLOC. >> >> When using sg_alloc_table_from_pages interface, in place of >> sg_alloc_table, >> page list will be passes to get each contiguous section which is >> represented >> by a single entry in the table. For a Contiguous Buffer, number of entries >> should be equal to 1. >> >> Following check is causing the failure which is not applicable for >> Non-Contig >> buffers: >> >> if (WARN_ON_ONCE(nents > max_ents)) >> return -EINVAL; >> >> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU >> supprot. NOUVEAU and RADEON platforms also depends on >> drm_prime_pages_to_sg >> helper function. >> >> This set is base on "exynos-drm-fixes" branch at >> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git >> >> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com> > > > Reviewed-by: Aaron Plattner <aplattner@nvidia.com> > > I also verified that this reduces my 2025-entry sg_table to 6 entries, so > > Tested-by: Aaron Plattner <aplattner@nvidia.com> > > -- > Aaron > Thanks Aaron, I want to request stake-holders to review and test this patch for other platforms. regards, Rahul Sharma. > >> --- >> drivers/gpu/drm/drm_prime.c | 8 ++------ >> 1 file changed, 2 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c >> index 7f12573..072ee08 100644 >> --- a/drivers/gpu/drm/drm_prime.c >> +++ b/drivers/gpu/drm/drm_prime.c >> @@ -217,21 +217,17 @@ int drm_prime_fd_to_handle_ioctl(struct drm_device >> *dev, void *data, >> struct sg_table *drm_prime_pages_to_sg(struct page **pages, int >> nr_pages) >> { >> struct sg_table *sg = NULL; >> - struct scatterlist *iter; >> - int i; >> int ret; >> >> sg = kmalloc(sizeof(struct sg_table), GFP_KERNEL); >> if (!sg) >> goto out; >> >> - ret = sg_alloc_table(sg, nr_pages, GFP_KERNEL); >> + ret = sg_alloc_table_from_pages(sg, pages, nr_pages, 0, >> + nr_pages << PAGE_SHIFT, GFP_KERNEL); >> if (ret) >> goto out; >> >> - for_each_sg(sg->sgl, iter, nr_pages, i) >> - sg_set_page(iter, pages[i], PAGE_SIZE, 0); >> - >> return sg; >> out: >> kfree(sg); >> > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, Jan 31, 2013 at 9:38 AM, Rahul Sharma <r.sh.open@gmail.com> wrote: > On Tue, Jan 29, 2013 at 10:40 PM, Aaron Plattner <aplattner@nvidia.com> wrote: >> On 01/28/2013 05:38 AM, Rahul Sharma wrote: >>> >>> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to >>> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure >>> in >>> creating SG table for the buffers having more than 204 physical pages i.e. >>> equal to SG_MAX_SINGLE_ALLOC. >>> >>> When using sg_alloc_table_from_pages interface, in place of >>> sg_alloc_table, >>> page list will be passes to get each contiguous section which is >>> represented >>> by a single entry in the table. For a Contiguous Buffer, number of entries >>> should be equal to 1. >>> >>> Following check is causing the failure which is not applicable for >>> Non-Contig >>> buffers: >>> >>> if (WARN_ON_ONCE(nents > max_ents)) >>> return -EINVAL; >>> >>> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU >>> supprot. NOUVEAU and RADEON platforms also depends on >>> drm_prime_pages_to_sg >>> helper function. >>> >>> This set is base on "exynos-drm-fixes" branch at >>> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git >>> >>> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com> >> >> >> Reviewed-by: Aaron Plattner <aplattner@nvidia.com> >> >> I also verified that this reduces my 2025-entry sg_table to 6 entries, so >> >> Tested-by: Aaron Plattner <aplattner@nvidia.com> >> >> -- >> Aaron >> > > Thanks Aaron, > > I want to request stake-holders to review and test this patch for > other platforms. Iirc the old i915 dma_buf import code presumed that each sg entry points to exactly one page. Yeah, Dave cut a few corners in the initial implementation. Since Chris' rework to use sg_tables internally in i915.ko I think we should be safe, but would need to do a full audit of the code. No idea what the exact situation in ttm is. So I think step one is for you to crawl through the existing drm prime drivers and check that you don't break any hidden assumptions about this. For the patch itself I think there's now a generic pages_to_sg helper in the dma core which does exactly what you want it to do. I can dig it out if you can't find it. Cheers, Daniel
On Thu, Jan 31, 2013 at 2:23 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Thu, Jan 31, 2013 at 9:38 AM, Rahul Sharma <r.sh.open@gmail.com> wrote: >> On Tue, Jan 29, 2013 at 10:40 PM, Aaron Plattner <aplattner@nvidia.com> wrote: >>> On 01/28/2013 05:38 AM, Rahul Sharma wrote: >>>> >>>> It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to >>>> sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure >>>> in >>>> creating SG table for the buffers having more than 204 physical pages i.e. >>>> equal to SG_MAX_SINGLE_ALLOC. >>>> >>>> When using sg_alloc_table_from_pages interface, in place of >>>> sg_alloc_table, >>>> page list will be passes to get each contiguous section which is >>>> represented >>>> by a single entry in the table. For a Contiguous Buffer, number of entries >>>> should be equal to 1. >>>> >>>> Following check is causing the failure which is not applicable for >>>> Non-Contig >>>> buffers: >>>> >>>> if (WARN_ON_ONCE(nents > max_ents)) >>>> return -EINVAL; >>>> >>>> Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU >>>> supprot. NOUVEAU and RADEON platforms also depends on >>>> drm_prime_pages_to_sg >>>> helper function. >>>> >>>> This set is base on "exynos-drm-fixes" branch at >>>> http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git >>>> >>>> Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com> >>> >>> >>> Reviewed-by: Aaron Plattner <aplattner@nvidia.com> >>> >>> I also verified that this reduces my 2025-entry sg_table to 6 entries, so >>> >>> Tested-by: Aaron Plattner <aplattner@nvidia.com> >>> >>> -- >>> Aaron >>> >> >> Thanks Aaron, >> >> I want to request stake-holders to review and test this patch for >> other platforms. > > Iirc the old i915 dma_buf import code presumed that each sg entry > points to exactly one page. Yeah, Dave cut a few corners in the > initial implementation. Since Chris' rework to use sg_tables > internally in i915.ko I think we should be safe, but would need to do > a full audit of the code. No idea what the exact situation in ttm is. > > So I think step one is for you to crawl through the existing drm prime > drivers and check that you don't break any hidden assumptions about > this. Thanks Daniel, I have parsed the related code and it looks fine to me. I couldn't find any code section, expecting sg-tables with single-page sgl entries. I just want to ensure again that it doesn't cause any side effects on various platforms. > > For the patch itself I think there's now a generic pages_to_sg helper > in the dma core which does exactly what you want it to do. I can dig > it out if you can't find it. > Sorry, I din't get this part. Please elaborate a bit. regards, Rahul Sharma. > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, Jan 31, 2013 at 12:54 PM, Rahul Sharma <r.sh.open@gmail.com> wrote: > I have parsed the related code and it looks fine to me. I couldn't find > any code section, expecting sg-tables with single-page sgl entries. I > just want to ensure again that it doesn't cause any side effects on > various platforms. Just chatted with Chris Wilson on our team, and at last i915.ko _has_ code paths which rely on this. See the reloc handling offset computations in i915_gem_execbuf.c. I haven't checked ttm/radeon/nouveau/vmwgfx simply because I don't have the time right now, and I know that doing this carefully will blow through a few days. So this patch will break prime buffer sharing on the desktop. The other thing is that only doing sg entry coalescing for prime/dma_buf will lead to a QA problem, since all the normal sg tables still have a 1:1 relationship. So imo the right way to move forward with this is to convert the various sg table constructors in i915/ttm/radeon/nouveau/... over to coalesce pages. This allows us to fix any fallout step-by-step. Then, once each driver is ready for this, we can merge your patch. >> For the patch itself I think there's now a generic pages_to_sg helper >> in the dma core which does exactly what you want it to do. I can dig >> it out if you can't find it. >> > > Sorry, I din't get this part. Please elaborate a bit. See sg_alloc_table_from_pages in lib/scatterlist.c introduced with: commit efc42bc98058a36d761b16a114823db1a902ed05 Author: Tomasz Stanislawski <t.stanislaws@samsung.com> Date: Mon Jun 18 09:25:01 2012 +0200 scatterlist: add sg_alloc_table_from_pages function Cheers, Daniel
On Thu, Jan 31, 2013 at 5:47 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Thu, Jan 31, 2013 at 12:54 PM, Rahul Sharma <r.sh.open@gmail.com> wrote: >> I have parsed the related code and it looks fine to me. I couldn't find >> any code section, expecting sg-tables with single-page sgl entries. I >> just want to ensure again that it doesn't cause any side effects on >> various platforms. > > Just chatted with Chris Wilson on our team, and at last i915.ko _has_ > code paths which rely on this. See the reloc handling offset > computations in i915_gem_execbuf.c. I haven't checked > ttm/radeon/nouveau/vmwgfx simply because I don't have the time right > now, and I know that doing this carefully will blow through a few > days. So this patch will break prime buffer sharing on the desktop. > > The other thing is that only doing sg entry coalescing for > prime/dma_buf will lead to a QA problem, since all the normal sg > tables still have a 1:1 relationship. So imo the right way to move > forward with this is to convert the various sg table constructors in > i915/ttm/radeon/nouveau/... over to coalesce pages. This allows us to > fix any fallout step-by-step. > > Then, once each driver is ready for this, we can merge your patch. > Looks fine to me. >>> For the patch itself I think there's now a generic pages_to_sg helper >>> in the dma core which does exactly what you want it to do. I can dig >>> it out if you can't find it. >>> >> >> Sorry, I din't get this part. Please elaborate a bit. > > See sg_alloc_table_from_pages in lib/scatterlist.c introduced with: > > commit efc42bc98058a36d761b16a114823db1a902ed05 > Author: Tomasz Stanislawski <t.stanislaws@samsung.com> > Date: Mon Jun 18 09:25:01 2012 +0200 > > scatterlist: add sg_alloc_table_from_pages function > Yes, my patch is using sg_alloc_table_from_pages. regards, Rahul Sharma. > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, Jan 31, 2013 at 2:26 PM, Rahul Sharma <r.sh.open@gmail.com> wrote: >> The other thing is that only doing sg entry coalescing for >> prime/dma_buf will lead to a QA problem, since all the normal sg >> tables still have a 1:1 relationship. So imo the right way to move >> forward with this is to convert the various sg table constructors in >> i915/ttm/radeon/nouveau/... over to coalesce pages. This allows us to >> fix any fallout step-by-step. >> >> Then, once each driver is ready for this, we can merge your patch. > > Looks fine to me. I've looked some more at the drivers, and it seems like ttm uses the sg tables only for dma_buf imported objects. Also, all drivers already use drm_prime_sg_to_page_addr_arrays, which does the right thing. So ttm drivers should indeed be all fine (I've stopped chaising callchains before hitting ttm_tt_populate, where the magic happens). So that seems to only leave i915, which has a few backing storage walkers in i915_gem_execbuffer.c and the sg constructor i915_gem_object_get_pages_gtt in i915_gem.c. That one is a bit a pain since it doesn't start out with a simple struct page * array. But I think we could just coalesce sg entries anyway without reaping any size benefits for the allocate sg table, to ensure that i915 doesn't break accidentally when facing such an sg table. I'm rather busy the next few days, but should be able to look into patches later on. >>>> For the patch itself I think there's now a generic pages_to_sg helper >>>> in the dma core which does exactly what you want it to do. I can dig >>>> it out if you can't find it. >>>> >>> >>> Sorry, I din't get this part. Please elaborate a bit. >> >> See sg_alloc_table_from_pages in lib/scatterlist.c introduced with: >> >> commit efc42bc98058a36d761b16a114823db1a902ed05 >> Author: Tomasz Stanislawski <t.stanislaws@samsung.com> >> Date: Mon Jun 18 09:25:01 2012 +0200 >> >> scatterlist: add sg_alloc_table_from_pages function >> > > Yes, my patch is using sg_alloc_table_from_pages Oops, I've totally missed that. My apologies for the confusion. Thanks, Daniel
On Tue, Jan 29, 2013 at 09:10:40AM -0800, Aaron Plattner wrote: > On 01/28/2013 05:38 AM, Rahul Sharma wrote: > >It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to > >sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure in > >creating SG table for the buffers having more than 204 physical pages i.e. > >equal to SG_MAX_SINGLE_ALLOC. > > > >When using sg_alloc_table_from_pages interface, in place of sg_alloc_table, > >page list will be passes to get each contiguous section which is represented > >by a single entry in the table. For a Contiguous Buffer, number of entries > >should be equal to 1. > > > >Following check is causing the failure which is not applicable for Non-Contig > >buffers: > > > > if (WARN_ON_ONCE(nents > max_ents)) > > return -EINVAL; > > > >Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU > >supprot. NOUVEAU and RADEON platforms also depends on drm_prime_pages_to_sg > >helper function. > > > >This set is base on "exynos-drm-fixes" branch at > >http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git > > > >Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com> > > Reviewed-by: Aaron Plattner <aplattner@nvidia.com> > > I also verified that this reduces my 2025-entry sg_table to 6 entries, so > > Tested-by: Aaron Plattner <aplattner@nvidia.com> I've just merged the drm/i915 fixes and this patch on top to drm-intel-next-queued. I'll send the pull request to Dave somewhen next week, presuming nothing unexpected blows up. Thanks for the patch&review. -Daniel
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 7f12573..072ee08 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -217,21 +217,17 @@ int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, struct sg_table *drm_prime_pages_to_sg(struct page **pages, int nr_pages) { struct sg_table *sg = NULL; - struct scatterlist *iter; - int i; int ret; sg = kmalloc(sizeof(struct sg_table), GFP_KERNEL); if (!sg) goto out; - ret = sg_alloc_table(sg, nr_pages, GFP_KERNEL); + ret = sg_alloc_table_from_pages(sg, pages, nr_pages, 0, + nr_pages << PAGE_SHIFT, GFP_KERNEL); if (ret) goto out; - for_each_sg(sg->sgl, iter, nr_pages, i) - sg_set_page(iter, pages[i], PAGE_SIZE, 0); - return sg; out: kfree(sg);
It fixes the issue arises due to passing 'nr_pages' in place of 'nents' to sg_alloc_table. When ARM_HAS_SG_CHAIN is disabled, it is causing failure in creating SG table for the buffers having more than 204 physical pages i.e. equal to SG_MAX_SINGLE_ALLOC. When using sg_alloc_table_from_pages interface, in place of sg_alloc_table, page list will be passes to get each contiguous section which is represented by a single entry in the table. For a Contiguous Buffer, number of entries should be equal to 1. Following check is causing the failure which is not applicable for Non-Contig buffers: if (WARN_ON_ONCE(nents > max_ents)) return -EINVAL; Above patch is well tested for EXYNOS4 and EXYNOS5 for with/wihtout IOMMU supprot. NOUVEAU and RADEON platforms also depends on drm_prime_pages_to_sg helper function. This set is base on "exynos-drm-fixes" branch at http://git.kernel.org/?p=linux/kernel/git/daeinki/drm-exynos.git Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com> --- drivers/gpu/drm/drm_prime.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)