Message ID | 20201004154340.1080481-1-leon@kernel.org (mailing list archive) |
---|---|
Series | Dynamically allocate SG table from the pages |
On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> This series extends __sg_alloc_table_from_pages to allow chaining of
> new pages to already initialized SG table.
>
> This allows for the drivers to utilize the optimization of merging contiguous
> pages without a need to pre allocate all the pages and hold them in
> a very large temporary buffer prior to the call to SG table initialization.
>
> The second patch changes the Infiniband driver to use the new API. It
> removes duplicate functionality from the code and benefits the
> optimization of allocating dynamic SG table from pages.
>
> In huge pages system of 2MB page size, without this change, the SG table
> would contain x512 SG entries.
> E.g. for 100GB memory registration:
>
>           Number of entries    Size
>  Before   26214400             600.0MB
>  After    51200                1.2MB
>
> Thanks
>
> Maor Gottlieb (2):
>   lib/scatterlist: Add support in dynamic allocation of SG table from
>     pages
>   RDMA/umem: Move to allocate SG table from pages
>
> Tvrtko Ursulin (2):
>   tools/testing/scatterlist: Rejuvenate bit-rotten test
>   tools/testing/scatterlist: Show errors in human readable form

This looks OK, I'm going to send it into linux-next on the hmm tree
for awhile to see if anything gets broken. If there is more
remarks/tags/etc please continue

Thanks,
Jason

On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > This series extends __sg_alloc_table_from_pages to allow chaining of
> > new pages to already initialized SG table.
> >
> > This allows for the drivers to utilize the optimization of merging contiguous
> > pages without a need to pre allocate all the pages and hold them in
> > a very large temporary buffer prior to the call to SG table initialization.
> >
> > The second patch changes the Infiniband driver to use the new API. It
> > removes duplicate functionality from the code and benefits the
> > optimization of allocating dynamic SG table from pages.
> >
> > In huge pages system of 2MB page size, without this change, the SG table
> > would contain x512 SG entries.
> > E.g. for 100GB memory registration:
> >
> >           Number of entries    Size
> >  Before   26214400             600.0MB
> >  After    51200                1.2MB
> >
> > Thanks
> >
> > Maor Gottlieb (2):
> >   lib/scatterlist: Add support in dynamic allocation of SG table from
> >     pages
> >   RDMA/umem: Move to allocate SG table from pages
> >
> > Tvrtko Ursulin (2):
> >   tools/testing/scatterlist: Rejuvenate bit-rotten test
> >   tools/testing/scatterlist: Show errors in human readable form
>
> This looks OK, I'm going to send it into linux-next on the hmm tree
> for awhile to see if anything gets broken. If there is more
> remarks/tags/etc please continue

An idea that just crossed my mind: A pin_user_pages_sgt might be useful
for both rdma and drm, since this would avoid the possible huge interim
struct pages array for thp pages. Or anything else that could be coalesced
down into a single sg entry.

Not sure it's worth it, but would at least give a slightly neater
interface I think.
-Daniel

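For readers following along, the coalescing being discussed can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel implementation: `struct segment`, `struct seg_list` and `seg_list_append()` are hypothetical names, pages are represented by bare page frame numbers, and the per-segment size limit that a real SG table enforces is left out. The point it shows is that pages can be fed in batch by batch, with each new page merged into the last segment when it is physically adjacent, so no large interim page array has to be held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One coalesced segment: a run of physically contiguous pages. */
struct segment {
    uint64_t first_pfn;   /* page frame number of the first page */
    uint64_t npages;      /* number of contiguous pages in the run */
};

/* Caller-provided segment storage (hypothetical, not a kernel type). */
struct seg_list {
    struct segment *segs;
    size_t nsegs;
    size_t capacity;
};

/*
 * Append one batch of page frame numbers to an already initialized list.
 * A page that directly follows the last segment extends that segment;
 * any other page starts a new one. Returns 0 on success, -1 when the
 * storage is full.
 */
static int seg_list_append(struct seg_list *list, const uint64_t *pfns, size_t n)
{
    assert(list != NULL && list->segs != NULL);

    for (size_t i = 0; i < n; i++) {
        if (list->nsegs > 0) {
            struct segment *last = &list->segs[list->nsegs - 1];

            if (last->first_pfn + last->npages == pfns[i]) {
                last->npages++;   /* contiguous: merge into last segment */
                continue;
            }
        }
        if (list->nsegs == list->capacity)
            return -1;
        list->segs[list->nsegs].first_pfn = pfns[i];
        list->segs[list->nsegs].npages = 1;
        list->nsegs++;
    }
    return 0;
}
```

Because merging looks only at the last segment, contiguity is preserved across batch boundaries: appending pfns 100..102 and then 103, 104, 200 leaves two segments, one covering five pages and one covering a single page.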
On Wed, Oct 7, 2020 at 9:22 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Oct 06, 2020 at 12:41:22PM +0200, Daniel Vetter wrote:
> > On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > > > This series extends __sg_alloc_table_from_pages to allow chaining of
> > > > new pages to already initialized SG table.
> > > >
> > > > This allows for the drivers to utilize the optimization of merging contiguous
> > > > pages without a need to pre allocate all the pages and hold them in
> > > > a very large temporary buffer prior to the call to SG table initialization.
> > > >
> > > > The second patch changes the Infiniband driver to use the new API. It
> > > > removes duplicate functionality from the code and benefits the
> > > > optimization of allocating dynamic SG table from pages.
> > > >
> > > > In huge pages system of 2MB page size, without this change, the SG table
> > > > would contain x512 SG entries.
> > > > E.g. for 100GB memory registration:
> > > >
> > > >           Number of entries    Size
> > > >  Before   26214400             600.0MB
> > > >  After    51200                1.2MB
> > > >
> > > > Thanks
> > > >
> > > > Maor Gottlieb (2):
> > > >   lib/scatterlist: Add support in dynamic allocation of SG table from
> > > >     pages
> > > >   RDMA/umem: Move to allocate SG table from pages
> > > >
> > > > Tvrtko Ursulin (2):
> > > >   tools/testing/scatterlist: Rejuvenate bit-rotten test
> > > >   tools/testing/scatterlist: Show errors in human readable form
> > >
> > > This looks OK, I'm going to send it into linux-next on the hmm tree
> > > for awhile to see if anything gets broken. If there is more
> > > remarks/tags/etc please continue
> >
> > An idea that just crossed my mind: A pin_user_pages_sgt might be useful
> > for both rdma and drm, since this would avoid the possible huge interim
> > struct pages array for thp pages. Or anything else that could be coalesced
> > down into a single sg entry.
> >
> > Not sure it's worth it, but would at least give a slightly neater
> > interface I think.
>
> We've talked about it. Christoph wants to see this area move to a biovec
> interface instead of sgl, but it might still be worthwhile to have an
> interim step at least as an API consolidation.

Hm but then we'd need a new struct for the mapped side of things (which
would still be what you get from dma-buf). That would be quite a bit of
work to roll out everywhere, and sgt isn't such a huge misfit for passing
buffer object mappings and system memory backing storage around, and hence
what we've been (very slowly) converging drivers/gpu towards over the past
10 years or so.

And moving the dma_map step out of dma-buf doesn't work, because some of
the use-cases we have are for very special iommus which are managed by the
gpu driver directly. Stuff that e.g. rotates/retiles/compresses on the fly,
and is accessible by other (gfx related like video code, camera, ..)
devices. Not something I expect to ever be relevant for rdma since this
exists mostly on some small soc, but it's a thing.

Without that dma-buf could hand out biovec for struct_page backed stuff,
or some pfn_vec for the p2p stuff.

Anyway was just an idea, I guess we'll have to live with some impedance
mismatch since rolling out the one and only iovec structure which suits
everyone is I think impossible :-)

> Avoiding the page list would be complicated as we'd somehow have to
> code share the page table iterator scheme.

We're (slowly) getting towards thp for vram mappings and everything so I
guess for drivers/gpu we might make that happen. But yeah it'd be not so
pretty I think.
-Daniel

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v5:
 * Use sg_init_table to allocate table and avoid changes in __sg_alloc_table
 * Fix offset issue
v4: https://lore.kernel.org/lkml/20200927064647.3106737-1-leon@kernel.org
 * Fixed formatting in first patch.
 * Added fix (clear tmp_netnts) in first patch to fix i915 failure.
 * Added test patches
v3: https://lore.kernel.org/linux-rdma/20200922083958.2150803-1-leon@kernel.org/
 * Squashed Christopher's suggestion to avoid introducing a new API, but
   extend the existing one.
v2: https://lore.kernel.org/linux-rdma/20200916140726.839377-1-leon@kernel.org
 * Fixed indentations and comments
 * Deleted sg_alloc_next()
 * Squashed lib/scatterlist patches into one
v1: https://lore.kernel.org/lkml/20200910134259.1304543-1-leon@kernel.org
 * Changed _sg_chain to be __sg_chain
 * Added dependency on ARCH_NO_SG_CHAIN
 * Removed struct sg_append
v0:
 * https://lore.kernel.org/lkml/20200903121853.1145976-1-leon@kernel.org

--------------------------------------------------------------------------
From Maor:

This series extends __sg_alloc_table_from_pages to allow chaining of
new pages to already initialized SG table.

This allows for the drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in
a very large temporary buffer prior to the call to SG table initialization.

The second patch changes the Infiniband driver to use the new API. It
removes duplicate functionality from the code and benefits the
optimization of allocating dynamic SG table from pages.

In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
E.g. for 100GB memory registration:

          Number of entries    Size
 Before   26214400             600.0MB
 After    51200                1.2MB

Thanks

Maor Gottlieb (2):
  lib/scatterlist: Add support in dynamic allocation of SG table from
    pages
  RDMA/umem: Move to allocate SG table from pages

Tvrtko Ursulin (2):
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  tools/testing/scatterlist: Show errors in human readable form

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c  |  15 ++-
 drivers/infiniband/core/umem.c              |  94 ++-------------
 include/linux/scatterlist.h                 |  38 +++---
 lib/scatterlist.c                           | 125 ++++++++++++++++----
 tools/testing/scatterlist/Makefile          |   3 +-
 tools/testing/scatterlist/linux/mm.h        |  35 ++++++
 tools/testing/scatterlist/main.c            |  53 ++++++---
 8 files changed, 225 insertions(+), 150 deletions(-)

--
2.26.2
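
The figures in the Before/After table can be reproduced with a little arithmetic. The sketch below is a userspace check, not kernel code, and it rests on assumptions that the cover letter does not state: a 4 KiB base page for the "Before" row, one SG entry per 2 MiB huge page for the "After" row, binary units (1GB = 2^30 bytes), and 24 bytes per SG entry. The 24-byte value is inferred from the table itself (600.0MB / 26214400 entries); the real size of an entry depends on architecture and kernel configuration. The helper names `sg_entries()` and `sg_table_bytes()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of SG entries needed when every entry covers exactly one page. */
static uint64_t sg_entries(uint64_t region_bytes, uint64_t page_bytes)
{
    assert(page_bytes != 0);
    return (region_bytes + page_bytes - 1) / page_bytes;   /* round up */
}

/* Memory taken by the entries, for an assumed per-entry size. */
static uint64_t sg_table_bytes(uint64_t entries, uint64_t entry_bytes)
{
    return entries * entry_bytes;
}
```

With a 100GB region, `sg_entries()` gives 26214400 entries for 4 KiB pages and 51200 entries for 2 MiB pages, a factor of 512. At the assumed 24 bytes per entry that is 629145600 bytes (600MB) before and 1228800 bytes (about 1.2MB) after.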