Message ID | 20201106170036.18713-5-logang@deltatee.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Userspace P2PDMA with O_DIRECT NVMe devices | expand |
On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: > We make use of the top bit of the dma_length to indicate a P2PDMA > segment. I don't think "we" can. There is nothing limiting the size of a SGL segment.
On 2020-11-09 09:12, Christoph Hellwig wrote: > On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: >> We make use of the top bit of the dma_length to indicate a P2PDMA >> segment. > > I don't think "we" can. There is nothing limiting the size of a SGL > segment. Right, the story behind ab2cbeb0ed30 ("iommu/dma: Handle SG length overflow better") comes immediately to mind, for one. If all the P2P users can agree to be in on the game then by all means implement this in the P2P code, but I don't think it belongs in the generic top-level scatterlist API. Robin.
On 2020-11-09 2:12 a.m., Christoph Hellwig wrote: > On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: >> We make use of the top bit of the dma_length to indicate a P2PDMA >> segment. > > I don't think "we" can. There is nothing limiting the size of a SGL > segment. Yes, I expected this would be the unacceptable part. Any alternative ideas? Logan
On Mon, Nov 9, 2020 at 8:47 AM Logan Gunthorpe <logang@deltatee.com> wrote: > > > > On 2020-11-09 2:12 a.m., Christoph Hellwig wrote: > > On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: > >> We make use of the top bit of the dma_length to indicate a P2PDMA > >> segment. > > > > I don't think "we" can. There is nothing limiting the size of a SGL > > segment. > > Yes, I expected this would be the unacceptable part. Any alternative ideas? Why is the SG_P2PDMA_FLAG needed as compared to checking the SGL segment-pages for is_pci_p2pdma_page()?
On 2020-12-09 6:22 p.m., Dan Williams wrote: > On Mon, Nov 9, 2020 at 8:47 AM Logan Gunthorpe <logang@deltatee.com> wrote: >> >> >> >> On 2020-11-09 2:12 a.m., Christoph Hellwig wrote: >>> On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: >>>> We make use of the top bit of the dma_length to indicate a P2PDMA >>>> segment. >>> >>> I don't think "we" can. There is nothing limiting the size of a SGL >>> segment. >> >> Yes, I expected this would be the unacceptable part. Any alternative ideas? > > Why is the SG_P2PDMA_FLAG needed as compared to checking the SGL > segment-pages for is_pci_p2pdma_page()? Because the DMA and page segments in the SGL aren't necessarily aligned... The IOMMU implementations can coalesce multiple pages into fewer DMA address ranges, so the page pointed to by sg->page_link may not be the one that corresponds to the address in sg->dma_address for a given segment. If that makes sense -- it's not the easiest thing to explain. Logan
On Wed, Dec 9, 2020 at 6:07 PM Logan Gunthorpe <logang@deltatee.com> wrote: > > > > On 2020-12-09 6:22 p.m., Dan Williams wrote: > > On Mon, Nov 9, 2020 at 8:47 AM Logan Gunthorpe <logang@deltatee.com> wrote: > >> > >> > >> > >> On 2020-11-09 2:12 a.m., Christoph Hellwig wrote: > >>> On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: > >>>> We make use of the top bit of the dma_length to indicate a P2PDMA > >>>> segment. > >>> > >>> I don't think "we" can. There is nothing limiting the size of a SGL > >>> segment. > >> > >> Yes, I expected this would be the unacceptable part. Any alternative ideas? > > > > Why is the SG_P2PDMA_FLAG needed as compared to checking the SGL > > segment-pages for is_pci_p2pdma_page()? > > Because the DMA and page segments in the SGL aren't necessarily aligned... > > The IOMMU implementations can coalesce multiple pages into fewer DMA > address ranges, so the page pointed to by sg->page_link may not be the > one that corresponds to the address in sg->dma_address for a given segment. > > If that makes sense -- it's not the easiest thing to explain. It does... Did someone already grab, or did you already consider the 3rd available bit in page_link? AFAICS only SG_CHAIN and SG_END are reserved. However, if you have a CONFIG_64BIT dependency for user-directed p2pdma that would seem to allow SG_P2PDMA_FLAG to be (0x4) in page_link.
On 2020-12-09 9:04 p.m., Dan Williams wrote: > On Wed, Dec 9, 2020 at 6:07 PM Logan Gunthorpe <logang@deltatee.com> wrote: >> >> >> >> On 2020-12-09 6:22 p.m., Dan Williams wrote: >>> On Mon, Nov 9, 2020 at 8:47 AM Logan Gunthorpe <logang@deltatee.com> wrote: >>>> >>>> >>>> >>>> On 2020-11-09 2:12 a.m., Christoph Hellwig wrote: >>>>> On Fri, Nov 06, 2020 at 10:00:25AM -0700, Logan Gunthorpe wrote: >>>>>> We make use of the top bit of the dma_length to indicate a P2PDMA >>>>>> segment. >>>>> >>>>> I don't think "we" can. There is nothing limiting the size of a SGL >>>>> segment. >>>> >>>> Yes, I expected this would be the unacceptable part. Any alternative ideas? >>> >>> Why is the SG_P2PDMA_FLAG needed as compared to checking the SGL >>> segment-pages for is_pci_p2pdma_page()? >> >> Because the DMA and page segments in the SGL aren't necessarily aligned... >> >> The IOMMU implementations can coalesce multiple pages into fewer DMA >> address ranges, so the page pointed to by sg->page_link may not be the >> one that corresponds to the address in sg->dma_address for a given segment. >> >> If that makes sense -- it's not the easiest thing to explain. > > It does... > > Did someone already grab, or did you already consider the 3rd > available bit in page_link? AFAICS only SG_CHAIN and SG_END are > reserved. However, if you have a CONFIG_64BIT dependency for > user-directed p2pdma that would seem to allow SG_P2PDMA_FLAG to be > (0x4) in page_link. Hmm, I half considered that, but I had came to the conclusion that given the mis-alignment I shouldn't be using the page side of the SGL. However, reconsidering now, that might actually be a reasonable option. However, the CONFIG_64BIT dependency would have to be on all P2PDMA, because we'd need to replace pci_p2pdma_map_sg() in all cases. I'm not sure if this would be a restriction people care about. Thanks, Logan
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 36c47e7e66a2..e738159d56f9 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -39,6 +39,10 @@ struct scatterlist { #define sg_dma_len(sg) ((sg)->length) #endif +#define SG_P2PDMA_FLAG (1U << 31) +#define sg_dma_p2pdma_len(sg) (sg_dma_len(sg) & ~SG_P2PDMA_FLAG) +#define sg_dma_is_p2pdma(sg) (sg_dma_len(sg) & SG_P2PDMA_FLAG) + struct sg_table { struct scatterlist *sgl; /* the list */ unsigned int nents; /* number of mapped entries */
We make use of the top bit of the dma_length to indicate a P2PDMA segment. Code that uses this will need to use sg_dma_p2pdma_len() to get the length and ensure no lengths exceed 2GB. An sg_dma_is_p2pdma() helper is included to check if a particular segment is p2pdma(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- include/linux/scatterlist.h | 4 ++++ 1 file changed, 4 insertions(+)