Message ID | 445bc352-75d7-438f-96ef-c2411215628d@deltatee.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hi Christoph,
>
> What are your thoughts on an approach like the following untested
> draft patch.
>
> The patch (if fleshed out) makes it so iomem can be used in an sgl
> and WARN_ONs will occur in places where drivers attempt to access
> iomem directly through the sgl.
>
> I'd also probably create a p2pmem_alloc_sgl helper function so driver
> writers wouldn't have to mess with sg_set_iomem_page.
>
> With all that in place, it should be relatively safe for drivers to
> implement p2pmem even though we'd still technically be violating the
> __iomem boundary in some places.

Just reacting to this mail, I still haven't had a chance to take a
look at the rest of the series.

The pfn_t type was invented to carry extra type and page lookup
information about the memory behind a given pfn. At first glance that
seems a more natural place to carry an indication that this is an
"I/O" pfn.
On 03/04/17 03:44 PM, Dan Williams wrote:
> On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>> Hi Christoph,
>>
>> What are your thoughts on an approach like the following untested
>> draft patch.
>>
>> The patch (if fleshed out) makes it so iomem can be used in an sgl
>> and WARN_ONs will occur in places where drivers attempt to access
>> iomem directly through the sgl.
>>
>> I'd also probably create a p2pmem_alloc_sgl helper function so driver
>> writers wouldn't have to mess with sg_set_iomem_page.
>>
>> With all that in place, it should be relatively safe for drivers to
>> implement p2pmem even though we'd still technically be violating the
>> __iomem boundary in some places.
>
> Just reacting to this mail, I still haven't had a chance to take a
> look at the rest of the series.
>
> The pfn_t type was invented to carry extra type and page lookup
> information about the memory behind a given pfn. At first glance that
> seems a more natural place to carry an indication that this is an
> "I/O" pfn.

I agree... But what are the plans for pfn_t? Is anyone working on using
it in the scatterlist code? Currently it's not there yet, and given the
assertion that we will continue to use struct page for DMA, is that a
direction we'd want to go?

Logan
On Mon, Apr 3, 2017 at 3:10 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> On 03/04/17 03:44 PM, Dan Williams wrote:
>> Just reacting to this mail, I still haven't had a chance to take a
>> look at the rest of the series.
>>
>> The pfn_t type was invented to carry extra type and page lookup
>> information about the memory behind a given pfn. At first glance that
>> seems a more natural place to carry an indication that this is an
>> "I/O" pfn.
>
> I agree... But what are the plans for pfn_t? Is anyone working on using
> it in the scatterlist code? Currently it's not there yet and given the
> assertion that we will continue to be using struct page for DMA is that
> a direction we'd want to go?

I wouldn't necessarily conflate supporting pfn_t in the scatterlist
with the stalled struct-page-less DMA effort. A pfn_t_to_page()
conversion will still work and be required. However you're right, the
minute we use pfn_t for this we're into the realm of special case
drivers that understand scatterlists with special "I/O-pfn_t" entries.

However, maybe that's what we want? I think peer-to-peer DMA is not a
general purpose feature unless/until we get it standardized in PCI. So
maybe drivers with special case scatterlist support is exactly what we
want for now.

Thoughts?
On 03/04/17 04:47 PM, Dan Williams wrote:
> I wouldn't necessarily conflate supporting pfn_t in the scatterlist
> with the stalled struct-page-less DMA effort. A pfn_t_to_page()
> conversion will still work and be required. However you're right, the
> minute we use pfn_t for this we're into the realm of special case
> drivers that understand scatterlists with special "I/O-pfn_t" entries.

Well yes, it would certainly be possible to convert the scatterlist code
from page_link to pfn_t. (The only slightly tricky thing is that
scatterlist uses extra chaining bits and pfn_t uses extra flag bits, so
they'd have to be harmonized somehow.) But if we aren't moving toward
struct-page-less DMA, I fail to see the point of the conversion.

I'll definitely need IO scatterlists of some form or another, and I like
pfn_t, but right now it just seems like extra work with unclear benefit.
(Though, if someone told me that I can't use a third bit in the
page_link field then maybe that would be a good reason to move to pfn_t.)

> However, maybe that's what we want? I think peer-to-peer DMA is not a
> general purpose feature unless/until we get it standardized in PCI. So
> maybe drivers with special case scatterlist support is exactly what we
> want for now.

Well, I think this should be completely independent from PCI code. I see
no reason why we can't have infrastructure for DMA on iomem from any
bus. Largely all the work I've done in this area is completely agnostic
to the bus in use. (Except for any kind of white/black list when it is
used.)

The "special case scatterlist" is essentially what I'm proposing in the
patch I sent upthread; it just stores the flag in the page_link instead
of in a pfn_t.

Logan
On Mon, Apr 3, 2017 at 4:12 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> On 03/04/17 04:47 PM, Dan Williams wrote:
>> However, maybe that's what we want? I think peer-to-peer DMA is not a
>> general purpose feature unless/until we get it standardized in PCI. So
>> maybe drivers with special case scatterlist support is exactly what we
>> want for now.
>
> Well, I think this should be completely independent from PCI code. I see
> no reason why we can't have infrastructure for DMA on iomem from any
> bus. Largely all the work I've done in this area is completely agnostic
> to the bus in use. (Except for any kind of white/black list when it is
> used.)

The completely agnostic part is where I get worried, but I shouldn't
say any more until I actually read the patch. The worry is cases where
this agnostic enabling allows unsuspecting code paths to do the wrong
thing, like bypassing iomem safety.
> The "special case scatterlist" is essentially what I'm proposing in the
> patch I sent upthread, it just stores the flag in the page_link instead
> of in a pfn_t.

Makes sense. The suggestion of pfn_t was to try to get more type safety
throughout the stack, so that, again, unsuspecting code paths that get
an I/O pfn aren't able to do things like page_address() or kmap()
without failing. I'll stop commenting now and set aside some time to go
read the patches.
Hi Dan,

On 03/04/17 06:07 PM, Dan Williams wrote:
> The completely agnostic part is where I get worried, but I shouldn't
> say any more until I actually read the patch. The worry is cases where
> this agnostic enabling allows unsuspecting code paths to do the wrong
> thing, like bypass iomem safety.

Yup, you're right: the iomem safety issue is a really difficult problem.
I think replacing struct page with pfn_t in a bunch of places is
probably going to be a requirement for my work. However, this is going
to be a very large undertaking.

I've done an audit of sg_page users and there will indeed be some
difficult cases. However, I'm going to start doing some cleanup and
semantic changes to hopefully move in that direction. The first step
I've chosen to look at is to create an sg_kmap interface which replaces
about 77 (out of ~340) sg_page users. I'm hoping the new interface can
have the semantic that sg_kmap can fail (which would happen in the case
that no suitable page exists).

Eventually, I'd want to get to a place where sg_page either doesn't
exist or can fail and is always checked. At that point, swapping out
pfn_t in the sgl would be manageable.

Thoughts?

Logan
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0007b79..bd690a2c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,9 @@
 
 #include <uapi/linux/dma-buf.h>
 
+/* Avoid the highmem.h macro from aliasing our ops->kunmap_atomic */
+#undef kunmap_atomic
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..7608da0 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <linux/bug.h>
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/io.h>
 
 struct scatterlist {
@@ -53,6 +54,9 @@ struct sg_table {
  *
  * If bit 1 is set, then this sg entry is the last element in a list.
  *
+ * We also use bit 2 to indicate whether the page_link points to an
+ * iomem page or not.
+ *
  * See sg_next().
  *
  */
@@ -64,10 +68,17 @@ struct sg_table {
  * a valid sg entry, or whether it points to the start of a new scatterlist.
  * Those low bits are there for everyone!
  * (thanks mason :-)
  */
-#define sg_is_chain(sg)		((sg)->page_link & 0x01)
-#define sg_is_last(sg)		((sg)->page_link & 0x02)
+#define PAGE_LINK_MASK		0x7
+#define PAGE_LINK_CHAIN		0x1
+#define PAGE_LINK_LAST		0x2
+#define PAGE_LINK_IOMEM		0x4
+
+#define sg_is_chain(sg)		((sg)->page_link & PAGE_LINK_CHAIN)
+#define sg_is_last(sg)		((sg)->page_link & PAGE_LINK_LAST)
 #define sg_chain_ptr(sg)	\
-	((struct scatterlist *) ((sg)->page_link & ~0x03))
+	((struct scatterlist *) ((sg)->page_link & ~(PAGE_LINK_CHAIN | \
+						     PAGE_LINK_LAST)))
+#define sg_is_iomem(sg)		((sg)->page_link & PAGE_LINK_IOMEM)
 
 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -81,13 +92,13 @@ struct sg_table {
  **/
 static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
 {
-	unsigned long page_link = sg->page_link & 0x3;
+	unsigned long page_link = sg->page_link & PAGE_LINK_MASK;
 
 	/*
 	 * In order for the low bit stealing approach to work, pages
-	 * must be aligned at a 32-bit boundary as a minimum.
+	 * must be aligned at a 64-bit boundary as a minimum.
 	 */
-	BUG_ON((unsigned long) page & 0x03);
+	BUG_ON((unsigned long) page & PAGE_LINK_MASK);
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 	BUG_ON(sg_is_chain(sg));
@@ -117,13 +128,56 @@ static inline void sg_set_page(struct scatterlist *sg, struct page *page,
 	sg->length = len;
 }
 
+/**
+ * sg_set_iomem_page - Set sg entry to point at given iomem page
+ * @sg:		 SG entry
+ * @page:	 The page
+ * @len:	 Length of data
+ * @offset:	 Offset into page
+ *
+ * Description:
+ *   Same as sg_set_page but used when the page is a ZONE_DEVICE page that
+ *   points to IO memory.
+ *
+ **/
+static inline void sg_set_iomem_page(struct scatterlist *sg, struct page *page,
+				     unsigned int len, unsigned int offset)
+{
+	sg_set_page(sg, page, len, offset);
+	sg->page_link |= PAGE_LINK_IOMEM;
+}
+
 static inline struct page *sg_page(struct scatterlist *sg)
 {
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 	BUG_ON(sg_is_chain(sg));
 #endif
-	return (struct page *)((sg)->page_link & ~0x3);
+	return (struct page *)((sg)->page_link & ~PAGE_LINK_MASK);
+}
+
+static inline void *sg_kmap(struct scatterlist *sg)
+{
+	WARN_ON(sg_is_iomem(sg));
+
+	return kmap(sg_page(sg));
+}
+
+static inline void sg_kunmap(struct scatterlist *sg, void *addr)
+{
+	/* kunmap() takes the page, not the mapped address */
+	kunmap(sg_page(sg));
+}
+
+static inline void *sg_kmap_atomic(struct scatterlist *sg)
+{
+	WARN_ON(sg_is_iomem(sg));
+
+	return kmap_atomic(sg_page(sg));
+}
+
+static inline void sg_kunmap_atomic(struct scatterlist *sg, void *addr)
+{
+	kunmap_atomic(addr);
 }
 
 /**
@@ -171,7 +225,8 @@ static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
 	 * Set lowest bit to indicate a link pointer, and make sure to clear
 	 * the termination bit if it happens to be set.
 	 */
-	prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+	prv[prv_nents - 1].page_link =
+		((unsigned long) sgl & ~PAGE_LINK_MASK) | PAGE_LINK_CHAIN;
 }
 
 /**
@@ -191,8 +246,8 @@ static inline void sg_mark_end(struct scatterlist *sg)
 	/*
 	 * Set termination bit, clear potential chain bit
 	 */
-	sg->page_link |= 0x02;
-	sg->page_link &= ~0x01;
+	sg->page_link &= ~PAGE_LINK_MASK;
+	sg->page_link |= PAGE_LINK_LAST;
 }
 
 /**
@@ -208,7 +263,7 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 #endif
-	sg->page_link &= ~0x02;
+	sg->page_link &= ~PAGE_LINK_LAST;
 }
 
 /**
@@ -383,6 +438,7 @@ static inline dma_addr_t sg_page_iter_dma_address(struct sg_page_iter *piter)
 #define SG_MITER_ATOMIC		(1 << 0)	/* use kmap_atomic */
 #define SG_MITER_TO_SG		(1 << 1)	/* flush back to phys on unmap */
 #define SG_MITER_FROM_SG	(1 << 2)	/* nop */
+#define SG_MITER_IOMEM		(1 << 3)	/* support iomem in miter ops */
 
 struct sg_mapping_iter {
 	/* the following three fields can be accessed directly */
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c6cf822..6d8f39b 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -580,6 +580,9 @@ bool sg_miter_next(struct sg_mapping_iter *miter)
 		if (!sg_miter_get_next_page(miter))
 			return false;
 
+	if (!(miter->__flags & SG_MITER_IOMEM))
+		WARN_ON(sg_is_iomem(miter->piter.sg));
+
 	miter->page = sg_page_iter_page(&miter->piter);
 	miter->consumed = miter->length = miter->__remaining;
 
@@ -651,7 +654,7 @@ size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
 {
 	unsigned int offset = 0;
 	struct sg_mapping_iter miter;
-	unsigned int sg_flags = SG_MITER_ATOMIC;
+	unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_IOMEM;
 
 	if (to_buffer)
Hi Christoph,

What are your thoughts on an approach like the following untested
draft patch.

The patch (if fleshed out) makes it so iomem can be used in an sgl
and WARN_ONs will occur in places where drivers attempt to access
iomem directly through the sgl.

I'd also probably create a p2pmem_alloc_sgl helper function so driver
writers wouldn't have to mess with sg_set_iomem_page.

With all that in place, it should be relatively safe for drivers to
implement p2pmem even though we'd still technically be violating the
__iomem boundary in some places.

Logan

commit b435a154a4ec4f82766f6ab838092c3c5a9388ac
Author: Logan Gunthorpe <logang@deltatee.com>
Date:   Wed Feb 8 12:44:52 2017 -0700

    scatterlist: Add support for iomem pages

    This patch steals another bit from the page_link field to indicate the
    sg points to iomem. In sg_copy_buffer we use this flag to select
    between memcpy and iomemcpy. Other sg_miter users will get a WARN_ON
    unless they indicate they support iomemory by setting the
    SG_MITER_IOMEM flag.

    Also added are sg_kmap functions which would replace a common pattern
    of kmap(sg_page(sg)). These new functions then also warn if the caller
    tries to map io memory. Another option may be to automatically copy
    the iomem to a new page and return that transparently to the driver.

    Another coccinelle patch would then be done to convert
    kmap(sg_page(sg)) instances to the appropriate sg_kmap calls.
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

 		sg_flags |= SG_MITER_FROM_SG;
@@ -668,10 +671,17 @@ size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
 
 		len = min(miter.length, buflen - offset);
 
-		if (to_buffer)
-			memcpy(buf + offset, miter.addr, len);
-		else
-			memcpy(miter.addr, buf + offset, len);
+		if (sg_is_iomem(miter.piter.sg)) {
+			if (to_buffer)
+				memcpy_fromio(buf + offset, miter.addr, len);
+			else
+				memcpy_toio(miter.addr, buf + offset, len);
+		} else {
+			if (to_buffer)
+				memcpy(buf + offset, miter.addr, len);
+			else
+				memcpy(miter.addr, buf + offset, len);
+		}
 
 		offset += len;
 	}