Message ID | 8a428278-fe0f-ca1f-2b6a-96fdbb363db4@arm.com (mailing list archive)
---|---
State | New, archived
On 12/01/17 18:07, Robin Murphy wrote:
> On 12/01/17 17:15, Vladimir Murzin wrote:
>> On 12/01/17 17:04, Robin Murphy wrote:
>>> On 12/01/17 16:52, Vladimir Murzin wrote:
>>>> On 12/01/17 10:55, Benjamin Gaignard wrote:
>>>>> 2017-01-12 11:35 GMT+01:00 Benjamin Gaignard <benjamin.gaignard@linaro.org>:
>>>>>> 2017-01-11 15:34 GMT+01:00 Vladimir Murzin <vladimir.murzin@arm.com>:
>>>>>>> On 11/01/17 13:17, Benjamin Gaignard wrote:
>>>>>>>> 2017-01-10 15:18 GMT+01:00 Vladimir Murzin <vladimir.murzin@arm.com>:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It seems that the addition of cache support for M-class CPUs has
>>>>>>>>> uncovered a latent bug in DMA usage. The NOMMU memory model has
>>>>>>>>> been treated as always consistent; however, for the R/M classes of
>>>>>>>>> CPU, memory can be covered by an MPU, which in turn might configure
>>>>>>>>> RAM as Normal, i.e. bufferable and cacheable. That breaks
>>>>>>>>> dma_alloc_coherent() and friends, since data can now get stuck in
>>>>>>>>> caches or be buffered.
>>>>>>>>>
>>>>>>>>> This patch set tries to address the issue by providing a region of
>>>>>>>>> memory suitable for consistent DMA operations, on the assumption
>>>>>>>>> that such a region is marked non-cacheable by the MPU. Robin
>>>>>>>>> suggested advertising such memory as a reserved shared-dma-pool,
>>>>>>>>> rather than using a homebrew command-line option, and extending
>>>>>>>>> dma-coherent to provide a default DMA area in a similar way to what
>>>>>>>>> is done for CMA (PATCH 2/5). That lets us offload all the
>>>>>>>>> bookkeeping onto the generic coherent DMA framework, and it seems
>>>>>>>>> it might be reused by other architectures such as c6x and blackfin.
>>>>>>>>>
>>>>>>>>> A dedicated DMA region is required in all cases other than:
>>>>>>>>> - the MMU/MPU is off
>>>>>>>>> - the CPU is v7m without cache support
>>>>>>>>> - the device is coherent
>>>>>>>>>
>>>>>>>>> If one of the above conditions holds, DMA operations are forced to
>>>>>>>>> be coherent and wired up to dma_noop_ops.
>>>>>>>>>
>>>>>>>>> To make life easier, the NOMMU DMA operations are kept in a
>>>>>>>>> separate compilation unit.
>>>>>>>>>
>>>>>>>>> Since the issue was reported at the same time as Benjamin sent his
>>>>>>>>> patch [1] to allow mmap for NOMMU, his case is also addressed in
>>>>>>>>> this series (PATCH 1/5 and PATCH 3/5).
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> I have tested this v4 on my setup (stm32f4, no cache, no MPU) and
>>>>>>>> unfortunately it doesn't work with my drm/kms driver.
>>>>>>>
>>>>>>> I guess the same goes for fbmem, but it would be better to have
>>>>>>> confirmation; the amba-clcd I use has not been ported to drm/kms
>>>>>>> (yet), so I can't test.
>>>>>>>
>>>>>>>> I don't get any errors, but nothing is displayed, unlike what I see
>>>>>>>> when using the current dma-mapping code. I guess the issue comes
>>>>>>>> from dma-noop, where __get_free_pages() is used instead of
>>>>>>>> alloc_pages() as in dma-mapping.
>>>>>>>
>>>>>>> Unless I've missed something, below is the call stack for both:
>>>>>>>
>>>>>>> #1
>>>>>>> __alloc_simple_buffer
>>>>>>> __dma_alloc_buffer
>>>>>>> alloc_pages
>>>>>>> split_page
>>>>>>> __dma_clear_buffer
>>>>>>> memset
>>>>>>> page_address
>>>>>>>
>>>>>>> #2
>>>>>>> __get_free_pages
>>>>>>> alloc_pages
>>>>>>> page_address
>>>>>>>
>>>>>>> So the difference is that the nommu case in dma-mapping.c zeroes the
>>>>>>> memory, handles DMA_ATTR_NO_KERNEL_MAPPING and optimises memory
>>>>>>> usage.
>>>>>>>
>>>>>>> Is anything from the above critical for your driver?
>>>>>>
>>>>>> I removed all the differences (split_page, __dma_clear_buffer, memset)
>>>>>> from #1 and it still works. The DMA_ATTR_NO_KERNEL_MAPPING flag is not
>>>>>> set when allocating the buffer.
>>>>>>
>>>>>> I investigated further and found that dma-noop doesn't take the
>>>>>> "dma-ranges" property, which is set in the DT, into account. I believe
>>>>>> that is the root cause of my problem with your patches.
>>>>>
>>>>> After testing, changing virt_to_phys to virt_to_dma in dma-noop.c
>>>>> fixes the issue; modetest and fbdemo are functional again.
>>>>>
>>>>
>>>> Thanks for narrowing it down! I hadn't noticed that stm32f4 remaps its
>>>> memory, so the dma-ranges property is in use.
>>>>
>>>> It looks like virt_to_dma is ARM-specific, so I probably have to
>>>> discard the idea of reusing dma-noop-ops and switch the logic in
>>>> dma-mapping-nommu.c based on an is_device_dma_coherent(dev) check.
>>>
>>> dma_pfn_offset is a member of struct device, so it should be OK for
>>> dma_noop_ops to also make reference to it (and assume it's zero if not
>>> explicitly set).
>>>
>>>> Meanwhile, I'm quite puzzled about how such memory remapping is
>>>> supposed to work together with reserved memory. It seems dma-ranges is
>>>> not accounted for while reserving memory (it is too early), nor while
>>>> allocating/mapping/etc.
>>>
>>> The reserved memory is described in terms of CPU physical addresses, so
>>> a device offset shouldn't matter from that perspective. It only comes
>>> into play at the point you generate the dma_addr_t to hand off to the
>>> device - only then do you need to transform the CPU physical address of
>>> the allocated/mapped page into the device's view of that page (i.e.
>>> subtract the offset).
>>
>> Thanks for the explanation! So dma-coherent.c should be modified, right?
>> I see that some architectures provide phys_to_dma/dma_to_phys helpers,
>> primarily for swiotlb; is it safe to reuse them, given that a default
>> implementation is provided? Nothing under Documentation explains how
>> they are supposed to be used; sorry if this is a stupid question.
>
> Those are essentially SWIOTLB-specific, so can't be universally relied
> upon. I think something like this ought to suffice:

Yup, but what about dma-coherent.c? Currently it has

int dma_alloc_from_coherent(struct device *dev, ssize_t size,
			    dma_addr_t *dma_handle, void **ret)
{
	...
	*dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
	*ret = mem->virt_base + (pageno << PAGE_SHIFT);
	...
}

In case reserved memory is described in terms of CPU phys addresses,
wouldn't we need to take dma_pfn_offset into account? What am I missing?

Thanks
Vladimir

>
> ---8<---
> diff --git a/lib/dma-noop.c b/lib/dma-noop.c
> index 3d766e78fbe2..fbb1b37750d5 100644
> --- a/lib/dma-noop.c
> +++ b/lib/dma-noop.c
> @@ -8,6 +8,11 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/scatterlist.h>
>  
> +static dma_addr_t dma_noop_dev_offset(struct device *dev)
> +{
> +	return (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
> +}
> +
>  static void *dma_noop_alloc(struct device *dev, size_t size,
>  			    dma_addr_t *dma_handle, gfp_t gfp,
>  			    unsigned long attrs)
> @@ -16,7 +21,7 @@ static void *dma_noop_alloc(struct device *dev, size_t size,
>  
>  	ret = (void *)__get_free_pages(gfp, get_order(size));
>  	if (ret)
> -		*dma_handle = virt_to_phys(ret);
> +		*dma_handle = virt_to_phys(ret) - dma_noop_dev_offset(dev);
>  	return ret;
>  }
>  
> @@ -32,7 +37,7 @@ static dma_addr_t dma_noop_map_page(struct device *dev, struct page *page,
>  				    enum dma_data_direction dir,
>  				    unsigned long attrs)
>  {
> -	return page_to_phys(page) + offset;
> +	return page_to_phys(page) + offset - dma_noop_dev_offset(dev);
>  }
>  
>  static int dma_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> @@ -47,7 +52,8 @@ static int dma_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nent
>  
>  		BUG_ON(!sg_page(sg));
>  		va = sg_virt(sg);
> -		sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va);
> +		sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va) -
> +				     dma_noop_dev_offset(dev);
>  		sg_dma_len(sg) = sg->length;
>  	}
> --->8---
>
> intentionally whitespace-damaged by copy-pasting off my terminal to
> emphasise how utterly untested it is ;)
>
> Robin.
On 13/01/17 09:12, Vladimir Murzin wrote:
> On 12/01/17 18:07, Robin Murphy wrote:

[...]

>>>> The reserved memory is described in terms of CPU physical addresses, so
>>>> a device offset shouldn't matter from that perspective. It only comes
>>>> into play at the point you generate the dma_addr_t to hand off to the
>>>> device - only then do you need to transform the CPU physical address of
>>>> the allocated/mapped page into the device's view of that page (i.e.
>>>> subtract the offset).
>>>
>>> Thanks for the explanation! So dma-coherent.c should be modified, right?
>>> I see that some architectures provide phys_to_dma/dma_to_phys helpers,
>>> primarily for swiotlb; is it safe to reuse them, given that a default
>>> implementation is provided?
>>> Nothing under Documentation explains how they are supposed to be used;
>>> sorry if this is a stupid question.
>>
>> Those are essentially SWIOTLB-specific, so can't be universally relied
>> upon. I think something like this ought to suffice:
>
> Yup, but what about dma-coherent.c? Currently it has
>
> int dma_alloc_from_coherent(struct device *dev, ssize_t size,
> 			    dma_addr_t *dma_handle, void **ret)
> {
> 	...
> 	*dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
> 	*ret = mem->virt_base + (pageno << PAGE_SHIFT);
> 	...
> }
>
> In case reserved memory is described in terms of CPU phys addresses,
> wouldn't we need to take dma_pfn_offset into account? What am I missing?

Ah yes, I overlooked that one. AFAICS, that's intended to be accounted for
when calling dma_init_coherent_memory() (i.e. phys_addr vs. device_addr),
but that's a bit awkward for a global pool. How utterly disgusting do you
think this (or some variant thereof) looks?

	/* Apply device-specific offset for the global pool */
	if (mem == dma_coherent_default_memory)
		*handle += dev->dma_pfn_offset << PAGE_SHIFT;

Robin.
> Thanks
> Vladimir

[...]
diff --git a/lib/dma-noop.c b/lib/dma-noop.c
index 3d766e78fbe2..fbb1b37750d5 100644
--- a/lib/dma-noop.c
+++ b/lib/dma-noop.c
@@ -8,6 +8,11 @@
 #include <linux/dma-mapping.h>
 #include <linux/scatterlist.h>
 
+static dma_addr_t dma_noop_dev_offset(struct device *dev)
+{
+	return (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
+}
+
 static void *dma_noop_alloc(struct device *dev, size_t size,
 			    dma_addr_t *dma_handle, gfp_t gfp,
 			    unsigned long attrs)
@@ -16,7 +21,7 @@ static void *dma_noop_alloc(struct device *dev, size_t size,
 
 	ret = (void *)__get_free_pages(gfp, get_order(size));
 	if (ret)
-		*dma_handle = virt_to_phys(ret);
+		*dma_handle = virt_to_phys(ret) - dma_noop_dev_offset(dev);
 	return ret;
 }
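For context, the `dma_pfn_offset` consumed by the diff above is populated from the DT `dma-ranges` property that was identified as the root cause in this thread. A hypothetical devicetree fragment (illustrative addresses; the triplet format is `<child-bus-address parent-bus-address length>`):

```dts
soc {
	compatible = "simple-bus";
	#address-cells = <1>;
	#size-cells = <1>;

	/* Devices on this bus see CPU address 0x90000000 as 0x10000000,
	 * giving dev->dma_pfn_offset = 0x80000 pages (with 4K pages) */
	dma-ranges = <0x10000000 0x90000000 0x10000000>;
};
```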