diff mbox

[29/31] parisc: handle page-less SG entries

Message ID 1439363150-8661-30-git-send-email-hch@lst.de (mailing list archive)
State Not Applicable
Delegated to: Dan Williams
Headers show

Commit Message

Christoph Hellwig Aug. 12, 2015, 7:05 a.m. UTC
Make all cache invalidation conditional on sg_has_page() and use
sg_phys to get the physical address directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/parisc/kernel/pci-dma.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

Comments

Linus Torvalds Aug. 12, 2015, 4:01 p.m. UTC | #1
On Wed, Aug 12, 2015 at 12:05 AM, Christoph Hellwig <hch@lst.de> wrote:
> Make all cache invalidation conditional on sg_has_page() and use
> sg_phys to get the physical address directly.

So this worries me a bit (I'm just reacting to one random patch in the series).

The reason?

I think this wants a big honking comment somewhere saying "non-sg_page
accesses are not necessarily cache coherent").

Now, I don't think that's _wrong_, but it's an important distinction:
if you look up pages in the page tables directly, there's a very
subtle difference between then saving just the pfn and saving the
"struct page" of the result.

On sane architectures, this whole cache flushing thing doesn't matter.
Which just means that it's going to be even more subtle on the odd
broken ones..

I'm assuming that anybody who wants to use the page-less
scatter-gather lists always does so on memory that isn't actually
virtually mapped at all, or only does so on sane architectures that
are cache coherent at a physical level, but I'd like that assumption
*documented* somewhere.

(And maybe it is, and I just didn't get to that patch yet)

                   Linus
Christoph Hellwig Aug. 13, 2015, 2:31 p.m. UTC | #2
On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
> I'm assuming that anybody who wants to use the page-less
> scatter-gather lists always does so on memory that isn't actually
> virtually mapped at all, or only does so on sane architectures that
> are cache coherent at a physical level, but I'd like that assumption
> *documented* somewhere.

It's temporarily mapped by kmap-like helpers.  That code isn't in
this series. The most recent version of it is here:

https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0

note that it's not doing the cache flushing it would have to do yet, but
it's also only enabled for x86 at the moment.
Dan Williams Aug. 14, 2015, 3:30 a.m. UTC | #3
On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
>> I'm assuming that anybody who wants to use the page-less
>> scatter-gather lists always does so on memory that isn't actually
>> virtually mapped at all, or only does so on sane architectures that
>> are cache coherent at a physical level, but I'd like that assumption
>> *documented* somewhere.
>
> It's temporarily mapped by kmap-like helpers.  That code isn't in
> this series. The most recent version of it is here:
>
> https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
>
> note that it's not doing the cache flushing it would have to do yet, but
> it's also only enabled for x86 at the moment.

For virtually tagged caches I assume we would temporarily map with
kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
powerpc support.  However with DAX we could end up with multiple
virtual aliases for a page-less pfn.
James Bottomley Aug. 14, 2015, 3:59 a.m. UTC | #4
On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
> On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <hch@lst.de> wrote:
> > On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
> >> I'm assuming that anybody who wants to use the page-less
> >> scatter-gather lists always does so on memory that isn't actually
> >> virtually mapped at all, or only does so on sane architectures that
> >> are cache coherent at a physical level, but I'd like that assumption
> >> *documented* somewhere.
> >
> > It's temporarily mapped by kmap-like helpers.  That code isn't in
> > this series. The most recent version of it is here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
> >
> > note that it's not doing the cache flushing it would have to do yet, but
> > it's also only enabled for x86 at the moment.
> 
> For virtually tagged caches I assume we would temporarily map with
> kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
> powerpc support.  However with DAX we could end up with multiple
> virtual aliases for a page-less pfn.

At least on some PA architectures, you have to be very careful.
Improperly managed, multiple aliases will cause the system to crash
(actually a machine check in the cache chequerboard). For the most
temperamental systems, we need the cache line flushed and the alias
mapping ejected from the TLB cache before we access the same page at an
inequivalent alias.

James
David Miller Aug. 14, 2015, 4:11 a.m. UTC | #5
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Thu, 13 Aug 2015 20:59:20 -0700

> On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
>> On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig <hch@lst.de> wrote:
>> > On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
>> >> I'm assuming that anybody who wants to use the page-less
>> >> scatter-gather lists always does so on memory that isn't actually
>> >> virtually mapped at all, or only does so on sane architectures that
>> >> are cache coherent at a physical level, but I'd like that assumption
>> >> *documented* somewhere.
>> >
>> > It's temporarily mapped by kmap-like helpers.  That code isn't in
>> > this series. The most recent version of it is here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfn&id=de8237c99fdb4352be2193f3a7610e902b9bb2f0
>> >
>> > note that it's not doing the cache flushing it would have to do yet, but
>> > it's also only enabled for x86 at the moment.
>> 
>> For virtually tagged caches I assume we would temporarily map with
>> kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
>> powerpc support.  However with DAX we could end up with multiple
>> virtual aliases for a page-less pfn.
> 
> At least on some PA architectures, you have to be very careful.
> Improperly managed, multiple aliases will cause the system to crash
> (actually a machine check in the cache chequerboard). For the most
> temperamental systems, we need the cache line flushed and the alias
> mapping ejected from the TLB cache before we access the same page at an
> inequivalent alias.

Also, I want to mention that on sparc64 we manage the cache aliasing
state in the page struct.

Until a page is mapped into userspace, we just record the most recent
cpu to store into that page with kernel side mappings.  Once the page
ends up being mapped or the cpu doing kernel side stores changes, we
actually perform the cache flush.

Generally speaking, I think that all actual physical memory the kernel
operates on should have a struct page backing it.  So this whole
discussion of operating on physical memory in scatter lists without
backing page structs feels really foreign to me.
Dan Williams Aug. 14, 2015, 4:17 p.m. UTC | #6
On Thu, Aug 13, 2015 at 9:11 PM, David Miller <davem@davemloft.net> wrote:
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
>> At least on some PA architectures, you have to be very careful.
>> Improperly managed, multiple aliases will cause the system to crash
>> (actually a machine check in the cache chequerboard). For the most
>> temperamental systems, we need the cache line flushed and the alias
>> mapping ejected from the TLB cache before we access the same page at an
>> inequivalent alias.
>
> Also, I want to mention that on sparc64 we manage the cache aliasing
> state in the page struct.
>
> Until a page is mapped into userspace, we just record the most recent
> cpu to store into that page with kernel side mappings.  Once the page
> ends up being mapped or the cpu doing kernel side stores changes, we
> actually perform the cache flush.
>
> Generally speaking, I think that all actual physical memory the kernel
> operates on should have a struct page backing it.  So this whole
> discussion of operating on physical memory in scatter lists without
> backing page structs feels really foreign to me.

So the only way for page-less pfns to enter the system is through the
->direct_access() method provided by a pmem device's struct
block_device_operations.  Architectures that require struct page for
cache management to must disable ->direct_access() in this case.

If an arch still wants to support pmem+DAX then it needs something
like this patchset (feedback welcome) to map pmem pfns:

https://lkml.org/lkml/2015/8/12/970

Effectively this would disable ->direct_access() on /dev/pmem0, but
permit ->direct_access() on /dev/pmem0m.
diff mbox

Patch

diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index b9402c9..6cad0e0 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -483,11 +483,13 @@  static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, int n
 	BUG_ON(direction == DMA_NONE);
 
 	for_each_sg(sglist, sg, nents, i) {
-		unsigned long vaddr = (unsigned long)sg_virt(sg);
-
-		sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr);
+		sg_dma_address(sg) = sg_phys(sg);
 		sg_dma_len(sg) = sg->length;
-		flush_kernel_dcache_range(vaddr, sg->length);
+
+		if (sg_has_page(sg)) {
+			flush_kernel_dcache_range((unsigned long)sg_virt(sg),
+						  sg->length);
+		}
 	}
 	return nents;
 }
@@ -504,9 +506,10 @@  static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, in
 
 	/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
 
-	for_each_sg(sglist, sg, nents, i)
-		flush_kernel_vmap_range(sg_virt(sg), sg->length);
-	return;
+	for_each_sg(sglist, sg, nents, i) {
+		if (sg_has_page(sg))
+			flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	}
 }
 
 static void pa11_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, enum dma_data_direction direction)
@@ -530,8 +533,10 @@  static void pa11_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl
 
 	/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
 
-	for_each_sg(sglist, sg, nents, i)
-		flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	for_each_sg(sglist, sg, nents, i) {
+		if (sg_has_page(sg))
+			flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	}
 }
 
 static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction)
@@ -541,8 +546,10 @@  static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *
 
 	/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */
 
-	for_each_sg(sglist, sg, nents, i)
-		flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	for_each_sg(sglist, sg, nents, i) {
+		if (sg_has_page(sg))
+			flush_kernel_vmap_range(sg_virt(sg), sg->length);
+	}
 }
 
 struct hppa_dma_ops pcxl_dma_ops = {