diff mbox

[v2,1/1] block: blk-merge: don't merge the pages with non-contiguous descriptors

Message ID 1358332355.2384.11.camel@dabdike.int.hansenpartnership.com (mailing list archive)
State New, archived
Headers show

Commit Message

James Bottomley Jan. 16, 2013, 10:32 a.m. UTC
On Wed, 2013-01-16 at 12:07 +0530, Subhash Jadavani wrote:

> Now consider this call stack from MMC block driver (this is on the ARmv7 
> based board):
>      [   98.918174] [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from 
> [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
>      [   98.927819] [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from 
> [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
>      [   98.937982] [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from 
> [<c0017ff8>] (dma_map_sg+0x3c/0x114)

OK, so this is showing that ARM itself is making the assumption that the
pages are contiguous in the page offset map.

Fix this by doing the increment via the pfn, which will do the right
thing whatever the memory model.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>

---

Comments

subhashj@codeaurora.org Jan. 16, 2013, 12:39 p.m. UTC | #1
On 1/16/2013 4:02 PM, James Bottomley wrote:
> On Wed, 2013-01-16 at 12:07 +0530, Subhash Jadavani wrote:
>
>> Now consider this call stack from MMC block driver (this is on the ARmv7
>> based board):
>>       [   98.918174] [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from
>> [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
>>       [   98.927819] [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from
>> [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
>>       [   98.937982] [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from
>> [<c0017ff8>] (dma_map_sg+0x3c/0x114)
> OK, so this is showing that ARM itself is making the assumption that the
> pages are contiguous in the page offset map.
>
> Fix this by doing the increment via the pfn, which will do the right
> thing whatever the memory model.
>
> Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Thanks James. Yes, it make sense to fix the ARM code itself if it is the 
only one giving this trouble.
I have tried your change below and it also fixes this issue (without 
having my blk-merge patch). I will forward your change to Russel King to 
see what he thinks about it.

Regards,
Subhash
>
> ---
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b2fb87..ab88c5b 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
>   			op(vaddr, len, dir);
>   		}
>   		offset = 0;
> -		page++;
> +		page = pfn_to_page(page_to_pfn(page) + 1);
>   		left -= len;
>   	} while (left);
>   }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
subhashj@codeaurora.org Jan. 16, 2013, 12:47 p.m. UTC | #2
Hi Russell,

Is it possible to pick up James patch below? Thread here: 
http://comments.gmane.org/gmane.linux.kernel.mmc/18670, have the details 
on the motivation behind this fix.

Regards,
Subhash

-------- Original Message --------
Subject: 	Re: [PATCH v2 1/1] block: blk-merge: don't merge the pages 
with non-contiguous descriptors
Date: 	Wed, 16 Jan 2013 18:09:14 +0530
From: 	Subhash Jadavani <subhashj@codeaurora.org>
To: 	James Bottomley <James.Bottomley@HansenPartnership.com>
CC: 	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, 
linux-mmc@vger.kernel.org, linux-arm-msm@vger.kernel.org, 
martin.petersen@oracle.com, asias@redhat.com, tj@kernel.org, 
linux-arm-kernel@lists.infradead.org, Russell King <linux@arm.linux.org.uk>



On 1/16/2013 4:02 PM, James Bottomley wrote:
> On Wed, 2013-01-16 at 12:07 +0530, Subhash Jadavani wrote:
>
>> Now consider this call stack from MMC block driver (this is on the ARmv7
>> based board):
>>       [   98.918174] [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from
>> [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
>>       [   98.927819] [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from
>> [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
>>       [   98.937982] [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from
>> [<c0017ff8>] (dma_map_sg+0x3c/0x114)
> OK, so this is showing that ARM itself is making the assumption that the
> pages are contiguous in the page offset map.
>
> Fix this by doing the increment via the pfn, which will do the right
> thing whatever the memory model.
>
> Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Thanks James. Yes, it make sense to fix the ARM code itself if it is the
only one giving this trouble.
I have tried your change below and it also fixes this issue (without
having my blk-merge patch). I will forward your change to Russel King to
see what he thinks about it.

Regards,
Subhash
>
> ---
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b2fb87..ab88c5b 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
>   			op(vaddr, len, dir);
>   		}
>   		offset = 0;
> -		page++;
> +		page = pfn_to_page(page_to_pfn(page) + 1);
>   		left -= len;
>   	} while (left);
>   }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Jan. 16, 2013, 12:51 p.m. UTC | #3
On Wed, 2013-01-16 at 18:17 +0530, Subhash Jadavani wrote:
> Is it possible to pick up James patch below? Thread here: 
> http://comments.gmane.org/gmane.linux.kernel.mmc/18670, have the details 
> on the motivation behind this fix.

Someone should also audit the arm kernel code for more of these linear
page array assumptions.  I'm guessing that when sparsemem was added to
arm over a year ago, whoever did it either didn't audit or missed a few.

James
Russell King - ARM Linux Jan. 16, 2013, 1:08 p.m. UTC | #4
On Wed, Jan 16, 2013 at 12:51:55PM +0000, James Bottomley wrote:
> On Wed, 2013-01-16 at 18:17 +0530, Subhash Jadavani wrote:
> > Is it possible to pick up James patch below? Thread here: 
> > http://comments.gmane.org/gmane.linux.kernel.mmc/18670, have the details 
> > on the motivation behind this fix.
> 
> Someone should also audit the arm kernel code for more of these linear
> page array assumptions.  I'm guessing that when sparsemem was added to
> arm over a year ago, whoever did it either didn't audit or missed a few.

No, that's a bad assumption.  We've had discontigmem for years - maybe
something like 12 years.  I switched everything over to sparsemem, and
sparsemem has been used on ARM for years too:

	commit 05944d74bc28fffbcce159cb915d0acff82f30a1
	Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
	Date:   Thu Nov 30 20:43:51 2006 +0000

	    [ARM] Add initial sparsemem support

	    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

However, there's a big problem with this: very few of the lead people
have machines which suffer from this disability, so there's very little
testing of it - and there's very little testing of new code with it.

The patch which originally introduced this code which your patch
touches was part of adding highmem support to ARM:

	commit 43377453af83b8ff8c1c731da1508bd6b84ebfea
	Author: Nicolas Pitre <nico@cam.org>
	Date:   Thu Mar 12 22:52:09 2009 -0400

	    [ARM] introduce dma_cache_maint_page()

	    This is a helper to be used by the DMA mapping API to handle cache
	    maintenance for memory identified by a page structure instead of a
	    virtual address.  Those pages may or may not be highmem pages, and
	    when they're highmem pages, they may or may not be virtually mapped.
	    When they're not mapped then there is no L1 cache to worry about. But
	    even in that case the L2 cache must be processed since unmapped highmem
	    pages can still be L2 cached.

	    Signed-off-by: Nicolas Pitre <nico@marvell.com>

some three years later, and has been through a number of revisions since.

I'd really like to get rid of sparsemem so it's one less failure case, but
alas there's a relatively small bunch of folk who rely upon it.  That
means it's always going to be more buggy.
James Bottomley Jan. 16, 2013, 3:50 p.m. UTC | #5
On Wed, 2013-01-16 at 13:08 +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 16, 2013 at 12:51:55PM +0000, James Bottomley wrote:
> > On Wed, 2013-01-16 at 18:17 +0530, Subhash Jadavani wrote:
> > > Is it possible to pick up James patch below? Thread here: 
> > > http://comments.gmane.org/gmane.linux.kernel.mmc/18670, have the details 
> > > on the motivation behind this fix.
> > 
> > Someone should also audit the arm kernel code for more of these linear
> > page array assumptions.  I'm guessing that when sparsemem was added to
> > arm over a year ago, whoever did it either didn't audit or missed a few.
> 
> No, that's a bad assumption.  We've had discontigmem for years - maybe
> something like 12 years.

Discontigmem doesn't suffer from this particular problem: The breaks in
the page arrays in a discontigmem environment represent real breaks in
the physical memory map, so there can never arise a situation where
page++ wouldn't get you the next valid page, provided there's still a
real contiguous page of physical memory present in the system, of
course.

Sparsemem tried to be cleverer and it broke that assumption.  I'm
starting to wonder if there are other places in the kernel which make it
and get broken by sparsemem.

I have patches for parisc to switch it over to sparsemem from
discontigmem ... I haven't pushed them yet, fortunately, and I certainly
didn't check our underlying assumptions for this problem.

>   I switched everything over to sparsemem, and
> sparsemem has been used on ARM for years too:
> 
> 	commit 05944d74bc28fffbcce159cb915d0acff82f30a1
> 	Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
> 	Date:   Thu Nov 30 20:43:51 2006 +0000
> 
> 	    [ARM] Add initial sparsemem support
> 
> 	    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> However, there's a big problem with this: very few of the lead people
> have machines which suffer from this disability, so there's very little
> testing of it - and there's very little testing of new code with it.

Anyone can test this just by enabling sparsemem with an artificially
small SECTION_SIZE_BITS (that makes it much more likely you'll hit a
boundary).

> The patch which originally introduced this code which your patch
> touches was part of adding highmem support to ARM:
> 
> 	commit 43377453af83b8ff8c1c731da1508bd6b84ebfea
> 	Author: Nicolas Pitre <nico@cam.org>
> 	Date:   Thu Mar 12 22:52:09 2009 -0400
> 
> 	    [ARM] introduce dma_cache_maint_page()
> 
> 	    This is a helper to be used by the DMA mapping API to handle cache
> 	    maintenance for memory identified by a page structure instead of a
> 	    virtual address.  Those pages may or may not be highmem pages, and
> 	    when they're highmem pages, they may or may not be virtually mapped.
> 	    When they're not mapped then there is no L1 cache to worry about. But
> 	    even in that case the L2 cache must be processed since unmapped highmem
> 	    pages can still be L2 cached.
> 
> 	    Signed-off-by: Nicolas Pitre <nico@marvell.com>
> 
> some three years later, and has been through a number of revisions since.
> 
> I'd really like to get rid of sparsemem so it's one less failure case, but
> alas there's a relatively small bunch of folk who rely upon it.  That
> means it's always going to be more buggy.

Heh, I'd like to do that too on parisc; the problem is that the memory
holes in our platform are just to big to use a linear array on.  And now
that the slub people want DISCONTIGMEM deprecated, there's not much
choice.

James
Russell King - ARM Linux Jan. 16, 2013, 11:14 p.m. UTC | #6
On Wed, Jan 16, 2013 at 10:32:35AM +0000, James Bottomley wrote:
> On Wed, 2013-01-16 at 12:07 +0530, Subhash Jadavani wrote:
> 
> > Now consider this call stack from MMC block driver (this is on the ARmv7 
> > based board):
> >      [   98.918174] [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from 
> > [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
> >      [   98.927819] [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from 
> > [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
> >      [   98.937982] [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from 
> > [<c0017ff8>] (dma_map_sg+0x3c/0x114)
> 
> OK, so this is showing that ARM itself is making the assumption that the
> pages are contiguous in the page offset map.
> 
> Fix this by doing the increment via the pfn, which will do the right
> thing whatever the memory model.
> 
> Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Ok.  What would you like the patch summary line for this to be -
the existing one seems to be a little wrong given the content of
this patch...

> 
> ---
> 
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b2fb87..ab88c5b 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
>  			op(vaddr, len, dir);
>  		}
>  		offset = 0;
> -		page++;
> +		page = pfn_to_page(page_to_pfn(page) + 1);
>  		left -= len;
>  	} while (left);
>  }
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Tejun Heo Jan. 16, 2013, 11:18 p.m. UTC | #7
On Wed, Jan 16, 2013 at 10:32:35AM +0000, James Bottomley wrote:
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b2fb87..ab88c5b 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
>  			op(vaddr, len, dir);
>  		}
>  		offset = 0;
> -		page++;
> +		page = pfn_to_page(page_to_pfn(page) + 1);

Probably page = nth_page(page, 1) is the better form.

Thanks.
James Bottomley Jan. 17, 2013, 8:54 a.m. UTC | #8
On Wed, 2013-01-16 at 23:14 +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 16, 2013 at 10:32:35AM +0000, James Bottomley wrote:
> > On Wed, 2013-01-16 at 12:07 +0530, Subhash Jadavani wrote:
> > 
> > > Now consider this call stack from MMC block driver (this is on the ARmv7 
> > > based board):
> > >      [   98.918174] [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from 
> > > [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
> > >      [   98.927819] [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from 
> > > [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
> > >      [   98.937982] [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from 
> > > [<c0017ff8>] (dma_map_sg+0x3c/0x114)
> > 
> > OK, so this is showing that ARM itself is making the assumption that the
> > pages are contiguous in the page offset map.
> > 
> > Fix this by doing the increment via the pfn, which will do the right
> > thing whatever the memory model.
> > 
> > Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> 
> Ok.  What would you like the patch summary line for this to be -
> the existing one seems to be a little wrong given the content of
> this patch...

how about

arm: fix struct page iterator in dma_cache_maint() to work with
sparsemem

?

James

> > 
> > ---
> > 
> > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > index 6b2fb87..ab88c5b 100644
> > --- a/arch/arm/mm/dma-mapping.c
> > +++ b/arch/arm/mm/dma-mapping.c
> > @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
> >  			op(vaddr, len, dir);
> >  		}
> >  		offset = 0;
> > -		page++;
> > +		page = pfn_to_page(page_to_pfn(page) + 1);
> >  		left -= len;
> >  	} while (left);
> >  }
> > 
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Jan. 17, 2013, 9:11 a.m. UTC | #9
On Wed, 2013-01-16 at 15:18 -0800, Tejun Heo wrote:
> On Wed, Jan 16, 2013 at 10:32:35AM +0000, James Bottomley wrote:
> > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > index 6b2fb87..ab88c5b 100644
> > --- a/arch/arm/mm/dma-mapping.c
> > +++ b/arch/arm/mm/dma-mapping.c
> > @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
> >  			op(vaddr, len, dir);
> >  		}
> >  		offset = 0;
> > -		page++;
> > +		page = pfn_to_page(page_to_pfn(page) + 1);
> 
> Probably page = nth_page(page, 1) is the better form.

It's the same thing.

I'd actually prefer page = pfn_to_page(page_to_pfn(page) + 1); because
it makes the code look like the hack it is.  The preferred form for all
iterators like this should be to iterate over the pfn instead of a
pointer into the page arrays, because that will always work correctly no
matter how many weird and wonderful memory schemes we come up with.

James
Russell King - ARM Linux Jan. 17, 2013, 10:37 a.m. UTC | #10
On Thu, Jan 17, 2013 at 09:11:20AM +0000, James Bottomley wrote:
> On Wed, 2013-01-16 at 15:18 -0800, Tejun Heo wrote:
> > On Wed, Jan 16, 2013 at 10:32:35AM +0000, James Bottomley wrote:
> > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > > index 6b2fb87..ab88c5b 100644
> > > --- a/arch/arm/mm/dma-mapping.c
> > > +++ b/arch/arm/mm/dma-mapping.c
> > > @@ -809,7 +809,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
> > >  			op(vaddr, len, dir);
> > >  		}
> > >  		offset = 0;
> > > -		page++;
> > > +		page = pfn_to_page(page_to_pfn(page) + 1);
> > 
> > Probably page = nth_page(page, 1) is the better form.
> 
> It's the same thing.
> 
> I'd actually prefer page = pfn_to_page(page_to_pfn(page) + 1); because
> it makes the code look like the hack it is.  The preferred form for all
> iterators like this should be to iterate over the pfn instead of a
> pointer into the page arrays, because that will always work correctly no
> matter how many weird and wonderful memory schemes we come up with.

So, why don't we update the code to do that then?
diff mbox

Patch

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b2fb87..ab88c5b 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -809,7 +809,7 @@  static void dma_cache_maint_page(struct page *page, unsigned long offset,
 			op(vaddr, len, dir);
 		}
 		offset = 0;
-		page++;
+		page = pfn_to_page(page_to_pfn(page) + 1);
 		left -= len;
 	} while (left);
 }