[1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

Message ID 20190408104641.4905-2-hch@lst.de (mailing list archive)
State New, archived
Series [1/5] block: rewrite blk_bvec_map_sg to avoid a nth_page call

Commit Message

Christoph Hellwig April 8, 2019, 10:46 a.m. UTC
The offset in scatterlists is allowed to be larger than the page size,
so don't go to great lengths to avoid that case, and simplify the
arithmetic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)
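
For readability, here is how the rewritten function reads with the patch
applied. This is a sketch reconstructed from the diff quoted below; the
parameter list elided by the hunk header and the final return statement are
assumptions based on the surrounding context rather than part of this mail.

static unsigned blk_bvec_map_sg(struct request_queue *q,
		struct bio_vec *bvec, struct scatterlist *sglist,
		struct scatterlist **sg)
{
	unsigned nbytes = bvec->bv_len;
	unsigned nsegs = 0, total = 0;

	while (nbytes > 0) {
		/* offset is relative to bvec->bv_page and may exceed PAGE_SIZE */
		unsigned offset = bvec->bv_offset + total;
		unsigned len = min(get_max_segment_size(q, offset), nbytes);

		*sg = blk_next_sg(sg, sglist);
		sg_set_page(*sg, bvec->bv_page, len, offset);

		total += len;
		nbytes -= len;
		nsegs++;
	}

	return nsegs;	/* assumed; not visible in the quoted hunk */
}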

Comments

Johannes Thumshirn April 8, 2019, 2:03 p.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Bart Van Assche April 8, 2019, 10:04 p.m. UTC | #2
On Mon, 2019-04-08 at 12:46 +0200, Christoph Hellwig wrote:
> The offset in scatterlists is allowed to be larger than the page size,
> so don't go to great length to avoid that case and simplify the
> arithmetics.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Ming Lei April 8, 2019, 10:51 p.m. UTC | #3
On Mon, Apr 08, 2019 at 12:46:37PM +0200, Christoph Hellwig wrote:
> The offset in scatterlists is allowed to be larger than the page size,
> so don't go to great length to avoid that case and simplify the
> arithmetics.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-merge.c | 21 ++++++---------------
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8f96d683b577..52dbdd089fdf 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -467,26 +467,17 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
>  		struct scatterlist **sg)
>  {
>  	unsigned nbytes = bvec->bv_len;
> -	unsigned nsegs = 0, total = 0, offset = 0;
> +	unsigned nsegs = 0, total = 0;
>  
>  	while (nbytes > 0) {
> -		unsigned seg_size;
> -		struct page *pg;
> -		unsigned idx;
> +		unsigned offset = bvec->bv_offset + total;
> +		unsigned len = min(get_max_segment_size(q, offset), nbytes);
>  
>  		*sg = blk_next_sg(sg, sglist);
> +		sg_set_page(*sg, bvec->bv_page, len, offset);
>  
> -		seg_size = get_max_segment_size(q, bvec->bv_offset + total);
> -		seg_size = min(nbytes, seg_size);
> -
> -		offset = (total + bvec->bv_offset) % PAGE_SIZE;
> -		idx = (total + bvec->bv_offset) / PAGE_SIZE;
> -		pg = bvec_nth_page(bvec->bv_page, idx);
> -
> -		sg_set_page(*sg, pg, seg_size, offset);
> -
> -		total += seg_size;
> -		nbytes -= seg_size;
> +		total += len;
> +		nbytes -= len;
>  		nsegs++;
>  	}

Nice cleanup & simplification:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming
Guenter Roeck April 15, 2019, 7:44 p.m. UTC | #4
On Mon, Apr 08, 2019 at 12:46:37PM +0200, Christoph Hellwig wrote:
> The offset in scatterlists is allowed to be larger than the page size,
> so don't go to great length to avoid that case and simplify the
> arithmetics.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---

This patch causes crashes with various boot tests. Most sparc tests crash, as
well as several arm tests. Bisect results in both cases point to this patch.

On sparc:

Run /sbin/init as init process
kernel BUG at arch/sparc/lib/bitext.c:112!
              \|/ ____ \|/
              "@'/ ,. \`@"
              /_| \__/ |_\
                 \__U_/
lock_torture_wr(23): Kernel bad trap [#1]
CPU: 0 PID: 23 Comm: lock_torture_wr Not tainted 5.1.0-rc4-next-20190415 #1
PSR: 04401fc0 PC: f04a4e28 NPC: f04a4e2c Y: 00000000    Not tainted
PC: <bit_map_clear+0xe8/0xec>
%G: f000caec 00010003  f05db308 000000fd  00000000 000000fd  f514e000 f062be28
%O: 0000002a f05a41e0  00000070 00000001  f05dd564 f05dd504  f514f9a0 f04a4e20
RPC: <bit_map_clear+0xe0/0xec>
%L: 04001fc1 f00130c4  f00130c8 00000002  00000000 00000004  f514e000 00000547
%I: f500b990 0000003a  00000012 f513e570  00010000 f500b990  f514fa00 f001316c
Disabling lock debugging due to kernel taint
Caller[f001316c]: sbus_iommu_unmap_sg+0x34/0x60
Caller[f02d6170]: scsi_dma_unmap+0xac/0xd0
Caller[f02e2424]: esp_cmd_is_done+0x198/0x1a8
Caller[f02e4150]: scsi_esp_intr+0x191c/0x21e0
Caller[f0055438]: __handle_irq_event_percpu+0x8c/0x128
Caller[f00554e0]: handle_irq_event_percpu+0xc/0x4c
Caller[f0055550]: handle_irq_event+0x30/0x64
Caller[f0058b10]: handle_level_irq+0xb4/0x17c
Caller[f0054c38]: generic_handle_irq+0x30/0x40
Caller[f0009a4c]: handler_irq+0x3c/0x78
Caller[f0007b68]: patch_handler_irq+0x0/0x24
Caller[f004e054]: lock_torture_writer+0xc0/0x1fc
Caller[f003ca18]: kthread+0xd8/0x110
Caller[f00082f0]: ret_from_kernel_thread+0xc/0x38
Caller[00000000]:   (null)
Instruction DUMP:
 113c1690 
 7fed90ae 
 901221e0 
<91d02005>
 9de3bfa0 
 92102000 
 9406a01f 
 90100019 
 9532a005 

Kernel panic - not syncing: Aiee, killing interrupt handler!

On arm:

[   13.836353] Run /sbin/init as init process
[   13.937406] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   13.937736] CPU: 0 PID: 1 Comm: init Not tainted 5.1.0-rc4-next-20190415 #1
[   13.937895] Hardware name: ARM-Versatile Express
[   13.938472] [<c0312a74>] (unwind_backtrace) from [<c030d350>] (show_stack+0x10/0x14)
[   13.938671] [<c030d350>] (show_stack) from [<c1097184>] (dump_stack+0xb4/0xc8)
[   13.938834] [<c1097184>] (dump_stack) from [<c03474c4>] (panic+0x110/0x328)
[   13.939008] [<c03474c4>] (panic) from [<c034c360>] (do_exit+0xbf8/0xc04)
[   13.939161] [<c034c360>] (do_exit) from [<c034d3c0>] (do_group_exit+0x3c/0xbc)
[   13.939323] [<c034d3c0>] (do_group_exit) from [<c0359460>] (get_signal+0x13c/0x9c4)
[   13.939491] [<c0359460>] (get_signal) from [<c030c760>] (do_work_pending+0x198/0x5f0)
[   13.939660] [<c030c760>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)
[   13.939862] Exception stack(0xc703ffb0 to 0xc703fff8)
[   13.940102] ffa0:                                     b6f0e578 00000016 00000015 00000004
[   13.940372] ffc0: b6f0e060 b6f0e578 b6ede000 b6ede360 b6ef5dc0 b6ede95c b6ede2c0 becb6efc
[   13.940613] ffe0: b6f0e060 becb6ec0 b6edf628 b6ee9c00 60000010 ffffffff

Reverting the patch isn't possible since there are subsequent changes
affecting the code, and the image no longer builds after the revert.

Guenter

---
bisect results:

# bad: [f9221a7a1014d8a047b277a73289678646ddc110] Add linux-next specific files for 20190415
# good: [15ade5d2e7775667cf191cf2f94327a4889f8b9d] Linux 5.1-rc4
git bisect start 'HEAD' 'v5.1-rc4'
# good: [34ad076e8efd33658e4367df70a92058de7a870d] Merge remote-tracking branch 'crypto/master'
git bisect good 34ad076e8efd33658e4367df70a92058de7a870d
# bad: [5b51a36f96d773866e1a2165bacfcde58464eb1d] Merge remote-tracking branch 'devicetree/for-next'
git bisect bad 5b51a36f96d773866e1a2165bacfcde58464eb1d
# good: [457109829f4ee4107e8c7108237afba21fabbb5e] Merge branch 'drm-next-5.2' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect good 457109829f4ee4107e8c7108237afba21fabbb5e
# good: [c4af5502948132ca3ce48d047188f8fc2f484dd7] Merge remote-tracking branch 'sound-asoc/for-next'
git bisect good c4af5502948132ca3ce48d047188f8fc2f484dd7
# bad: [722312b700e5fe2e9b16547fc2cb0dbbfb3e345c] Merge remote-tracking branch 'battery/for-next'
git bisect bad 722312b700e5fe2e9b16547fc2cb0dbbfb3e345c
# bad: [da55bae775eb580cd7f1b74850e55cf8880e7984] Merge branch 'for-5.2/block' into for-next
git bisect bad da55bae775eb580cd7f1b74850e55cf8880e7984
# good: [1e815b33c5ccd3936b71292b5ffb84e97e1df9e0] block: sed-opal: fix typos and formatting
git bisect good 1e815b33c5ccd3936b71292b5ffb84e97e1df9e0
# good: [06bda9d56ba3c0ba5ecb41f27da93a417aa442d1] Merge branch 'for-5.2/block' into for-next
git bisect good 06bda9d56ba3c0ba5ecb41f27da93a417aa442d1
# bad: [14eacf12dbc75352fa746dfd9e24de3170ba5ff5] block: don't allow multiple bio_iov_iter_get_pages calls per bio
git bisect bad 14eacf12dbc75352fa746dfd9e24de3170ba5ff5
# good: [ae50640bebc48f1fc0092f16ea004c7c4d12c985] md: use correct type in super_1_sync
git bisect good ae50640bebc48f1fc0092f16ea004c7c4d12c985
# good: [efcd487c69b9d968552a6bf80e7839c4f28b419d] md: add __acquires/__releases annotations to handle_active_stripes
git bisect good efcd487c69b9d968552a6bf80e7839c4f28b419d
# bad: [8a96a0e408102fb7aa73d8aa0b5e2219cfd51e55] block: rewrite blk_bvec_map_sg to avoid a nth_page call
git bisect bad 8a96a0e408102fb7aa73d8aa0b5e2219cfd51e55
# good: [22391ac30ab9cc2ba610bf7ea2244840b83c8017] Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.2/block
git bisect good 22391ac30ab9cc2ba610bf7ea2244840b83c8017
# first bad commit: [8a96a0e408102fb7aa73d8aa0b5e2219cfd51e55] block: rewrite blk_bvec_map_sg to avoid a nth_page call
Christoph Hellwig April 15, 2019, 8:52 p.m. UTC | #5
On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> This patch causes crashes with various boot tests. Most sparc tests crash, as
> well as several arm tests. Bisect results in both cases point to this patch.

That just means we trigger an existing bug more easily now.  I'll see
if I can help with the issues.
Guenter Roeck April 15, 2019, 9:07 p.m. UTC | #6
On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> > This patch causes crashes with various boot tests. Most sparc tests crash, as
> > well as several arm tests. Bisect results in both cases point to this patch.
> 
> That just means we trigger an existing bug more easily now.  I'll see
> if I can help with the issues.

Code which previously worked reliably no longer does. I would be quite
hesitant to call this "trigger an existing bug more easily". "Regression"
seems to be a more appropriate term - even more so as it seems to cause
'init' crashes, at least on arm.

Guenter
Christoph Hellwig April 16, 2019, 6:33 a.m. UTC | #7
On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote:
> On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
> > On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> > > This patch causes crashes with various boot tests. Most sparc tests crash, as
> > > well as several arm tests. Bisect results in both cases point to this patch.
> > 
> > That just means we trigger an existing bug more easily now.  I'll see
> > if I can help with the issues.
> 
> Code which previously worked reliably no longer does. I would be quite
> hesitant to call this "trigger an existing bug more easily". "Regression"
> seems to be a more appropriate term - even more so as it seems to cause
> 'init' crashes, at least on arm.

Well, we have these sgls in the wild already, it's just that they
are fairly rare.  For a related fix on a mainstream platform see
here for example:

	https://lore.kernel.org/patchwork/patch/1050367/

Below is a rework of the sparc32 iommu code that should avoid your
reported problem.  Please send any other reports to me as well.

diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
index e8d5d73ca40d..93c2fc440cb0 100644
--- a/arch/sparc/mm/iommu.c
+++ b/arch/sparc/mm/iommu.c
@@ -175,16 +175,38 @@ static void iommu_flush_iotlb(iopte_t *iopte, unsigned int niopte)
 	}
 }
 
-static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
+static u32 __sbus_iommu_map_page(struct device *dev, struct page *page, unsigned offset,
+		unsigned len, bool need_flush)
 {
 	struct iommu_struct *iommu = dev->archdata.iommu;
+	phys_addr_t paddr = page_to_phys(page) + offset, p;
+	unsigned long pfn = __phys_to_pfn(paddr);
+	unsigned long off = (unsigned long)paddr & ~PAGE_MASK;
+	unsigned long npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	int ioptex;
 	iopte_t *iopte, *iopte0;
 	unsigned int busa, busa0;
 	int i;
 
+	/* XXX So what is maxphys for us and how do drivers know it? */
+	if (!len || len > 256 * 1024)
+		return DMA_MAPPING_ERROR;
+
+	/*
+	 * We expect unmapped highmem pages to be not in the cache.
+	 * XXX Is this a good assumption?
+	 * XXX What if someone else unmaps it here and races us?
+	 */
+	if (need_flush && !PageHighMem(page)) {
+		for (p = paddr & PAGE_MASK; p < paddr + len; p += PAGE_SIZE) {
+			unsigned long vaddr = (unsigned long)phys_to_virt(p);
+
+			flush_page_for_dma(vaddr);
+		}
+	}
+
 	/* page color = pfn of page */
-	ioptex = bit_map_string_get(&iommu->usemap, npages, page_to_pfn(page));
+	ioptex = bit_map_string_get(&iommu->usemap, npages, pfn);
 	if (ioptex < 0)
 		panic("iommu out");
 	busa0 = iommu->start + (ioptex << PAGE_SHIFT);
@@ -193,11 +215,11 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
 	busa = busa0;
 	iopte = iopte0;
 	for (i = 0; i < npages; i++) {
-		iopte_val(*iopte) = MKIOPTE(page_to_pfn(page), IOPERM);
+		iopte_val(*iopte) = MKIOPTE(pfn, IOPERM);
 		iommu_invalidate_page(iommu->regs, busa);
 		busa += PAGE_SIZE;
 		iopte++;
-		page++;
+		pfn++;
 	}
 
 	iommu_flush_iotlb(iopte0, npages);
@@ -205,99 +227,62 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
 	return busa0;
 }
 
-static dma_addr_t __sbus_iommu_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t len)
-{
-	void *vaddr = page_address(page) + offset;
-	unsigned long off = (unsigned long)vaddr & ~PAGE_MASK;
-	unsigned long npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	
-	/* XXX So what is maxphys for us and how do drivers know it? */
-	if (!len || len > 256 * 1024)
-		return DMA_MAPPING_ERROR;
-	return iommu_get_one(dev, virt_to_page(vaddr), npages) + off;
-}
-
 static dma_addr_t sbus_iommu_map_page_gflush(struct device *dev,
 		struct page *page, unsigned long offset, size_t len,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	flush_page_for_dma(0);
-	return __sbus_iommu_map_page(dev, page, offset, len);
+	return __sbus_iommu_map_page(dev, page, offset, len, false);
 }
 
 static dma_addr_t sbus_iommu_map_page_pflush(struct device *dev,
 		struct page *page, unsigned long offset, size_t len,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	void *vaddr = page_address(page) + offset;
-	unsigned long p = ((unsigned long)vaddr) & PAGE_MASK;
-
-	while (p < (unsigned long)vaddr + len) {
-		flush_page_for_dma(p);
-		p += PAGE_SIZE;
-	}
-
-	return __sbus_iommu_map_page(dev, page, offset, len);
+	return __sbus_iommu_map_page(dev, page, offset, len, true);
 }
 
-static int sbus_iommu_map_sg_gflush(struct device *dev, struct scatterlist *sgl,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
+static int __sbus_iommu_map_sg(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs,
+		bool need_flush)
 {
 	struct scatterlist *sg;
-	int i, n;
-
-	flush_page_for_dma(0);
+	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
-		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
-		sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
+		sg->dma_address = __sbus_iommu_map_page(dev, sg_page(sg),
+				sg->offset, sg->length, need_flush);
+		if (sg->dma_address == DMA_MAPPING_ERROR)
+			return 0;
 		sg->dma_length = sg->length;
 	}
 
 	return nents;
 }
 
-static int sbus_iommu_map_sg_pflush(struct device *dev, struct scatterlist *sgl,
+static int sbus_iommu_map_sg_gflush(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-	unsigned long page, oldpage = 0;
-	struct scatterlist *sg;
-	int i, j, n;
-
-	for_each_sg(sgl, sg, nents, j) {
-		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
-
-		/*
-		 * We expect unmapped highmem pages to be not in the cache.
-		 * XXX Is this a good assumption?
-		 * XXX What if someone else unmaps it here and races us?
-		 */
-		if ((page = (unsigned long) page_address(sg_page(sg))) != 0) {
-			for (i = 0; i < n; i++) {
-				if (page != oldpage) {	/* Already flushed? */
-					flush_page_for_dma(page);
-					oldpage = page;
-				}
-				page += PAGE_SIZE;
-			}
-		}
-
-		sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
-		sg->dma_length = sg->length;
-	}
+	flush_page_for_dma(0);
+	return __sbus_iommu_map_sg(dev, sgl, nents, dir, attrs, false);
+}
 
-	return nents;
+static int sbus_iommu_map_sg_pflush(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+	return __sbus_iommu_map_sg(dev, sgl, nents, dir, attrs, true);
 }
 
-static void iommu_release_one(struct device *dev, u32 busa, int npages)
+static void __sbus_iommu_unmap_page(struct device *dev, dma_addr_t dma_addr,
+		size_t len)
 {
 	struct iommu_struct *iommu = dev->archdata.iommu;
-	int ioptex;
-	int i;
+	unsigned busa, npages, ioptex, i;
 
+	busa = dma_addr & PAGE_MASK;
 	BUG_ON(busa < iommu->start);
 	ioptex = (busa - iommu->start) >> PAGE_SHIFT;
+	npages = ((dma_addr & ~PAGE_MASK) + len + PAGE_SIZE-1) >> PAGE_SHIFT;
 	for (i = 0; i < npages; i++) {
 		iopte_val(iommu->page_table[ioptex + i]) = 0;
 		iommu_invalidate_page(iommu->regs, busa);
@@ -309,22 +294,17 @@ static void iommu_release_one(struct device *dev, u32 busa, int npages)
 static void sbus_iommu_unmap_page(struct device *dev, dma_addr_t dma_addr,
 		size_t len, enum dma_data_direction dir, unsigned long attrs)
 {
-	unsigned long off = dma_addr & ~PAGE_MASK;
-	int npages;
-
-	npages = (off + len + PAGE_SIZE-1) >> PAGE_SHIFT;
-	iommu_release_one(dev, dma_addr & PAGE_MASK, npages);
+	__sbus_iommu_unmap_page(dev, dma_addr, len);
 }
 
 static void sbus_iommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
 	struct scatterlist *sg;
-	int i, n;
+	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
-		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
-		iommu_release_one(dev, sg->dma_address & PAGE_MASK, n);
+		__sbus_iommu_unmap_page(dev, sg->dma_address, sg->length);
 		sg->dma_address = 0x21212121;
 	}
 }
Guenter Roeck April 16, 2019, 2:09 p.m. UTC | #8
On 4/15/19 11:33 PM, Christoph Hellwig wrote:
> On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote:
>> On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
>>> On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
>>>> This patch causes crashes with various boot tests. Most sparc tests crash, as
>>>> well as several arm tests. Bisect results in both cases point to this patch.
>>>
>>> That just means we trigger an existing bug more easily now.  I'll see
>>> if I can help with the issues.
>>
>> Code which previously worked reliably no longer does. I would be quite
>> hesitant to call this "trigger an existing bug more easily". "Regression"
>> seems to be a more appropriate term - even more so as it seems to cause
>> 'init' crashes, at least on arm.
> 
> Well, we have these sgls in the wild already, it just is that they
> are fairly rare.  For a related fix on a mainstream platform see
> here for example:
> 
> 	https://lore.kernel.org/patchwork/patch/1050367/
> 
> Below is a rework of the sparc32 iommu code that should avoid your
> reported problem.  Please send any other reports to me as well.
> 

The code below on top of next-20190415 results in

esp ffd398e4: esp0: regs[(ptrval):(ptrval)] irq[2]
esp ffd398e4: esp0: is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
scsi host0: esp
scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
scsi target0:0:0: Beginning Domain Validation
scsi target0:0:0: Domain Validation Initial Inquiry Failed
scsi target0:0:0: Ending Domain Validation
scsi host0: scsi scan: INQUIRY result too short (5), using 36
scsi 0:0:2:0: Direct-Access                                    PQ: 0 ANSI: 0
scsi target0:0:2: Beginning Domain Validation
scsi target0:0:2: Domain Validation Initial Inquiry Failed
scsi target0:0:2: Ending Domain Validation
...
sd 0:0:2:0: Device offlined - not ready after error recovery
sd 0:0:2:0: rejecting I/O to offline device

and sometimes:

...
sd 0:0:0:4: [sde] Asking for cache data failed
sd 0:0:0:4: [sde] Assuming drive cache: write through
sd 0:0:0:4: rejecting I/O to offline device
sd 0:0:0:5: Device offlined - not ready after error recovery
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = ffffffff
tsk->{mm,active_mm}->pgd = fc000000
               \|/ ____ \|/
               "@'/ ,. \`@"
               /_| \__/ |_\
                  \__U_/
kworker/u2:6(90): Oops [#1]
CPU: 0 PID: 90 Comm: kworker/u2:6 Not tainted 5.1.0-rc4-next-20190415-00001-g328cdb292761 #1
Workqueue: events_unbound async_run_entry_fn
PSR: 409010c5 PC: f02d2904 NPC: f02d2908 Y: 0000000c    Not tainted
PC: <__scsi_execute+0xc/0x18c>
%G: f5334608 00000000  00000000 00000002  00041200 00000008  f5386000 00000002
%O: f5334608 00000000  f5387c90 f50d3fb0  0000000b f5334a4c  f5387b98 f02d2a38
RPC: <__scsi_execute+0x140/0x18c>
%L: 00000000 00000000  f51b1800 00000001  00000002 00000000  f5386000 f05f9070
%I: 00000200 f5387ca0  00000002 00000000  00000000 00000000  f5387bf8 f02e92a4
Disabling lock debugging due to kernel taint
Caller[f02e92a4]: sd_revalidate_disk+0xe8/0x1f5c
Caller[f02eb418]: sd_probe+0x2b0/0x3f0
Caller[f02bbc98]: really_probe+0x1bc/0x2e8
Caller[f02bc10c]: __driver_attach_async_helper+0x48/0x58
Caller[f003f534]: async_run_entry_fn+0x38/0x124
Caller[f00373bc]: process_one_work+0x168/0x390
Caller[f0037728]: worker_thread+0x144/0x504
Caller[f003c90c]: kthread+0xd8/0x110
Caller[f00082f0]: ret_from_kernel_thread+0xc/0x38
Caller[00000000]:   (null)
Instruction DUMP:
  9de3bfa0
  b41ea001
  80a0001a
<d0062004>
  92603fff
  a4100018
  e207a06c
  e007a074
  94102008

Guenter

> diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
> index e8d5d73ca40d..93c2fc440cb0 100644
> --- a/arch/sparc/mm/iommu.c
> +++ b/arch/sparc/mm/iommu.c
> @@ -175,16 +175,38 @@ static void iommu_flush_iotlb(iopte_t *iopte, unsigned int niopte)
>   	}
>   }
>   
> -static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
> +static u32 __sbus_iommu_map_page(struct device *dev, struct page *page, unsigned offset,
> +		unsigned len, bool need_flush)
>   {
>   	struct iommu_struct *iommu = dev->archdata.iommu;
> +	phys_addr_t paddr = page_to_phys(page) + offset, p;
> +	unsigned long pfn = __phys_to_pfn(paddr);
> +	unsigned long off = (unsigned long)paddr & ~PAGE_MASK;
> +	unsigned long npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
>   	int ioptex;
>   	iopte_t *iopte, *iopte0;
>   	unsigned int busa, busa0;
>   	int i;
>   
> +	/* XXX So what is maxphys for us and how do drivers know it? */
> +	if (!len || len > 256 * 1024)
> +		return DMA_MAPPING_ERROR;
> +
> +	/*
> +	 * We expect unmapped highmem pages to be not in the cache.
> +	 * XXX Is this a good assumption?
> +	 * XXX What if someone else unmaps it here and races us?
> +	 */
> +	if (need_flush && !PageHighMem(page)) {
> +		for (p = paddr & PAGE_MASK; p < paddr + len; p += PAGE_SIZE) {
> +			unsigned long vaddr = (unsigned long)phys_to_virt(p);
> +
> +			flush_page_for_dma(vaddr);
> +		}
> +	}
> +
>   	/* page color = pfn of page */
> -	ioptex = bit_map_string_get(&iommu->usemap, npages, page_to_pfn(page));
> +	ioptex = bit_map_string_get(&iommu->usemap, npages, pfn);
>   	if (ioptex < 0)
>   		panic("iommu out");
>   	busa0 = iommu->start + (ioptex << PAGE_SHIFT);
> @@ -193,11 +215,11 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
>   	busa = busa0;
>   	iopte = iopte0;
>   	for (i = 0; i < npages; i++) {
> -		iopte_val(*iopte) = MKIOPTE(page_to_pfn(page), IOPERM);
> +		iopte_val(*iopte) = MKIOPTE(pfn, IOPERM);
>   		iommu_invalidate_page(iommu->regs, busa);
>   		busa += PAGE_SIZE;
>   		iopte++;
> -		page++;
> +		pfn++;
>   	}
>   
>   	iommu_flush_iotlb(iopte0, npages);
> @@ -205,99 +227,62 @@ static u32 iommu_get_one(struct device *dev, struct page *page, int npages)
>   	return busa0;
>   }
>   
> -static dma_addr_t __sbus_iommu_map_page(struct device *dev, struct page *page,
> -		unsigned long offset, size_t len)
> -{
> -	void *vaddr = page_address(page) + offset;
> -	unsigned long off = (unsigned long)vaddr & ~PAGE_MASK;
> -	unsigned long npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -	
> -	/* XXX So what is maxphys for us and how do drivers know it? */
> -	if (!len || len > 256 * 1024)
> -		return DMA_MAPPING_ERROR;
> -	return iommu_get_one(dev, virt_to_page(vaddr), npages) + off;
> -}
> -
>   static dma_addr_t sbus_iommu_map_page_gflush(struct device *dev,
>   		struct page *page, unsigned long offset, size_t len,
>   		enum dma_data_direction dir, unsigned long attrs)
>   {
>   	flush_page_for_dma(0);
> -	return __sbus_iommu_map_page(dev, page, offset, len);
> +	return __sbus_iommu_map_page(dev, page, offset, len, false);
>   }
>   
>   static dma_addr_t sbus_iommu_map_page_pflush(struct device *dev,
>   		struct page *page, unsigned long offset, size_t len,
>   		enum dma_data_direction dir, unsigned long attrs)
>   {
> -	void *vaddr = page_address(page) + offset;
> -	unsigned long p = ((unsigned long)vaddr) & PAGE_MASK;
> -
> -	while (p < (unsigned long)vaddr + len) {
> -		flush_page_for_dma(p);
> -		p += PAGE_SIZE;
> -	}
> -
> -	return __sbus_iommu_map_page(dev, page, offset, len);
> +	return __sbus_iommu_map_page(dev, page, offset, len, true);
>   }
>   
> -static int sbus_iommu_map_sg_gflush(struct device *dev, struct scatterlist *sgl,
> -		int nents, enum dma_data_direction dir, unsigned long attrs)
> +static int __sbus_iommu_map_sg(struct device *dev, struct scatterlist *sgl,
> +		int nents, enum dma_data_direction dir, unsigned long attrs,
> +		bool need_flush)
>   {
>   	struct scatterlist *sg;
> -	int i, n;
> -
> -	flush_page_for_dma(0);
> +	int i;
>   
>   	for_each_sg(sgl, sg, nents, i) {
> -		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
> -		sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
> +		sg->dma_address = __sbus_iommu_map_page(dev, sg_page(sg),
> +				sg->offset, sg->length, need_flush);
> +		if (sg->dma_address == DMA_MAPPING_ERROR)
> +			return 0;
>   		sg->dma_length = sg->length;
>   	}
>   
>   	return nents;
>   }
>   
> -static int sbus_iommu_map_sg_pflush(struct device *dev, struct scatterlist *sgl,
> +static int sbus_iommu_map_sg_gflush(struct device *dev, struct scatterlist *sgl,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	unsigned long page, oldpage = 0;
> -	struct scatterlist *sg;
> -	int i, j, n;
> -
> -	for_each_sg(sgl, sg, nents, j) {
> -		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
> -
> -		/*
> -		 * We expect unmapped highmem pages to be not in the cache.
> -		 * XXX Is this a good assumption?
> -		 * XXX What if someone else unmaps it here and races us?
> -		 */
> -		if ((page = (unsigned long) page_address(sg_page(sg))) != 0) {
> -			for (i = 0; i < n; i++) {
> -				if (page != oldpage) {	/* Already flushed? */
> -					flush_page_for_dma(page);
> -					oldpage = page;
> -				}
> -				page += PAGE_SIZE;
> -			}
> -		}
> -
> -		sg->dma_address = iommu_get_one(dev, sg_page(sg), n) + sg->offset;
> -		sg->dma_length = sg->length;
> -	}
> +	flush_page_for_dma(0);
> +	return __sbus_iommu_map_sg(dev, sgl, nents, dir, attrs, false);
> +}
>   
> -	return nents;
> +static int sbus_iommu_map_sg_pflush(struct device *dev, struct scatterlist *sgl,
> +		int nents, enum dma_data_direction dir, unsigned long attrs)
> +{
> +	return __sbus_iommu_map_sg(dev, sgl, nents, dir, attrs, true);
>   }
>   
> -static void iommu_release_one(struct device *dev, u32 busa, int npages)
> +static void __sbus_iommu_unmap_page(struct device *dev, dma_addr_t dma_addr,
> +		size_t len)
>   {
>   	struct iommu_struct *iommu = dev->archdata.iommu;
> -	int ioptex;
> -	int i;
> +	unsigned busa, npages, ioptex, i;
>   
> +	busa = dma_addr & PAGE_MASK;
>   	BUG_ON(busa < iommu->start);
>   	ioptex = (busa - iommu->start) >> PAGE_SHIFT;
> +	npages = ((dma_addr & ~PAGE_MASK) + len + PAGE_SIZE-1) >> PAGE_SHIFT;
>   	for (i = 0; i < npages; i++) {
>   		iopte_val(iommu->page_table[ioptex + i]) = 0;
>   		iommu_invalidate_page(iommu->regs, busa);
> @@ -309,22 +294,17 @@ static void iommu_release_one(struct device *dev, u32 busa, int npages)
>   static void sbus_iommu_unmap_page(struct device *dev, dma_addr_t dma_addr,
>   		size_t len, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	unsigned long off = dma_addr & ~PAGE_MASK;
> -	int npages;
> -
> -	npages = (off + len + PAGE_SIZE-1) >> PAGE_SHIFT;
> -	iommu_release_one(dev, dma_addr & PAGE_MASK, npages);
> +	__sbus_iommu_unmap_page(dev, dma_addr, len);
>   }
>   
>   static void sbus_iommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
>   	struct scatterlist *sg;
> -	int i, n;
> +	int i;
>   
>   	for_each_sg(sgl, sg, nents, i) {
> -		n = (sg->length + sg->offset + PAGE_SIZE-1) >> PAGE_SHIFT;
> -		iommu_release_one(dev, sg->dma_address & PAGE_MASK, n);
> +		__sbus_iommu_unmap_page(dev, sg->dma_address, sg->length);
>   		sg->dma_address = 0x21212121;
>   	}
>   }
>
Guenter Roeck April 16, 2019, 5:08 p.m. UTC | #9
On Tue, Apr 16, 2019 at 08:33:56AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 15, 2019 at 02:07:31PM -0700, Guenter Roeck wrote:
> > On Mon, Apr 15, 2019 at 10:52:42PM +0200, Christoph Hellwig wrote:
> > > On Mon, Apr 15, 2019 at 12:44:35PM -0700, Guenter Roeck wrote:
> > > > This patch causes crashes with various boot tests. Most sparc tests crash, as
> > > > well as several arm tests. Bisect results in both cases point to this patch.
> > > 
> > > That just means we trigger an existing bug more easily now.  I'll see
> > > if I can help with the issues.
> > 
> > Code which previously worked reliably no longer does. I would be quite
> > hesitant to call this "trigger an existing bug more easily". "Regression"
> > seems to be a more appropriate term - even more so as it seems to cause
> > 'init' crashes, at least on arm.
> 
> Well, we have these sgls in the wild already, it just is that they

That is beside the point. Your code changes an internal API to be more
stringent and less forgiving. This causes failures, presumably because
callers of that API took advantage (on purpose or not) of it.
When changing an API, you are responsible for both ends. You cannot claim
that the callers of that API are buggy. Taking advantage of a forgiving
API is not a bug. If you change an API, and that change causes a failure,
that is a regression, not a bug on the side of the caller.

On top of that, an API change causing roughly 4% of my boot tests to fail
is a serious regression. Those boot tests don't really do anything besides
trying to boot the system. If 4% of those tests fail, I don't even want to
know what else is going to fail when your patch (or patch series) hits
mainline. Your patch should be reverted until that is resolved. If making
the API more stringent / less forgiving indeed makes sense and improves code
quality and/or performance, the very least would be to change the code to
still accept what it used to accept before but generate a traceback.
That would let people fix the calling code without making systems unusable.
This is even more true with failures like the one I observed on arm,
where your patch causes init to crash without clear indication of the
root cause of that crash.

Guenter
Christoph Hellwig April 16, 2019, 5:10 p.m. UTC | #10
On Tue, Apr 16, 2019 at 10:08:47AM -0700, Guenter Roeck wrote:
> That is besides the point. Your code changes an internal API to be more
> stringent and less forgiving. This causes failures, presumably because
> callers of that API took advantage (on purpose or not) of it.
> When changing an API, you are responsible for both ends. You can not claim
> that the callers of that API are buggy. Taking advangage of a forgiving
> API is not a bug. If you change an API, and that change causes a failure,
> that is a regression, not a bug on the side of the caller.

As said, I offered to fix these, even if this isn't my fault.  I'm also
still waiting for the other reports.
Guenter Roeck April 16, 2019, 5:51 p.m. UTC | #11
On Tue, Apr 16, 2019 at 07:10:11PM +0200, Christoph Hellwig wrote:
> On Tue, Apr 16, 2019 at 10:08:47AM -0700, Guenter Roeck wrote:
> > That is besides the point. Your code changes an internal API to be more
> > stringent and less forgiving. This causes failures, presumably because
> > callers of that API took advantage (on purpose or not) of it.
> > When changing an API, you are responsible for both ends. You can not claim
> > that the callers of that API are buggy. Taking advangage of a forgiving
> > API is not a bug. If you change an API, and that change causes a failure,
> > that is a regression, not a bug on the side of the caller.
> 
> As said I offered to fix these, even if this isn't my fault.  I'm also

"even if this isn't my fault"

Here is where we disagree. You introduced the change, you are responsible
for its impact, on both ends.

> still waiting for the the other reports.

I reported everything I know. To summarize, the following tests are confirmed
to fail due to this patch.

arm:vexpress-a9:multi_v7_defconfig:nolocktests:sd:mem128:vexpress-v2p-ca9:rootfs 
arm:vexpress-a15:multi_v7_defconfig:nolocktests:sd:mem128:vexpress-v2p-ca15-tc1:rootfs 
arm:vexpress-a15-a7:multi_v7_defconfig:nolocktests:sd:mem256:vexpress-v2p-ca15_a7:rootfs 
sparc32:SPARCClassic:nosmp:scsi:hd 
sparc32:SPARCbook:nosmp:scsi:cd 
sparc32:SS-5:nosmp:scsi:hd 
sparc32:SS-10:nosmp:scsi:cd 
sparc32:SS-600MP:nosmp:scsi:hd 
sparc32:Voyager:nosmp:noapc:scsi:hd 
sparc32:SS-4:smp:scsi:hd 
sparc32:SS-5:smp:scsi:cd 
sparc32:SS-20:smp:scsi:hd 
sparc32:SS-600MP:smp:scsi:hd 
sparc32:Voyager:smp:noapc:scsi:hd

Detailed logs are available at https://kerneltests.org/builders, and the
test scripts are published at https://github.com/groeck/linux-build-test.

Guenter
Christoph Hellwig April 17, 2019, 5:27 a.m. UTC | #12
Now that I've fixed the sparc32 iommu code in another thread:  can
you send me your rootfs and qemu arm command line for the failing
one?  I have a hard time parsing your buildbot output.
Guenter Roeck April 17, 2019, 1:42 p.m. UTC | #13
On 4/16/19 10:27 PM, Christoph Hellwig wrote:
> Now that I've fixed the sparc32 iommu code in another thread:  can
> you send me your rootfs and qemu arm command line for the failing
> one?  I have a hard time parsing your buildbot output.
> 
Example command line:

qemu-system-arm -M vexpress-a9 -kernel arch/arm/boot/zImage -no-reboot \
	-snapshot -m 128 \
	-drive file=rootfs-armv5.ext2,format=raw,if=sd \
	--append 'panic=-1  root=/dev/mmcblk0 rootwait console=ttyAMA0,115200' \
	-dtb arch/arm/boot/dts/vexpress-v2p-ca9.dtb \
	-nographic -monitor null -serial stdio

Root file system is available from
https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/rootfs-armv5.ext2.gz
Configuration is multi_v7_defconfig; I confirmed that this configuration is sufficient
to see the failure.

Guenter
Guenter Roeck April 17, 2019, 9:59 p.m. UTC | #14
On Wed, Apr 17, 2019 at 07:27:24AM +0200, Christoph Hellwig wrote:
> Now that I've fixed the sparc32 iommu code in another thread:  can
> you send me your rootfs and qemu arm command line for the failing
> one?  I have a hard time parsing your buildbot output.

FWIW: mmc_blk_data_prep() calls blk_rq_map_sg() with a large offset value.
The old code translated this into:

blk_bvec_map_sg(q=c77a0000 len=13824 offset=18944)
  sg_set_page(sg=c6015000 p=c7efd180 l=13824 o=2560)

The new code leaves offset unchanged:

blk_bvec_map_sg(q=c76c0528 len=13824 offset=18944)
  sg_set_page(sg=c6035000 p=c7f2af00 l=13824 o=18944)
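
For reference, the split the old code computed for this request can be
reproduced in a couple of lines (a worked example, assuming the usual 4 KiB
PAGE_SIZE on these platforms):

	unsigned off     = 18944;        /* bv_offset + total from the trace above */
	unsigned idx     = off / 4096;   /* = 4: old code passed bvec_nth_page(bv_page, 4) */
	unsigned in_page = off % 4096;   /* = 2560, the offset seen in the old sg_set_page() */
	/*
	 * The new code skips this split and hands the full 18944 to
	 * sg_set_page(), so the MMC host driver now sees an sg->offset
	 * that is larger than PAGE_SIZE.
	 */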

Traceback:

[<c065e3d4>] (blk_rq_map_sg) from [<c0ca1444>] (mmc_blk_data_prep+0x1b0/0x2c8)
[<c0ca1444>] (mmc_blk_data_prep) from [<c0ca15ac>] (mmc_blk_rw_rq_prep+0x50/0x178)
[<c0ca15ac>] (mmc_blk_rw_rq_prep) from [<c0ca48bc>] (mmc_blk_mq_issue_rq+0x290/0x878)
[<c0ca48bc>] (mmc_blk_mq_issue_rq) from [<c0ca52e4>] (mmc_mq_queue_rq+0x128/0x234)
[<c0ca52e4>] (mmc_mq_queue_rq) from [<c066350c>] (blk_mq_dispatch_rq_list+0xc8/0x5e8)
[<c066350c>] (blk_mq_dispatch_rq_list) from [<c06681a8>] (blk_mq_do_dispatch_sched+0x60/0xfc)
[<c06681a8>] (blk_mq_do_dispatch_sched) from [<c06688b8>] (blk_mq_sched_dispatch_requests+0x134/0x1b0)
[<c06688b8>] (blk_mq_sched_dispatch_requests) from [<c0661f08>] (__blk_mq_run_hw_queue+0xa4/0x138)
[<c0661f08>] (__blk_mq_run_hw_queue) from [<c03622a0>] (process_one_work+0x218/0x510)
[<c03622a0>] (process_one_work) from [<c0363230>] (worker_thread+0x44/0x5bc)

This results in bad data transfers, which ultimately causes the crash.

Guenter
Ming Lei April 19, 2019, 2:27 a.m. UTC | #15
On Thu, Apr 18, 2019 at 5:59 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Wed, Apr 17, 2019 at 07:27:24AM +0200, Christoph Hellwig wrote:
> > Now that I've fixed the sparc32 iommu code in another thread:  can
> > you send me your rootfs and qemu arm command line for the failing
> > one?  I have a hard time parsing your buildbot output.
>
> FWIW: mmc_blk_data_prep() calls blk_rq_map_sg() with a large offset value.
> The old code translated this into:
>
> blk_bvec_map_sg(q=c77a0000 len=13824 offset=18944)
>   sg_set_page(sg=c6015000 p=c7efd180 l=13824 o=2560)
>
> The new code leaves offset unchanged:
>
> blk_bvec_map_sg(q=c76c0528 len=13824 offset=18944)
>   sg_set_page(sg=c6035000 p=c7f2af00 l=13824 o=18944)
>
> Traceback:
>
> [<c065e3d4>] (blk_rq_map_sg) from [<c0ca1444>] (mmc_blk_data_prep+0x1b0/0x2c8)
> [<c0ca1444>] (mmc_blk_data_prep) from [<c0ca15ac>] (mmc_blk_rw_rq_prep+0x50/0x178)
> [<c0ca15ac>] (mmc_blk_rw_rq_prep) from [<c0ca48bc>] (mmc_blk_mq_issue_rq+0x290/0x878)
> [<c0ca48bc>] (mmc_blk_mq_issue_rq) from [<c0ca52e4>] (mmc_mq_queue_rq+0x128/0x234)
> [<c0ca52e4>] (mmc_mq_queue_rq) from [<c066350c>] (blk_mq_dispatch_rq_list+0xc8/0x5e8)
> [<c066350c>] (blk_mq_dispatch_rq_list) from [<c06681a8>] (blk_mq_do_dispatch_sched+0x60/0xfc)
> [<c06681a8>] (blk_mq_do_dispatch_sched) from [<c06688b8>] (blk_mq_sched_dispatch_requests+0x134/0x1b0)
> [<c06688b8>] (blk_mq_sched_dispatch_requests) from [<c0661f08>] (__blk_mq_run_hw_queue+0xa4/0x138)
> [<c0661f08>] (__blk_mq_run_hw_queue) from [<c03622a0>] (process_one_work+0x218/0x510)
> [<c03622a0>] (process_one_work) from [<c0363230>] (worker_thread+0x44/0x5bc)
>
> This results in bad data transfers, which ultimately causes the crash.

There are several bugs related to kmap(sg_page(sg)), such as:

sdhci_kmap_atomic()
tmio_mmc_kmap_atomic()
wbsd_map_sg()

Thanks,
Ming Lei
Ming Lei April 19, 2019, 2:36 a.m. UTC | #16
On Fri, Apr 19, 2019 at 10:27 AM Ming Lei <tom.leiming@gmail.com> wrote:
>
> On Thu, Apr 18, 2019 at 5:59 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On Wed, Apr 17, 2019 at 07:27:24AM +0200, Christoph Hellwig wrote:
> > > Now that I've fixed the sparc32 iommu code in another thread:  can
> > > you send me your rootfs and qemu arm command line for the failing
> > > one?  I have a hard time parsing your buildbot output.
> >
> > FWIW: mmc_blk_data_prep() calls blk_rq_map_sg() with a large offset value.
> > The old code translated this into:
> >
> > blk_bvec_map_sg(q=c77a0000 len=13824 offset=18944)
> >   sg_set_page(sg=c6015000 p=c7efd180 l=13824 o=2560)
> >
> > The new code leaves offset unchanged:
> >
> > blk_bvec_map_sg(q=c76c0528 len=13824 offset=18944)
> >   sg_set_page(sg=c6035000 p=c7f2af00 l=13824 o=18944)
> >
> > Traceback:
> >
> > [<c065e3d4>] (blk_rq_map_sg) from [<c0ca1444>] (mmc_blk_data_prep+0x1b0/0x2c8)
> > [<c0ca1444>] (mmc_blk_data_prep) from [<c0ca15ac>] (mmc_blk_rw_rq_prep+0x50/0x178)
> > [<c0ca15ac>] (mmc_blk_rw_rq_prep) from [<c0ca48bc>] (mmc_blk_mq_issue_rq+0x290/0x878)
> > [<c0ca48bc>] (mmc_blk_mq_issue_rq) from [<c0ca52e4>] (mmc_mq_queue_rq+0x128/0x234)
> > [<c0ca52e4>] (mmc_mq_queue_rq) from [<c066350c>] (blk_mq_dispatch_rq_list+0xc8/0x5e8)
> > [<c066350c>] (blk_mq_dispatch_rq_list) from [<c06681a8>] (blk_mq_do_dispatch_sched+0x60/0xfc)
> > [<c06681a8>] (blk_mq_do_dispatch_sched) from [<c06688b8>] (blk_mq_sched_dispatch_requests+0x134/0x1b0)
> > [<c06688b8>] (blk_mq_sched_dispatch_requests) from [<c0661f08>] (__blk_mq_run_hw_queue+0xa4/0x138)
> > [<c0661f08>] (__blk_mq_run_hw_queue) from [<c03622a0>] (process_one_work+0x218/0x510)
> > [<c03622a0>] (process_one_work) from [<c0363230>] (worker_thread+0x44/0x5bc)
> >
> > This results in bad data transfers, which ultimately causes the crash.
>
> There are several bugs related with kmap(sg_page(sg)), such as:
>
> sdhci_kmap_atomic()
> tmio_mmc_kmap_atomic()
> wbsd_map_sg()

Cc mmc maillist:

Looks like there are more such bad uses:

au1xmmc_send_pio()
au1xmmc_receive_pio()
mmc_spi_data_do()
sdricoh_request()

However, it seems tifm_sd.c notices this issue; see tifm_sd_transfer_data().
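
For drivers on that list, the underlying issue is assuming sg->offset is
always smaller than PAGE_SIZE before kmapping. A minimal sketch of the kind
of adjustment needed (a hypothetical helper, not taken from any of the
drivers above; it only maps the first page of the entry, and data that
crosses a page boundary still has to be walked page by page, e.g. the way
tifm_sd_transfer_data() does):

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/*
 * Map the start of an sg entry whose offset may exceed PAGE_SIZE,
 * instead of the common kmap_atomic(sg_page(sg)) + sg->offset pattern.
 * The caller must kunmap_atomic() the mapping when done.
 */
static void *sg_kmap_start(struct scatterlist *sg)
{
	struct page *page = nth_page(sg_page(sg), sg->offset >> PAGE_SHIFT);

	return kmap_atomic(page) + offset_in_page(sg->offset);
}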

Thanks,
Ming Lei

Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8f96d683b577..52dbdd089fdf 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -467,26 +467,17 @@  static unsigned blk_bvec_map_sg(struct request_queue *q,
 		struct scatterlist **sg)
 {
 	unsigned nbytes = bvec->bv_len;
-	unsigned nsegs = 0, total = 0, offset = 0;
+	unsigned nsegs = 0, total = 0;
 
 	while (nbytes > 0) {
-		unsigned seg_size;
-		struct page *pg;
-		unsigned idx;
+		unsigned offset = bvec->bv_offset + total;
+		unsigned len = min(get_max_segment_size(q, offset), nbytes);
 
 		*sg = blk_next_sg(sg, sglist);
+		sg_set_page(*sg, bvec->bv_page, len, offset);
 
-		seg_size = get_max_segment_size(q, bvec->bv_offset + total);
-		seg_size = min(nbytes, seg_size);
-
-		offset = (total + bvec->bv_offset) % PAGE_SIZE;
-		idx = (total + bvec->bv_offset) / PAGE_SIZE;
-		pg = bvec_nth_page(bvec->bv_page, idx);
-
-		sg_set_page(*sg, pg, seg_size, offset);
-
-		total += seg_size;
-		nbytes -= seg_size;
+		total += len;
+		nbytes -= len;
 		nsegs++;
 	}