blk: accessing invalid memory with "blk-mq: dynamic h/w context count"

Message ID 20160212162435.1e809790@tom-T450
State New

Commit Message

Ming Lei Feb. 12, 2016, 8:24 a.m. UTC
On Fri, 12 Feb 2016 00:41:28 -0500
Sasha Levin <sasha.levin@oracle.com> wrote:

> Hi all,
> 
> I've started seeing the following errors on boot:
> 
> [6035791.296570] ==================================================================
> [6035791.297467] BUG: KASAN: slab-out-of-bounds in loop_init_request+0x19c/0x1c0 at addr ffff880052e5c190
> [6035791.298355] Write of size 8 by task swapper/0/1
> [6035791.298842] =============================================================================
> [6035791.299751] BUG kmalloc-512 (Tainted: G        W      ): kasan: bad access detected
> [6035791.300736] -----------------------------------------------------------------------------
> [6035791.300736]
> [6035791.301696] Disabling lock debugging due to kernel taint
> [6035791.302220] INFO: Slab 0xffffea00014b9700 objects=32 used=32 fp=0x          (null) flags=0x1fffff80004080
> [6035791.303218] INFO: Object 0xffff880052e5c000 @offset=0 fp=0x          (null)
> [6035791.303218]
> [6035791.304047] Object ffff880052e5c000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.304955] Object ffff880052e5c010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.305970] Object ffff880052e5c020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.306916] Object ffff880052e5c030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.307908] Object ffff880052e5c040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.308903] Object ffff880052e5c050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.309959] Object ffff880052e5c060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.310896] Object ffff880052e5c070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.311849] Object ffff880052e5c080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.312784] Object ffff880052e5c090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.313734] Object ffff880052e5c0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.314646] Object ffff880052e5c0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.315567] Object ffff880052e5c0c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.316519] Object ffff880052e5c0d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.317475] Object ffff880052e5c0e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.318461] Object ffff880052e5c0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.319428] Object ffff880052e5c100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.320548] Object ffff880052e5c110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.321680] Object ffff880052e5c120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.322585] Object ffff880052e5c130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.323587] Object ffff880052e5c140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.324574] Object ffff880052e5c150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.325505] Object ffff880052e5c160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.326449] Object ffff880052e5c170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.327412] Object ffff880052e5c180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.328329] Object ffff880052e5c190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.329200] Object ffff880052e5c1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.330117] Object ffff880052e5c1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.331000] Object ffff880052e5c1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.331949] Object ffff880052e5c1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.332888] Object ffff880052e5c1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.333886] Object ffff880052e5c1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [6035791.334813] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W       4.5.0-rc3-next-20160211-sasha-00028-g542d18e-dirty #2898
> [6035791.335884]  1ffff1000a714ed2 00000000534d57fe ffff8800538a7718 ffffffffa34d4a15
> [6035791.336796]  ffffffff00000000 fffffbfff5eec534 0000000041b58ab3 ffffffffaefba520
> [6035791.337631]  ffffffffa34d489f 00000000534d57fe ffff880184220000 ffffffffaefd813f
> [6035791.338458] Call Trace:
> [6035791.338756] dump_stack (lib/dump_stack.c:53)
> [6035791.340573] print_trailer (mm/slub.c:661)
> [6035791.341117] object_err (mm/slub.c:668)
> [6035791.341738] kasan_report_error (include/linux/kasan.h:28 mm/kasan/report.c:170 mm/kasan/report.c:237)
> [6035791.344327] __asan_report_store8_noabort (mm/kasan/report.c:259 mm/kasan/report.c:285)
> [6035791.345775] loop_init_request (drivers/block/loop.c:1699)
> [6035791.347753] blk_mq_realloc_hw_ctxs (block/blk-mq.c:1722 block/blk-mq.c:1981)
> [6035791.351966] blk_mq_init_allocated_queue (block/blk-mq.c:2027)
> [6035791.355528] blk_mq_init_queue (block/blk-mq.c:1944)
> [6035791.356081] loop_add (drivers/block/loop.c:1749)
> [6035791.358663] loop_init (drivers/block/loop.c:2006 (discriminator 3))
> [6035791.362708] do_one_initcall (init/main.c:788)
> [6035791.363968] kernel_init_freeable (init/main.c:853 init/main.c:861 init/main.c:879 init/main.c:1004)
> [6035791.366040] kernel_init (init/main.c:932)
> [6035791.366573] ret_from_fork (arch/x86/entry/entry_64.S:383)
> [6035791.367782] Memory state around the buggy address:
> [6035791.368247]  ffff880052e5c080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [6035791.368968]  ffff880052e5c100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
> [6035791.369852] >ffff880052e5c180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [6035791.370635]                          ^
> [6035791.371015]  ffff880052e5c200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [6035791.371816]  ffff880052e5c280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> 
> Bisection pointed to:
> 
> commit 868f2f0b72068a097508b6e8870a8950fd8eb7ef
> Author: Keith Busch <keith.busch@intel.com>
> Date:   Thu Dec 17 17:08:14 2015 -0700
> 
>     blk-mq: dynamic h/w context count

Hi Sasha,

This looks like a problem with the timing of setting q->mq_ops,
and I believe the following patch may fix the issue. Could
you give it a test?

Thanks,
---
From 299dfbd27a4ede53104608b07669041d202afe1f Mon Sep 17 00:00:00 2001
From: Ming Lei <tom.leiming@gmail.com>
Date: Fri, 12 Feb 2016 15:27:00 +0800
Subject: [PATCH] blk-mq: mark request queue as mq asap

Currently q->mq_ops is widely used to decide whether a queue
is mq or not, so we should set this 'flag' asap so that both
the block core and drivers get the correct mq info.

For example, commit 868f2f0b720 ("blk-mq: dynamic h/w context count")
moves the hctx initialization before the setting of q->mq_ops in
blk_mq_init_allocated_queue(), which causes blk_alloc_flush_queue()
to think the queue is non-mq and not allocate the command size
for the per-hctx flush rq.

This patch should fix the problem reported by Sasha.

Cc: Keith Busch <keith.busch@intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Keith Busch Feb. 12, 2016, 6:50 p.m. UTC | #1
On Fri, Feb 12, 2016 at 04:24:35PM +0800, Ming Lei wrote:
> On Fri, 12 Feb 2016 00:41:28 -0500
> Hi Sasha,
> 
> This looks like a problem with the timing of setting q->mq_ops,
> and I believe the following patch may fix the issue. Could
> you give it a test?

Thanks, Ming, that looks better, and it looks like the same failure that
0-day reported when this was posted a couple of months ago. I thought this
looked potentially risky, but haven't had time to make changes.

I also hadn't seen that this had been applied. I have a broken filter
moving important emails to junk... On the plus side, the exposure
led to a potential fix.

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 645eb9e..f539a53 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2010,6 +2010,9 @@  static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 						  struct request_queue *q)
 {
+	/* mark the queue as mq asap */
+	q->mq_ops = set->ops;
+
 	q->queue_ctx = alloc_percpu(struct blk_mq_ctx);
 	if (!q->queue_ctx)
 		return ERR_PTR(-ENOMEM);
@@ -2032,7 +2035,6 @@  struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	q->nr_queues = nr_cpu_ids;
 
-	q->mq_ops = set->ops;
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
 	if (!(set->flags & BLK_MQ_F_SG_MERGE))
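
The ordering problem described in the commit message can be sketched in
user space. This is only an illustration under stated assumptions, not the
kernel code: every toy_* name below is hypothetical, and the size check
stands in for what the commit message attributes to blk_alloc_flush_queue(),
which only adds the driver's command size when q->mq_ops is already set.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
struct toy_ops {
	int unused;
};

struct toy_request {
	char hdr[64];
};

struct toy_queue {
	const struct toy_ops *mq_ops;	/* non-NULL means "this queue is mq" */
	size_t flush_rq_size;		/* bytes reserved for the flush request */
};

/*
 * Assumed model of the flush request sizing: the per-request driver
 * payload (cmd_size) is only added when the queue already looks mq.
 */
static size_t toy_flush_rq_size(const struct toy_queue *q, size_t cmd_size)
{
	size_t sz = sizeof(struct toy_request);

	if (q->mq_ops)
		sz += cmd_size;
	return sz;
}

/*
 * Initialize the toy queue. With mark_early == 0 the "flag" is set after
 * the flush request has been sized (the ordering after the bisected
 * commit); with mark_early != 0 it is set first (the ordering in the
 * patch). Returns 1 if a driver init hook writing cmd_size bytes past the
 * request header would stay in bounds, 0 otherwise.
 */
static int toy_init_queue(struct toy_queue *q, const struct toy_ops *ops,
			  size_t cmd_size, int mark_early)
{
	if (mark_early)
		q->mq_ops = ops;

	q->flush_rq_size = toy_flush_rq_size(q, cmd_size);

	if (!mark_early)
		q->mq_ops = ops;

	return q->flush_rq_size >= sizeof(struct toy_request) + cmd_size;
}
```

Calling toy_init_queue() with mark_early == 0 leaves flush_rq_size equal to
sizeof(struct toy_request), so a hook that writes into the payload would run
past the allocation, which is the kind of access KASAN flagged in
loop_init_request(). With mark_early != 0 the payload is included.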