From patchwork Thu Oct 25 09:41:11 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Junichi Nomura
X-Patchwork-Id: 1642611
Message-ID: <50890937.7010809@ce.jp.nec.com>
Date: Thu, 25 Oct 2012 18:41:11 +0900
From: "Jun'ichi Nomura"
To: "linux-kernel@vger.kernel.org", device-mapper development
Cc: Tejun Heo, Jens Axboe, Alasdair G Kergon, Vivek Goyal
Subject: [dm-devel] [PATCH 2/2] dm: stay in blk_queue_bypass until queue becomes initialized

[PATCH] dm: stay in blk_queue_bypass until queue becomes initialized

With 749fefe677 ("block: lift the initial queue bypass mode on
blk_register_queue() instead of blk_init_allocated_queue()"),
add_disk() eventually calls blk_queue_bypass_end().
This change triggers the following warning when multipath is used.

  BUG: scheduling while atomic: multipath/2460/0x00000002
  1 lock held by multipath/2460:
   #0:  (&md->type_lock){......}, at: [] dm_lock_md_type+0x17/0x19 [dm_mod]
  Modules linked in: ...
  Pid: 2460, comm: multipath Tainted: G W 3.7.0-rc2 #1
  Call Trace:
   [] __schedule_bug+0x6a/0x78
   [] __schedule+0xb4/0x5e0
   [] schedule+0x64/0x66
   [] schedule_timeout+0x39/0xf8
   [] ? put_lock_stats+0xe/0x29
   [] ? lock_release_holdtime+0xb6/0xbb
   [] wait_for_common+0x9d/0xee
   [] ? try_to_wake_up+0x206/0x206
   [] ? kfree_call_rcu+0x1c/0x1c
   [] wait_for_completion+0x1d/0x1f
   [] wait_rcu_gp+0x5d/0x7a
   [] ? wait_rcu_gp+0x7a/0x7a
   [] ? complete+0x21/0x53
   [] synchronize_rcu+0x1e/0x20
   [] blk_queue_bypass_start+0x5d/0x62
   [] blkcg_activate_policy+0x73/0x270
   [] ? kmem_cache_alloc_node_trace+0xc7/0x108
   [] cfq_init_queue+0x80/0x28e
   [] ? dm_blk_ioctl+0xa7/0xa7 [dm_mod]
   [] elevator_init+0xe1/0x115
   [] ? blk_queue_make_request+0x54/0x59
   [] blk_init_allocated_queue+0x8c/0x9e
   [] dm_setup_md_queue+0x36/0xaa [dm_mod]
   [] table_load+0x1bd/0x2c8 [dm_mod]
   [] ctl_ioctl+0x1d6/0x236 [dm_mod]
   [] ? table_clear+0xaa/0xaa [dm_mod]
   [] dm_ctl_ioctl+0x13/0x17 [dm_mod]
   [] do_vfs_ioctl+0x3fb/0x441
   [] ? file_has_perm+0x8a/0x99
   [] sys_ioctl+0x5e/0x82
   [] ? trace_hardirqs_on_thunk+0x3a/0x3f
   [] system_call_fastpath+0x16/0x1b

The warning means that, during queue initialization,
blk_queue_bypass_start() calls a sleeping function (synchronize_rcu())
while dm holds md->type_lock.

dm device initialization basically includes the following 3 steps:
  1. create ioctl: allocates the queue and calls add_disk()
  2. table load ioctl: determines the device type and initializes the
     queue if the device is request-based
  3. resume ioctl: the device becomes functional

So it is better to have dm's queue stay in bypass mode until the
initialization completes in the table load ioctl.

The effect of the additional blk_queue_bypass_start():

  3.7-rc2 (plain)
    # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \
        dmsetup remove a; done

    real  0m15.434s
    user  0m0.423s
    sys   0m7.052s

  3.7-rc2 (with this patch)
    # time for n in $(seq 1000); do dmsetup create --noudevsync --notable a; \
        dmsetup remove a; done

    real  0m19.766s
    user  0m0.442s
    sys   0m6.861s

If this additional cost is not negligible, we need a variant of
add_disk() that does not end bypassing.
Signed-off-by: Jun'ichi Nomura
Cc: Vivek Goyal
Cc: Tejun Heo
Cc: Jens Axboe
Cc: Alasdair G Kergon
---
 drivers/md/dm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 02db918..ad02761 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1869,6 +1869,8 @@ static struct mapped_device *alloc_dev(int minor)
 	md->disk->private_data = md;
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 	add_disk(md->disk);
+	/* Until md type is determined, put the queue in bypass mode */
+	blk_queue_bypass_start(md->queue);
 	format_dev_t(md->name, MKDEV(_major, minor));

 	md->wq = alloc_workqueue("kdmflush",
@@ -2172,6 +2174,7 @@ static int dm_init_request_based_queue(struct mapped_device *md)
 		return 1;

 	/* Fully initialize the queue */
+	WARN_ON(!blk_queue_bypass(md->queue));
 	q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
 	if (!q)
 		return 0;
@@ -2198,6 +2201,7 @@ int dm_setup_md_queue(struct mapped_device *md)
 		return -EINVAL;
 	}

+	blk_queue_bypass_end(md->queue);
 	return 0;
 }