From patchwork Mon Jun 8 17:01:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11593699 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B74A4912 for ; Mon, 8 Jun 2020 17:01:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8D1F92053B for ; Mon, 8 Jun 2020 17:01:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8D1F92053B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 650326B0007; Mon, 8 Jun 2020 13:01:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 626016B0008; Mon, 8 Jun 2020 13:01:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 53B176B000A; Mon, 8 Jun 2020 13:01:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0085.hostedemail.com [216.40.44.85]) by kanga.kvack.org (Postfix) with ESMTP id 3E4E36B0007 for ; Mon, 8 Jun 2020 13:01:34 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id DD5AA393F78 for ; Mon, 8 Jun 2020 17:01:33 +0000 (UTC) X-FDA: 76906660866.03.burst24_210dd5226dbb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id ABB21223469 for ; Mon, 8 Jun 2020 17:01:33 +0000 (UTC) X-Spam-Summary: 2,0,0,211b439d138ffe59,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:966:968:973:988:989:1260:1311:1314:1345:1359:1437:1515:1535:1544:1605:1711:1730:1747:1777:1792:2196:2199:2393:2559:2562:2689:2693:3138:3139:3140:3141:3142:3865:3866:3867:3868:3871:4118:4321:4385:5007:6261:7875:8660:9389:10004:11026:11473:11658:11914:12043:12048:12291:12296:12297:12438:12517:12519:12555:12683:12895:12986:13148:13230:13894:13972:14096:14110:14181:14394:14721:21080:21212:21444:21451:21627:21939:21987:21990:30054,0,RBL:209.85.215.195:@gmail.com:.lbl8.mailshell.net-62.18.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:28,LUA_SUMMARY:none X-HE-Tag: burst24_210dd5226dbb X-Filterd-Recvd-Size: 7281 Received: from mail-pg1-f195.google.com (mail-pg1-f195.google.com [209.85.215.195]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Mon, 8 Jun 2020 17:01:33 +0000 (UTC) Received: by mail-pg1-f195.google.com with SMTP id o8so9011015pgm.7 for ; Mon, 08 Jun 2020 10:01:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IR8sE8tegFIZFEE9c+9scyuraOZnu4HrFVBH9bLfdFU=; b=uSxCrk2OMDCv+zijXZjy5K1pvewUJLDmcl4JZ1T8WMAXqrd2fTR/hiJungQ4BIJbyI G1dicDZihGmFwx8jwk4/zcSsbB/I5nzllxt0V+rw4dgagY1nJ7QAOElg3gUzi5QNz7Q9 fBMg5nhZB1tWa3zUuGCKYszhe1lPGlVyae9W/nIqNgKovvuyViRsU3lxL7tlkxuWcSdl CyFiKdtjY6gGifJlVcB/lclFwbz1w7ahz2pYSCjO7VCZeeadPIhQh0pCT2Z6jJk/x2Wy RykHLdwNSAP/QzQD9Xi6RZNLFjTXyLa+34HWBd2V+abfs1/Q5V5s9gs1JkO9AelGSA0W ot7w== X-Gm-Message-State: AOAM533YZh8F/itCIFAtUVXIh3Sianr6Q9Sf7tDTDb41gEsHlD4yUtT7 Jt4BSP1rB/QVWT9mI4LyPk8= X-Google-Smtp-Source: ABdhPJxt2pD6W+zmvHH4/qZYFASujPCD7TMwgg/ner/gHNpT88Lk15hCgOpUe+6qMytW01UEwr4DCw== X-Received: by 2002:a63:7e5a:: with SMTP id o26mr21581596pgn.134.1591635692300; Mon, 08 Jun 2020 10:01:32 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id b140sm7542863pfb.119.2020.06.08.10.01.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:01:28 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id F0A7740945; Mon, 8 Jun 2020 17:01:27 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, martin.petersen@oracle.com, jejb@linux.ibm.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Christoph Hellwig Subject: [PATCH v6 1/6] block: add docs for gendisk / request_queue refcount helpers Date: Mon, 8 Jun 2020 17:01:21 +0000 Message-Id: <20200608170127.20419-2-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200608170127.20419-1-mcgrof@kernel.org> References: <20200608170127.20419-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: ABB21223469 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This adds documentation for the gendisk / request_queue refcount helpers. Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Signed-off-by: Luis Chamberlain --- block/blk-core.c | 13 +++++++++++++ block/genhd.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 62 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 62a4904db921..a0760aac110a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -321,6 +321,13 @@ void blk_clear_pm_only(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_clear_pm_only); +/** + * blk_put_queue - decrement the request_queue refcount + * @q: the request_queue structure to decrement the refcount for + * + * Decrements the refcount of the request_queue kobject. When this reaches 0 + * we'll have blk_release_queue() called. + */ void blk_put_queue(struct request_queue *q) { kobject_put(&q->kobj); @@ -598,6 +605,12 @@ struct request_queue *blk_alloc_queue(make_request_fn make_request, int node_id) } EXPORT_SYMBOL(blk_alloc_queue); +/** + * blk_get_queue - increment the request_queue refcount + * @q: the request_queue structure to increment the refcount for + * + * Increment the refcount of the request_queue kobject. + */ bool blk_get_queue(struct request_queue *q) { if (likely(!blk_queue_dying(q))) { diff --git a/block/genhd.c b/block/genhd.c index 1a7659327664..f741613d731f 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -876,6 +876,20 @@ static void invalidate_partition(struct gendisk *disk, int partno) bdput(bdev); } +/** + * del_gendisk - remove the gendisk + * @disk: the struct gendisk to remove + * + * Removes the gendisk and all its associated resources. This deletes the + * partitions associated with the gendisk, and unregisters the associated + * request_queue. + * + * This is the counter to the respective __device_add_disk() call. + * + * The final removal of the struct gendisk happens when its refcount reaches 0 + * with put_disk(), which should be called after del_gendisk(), if + * __device_add_disk() was used. + */ void del_gendisk(struct gendisk *disk) { struct disk_part_iter piter; @@ -1514,6 +1528,23 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno) return 0; } +/** + * disk_release - releases all allocated resources of the gendisk + * @dev: the device representing this disk + * + * This function releases all allocated resources of the gendisk. + * + * The struct gendisk refcount is incremented with get_gendisk() or + * get_disk_and_module(), and its refcount is decremented with + * put_disk_and_module() or put_disk(). Once the refcount reaches 0 this + * function is called. + * + * Drivers which used __device_add_disk() have a gendisk with a request_queue + * assigned. Since the request_queue sits on top of the gendisk for these + * drivers we also call blk_put_queue() for them, and we expect the + * request_queue refcount to reach 0 at this point, and so the request_queue + * will also be freed prior to the disk. + */ static void disk_release(struct device *dev) { struct gendisk *disk = dev_to_disk(dev); @@ -1727,6 +1758,13 @@ struct gendisk *__alloc_disk_node(int minors, int node_id) } EXPORT_SYMBOL(__alloc_disk_node); +/** + * get_disk_and_module - increments the gendisk and gendisk fops module refcount + * @disk: the struct gendisk to to increment the refcount for + * + * This increments the refcount for the struct gendisk, and the gendisk's + * fops module owner. + */ struct kobject *get_disk_and_module(struct gendisk *disk) { struct module *owner; @@ -1747,6 +1785,13 @@ struct kobject *get_disk_and_module(struct gendisk *disk) } EXPORT_SYMBOL(get_disk_and_module); +/** + * put_disk - decrements the gendisk refcount + * @disk: the struct gendisk to to decrement the refcount for + * + * This decrements the refcount for the struct gendisk. When this reaches 0 + * we'll have disk_release() called. + */ void put_disk(struct gendisk *disk) { if (disk) @@ -1754,7 +1799,10 @@ void put_disk(struct gendisk *disk) } EXPORT_SYMBOL(put_disk); -/* +/** + * put_disk_and_module - decrements the module and gendisk refcount + * @disk: the struct gendisk to to decrement the refcount for + * * This is a counterpart of get_disk_and_module() and thus also of * get_gendisk(). */ From patchwork Mon Jun 8 17:01:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11593711 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 67B97912 for ; Mon, 8 Jun 2020 17:01:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 33BB52086A for ; Mon, 8 Jun 2020 17:01:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 33BB52086A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A20756B0024; Mon, 8 Jun 2020 13:01:46 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 980E66B0025; Mon, 8 Jun 2020 13:01:46 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8227A6B0026; Mon, 8 Jun 2020 13:01:46 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id 66C9F6B0024 for ; Mon, 8 Jun 2020 13:01:46 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 0FDF41407AE for ; Mon, 8 Jun 2020 17:01:46 +0000 (UTC) X-FDA: 76906661412.22.crime52_360ec5d26dbb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 91E44187277E8 for ; Mon, 8 Jun 2020 17:01:32 +0000 (UTC) X-Spam-Summary: 2,0,0,6de9d742d7ce4881,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1541:1711:1730:1747:1777:1792:2393:2553:2559:2562:2693:2892:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3870:3874:4321:5007:6261:10004:11026:11473:11658:11914:12043:12048:12296:12297:12438:12517:12519:12555:12895:13069:13311:13357:13894:14096:14181:14384:14394:14721:21080:21433:21444:21451:21627:21740:30003:30054:30075:30090,0,RBL:209.85.216.67:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:32,LUA_SUMMARY:none X-HE-Tag: crime52_360ec5d26dbb X-Filterd-Recvd-Size: 4566 Received: from mail-pj1-f67.google.com (mail-pj1-f67.google.com [209.85.216.67]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Mon, 8 Jun 2020 17:01:31 +0000 (UTC) Received: by mail-pj1-f67.google.com with SMTP id jz3so97526pjb.0 for ; Mon, 08 Jun 2020 10:01:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZlpRC/Mpz8aVWhkWOBV4KQndGPE3w8KEQwXgj5xVC8o=; b=doFajsmsL19/5L8OuHOO2UebgT1fayXEj0mmRQpevJMJ+Tuso3ey1fxx5E4D1wK8Au qyIreiFpbzdz3g01VoB72iljc4JqpJ2/dYauovoTgBuxqS5I41B4y9InbK0Ph9pSV3Ch 5rF/zOb/7MhUnaPrVFd3RqYT4QQ3sKn+7b8m2CtSxkbCSEWjC3uSVdL+IO+XXbDUmdWR FQx1wSDoV270UlgsKCjPjzAZ2kfo2YGIOW0l99XtocLaU/fUgrqCG/FeWcWNtO6IVczt 2i5M6seZrQ1uRDhw7oFbPaEb0dY0rANSs8YvpWpqasaDqlgSIin+duwtEpnfhvT4iJZP UMcg== X-Gm-Message-State: AOAM533Ww/97Kd6UBpjPIht7yZ+Dnzhd1s4UdXs2lWONh3aEMTNLgW92 Cj374Kr+F/XtiYp32hbQihs= X-Google-Smtp-Source: ABdhPJwNsBvs+w+3hNLWX1L/WRF8FImqilYKhRJVhUwJyH1Y0WnlC6Bcs7lRahJkSJraRMBfGhoc7w== X-Received: by 2002:a17:902:eb13:: with SMTP id l19mr20105245plb.213.1591635691008; Mon, 08 Jun 2020 10:01:31 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id e124sm7619466pfh.140.2020.06.08.10.01.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:01:28 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 217B640B6C; Mon, 8 Jun 2020 17:01:28 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, martin.petersen@oracle.com, jejb@linux.ibm.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Christoph Hellwig Subject: [PATCH v6 2/6] block: clarify context for refcount increment helpers Date: Mon, 8 Jun 2020 17:01:22 +0000 Message-Id: <20200608170127.20419-3-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200608170127.20419-1-mcgrof@kernel.org> References: <20200608170127.20419-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: 91E44187277E8 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Let us clarify the context under which the helpers to increment the refcount for the gendisk and request_queue can be called under. We make this explicit on the places where we may sleep with might_sleep(). We don't address the decrement context yet, as that needs some extra work and fixes, but will be addressed in the next patch. Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Signed-off-by: Luis Chamberlain --- block/blk-core.c | 2 ++ block/genhd.c | 6 ++++++ 2 files changed, 8 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index a0760aac110a..14c09daf55f3 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -610,6 +610,8 @@ EXPORT_SYMBOL(blk_alloc_queue); * @q: the request_queue structure to increment the refcount for * * Increment the refcount of the request_queue kobject. + * + * Context: Any context. */ bool blk_get_queue(struct request_queue *q) { diff --git a/block/genhd.c b/block/genhd.c index f741613d731f..1be86b1f43ec 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -985,11 +985,15 @@ static ssize_t disk_badblocks_store(struct device *dev, * * This function gets the structure containing partitioning * information for the given device @devt. + * + * Context: can sleep */ struct gendisk *get_gendisk(dev_t devt, int *partno) { struct gendisk *disk = NULL; + might_sleep(); + if (MAJOR(devt) != BLOCK_EXT_MAJOR) { struct kobject *kobj; @@ -1764,6 +1768,8 @@ EXPORT_SYMBOL(__alloc_disk_node); * * This increments the refcount for the struct gendisk, and the gendisk's * fops module owner. + * + * Context: Any context. */ struct kobject *get_disk_and_module(struct gendisk *disk) { From patchwork Mon Jun 8 17:01:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11593701 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A64F492A for ; Mon, 8 Jun 2020 17:01:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6FFE920760 for ; Mon, 8 Jun 2020 17:01:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6FFE920760 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7AD0D6B000A; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 70DEA6B000C; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 629AE6B000D; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0112.hostedemail.com [216.40.44.112]) by kanga.kvack.org (Postfix) with ESMTP id 47F666B000A for ; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id E2F581C829D for ; Mon, 8 Jun 2020 17:01:39 +0000 (UTC) X-FDA: 76906661118.16.flame90_49145f326dbb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 435D310332DFC for ; Mon, 8 Jun 2020 17:01:35 +0000 (UTC) X-Spam-Summary: 10,1,0,2b9b7b3562fa2abf,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:1:2:41:69:355:379:541:800:960:966:968:973:982:988:989:1260:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2553:2559:2562:2689:2693:2895:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4031:4051:4321:4385:4605:5007:6261:6742:7875:7903:8604:8660:9108:9389:9393:9592:10007:11026:11232:11473:11658:11914:12043:12048:12291:12296:12297:12438:12517:12519:12555:12679:12683:12895:12986:13148:13149:13161:13229:13230:13868:13894:13972:14394:21063:21080:21212:21324:21433:21444:21451:21627:21740:21939:21972:21987:21990:30012:30054:30064:30070:30090:30091,0,RBL:209.85.216.68:@gmail.com:.lbl8.mailshell.net-62.18.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:1:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: flame90_49145f326dbb X-Filterd-Recvd-Size: 11788 Received: from mail-pj1-f68.google.com (mail-pj1-f68.google.com [209.85.216.68]) by imf04.hostedemail.com (Postfix) with ESMTP for ; Mon, 8 Jun 2020 17:01:34 +0000 (UTC) Received: by mail-pj1-f68.google.com with SMTP id ne5so87815pjb.5 for ; Mon, 08 Jun 2020 10:01:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=czNjXU30rSIShN0VGIH9O9NmGY2Ep4u1wGzVMmo2EPc=; b=KPCgEUOWFxK/mx3lVaWJ1gQ9caSylf4WdLl1q614/gHpIln6/Q65y2ipRpph1ruYiV +9xe3XwRLA+FsQV7FpZgHfJFixifvH+BOLzJeY59yNwNLxPAkfeKILefZKjTgpWyYoPj hoYY7MRNaRF/odQI1DaaexA9r//WH4pZrjRLYPIGaGlR6DglsgvJoTip85CtAa29eXsQ G780qdMLWyUF9fBBwyJOqSwumn+tK+QDU9fUsMGPsR7oWYtcD5e507MiTfGXRyQEmNK8 Ny2xoEyF0pcWL8BoCJFq/tKxH435v86yKAwK+SkrCgQw4uZUdzdp0Z8l9XJ3CjVAVclc Do2A== X-Gm-Message-State: AOAM531e+Q4skHF86OagGb2tAyDu5QP376UO7iRq7faGoyhA7ijMdFmR EgbdObjg26SVGn/jHGiJcyg= X-Google-Smtp-Source: ABdhPJyFWnizLnXRGDBsljYjqN67qm0DZiifB27mOcjtR4o0sSeP8FcBSncD65XYbTy97RXzjGWCKg== X-Received: by 2002:a17:902:7297:: with SMTP id d23mr4713106pll.35.1591635693722; Mon, 08 Jun 2020 10:01:33 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id l3sm6404738pgm.59.2020.06.08.10.01.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:01:31 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 4A77B41C23; Mon, 8 Jun 2020 17:01:28 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, martin.petersen@oracle.com, jejb@linux.ibm.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko , Christoph Hellwig Subject: [PATCH v6 3/6] block: revert back to synchronous request_queue removal Date: Mon, 8 Jun 2020 17:01:23 +0000 Message-Id: <20200608170127.20419-4-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200608170127.20419-1-mcgrof@kernel.org> References: <20200608170127.20419-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: 435D310332DFC X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is now removed through commit db6d9952356 ("block: remove request_list code") on v5.0, we reserve the right to be able to sleep within blk_release_queue() context. The last reference for the request_queue must not be called from atomic context. *When* the last reference to the request_queue reaches 0 varies, and so let's take the opportunity to document when that is expected to happen and also document the context of the related calls as best as possible so we can avoid future issues, and with the hopes that the synchronous request_queue removal sticks. We revert back to synchronous request_queue removal because asynchronous removal creates a regression with expected userspace interaction with several drivers. An example is when removing the loopback driver, one uses ioctls from userspace to do so, but upon return and if successful, one expects the device to be removed. Likewise if one races to add another device the new one may not be added as it is still being removed. This was expected behavior before and it now fails as the device is still present and busy still. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal to have been completed if there was no error. Document this expectation as well so that this doesn't regress userspace again. Using asynchronous request_queue removal however has helped us find other bugs. In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE. While at it, update the docs with the context expectations for the request_queue / gendisk refcount decrement, and make these expectations explicit by using might_sleep(). Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Suggested-by: Nicolai Stange Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression") Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Signed-off-by: Luis Chamberlain Reviewed-by: Bart Van Assche --- block/blk-core.c | 8 ++++++++ block/blk-sysfs.c | 43 +++++++++++++++++++++--------------------- block/genhd.c | 17 +++++++++++++++++ include/linux/blkdev.h | 2 -- 4 files changed, 47 insertions(+), 23 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 14c09daf55f3..a5126c0be777 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -327,6 +327,9 @@ EXPORT_SYMBOL_GPL(blk_clear_pm_only); * * Decrements the refcount of the request_queue kobject. When this reaches 0 * we'll have blk_release_queue() called. + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. */ void blk_put_queue(struct request_queue *q) { @@ -359,9 +362,14 @@ EXPORT_SYMBOL_GPL(blk_set_queue_dying); * * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and * put it. All future requests will be failed immediately with -ENODEV. + * + * Context: can sleep */ void blk_cleanup_queue(struct request_queue *q) { + /* cannot be called from atomic context */ + might_sleep(); + WARN_ON_ONCE(blk_queue_registered(q)); /* mark @q DYING, no new request or merges will be allowed afterwards */ diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 02643e149d5e..561624d4cc4e 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -873,22 +873,32 @@ static void blk_exit_queue(struct request_queue *q) bdi_put(q->backing_dev_info); } - /** - * __blk_release_queue - release a request queue - * @work: pointer to the release_work member of the request queue to be released + * blk_release_queue - releases all allocated resources of the request_queue + * @kobj: pointer to a kobject, whose container is a request_queue + * + * This function releases all allocated resources of the request queue. + * + * The struct request_queue refcount is incremented with blk_get_queue() and + * decremented with blk_put_queue(). Once the refcount reaches 0 this function + * is called. + * + * For drivers that have a request_queue on a gendisk and added with + * __device_add_disk() the refcount to request_queue will reach 0 with + * the last put_disk() called by the driver. For drivers which don't use + * __device_add_disk() this happens with blk_cleanup_queue(). * - * Description: - * This function is called when a block device is being unregistered. The - * process of releasing a request queue starts with blk_cleanup_queue, which - * set the appropriate flags and then calls blk_put_queue, that decrements - * the reference counter of the request queue. Once the reference counter - * of the request queue reaches zero, blk_release_queue is called to release - * all allocated resources of the request queue. + * Drivers exist which depend on the release of the request_queue to be + * synchronous, it should not be deferred. + * + * Context: can sleep */ -static void __blk_release_queue(struct work_struct *work) +static void blk_release_queue(struct kobject *kobj) { - struct request_queue *q = container_of(work, typeof(*q), release_work); + struct request_queue *q = + container_of(kobj, struct request_queue, kobj); + + might_sleep(); if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); @@ -917,15 +927,6 @@ static void __blk_release_queue(struct work_struct *work) call_rcu(&q->rcu_head, blk_free_queue_rcu); } -static void blk_release_queue(struct kobject *kobj) -{ - struct request_queue *q = - container_of(kobj, struct request_queue, kobj); - - INIT_WORK(&q->release_work, __blk_release_queue); - schedule_work(&q->release_work); -} - static const struct sysfs_ops queue_sysfs_ops = { .show = queue_attr_show, .store = queue_attr_store, diff --git a/block/genhd.c b/block/genhd.c index 1be86b1f43ec..60ae4e1b4d38 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -889,12 +889,19 @@ static void invalidate_partition(struct gendisk *disk, int partno) * The final removal of the struct gendisk happens when its refcount reaches 0 * with put_disk(), which should be called after del_gendisk(), if * __device_add_disk() was used. + * + * Drivers exist which depend on the release of the gendisk to be synchronous, + * it should not be deferred. + * + * Context: can sleep */ void del_gendisk(struct gendisk *disk) { struct disk_part_iter piter; struct hd_struct *part; + might_sleep(); + blk_integrity_del(disk); disk_del_events(disk); @@ -1548,11 +1555,15 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno) * drivers we also call blk_put_queue() for them, and we expect the * request_queue refcount to reach 0 at this point, and so the request_queue * will also be freed prior to the disk. + * + * Context: can sleep */ static void disk_release(struct device *dev) { struct gendisk *disk = dev_to_disk(dev); + might_sleep(); + blk_free_devt(dev->devt); disk_release_events(disk); kfree(disk->random); @@ -1797,6 +1808,9 @@ EXPORT_SYMBOL(get_disk_and_module); * * This decrements the refcount for the struct gendisk. When this reaches 0 * we'll have disk_release() called. + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. */ void put_disk(struct gendisk *disk) { @@ -1811,6 +1825,9 @@ EXPORT_SYMBOL(put_disk); * * This is a counterpart of get_disk_and_module() and thus also of * get_gendisk(). + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. */ void put_disk_and_module(struct gendisk *disk) { diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 27887bf36d50..2462b78c1013 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -584,8 +584,6 @@ struct request_queue { size_t cmd_size; - struct work_struct release_work; - #define BLK_MAX_WRITE_HINTS 5 u64 write_hints[BLK_MAX_WRITE_HINTS]; }; From patchwork Mon Jun 8 17:01:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11593707 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E67D92A for ; Mon, 8 Jun 2020 17:01:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 157CB20760 for ; Mon, 8 Jun 2020 17:01:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 157CB20760 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F3AAB6B000E; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D79736B0023; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B777F6B000E; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0068.hostedemail.com [216.40.44.68]) by kanga.kvack.org (Postfix) with ESMTP id 8C6066B0010 for ; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 220F91C829C for ; Mon, 8 Jun 2020 17:01:40 +0000 (UTC) X-FDA: 76906661160.02.grape95_550a40126dbb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 11D6F940B3 for ; Mon, 8 Jun 2020 17:01:36 +0000 (UTC) X-Spam-Summary: 2,0,0,30b94389a20cdd55,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1539:1567:1711:1714:1730:1747:1777:1792:2393:2559:2562:2903:3138:3139:3140:3141:3142:3871:3872:3876:5007:6261:10004:11026:11658:11914:12043:12048:12114:12296:12297:12517:12519:12555:12895:13069:13141:13230:13311:13357:13894:14181:14384:14394:14721:21080:21444:21627:21990:30029:30054,0,RBL:209.85.210.195:@gmail.com:.lbl8.mailshell.net-62.50.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: grape95_550a40126dbb X-Filterd-Recvd-Size: 3407 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by imf02.hostedemail.com (Postfix) with ESMTP for ; Mon, 8 Jun 2020 17:01:35 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id x22so8805389pfn.3 for ; Mon, 08 Jun 2020 10:01:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jH2CPsvsaNzRGDf+QoUv0Reklj4c8N9GM5jBSfoalww=; b=iwKeW5JHpikXBm/C6qJ1E01v24OgF0c26iaPiyQyr5g9Cz+QK8Xmm9ydOSFDgTD4ca kSFFDaPjv+WKB1DbPwo+sNK9Nas+M/VScFL/YyDcrzYp274ajBcATcrnQNdf2neENypE 32x/L2gkPNxD+77mNg3XHLoGSHmm1PSdoiaPRId3f7kcOgZo8fp6chhflV9cnlYBQ7A0 9ikHpN8Q61rcnVXA69chhNR1sVTgELsYJtwrIVVvrmNoa8ZDKhiCcwquAZCYN23Tx/rk qrwyzS+ejMN8iuMC62VJiszsXt9RBIccdUeEFt8oXGjE0yc17/yqhuKgTy9smzQYWWGm qIGg== X-Gm-Message-State: AOAM5318Mee0tpyx0HxVR3bDR8sITP7QxpeQHlb8zh4WgurD0DYPyDL5 KknShJfLnQWIfAebjL7Z5yA= X-Google-Smtp-Source: ABdhPJwwBujS3U9PCsomBnrxJDGD76z2zyRa1W/F///y1VSt1LrxETfDgpsHrdXoXXBTG59fZ8Mfyg== X-Received: by 2002:a63:6345:: with SMTP id x66mr20466636pgb.156.1591635694775; Mon, 08 Jun 2020 10:01:34 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id x197sm7671771pfc.13.2020.06.08.10.01.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:01:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 6DF0E41D95; Mon, 8 Jun 2020 17:01:28 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, martin.petersen@oracle.com, jejb@linux.ibm.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain Subject: [PATCH v6 4/6] blktrace: annotate required lock on do_blk_trace_setup() Date: Mon, 8 Jun 2020 17:01:24 +0000 Message-Id: <20200608170127.20419-5-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200608170127.20419-1-mcgrof@kernel.org> References: <20200608170127.20419-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: 11D6F940B3 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Ensure it is clear which lock is required on do_blk_trace_setup(). Suggested-by: Bart Van Assche Signed-off-by: Luis Chamberlain Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche --- kernel/trace/blktrace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 7f60029bdaff..7ff2ea5cd05e 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -483,6 +483,8 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, struct dentry *dir = NULL; int ret; + lockdep_assert_held(&q->blk_trace_mutex); + if (!buts->buf_size || !buts->buf_nr) return -EINVAL; From patchwork Mon Jun 8 17:01:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11593709 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1235492A for ; Mon, 8 Jun 2020 17:01:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DBC442053B for ; Mon, 8 Jun 2020 17:01:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DBC442053B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D052F6B0010; Mon, 8 Jun 2020 13:01:43 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C8D2F6B0022; Mon, 8 Jun 2020 13:01:43 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A44666B0023; Mon, 8 Jun 2020 13:01:43 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0216.hostedemail.com [216.40.44.216]) by kanga.kvack.org (Postfix) with ESMTP id 8C43B6B0010 for ; Mon, 8 Jun 2020 13:01:43 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 496A681621EB for ; Mon, 8 Jun 2020 17:01:43 +0000 (UTC) X-FDA: 76906661286.13.spot14_171192426dbb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id C079D181D376F for ; Mon, 8 Jun 2020 17:01:39 +0000 (UTC) X-Spam-Summary: 2,0,0,d331e8eb30f79996,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1540:1711:1714:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3351:3865:3866:3868:4321:5007:6261:9389:10004:11026:11658:11914:12043:12048:12297:12517:12519:12555:12895:12986:13069:13311:13357:13894:13972:14096:14181:14384:14394:14721:21080:21444:21451:21611:21627:30054,0,RBL:209.85.210.195:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:32,LUA_SUMMARY:none X-HE-Tag: spot14_171192426dbb X-Filterd-Recvd-Size: 3710 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Mon, 8 Jun 2020 17:01:37 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id s23so7484226pfh.7 for ; Mon, 08 Jun 2020 10:01:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=f88bEORI++QIsnJdKm2ca64CmhxGUzC3g+SSrwClYUg=; b=Iip9mnSqGe+y/x/PTSw4Yzyf02jS14GiRtfB93zU9k76EeSmJJ2NnaCr2MCks5zCNl nIyocd/u+cDM82BYi7qlyDQlu+RGf3V4KvDdWij+cmDpU11M5AZjZCrtGjzMekftaA0l NeW5nkg9TzXQccDKKOe3WXHs7b5I+0MSyoCeKiicy8wKR/wJqV+8tudj9v5qPUTaYTFo ltJ/V8HT5CjDVFQLIV3yeoKg9SHcogZ+Bkf4NHH3kvWVIXqUQvnOZuG0DrHjhK1FPV1G sjiDNY/88cYuiSU25sRiEVdf73jW1IWSk0e48hsD9ORmFgRhsVY2sALqNPpLBDGKme6M RUdQ== X-Gm-Message-State: AOAM53305J+gfEdwA0ic/6zezr7xjWls0OYki4VbNWXCq6AxrbyYqhqL mzg+pKQFX8Cn1s711GomAi4= X-Google-Smtp-Source: ABdhPJwmTcpNrr37VElfNpqehHSBvOJFHFFfHrHovcAKxDpg8BeyMYMNILKDT4r2RmVxa1x2U8XKog== X-Received: by 2002:a63:f14a:: with SMTP id o10mr21710759pgk.216.1591635696982; Mon, 08 Jun 2020 10:01:36 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id iq3sm103489pjb.6.2020.06.08.10.01.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:01:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 9045A41DD1; Mon, 8 Jun 2020 17:01:28 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, martin.petersen@oracle.com, jejb@linux.ibm.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Christoph Hellwig Subject: [PATCH v6 5/6] loop: be paranoid on exit and prevent new additions / removals Date: Mon, 8 Jun 2020 17:01:25 +0000 Message-Id: <20200608170127.20419-6-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200608170127.20419-1-mcgrof@kernel.org> References: <20200608170127.20419-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: C079D181D376F X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit. Reviewed-by: Ming Lei Reviewed-by: Christoph Hellwig Signed-off-by: Luis Chamberlain --- drivers/block/loop.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index c33bbbfd1bd9..d55e1b52f076 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2402,6 +2402,8 @@ static void __exit loop_exit(void) range = max_loop ? max_loop << part_shift : 1UL << MINORBITS; + mutex_lock(&loop_ctl_mutex); + idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); idr_destroy(&loop_index_idr); @@ -2409,6 +2411,8 @@ static void __exit loop_exit(void) unregister_blkdev(LOOP_MAJOR, "loop"); misc_deregister(&loop_misc); + + mutex_unlock(&loop_ctl_mutex); } module_init(loop_init); From patchwork Mon Jun 8 17:01:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11593703 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5CA3912 for ; Mon, 8 Jun 2020 17:01:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5415F2053B for ; Mon, 8 Jun 2020 17:01:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5415F2053B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C906F6B000C; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C3E8A6B0010; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AE1466B0022; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0033.hostedemail.com [216.40.44.33]) by kanga.kvack.org (Postfix) with ESMTP id 813486B000E for ; Mon, 8 Jun 2020 13:01:40 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 38312219837 for ; Mon, 8 Jun 2020 17:01:40 +0000 (UTC) X-FDA: 76906661160.09.pen73_1f155b826dbb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin09.hostedemail.com (Postfix) with ESMTP id 144741857396F for ; Mon, 8 Jun 2020 17:01:38 +0000 (UTC) X-Spam-Summary: 2,0,0,c59e92d54f04cdbd,d41d8cd98f00b204,mcgrof@gmail.com,,RULES_HIT:41:327:355:379:541:800:960:965:966:968:973:982:988:989:1260:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:1981:2194:2196:2198:2199:2200:2201:2393:2553:2559:2562:2693:2901:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3873:3874:4250:4321:4385:4390:4395:4470:4605:5007:6119:6238:6261:6742:7208:7875:7903:7904:8660:9040:9121:9163:9165:9389:9592:10004:10226:10954:11026:11233:11473:11658:11914:12043:12048:12291:12296:12297:12438:12517:12519:12555:12679:12683:12740:12895:12986:13141:13148:13149:13161:13221:13229:13230:13870:13894:13972:14394:14877:21063:21080:21444:21451:21611:21622:21740:21939:21987:21990:30003:30012:30029:30054:30056:30064:30070:30090,0,RBL:209.85.210.196:@gmail.com:.lbl8.mailshell.net-66.100.201.100 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: pen73_1f155b826dbb X-Filterd-Recvd-Size: 22650 Received: from mail-pf1-f196.google.com (mail-pf1-f196.google.com [209.85.210.196]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Mon, 8 Jun 2020 17:01:36 +0000 (UTC) Received: by mail-pf1-f196.google.com with SMTP id b16so8783469pfi.13 for ; Mon, 08 Jun 2020 10:01:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=BXDjJ7y4kAjAoIq3NLlxxshI3+WsFhQehQeoJndvSbs=; b=mi4jv3wQjLbd1OPn6a9FUZvRfEhIgQRnEDPkUUjidyOQ16N2l8RotXY6lpfIXu2Jw4 W+kG1+R6jZsNBTPnMb6O5NsaV3hQQ0t552X/rdr/iFkRR0Lta2wLUvEwJK4h5yajwYZa yH9Zs/rU+EOk0PlGNfgYpESyUbGH9cNII+tMcwI3UO9HHOSs0q2KifBf07tM8mZpEF6D OV+7PKEjyisFNf+C5aPPO7JWd4RvGpGUXkMN0QvhhJAo071lxHZ0cbXBRQeaDoBY0r1j bi2jG0a2TnCzXNHPwyebhmyCC/GxIH3qh8RqwE6MhDTkXeAjIGdVeuryI7kKYKLf1VxE eHew== X-Gm-Message-State: AOAM530cKREHBYvG6mTKALzffbCkE/dnSCa99qwm12zaBYY64NmCFKeE mQHLlOie6igdKQrj+I41c5I= X-Google-Smtp-Source: ABdhPJyeyDj9bMXdFW6Wd7IBC8nhDpsWhwG/SrlFg5KFMRn5bFXdjmvtvwKdt0w3nZVOHHQtnLodvQ== X-Received: by 2002:a65:52cd:: with SMTP id z13mr20816774pgp.259.1591635695752; Mon, 08 Jun 2020 10:01:35 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id j16sm7443498pfa.179.2020.06.08.10.01.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jun 2020 10:01:32 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id C260C422E5; Mon, 8 Jun 2020 17:01:28 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, martin.petersen@oracle.com, jejb@linux.ibm.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko , syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Subject: [PATCH v6 6/6] blktrace: fix debugfs use after free Date: Mon, 8 Jun 2020 17:01:26 +0000 Message-Id: <20200608170127.20419-7-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200608170127.20419-1-mcgrof@kernel.org> References: <20200608170127.20419-1-mcgrof@kernel.org> MIME-Version: 1.0 X-Rspamd-Queue-Id: 144741857396F X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for request-based drivers (multiqueue). This however left in place a possible crash, if you happen to abuse blktrace while racing to remove / add a device. We used to use asynchronous removal of the request_queue, and with that the issue was easier to reproduce. Now that we have reverted to synchronous removal of the request_queue, the issue is still possible to reproduce, its however just a bit more difficult. We essentially run two instances of break-blktrace which add/remove a loop device, and setup a blktrace and just never tear the blktrace down. We do this twice in parallel. This is easily reproduced with the script run_0004.sh from break-blktrace [0]. We can end up with two types of panics each reflecting where we race, one a failed blktrace setup: [ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present! [ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 252.436592] #PF: supervisor write access in kernel mode [ 252.439822] #PF: error_code(0x0002) - not-present page [ 252.442967] PGD 0 P4D 0 [ 252.444656] Oops: 0002 [#1] SMP NOPTI [ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 252.456343] RIP: 0010:down_write+0x15/0x40 [ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246 [ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000 [ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001 [ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000 [ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0 [ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000 [ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0 [ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 252.479866] Call Trace: [ 252.480322] simple_recursive_removal+0x4e/0x2e0 [ 252.481078] ? debugfs_remove+0x60/0x60 [ 252.481725] ? relay_destroy_buf+0x77/0xb0 [ 252.482662] debugfs_remove+0x40/0x60 [ 252.483518] blk_remove_buf_file_callback+0x5/0x10 [ 252.484328] relay_close_buf+0x2e/0x60 [ 252.484930] relay_open+0x1ce/0x2c0 [ 252.485520] do_blk_trace_setup+0x14f/0x2b0 [ 252.486187] __blk_trace_setup+0x54/0xb0 [ 252.486803] blk_trace_ioctl+0x90/0x140 [ 252.487423] ? do_sys_openat2+0x1ab/0x2d0 [ 252.488053] blkdev_ioctl+0x4d/0x260 [ 252.488636] block_ioctl+0x39/0x40 [ 252.489139] ksys_ioctl+0x87/0xc0 [ 252.489675] __x64_sys_ioctl+0x16/0x20 [ 252.490380] do_syscall_64+0x52/0x180 [ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9 And the other on the device removal: [ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present! [ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 128.619537] #PF: supervisor write access in kernel mode [ 128.622700] #PF: error_code(0x0002) - not-present page [ 128.625842] PGD 0 P4D 0 [ 128.627585] Oops: 0002 [#1] SMP NOPTI [ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 128.640471] RIP: 0010:down_write+0x15/0x40 [ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246 [ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000 [ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0 [ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000 [ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0 [ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000 [ 128.662204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0 [ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.667282] Call Trace: [ 128.667801] simple_recursive_removal+0x4e/0x2e0 [ 128.668663] ? debugfs_remove+0x60/0x60 [ 128.669368] debugfs_remove+0x40/0x60 [ 128.669985] blk_trace_free+0xd/0x50 [ 128.670593] __blk_trace_remove+0x27/0x40 [ 128.671274] blk_trace_shutdown+0x30/0x40 [ 128.671935] blk_release_queue+0x95/0xf0 [ 128.672589] kobject_put+0xa5/0x1b0 [ 128.673188] disk_release+0xa2/0xc0 [ 128.673786] device_release+0x28/0x80 [ 128.674376] kobject_put+0xa5/0x1b0 [ 128.674915] loop_remove+0x39/0x50 [loop] [ 128.675511] loop_control_ioctl+0x113/0x130 [loop] [ 128.676199] ksys_ioctl+0x87/0xc0 [ 128.676708] __x64_sys_ioctl+0x16/0x20 [ 128.677274] do_syscall_64+0x52/0x180 [ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The common theme here is: debugfs: Directory 'loop0' with parent 'block' already present This crash happens because of how blktrace uses the debugfs directory where it places its files. Upon init we always create the same directory which would be needed by blktrace but we only do this for make_request drivers (multiqueue) block drivers, but never for request-based block drivers. Furthermore, that directory is only created on init for the entire disk. This means that if you use blktrace on a partition, we'll always be creating a new directory regardless of whether or not you are doing blktrace on a make_request driver (multiqueue) or a request-based block drivers. These directory creations are only associated with a path, and so when a debugfs_remove() is called it removes everything in its way. A device removal will remove all blktrace files, and so if a blktrace is still present a cleanup of blktrace files later will end up trying to remove dentries pointing to NULL. We can fix the UAF by using a debugfs directory which moving forward will always be accessible if debugfs is enabled for both make_request drivers (multiqueue) and request-based block drivers, *and* for all partitions upon creation. This ensures that removal of the directories only happens on device removal and removes the race of the files underneath an active blktrace. We special-case a solution for scsi-generic as well which got blktrace support added by Christof via commit 6da127ad0918 ("blktrace: Add blktrace ioctls to SCSI generic devices") so upstream since v2.6.25. scsi-generic drives use a character device, however behind the scenes we have a scsi device with a request_queue. How this is used varies by class of driver (TYPE_DISK, TYPE_TAPE, etc). We simply create its a scsi-generic dedicated debugfs_dir and have blktrace use that when this interface is used. This goes tested with: o nvme partitions o ISCSI with tgt, and blktracing against scsi-generic with: o block o tape o cdrom o media changer o blktests This patch is part of the work which disputes the severity of CVE-2019-19770 which shows this issue is not a core debugfs issue, but a misuse of debugfs within blktace. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: "Martin K. Petersen" Cc: "James E.J. Bottomley" Cc: yu kuai Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Signed-off-by: Luis Chamberlain --- block/blk-core.c | 4 --- block/blk-mq-debugfs.c | 5 ---- block/blk-sysfs.c | 40 +++++++++++++++++++++++++++ block/blk.h | 2 -- block/partitions/core.c | 3 ++ drivers/scsi/sg.c | 3 ++ include/linux/blkdev.h | 4 ++- include/linux/blktrace_api.h | 1 - include/linux/genhd.h | 1 + kernel/trace/blktrace.c | 53 ++++++++++++++++++++++++++++-------- 10 files changed, 91 insertions(+), 25 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index a5126c0be777..fd850bebdf18 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -51,9 +51,7 @@ #include "blk-pm.h" #include "blk-rq-qos.h" -#ifdef CONFIG_DEBUG_FS struct dentry *blk_debugfs_root; -#endif EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); @@ -1937,9 +1935,7 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); -#ifdef CONFIG_DEBUG_FS blk_debugfs_root = debugfs_create_dir("block", NULL); -#endif return 0; } diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 15df3a36e9fa..a2800bc56fb4 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -824,9 +824,6 @@ void blk_mq_debugfs_register(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), - blk_debugfs_root); - debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs); /* @@ -857,9 +854,7 @@ void blk_mq_debugfs_register(struct request_queue *q) void blk_mq_debugfs_unregister(struct request_queue *q) { - debugfs_remove_recursive(q->debugfs_dir); q->sched_debugfs_dir = NULL; - q->debugfs_dir = NULL; } static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 561624d4cc4e..70168435f079 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "blk.h" #include "blk-mq.h" @@ -918,6 +919,9 @@ static void blk_release_queue(struct kobject *kobj) blk_trace_shutdown(q); + debugfs_remove_recursive(q->debugfs_dir); + if (IS_ENABLED(CONFIG_CHR_DEV_SG)) + debugfs_remove_recursive(q->sg_debugfs_dir); if (queue_is_mq(q)) blk_mq_debugfs_unregister(q); @@ -937,6 +941,20 @@ struct kobj_type blk_queue_ktype = { .release = blk_release_queue, }; +/** + * blk_sg_debugfs_init - initialize debugfs for scsi-generic + * @q: the associated queue + * @name: name of the scsi-generic device + * + * To be used by scsi-generic for allowing it to use blktrace. + */ +void blk_sg_debugfs_init(struct request_queue *q, const char *name) +{ + if (IS_ENABLED(CONFIG_CHR_DEV_SG)) + q->sg_debugfs_dir = debugfs_create_dir(name, blk_debugfs_root); +} +EXPORT_SYMBOL_GPL(blk_sg_debugfs_init); + /** * blk_register_queue - register a block layer queue with sysfs * @disk: Disk of which the request queue should be registered with sysfs. @@ -989,6 +1007,28 @@ int blk_register_queue(struct gendisk *disk) goto unlock; } + /* + * Blktrace needs a debugfs name even for queues that don't register + * a gendisk, so it lazily registers the debugfs directory. But that + * can get us into a situation where a SCSI device is found, with no + * driver for it (yet). Then blktrace is used on the device, creating + * the debugfs directory, and only after that a driver is loaded. In + * that case we might already have a debugfs directory registered here. + * Even worse we could be racing with blktrace to register it. + */ +#ifdef CONFIG_BLK_DEV_IO_TRACE + mutex_lock(&q->blk_trace_mutex); + if (!q->debugfs_dir) { + q->debugfs_dir = + debugfs_create_dir(kobject_name(q->kobj.parent), + blk_debugfs_root); + } + mutex_unlock(&q->blk_trace_mutex); +#else + q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), + blk_debugfs_root); +#endif + if (queue_is_mq(q)) { __blk_mq_register_dev(dev, q); blk_mq_debugfs_register(q); diff --git a/block/blk.h b/block/blk.h index b5d1f0fc6547..499308c6ab3b 100644 --- a/block/blk.h +++ b/block/blk.h @@ -14,9 +14,7 @@ /* Max future timer expiry for timeouts */ #define BLK_MAX_TIMEOUT (5 * HZ) -#ifdef CONFIG_DEBUG_FS extern struct dentry *blk_debugfs_root; -#endif struct blk_flush_queue { unsigned int flush_pending_idx:1; diff --git a/block/partitions/core.c b/block/partitions/core.c index 78951e33b2d7..8387128364f1 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "check.h" static int (*check_part[])(struct parsed_partitions *) = { @@ -322,6 +323,7 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part) get_device(disk_to_dev(part_to_disk(part))); rcu_assign_pointer(ptbl->part[part->partno], NULL); kobject_put(part->holder_dir); + debugfs_remove_recursive(part->debugfs_dir); device_del(part_to_dev(part)); /* @@ -444,6 +446,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, if (!p->holder_dir) goto out_del; + p->debugfs_dir = debugfs_create_dir(dev_name(pdev), blk_debugfs_root); dev_set_uevent_suppress(pdev, 0); if (flags & ADDPART_FLAG_WHOLEDISK) { err = device_create_file(pdev, &dev_attr_whole_disk); diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 20472aaaf630..5f6ccf4ba4d9 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1117,6 +1117,9 @@ sg_ioctl_common(struct file *filp, Sg_device *sdp, Sg_fd *sfp, return put_user(max_sectors_bytes(sdp->device->request_queue), ip); case BLKTRACESETUP: + if (!sdp->device->request_queue->sg_debugfs_dir) + blk_sg_debugfs_init(sdp->device->request_queue, + sdp->disk->disk_name); return blk_trace_setup(sdp->device->request_queue, sdp->disk->disk_name, MKDEV(SCSI_GENERIC_MAJOR, sdp->index), diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2462b78c1013..afc43c8923c5 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -574,8 +574,9 @@ struct request_queue { struct list_head tag_set_list; struct bio_set bio_split; -#ifdef CONFIG_BLK_DEBUG_FS struct dentry *debugfs_dir; + struct dentry *sg_debugfs_dir; +#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; #endif @@ -858,6 +859,7 @@ static inline void rq_flush_dcache_pages(struct request *rq) extern int blk_register_queue(struct gendisk *disk); extern void blk_unregister_queue(struct gendisk *disk); +extern void blk_sg_debugfs_init(struct request_queue *q, const char *name); extern blk_qc_t generic_make_request(struct bio *bio); extern blk_qc_t direct_make_request(struct bio *bio); extern void blk_rq_init(struct request_queue *q, struct request *rq); diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3b6ff5902edc..eb6db276e293 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -22,7 +22,6 @@ struct blk_trace { u64 end_lba; u32 pid; u32 dev; - struct dentry *dir; struct dentry *dropped_file; struct dentry *msg_file; struct list_head running_list; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 392aad5e29a2..902e50808bd9 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -76,6 +76,7 @@ struct hd_struct { int make_it_fail; #endif struct rcu_work rcu_work; + struct dentry *debugfs_dir; }; /** diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 7ff2ea5cd05e..4690d70e16a4 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -314,7 +314,6 @@ static void blk_trace_free(struct blk_trace *bt) debugfs_remove(bt->msg_file); debugfs_remove(bt->dropped_file); relay_close(bt->rchan); - debugfs_remove(bt->dir); free_percpu(bt->sequence); free_percpu(bt->msg_data); kfree(bt); @@ -488,9 +487,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!buts->buf_size || !buts->buf_nr) return -EINVAL; - if (!blk_debugfs_root) - return -ENOENT; - strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0'; @@ -511,6 +507,47 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, return -EBUSY; } + /* + * We have to use a partition directory if a partition is being worked + * on. The same request_queue is shared between all partitions. + */ + if (bdev && bdev != bdev->bd_contains) { + dir = bdev->bd_part->debugfs_dir; + } else if (IS_ENABLED(CONFIG_CHR_DEV_SG) && + MAJOR(dev) == SCSI_GENERIC_MAJOR) { + /* + * scsi-generic exposes the request_queue through the /dev/sg* + * interface but since that uses a different path than whatever + * the respective scsi driver device name may expose and use + * for the request_queue debugfs_dir. We have a dedicated + * dentry for scsi-generic then. + */ + dir = q->sg_debugfs_dir; + } else { + /* + * Drivers which do not use the *add_disk*() interfaces will + * not have their debugfs_dir created for them automatically, + * so we must create it for them. + */ + if (!q->debugfs_dir) { + q->debugfs_dir = + debugfs_create_dir(buts->name, + blk_debugfs_root); + } + dir = q->debugfs_dir; + } + + /* + * As blktrace relies on debugfs for its interface the debugfs directory + * is required, contrary to the usual mantra of not checking for debugfs + * files or directories. + */ + if (IS_ERR_OR_NULL(dir)) { + pr_warn("debugfs_dir not present for %s so skipping\n", + buts->name); + return -ENOENT; + } + bt = kzalloc(sizeof(*bt), GFP_KERNEL); if (!bt) return -ENOMEM; @@ -524,12 +561,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!bt->msg_data) goto err; - ret = -ENOENT; - - dir = debugfs_lookup(buts->name, blk_debugfs_root); - if (!dir) - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); - bt->dev = dev; atomic_set(&bt->dropped, 0); INIT_LIST_HEAD(&bt->running_list); @@ -565,8 +596,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = 0; err: - if (dir && !bt->dir) - dput(dir); if (ret) blk_trace_free(bt); return ret;