From patchwork Tue Nov 14 17:33:50 2017
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 10057977
From: Bart Van Assche
To: "xjtuwjp@gmail.com"
CC: "jejb@linux.vnet.ibm.com", "linux-scsi@vger.kernel.org", "hare@suse.de", "martin.petersen@oracle.com", "snitzer@redhat.com"
Subject: Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
Date: Tue, 14 Nov 2017 17:33:50 +0000
Message-ID: <1510680828.2280.16.camel@sandisk.com>
References: <1479016028.17624.16.camel@linux.vnet.ibm.com> <9500e9b2-2d34-099d-aa90-d38fb3feb02e@sandisk.com>
X-Mailing-List: linux-scsi@vger.kernel.org

On Tue, 2017-11-14 at 18:01 +0100, Jack Wang wrote:
> I suspect we run into same bug you were trying to fix in this patch
> set. we're running in v4.4.50
>
> I was trying to reproduce it, but no lucky yet, do you still have your
> reproducer?

Hello Jack,

I can reproduce this about every fifth run of test one of the srp-test
software, using the SRP initiator and target drivers of what will become
kernel v4.15-rc1 and switching the ib_srpt driver from non-SRQ to SRQ
mode while the initiator is logging in. I'm currently analyzing where in
the block layer a queue run is missing. The patch below for the sd driver
does not fix the root cause but seems to help.

Bart.


Subject: [PATCH] Increase SCSI disk probing concurrency

---
 drivers/scsi/scsi.c        |  5 -----
 drivers/scsi/scsi_pm.c     |  6 ++++--
 drivers/scsi/scsi_priv.h   |  1 -
 drivers/scsi/sd.c          | 26 +++++++++++++++++++++-----
 drivers/scsi/sd.h          |  1 +
 include/scsi/scsi_driver.h |  1 +
 6 files changed, 27 insertions(+), 13 deletions(-)

-- 
2.15.0

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a7e4fba724b7..e6d69e647f6a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -85,10 +85,6 @@ unsigned int scsi_logging_level;
 EXPORT_SYMBOL(scsi_logging_level);
 #endif
 
-/* sd, scsi core and power management need to coordinate flushing async actions */
-ASYNC_DOMAIN(scsi_sd_probe_domain);
-EXPORT_SYMBOL(scsi_sd_probe_domain);
-
 /*
  * Separate domain (from scsi_sd_probe_domain) to maximize the benefit of
  * asynchronous system resume operations. It is marked 'exclusive' to avoid
@@ -839,7 +835,6 @@ static void __exit exit_scsi(void)
 	scsi_exit_devinfo();
 	scsi_exit_procfs();
 	scsi_exit_queue();
-	async_unregister_domain(&scsi_sd_probe_domain);
 }
 
 subsys_initcall(init_scsi);
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b44c1bb687a2..d8e43c2f4d40 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -171,9 +171,11 @@ static int scsi_bus_resume_common(struct device *dev,
 static int scsi_bus_prepare(struct device *dev)
 {
 	if (scsi_is_sdev_device(dev)) {
-		/* sd probing uses async_schedule. Wait until it finishes. */
-		async_synchronize_full_domain(&scsi_sd_probe_domain);
+		struct scsi_driver *drv = to_scsi_driver(dev->driver);
 
+		/* sd probing happens asynchronously. Wait until it finishes. */
+		if (drv->sync)
+			drv->sync(dev);
 	} else if (scsi_is_host_device(dev)) {
 		/* Wait until async scanning is finished */
 		scsi_complete_async_scans();
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index dab29f538612..bf0cadf6a321 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -174,7 +174,6 @@ static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM */
 
 extern struct async_domain scsi_sd_pm_domain;
-extern struct async_domain scsi_sd_probe_domain;
 
 /* scsi_dh.c */
 #ifdef CONFIG_SCSI_DH
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0313486d85c8..c26dbb38b60c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -112,6 +112,7 @@ static void sd_shutdown(struct device *);
 static int sd_suspend_system(struct device *);
 static int sd_suspend_runtime(struct device *);
 static int sd_resume(struct device *);
+static void sd_sync_probe_domain(struct device *dev);
 static void sd_rescan(struct device *);
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
@@ -564,6 +565,7 @@ static struct scsi_driver sd_template = {
 		.shutdown = sd_shutdown,
 		.pm = &sd_pm_ops,
 	},
+	.sync = sd_sync_probe_domain,
 	.rescan = sd_rescan,
 	.init_command = sd_init_command,
 	.uninit_command = sd_uninit_command,
@@ -3221,9 +3223,9 @@ static int sd_format_disk_name(char *prefix, int index, char *buf, int buflen)
 /*
  * The asynchronous part of sd_probe
  */
-static void sd_probe_async(void *data, async_cookie_t cookie)
+static void sd_probe_async(struct work_struct *work)
 {
-	struct scsi_disk *sdkp = data;
+	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), probe_work);
 	struct scsi_device *sdp;
 	struct gendisk *gd;
 	u32 index;
@@ -3326,6 +3328,8 @@ static int sd_probe(struct device *dev)
 	if (!sdkp)
 		goto out;
 
+	INIT_WORK(&sdkp->probe_work, sd_probe_async);
+
 	gd = alloc_disk(SD_MINORS);
 	if (!gd)
 		goto out_free;
@@ -3377,8 +3381,8 @@ static int sd_probe(struct device *dev)
 	get_device(dev);
 	dev_set_drvdata(dev, sdkp);
 
-	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+	get_device(&sdkp->dev);	/* prevent release before sd_probe_async() */
+	WARN_ON_ONCE(!queue_work(system_unbound_wq, &sdkp->probe_work));
 
 	return 0;
 
@@ -3395,6 +3399,18 @@ static int sd_probe(struct device *dev)
 	return error;
 }
 
+static void sd_wait_for_probing(struct scsi_disk *sdkp)
+{
+	flush_work(&sdkp->probe_work);
+}
+
+static void sd_sync_probe_domain(struct device *dev)
+{
+	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+
+	sd_wait_for_probing(sdkp);
+}
+
 /**
  *	sd_remove - called whenever a scsi disk (previously recognized by
  *	sd_probe) is detached from the system. It is called (potentially
@@ -3416,7 +3432,7 @@ static int sd_remove(struct device *dev)
 	scsi_autopm_get_device(sdkp->device);
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
-	async_synchronize_full_domain(&scsi_sd_probe_domain);
+	sd_wait_for_probing(sdkp);
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 7b57dafcd45a..2cc47183c9aa 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -81,6 +81,7 @@ struct scsi_disk {
 	unsigned int zones_optimal_nonseq;
 	unsigned int zones_max_open;
 #endif
+	struct work_struct probe_work;
 	atomic_t openers;
 	sector_t capacity;	/* size in logical blocks */
 	u32 max_xfer_blocks;
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index a5534ccad859..145d6239eecf 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -11,6 +11,7 @@ struct scsi_device;
 struct scsi_driver {
 	struct device_driver gendrv;
 
+	void (*sync)(struct device *);
 	void (*rescan)(struct device *);
 	int (*init_command)(struct scsi_cmnd *);
 	void (*uninit_command)(struct scsi_cmnd *);
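
For readers who want to see the idea of the patch in isolation: the
change replaces a wait on the shared scsi_sd_probe_domain with a wait on
one work item embedded in each struct scsi_disk. The following is a
minimal single-threaded userspace sketch of that pattern, not kernel
code. struct work, struct disk, init_work(), queue_work_sketch() and
flush_work_sketch() are hypothetical stand-ins invented for this
illustration; only the container_of() idea and the "flush one item
instead of the whole domain" behaviour are taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(): recover the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of struct work_struct. */
struct work {
	void (*func)(struct work *work);
	int pending;
};

/* Hypothetical miniature of struct scsi_disk with an embedded work item,
 * mirroring the probe_work member the patch adds. */
struct disk {
	int index;
	int probed;
	struct work probe_work;
};

/* The callback only receives the work pointer, as sd_probe_async() does
 * after the patch, and derives the disk from it. */
static void disk_probe_async(struct work *work)
{
	struct disk *d = container_of(work, struct disk, probe_work);

	d->probed = 1;
}

static void init_work(struct work *w, void (*func)(struct work *))
{
	w->func = func;
	w->pending = 0;
}

/* Returns 1 if the item was newly queued, 0 if it was already pending. */
static int queue_work_sketch(struct work *w)
{
	if (w->pending)
		return 0;
	w->pending = 1;
	return 1;
}

/* Synchronous model of flushing a single item: run it if it is pending.
 * Other queued items are left untouched. Returns 1 if work was run. */
static int flush_work_sketch(struct work *w)
{
	if (!w->pending)
		return 0;
	w->func(w);
	w->pending = 0;
	return 1;
}
```

With two such disks queued, flushing the first one's probe_work leaves
the second one's work pending. That is the difference from the removed
async_synchronize_full_domain(&scsi_sd_probe_domain) call, which waits
for every probe in the domain.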