From patchwork Thu Feb 22 02:23:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Damien Le Moal
X-Patchwork-Id: 10234557
From: Damien Le Moal
To: "jejb@linux.vnet.ibm.com", Bart Van Assche, "martin.petersen@oracle.com"
CC: "linux-scsi@vger.kernel.org", "hare@suse.com", "jthumshirn@suse.de", "ptikhomirov@virtuozzo.com", "ncopa@alpinelinux.org", "stable@vger.kernel.org"
Subject: Re: [PATCH] Avoid that ATA error handling hangs
Date: Thu, 22 Feb 2018 02:23:25 +0000
Message-ID: <1519266202.16203.5.camel@wdc.com>
References: <20180221172316.11884-1-bart.vanassche@wdc.com>
In-Reply-To: <20180221172316.11884-1-bart.vanassche@wdc.com>
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-scsi@vger.kernel.org

Bart,

On Wed, 2018-02-21 at 09:23 -0800, Bart Van Assche wrote:
> Avoid that the recently introduced call_rcu() call in the SCSI core
> causes the RCU core to complain about double call_rcu() calls.
>
> Reported-by: Natanael Copa
> Reported-by: Damien Le Moal
> References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
> Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets
> woken up")
> Signed-off-by: Bart Van Assche
> Cc: Natanael Copa
> Cc: Damien Le Moal
> Cc: Pavel Tikhomirov
> Cc: Hannes Reinecke
> Cc: Johannes Thumshirn
> Cc:
> ---
>  drivers/scsi/scsi_error.c | 5 +++--
>  include/scsi/scsi_cmnd.h  | 3 +++
>  include/scsi/scsi_host.h  | 2 --
>  3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index ae325985eac1..ac9ce099530e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -229,7 +229,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
>
>  static void scsi_eh_inc_host_failed(struct rcu_head *head)
>  {
> -	struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu);
> +	struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);
> +	struct Scsi_Host *shost = scmd->device->host;
>  	unsigned long flags;
>
>  	spin_lock_irqsave(shost->host_lock, flags);
> @@ -265,7 +266,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>  	 * Ensure that all tasks observe the host state change before the
>  	 * host_failed change.
>  	 */
> -	call_rcu(&shost->rcu, scsi_eh_inc_host_failed);
> +	call_rcu(&scmd->rcu, scsi_eh_inc_host_failed);
>  }
>
>  /**
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index d8d4a902a88d..2280b2351739 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -68,6 +68,9 @@ struct scsi_cmnd {
>  	struct list_head list;  /* scsi_cmnd participates in queue lists */
>  	struct list_head eh_entry; /* entry for the host eh_cmd_q */
>  	struct delayed_work abort_work;
> +
> +	struct rcu_head rcu;
> +
>  	int eh_eflags;	/* Used by error handlr */
>
>  	/*
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 1a1df0d21ee3..a8b7bf879ced 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -571,8 +571,6 @@ struct Scsi_Host {
>  		struct blk_mq_tag_set	tag_set;
>  	};
>
> -	struct rcu_head rcu;
> -
>  	atomic_t host_busy;		   /* commands actually active
> on low-level */
>  	atomic_t host_blocked;

This does not compile. You missed the init_rcu_head() and
destroy_rcu_head() changes: hosts.c still references shost->rcu, which
this patch removes from struct Scsi_Host. With the changes in the diff
at the end of this message added, it compiles.

Testing this, the RCU hang is now gone. However, the behavior of the
error recovery is still different from what I see in 4.15 and 4.14. For
my test case, an unaligned write to a sequential zone on a ZAC drive
connected to an AHCI port, the report zones command issued during the
disk revalidation after the write error fails with a timeout, which
causes the capacity to change to 0, followed by a port reset and another
recovery. Eventually, everything comes back up OK, but it takes some
time. I am investigating to make sure I am not hitting a device firmware
bug before concluding that this is a kernel problem.

Best regards.
-- 
Damien Le Moal
Western Digital

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 57bf43e34863..dd9464920456 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -328,8 +328,6 @@ static void scsi_host_dev_release(struct device *dev)
 	if (shost->work_q)
 		destroy_workqueue(shost->work_q);
 
-	destroy_rcu_head(&shost->rcu);
-
 	if (shost->shost_state == SHOST_CREATED) {
 		/*
 		 * Free the shost_dev device name here if scsi_host_alloc()
@@ -404,7 +402,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
 	mutex_init(&shost->scan_mutex);
-	init_rcu_head(&shost->rcu);
 
 	index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
 	if (index < 0)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9ca7d1c..488e5c9acedf 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -590,6 +590,8 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 		if (drv->uninit_command)
 			drv->uninit_command(cmd);
 	}
+
+	destroy_rcu_head(&cmd->rcu);
 }
 
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
@@ -1153,6 +1155,7 @@ static void scsi_initialize_rq(struct request *rq)
 	scsi_req_init(&cmd->req);
 	cmd->jiffies_at_alloc = jiffies;
 	cmd->retries = 0;
+	init_rcu_head(&cmd->rcu);
 }
 
 /* Add a command to the list used by the aacraid and dpt_i2o drivers */
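
For readers following the thread, here is a rough userspace sketch of why
moving the callback head from the host into each command avoids the double
queueing. It is only a toy model and not kernel code: struct cb_head,
cb_queue(), cb_run_all(), struct host and struct cmd are hypothetical
stand-ins for struct rcu_head, call_rcu(), grace-period callback
processing, struct Scsi_Host and struct scsi_cmnd. The real call_rcu()
returns void; the -1 return here merely stands in for the complaint from
the RCU core.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct rcu_head (assumption: not the kernel layout). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
	int pending;			/* set while the head is queued */
};

/* Simplified container_of(): the kernel version also type-checks ptr. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct host {
	int host_failed;
};

struct cmd {
	struct host *host;
	struct cb_head cb;		/* one callback head per command */
};

static struct cb_head *cb_list;

/* Queue a deferred callback; refuse a head that is already queued. */
static int cb_queue(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(func != NULL);
	if (head->pending)
		return -1;		/* the "double call_rcu()" case */
	head->pending = 1;
	head->func = func;
	head->next = cb_list;
	cb_list = head;
	return 0;
}

/* Run every queued callback, as would happen after a grace period. */
static void cb_run_all(void)
{
	struct cb_head *head = cb_list;

	cb_list = NULL;
	while (head) {
		struct cb_head *next = head->next;

		head->pending = 0;
		head->func(head);
		head = next;
	}
}

/* Callback: recover the command from its embedded head, then its host. */
static void inc_host_failed(struct cb_head *head)
{
	struct cmd *c = container_of(head, struct cmd, cb);

	c->host->host_failed++;
}
```

With one head per command, two commands failing on the same host can both
be queued before any callback runs, and each callback finds its host
through the command. With a single head in the host, the second failing
command would try to queue a head that is still pending, which is the
situation the quoted patch is meant to avoid.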