From patchwork Sat Apr 15 13:43:10 2017
X-Patchwork-Submitter: Cathy Avery
X-Patchwork-Id: 9682213
From: Cathy Avery
To: kys@microsoft.com, jejb@linux.vnet.ibm.com, martin.petersen@oracle.com
Cc: sthemmin@microsoft.com, haiyangz@microsoft.com,
	devel@linuxdriverproject.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org
Subject: [PATCH] scsi: storvsc: Allow only one remove lun work item to be issued per lun
Date: Sat, 15 Apr 2017 09:43:10 -0400
Message-Id: <1492263790-11378-1-git-send-email-cavery@redhat.com>
X-Mailing-List: linux-scsi@vger.kernel.org

When running multipath on a VM, if all available paths go down the
driver can schedule a large number of storvsc_remove_lun work items
against the same LUN. In response to the failing paths, storvsc
typically takes host->scan_mutex and issues a TUR per LUN. If there has
been heavy I/O to the failed device, all of the failed I/Os are returned
from the host, and a remove-LUN work item is issued for each failed I/O.

If the outstanding TURs do not complete in a timely manner, scan_mutex
is released too late or not at all. The many remove-LUN work items then
cannot complete, because scsi_remove_device() also tries to take
host->scan_mutex. This drags the VM down, in some cases completely.

This patch allows only one remove-LUN work item to be issued for a
particular LUN while it is an instantiated member of the SCSI stack.

Signed-off-by: Cathy Avery
---
 drivers/scsi/storvsc_drv.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 016639d..9dbb5bf 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -478,6 +478,10 @@ struct storvsc_device {
 	u64 port_name;
 };
 
+struct storvsc_dev_hostdata {
+	atomic_t req_remove_lun;
+};
+
 struct hv_host_device {
 	struct hv_device *dev;
 	unsigned int port;
@@ -918,6 +922,8 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
 				u8 asc, u8 ascq)
 {
 	struct storvsc_scan_work *wrk;
+	struct storvsc_dev_hostdata *hostdata;
+	struct scsi_device *sdev;
 	void (*process_err_fn)(struct work_struct *work);
 	bool do_work = false;
 
@@ -953,8 +959,17 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
 		}
 		break;
 	case SRB_STATUS_INVALID_LUN:
-		do_work = true;
-		process_err_fn = storvsc_remove_lun;
+		sdev = scsi_device_lookup(host, 0, vm_srb->target_id,
+					  vm_srb->lun);
+		if (sdev) {
+			hostdata = sdev->hostdata;
+			if (hostdata &&
+			    !atomic_cmpxchg(&hostdata->req_remove_lun, 0, 1)) {
+				do_work = true;
+				process_err_fn = storvsc_remove_lun;
+			}
+			scsi_device_put(sdev);
+		}
 		break;
 	case SRB_STATUS_ABORTED:
 		if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID &&
@@ -1426,9 +1441,22 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
 		sdevice->no_write_same = 0;
 	}
 
+	sdevice->hostdata = kzalloc(sizeof(struct storvsc_dev_hostdata),
+				    GFP_ATOMIC);
+	if (!sdevice->hostdata)
+		return -ENOMEM;
+
 	return 0;
 }
 
+static void storvsc_device_destroy(struct scsi_device *sdevice)
+{
+	if (sdevice->hostdata) {
+		kfree(sdevice->hostdata);
+		sdevice->hostdata = NULL;
+	}
+}
+
 static int storvsc_get_chs(struct scsi_device *sdev, struct block_device * bdev,
 			   sector_t capacity, int *info)
 {
@@ -1669,6 +1697,7 @@ static struct scsi_host_template scsi_driver = {
 	.eh_timed_out =		storvsc_eh_timed_out,
 	.slave_alloc =		storvsc_device_alloc,
 	.slave_configure =	storvsc_device_configure,
+	.slave_destroy =	storvsc_device_destroy,
 	.cmd_per_lun =		255,
 	.this_id =		-1,
 	.use_clustering =	ENABLE_CLUSTERING,
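
For readers who want to see the gating idea in isolation, here is a
minimal userspace sketch. It is not driver code and is only an
approximation of what the patch does with atomic_cmpxchg(): it uses
C11 <stdatomic.h> in place of the kernel's atomic_t, and the names
struct lun_state, claim_remove_lun() and simulate_failed_ios() are
hypothetical stand-ins invented for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct storvsc_dev_hostdata. */
struct lun_state {
	atomic_int req_remove_lun;	/* 0 = no removal queued, 1 = queued */
};

/*
 * Returns true only for the one caller that flips the flag from 0 to 1.
 * This plays the role of the !atomic_cmpxchg(&flag, 0, 1) test in the
 * patch; every later caller sees 1 and gets false.
 */
static bool claim_remove_lun(struct lun_state *st)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&st->req_remove_lun,
					      &expected, 1);
}

/*
 * Pretend that failed_ios completions all report an invalid LUN and
 * count how many of them would be allowed to queue a removal.
 */
static int simulate_failed_ios(int failed_ios)
{
	struct lun_state st;
	int queued = 0;
	int i;

	/* Zero-initialised, as kzalloc() would leave the real hostdata. */
	atomic_init(&st.req_remove_lun, 0);

	for (i = 0; i < failed_ios; i++) {
		if (claim_remove_lun(&st))
			queued++;
	}
	return queued;
}
```

Calling simulate_failed_ios(1000) in this sketch queues one removal
rather than a thousand. The flag is never cleared here, which matches
the patch as posted: the state lives in sdev->hostdata and is freed in
slave_destroy, so a LUN that is later re-added starts again from zero.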