From patchwork Wed Sep 28 20:01:32 2016
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 9354853
From: Josef Bacik
Subject: [PATCH][V3] nbd: add multi-connection support
Date: Wed, 28 Sep 2016 16:01:32 -0400
Message-ID: <1475092892-8230-1-git-send-email-jbacik@fb.com>
List-ID: linux-block@vger.kernel.org
NBD can become contended on its single connection.  We have to serialize all
writes and we can only process one read response at a time.  Fix this by
allowing userspace to provide multiple connections to a single nbd device.
This coupled with block-mq drastically increases performance in multi-process
cases.  Thanks,

Signed-off-by: Josef Bacik
---
V2->V3:
-Fixed a problem with the tag used for the requests.
-Rebased onto the patch that enables async submit.

V1->V2:
-Dropped the index from nbd_cmd and just used the hctx->queue_num as HCH
 suggested.
-Added the pid attribute back to the /sys/block/nbd*/ directory for the
 recv pid.
-Reworked the disconnect to simply send the command on all connections
 instead of sending a special command through the block layer.
-Fixed some of the disconnect handling to be less verbose when we
 specifically request a disconnect.
 drivers/block/nbd.c | 358 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 217 insertions(+), 141 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index ccfcfc1..30f4f58 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -41,26 +41,36 @@
 #include

+struct nbd_sock {
+        struct socket *sock;
+        struct mutex tx_lock;
+};
+
 #define NBD_TIMEDOUT                    0
 #define NBD_DISCONNECT_REQUESTED        1
+#define NBD_DISCONNECTED                2
+#define NBD_RUNNING                     3

 struct nbd_device {
         u32 flags;
         unsigned long runtime_flags;
-        struct socket * sock;   /* If == NULL, device is not ready, yet */
+        struct nbd_sock **socks;
         int magic;

         struct blk_mq_tag_set tag_set;

-        struct mutex tx_lock;
+        struct mutex config_lock;
         struct gendisk *disk;
+        int num_connections;
+        atomic_t recv_threads;
+        wait_queue_head_t recv_wq;
         int blksize;
         loff_t bytesize;

         /* protects initialization and shutdown of the socket */
         spinlock_t sock_lock;
         struct task_struct *task_recv;
-        struct task_struct *task_send;
+        struct task_struct *task_setup;

 #if IS_ENABLED(CONFIG_DEBUG_FS)
         struct dentry *dbg_dir;
@@ -69,7 +79,6 @@ struct nbd_device {

 struct nbd_cmd {
         struct nbd_device *nbd;
-        struct list_head list;
 };

 #if IS_ENABLED(CONFIG_DEBUG_FS)
@@ -159,22 +168,20 @@ static void nbd_end_request(struct nbd_cmd *cmd)
  */
 static void sock_shutdown(struct nbd_device *nbd)
 {
-        struct socket *sock;
-
-        spin_lock(&nbd->sock_lock);
+        int i;

-        if (!nbd->sock) {
-                spin_unlock_irq(&nbd->sock_lock);
+        if (nbd->num_connections == 0)
+                return;
+        if (test_and_set_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
                 return;
-        }
-
-        sock = nbd->sock;
-        dev_warn(disk_to_dev(nbd->disk), "shutting down socket\n");
-        nbd->sock = NULL;
-        spin_unlock(&nbd->sock_lock);

-        kernel_sock_shutdown(sock, SHUT_RDWR);
-        sockfd_put(sock);
+        for (i = 0; i < nbd->num_connections; i++) {
+                struct nbd_sock *nsock = nbd->socks[i];
+                mutex_lock(&nsock->tx_lock);
+                kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+                mutex_unlock(&nsock->tx_lock);
+        }
+        dev_warn(disk_to_dev(nbd->disk), "shutting down sockets\n");
 }

 static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
@@ -182,35 +189,31 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 {
         struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
         struct nbd_device *nbd = cmd->nbd;
-        struct socket *sock = NULL;
-
-        spin_lock(&nbd->sock_lock);

+        dev_err(nbd_to_dev(nbd), "Connection timed out, shutting down connection\n");
         set_bit(NBD_TIMEDOUT, &nbd->runtime_flags);
-
-        if (nbd->sock) {
-                sock = nbd->sock;
-                get_file(sock->file);
-        }
-
-        spin_unlock(&nbd->sock_lock);
-        if (sock) {
-                kernel_sock_shutdown(sock, SHUT_RDWR);
-                sockfd_put(sock);
-        }
-
         req->errors++;
-        dev_err(nbd_to_dev(nbd), "Connection timed out, shutting down connection\n");
+
+        /*
+         * If our disconnect packet times out then we're already holding the
+         * config_lock and could deadlock here, so just set an error and
+         * return, we'll handle shutting everything down later.
+         */
+        if (req->cmd_type == REQ_TYPE_DRV_PRIV)
+                return BLK_EH_HANDLED;
+        mutex_lock(&nbd->config_lock);
+        sock_shutdown(nbd);
+        mutex_unlock(&nbd->config_lock);
         return BLK_EH_HANDLED;
 }

 /*
  * Send or receive packet.
  */
-static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
-                     int msg_flags)
+static int sock_xmit(struct nbd_device *nbd, int index, int send, void *buf,
+                     int size, int msg_flags)
 {
-        struct socket *sock = nbd->sock;
+        struct socket *sock = nbd->socks[index]->sock;
         int result;
         struct msghdr msg;
         struct kvec iov;
@@ -254,29 +257,28 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
         return result;
 }

-static inline int sock_send_bvec(struct nbd_device *nbd, struct bio_vec *bvec,
-                                 int flags)
+static inline int sock_send_bvec(struct nbd_device *nbd, int index,
+                                 struct bio_vec *bvec, int flags)
 {
         int result;
         void *kaddr = kmap(bvec->bv_page);
-        result = sock_xmit(nbd, 1, kaddr + bvec->bv_offset,
+        result = sock_xmit(nbd, index, 1, kaddr + bvec->bv_offset,
                            bvec->bv_len, flags);
         kunmap(bvec->bv_page);
         return result;
 }

 /* always call with the tx_lock held */
-static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
+static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 {
         struct request *req = blk_mq_rq_from_pdu(cmd);
         int result, flags;
         struct nbd_request request;
         unsigned long size = blk_rq_bytes(req);
         u32 type;
+        u32 tag = blk_mq_unique_tag(req);

-        if (req->cmd_type == REQ_TYPE_DRV_PRIV)
-                type = NBD_CMD_DISC;
-        else if (req_op(req) == REQ_OP_DISCARD)
+        if (req_op(req) == REQ_OP_DISCARD)
                 type = NBD_CMD_TRIM;
         else if (req_op(req) == REQ_OP_FLUSH)
                 type = NBD_CMD_FLUSH;
@@ -288,16 +290,16 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
         memset(&request, 0, sizeof(request));
         request.magic = htonl(NBD_REQUEST_MAGIC);
         request.type = htonl(type);
-        if (type != NBD_CMD_FLUSH && type != NBD_CMD_DISC) {
+        if (type != NBD_CMD_FLUSH) {
                 request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
                 request.len = htonl(size);
         }
-        memcpy(request.handle, &req->tag, sizeof(req->tag));
+        memcpy(request.handle, &tag, sizeof(tag));

         dev_dbg(nbd_to_dev(nbd), "request %p: sending control (%s@%llu,%uB)\n",
                 cmd, nbdcmd_to_ascii(type),
                 (unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req));
-        result = sock_xmit(nbd, 1, &request, sizeof(request),
+        result = sock_xmit(nbd, index, 1, &request, sizeof(request),
                         (type == NBD_CMD_WRITE) ? MSG_MORE : 0);
         if (result <= 0) {
                 dev_err(disk_to_dev(nbd->disk),
@@ -318,7 +320,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
                 flags = MSG_MORE;
                 dev_dbg(nbd_to_dev(nbd), "request %p: sending %d bytes data\n",
                         cmd, bvec.bv_len);
-                result = sock_send_bvec(nbd, &bvec, flags);
+                result = sock_send_bvec(nbd, index, &bvec, flags);
                 if (result <= 0) {
                         dev_err(disk_to_dev(nbd->disk),
                                 "Send data failed (result %d)\n",
@@ -330,31 +332,34 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
         return 0;
 }

-static inline int sock_recv_bvec(struct nbd_device *nbd, struct bio_vec *bvec)
+static inline int sock_recv_bvec(struct nbd_device *nbd, int index,
+                                 struct bio_vec *bvec)
 {
         int result;
         void *kaddr = kmap(bvec->bv_page);
-        result = sock_xmit(nbd, 0, kaddr + bvec->bv_offset, bvec->bv_len,
-                           MSG_WAITALL);
+        result = sock_xmit(nbd, index, 0, kaddr + bvec->bv_offset,
+                           bvec->bv_len, MSG_WAITALL);
         kunmap(bvec->bv_page);
         return result;
 }

 /* NULL returned = something went wrong, inform userspace */
-static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
+static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 {
         int result;
         struct nbd_reply reply;
         struct nbd_cmd *cmd;
         struct request *req = NULL;
         u16 hwq;
-        int tag;
+        u32 tag;

         reply.magic = 0;
-        result = sock_xmit(nbd, 0, &reply, sizeof(reply), MSG_WAITALL);
+        result = sock_xmit(nbd, index, 0, &reply, sizeof(reply), MSG_WAITALL);
         if (result <= 0) {
-                dev_err(disk_to_dev(nbd->disk),
-                        "Receive control failed (result %d)\n", result);
+                if (!test_bit(NBD_DISCONNECTED, &nbd->runtime_flags) &&
+                    !test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
+                        dev_err(disk_to_dev(nbd->disk),
+                                "Receive control failed (result %d)\n", result);
                 return ERR_PTR(result);
         }

@@ -364,7 +369,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
                 return ERR_PTR(-EPROTO);
         }

-        memcpy(&tag, reply.handle, sizeof(int));
+        memcpy(&tag, reply.handle, sizeof(u32));

         hwq = blk_mq_unique_tag_to_hwq(tag);
         if (hwq < nbd->tag_set.nr_hw_queues)
@@ -390,7 +395,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
                 struct bio_vec bvec;

                 rq_for_each_segment(bvec, req, iter) {
-                        result = sock_recv_bvec(nbd, &bvec);
+                        result = sock_recv_bvec(nbd, index, &bvec);
                         if (result <= 0) {
                                 dev_err(disk_to_dev(nbd->disk),
                                         "Receive data failed (result %d)\n",
                                         result);
@@ -418,25 +423,24 @@ static struct device_attribute pid_attr = {
         .show = pid_show,
 };

-static int nbd_thread_recv(struct nbd_device *nbd, struct block_device *bdev)
+struct recv_thread_args {
+        struct work_struct work;
+        struct nbd_device *nbd;
+        int index;
+};
+
+static void recv_work(struct work_struct *work)
 {
+        struct recv_thread_args *args = container_of(work,
+                                                     struct recv_thread_args,
+                                                     work);
+        struct nbd_device *nbd = args->nbd;
         struct nbd_cmd *cmd;
-        int ret;
+        int ret = 0;

         BUG_ON(nbd->magic != NBD_MAGIC);
-
-        sk_set_memalloc(nbd->sock->sk);
-
-        ret = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
-        if (ret) {
-                dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
-                return ret;
-        }
-
-        nbd_size_update(nbd, bdev);
-
         while (1) {
-                cmd = nbd_read_stat(nbd);
+                cmd = nbd_read_stat(nbd, args->index);
                 if (IS_ERR(cmd)) {
                         ret = PTR_ERR(cmd);
                         break;
@@ -445,10 +449,14 @@ static int nbd_thread_recv(struct nbd_device *nbd, struct block_device *bdev)
                 nbd_end_request(cmd);
         }

-        nbd_size_clear(nbd, bdev);
-
-        device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
-        return ret;
+        /*
+         * We got an error, shut everybody down if this wasn't the result of a
+         * disconnect request.
+         */
+        if (ret && !test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
+                sock_shutdown(nbd);
+        atomic_dec(&nbd->recv_threads);
+        wake_up(&nbd->recv_wq);
 }

 static void nbd_clear_req(struct request *req, void *data, bool reserved)
@@ -466,26 +474,35 @@ static void nbd_clear_que(struct nbd_device *nbd)
 {
         BUG_ON(nbd->magic != NBD_MAGIC);

-        /*
-         * Because we have set nbd->sock to NULL under the tx_lock, all
-         * modifications to the list must have completed by now.
-         */
-        BUG_ON(nbd->sock);
-
         blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
         dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
 }

-static void nbd_handle_cmd(struct nbd_cmd *cmd)
+static void nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 {
         struct request *req = blk_mq_rq_from_pdu(cmd);
         struct nbd_device *nbd = cmd->nbd;
+        struct nbd_sock *nsock;

-        if (req->cmd_type != REQ_TYPE_FS)
+        if (index >= nbd->num_connections) {
+                dev_err(disk_to_dev(nbd->disk),
+                        "Attempted send on invalid socket\n");
                 goto error_out;
+        }
+
+        if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags)) {
+                dev_err(disk_to_dev(nbd->disk),
+                        "Attempted send on closed socket\n");
+                goto error_out;
+        }

-        if (rq_data_dir(req) == WRITE &&
+        if (req->cmd_type != REQ_TYPE_FS &&
+            req->cmd_type != REQ_TYPE_DRV_PRIV)
+                goto error_out;
+
+        if (req->cmd_type == REQ_TYPE_FS &&
+            rq_data_dir(req) == WRITE &&
             (nbd->flags & NBD_FLAG_READ_ONLY)) {
                 dev_err(disk_to_dev(nbd->disk),
                         "Write on read-only\n");
@@ -494,23 +511,22 @@ static void nbd_handle_cmd(struct nbd_cmd *cmd)

         req->errors = 0;

-        mutex_lock(&nbd->tx_lock);
-        nbd->task_send = current;
-        if (unlikely(!nbd->sock)) {
-                mutex_unlock(&nbd->tx_lock);
+        nsock = nbd->socks[index];
+        mutex_lock(&nsock->tx_lock);
+        if (unlikely(!nsock->sock)) {
+                mutex_unlock(&nsock->tx_lock);
                 dev_err(disk_to_dev(nbd->disk),
                         "Attempted send on closed socket\n");
                 goto error_out;
         }

-        if (nbd_send_cmd(nbd, cmd) != 0) {
+        if (nbd_send_cmd(nbd, cmd, index) != 0) {
                 dev_err(disk_to_dev(nbd->disk), "Request send failed\n");
                 req->errors++;
                 nbd_end_request(cmd);
         }

-        nbd->task_send = NULL;
-        mutex_unlock(&nbd->tx_lock);
+        mutex_unlock(&nsock->tx_lock);

         return;

@@ -525,38 +541,57 @@ static int nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
         struct nbd_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);

         blk_mq_start_request(bd->rq);
-        nbd_handle_cmd(cmd);
+        nbd_handle_cmd(cmd, hctx->queue_num);
         return BLK_MQ_RQ_QUEUE_OK;
 }

-static int nbd_set_socket(struct nbd_device *nbd, struct socket *sock)
+static int nbd_add_socket(struct nbd_device *nbd, struct socket *sock)
 {
-        int ret = 0;
+        struct nbd_sock **socks;
+        struct nbd_sock *nsock;

-        spin_lock_irq(&nbd->sock_lock);
-
-        if (nbd->sock) {
-                ret = -EBUSY;
-                goto out;
+        if (!nbd->task_setup)
+                nbd->task_setup = current;
+        if (nbd->task_setup != current) {
+                dev_err(disk_to_dev(nbd->disk),
+                        "Device being setup by another task");
+                return -EINVAL;
         }

-        nbd->sock = sock;
+        socks = krealloc(nbd->socks, (nbd->num_connections + 1) *
+                         sizeof(struct nbd_sock *), GFP_KERNEL);
+        if (!socks)
+                return -ENOMEM;
+        nsock = kzalloc(sizeof(struct nbd_sock), GFP_KERNEL);
+        if (!nsock)
+                return -ENOMEM;
+
+        nbd->socks = socks;

-out:
-        spin_unlock_irq(&nbd->sock_lock);
+        mutex_init(&nsock->tx_lock);
+        nsock->sock = sock;
+        socks[nbd->num_connections++] = nsock;

-        return ret;
+        return 0;
 }

 /* Reset all properties of an NBD device */
 static void nbd_reset(struct nbd_device *nbd)
 {
+        int i;
+
+        for (i = 0; i < nbd->num_connections; i++)
+                kfree(nbd->socks[i]);
+        kfree(nbd->socks);
+        nbd->socks = NULL;
         nbd->runtime_flags = 0;
         nbd->blksize = 1024;
         nbd->bytesize = 0;
         set_capacity(nbd->disk, 0);
         nbd->flags = 0;
         nbd->tag_set.timeout = 0;
+        nbd->num_connections = 0;
+        nbd->task_setup = NULL;
         queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 }

@@ -582,48 +617,67 @@ static void nbd_parse_flags(struct nbd_device *nbd, struct block_device *bdev)
                 blk_queue_write_cache(nbd->disk->queue, false, false);
 }

+static void send_disconnects(struct nbd_device *nbd)
+{
+        struct nbd_request request = {};
+        int i, ret;
+
+        request.magic = htonl(NBD_REQUEST_MAGIC);
+        request.type = htonl(NBD_CMD_DISC);
+
+        for (i = 0; i < nbd->num_connections; i++) {
+                ret = sock_xmit(nbd, i, 1, &request, sizeof(request), 0);
+                if (ret <= 0)
+                        dev_err(disk_to_dev(nbd->disk),
+                                "Send disconnect failed %d\n", ret);
+        }
+}
+
 static int nbd_dev_dbg_init(struct nbd_device *nbd);
 static void nbd_dev_dbg_close(struct nbd_device *nbd);

-/* Must be called with tx_lock held */
-
+/* Must be called with config_lock held */
 static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
                        unsigned int cmd, unsigned long arg)
 {
         switch (cmd) {
         case NBD_DISCONNECT: {
-                struct request *sreq;
-
                 dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
-                if (!nbd->sock)
+                if (!nbd->socks)
                         return -EINVAL;

-                sreq = blk_mq_alloc_request(bdev_get_queue(bdev), WRITE, 0);
-                if (!sreq)
-                        return -ENOMEM;
-
-                mutex_unlock(&nbd->tx_lock);
+                mutex_unlock(&nbd->config_lock);
                 fsync_bdev(bdev);
-                mutex_lock(&nbd->tx_lock);
-                sreq->cmd_type = REQ_TYPE_DRV_PRIV;
+                mutex_lock(&nbd->config_lock);

                 /* Check again after getting mutex back.  */
-                if (!nbd->sock) {
-                        blk_mq_free_request(sreq);
+                if (!nbd->socks)
                         return -EINVAL;
-                }
-
-                set_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags);
-                nbd_send_cmd(nbd, blk_mq_rq_to_pdu(sreq));
-                blk_mq_free_request(sreq);
+
+                if (!test_and_set_bit(NBD_DISCONNECT_REQUESTED,
+                                      &nbd->runtime_flags))
+                        send_disconnects(nbd);
                 return 0;
         }
-
+
         case NBD_CLEAR_SOCK:
                 sock_shutdown(nbd);
                 nbd_clear_que(nbd);
                 kill_bdev(bdev);
+                nbd_bdev_reset(bdev);
+                /*
+                 * We want to give the run thread a chance to wait for everybody
+                 * to clean up and then do it's own cleanup.
+                 */
+                if (!test_bit(NBD_RUNNING, &nbd->runtime_flags)) {
+                        int i;
+
+                        for (i = 0; i < nbd->num_connections; i++)
+                                kfree(nbd->socks[i]);
+                        kfree(nbd->socks);
+                        nbd->socks = NULL;
+                        nbd->num_connections = 0;
+                }
                 return 0;

         case NBD_SET_SOCK: {
@@ -633,7 +687,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
                 if (!sock)
                         return err;

-                err = nbd_set_socket(nbd, sock);
+                err = nbd_add_socket(nbd, sock);
                 if (!err && max_part)
                         bdev->bd_invalidated = 1;

@@ -662,26 +716,53 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
                 return 0;

         case NBD_DO_IT: {
-                int error;
+                struct recv_thread_args *args;
+                int num_connections = nbd->num_connections;
+                int error, i;

                 if (nbd->task_recv)
                         return -EBUSY;
-                if (!nbd->sock)
+                if (!nbd->socks)
                         return -EINVAL;

-                /* We have to claim the device under the lock */
+                set_bit(NBD_RUNNING, &nbd->runtime_flags);
+                blk_mq_update_nr_hw_queues(&nbd->tag_set, nbd->num_connections);
+                args = kcalloc(num_connections, sizeof(*args), GFP_KERNEL);
+                if (!args)
+                        goto out_err;
                 nbd->task_recv = current;
-                mutex_unlock(&nbd->tx_lock);
+                mutex_unlock(&nbd->config_lock);

                 nbd_parse_flags(nbd, bdev);

+                error = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
+                if (error) {
+                        dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
+                        goto out_recv;
+                }
+
+                nbd_size_update(nbd, bdev);
+
                 nbd_dev_dbg_init(nbd);
-                error = nbd_thread_recv(nbd, bdev);
+                for (i = 0; i < num_connections; i++) {
+                        sk_set_memalloc(nbd->socks[i]->sock->sk);
+                        atomic_inc(&nbd->recv_threads);
+                        INIT_WORK(&args[i].work, recv_work);
+                        args[i].nbd = nbd;
+                        args[i].index = i;
+                        queue_work(system_long_wq, &args[i].work);
+                }
+                wait_event_interruptible(nbd->recv_wq,
+                                         atomic_read(&nbd->recv_threads) == 0);
+                for (i = 0; i < num_connections; i++)
+                        flush_work(&args[i].work);
                 nbd_dev_dbg_close(nbd);
-
-                mutex_lock(&nbd->tx_lock);
+                nbd_size_clear(nbd, bdev);
+                device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
+out_recv:
+                mutex_lock(&nbd->config_lock);
                 nbd->task_recv = NULL;
-
+out_err:
                 sock_shutdown(nbd);
                 nbd_clear_que(nbd);
                 kill_bdev(bdev);
@@ -694,7 +775,6 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
                         error = -ETIMEDOUT;

                 nbd_reset(nbd);
-
                 return error;
         }

@@ -726,9 +806,9 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,

         BUG_ON(nbd->magic != NBD_MAGIC);

-        mutex_lock(&nbd->tx_lock);
+        mutex_lock(&nbd->config_lock);
         error = __nbd_ioctl(bdev, nbd, cmd, arg);
-        mutex_unlock(&nbd->tx_lock);
+        mutex_unlock(&nbd->config_lock);

         return error;
 }

@@ -748,8 +828,6 @@ static int nbd_dbg_tasks_show(struct seq_file *s, void *unused)

         if (nbd->task_recv)
                 seq_printf(s, "recv: %d\n", task_pid_nr(nbd->task_recv));
-        if (nbd->task_send)
-                seq_printf(s, "send: %d\n", task_pid_nr(nbd->task_send));

         return 0;
 }

@@ -873,9 +951,7 @@ static int nbd_init_request(void *data, struct request *rq,
                             unsigned int numa_node)
 {
         struct nbd_cmd *cmd = blk_mq_rq_to_pdu(rq);
-
         cmd->nbd = data;
-        INIT_LIST_HEAD(&cmd->list);
         return 0;
 }

@@ -986,13 +1062,13 @@ static int __init nbd_init(void)
         for (i = 0; i < nbds_max; i++) {
                 struct gendisk *disk = nbd_dev[i].disk;
                 nbd_dev[i].magic = NBD_MAGIC;
-                spin_lock_init(&nbd_dev[i].sock_lock);
-                mutex_init(&nbd_dev[i].tx_lock);
+                mutex_init(&nbd_dev[i].config_lock);
                 disk->major = NBD_MAJOR;
                 disk->first_minor = i << part_shift;
                 disk->fops = &nbd_fops;
                 disk->private_data = &nbd_dev[i];
                 sprintf(disk->disk_name, "nbd%d", i);
+                init_waitqueue_head(&nbd_dev[i].recv_wq);
                 nbd_reset(&nbd_dev[i]);
                 add_disk(disk);
         }