From patchwork Thu Sep 8 21:12:10 2016
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 9322083
From: Josef Bacik
Subject: [PATCH 5/5] nbd: add multi-connection support
Date: Thu, 8 Sep 2016 17:12:10 -0400
Message-ID: <1473369130-22986-6-git-send-email-jbacik@fb.com>
X-Mailer: git-send-email 2.5.5
In-Reply-To: <1473369130-22986-1-git-send-email-jbacik@fb.com>
References: <1473369130-22986-1-git-send-email-jbacik@fb.com>
X-Mailing-List: linux-block@vger.kernel.org

NBD can become contended on
its single connection.  We have to serialize all writes and we can only process
one read response at a time.  Fix this by allowing userspace to provide multiple
connections to a single nbd device.  This coupled with block-mq drastically
increases performance in multi-process cases.  Thanks,

Signed-off-by: Josef Bacik
---
 drivers/block/nbd.c | 267 ++++++++++++++++++++++++++++------------------------
 1 file changed, 143 insertions(+), 124 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4c6dd1a..4aa45ed 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -41,26 +41,35 @@
 
 #include
 
+struct nbd_sock {
+	struct socket *sock;
+	struct mutex tx_lock;
+	unsigned int index;
+};
+
 #define NBD_TIMEDOUT 0
 #define NBD_DISCONNECT_REQUESTED 1
+#define NBD_DISCONNECTED 2
 
 struct nbd_device {
 	u32 flags;
 	unsigned long runtime_flags;
-	struct socket * sock;	/* If == NULL, device is not ready, yet	*/
+	struct nbd_sock **socks;
 	int magic;
 
 	struct blk_mq_tag_set tag_set;
 
-	struct mutex tx_lock;
+	struct mutex config_lock;
 	struct gendisk *disk;
+	int num_connections;
+	atomic_t recv_threads;
+	wait_queue_head_t recv_wq;
 	int blksize;
 	loff_t bytesize;
 
 	/* protects initialization and shutdown of the socket */
 	spinlock_t sock_lock;
 	struct task_struct *task_recv;
-	struct task_struct *task_send;
 
 #if IS_ENABLED(CONFIG_DEBUG_FS)
 	struct dentry *dbg_dir;
@@ -69,7 +78,7 @@ struct nbd_device {
 
 struct nbd_cmd {
 	struct nbd_device *nbd;
-	struct list_head list;
+	int index;
 };
 
 #if IS_ENABLED(CONFIG_DEBUG_FS)
@@ -159,22 +168,18 @@ static void nbd_end_request(struct nbd_cmd *cmd)
  */
 static void sock_shutdown(struct nbd_device *nbd)
 {
-	struct socket *sock;
-
-	spin_lock(&nbd->sock_lock);
+	int i;
 
-	if (!nbd->sock) {
-		spin_unlock_irq(&nbd->sock_lock);
+	if (test_and_set_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
 		return;
-	}
-
-	sock = nbd->sock;
-	dev_warn(disk_to_dev(nbd->disk), "shutting down socket\n");
-	nbd->sock = NULL;
-	spin_unlock(&nbd->sock_lock);
 
-	kernel_sock_shutdown(sock, SHUT_RDWR);
-	sockfd_put(sock);
+	for (i = 0; i < nbd->num_connections; i++) {
+		struct nbd_sock *nsock = nbd->socks[i];
+		mutex_lock(&nsock->tx_lock);
+		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+		mutex_unlock(&nsock->tx_lock);
+	}
+	dev_warn(disk_to_dev(nbd->disk), "shutting down sockets\n");
 }
 
 static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
@@ -182,35 +187,31 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 {
 	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
 	struct nbd_device *nbd = cmd->nbd;
-	struct socket *sock = NULL;
-
-	spin_lock(&nbd->sock_lock);
 
+	dev_err(nbd_to_dev(nbd), "Connection timed out, shutting down connection\n");
 	set_bit(NBD_TIMEDOUT, &nbd->runtime_flags);
-
-	if (nbd->sock) {
-		sock = nbd->sock;
-		get_file(sock->file);
-	}
-
-	spin_unlock(&nbd->sock_lock);
-	if (sock) {
-		kernel_sock_shutdown(sock, SHUT_RDWR);
-		sockfd_put(sock);
-	}
-
 	req->errors++;
-	dev_err(nbd_to_dev(nbd), "Connection timed out, shutting down connection\n");
+
+	/*
+	 * If our disconnect packet times out then we're already holding the
+	 * config_lock and could deadlock here, so just set an error and return,
+	 * we'll handle shutting everything down later.
+	 */
+	if (req->cmd_type == REQ_TYPE_DRV_PRIV)
+		return BLK_EH_HANDLED;
+	mutex_lock(&nbd->config_lock);
+	sock_shutdown(nbd);
+	mutex_unlock(&nbd->config_lock);
 	return BLK_EH_HANDLED;
 }
 
 /*
  *  Send or receive packet.
  */
-static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
-		int msg_flags)
+static int sock_xmit(struct nbd_device *nbd, int index, int send, void *buf,
+		     int size, int msg_flags)
 {
-	struct socket *sock = nbd->sock;
+	struct socket *sock = nbd->socks[index]->sock;
 	int result;
 	struct msghdr msg;
 	struct kvec iov;
@@ -254,12 +255,12 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	return result;
 }
 
-static inline int sock_send_bvec(struct nbd_device *nbd, struct bio_vec *bvec,
-		int flags)
+static inline int sock_send_bvec(struct nbd_device *nbd, int index,
+				 struct bio_vec *bvec, int flags)
 {
 	int result;
 	void *kaddr = kmap(bvec->bv_page);
-	result = sock_xmit(nbd, 1, kaddr + bvec->bv_offset,
+	result = sock_xmit(nbd, index, 1, kaddr + bvec->bv_offset,
 			   bvec->bv_len, flags);
 	kunmap(bvec->bv_page);
 	return result;
@@ -297,7 +298,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
 	dev_dbg(nbd_to_dev(nbd), "request %p: sending control (%s@%llu,%uB)\n",
 		cmd, nbdcmd_to_ascii(type),
 		(unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req));
-	result = sock_xmit(nbd, 1, &request, sizeof(request),
+	result = sock_xmit(nbd, cmd->index, 1, &request, sizeof(request),
 			(type == NBD_CMD_WRITE) ? MSG_MORE : 0);
 	if (result <= 0) {
 		dev_err(disk_to_dev(nbd->disk),
@@ -318,7 +319,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
 				flags = MSG_MORE;
 			dev_dbg(nbd_to_dev(nbd), "request %p: sending %d bytes data\n",
 				cmd, bvec.bv_len);
-			result = sock_send_bvec(nbd, &bvec, flags);
+			result = sock_send_bvec(nbd, cmd->index, &bvec, flags);
 			if (result <= 0) {
 				dev_err(disk_to_dev(nbd->disk),
 					"Send data failed (result %d)\n",
@@ -330,18 +331,19 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd)
 	return 0;
 }
 
-static inline int sock_recv_bvec(struct nbd_device *nbd, struct bio_vec *bvec)
+static inline int sock_recv_bvec(struct nbd_device *nbd, int index,
+				 struct bio_vec *bvec)
 {
 	int result;
 	void *kaddr = kmap(bvec->bv_page);
-	result = sock_xmit(nbd, 0, kaddr + bvec->bv_offset, bvec->bv_len,
-			MSG_WAITALL);
+	result = sock_xmit(nbd, index, 0, kaddr + bvec->bv_offset,
+			   bvec->bv_len, MSG_WAITALL);
 	kunmap(bvec->bv_page);
 	return result;
 }
 
 /* NULL returned = something went wrong, inform userspace */
-static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
+static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 {
 	int result;
 	struct nbd_reply reply;
@@ -351,7 +353,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
 	int tag;
 
 	reply.magic = 0;
-	result = sock_xmit(nbd, 0, &reply, sizeof(reply), MSG_WAITALL);
+	result = sock_xmit(nbd, index, 0, &reply, sizeof(reply), MSG_WAITALL);
 	if (result <= 0) {
 		dev_err(disk_to_dev(nbd->disk),
 			"Receive control failed (result %d)\n", result);
@@ -390,7 +392,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
 		struct bio_vec bvec;
 
 		rq_for_each_segment(bvec, req, iter) {
-			result = sock_recv_bvec(nbd, &bvec);
+			result = sock_recv_bvec(nbd, index, &bvec);
 			if (result <= 0) {
 				dev_err(disk_to_dev(nbd->disk), "Receive data failed (result %d)\n",
 					result);
@@ -404,39 +406,24 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd)
 	return cmd;
 }
 
-static ssize_t pid_show(struct device *dev,
-			struct device_attribute *attr, char *buf)
-{
-	struct gendisk *disk = dev_to_disk(dev);
-	struct nbd_device *nbd = (struct nbd_device *)disk->private_data;
-
-	return sprintf(buf, "%d\n", task_pid_nr(nbd->task_recv));
-}
-
-static struct device_attribute pid_attr = {
-	.attr = { .name = "pid", .mode = S_IRUGO},
-	.show = pid_show,
+struct recv_thread_args {
+	struct work_struct work;
+	struct nbd_device *nbd;
+	int index;
 };
 
-static int nbd_thread_recv(struct nbd_device *nbd, struct block_device *bdev)
+static void recv_work(struct work_struct *work)
 {
+	struct recv_thread_args *args = container_of(work,
+						     struct recv_thread_args,
+						     work);
+	struct nbd_device *nbd = args->nbd;
 	struct nbd_cmd *cmd;
 	int ret;
 
 	BUG_ON(nbd->magic != NBD_MAGIC);
-
-	sk_set_memalloc(nbd->sock->sk);
-
-	ret = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
-	if (ret) {
-		dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
-		return ret;
-	}
-
-	nbd_size_update(nbd, bdev);
-
 	while (1) {
-		cmd = nbd_read_stat(nbd);
+		cmd = nbd_read_stat(nbd, args->index);
 		if (IS_ERR(cmd)) {
 			ret = PTR_ERR(cmd);
 			break;
@@ -445,10 +432,8 @@ static int nbd_thread_recv(struct nbd_device *nbd, struct block_device *bdev)
 		nbd_end_request(cmd);
 	}
 
-	nbd_size_clear(nbd, bdev);
-
-	device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
-	return ret;
+	atomic_dec(&nbd->recv_threads);
+	wake_up(&nbd->recv_wq);
 }
 
 static void nbd_clear_req(struct request *req, void *data, bool reserved)
@@ -466,12 +451,6 @@ static void nbd_clear_que(struct nbd_device *nbd)
 {
 	BUG_ON(nbd->magic != NBD_MAGIC);
 
-	/*
-	 * Because we have set nbd->sock to NULL under the tx_lock, all
-	 * modifications to the list must have completed by now.
-	 */
-	BUG_ON(nbd->sock);
-
 	blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
 	dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
 }
@@ -481,11 +460,20 @@ static void nbd_handle_cmd(struct nbd_cmd *cmd)
 {
 	struct request *req = blk_mq_rq_from_pdu(cmd);
 	struct nbd_device *nbd = cmd->nbd;
+	struct nbd_sock *nsock = nbd->socks[cmd->index];
 
-	if (req->cmd_type != REQ_TYPE_FS)
+	if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags)) {
+		dev_err(disk_to_dev(nbd->disk),
+			"Attempted send on closed socket\n");
 		goto error_out;
+	}
 
-	if (rq_data_dir(req) == WRITE &&
+	if (req->cmd_type != REQ_TYPE_FS &&
+	    req->cmd_type != REQ_TYPE_DRV_PRIV)
+		goto error_out;
+
+	if (req->cmd_type == REQ_TYPE_FS &&
+	    rq_data_dir(req) == WRITE &&
 	    (nbd->flags & NBD_FLAG_READ_ONLY)) {
 		dev_err(disk_to_dev(nbd->disk),
 			"Write on read-only\n");
@@ -494,10 +482,9 @@ static void nbd_handle_cmd(struct nbd_cmd *cmd)
 
 	req->errors = 0;
 
-	mutex_lock(&nbd->tx_lock);
-	nbd->task_send = current;
-	if (unlikely(!nbd->sock)) {
-		mutex_unlock(&nbd->tx_lock);
+	mutex_lock(&nsock->tx_lock);
+	if (unlikely(!nsock->sock)) {
+		mutex_unlock(&nsock->tx_lock);
 		dev_err(disk_to_dev(nbd->disk),
 			"Attempted send on closed socket\n");
 		goto error_out;
@@ -509,8 +496,7 @@ static void nbd_handle_cmd(struct nbd_cmd *cmd)
 		nbd_end_request(cmd);
 	}
 
-	nbd->task_send = NULL;
-	mutex_unlock(&nbd->tx_lock);
+	mutex_unlock(&nsock->tx_lock);
 
 	return;
 
@@ -529,34 +515,45 @@ static int nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 
-static int nbd_set_socket(struct nbd_device *nbd, struct socket *sock)
+static int nbd_add_socket(struct nbd_device *nbd, struct socket *sock)
 {
-	int ret = 0;
-
-	spin_lock_irq(&nbd->sock_lock);
+	struct nbd_sock **socks;
+	struct nbd_sock *nsock;
 
-	if (nbd->sock) {
-		ret = -EBUSY;
-		goto out;
-	}
+	socks = krealloc(nbd->socks, (nbd->num_connections + 1) *
+			 sizeof(struct nbd_sock *), GFP_KERNEL);
+	if (!socks)
+		return -ENOMEM;
+	nsock = kzalloc(sizeof(struct nbd_sock), GFP_KERNEL);
+	if (!nsock)
+		return -ENOMEM;
 
-	nbd->sock = sock;
+	nbd->socks = socks;
 
-out:
-	spin_unlock_irq(&nbd->sock_lock);
+	mutex_init(&nsock->tx_lock);
+	nsock->sock = sock;
+	socks[nbd->num_connections++] = nsock;
 
-	return ret;
+	return 0;
 }
 
 /* Reset all properties of an NBD device */
 static void nbd_reset(struct nbd_device *nbd)
 {
+	int i;
+
+	for (i = 0; i < nbd->num_connections; i++)
+		kfree(nbd->socks[i]);
+	kfree(nbd->socks);
+	nbd->socks = NULL;
 	nbd->runtime_flags = 0;
 	nbd->blksize = 1024;
 	nbd->bytesize = 0;
 	set_capacity(nbd->disk, 0);
 	nbd->flags = 0;
 	nbd->tag_set.timeout = 0;
+	nbd->num_connections = 0;
+	atomic_set(&nbd->recv_threads, 0);
 	queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 }
 
@@ -585,8 +582,7 @@ static void nbd_parse_flags(struct nbd_device *nbd, struct block_device *bdev)
 static int nbd_dev_dbg_init(struct nbd_device *nbd);
 static void nbd_dev_dbg_close(struct nbd_device *nbd);
 
-/* Must be called with tx_lock held */
-
+/* Must be called with config_lock held */
 static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		       unsigned int cmd, unsigned long arg)
 {
@@ -595,27 +591,33 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		struct request *sreq;
 
 		dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
-		if (!nbd->sock)
+		if (!nbd->socks)
 			return -EINVAL;
 
 		sreq = blk_mq_alloc_request(bdev_get_queue(bdev), WRITE, 0);
 		if (!sreq)
 			return -ENOMEM;
 
-		mutex_unlock(&nbd->tx_lock);
+		mutex_unlock(&nbd->config_lock);
 		fsync_bdev(bdev);
-		mutex_lock(&nbd->tx_lock);
+		mutex_lock(&nbd->config_lock);
 		sreq->cmd_type = REQ_TYPE_DRV_PRIV;
 
 		/* Check again after getting mutex back.  */
-		if (!nbd->sock) {
+		if (!nbd->socks) {
 			blk_mq_free_request(sreq);
 			return -EINVAL;
 		}
 
 		set_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags);
 
-		nbd_send_cmd(nbd, blk_mq_rq_to_pdu(sreq));
+		/*
+		 * Since we are holding the config lock here we can't do the
+		 * timeout work, so if this fails just shutdown the socks.
+		 */
+		if (blk_execute_rq(sreq->q, NULL, sreq, 0) &&
+		    !test_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
+			sock_shutdown(nbd);
 		blk_mq_free_request(sreq);
 		return 0;
 	}
@@ -633,7 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		if (!sock)
 			return err;
 
-		err = nbd_set_socket(nbd, sock);
+		err = nbd_add_socket(nbd, sock);
 		if (!err && max_part)
 			bdev->bd_invalidated = 1;
 
@@ -662,26 +664,45 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		return 0;
 
 	case NBD_DO_IT: {
-		int error;
+		struct recv_thread_args *args;
+		int num_connections = nbd->num_connections;
+		int error, i;
 
 		if (nbd->task_recv)
 			return -EBUSY;
-		if (!nbd->sock)
+		if (!nbd->socks)
 			return -EINVAL;
 
-		/* We have to claim the device under the lock */
+		blk_mq_update_nr_hw_queues(&nbd->tag_set, nbd->num_connections);
+		args = kcalloc(num_connections, sizeof(*args), GFP_KERNEL);
+		if (!args)
+			goto out_err;
 		nbd->task_recv = current;
-		mutex_unlock(&nbd->tx_lock);
+		mutex_unlock(&nbd->config_lock);
 
 		nbd_parse_flags(nbd, bdev);
 
+		nbd_size_update(nbd, bdev);
+
 		nbd_dev_dbg_init(nbd);
-		error = nbd_thread_recv(nbd, bdev);
+		for (i = 0; i < num_connections; i++) {
+			sk_set_memalloc(nbd->socks[i]->sock->sk);
+			atomic_inc(&nbd->recv_threads);
+			INIT_WORK(&args[i].work, recv_work);
+			args[i].nbd = nbd;
+			args[i].index = i;
+			queue_work(system_long_wq, &args[i].work);
+		}
+		wait_event_interruptible(nbd->recv_wq,
+				atomic_read(&nbd->recv_threads) == 0);
+		for (i = 0; i < num_connections; i++)
+			flush_work(&args[i].work);
 		nbd_dev_dbg_close(nbd);
+		nbd_size_clear(nbd, bdev);
 
-		mutex_lock(&nbd->tx_lock);
+		mutex_lock(&nbd->config_lock);
 		nbd->task_recv = NULL;
-
+out_err:
 		sock_shutdown(nbd);
 		nbd_clear_que(nbd);
 		kill_bdev(bdev);
@@ -726,9 +747,9 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
 
 	BUG_ON(nbd->magic != NBD_MAGIC);
 
-	mutex_lock(&nbd->tx_lock);
+	mutex_lock(&nbd->config_lock);
 	error = __nbd_ioctl(bdev, nbd, cmd, arg);
-	mutex_unlock(&nbd->tx_lock);
+	mutex_unlock(&nbd->config_lock);
 
 	return error;
 }
@@ -748,8 +769,6 @@ static int nbd_dbg_tasks_show(struct seq_file *s, void *unused)
 
 	if (nbd->task_recv)
 		seq_printf(s, "recv: %d\n", task_pid_nr(nbd->task_recv));
-	if (nbd->task_send)
-		seq_printf(s, "send: %d\n", task_pid_nr(nbd->task_send));
 
 	return 0;
 }
@@ -875,7 +894,7 @@ static int nbd_init_request(void *data, struct request *rq,
 	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
 	cmd->nbd = data;
-	INIT_LIST_HEAD(&cmd->list);
+	cmd->index = hctx_idx;
 	return 0;
 }
 
@@ -986,13 +1005,13 @@ static int __init nbd_init(void)
 	for (i = 0; i < nbds_max; i++) {
 		struct gendisk *disk = nbd_dev[i].disk;
 		nbd_dev[i].magic = NBD_MAGIC;
-		spin_lock_init(&nbd_dev[i].sock_lock);
-		mutex_init(&nbd_dev[i].tx_lock);
+		mutex_init(&nbd_dev[i].config_lock);
 		disk->major = NBD_MAJOR;
 		disk->first_minor = i << part_shift;
 		disk->fops = &nbd_fops;
 		disk->private_data = &nbd_dev[i];
 		sprintf(disk->disk_name, "nbd%d", i);
+		init_waitqueue_head(&nbd_dev[i].recv_wq);
 		nbd_reset(&nbd_dev[i]);
 		add_disk(disk);
 	}
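
The per-connection bookkeeping in this patch (one tx_lock per socket, and
a request picking its socket by hardware queue index) can be modelled in
plain userspace C. What follows is only a sketch under stated assumptions,
not kernel code: conn, conn_table and the conn_table_* helpers are
hypothetical names, calloc/realloc stand in for kzalloc/krealloc, an int fd
stands in for struct socket *, and a pthread mutex stands in for struct
mutex.

```c
/*
 * Userspace model of the per-connection table; all names are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct conn {				/* models struct nbd_sock */
	int fd;				/* stands in for struct socket * */
	pthread_mutex_t tx_lock;	/* serializes senders on this connection only */
};

struct conn_table {			/* models the socks / num_connections pair */
	struct conn **conns;
	int num_connections;
};

/* Append one connection; returns 0 on success, -1 on allocation failure. */
static int conn_table_add(struct conn_table *t, int fd)
{
	struct conn **grown;
	struct conn *c = calloc(1, sizeof(*c));

	if (!c)
		return -1;

	grown = realloc(t->conns, (t->num_connections + 1) * sizeof(*grown));
	if (!grown) {
		free(c);
		return -1;	/* realloc failed, so t->conns is still valid */
	}
	/* Publish the (possibly moved) array before anything else can fail. */
	t->conns = grown;

	pthread_mutex_init(&c->tx_lock, NULL);
	c->fd = fd;
	grown[t->num_connections++] = c;
	return 0;
}

/* Select a connection by hardware queue index and take its tx_lock. */
static struct conn *conn_table_lock(struct conn_table *t, int hctx_idx)
{
	struct conn *c;

	assert(hctx_idx >= 0 && hctx_idx < t->num_connections);
	c = t->conns[hctx_idx];
	pthread_mutex_lock(&c->tx_lock);
	return c;
}

static void conn_unlock(struct conn *c)
{
	pthread_mutex_unlock(&c->tx_lock);
}

/* Free every connection and return the table to its empty state. */
static void conn_table_reset(struct conn_table *t)
{
	int i;

	for (i = 0; i < t->num_connections; i++) {
		pthread_mutex_destroy(&t->conns[i]->tx_lock);
		free(t->conns[i]);
	}
	free(t->conns);
	t->conns = NULL;
	t->num_connections = 0;
}
```

One design choice in the sketch: it allocates the new element first and
stores the grown array back into the table right after realloc succeeds,
so no early return can leave the table pointing at memory that realloc may
have moved. Whether the kernel path wants the same ordering is a question
for review; the sketch does not claim to mirror the patch line for line.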