From patchwork Wed Apr 15 09:05:03 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490643
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v6 01/11] scsi: free sgtables in case command setup fails
Date: Wed, 15 Apr 2020 18:05:03 +0900
Message-Id: <20200415090513.5133-2-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

In case scsi_setup_fs_cmnd() fails we're not freeing the sgtables
allocated by scsi_init_io(), thus we leak the allocated memory. So free
the sgtables allocated by scsi_init_io() in case scsi_setup_fs_cmnd()
fails.

Technically scsi_setup_scsi_cmnd() does not suffer from this problem, as
it can only fail if scsi_init_io() fails, so it does not have sgtables
allocated. But to maintain symmetry and as a measure of defensive
programming, free the sgtables on scsi_setup_scsi_cmnd() failure as well.
scsi_mq_free_sgtables() has safeguards against double-freeing of memory
so this is safe to do.

While we're at it, rename scsi_mq_free_sgtables() to scsi_free_sgtables().

Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Reviewed-by: Daniel Wagner
---
 drivers/scsi/scsi_lib.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 47835c4b4ee0..ad97369ffabd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -548,7 +548,7 @@ static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 	}
 }
 
-static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
+static void scsi_free_sgtables(struct scsi_cmnd *cmd)
 {
 	if (cmd->sdb.table.nents)
 		sg_free_table_chained(&cmd->sdb.table,
@@ -560,7 +560,7 @@ static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 
 static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 {
-	scsi_mq_free_sgtables(cmd);
+	scsi_free_sgtables(cmd);
 	scsi_uninit_cmd(cmd);
 }
 
@@ -1059,7 +1059,7 @@ blk_status_t scsi_init_io(struct scsi_cmnd *cmd)
 	return BLK_STS_OK;
 
 out_free_sgtables:
-	scsi_mq_free_sgtables(cmd);
+	scsi_free_sgtables(cmd);
 	return ret;
 }
 EXPORT_SYMBOL(scsi_init_io);
@@ -1190,6 +1190,7 @@ static blk_status_t scsi_setup_cmnd(struct scsi_device *sdev,
 		struct request *req)
 {
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
+	blk_status_t ret;
 
 	if (!blk_rq_bytes(req))
 		cmd->sc_data_direction = DMA_NONE;
@@ -1199,9 +1200,14 @@ static blk_status_t scsi_setup_cmnd(struct scsi_device *sdev,
 		cmd->sc_data_direction = DMA_FROM_DEVICE;
 
 	if (blk_rq_is_scsi(req))
-		return scsi_setup_scsi_cmnd(sdev, req);
+		ret = scsi_setup_scsi_cmnd(sdev, req);
 	else
-		return scsi_setup_fs_cmnd(sdev, req);
+		ret = scsi_setup_fs_cmnd(sdev, req);
+
+	if (ret != BLK_STS_OK)
+		scsi_free_sgtables(cmd);
+
+	return ret;
 }
 
 static blk_status_t
From patchwork Wed Apr 15 09:05:04 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490623
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Christoph Hellwig
Subject: [PATCH v6 02/11] block: provide fallbacks for blk_queue_zone_is_seq
 and blk_queue_zone_no
Date: Wed, 15 Apr 2020 18:05:04 +0900
Message-Id: <20200415090513.5133-3-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

blk_queue_zone_is_seq() and blk_queue_zone_no() have not been called with
CONFIG_BLK_DEV_ZONED disabled until now. The introduction of
REQ_OP_ZONE_APPEND will change this, so we need to provide noop fallbacks
for the !CONFIG_BLK_DEV_ZONED case.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Reviewed-by: Daniel Wagner
---
 include/linux/blkdev.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32868fbedc9e..e47888a7d80b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -729,6 +729,16 @@ static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
 {
 	return 0;
 }
+static inline bool blk_queue_zone_is_seq(struct request_queue *q,
+					 sector_t sector)
+{
+	return false;
+}
+static inline unsigned int blk_queue_zone_no(struct request_queue *q,
+					     sector_t sector)
+{
+	return 0;
+}
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 static inline bool rq_is_sync(struct request *rq)
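
A minimal C sketch of what the stubs buy, assuming a hypothetical piece of
common block-layer code built both with and without CONFIG_BLK_DEV_ZONED:

#include <linux/blkdev.h>

/*
 * With the fallbacks in place, common code can call the zone helpers
 * unconditionally; on !CONFIG_BLK_DEV_ZONED kernels the stubs compile
 * to the "no zones" answers (false / 0) instead of breaking the build.
 */
static bool sketch_bio_targets_seq_zone(struct request_queue *q,
					struct bio *bio)
{
	return blk_queue_zone_is_seq(q, bio->bi_iter.bi_sector);
}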
From patchwork Wed Apr 15 09:05:05 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490637
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Christoph Hellwig, Johannes Thumshirn
Subject: [PATCH v6 03/11] block: rename __bio_add_pc_page to bio_add_hw_page
Date: Wed, 15 Apr 2020 18:05:05 +0900
Message-Id: <20200415090513.5133-4-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

From: Christoph Hellwig

Rename __bio_add_pc_page() to bio_add_hw_page() and explicitly pass in a
max_sectors argument. This max_sectors argument can be used to specify
constraints from the hardware.

Signed-off-by: Christoph Hellwig
[ jth: rebased and made public for blk-map.c ]
Signed-off-by: Johannes Thumshirn
Reviewed-by: Daniel Wagner
---
 block/bio.c     | 60 +++++++++++++++++++++++++++----------------------
 block/blk-map.c |  5 +++--
 block/blk.h     |  4 ++--
 3 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 21cbaa6a1c20..0f0e337e46b4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -748,9 +748,14 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 	return true;
 }
 
-static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
-		struct page *page, unsigned len, unsigned offset,
-		bool *same_page)
+/*
+ * Try to merge a page into a segment, while obeying the hardware segment
+ * size limit. This is not for normal read/write bios, but for passthrough
+ * or Zone Append operations that we can't split.
+ */
+static bool bio_try_merge_hw_seg(struct request_queue *q, struct bio *bio,
+		struct page *page, unsigned len,
+		unsigned offset, bool *same_page)
 {
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 	unsigned long mask = queue_segment_boundary(q);
@@ -764,39 +769,24 @@ static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
 	return __bio_try_merge_page(bio, page, len, offset, same_page);
 }
 
-/**
- * __bio_add_pc_page	- attempt to add page to passthrough bio
- * @q: the target queue
- * @bio: destination bio
- * @page: page to add
- * @len: vec entry length
- * @offset: vec entry offset
- * @same_page: return if the merge happen inside the same page
- *
- * Attempt to add a page to the bio_vec maplist. This can fail for a
- * number of reasons, such as the bio being full or target block device
- * limitations. The target block device must allow bio's up to PAGE_SIZE,
- * so it is always possible to add a single page to an empty bio.
- *
- * This should only be used by passthrough bios.
+/*
+ * Add a page to a bio while respecting the hardware max_sectors, max_segment
+ * and gap limitations.
  */
-int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
+int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
-		bool *same_page)
+		unsigned int max_sectors, bool *same_page)
 {
 	struct bio_vec *bvec;
 
-	/*
-	 * cloned bio must not modify vec list
-	 */
-	if (unlikely(bio_flagged(bio, BIO_CLONED)))
+	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
 		return 0;
 
-	if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
+	if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
 		return 0;
 
 	if (bio->bi_vcnt > 0) {
-		if (bio_try_merge_pc_page(q, bio, page, len, offset, same_page))
+		if (bio_try_merge_hw_seg(q, bio, page, len, offset, same_page))
 			return len;
 
 		/*
@@ -823,11 +813,27 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 	return len;
 }
 
+/**
+ * bio_add_pc_page	- attempt to add page to passthrough bio
+ * @q: the target queue
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist. This can fail for a
+ * number of reasons, such as the bio being full or target block device
+ * limitations. The target block device must allow bio's up to PAGE_SIZE,
+ * so it is always possible to add a single page to an empty bio.
+ *
+ * This should only be used by passthrough bios.
+ */
 int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset)
 {
 	bool same_page = false;
-	return __bio_add_pc_page(q, bio, page, len, offset, &same_page);
+	return bio_add_hw_page(q, bio, page, len, offset,
+			queue_max_hw_sectors(q), &same_page);
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
diff --git a/block/blk-map.c b/block/blk-map.c
index b72c361911a4..f36ff496a761 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -257,6 +257,7 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
 static struct bio *bio_map_user_iov(struct request_queue *q,
 		struct iov_iter *iter, gfp_t gfp_mask)
 {
+	unsigned int max_sectors = queue_max_hw_sectors(q);
 	int j;
 	struct bio *bio;
 	int ret;
@@ -294,8 +295,8 @@ static struct bio *bio_map_user_iov(struct request_queue *q,
 				if (n > bytes)
 					n = bytes;
 
-				if (!__bio_add_pc_page(q, bio, page, n, offs,
-						&same_page)) {
+				if (!bio_add_hw_page(q, bio, page, n, offs,
+						max_sectors, &same_page)) {
 					if (same_page)
 						put_page(page);
 					break;
diff --git a/block/blk.h b/block/blk.h
index 0a94ec68af32..ba31511c5243 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -484,8 +484,8 @@ static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
 
 struct request_queue *__blk_alloc_queue(int node_id);
 
-int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
+int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
-		bool *same_page);
+		unsigned int max_sectors, bool *same_page);
 
 #endif /* BLK_INTERNAL_H */
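
A minimal C sketch of the new calling convention, assuming a hypothetical
passthrough caller inside the block layer (bio_add_hw_page() is internal
to block/, hence the blk.h include); it mirrors what the reworked
bio_add_pc_page() above does:

#include <linux/blkdev.h>
#include "blk.h"

/*
 * The hardware limit is now an explicit argument: passthrough callers
 * keep the old behaviour by passing queue_max_hw_sectors(q), while a
 * zone append path can pass a stricter per-command limit instead.
 */
static int sketch_add_passthrough_page(struct request_queue *q,
				       struct bio *bio, struct page *page,
				       unsigned int len, unsigned int offset)
{
	bool same_page = false;

	return bio_add_hw_page(q, bio, page, len, offset,
			       queue_max_hw_sectors(q), &same_page);
}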
From patchwork Wed Apr 15 09:05:06 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490649
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v6 04/11] block: Introduce REQ_OP_ZONE_APPEND
Date: Wed, 15 Apr 2020 18:05:06 +0900
Message-Id: <20200415090513.5133-5-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

From: Keith Busch

Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
block device. This is a no-merge write operation.

A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* The target zone must be a sequential write zone
* The BIO must not cross a zone boundary
* The BIO size must not be split to ensure that a single range of LBAs
  is written with a single command.

Implement these checks in generic_make_request_checks() using the helper
function blk_check_zone_append(). To avoid write append BIO splitting,
introduce the new max_zone_append_sectors queue limit attribute and ensure
that a BIO size is always lower than this limit. Export this new limit
through sysfs and check these limits in bio_full().

Also, when an LLDD can't dispatch a request to a specific zone, it will
return BLK_STS_ZONE_RESOURCE indicating this request needs to be delayed,
e.g. because the zone it will be dispatched to is still write-locked. If
this happens, set the request aside in a local list to continue trying to
dispatch requests such as READ requests or WRITE/ZONE_APPEND requests
targeting other zones. This way we can still keep a high queue depth
without starving other requests even if one request can't be served due
to zone write-locking.

Finally, make sure that the bio sector position indicates the actual write
position as indicated by the device on completion.

Signed-off-by: Keith Busch
[ jth: added zone-append specific add_page and merge_page helpers ]
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Reviewed-by: Daniel Wagner
---
 block/bio.c               | 65 +++++++++++++++++++++++++++++----
 block/blk-core.c          | 52 +++++++++++++++++++++++++++
 block/blk-mq.c            | 27 ++++++++++++++++
 block/blk-settings.c      | 23 ++++++++++++++
 block/blk-sysfs.c         | 13 ++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 include/linux/blk_types.h | 14 +++++++++
 include/linux/blkdev.h    | 11 +++++++
 8 files changed, 200 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 0f0e337e46b4..ba10076ced9c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1000,13 +1000,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		struct page *page = pages[i];
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
-
 		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
 			if (same_page)
 				put_page(page);
 		} else {
 			if (WARN_ON_ONCE(bio_full(bio, len)))
-                                return -EINVAL;
+				return -EINVAL;
 			__bio_add_page(bio, page, len, offset);
 		}
 		offset = 0;
@@ -1016,6 +1015,50 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return 0;
 }
 
+static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
+	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
+	struct request_queue *q = bio->bi_disk->queue;
+	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
+	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
+	struct page **pages = (struct page **)bv;
+	ssize_t size, left;
+	unsigned len, i;
+	size_t offset;
+
+	if (WARN_ON_ONCE(!max_append_sectors))
+		return 0;
+
+	/*
+	 * Move page array up in the allocated memory for the bio vecs as far as
+	 * possible so that we can start filling biovecs from the beginning
+	 * without overwriting the temporary page array.
+	 */
+	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
+	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
+
+	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	if (unlikely(size <= 0))
+		return size ? size : -EFAULT;
+
+	for (left = size, i = 0; left > 0; left -= len, i++) {
+		struct page *page = pages[i];
+		bool same_page = false;
+
+		len = min_t(size_t, PAGE_SIZE - offset, left);
+		if (bio_add_hw_page(q, bio, page, len, offset,
+				max_append_sectors, &same_page) != len)
+			return -EINVAL;
+		if (same_page)
+			put_page(page);
+		offset = 0;
+	}
+
+	iov_iter_advance(iter, size);
+	return 0;
+}
+
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1045,10 +1088,16 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		return -EINVAL;
 
 	do {
-		if (is_bvec)
-			ret = __bio_iov_bvec_add_pages(bio, iter);
-		else
-			ret = __bio_iov_iter_get_pages(bio, iter);
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+			if (WARN_ON_ONCE(is_bvec))
+				return -EINVAL;
+			ret = __bio_iov_append_get_pages(bio, iter);
+		} else {
+			if (is_bvec)
+				ret = __bio_iov_bvec_add_pages(bio, iter);
+			else
+				ret = __bio_iov_iter_get_pages(bio, iter);
+		}
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	if (is_bvec)
@@ -1451,6 +1500,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
 	BUG_ON(sectors <= 0);
 	BUG_ON(sectors >= bio_sectors(bio));
 
+	/* Zone append commands cannot be split */
+	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
+		return NULL;
+
 	split = bio_clone_fast(bio, gfp, bs);
 	if (!split)
 		return NULL;
diff --git a/block/blk-core.c b/block/blk-core.c
index 7e4a1da0715e..34fe47a728c3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -135,6 +135,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(ZONE_OPEN),
 	REQ_OP_NAME(ZONE_CLOSE),
 	REQ_OP_NAME(ZONE_FINISH),
+	REQ_OP_NAME(ZONE_APPEND),
 	REQ_OP_NAME(WRITE_SAME),
 	REQ_OP_NAME(WRITE_ZEROES),
 	REQ_OP_NAME(SCSI_IN),
@@ -240,6 +241,17 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 	bio_advance(bio, nbytes);
 
+	if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
+		/*
+		 * Partial zone append completions cannot be supported as the
+		 * BIO fragments may end up not being written sequentially.
+		 */
+		if (bio->bi_iter.bi_size)
+			bio->bi_status = BLK_STS_IOERR;
+		else
+			bio->bi_iter.bi_sector = rq->__sector;
+	}
+
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
 		bio_endio(bio);
@@ -871,6 +883,41 @@ static inline int blk_partition_remap(struct bio *bio)
 	return ret;
 }
 
+/*
+ * Check write append to a zoned block device.
+ */
+static inline blk_status_t blk_check_zone_append(struct request_queue *q,
+						 struct bio *bio)
+{
+	sector_t pos = bio->bi_iter.bi_sector;
+	int nr_sectors = bio_sectors(bio);
+
+	/* Only applicable to zoned block devices */
+	if (!blk_queue_is_zoned(q))
+		return BLK_STS_NOTSUPP;
+
+	/* The bio sector must point to the start of a sequential zone */
+	if (pos & (blk_queue_zone_sectors(q) - 1) ||
+	    !blk_queue_zone_is_seq(q, pos))
+		return BLK_STS_IOERR;
+
+	/*
+	 * Not allowed to cross zone boundaries. Otherwise, the BIO will be
+	 * split and could result in non-contiguous sectors being written in
+	 * different zones.
+	 */
+	if (blk_queue_zone_no(q, pos) != blk_queue_zone_no(q, pos + nr_sectors))
+		return BLK_STS_IOERR;
+
+	/* Make sure the BIO is small enough and will not get split */
+	if (nr_sectors > q->limits.max_zone_append_sectors)
+		return BLK_STS_IOERR;
+
+	bio->bi_opf |= REQ_NOMERGE;
+
+	return BLK_STS_OK;
+}
+
 static noinline_for_stack bool
 generic_make_request_checks(struct bio *bio)
 {
@@ -943,6 +990,11 @@ generic_make_request_checks(struct bio *bio)
 		if (!q->limits.max_write_same_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_ZONE_APPEND:
+		status = blk_check_zone_append(q, bio);
+		if (status != BLK_STS_OK)
+			goto end_io;
+		break;
 	case REQ_OP_ZONE_RESET:
 	case REQ_OP_ZONE_OPEN:
 	case REQ_OP_ZONE_CLOSE:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8e56884fd2e9..50e216a218ee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1195,6 +1195,19 @@ static void blk_mq_handle_dev_resource(struct request *rq,
 	__blk_mq_requeue_request(rq);
 }
 
+static void blk_mq_handle_zone_resource(struct request *rq,
+					struct list_head *zone_list)
+{
+	/*
+	 * If we end up here it is because we cannot dispatch a request to a
+	 * specific zone due to LLD level zone-write locking or other zone
+	 * related resource not being available. In this case, set the request
+	 * aside in zone_list for retrying it later.
+	 */
+	list_add(&rq->queuelist, zone_list);
+	__blk_mq_requeue_request(rq);
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
@@ -1206,6 +1219,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	bool no_tag = false;
 	int errors, queued;
 	blk_status_t ret = BLK_STS_OK;
+	LIST_HEAD(zone_list);
 
 	if (list_empty(list))
 		return false;
@@ -1264,6 +1278,16 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			blk_mq_handle_dev_resource(rq, list);
 			break;
+		} else if (ret == BLK_STS_ZONE_RESOURCE) {
+			/*
+			 * Move the request to zone_list and keep going through
+			 * the dispatch list to find more requests the drive can
+			 * accept.
+			 */
+			blk_mq_handle_zone_resource(rq, &zone_list);
+			if (list_empty(list))
+				break;
+			continue;
 		}
 
 		if (unlikely(ret != BLK_STS_OK)) {
@@ -1275,6 +1299,9 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		queued++;
 	} while (!list_empty(list));
 
+	if (!list_empty(&zone_list))
+		list_splice_tail_init(&zone_list, list);
+
 	hctx->dispatched[queued_to_index(queued)]++;
 
 	/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 14397b4c4b53..58d5b49fb131 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->chunk_sectors = 0;
 	lim->max_write_same_sectors = 0;
 	lim->max_write_zeroes_sectors = 0;
+	lim->max_zone_append_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->max_hw_discard_sectors = 0;
 	lim->discard_granularity = 0;
@@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_same_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
+	lim->max_zone_append_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -221,6 +223,25 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
+ * @q:  the request queue for the device
+ * @max_zone_append_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_zone_append_sectors(struct request_queue *q,
+		unsigned int max_zone_append_sectors)
+{
+	unsigned int max_sectors;
+
+	max_sectors = min(q->limits.max_hw_sectors, max_zone_append_sectors);
+	if (max_sectors)
+		max_sectors = min_not_zero(q->limits.chunk_sectors,
+					   max_sectors);
+
+	q->limits.max_zone_append_sectors = max_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
+
 /**
  * blk_queue_max_segments - set max hw segments for a request for this queue
  * @q:  the request queue for the device
@@ -470,6 +491,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 					b->max_write_same_sectors);
 	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
 					b->max_write_zeroes_sectors);
+	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
+					b->max_zone_append_sectors);
 	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
 
 	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..02643e149d5e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -218,6 +218,13 @@ static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page)
 		(unsigned long long)q->limits.max_write_zeroes_sectors << 9);
 }
 
+static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
+{
+	unsigned long long max_sectors = q->limits.max_zone_append_sectors;
+
+	return sprintf(page, "%llu\n", max_sectors << SECTOR_SHIFT);
+}
+
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -639,6 +646,11 @@ static struct queue_sysfs_entry queue_write_zeroes_max_entry = {
 	.show = queue_write_zeroes_max_show,
 };
 
+static struct queue_sysfs_entry queue_zone_append_max_entry = {
+	.attr = {.name = "zone_append_max_bytes", .mode = 0444 },
+	.show = queue_zone_append_max_show,
+};
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
 	.attr = {.name = "rotational", .mode = 0644 },
 	.show = queue_show_nonrot,
@@ -749,6 +761,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
+	&queue_zone_append_max_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_zoned_entry.attr,
 	&queue_nr_zones_entry.attr,
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ad97369ffabd..b9e8f55cf8c4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1690,6 +1690,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case BLK_STS_OK:
 		break;
 	case BLK_STS_RESOURCE:
+	case BLK_STS_ZONE_RESOURCE:
 		if (atomic_read(&sdev->device_busy) ||
 		    scsi_device_blocked(sdev))
 			ret = BLK_STS_DEV_RESOURCE;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..824ec2d89954 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -63,6 +63,18 @@ typedef u8 __bitwise blk_status_t;
  */
 #define BLK_STS_DEV_RESOURCE	((__force blk_status_t)13)
 
+/*
+ * BLK_STS_ZONE_RESOURCE is returned from the driver to the block layer if zone
+ * related resources are unavailable, but the driver can guarantee the queue
+ * will be rerun in the future once the resources become available again.
+ *
+ * This is different from BLK_STS_DEV_RESOURCE in that it explicitly references
+ * a zone specific resource and IO to a different zone on the same device could
+ * still be served. Examples of that are zones that are write-locked, but a read
+ * to the same zone could be served.
+ */
+#define BLK_STS_ZONE_RESOURCE	((__force blk_status_t)14)
+
 /**
  * blk_path_error - returns true if error may be path related
  * @error: status the request was completed with
@@ -296,6 +308,8 @@ enum req_opf {
 	REQ_OP_ZONE_CLOSE	= 11,
 	/* Transition a zone to full */
 	REQ_OP_ZONE_FINISH	= 12,
+	/* write data at the current zone write pointer */
+	REQ_OP_ZONE_APPEND	= 13,
 
 	/* SCSI passthrough using struct scsi_request */
 	REQ_OP_SCSI_IN		= 32,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e47888a7d80b..774947365341 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -336,6 +336,7 @@ struct queue_limits {
 	unsigned int		max_hw_discard_sectors;
 	unsigned int		max_write_same_sectors;
 	unsigned int		max_write_zeroes_sectors;
+	unsigned int		max_zone_append_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 
@@ -757,6 +758,9 @@ static inline bool rq_mergeable(struct request *rq)
 	if (req_op(rq) == REQ_OP_WRITE_ZEROES)
 		return false;
 
+	if (req_op(rq) == REQ_OP_ZONE_APPEND)
+		return false;
+
 	if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
 		return false;
 	if (rq->rq_flags & RQF_NOMERGE_FLAGS)
@@ -1091,6 +1095,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
+extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
+		unsigned int max_zone_append_sectors);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,
 				       unsigned int alignment);
@@ -1303,6 +1309,11 @@ static inline unsigned int queue_max_segment_size(const struct request_queue *q)
 	return q->limits.max_segment_size;
 }
 
+static inline unsigned int queue_max_zone_append_sectors(const struct request_queue *q)
+{
+	return q->limits.max_zone_append_sectors;
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	int retval = 512;
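
A minimal C sketch of how a zone append could be issued and completed
under the rules above; the sketch_* names are hypothetical, and the bio
setup is kept to the minimum the patch requires:

#include <linux/bio.h>
#include <linux/blkdev.h>

/* On success req_bio_endio() has rewritten bi_sector to the position
 * the device actually wrote, so the completion handler can read it. */
static void sketch_zone_append_end_io(struct bio *bio)
{
	sector_t written = bio->bi_iter.bi_sector;

	(void)written;		/* record the device-chosen location */
	bio_put(bio);
}

static void sketch_submit_zone_append(struct block_device *bdev,
				      sector_t zone_start_sector,
				      struct page *page, unsigned int len)
{
	struct bio *bio = bio_alloc(GFP_NOIO, 1);

	bio_set_dev(bio, bdev);
	/* must be the start of a sequential zone, not a write pointer */
	bio->bi_iter.bi_sector = zone_start_sector;
	bio->bi_opf = REQ_OP_ZONE_APPEND;
	bio->bi_end_io = sketch_zone_append_end_io;

	/* len must stay within queue_max_zone_append_sectors(), or
	 * blk_check_zone_append() fails the bio with BLK_STS_IOERR */
	if (bio_add_page(bio, page, len, 0) != len) {
		bio_put(bio);
		return;
	}

	submit_bio(bio);
}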
From patchwork Wed Apr 15 09:05:07 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490655
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Christoph Hellwig
Subject: [PATCH v6 05/11] block: introduce blk_req_zone_write_trylock
Date: Wed, 15 Apr 2020 18:05:07 +0900
Message-Id: <20200415090513.5133-6-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

Introduce blk_req_zone_write_trylock(), which either grabs the write-lock
for a sequential zone or returns false if the zone is already locked.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 block/blk-zoned.c      | 14 ++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index f87956e0dcaf..c822cfa7a102 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -82,6 +82,20 @@ bool blk_req_needs_zone_write_lock(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock);
 
+bool blk_req_zone_write_trylock(struct request *rq)
+{
+	unsigned int zno = blk_rq_zone_no(rq);
+
+	if (test_and_set_bit(zno, rq->q->seq_zones_wlock))
+		return false;
+
+	WARN_ON_ONCE(rq->rq_flags & RQF_ZONE_WRITE_LOCKED);
+	rq->rq_flags |= RQF_ZONE_WRITE_LOCKED;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(blk_req_zone_write_trylock);
+
 void __blk_req_zone_write_lock(struct request *rq)
 {
 	if (WARN_ON_ONCE(test_and_set_bit(blk_rq_zone_no(rq),
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 774947365341..0797d1e81802 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1740,6 +1740,7 @@ extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 
 #ifdef CONFIG_BLK_DEV_ZONED
 bool blk_req_needs_zone_write_lock(struct request *rq);
+bool blk_req_zone_write_trylock(struct request *rq);
 void __blk_req_zone_write_lock(struct request *rq);
 void __blk_req_zone_write_unlock(struct request *rq);
From patchwork Wed Apr 15 09:05:08 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490591
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Damien Le Moal, Johannes Thumshirn
Subject: [PATCH v6 06/11] block: Modify revalidate zones
Date: Wed, 15 Apr 2020 18:05:08 +0900
Message-Id: <20200415090513.5133-7-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

From: Damien Le Moal

Modify the interface of blk_revalidate_disk_zones() to add an optional
driver callback function that a driver can use to extend processing done
during zone revalidation. The callback, if defined, is executed with the
device request queue frozen, after all zones have been inspected.

Signed-off-by: Damien Le Moal
Signed-off-by: Johannes Thumshirn
---
 block/blk-zoned.c              | 8 +++++++-
 drivers/block/null_blk_zoned.c | 2 +-
 include/linux/blkdev.h         | 3 ++-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index c822cfa7a102..2912e964d7b2 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -471,14 +471,18 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
 /**
  * blk_revalidate_disk_zones - (re)allocate and initialize zone bitmaps
  * @disk:	Target disk
+ * @driver_cb:	LLD callback
  *
  * Helper function for low-level device drivers to (re) allocate and initialize
  * a disk request queue zone bitmaps. This functions should normally be called
  * within the disk ->revalidate method for blk-mq based drivers.  For BIO based
  * drivers only q->nr_zones needs to be updated so that the sysfs exposed value
  * is correct.
+ * If the @driver_cb callback function is not NULL, the callback is executed
+ * with the device request queue frozen after all zones have been checked.
  */
-int blk_revalidate_disk_zones(struct gendisk *disk)
+int blk_revalidate_disk_zones(struct gendisk *disk,
+			      void (*driver_cb)(struct gendisk *disk))
 {
 	struct request_queue *q = disk->queue;
 	struct blk_revalidate_zone_args args = {
@@ -512,6 +516,8 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
 		q->nr_zones = args.nr_zones;
 		swap(q->seq_zones_wlock, args.seq_zones_wlock);
 		swap(q->conv_zones_bitmap, args.conv_zones_bitmap);
+		if (driver_cb)
+			driver_cb(disk);
 		ret = 0;
 	} else {
 		pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 9e4bcdad1a80..46641df2e58e 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -73,7 +73,7 @@ int null_register_zoned_dev(struct nullb *nullb)
 	struct request_queue *q = nullb->q;
 
 	if (queue_is_mq(q))
-		return blk_revalidate_disk_zones(nullb->disk);
+		return blk_revalidate_disk_zones(nullb->disk, NULL);
 
 	blk_queue_chunk_sectors(q, nullb->dev->zone_size_sects);
 	q->nr_zones = blkdev_nr_zones(nullb->disk);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0797d1e81802..62fe9f962478 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -362,7 +362,8 @@ unsigned int blkdev_nr_zones(struct gendisk *disk);
 extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 			    sector_t sectors, sector_t nr_sectors,
 			    gfp_t gfp_mask);
-extern int blk_revalidate_disk_zones(struct gendisk *disk);
+int blk_revalidate_disk_zones(struct gendisk *disk,
+			      void (*driver_cb)(struct gendisk *disk));
 
 extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
 				     unsigned int cmd, unsigned long arg);
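
A minimal C sketch of a driver opting in to the new callback, with
hypothetical sketch_* names; null_blk above simply passes NULL:

#include <linux/blkdev.h>

/*
 * The callback runs with the request queue frozen, after all zones have
 * been inspected, so it can safely publish per-zone state sized from
 * the freshly updated q->nr_zones.
 */
static void sketch_revalidate_cb(struct gendisk *disk)
{
	/* swap in per-zone data built during revalidation */
}

static int sketch_register_zoned_disk(struct gendisk *disk)
{
	return blk_revalidate_disk_zones(disk, sketch_revalidate_cb);
}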
From patchwork Wed Apr 15 09:05:09 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490615
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Christoph Hellwig
Subject: [PATCH v6 07/11] scsi: sd_zbc: factor out sanity checks for zoned
 commands
Date: Wed, 15 Apr 2020 18:05:09 +0900
Message-Id: <20200415090513.5133-8-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>

Factor sanity checks for zoned commands from
sd_zbc_setup_zone_mgmt_cmnd(). This will help with the introduction of
an emulated ZONE_APPEND command.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 drivers/scsi/sd_zbc.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index f45c22b09726..ee156fbf3780 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -209,6 +209,26 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
 	return ret;
 }
 
+static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+
+	if (!sd_is_zoned(sdkp))
+		/* Not a zoned device */
+		return BLK_STS_IOERR;
+
+	if (sdkp->device->changed)
+		return BLK_STS_IOERR;
+
+	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
+		/* Unaligned request */
+		return BLK_STS_IOERR;
+
+	return BLK_STS_OK;
+}
+
 /**
  * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations
  *			can be RESET WRITE POINTER, OPEN, CLOSE or FINISH.
@@ -223,20 +243,14 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
 					 unsigned char op, bool all)
 {
 	struct request *rq = cmd->request;
-	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t sector = blk_rq_pos(rq);
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 	sector_t block = sectors_to_logical(sdkp->device, sector);
+	blk_status_t ret;
 
-	if (!sd_is_zoned(sdkp))
-		/* Not a zoned device */
-		return BLK_STS_IOERR;
-
-	if (sdkp->device->changed)
-		return BLK_STS_IOERR;
-
-	if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
-		/* Unaligned request */
-		return BLK_STS_IOERR;
+	ret = sd_zbc_cmnd_checks(cmd);
+	if (ret != BLK_STS_OK)
+		return ret;
 
 	cmd->cmd_len = 16;
 	memset(cmd->cmnd, 0, cmd->cmd_len);
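
A minimal C sketch of the reuse this refactoring enables, as the commit
message anticipates; sketch_setup_zone_append() is hypothetical and would
live next to sd_zbc_cmnd_checks() in sd_zbc.c, since the helper is static:

static blk_status_t sketch_setup_zone_append(struct scsi_cmnd *cmd)
{
	blk_status_t ret;

	/* zoned device? medium unchanged? zone-aligned request? */
	ret = sd_zbc_cmnd_checks(cmd);
	if (ret != BLK_STS_OK)
		return ret;

	/* ... set up a WRITE(16) at the zone's write pointer ... */
	return BLK_STS_OK;
}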
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v6 08/11] scsi: sd_zbc: emulate ZONE_APPEND commands
Date: Wed, 15 Apr 2020 18:05:10 +0900
Message-Id: <20200415090513.5133-9-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command
with a start LBA set to the target zone write pointer position.

In order to always know the write pointer position of a sequential
write zone, the write pointer of all zones is tracked using an array of
32-bit zone write pointer offsets attached to the scsi disk structure.
Each entry of the array indicates a zone write pointer position
relative to the zone start sector.

The write pointer offsets are maintained in sync with the device as
follows:

1) the write pointer offset of a zone is reset to 0 when a
   REQ_OP_ZONE_RESET command completes.
2) the write pointer offset of a zone is set to the zone size when a
   REQ_OP_ZONE_FINISH command completes.
3) the write pointer offset of a zone is incremented by the number of
   512B sectors written when a write, write same or a zone append
   command completes.
4) the write pointer offset of all zones is reset to 0 when a
   REQ_OP_ZONE_RESET_ALL command completes.

Since the block layer does not write lock zones for zone append
commands, to ensure a sequential ordering of the regular write commands
used for the emulation, the target zone of a zone append command is
locked when sd_zbc_prepare_zone_append() is called from
sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained
(e.g.
a zone append is in-flight or a regular write has already locked the
zone), the zone append command dispatching is delayed by returning
BLK_STS_ZONE_RESOURCE.

To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL
requests, use a spinlock to protect accesses and modifications of the
zone write pointer offsets. This spinlock is initialized from sd_probe()
using the new function sd_zbc_init_disk().
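The four bookkeeping rules above are easy to model outside the kernel.
The following stand-alone C sketch (names and types are illustrative,
not the kernel's) applies the same updates to a per-zone offset cache
on command completion:

/* Build: cc -o wp_cache wp_cache.c -- stand-alone model of rules 1) to 4). */
#include <stdio.h>
#include <string.h>

enum op { OP_WRITE, OP_ZONE_APPEND, OP_ZONE_RESET, OP_ZONE_FINISH,
          OP_ZONE_RESET_ALL };

#define NR_ZONES        4
#define ZONE_SECTORS    128ull                  /* zone size in 512B sectors */

static unsigned int wp_ofst[NR_ZONES];          /* write pointer offset cache */

/* Apply the commit-message rules when a command completes successfully. */
static void complete(enum op op, unsigned int zno, unsigned int sectors)
{
        switch (op) {
        case OP_WRITE:
        case OP_ZONE_APPEND:    /* rule 3): advance by the sectors written */
                if (wp_ofst[zno] < ZONE_SECTORS)
                        wp_ofst[zno] += sectors;
                break;
        case OP_ZONE_RESET:     /* rule 1): back to the zone start */
                wp_ofst[zno] = 0;
                break;
        case OP_ZONE_FINISH:    /* rule 2): write pointer at the zone end */
                wp_ofst[zno] = ZONE_SECTORS;
                break;
        case OP_ZONE_RESET_ALL: /* rule 4): all zones back to their start */
                memset(wp_ofst, 0, sizeof(wp_ofst));
                break;
        }
}

int main(void)
{
        complete(OP_ZONE_APPEND, 1, 8);         /* zone 1 advances to 8 */
        complete(OP_WRITE, 1, 8);               /* ... and to 16 */
        complete(OP_ZONE_FINISH, 2, 0);         /* zone 2 jumps to 128 */
        printf("zone 1: %u, zone 2: %u\n", wp_ofst[1], wp_ofst[2]);
        complete(OP_ZONE_RESET_ALL, 0, 0);
        printf("after reset all: %u %u\n", wp_ofst[1], wp_ofst[2]);
        return 0;
}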
Co-developed-by: Damien Le Moal
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
---
 drivers/scsi/sd.c     |  24 +++-
 drivers/scsi/sd.h     |  43 +++++-
 drivers/scsi/sd_zbc.c | 323 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 370 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a793cb08d025..66ff5f04c0ce 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1206,6 +1206,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
                }
        }
 
+       if (req_op(rq) == REQ_OP_ZONE_APPEND) {
+               ret = sd_zbc_prepare_zone_append(cmd, &lba, nr_blocks);
+               if (ret)
+                       return ret;
+       }
+
        fua = rq->cmd_flags & REQ_FUA ? 0x8 : 0;
        dix = scsi_prot_sg_count(cmd);
        dif = scsi_host_dif_capable(cmd->device->host, sdkp->protection_type);
@@ -1287,6 +1293,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd)
                return sd_setup_flush_cmnd(cmd);
        case REQ_OP_READ:
        case REQ_OP_WRITE:
+       case REQ_OP_ZONE_APPEND:
                return sd_setup_read_write_cmnd(cmd);
        case REQ_OP_ZONE_RESET:
                return sd_zbc_setup_zone_mgmt_cmnd(cmd, ZO_RESET_WRITE_POINTER,
@@ -2055,7 +2062,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 
  out:
        if (sd_is_zoned(sdkp))
-               sd_zbc_complete(SCpnt, good_bytes, &sshdr);
+               good_bytes = sd_zbc_complete(SCpnt, good_bytes, &sshdr);
 
        SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
                                           "sd_done: completed %d of %d bytes\n",
@@ -3372,6 +3379,10 @@ static int sd_probe(struct device *dev)
        sdkp->first_scan = 1;
        sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
 
+       error = sd_zbc_init_disk(sdkp);
+       if (error)
+               goto out_free_index;
+
        sd_revalidate_disk(gd);
 
        gd->flags = GENHD_FL_EXT_DEVT;
@@ -3409,6 +3420,7 @@ static int sd_probe(struct device *dev)
  out_put:
        put_disk(gd);
  out_free:
+       sd_zbc_release_disk(sdkp);
        kfree(sdkp);
  out:
        scsi_autopm_put_device(sdp);
@@ -3485,6 +3497,8 @@ static void scsi_disk_release(struct device *dev)
        put_disk(disk);
        put_device(&sdkp->device->sdev_gendev);
 
+       sd_zbc_release_disk(sdkp);
+
        kfree(sdkp);
 }
 
@@ -3665,19 +3679,19 @@ static int __init init_sd(void)
        if (!sd_page_pool) {
                printk(KERN_ERR "sd: can't init discard page pool\n");
                err = -ENOMEM;
-               goto err_out_ppool;
+               goto err_out_cdb_pool;
        }
 
        err = scsi_register_driver(&sd_template.gendrv);
        if (err)
-               goto err_out_driver;
+               goto err_out_ppool;
 
        return 0;
 
-err_out_driver:
+err_out_ppool:
        mempool_destroy(sd_page_pool);
 
-err_out_ppool:
+err_out_cdb_pool:
        mempool_destroy(sd_cdb_pool);
 
 err_out_cache:
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 50fff0bf8c8e..6009311105ef 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -79,6 +79,12 @@ struct scsi_disk {
        u32             zones_optimal_open;
        u32             zones_optimal_nonseq;
        u32             zones_max_open;
+       u32             *zones_wp_ofst;
+       spinlock_t      zones_wp_ofst_lock;
+       u32             *rev_wp_ofst;
+       struct mutex    rev_mutex;
+       struct work_struct zone_wp_ofst_work;
+       char            *zone_wp_update_buf;
 #endif
        atomic_t        openers;
        sector_t        capacity;       /* size in logical blocks */
@@ -207,17 +213,35 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
 
 #ifdef CONFIG_BLK_DEV_ZONED
 
+int sd_zbc_init_disk(struct scsi_disk *sdkp);
+void sd_zbc_release_disk(struct scsi_disk *sdkp);
 extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
 extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
 blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
                                         unsigned char op, bool all);
-extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
-                           struct scsi_sense_hdr *sshdr);
+unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
+                            struct scsi_sense_hdr *sshdr);
 int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
                        unsigned int nr_zones, report_zones_cb cb, void *data);
 
+blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba,
+                                       unsigned int nr_blocks);
+
 #else /* CONFIG_BLK_DEV_ZONED */
 
+static inline int sd_zbc_init(void)
+{
+       return 0;
+}
+
+static inline int sd_zbc_init_disk(struct scsi_disk *sdkp)
+{
+       return 0;
+}
+
+static inline void sd_zbc_exit(void) {}
+static inline void sd_zbc_release_disk(struct scsi_disk *sdkp) {}
+
 static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
                                    unsigned char *buf)
 {
@@ -233,9 +257,18 @@ static inline blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
        return BLK_STS_TARGET;
 }
 
-static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
-                                  unsigned int good_bytes,
-                                  struct scsi_sense_hdr *sshdr) {}
+static inline unsigned int sd_zbc_complete(struct scsi_cmnd *cmd,
+               unsigned int good_bytes, struct scsi_sense_hdr *sshdr)
+{
+       return 0;
+}
+
+static inline blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd,
+                                                     sector_t *lba,
+                                                     unsigned int nr_blocks)
+{
+       return BLK_STS_TARGET;
+}
 
 #define sd_zbc_report_zones NULL
 
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ee156fbf3780..91002720c66c 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -19,11 +20,36 @@
 
 #include "sd.h"
 
+static unsigned int sd_zbc_get_zone_wp_ofst(struct blk_zone *zone)
+{
+       if (zone->type == ZBC_ZONE_TYPE_CONV)
+               return 0;
+
+       switch (zone->cond) {
+       case BLK_ZONE_COND_IMP_OPEN:
+       case BLK_ZONE_COND_EXP_OPEN:
+       case BLK_ZONE_COND_CLOSED:
+               return zone->wp - zone->start;
+       case BLK_ZONE_COND_FULL:
+               return zone->len;
+       case BLK_ZONE_COND_EMPTY:
+       case BLK_ZONE_COND_OFFLINE:
+       case BLK_ZONE_COND_READONLY:
+       default:
+               /*
+                * Offline and read-only zones do not have a valid
+                * write pointer. Use 0 as for an empty zone.
+                */
+               return 0;
+       }
+}
+
 static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
                               unsigned int idx, report_zones_cb cb, void *data)
 {
        struct scsi_device *sdp = sdkp->device;
        struct blk_zone zone = { 0 };
+       int ret;
 
        zone.type = buf[0] & 0x0f;
        zone.cond = (buf[1] >> 4) & 0xf;
@@ -39,7 +65,14 @@ static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
            zone.cond == ZBC_ZONE_COND_FULL)
                zone.wp = zone.start + zone.len;
 
-       return cb(&zone, idx, data);
+       ret = cb(&zone, idx, data);
+       if (ret)
+               return ret;
+
+       if (sdkp->rev_wp_ofst)
+               sdkp->rev_wp_ofst[idx] = sd_zbc_get_zone_wp_ofst(&zone);
+
+       return 0;
 }
 
 /**
@@ -229,6 +262,116 @@ static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd)
        return BLK_STS_OK;
 }
 
+#define SD_ZBC_INVALID_WP_OFST (~0u)
+#define SD_ZBC_UPDATING_WP_OFST        (SD_ZBC_INVALID_WP_OFST - 1)
+
+static int sd_zbc_update_wp_ofst_cb(struct blk_zone *zone, unsigned int idx,
+                                   void *data)
+{
+       struct scsi_disk *sdkp = data;
+
+       lockdep_assert_held(&sdkp->zones_wp_ofst_lock);
+
+       sdkp->zones_wp_ofst[idx] = sd_zbc_get_zone_wp_ofst(zone);
+
+       return 0;
+}
+
+static void sd_zbc_update_wp_ofst_workfn(struct work_struct *work)
+{
+       struct scsi_disk *sdkp;
+       unsigned int zno;
+       int ret;
+
+       sdkp = container_of(work, struct scsi_disk, zone_wp_ofst_work);
+
+       spin_lock_bh(&sdkp->zones_wp_ofst_lock);
+       for (zno = 0; zno < sdkp->nr_zones; zno++) {
+               if (sdkp->zones_wp_ofst[zno] != SD_ZBC_UPDATING_WP_OFST)
+                       continue;
+
+               spin_unlock_bh(&sdkp->zones_wp_ofst_lock);
+               ret = sd_zbc_do_report_zones(sdkp, sdkp->zone_wp_update_buf,
+                                            SD_BUF_SIZE,
+                                            zno * sdkp->zone_blocks, true);
+               spin_lock_bh(&sdkp->zones_wp_ofst_lock);
+               if (!ret)
+                       sd_zbc_parse_report(sdkp, sdkp->zone_wp_update_buf + 64,
+                                           zno, sd_zbc_update_wp_ofst_cb,
+                                           sdkp);
+       }
+       spin_unlock_bh(&sdkp->zones_wp_ofst_lock);
+
+       scsi_device_put(sdkp->device);
+}
+
+/**
+ * sd_zbc_prepare_zone_append() - Prepare an emulated ZONE_APPEND command.
+ * @cmd: the command to setup
+ * @lba: the LBA to patch
+ * @nr_blocks: the number of LBAs to be written
+ *
+ * Called from sd_setup_read_write_cmnd() for REQ_OP_ZONE_APPEND.
+ * @sd_zbc_prepare_zone_append() handles the necessary zone write locking and
+ * patching of the lba for an emulated ZONE_APPEND command.
+ *
+ * In case the cached write pointer offset is %SD_ZBC_INVALID_WP_OFST it will
+ * schedule a REPORT ZONES command and return BLK_STS_IOERR.
+ */
+blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba,
+                                       unsigned int nr_blocks)
+{
+       struct request *rq = cmd->request;
+       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+       unsigned int wp_ofst, zno = blk_rq_zone_no(rq);
+       blk_status_t ret;
+
+       ret = sd_zbc_cmnd_checks(cmd);
+       if (ret != BLK_STS_OK)
+               return ret;
+
+       if (!blk_rq_zone_is_seq(rq))
+               return BLK_STS_IOERR;
+
+       /* Unlock of the write lock will happen in sd_zbc_complete() */
+       if (!blk_req_zone_write_trylock(rq))
+               return BLK_STS_ZONE_RESOURCE;
+
+       spin_lock_bh(&sdkp->zones_wp_ofst_lock);
+       wp_ofst = sdkp->zones_wp_ofst[zno];
+       switch (wp_ofst) {
+       case SD_ZBC_INVALID_WP_OFST:
+               /*
+                * We are about to schedule work to update a zone write pointer
+                * offset, which will cause the zone append command to be
+                * requeued. So make sure that the scsi device does not go away
+                * while the work is being processed.
+                */
+               if (scsi_device_get(sdkp->device)) {
+                       ret = BLK_STS_IOERR;
+                       break;
+               }
+               sdkp->zones_wp_ofst[zno] = SD_ZBC_UPDATING_WP_OFST;
+               schedule_work(&sdkp->zone_wp_ofst_work);
+               /*FALLTHRU*/
+       case SD_ZBC_UPDATING_WP_OFST:
+               ret = BLK_STS_DEV_RESOURCE;
+               break;
+       default:
+               wp_ofst = sectors_to_logical(sdkp->device, wp_ofst);
+               if (wp_ofst + nr_blocks > sdkp->zone_blocks) {
+                       ret = BLK_STS_IOERR;
+                       break;
+               }
+
+               *lba += wp_ofst;
+       }
+       spin_unlock_bh(&sdkp->zones_wp_ofst_lock);
+       if (ret)
+               blk_req_zone_write_unlock(rq);
+       return ret;
+}
+
 /**
  * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations
  * can be RESET WRITE POINTER, OPEN, CLOSE or FINISH.
@@ -269,16 +412,104 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
        return BLK_STS_OK;
 }
 
+static bool sd_zbc_need_zone_wp_update(struct request *rq)
+{
+       switch (req_op(rq)) {
+       case REQ_OP_ZONE_APPEND:
+       case REQ_OP_ZONE_FINISH:
+       case REQ_OP_ZONE_RESET:
+       case REQ_OP_ZONE_RESET_ALL:
+               return true;
+       case REQ_OP_WRITE:
+       case REQ_OP_WRITE_ZEROES:
+       case REQ_OP_WRITE_SAME:
+               return blk_rq_zone_is_seq(rq);
+       default:
+               return false;
+       }
+}
+
+/**
+ * sd_zbc_zone_wp_update - Update cached zone write pointer upon cmd completion
+ * @cmd: Completed command
+ * @good_bytes: Command reply bytes
+ *
+ * Called from sd_zbc_complete() to handle the update of the cached zone write
+ * pointer value in case an update is needed.
+ */
+static unsigned int sd_zbc_zone_wp_update(struct scsi_cmnd *cmd,
+                                         unsigned int good_bytes)
+{
+       int result = cmd->result;
+       struct request *rq = cmd->request;
+       struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+       unsigned int zno = blk_rq_zone_no(rq);
+       enum req_opf op = req_op(rq);
+
+       /*
+        * If we got an error for a command that needs updating the write
+        * pointer offset cache, we must mark the zone wp offset entry as
+        * invalid to force an update from disk the next time a zone append
+        * command is issued.
+        */
+       spin_lock_bh(&sdkp->zones_wp_ofst_lock);
+
+       if (result && op != REQ_OP_ZONE_RESET_ALL) {
+               if (op == REQ_OP_ZONE_APPEND) {
+                       /* Force complete completion (no retry) */
+                       good_bytes = 0;
+                       scsi_set_resid(cmd, blk_rq_bytes(rq));
+               }
+
+               /*
+                * Force an update of the zone write pointer offset on
+                * the next zone append access.
+                */
+               if (sdkp->zones_wp_ofst[zno] != SD_ZBC_UPDATING_WP_OFST)
+                       sdkp->zones_wp_ofst[zno] = SD_ZBC_INVALID_WP_OFST;
+               goto unlock_wp_ofst;
+       }
+
+       switch (op) {
+       case REQ_OP_ZONE_APPEND:
+               rq->__sector += sdkp->zones_wp_ofst[zno];
+               /* fallthrough */
+       case REQ_OP_WRITE_ZEROES:
+       case REQ_OP_WRITE_SAME:
+       case REQ_OP_WRITE:
+               if (sdkp->zones_wp_ofst[zno] < sd_zbc_zone_sectors(sdkp))
+                       sdkp->zones_wp_ofst[zno] += good_bytes >> SECTOR_SHIFT;
+               break;
+       case REQ_OP_ZONE_RESET:
+               sdkp->zones_wp_ofst[zno] = 0;
+               break;
+       case REQ_OP_ZONE_FINISH:
+               sdkp->zones_wp_ofst[zno] = sd_zbc_zone_sectors(sdkp);
+               break;
+       case REQ_OP_ZONE_RESET_ALL:
+               memset(sdkp->zones_wp_ofst, 0,
+                      sdkp->nr_zones * sizeof(unsigned int));
+               break;
+       default:
+               break;
+       }
+
+unlock_wp_ofst:
+       spin_unlock_bh(&sdkp->zones_wp_ofst_lock);
+
+       return good_bytes;
+}
+
 /**
  * sd_zbc_complete - ZBC command post processing.
  * @cmd: Completed command
  * @good_bytes: Command reply bytes
  * @sshdr: command sense header
  *
- * Called from sd_done(). Process report zones reply and handle reset zone
- * and write commands errors.
+ * Called from sd_done() to handle zone commands errors and updates to the
+ * device queue zone write pointer offset cache.
  */
-void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
+unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
                     struct scsi_sense_hdr *sshdr)
 {
        int result = cmd->result;
@@ -294,7 +525,13 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
                 * so be quiet about the error.
                 */
                rq->rq_flags |= RQF_QUIET;
-       }
+       } else if (sd_zbc_need_zone_wp_update(rq))
+               good_bytes = sd_zbc_zone_wp_update(cmd, good_bytes);
+
+       if (req_op(rq) == REQ_OP_ZONE_APPEND)
+               blk_req_zone_write_unlock(rq);
+
+       return good_bytes;
 }
 
 /**
@@ -396,11 +633,48 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf,
        return 0;
 }
 
+static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
+{
+       struct scsi_disk *sdkp = scsi_disk(disk);
+
+       swap(sdkp->zones_wp_ofst, sdkp->rev_wp_ofst);
+}
+
+static int sd_zbc_revalidate_zones(struct scsi_disk *sdkp,
+                                  unsigned int nr_zones)
+{
+       int ret;
+
+       /*
+        * Make sure revalidate zones are serialized to ensure exclusive
+        * updates of the temporary array sdkp->rev_wp_ofst.
+        */
+       mutex_lock(&sdkp->rev_mutex);
+
+       sdkp->rev_wp_ofst = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
+       if (!sdkp->rev_wp_ofst) {
+               ret = -ENOMEM;
+               goto unlock;
+       }
+
+       ret = blk_revalidate_disk_zones(sdkp->disk, sd_zbc_revalidate_zones_cb);
+
+       kvfree(sdkp->rev_wp_ofst);
+       sdkp->rev_wp_ofst = NULL;
+
+unlock:
+       mutex_unlock(&sdkp->rev_mutex);
+
+       return ret;
+}
+
 int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 {
        struct gendisk *disk = sdkp->disk;
+       struct request_queue *q = disk->queue;
        unsigned int nr_zones;
        u32 zone_blocks = 0;
+       u32 max_append;
        int ret;
 
        if (!sd_is_zoned(sdkp))
@@ -420,10 +694,14 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
        if (ret != 0)
                goto err;
 
+       max_append = min_t(u32, logical_to_sectors(sdkp->device, zone_blocks),
+                          q->limits.max_segments << (PAGE_SHIFT - 9));
+       max_append = min_t(u32, max_append, queue_max_hw_sectors(q));
+
        /* The drive satisfies the kernel restrictions: set it up */
-       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, sdkp->disk->queue);
-       blk_queue_required_elevator_features(sdkp->disk->queue,
-                                            ELEVATOR_F_ZBD_SEQ_WRITE);
+       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
+       blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
+       blk_queue_max_zone_append_sectors(q, max_append);
        nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
 
        /* READ16/WRITE16 is mandatory for ZBC disks */
@@ -443,8 +721,8 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 
        if (sdkp->zone_blocks != zone_blocks ||
            sdkp->nr_zones != nr_zones ||
-           disk->queue->nr_zones != nr_zones) {
-               ret = blk_revalidate_disk_zones(disk);
+           q->nr_zones != nr_zones) {
+               ret = sd_zbc_revalidate_zones(sdkp, nr_zones);
                if (ret != 0)
                        goto err;
                sdkp->zone_blocks = zone_blocks;
@@ -475,3 +753,28 @@ void sd_zbc_print_zones(struct scsi_disk *sdkp)
                          sdkp->nr_zones,
                          sdkp->zone_blocks);
 }
+
+int sd_zbc_init_disk(struct scsi_disk *sdkp)
+{
+       if (!sd_is_zoned(sdkp))
+               return 0;
+
+       sdkp->zones_wp_ofst = NULL;
+       spin_lock_init(&sdkp->zones_wp_ofst_lock);
+       sdkp->rev_wp_ofst = NULL;
+       mutex_init(&sdkp->rev_mutex);
+       INIT_WORK(&sdkp->zone_wp_ofst_work, sd_zbc_update_wp_ofst_workfn);
+       sdkp->zone_wp_update_buf = kzalloc(SD_BUF_SIZE, GFP_KERNEL);
+       if (!sdkp->zone_wp_update_buf)
+               return -ENOMEM;
+
+       return 0;
+}
+
+void sd_zbc_release_disk(struct scsi_disk *sdkp)
+{
+       kvfree(sdkp->zones_wp_ofst);
+       sdkp->zones_wp_ofst = NULL;
+       kfree(sdkp->zone_wp_update_buf);
+       sdkp->zone_wp_update_buf = NULL;
+}
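To summarize the dispatch side of the patch above: a cached write pointer
offset is either valid, invalid (a REPORT ZONES refresh must be scheduled),
or updating (a refresh is in flight), and only a valid offset lets a zone
append be turned into a positioned write. A stand-alone C sketch of that
state machine, with the work queue reduced to a flag and a single zone:

/* Build: cc -o append_prep append_prep.c -- models the wp-offset states. */
#include <stdio.h>
#include <stdbool.h>

#define INVALID_WP_OFST         (~0u)
#define UPDATING_WP_OFST        (INVALID_WP_OFST - 1)

enum status { STS_OK, STS_IOERR, STS_DEV_RESOURCE };

static unsigned int wp_ofst = INVALID_WP_OFST;  /* one zone, for brevity */
static bool report_zones_scheduled;

/* Decide what to do with an append to this zone, as in the patch. */
static enum status prepare_zone_append(unsigned long long *lba)
{
        switch (wp_ofst) {
        case INVALID_WP_OFST:
                /* Schedule a REPORT ZONES refresh and requeue the command. */
                wp_ofst = UPDATING_WP_OFST;
                report_zones_scheduled = true;
                /* fallthrough */
        case UPDATING_WP_OFST:
                return STS_DEV_RESOURCE;
        default:
                *lba += wp_ofst;        /* patch the LBA: write at the wp */
                return STS_OK;
        }
}

int main(void)
{
        unsigned long long lba = 0;
        enum status sts = prepare_zone_append(&lba);

        printf("first try: %d, refresh scheduled: %d\n", sts,
               report_zones_scheduled);

        wp_ofst = 42;   /* pretend the refresh completed */
        sts = prepare_zone_append(&lba);
        printf("second try: %d, lba patched to %llu\n", sts, lba);
        return 0;
}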
From patchwork Wed Apr 15 09:05:11 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490611

From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Damien Le Moal, Christoph Hellwig,
 Johannes Thumshirn
Subject: [PATCH v6 09/11] null_blk: Support REQ_OP_ZONE_APPEND
Date: Wed, 15 Apr 2020 18:05:11 +0900
Message-Id: <20200415090513.5133-10-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

From: Damien Le Moal

Support REQ_OP_ZONE_APPEND requests for null_blk devices with zoned
mode enabled. Use the internally tracked zone write pointer position
as the actual write position and return it using the command request
__sector field in the case of an mq device and using the command BIO
sector in the case of a BIO device.
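The mechanism is easy to see in isolation in the following stand-alone C
model: for an append, the caller's sector is ignored, the data is "written"
at the zone's current write pointer, and the chosen position is handed back
through a pointer argument, standing in for the request __sector / BIO
bi_sector update in the patch:

/* Build: cc -o zone_append zone_append.c -- toy model of the append path. */
#include <stdio.h>

struct zone {
        unsigned long long start, wp, len;
};

/* Returns 0 on success, -1 on a misaligned regular write. */
static int zone_write(struct zone *z, unsigned long long *sector,
                      unsigned int nr_sectors, int append)
{
        if (append)
                *sector = z->wp;        /* the device picks the position... */
        else if (*sector != z->wp)
                return -1;              /* ...regular writes must match it */

        z->wp += nr_sectors;            /* advance; *sector tells the caller
                                         * where the data actually landed */
        return 0;
}

int main(void)
{
        struct zone z = { .start = 0, .wp = 0, .len = 128 };
        unsigned long long sector = 999;        /* ignored for an append */

        if (!zone_write(&z, &sector, 8, 1))
                printf("append landed at sector %llu, wp now %llu\n",
                       sector, z.wp);
        return 0;
}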
Reviewed-by: Christoph Hellwig
Signed-off-by: Damien Le Moal
Signed-off-by: Johannes Thumshirn
---
 drivers/block/null_blk_zoned.c | 39 +++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 46641df2e58e..5c70e0c7e862 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -70,13 +70,22 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 
 int null_register_zoned_dev(struct nullb *nullb)
 {
+       struct nullb_device *dev = nullb->dev;
        struct request_queue *q = nullb->q;
 
-       if (queue_is_mq(q))
-               return blk_revalidate_disk_zones(nullb->disk, NULL);
+       if (queue_is_mq(q)) {
+               int ret = blk_revalidate_disk_zones(nullb->disk, NULL);
+
+               if (ret)
+                       return ret;
+       } else {
+               blk_queue_chunk_sectors(q, dev->zone_size_sects);
+               q->nr_zones = blkdev_nr_zones(nullb->disk);
+       }
 
-       blk_queue_chunk_sectors(q, nullb->dev->zone_size_sects);
-       q->nr_zones = blkdev_nr_zones(nullb->disk);
+       blk_queue_max_zone_append_sectors(q,
+                       min_t(sector_t, q->limits.max_hw_sectors,
+                             dev->zone_size_sects));
 
        return 0;
 }
@@ -138,7 +147,7 @@ size_t null_zone_valid_read_len(struct nullb *nullb,
 }
 
 static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
-                                   unsigned int nr_sectors)
+                                   unsigned int nr_sectors, bool append)
 {
        struct nullb_device *dev = cmd->nq->dev;
        unsigned int zno = null_zone_no(dev, sector);
@@ -158,9 +167,21 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
        case BLK_ZONE_COND_IMP_OPEN:
        case BLK_ZONE_COND_EXP_OPEN:
        case BLK_ZONE_COND_CLOSED:
-               /* Writes must be at the write pointer position */
-               if (sector != zone->wp)
+               /*
+                * Regular writes must be at the write pointer position.
+                * Zone append writes are automatically issued at the write
+                * pointer and the position returned using the request or BIO
+                * sector.
+                */
+               if (append) {
+                       sector = zone->wp;
+                       if (cmd->bio)
+                               cmd->bio->bi_iter.bi_sector = sector;
+                       else
+                               cmd->rq->__sector = sector;
+               } else if (sector != zone->wp) {
                        return BLK_STS_IOERR;
+               }
 
                if (zone->cond != BLK_ZONE_COND_EXP_OPEN)
                        zone->cond = BLK_ZONE_COND_IMP_OPEN;
@@ -242,7 +263,9 @@ blk_status_t null_process_zoned_cmd(struct nullb_cmd *cmd, enum req_opf op,
 {
        switch (op) {
        case REQ_OP_WRITE:
-               return null_zone_write(cmd, sector, nr_sectors);
+               return null_zone_write(cmd, sector, nr_sectors, false);
+       case REQ_OP_ZONE_APPEND:
+               return null_zone_write(cmd, sector, nr_sectors, true);
        case REQ_OP_ZONE_RESET:
        case REQ_OP_ZONE_RESET_ALL:
        case REQ_OP_ZONE_OPEN:
From patchwork Wed Apr 15 09:05:12 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490605

From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v6 10/11] block: export bio_release_pages and
 bio_iov_iter_get_pages
Date: Wed, 15 Apr 2020 18:05:12 +0900
Message-Id: <20200415090513.5133-11-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

Export bio_release_pages and bio_iov_iter_get_pages, so they can be
used from modular code.

Signed-off-by: Johannes Thumshirn
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index ba10076ced9c..6ece2d3019b0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -942,6 +942,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
                        put_page(bvec->bv_page);
        }
 }
+EXPORT_SYMBOL_GPL(bio_release_pages);
 
 static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1104,6 +1105,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
                bio_set_flag(bio, BIO_NO_PAGE_REF);
        return bio->bi_vcnt ? 0 : ret;
 }
+EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
 static void submit_bio_wait_endio(struct bio *bio)
 {
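For context on why the exports are needed, consider a filesystem built as a
module calling one of these helpers (zonefs, built as a module, does exactly
this in the next patch). The sketch below is hypothetical and not part of
the series; it only illustrates the linkage requirement:

/* Sketch of a module-side caller; illustrative only. */
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/uio.h>

static int demo_fill_bio(struct bio *bio, struct iov_iter *iter)
{
        /*
         * Without EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages) this call
         * would fail to resolve when the module is loaded.
         */
        return bio_iov_iter_get_pages(bio, iter);
}

MODULE_LICENSE("GPL");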
From patchwork Wed Apr 15 09:05:13 2020
X-Patchwork-Submitter: Johannes Thumshirn
X-Patchwork-Id: 11490603

From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
 linux-scsi@vger.kernel.org, "Martin K. Petersen",
 linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v6 11/11] zonefs: use REQ_OP_ZONE_APPEND for sync DIO
Date: Wed, 15 Apr 2020 18:05:13 +0900
Message-Id: <20200415090513.5133-12-johannes.thumshirn@wdc.com>
In-Reply-To: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
References: <20200415090513.5133-1-johannes.thumshirn@wdc.com>
List-ID: linux-fsdevel@vger.kernel.org

Synchronous direct I/O to a sequential write only zone can be issued
using the new REQ_OP_ZONE_APPEND request operation. As dispatching
multiple BIOs can potentially result in reordering, we cannot support
asynchronous IO via this interface.

We also can only dispatch up to queue_max_zone_append_sectors() via the
new zone-append method and have to return a short write back to
user-space in case an IO larger than queue_max_zone_append_sectors()
has been issued.
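The short-write behavior described above boils down to truncating the iov to
the device's maximum zone append size before building the BIO. A stand-alone
sketch of that capping, with hypothetical numbers:

/* Build: cc -o short_write short_write.c -- models the zonefs capping. */
#include <stdio.h>

#define SECTOR_SHIFT 9

/* Cap an append write at max_append_sectors, aligned down to blocksize. */
static unsigned long long cap_append(unsigned long long count,
                                     unsigned int max_append_sectors,
                                     unsigned int blocksize)
{
        unsigned long long max = (unsigned long long)max_append_sectors
                                        << SECTOR_SHIFT;

        max -= max % blocksize;                 /* ALIGN_DOWN to fs blocks */
        return count < max ? count : max;       /* iov_iter_truncate() */
}

int main(void)
{
        /* e.g. 254-sector limit, 4 KiB blocks: a 1 MiB write comes back short */
        unsigned long long done = cap_append(1024 * 1024, 254, 4096);

        printf("submitted %llu bytes of 1048576 -> short write\n", done);
        return 0;
}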
Signed-off-by: Johannes Thumshirn
---
 fs/zonefs/super.c | 80 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 3ce9829a6936..0bf7009f50a2 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #include "zonefs.h"
 
@@ -596,6 +597,61 @@ static const struct iomap_dio_ops zonefs_write_dio_ops = {
        .end_io                 = zonefs_file_write_dio_end_io,
 };
 
+static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
+{
+       struct inode *inode = file_inode(iocb->ki_filp);
+       struct zonefs_inode_info *zi = ZONEFS_I(inode);
+       struct block_device *bdev = inode->i_sb->s_bdev;
+       unsigned int max;
+       struct bio *bio;
+       ssize_t size;
+       int nr_pages;
+       ssize_t ret;
+
+       nr_pages = iov_iter_npages(from, BIO_MAX_PAGES);
+       if (!nr_pages)
+               return 0;
+
+       max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
+       max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
+       iov_iter_truncate(from, max);
+
+       bio = bio_alloc_bioset(GFP_NOFS, nr_pages, &fs_bio_set);
+       if (!bio)
+               return -ENOMEM;
+
+       bio_set_dev(bio, bdev);
+       bio->bi_iter.bi_sector = zi->i_zsector;
+       bio->bi_write_hint = iocb->ki_hint;
+       bio->bi_ioprio = iocb->ki_ioprio;
+       bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+       if (iocb->ki_flags & IOCB_DSYNC)
+               bio->bi_opf |= REQ_FUA;
+
+       ret = bio_iov_iter_get_pages(bio, from);
+       if (unlikely(ret)) {
+               bio_io_error(bio);
+               return ret;
+       }
+       size = bio->bi_iter.bi_size;
+       task_io_account_write(ret);
+
+       if (iocb->ki_flags & IOCB_HIPRI)
+               bio_set_polled(bio, iocb);
+
+       ret = submit_bio_wait(bio);
+
+       bio_put(bio);
+
+       zonefs_file_write_dio_end_io(iocb, size, ret, 0);
+       if (ret >= 0) {
+               iocb->ki_pos += size;
+               return size;
+       }
+
+       return ret;
+}
+
 /*
  * Handle direct writes. For sequential zone files, this is the only possible
  * write path. For these files, check that the user is issuing writes
@@ -611,6 +667,8 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
        struct inode *inode = file_inode(iocb->ki_filp);
        struct zonefs_inode_info *zi = ZONEFS_I(inode);
        struct super_block *sb = inode->i_sb;
+       bool sync = is_sync_kiocb(iocb);
+       bool append = false;
        size_t count;
        ssize_t ret;
 
@@ -619,7 +677,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
         * as this can cause write reordering (e.g. the first aio gets EAGAIN
         * on the inode lock but the second goes through but is now unaligned).
         */
-       if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) &&
+       if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !sync &&
            (iocb->ki_flags & IOCB_NOWAIT))
                return -EOPNOTSUPP;
 
@@ -643,16 +701,22 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
        }
 
        /* Enforce sequential writes (append only) in sequential zones */
-       mutex_lock(&zi->i_truncate_mutex);
-       if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && iocb->ki_pos != zi->i_wpoffset) {
+       if (zi->i_ztype == ZONEFS_ZTYPE_SEQ) {
+               mutex_lock(&zi->i_truncate_mutex);
+               if (iocb->ki_pos != zi->i_wpoffset) {
+                       mutex_unlock(&zi->i_truncate_mutex);
+                       ret = -EINVAL;
+                       goto inode_unlock;
+               }
                mutex_unlock(&zi->i_truncate_mutex);
-               ret = -EINVAL;
-               goto inode_unlock;
+               append = sync;
        }
-       mutex_unlock(&zi->i_truncate_mutex);
 
-       ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
-                          &zonefs_write_dio_ops, is_sync_kiocb(iocb));
+       if (append)
+               ret = zonefs_file_dio_append(iocb, from);
+       else
+               ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
+                                  &zonefs_write_dio_ops, sync);
        if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
            (ret > 0 || ret == -EIOCBQUEUED)) {
                if (ret > 0)