From patchwork Fri Sep 11 08:56:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11770025 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EBA56746 for ; Fri, 11 Sep 2020 08:57:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C8BB4221EE for ; Fri, 11 Sep 2020 08:57:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="h466maQg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725812AbgIKI5M (ORCPT ); Fri, 11 Sep 2020 04:57:12 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:58877 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725562AbgIKI47 (ORCPT ); Fri, 11 Sep 2020 04:56:59 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1599814619; x=1631350619; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eWTIxp/RmEqVjo//4W1v8lFGYCXvB0lWyzCjmwLgssQ=; b=h466maQg+JWtxSUEZThwWeNmByMaeyl5gCtoyV1ykrVOLAlC61mUnTbe dCZc5v3HxMcUo5lJbDXMw3ZQ8J2cwYRAChLnjwwbabSRYv0se1dUF7P0F SVKVxlReS3Ngd/AJThTvzS2LGiZ06jPmDlIbavm3HTyHP5maTG1qg+UcX k+z9+FCqFl0fzDH2qGtLc1Yi7MP3c0qsq5hBJXoiQR1MjUandDiXbbluR bzRnAhOkf/2y9KO5CTzy5S/i43hUoMaOxZ9O5J88/V/Ns30XCH36jTBdB Ky2MKDldDTi1FXd8BGXY5GFdvLwjEDfpJYPTFJk3s/qt2WTWhZYgpUxYF A==; IronPort-SDR: 8GxdoOYe1dyDbUjI8l8i9XbdwOd5M9T+Ggzp0OFYqplOPZlKlrULUUbYD6wkqMNQ7CNz5jfnGC 45MPbwYW0JwywLt7I4mSPlyBCf/5yY56Ca9GPLmXwhr7cxjBr7nh49EsiqXAtCOegD/2r094dk zL0zVO8A6yZ6ym9ty39MX6IOmeSg2cJ3qjWOpZSt+AWtQYBopHk5NYKONsvzRTNUX2MLYHprgS 
T8nKQxEEahOrIZ6CY+dNfiHPU3mUFbmyIsBX2Phr4XZPmtjdqPD55WencZAS2jlSq5uPrdkuFc mDQ= X-IronPort-AV: E=Sophos;i="5.76,414,1592841600"; d="scan'208";a="147041238" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 11 Sep 2020 16:56:58 +0800 IronPort-SDR: Uja5UuRU+VS/6mAa3B0QpVSZhliuObIUFImvrOMdYEpsFJQ2lPQQJuYHwtzzL0pcOJqRs+Kztx QM2CuFzXoUKg== Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Sep 2020 01:44:15 -0700 IronPort-SDR: 6eQNwI1ncrYWuoPGTXz9u+ED9CmIi7SIRnI23c0DiO488MA7WlokYqh8akhYN9K5qSdClWi0Hl VIpzQqQOK8Yw== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 11 Sep 2020 01:56:57 -0700 From: Johannes Thumshirn To: Damien Le Moal Cc: linux-fsdevel@vger.kernel.org, Johannes Thumshirn , Damien Le Moal Subject: [PATCH v5 1/4] zonefs: introduce helper for zone management Date: Fri, 11 Sep 2020 17:56:48 +0900 Message-Id: <20200911085651.23526-2-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200911085651.23526-1-johannes.thumshirn@wdc.com> References: <20200911085651.23526-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce a helper function for sending zone management commands to the block device. As zone management commands can change a zone write pointer position reflected in the size of the zone file, this function expects the truncate mutex to be held. 
Signed-off-by: Johannes Thumshirn Reviewed-by: Damien Le Moal --- fs/zonefs/super.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 8ec7c8f109d7..6e13a5127b01 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -24,6 +24,26 @@ #include "zonefs.h" +static inline int zonefs_zone_mgmt(struct inode *inode, + enum req_opf op) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + int ret; + + lockdep_assert_held(&zi->i_truncate_mutex); + + ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector, + zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS); + if (ret) { + zonefs_err(inode->i_sb, + "Zone management operation %s at %llu failed %d\n", + blk_op_str(op), zi->i_zsector, ret); + return ret; + } + + return 0; +} + static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned int flags, struct iomap *iomap, struct iomap *srcmap) @@ -397,14 +417,9 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize) if (isize == old_isize) goto unlock; - ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector, - zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS); - if (ret) { - zonefs_err(inode->i_sb, - "Zone management operation at %llu failed %d", - zi->i_zsector, ret); + ret = zonefs_zone_mgmt(inode, op); + if (ret) goto unlock; - } zonefs_update_stats(inode, isize); truncate_setsize(inode, isize); From patchwork Fri Sep 11 08:56:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11770027 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1BA40746 for ; Fri, 11 Sep 2020 08:57:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ED158221EE for ; Fri, 11 Sep 2020 
08:57:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="f6YU5jnh" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725791AbgIKI5U (ORCPT ); Fri, 11 Sep 2020 04:57:20 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:58876 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725550AbgIKI5A (ORCPT ); Fri, 11 Sep 2020 04:57:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1599814620; x=1631350620; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bEWC6HFgbb+Y6fu3xjZjsnc7uaaqXo/mLp9u93jWXhU=; b=f6YU5jnhA87qLzj/IX/Kt+rwIav+/619rs1Hkztfw/CeDjnSjWUjFu2t K4hjD47o7NZHmoCaVFmPR+UrnodRzb6EBpRyzuKsOu5I1rdQj05FRSuAo T13NeGkNHM9KpFq6hmKRgVoc8R+nUEfWZmWfavTNGDrfcYuRyc6uiZQrf PiFCpscUoZwPCXYkMkdXd6tEVPEzFgWLtp1vAHTpgvpYObqN5K8bUmjmA pIC01TLA0zdReUZ6svLc0nWToml+pNd4/bpFwbbRdjxLN3+wsdDRECULm VR7/1bm8Pzdw1C4+iXSqOYUT9wtzpaGw9BEUNvIYzOjHh9TC4nCtIr2w6 g==; IronPort-SDR: XESHdm+xI5lqO/Hckc+BZPTm9YYozmmQr145UpfI1tLbEQ0GPuOgm4BKdnitWBoKsSpCbyKQtq ZaM+zfh+TsBpGpC08r/mYrRMENdjV/CQH0s9zG9vFZXmzCzdZwLQLxBp8qio2j38CDrX1+0e47 nlSKmZES6vyZKoKZkFs2eaX+wJVHKM7waIWh2yDq8V2yzQpsRlrA4NtO7S8JHIMTnnytLBoqHT FAuLlEBTFeXDRhhvEBCB1e73Rw6o+qli8Gpdg4KnUEA70Ces1gwBscW9rgwN5gBteAomVieZxg Caw= X-IronPort-AV: E=Sophos;i="5.76,414,1592841600"; d="scan'208";a="147041239" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 11 Sep 2020 16:56:59 +0800 IronPort-SDR: ClbI/BLXbLN/1ClPDNMuYmuKFDPmuJX6by7oZgNQBXuP06tD2SNI6CpL/dJcZM/uEbjv8NV4aq nc2EZ2cb3EZw== Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Sep 2020 01:44:16 -0700 IronPort-SDR: 
22sKthLyhQAaoLbFiwqwP2XqJAEIJLC/VlCP7eqd+AMaUulRszEZuQsu6iZQEZ0IQ66peXW0Vr xpXyim6o9dHg==
WDCIronportException: Internal
Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 11 Sep 2020 01:56:58 -0700
From: Johannes Thumshirn
To: Damien Le Moal
Cc: linux-fsdevel@vger.kernel.org, Johannes Thumshirn
Subject: [PATCH v5 2/4] zonefs: provide zonefs_io_error variant that can be called with i_truncate_mutex held
Date: Fri, 11 Sep 2020 17:56:49 +0900
Message-Id: <20200911085651.23526-3-johannes.thumshirn@wdc.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200911085651.23526-1-johannes.thumshirn@wdc.com>
References: <20200911085651.23526-1-johannes.thumshirn@wdc.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Subsequent patches need to call zonefs_io_error() with the
i_truncate_mutex already held, so factor out the body of
zonefs_io_error() into __zonefs_io_error(), which can be called with
the i_truncate_mutex held.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Damien Le Moal
---
 fs/zonefs/super.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 6e13a5127b01..4309979eeb36 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -348,7 +348,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
  * eventually correct the file size and zonefs inode write pointer offset
  * (which can be out of sync with the drive due to partial write failures).
*/ -static void zonefs_io_error(struct inode *inode, bool write) +static void __zonefs_io_error(struct inode *inode, bool write) { struct zonefs_inode_info *zi = ZONEFS_I(inode); struct super_block *sb = inode->i_sb; @@ -362,8 +362,6 @@ static void zonefs_io_error(struct inode *inode, bool write) }; int ret; - mutex_lock(&zi->i_truncate_mutex); - /* * Memory allocations in blkdev_report_zones() can trigger a memory * reclaim which may in turn cause a recursion into zonefs as well as @@ -379,7 +377,14 @@ static void zonefs_io_error(struct inode *inode, bool write) zonefs_err(sb, "Get inode %lu zone information failed %d\n", inode->i_ino, ret); memalloc_noio_restore(noio_flag); +} +static void zonefs_io_error(struct inode *inode, bool write) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + + mutex_lock(&zi->i_truncate_mutex); + __zonefs_io_error(inode, write); mutex_unlock(&zi->i_truncate_mutex); } From patchwork Fri Sep 11 08:56:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11770029 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6AE8746 for ; Fri, 11 Sep 2020 08:57:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9F7B0221EE for ; Fri, 11 Sep 2020 08:57:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="JkhekH3C" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725764AbgIKI5X (ORCPT ); Fri, 11 Sep 2020 04:57:23 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:58878 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725770AbgIKI5B (ORCPT ); Fri, 11 Sep 2020 04:57:01 -0400 
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1599814620; x=1631350620; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BspeBz3aYcW5oYdSc20roZlGnCHCSu4uaIgl5MlZ1bA=; b=JkhekH3CbFwwNySz2VcenHF3MhBzZl3W4dvUQGPaIPe8yxjDCKdFBeRg A1QiwuWZMxM0QjiFvOlt2qH8DamHn22rFcjppzcIE0ZgRS8/fdo219GG9 Ihi7NMuDSl9knViz8HWBICTprj5FXTmyPzIjhz0e9VeWrw6S+jgZrDUsc ixPdQ6CMnWG4+1nY58Z2qpaR4rVr73nsHBDEI0OLO0tFyORQHuvaZ6or+ InwfE+3O8/Mhf+pP4L98wYdAfy/gKiEpJyYbmTmL09V4M2RLjlm1O2UyQ ExM+9mEp1AaBqJoLd1P1RPoB+TC2W/1AMvI+oz1OAIp5QHjqGYOg1GW2W w==; IronPort-SDR: wUOUBa5PaQoU3TKuOeiQlfaAtrDC5AHMXbdheU/6fzgj6pAc+CYv4Q6PfJdt8vqtCQxdFPDItS cwdz7PCZvDSXIg11iENqC+zGTkssfz1jROry+bLNXVGHDxKWXjM0SprGFhq/7jGKa3uudTWGVK 5GCaBjhSbB9Ti643asO4RVeuTweiTXb4e2dyXebcRxUiyLQrm/mrQ/lom+f5FnbBM5X3Qk1bwz XacLkQgdUBEz+T2xyJ+vxvcMXLasPeniTtNIqVlkk7YgIGwt1DI+3UFgwdl4b821ge4gbiwBhB CbU= X-IronPort-AV: E=Sophos;i="5.76,414,1592841600"; d="scan'208";a="147041240" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 11 Sep 2020 16:57:00 +0800 IronPort-SDR: 1l7A6GWCz2U53gh+p6UUAxk2yeug4fsdbOHlUlowKdPiWmLyD1jJj4UxdF56Umheux2AREUDv7 xU6nkf+CfIiw== Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Sep 2020 01:44:17 -0700 IronPort-SDR: w14L9HQzJlFxuxfn9tP3k+v1Rvka5Ikky50xXDO8ZAX8hHJMt7ILAUT09g0qtPlrJvEwVQ+Obf H6rTKJ2QD5kw== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 11 Sep 2020 01:56:59 -0700 From: Johannes Thumshirn To: Damien Le Moal Cc: linux-fsdevel@vger.kernel.org, Johannes Thumshirn Subject: [PATCH v5 3/4] zonefs: open/close zone on file open/close Date: Fri, 11 Sep 2020 17:56:50 +0900 Message-Id: <20200911085651.23526-4-johannes.thumshirn@wdc.com> X-Mailer: 
git-send-email 2.26.2
In-Reply-To: <20200911085651.23526-1-johannes.thumshirn@wdc.com>
References: <20200911085651.23526-1-johannes.thumshirn@wdc.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

NVMe Zoned Namespace introduced the concept of active zones, which are
zones in the implicit open, explicit open or closed condition. Drives may
have a limit on the number of zones that can be simultaneously active.
This potential limitation translates into a risk for applications to see
write IO errors due to this limit if the zone of a file being written to
is not already active when a write request is issued.

To avoid these potential errors, the zone of a file can explicitly be made
active using an open zone command when the file is opened for the first
time. If the zone open command succeeds, the application is then
guaranteed that write requests can be processed. This indirect management
of active zones relies on the maximum number of open zones of a drive,
which is always lower than or equal to the maximum number of active zones.

On the first open for writing of a sequential zone file, send a
REQ_OP_ZONE_OPEN command to the block device. Conversely, on the last
release of a zone file, send a REQ_OP_ZONE_CLOSE to the device if the
zone is not full or empty.

As truncating a zone file to 0 or max can deactivate a zone as well, we
need to serialize against truncates and also be careful not to close a
zone while the file may still be open for writing, e.g. the user called
ftruncate(). If the zone file is not open and a process does a
truncate(), then no close operation is needed.
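
For illustration only, the open/close accounting described above can be
modelled in user space roughly as follows. This is a sketch, not the kernel
code in this patch: every model_* name is hypothetical, locking is omitted
(the patch serializes these paths with i_truncate_mutex), and the places
where the kernel would issue a zone management command are only marked by
comments. As a simplification, the sketch takes the writer reference only
after the open zone limit check has passed, so a failed open leaves the
counters untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-superblock state (s_max_open_zones, s_open_zones). */
struct model_sb {
	unsigned int max_open_zones;
	unsigned int open_zones;
};

/* Hypothetical stand-in for the per-inode state (i_wr_refcnt, ZONEFS_ZONE_OPEN). */
struct model_zone_file {
	unsigned int wr_refcnt;
	bool zone_open;
	long long size;
	long long max_size;
};

/* Models an open for writing: only the first writer consumes an open zone slot. */
int model_open_for_write(struct model_sb *sb, struct model_zone_file *zf)
{
	if (zf->wr_refcnt == 0) {
		if (sb->open_zones >= sb->max_open_zones)
			return -EBUSY;
		sb->open_zones++;
		/* A full zone cannot be opened; the kernel would send REQ_OP_ZONE_OPEN otherwise. */
		if (zf->size < zf->max_size)
			zf->zone_open = true;
	}
	zf->wr_refcnt++;
	return 0;
}

/* Models the release of a writer: only the last writer gives the slot back. */
void model_release(struct model_sb *sb, struct model_zone_file *zf)
{
	assert(zf->wr_refcnt > 0);
	if (--zf->wr_refcnt > 0)
		return;
	/* The kernel would send REQ_OP_ZONE_CLOSE here if the open flag is still set. */
	zf->zone_open = false;
	sb->open_zones--;
}
```

With max_open_zones set to 1 in this model, a second file opened for writing
gets -EBUSY until the last writer of the first file has released it, which is
the behaviour an application should expect to handle when mounting with
explicit-open.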
Signed-off-by: Johannes Thumshirn Reviewed-by: Damien Le Moal --- fs/zonefs/super.c | 183 ++++++++++++++++++++++++++++++++++++++++++++- fs/zonefs/zonefs.h | 10 +++ 2 files changed, 189 insertions(+), 4 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 4309979eeb36..64cc2a9c38c8 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -44,6 +44,19 @@ static inline int zonefs_zone_mgmt(struct inode *inode, return 0; } +static inline void zonefs_i_size_write(struct inode *inode, loff_t isize) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + + i_size_write(inode, isize); + /* + * A full zone is no longer open/active and does not need + * explicit closing. + */ + if (isize >= zi->i_max_size) + zi->i_flags &= ~ZONEFS_ZONE_OPEN; +} + static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned int flags, struct iomap *iomap, struct iomap *srcmap) @@ -321,6 +334,17 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx, } } + /* + * If the filesystem is mounted with the explicit-open mount option, we + * need to clear the ZONEFS_ZONE_OPEN flag if the zone transitioned to + * the read-only or offline condition, to avoid attempting an explicit + * close of the zone when the inode file is closed. + */ + if ((sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) && + (zone->cond == BLK_ZONE_COND_OFFLINE || + zone->cond == BLK_ZONE_COND_READONLY)) + zi->i_flags &= ~ZONEFS_ZONE_OPEN; + /* * If error=remount-ro was specified, any error result in remounting * the volume as read-only. @@ -335,7 +359,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx, * invalid data. 
*/ zonefs_update_stats(inode, data_size); - i_size_write(inode, data_size); + zonefs_i_size_write(inode, data_size); zi->i_wpoffset = data_size; return 0; @@ -426,6 +450,25 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize) if (ret) goto unlock; + /* + * If the mount option ZONEFS_MNTOPT_EXPLICIT_OPEN is set, + * take care of open zones. + */ + if (zi->i_flags & ZONEFS_ZONE_OPEN) { + /* + * Truncating a zone to EMPTY or FULL is the equivalent of + * closing the zone. For a truncation to 0, we need to + * re-open the zone to ensure new writes can be processed. + * For a truncation to the maximum file size, the zone is + * closed and writes cannot be accepted anymore, so clear + * the open flag. + */ + if (!isize) + ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN); + else + zi->i_flags &= ~ZONEFS_ZONE_OPEN; + } + zonefs_update_stats(inode, isize); truncate_setsize(inode, isize); zi->i_wpoffset = isize; @@ -604,7 +647,7 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size, mutex_lock(&zi->i_truncate_mutex); if (i_size_read(inode) < iocb->ki_pos + size) { zonefs_update_stats(inode, iocb->ki_pos + size); - i_size_write(inode, iocb->ki_pos + size); + zonefs_i_size_write(inode, iocb->ki_pos + size); } mutex_unlock(&zi->i_truncate_mutex); } @@ -885,8 +928,128 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) return ret; } +static inline bool zonefs_file_use_exp_open(struct inode *inode, struct file *file) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb); + + if (!(sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN)) + return false; + + if (zi->i_ztype != ZONEFS_ZTYPE_SEQ) + return false; + + if (!(file->f_mode & FMODE_WRITE)) + return false; + + return true; +} + +static int zonefs_open_zone(struct inode *inode) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb); + int ret = 0; + + 
mutex_lock(&zi->i_truncate_mutex); + + zi->i_wr_refcnt++; + if (zi->i_wr_refcnt == 1) { + + if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) { + atomic_dec(&sbi->s_open_zones); + ret = -EBUSY; + goto unlock; + } + + if (i_size_read(inode) < zi->i_max_size) { + ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN); + if (ret) { + zi->i_wr_refcnt--; + atomic_dec(&sbi->s_open_zones); + goto unlock; + } + zi->i_flags |= ZONEFS_ZONE_OPEN; + } + } + +unlock: + mutex_unlock(&zi->i_truncate_mutex); + + return ret; +} + +static int zonefs_file_open(struct inode *inode, struct file *file) +{ + int ret; + + ret = generic_file_open(inode, file); + if (ret) + return ret; + + if (zonefs_file_use_exp_open(inode, file)) + return zonefs_open_zone(inode); + + return 0; +} + +static void zonefs_close_zone(struct inode *inode) +{ + struct zonefs_inode_info *zi = ZONEFS_I(inode); + int ret = 0; + + mutex_lock(&zi->i_truncate_mutex); + zi->i_wr_refcnt--; + if (!zi->i_wr_refcnt) { + struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb); + struct super_block *sb = inode->i_sb; + + /* + * If the file zone is full, it is not open anymore and we only + * need to decrement the open count. + */ + if (!(zi->i_flags & ZONEFS_ZONE_OPEN)) + goto dec; + + ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE); + if (ret) { + __zonefs_io_error(inode, false); + /* + * Leaving zones explicitly open may lead to a state + * where most zones cannot be written (zone resources + * exhausted). So take preventive action by remounting + * read-only. 
+ */ + if (zi->i_flags & ZONEFS_ZONE_OPEN && + !(sb->s_flags & SB_RDONLY)) { + zonefs_warn(sb, "closing zone failed, remounting filesystem read-only\n"); + sb->s_flags |= SB_RDONLY; + } + } + zi->i_flags &= ~ZONEFS_ZONE_OPEN; +dec: + atomic_dec(&sbi->s_open_zones); + } + mutex_unlock(&zi->i_truncate_mutex); +} + +static int zonefs_file_release(struct inode *inode, struct file *file) +{ + /* + * If we explicitly open a zone we must close it again as well, but the + * zone management operation can fail (either due to an IO error or as + * the zone has gone offline or read-only). Make sure we don't fail the + * close(2) for user-space. + */ + if (zonefs_file_use_exp_open(inode, file)) + zonefs_close_zone(inode); + + return 0; +} + static const struct file_operations zonefs_file_operations = { - .open = generic_file_open, + .open = zonefs_file_open, + .release = zonefs_file_release, .fsync = zonefs_file_fsync, .mmap = zonefs_file_mmap, .llseek = zonefs_file_llseek, @@ -910,6 +1073,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb) inode_init_once(&zi->i_vnode); mutex_init(&zi->i_truncate_mutex); init_rwsem(&zi->i_mmap_sem); + zi->i_wr_refcnt = 0; return &zi->i_vnode; } @@ -960,7 +1124,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf) enum { Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair, - Opt_err, + Opt_explicit_open, Opt_err, }; static const match_table_t tokens = { @@ -968,6 +1132,7 @@ static const match_table_t tokens = { { Opt_errors_zro, "errors=zone-ro"}, { Opt_errors_zol, "errors=zone-offline"}, { Opt_errors_repair, "errors=repair"}, + { Opt_explicit_open, "explicit-open" }, { Opt_err, NULL} }; @@ -1004,6 +1169,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options) sbi->s_mount_opts &= ~ZONEFS_MNTOPT_ERRORS_MASK; sbi->s_mount_opts |= ZONEFS_MNTOPT_ERRORS_REPAIR; break; + case Opt_explicit_open: + sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN; + break; default: return -EINVAL; } 
@@ -1423,6 +1591,13 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent) sbi->s_gid = GLOBAL_ROOT_GID; sbi->s_perm = 0640; sbi->s_mount_opts = ZONEFS_MNTOPT_ERRORS_RO; + sbi->s_max_open_zones = bdev_max_open_zones(sb->s_bdev); + atomic_set(&sbi->s_open_zones, 0); + if (!sbi->s_max_open_zones && + sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) { + zonefs_info(sb, "No open zones limit. Ignoring explicit_open mount option\n"); + sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN; + } ret = zonefs_read_super(sb); if (ret) diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h index 55b39970acb2..51141907097c 100644 --- a/fs/zonefs/zonefs.h +++ b/fs/zonefs/zonefs.h @@ -38,6 +38,8 @@ static inline enum zonefs_ztype zonefs_zone_type(struct blk_zone *zone) return ZONEFS_ZTYPE_SEQ; } +#define ZONEFS_ZONE_OPEN (1 << 0) + /* * In-memory inode data. */ @@ -74,6 +76,10 @@ struct zonefs_inode_info { */ struct mutex i_truncate_mutex; struct rw_semaphore i_mmap_sem; + + /* guarded by i_truncate_mutex */ + unsigned int i_wr_refcnt; + unsigned int i_flags; }; static inline struct zonefs_inode_info *ZONEFS_I(struct inode *inode) @@ -154,6 +160,7 @@ enum zonefs_features { #define ZONEFS_MNTOPT_ERRORS_MASK \ (ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \ ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR) +#define ZONEFS_MNTOPT_EXPLICIT_OPEN (1 << 4) /* Explicit open/close of zones on open/close */ /* * In-memory Super block information. 
@@ -175,6 +182,9 @@ struct zonefs_sb_info { loff_t s_blocks; loff_t s_used_blocks; + + unsigned int s_max_open_zones; + atomic_t s_open_zones; }; static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb) From patchwork Fri Sep 11 08:56:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11770031 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 663AE112E for ; Fri, 11 Sep 2020 08:57:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4BC0F221EB for ; Fri, 11 Sep 2020 08:57:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="HNXEZZoW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725771AbgIKI5Y (ORCPT ); Fri, 11 Sep 2020 04:57:24 -0400 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:58877 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725790AbgIKI5G (ORCPT ); Fri, 11 Sep 2020 04:57:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1599814626; x=1631350626; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MWzomiudUXRpALqsd5z2KVHaNY/hDcGHCszrDMZwWbU=; b=HNXEZZoWMQuiVDXf166un3bkAIZFn93bUuWv3cop3SwnBp/fjh85tnlX ER9YVQyZv4qfQAC38+FHpqbD+I0grt621cM1J5ry0EVwDqsQkGjecBUr8 GSvHeAEMUMaS7bqPFwdKUAiwWRlNkEobozyzxsJTjlHrTe5kL0CQNhpJN 0MSls5Ct4aujd95RG4CgzXhavAOkBtveBGauAFcsHGxECl6P+iOkxfxuE cXJUabBpPadSsJjBZAbhk7zP+KMi8ybkjRt+VFfDefVH6msB0S+spY+TU SjAMiWLv6daINRFsHLtm2H4Guj27F722ZWccnbZh7LPmqZwh5Z2NuGHT+ A==; IronPort-SDR: 
t3nN57v9b0Zf9tpmLMTzX+YParhSZj3xFCEk+JALEtYHUuBX+Wam/xNjBwm8MSb2UuIW/p84gj iejjSkjiHkqNf52G0kIEEZs70HvQI5x0aZCQuJbCzxTHZKRNE+oWO0nKwoDfj/4z2Y3ultQwyA FDoNbsEkFwksJJLBTyEbCFEQtYxKh0VcsZGstxHuhCHdWtHd4uWQ1g6hOk1PSMwPpYPdDtpFo2 MIhRdrkmJUYpxI10JAscs/DETn8ulR533GCwFHp4VX8qUGi1Tq1K0ucUoHn1AenHBmeI6vpaXa i+E= X-IronPort-AV: E=Sophos;i="5.76,414,1592841600"; d="scan'208";a="147041241" Received: from h199-255-45-14.hgst.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 11 Sep 2020 16:57:01 +0800 IronPort-SDR: KboUYO8WbzxOgNhUUuMP+GKVdVFtY7G1Q9ckkKd/oTugKwiwj6REH1DemZRpAVdXaTSmMyysmn jgqNLXqWO2QA== Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Sep 2020 01:44:18 -0700 IronPort-SDR: DW+wibrQUrf8hey4NqX8BKNhFEtVnYHhKiCQoaFsmLRKwZkZeGCh8kajdYYDcKuVIQfjXd5KqM 6kzxlTAyOhOQ== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 11 Sep 2020 01:57:00 -0700 From: Johannes Thumshirn To: Damien Le Moal Cc: linux-fsdevel@vger.kernel.org, Johannes Thumshirn , Damien Le Moal Subject: [PATCH v5 4/4] zonefs: document the explicit-open mount option Date: Fri, 11 Sep 2020 17:56:51 +0900 Message-Id: <20200911085651.23526-5-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200911085651.23526-1-johannes.thumshirn@wdc.com> References: <20200911085651.23526-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Document the newly introduced explicit-open mount option. 
Signed-off-by: Johannes Thumshirn Reviewed-by: Damien Le Moal --- Documentation/filesystems/zonefs.rst | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/Documentation/filesystems/zonefs.rst b/Documentation/filesystems/zonefs.rst index 6c18bc8ce332..6b213fe9a33e 100644 --- a/Documentation/filesystems/zonefs.rst +++ b/Documentation/filesystems/zonefs.rst @@ -326,6 +326,21 @@ discover the amount of data that has been written to the zone. In the case of a read-only zone discovered at run-time, as indicated in the previous section. The size of the zone file is left unchanged from its last updated value. +A zoned block device (e.g. an NVMe Zoned Namespace device) may have limits on +the number of zones that can be active, that is, zones that are in the +implicit open, explicit open or closed conditions. This potential limitation +translates into a risk for applications to see write IO errors due to this +limit being exceeded if the zone of a file is not already active when a write +request is issued by the user. + +To avoid these potential errors, the "explicit-open" mount option forces zones +to be made active using an open zone command when a file is opened for writing +for the first time. If the zone open command succeeds, the application is then +guaranteed that write requests can be processed. Conversely, the +"explicit-open" mount option will result in a zone close command being issued +to the device on the last close() of a zone file if the zone is not full nor +empty. + Zonefs User Space Tools =======================