From patchwork Wed May 25 15:49:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pankaj Raghav X-Patchwork-Id: 12861398 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A11EC433EF for ; Wed, 25 May 2022 15:50:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232572AbiEYPuQ (ORCPT ); Wed, 25 May 2022 11:50:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40042 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245349AbiEYPuG (ORCPT ); Wed, 25 May 2022 11:50:06 -0400 Received: from mailout2.w1.samsung.com (mailout2.w1.samsung.com [210.118.77.12]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2450EAE24B for ; Wed, 25 May 2022 08:50:04 -0700 (PDT) Received: from eucas1p1.samsung.com (unknown [182.198.249.206]) by mailout2.w1.samsung.com (KnoxPortal) with ESMTP id 20220525155000euoutp025951019916feebcd518c1c69fca91fed~yY9nt-BuP0326303263euoutp02C for ; Wed, 25 May 2022 15:50:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 mailout2.w1.samsung.com 20220525155000euoutp025951019916feebcd518c1c69fca91fed~yY9nt-BuP0326303263euoutp02C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsung.com; s=mail20170921; t=1653493800; bh=go1x8375JmtxvkXA3AQKlCThO04+CBvjgeidLIMVqD4=; h=From:To:Cc:Subject:Date:References:From; b=uPLaG5obOg8RI/bJVCHE5i+gMpY2wfiJ1alF7VrgKb70dUkRm1h/JPuWNKZKmpOAU 3zmL9S7n0pvP2j+usgXAPWMm8uiNy/wSIH8gouEFuZsgLXpJBUtrA5gK1y7gmAHJnn M3/sXqy4Oxs6FKCL/JeCqoYA1x7HjfghHoiXiuBM= Received: from eusmges1new.samsung.com (unknown [203.254.199.242]) by eucas1p1.samsung.com (KnoxPortal) with ESMTP id 20220525154959eucas1p19212cc5cb3c42b45762a8993bebc305c~yY9mG7NPt1681116811eucas1p1e; Wed, 25 May 2022 15:49:59 +0000 (GMT) Received: from eucas1p2.samsung.com ( [182.198.249.207]) by eusmges1new.samsung.com (EUCPMTA) with SMTP id 49.5B.10009.6205E826; Wed, 25 May 2022 16:49:59 +0100 (BST) Received: from eusmtrp2.samsung.com (unknown [182.198.249.139]) by eucas1p2.samsung.com (KnoxPortal) with ESMTPA id 20220525154958eucas1p2f6af3db8ab178be28eb6c42e9e1be591~yY9luDlFF1436314363eucas1p2A; Wed, 25 May 2022 15:49:58 +0000 (GMT) Received: from eusmgms2.samsung.com (unknown [182.198.249.180]) by eusmtrp2.samsung.com (KnoxPortal) with ESMTP id 20220525154958eusmtrp27a53011d768f4a6c7f76b14e42e26996~yY9ltLqy40177001770eusmtrp2V; Wed, 25 May 2022 15:49:58 +0000 (GMT) X-AuditID: cbfec7f2-e7fff70000002719-32-628e502659c4 Received: from eusmtip2.samsung.com ( [203.254.199.222]) by eusmgms2.samsung.com (EUCPMTA) with SMTP id BB.8E.09404.6205E826; Wed, 25 May 2022 16:49:58 +0100 (BST) Received: from localhost (unknown [106.210.248.20]) by eusmtip2.samsung.com (KnoxPortal) with ESMTPA id 20220525154958eusmtip280aab28279aa8118080c2646323aed1a~yY9lXaGZE0140601406eusmtip2L; Wed, 25 May 2022 15:49:58 +0000 (GMT) From: Pankaj Raghav To: axboe@kernel.dk, damien.lemoal@opensource.wdc.com, snitzer@redhat.com, Johannes.Thumshirn@wdc.com, hch@lst.de, hare@suse.de Cc: dsterba@suse.com, dm-devel@redhat.com, jiangbo.365@bytedance.com, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, jaegeuk@kernel.org, gost.dev@samsung.com, Pankaj Raghav Subject: [PATCH v6 0/8] support non power of 2 zoned devices Date: Wed, 25 May 2022 17:49:49 +0200 Message-Id: <20220525154957.393656-1-p.raghav@samsung.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrGKsWRmVeSWpSXmKPExsWy7djP87rqAX1JBhO6mCxW3+1ns/h99jyz xd53s1ktLvxoZLK4eWAnk8WeRZOYLFauPspk8WT9LGaLngMfWCz+dt1jsth7S9vi8q45bBbz lz1lt/i8tIXdom3jV0YHfo9/J9aweVw+W+qxaVUnm8fmJfUeu282sHnsbL3P6vF+31U2j74t qxg91m+5yuKx+XS1x+dNch7tB7qZAniiuGxSUnMyy1KL9O0SuDJ+fXrOUrBMv+LWLocGxja1 LkZODgkBE4kPt14zdTFycQgJrGCUOPzpOwuE84VRYuba36wQzmdGidXHV7DDtMx5uIMZIrGc UWLv+WlQLS8YJb5v/MbWxcjBwSagJdHYyQ4SFxFoZJSY+vIEI4jDLPCVUeLGweeMIEXCAjYS x1+HgkxlEVCV+Nq+gxHE5hWwkvh48jULxDZ5iZmXvrNDxAUlTs58AhZnBoo3b50NdoWEwGxO iac/n7OAzJQQcJHY/bcUoldY4tXxLVBXy0j83zmfCcKulnh64zdUbwujRP/O9WwQvdYSfWdy QExmAU2J9bv0IcodJX7MPsgEUcEnceOtIMQFfBKTtk1nhgjzSnS0CUFUK0ns/PkEaqmExOWm OVCPeEh86X3FBmILCcRK9G1eyjqBUWEWkr9mIflrFsINCxiZVzGKp5YW56anFhvmpZbrFSfm Fpfmpesl5+duYgQmvdP/jn/awTj31Ue9Q4xMHIyHGCU4mJVEeC887U0S4k1JrKxKLcqPLyrN SS0+xCjNwaIkzpucuSFRSCA9sSQ1OzW1ILUIJsvEwSnVwDTV2Y/TLcE2kSXqq8js0rf/nB1v JSz9lWCcxuXuJxy+3sf+5oFJdtYlC1h25KW9Xfq3ZU3e2uWXy1slLD4IebqWRxxc8E8q/Car kcyZCYd/snm8/LDAtrp7U9nKRzGL+D/b+96LMNO4+JnpzDNLU6F/skzFJy6c3OXbqiO+VTtk Mz/DTKtF8655Hrj5YA7rsX0rP5RxqCy7M/dfvi+3eOrVxy/K4iMzPxhsMpmmcCz1xY+DP6/P /j7leOP//7OOOqVcv/vxmjNz4P4YYdE1id1L1kwOjD7Ts+n18v9Xq9P2vpyle+m0D4vy9mV/ daU3HXk3S/dZzI7H1S/6IgUfp+9NDLSI+h118fr1L3PYp/49qsRSnJFoqMVcVJwIAPvKOn7p AwAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrCIsWRmVeSWpSXmKPExsVy+t/xe7pqAX1JBvP+qVisvtvPZvH77Hlm i73vZrNaXPjRyGRx88BOJos9iyYxWaxcfZTJ4sn6WcwWPQc+sFj87brHZLH3lrbF5V1z2Czm L3vKbvF5aQu7RdvGr4wO/B7/Tqxh87h8ttRj06pONo/NS+o9dt9sYPPY2Xqf1eP9vqtsHn1b VjF6rN9ylcVj8+lqj8+b5DzaD3QzBfBE6dkU5ZeWpCpk5BeX2CpFG1oY6RlaWugZmVjqGRqb x1oZmSrp29mkpOZklqUW6dsl6GX8+vScpWCZfsWtXQ4NjG1qXYycHBICJhJzHu5g7mLk4hAS WMooseXufWaIhITE7YVNjBC2sMSfa11sEEXPGCWen2sAcjg42AS0JBo72UHiIgKdjBJz9rWC OcwC/xkl/nRcZAQpEhawkTj+OhRkEIuAqsTX9h1gQ3kFrCQ+nnzNArFAXmLmpe/sIOXMApoS 63fpQ5QISpyc+QSshBmopHnrbOYJjPyzEKpmIamahaRqASPzKkaR1NLi3PTcYiO94sTc4tK8 dL3k/NxNjMDo3Hbs55YdjCtffdQ7xMjEwXiIUYKDWUmE98LT3iQh3pTEyqrUovz4otKc1OJD jKZAV09klhJNzgemh7ySeEMzA1NDEzNLA1NLM2MlcV7Pgo5EIYH0xJLU7NTUgtQimD4mDk6p BiabRQxbU+ZMPn8r++bj6IvLVF+laK76t1ZhnWXSoz9mt3zNr1e07ftWtPOGaOOpD6wr5yrc ZfZ9a1Qs7SNx5E9Wq9gpE/kHVzbcfPr9oSt7aVLfpL/9EdHcz2cWaljvcOdyZdCceFrzQ87j WI4DzCW6H8LevtJfr/eX+VOzDucrrf1N6s1s/qmFokvK/Ffot1ke/awpIOeTarLh1Nkdrl/W 9DcezNiWebDs5endjHPmz5jo8GBL95Ln3S21AjnXZlh9mbb1+823L6R3rWm82XBuc3fdHXY7 c5Ydmqubjb3CzSzvPbr35afKrEV1NnOOGumE816dcY+B75KB/EuP7zdmy+/e+rL/ieeBu7tr zA2XK7EUZyQaajEXFScCAHGbNo5XAwAA X-CMS-MailID: 20220525154958eucas1p2f6af3db8ab178be28eb6c42e9e1be591 X-Msg-Generator: CA X-RootMTR: 20220525154958eucas1p2f6af3db8ab178be28eb6c42e9e1be591 X-EPHeader: CA CMS-TYPE: 201P X-CMS-RootMailID: 20220525154958eucas1p2f6af3db8ab178be28eb6c42e9e1be591 References: Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Hello, The previous revision ended up leading to a new direction to add npo2 device support as a dm target instead of adding support to filesystems directly[0]. I would like to hear some inputs from the community, especially from Christoph and Mike Snitzer about this approach. - Background and Motivation: The zone storage implementation in Linux, introduced since v4.10, first targetted SMR drives which have a power of 2 (po2) zone size alignment requirement. The po2 zone size was further imposed implicitly by the block layer's blk_queue_chunk_sectors(), used to prevent IO merging across chunks beyond the specified size, since v3.16 through commit 762380ad9322 ("block: add notion of a chunk size for request merging"). But this same general block layer po2 requirement for blk_queue_chunk_sectors() was removed on v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2"). NAND, which is the media used in newer zoned storage devices, does not naturally align to po2. In these devices, zone cap is not the same as the po2 zone size. When the zone cap != zone size, then unmapped LBAs are introduced to cover the space between the zone cap and zone size. po2 requirement does not make sense for these type of zone storage devices. This patch series aims to remove these unmapped LBAs for zoned devices when zone cap is npo2. This is done by relaxing the po2 zone size constraint in the kernel and allowing zoned device with npo2 zone sizes if zone cap == zone size. Removing the po2 requirement from zone storage should be possible now provided that no userspace regression and no performance regressions are introduced. Stop-gap patches have been already merged into f2fs-tools to proactively not allow npo2 zone sizes until proper support is added [1]. There were two efforts previously to add support to npo2 devices: 1) via device level emulation [2] but that was rejected with a final conclusion to add support for non po2 zoned device in the complete stack[3] 2) adding support to the complete stack by removing the constraint in the block layer and NVMe layer with support to btrfs, zonefs, etc which was rejected with a conclusion to add a dm target for FS support [0] to reduce the regression impact. - Patchset description: The support is planned to be added in two phases: - Add npo2 support to block, nvme layer and necessary stop gap patches in the filesystems - Add dm target for npo2 devices so that they are presented as a po2 device to filesystems This patchset addresses the first phase for adding support to npo2 devices. Patches 1-2 deals with removing the po2 constraint from the block layer. Patches 3-4 deals with removing the constraint from nvme zns. Patch 5-6 removes the po2 contraint in null blk Patches 7 adds conditions to not allow non power of 2 devices in DM. The patch series is based on linux-next tag: next-20220520 - Future work Add DM target for npo2 devices to be presented as a po2 device. [0] https://lore.kernel.org/lkml/PH0PR04MB74166C87F694B150A5AE0F009BD09@PH0PR04MB7416.namprd04.prod.outlook.com/ [1] https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?h=dev-test&id=6afcf6493578e77528abe65ab8b12f3e1c16749f [2] https://lore.kernel.org/all/20220310094725.GA28499@lst.de/T/ [3] https://lore.kernel.org/all/20220315135245.eqf4tqngxxb7ymqa@unifi/ Changes since v1: - Put the function declaration and its usage in the same commit (Bart) - Remove bdev_zone_aligned function (Bart) - Change the name from blk_queue_zone_aligned to blk_queue_is_zone_start (Damien) - q is never null in from bdev_get_queue (Damien) - Add condition during bringup and check for zsze == zcap for npo2 drives (Damien) - Rounddown operation should be made generic to work in 32 bits arch (bart) - Add comments where generic calculation is directly used instead having special handling for po2 zone sizes (Hannes) - Make the minimum zone size alignment requirement for btrfs to be 1M instead of BTRFS_STRIPE_LEN(David) Changes since v2: - Minor formatting changes Changes since v3: - Make superblock mirror align with the existing superblock log offsets (David) - DM change return value and remove extra newline - Optimize null blk zone index lookup with shift for po2 zone size Changes since v4: - Remove direct filesystems support for npo2 devices (Johannes, Hannes, Damien) Changes since v5: - Use DIV_ROUND_UP* helper instead of round_up as it breaks 32bit arch build in null blk(kernel-test-robot, Nathan) - Use DIV_ROUND_UP_SECTOR_T also in blkdev_nr_zones function instead of open coding it with div64_u64 - Added extra condition in dm-zoned and in dm to reject non power of 2 zone sizes. Luis Chamberlain (1): dm-zoned: ensure only power of 2 zone sizes are allowed Pankaj Raghav (7): block: make blkdev_nr_zones and blk_queue_zone_no generic for npo2 zsze block: allow blk-zoned devices to have non-power-of-2 zone size nvme: zns: Allow ZNS drives that have non-power_of_2 zone size nvmet: Allow ZNS target to support non-power_of_2 zone sizes null_blk: allow non power of 2 zoned devices null_blk: use zone_size_sects_shift for power of 2 zoned devices dm: ensure only power of 2 zone sizes are allowed block/blk-core.c | 3 +-- block/blk-zoned.c | 37 +++++++++++++++++++++++-------- drivers/block/null_blk/main.c | 5 ++--- drivers/block/null_blk/null_blk.h | 6 +++++ drivers/block/null_blk/zoned.c | 18 +++++++++------ drivers/md/dm-table.c | 6 +++++ drivers/md/dm-zone.c | 10 +++++++++ drivers/md/dm-zoned-target.c | 8 +++++++ drivers/nvme/host/zns.c | 21 ++++++++++-------- drivers/nvme/target/zns.c | 2 +- include/linux/blkdev.h | 36 +++++++++++++++++++++++++++++- 11 files changed, 120 insertions(+), 32 deletions(-)