From: Luis Chamberlain
To: leon@kernel.org, hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
 axboe@kernel.dk, joro@8bytes.org, brauner@kernel.org, hare@suse.de,
 willy@infradead.org, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
 da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC 0/4] nvme-pci: breaking the 512 KiB max IO boundary
Date: Thu, 20 Mar 2025 04:13:24 -0700
Message-ID: <20250320111328.2841690-1-mcgrof@kernel.org>
X-Mailer: git-send-email 2.48.1

Now that we have bs > ps (block size larger than page size) for block
device sector sizes on linux-next, the next eyesore is why our max sector
size is stuck at 64k when, in theory, we should be able to go up to the
max supported by the page cache. On x86_64 that's 2 MiB. The reason we
didn't jump to 2 MiB is that testing with a limit higher than 64k proved
to have issues. While we've looked into them, a glaring issue was the
scatter list limitation in the NVMe PCI driver. While we could adopt
scatter list chaining, the two-step DMA API work by Christoph and Leon
seems to be the way to go, since scatter lists are tied to PAGE_SIZE
restrictions and scatter list chaining is just a mess.

So this raises the question: with the new two-step DMA API, does the
problem get easier? The answer is yes, and for those who want to
experiment, this series lets you do just that. With this we can enable a
2 MiB LBA format on NVMe and issue single IOs of up to 8 MiB for both
buffered IO and direct IO.

The last two patches are not really intended for upstream; they are
experimental code to let folks muck around with large sector sizes.

Daniel Gomez has taken Leon Romanovsky's new two-step DMA API [0] and
Christoph Hellwig's "Block and NVMe PCI use of new DMA mapping API" [1].
We then applied the 64k sector size patches now merged on linux-next on
top of that and backported them to v6.14-rc5. The patches in this RFC sit
on top of all that, to demonstrate the minimal changes needed to enable up
to 8 MiB IOs on NVMe, leveraging a 2 MiB max block sector size on x86_64,
once the two-step DMA API and the NVMe cleanup are in place.

If you want a git tree to play with, you can use our
large-block-buffer-heads-2m linux branch from kdevops [2].

[0] https://lore.kernel.org/all/20250302085717.GO53094@unreal/
[1] https://lore.kernel.org/all/cover.1730037261.git.leon@kernel.org/
[2] https://github.com/linux-kdevops/linux/tree/large-block-buffer-heads-2m

Luis Chamberlain (4):
  iomap: use BLK_MAX_BLOCK_SIZE for the iomap zero page
  blkdev: lift BLK_MAX_BLOCK_SIZE to page cache limit
  nvme-pci: bump segments to what the device can use
  nvme-pci: add quirk for qemu with bogus NOWS

 drivers/nvme/host/core.c |   2 +
 drivers/nvme/host/nvme.h |   5 ++
 drivers/nvme/host/pci.c  | 167 ++-------------------------------------
 fs/iomap/direct-io.c     |   2 +-
 include/linux/blkdev.h   |   7 +-
 5 files changed, 15 insertions(+), 168 deletions(-)
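
For anyone who wants to sanity check the sizes quoted in this cover
letter, here is a rough sketch of the arithmetic. It is not part of the
series. The 4 KiB base page size and the order-9 page cache limit are
assumptions about a typical x86_64 configuration, the helper names are
made up for illustration, and the "one page per segment" worst case is
only a simplification of why page-sized scatter list entries get
expensive for large IOs.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumptions for illustration only, not taken from the patches:
PAGE_SIZE = 4 * KIB        # assumed x86_64 base page size
MAX_PAGECACHE_ORDER = 9    # assumed largest page cache folio order (PMD-sized)


def max_block_size(page_size: int, max_order: int) -> int:
    """Largest block size the page cache could back with a single folio."""
    return page_size << max_order


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def blocks_per_io(io_size: int, block_size: int) -> int:
    """Number of logical blocks covered by one IO of io_size bytes."""
    return ceil_div(io_size, block_size)


def page_sized_segments(io_size: int, page_size: int) -> int:
    """Hypothetical worst case: every segment maps exactly one page."""
    return ceil_div(io_size, page_size)


if __name__ == "__main__":
    block = max_block_size(PAGE_SIZE, MAX_PAGECACHE_ORDER)
    print("max block size (MiB):", block // MIB)
    print("2 MiB blocks in an 8 MiB IO:", blocks_per_io(8 * MIB, block))
    print("page-sized segments, 512 KiB IO:",
          page_sized_segments(512 * KIB, PAGE_SIZE))
    print("page-sized segments, 8 MiB IO:",
          page_sized_segments(8 * MIB, PAGE_SIZE))
```

Under these assumptions a single 8 MiB IO covers four 2 MiB blocks, and
describing it one page at a time would take 16 times as many entries as
a 512 KiB IO does.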