From patchwork Thu May 18 15:01:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13246964 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC7D6C7EE2E for ; Thu, 18 May 2023 15:02:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231361AbjERPCz (ORCPT ); Thu, 18 May 2023 11:02:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231152AbjERPCx (ORCPT ); Thu, 18 May 2023 11:02:53 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F2BDE40; Thu, 18 May 2023 08:02:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-Type: Content-ID:Content-Description:In-Reply-To:References; bh=IPbC7YoIkKqC6t40TCzl8MEpx4iVpXpJL9oG1pZsQSk=; b=ETGJx0BGMQxGLkkG8vwXjcO1Vq Y7F82nXlNSs9UkJ98AVAKrP8FnCn1iSJTBJ39wdHmlcuT57O+AuMT5x2RPzVhawaAElpqe9L1qF+F UQ9ijM93tsKB3aDDtcVONigOcsb/jIvO3dWRCwQPYMyV4BQ1eIXCYfs/n3mFMbhicwxeT/p6LMNtn GzqfQXUVkq96AgKMs9eirs7u+zbDjdEoNrFAg2PdcvpVY4HCBAxx6Zwp/q6k5swh/5N94C+2x7L4S 3V8Fzwwht7cRGgp5vzxp4k3/wEcPnBeAV3cfYN9CI878OUUbSP02nPZvWxIJ8vqUFkaUTl1VLNlgQ YjqpwIpQ==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.96 #2 (Red Hat Linux)) id 1pzf7q-00DGB4-1d; Thu, 18 May 2023 15:01:07 +0000 From: Luis Chamberlain To: corbet@lwn.net, jake@lwn.net, hch@infradead.org, djwong@kernel.org, dchinner@redhat.com Cc: ritesh.list@gmail.com, rgoldwyn@suse.com, jack@suse.cz, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, p.raghav@samsung.com, da.gomez@samsung.com, rohan.puri@samsung.com, mcgrof@kernel.org Subject: [PATCH v2] Documentation: add initial iomap kdoc Date: Thu, 18 May 2023 08:01:05 -0700 Message-Id: <20230518150105.3160445-1-mcgrof@kernel.org> X-Mailer: git-send-email 2.38.1 MIME-Version: 1.0 Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org To help with iomap adoption / porting I set out the goal to try to help improve the iomap documentation and get general guidance for filesystem conversions over from buffer-head in time for this year's LSFMM. The end results thanks to the review of Darrick, Christoph and others is on the kernelnewbies wiki [0]. This brings this forward a relevant subset of that documentation to the kernel in kdoc format and also kdoc'ifies the existing documentation on iomap.h. Tested with: make htmldocs SPHINXDIRS="filesystems" Then looking at the docs produced on: Documentation/output/filesystems/iomap.html [0] https://kernelnewbies.org/KernelProjects/iomap Signed-off-by: Luis Chamberlain --- Changes on v2: * use 80 char length as if we're in the 1980's Documentation/filesystems/index.rst | 1 + Documentation/filesystems/iomap.rst | 253 +++++++++++++++++++++ include/linux/iomap.h | 336 ++++++++++++++++++---------- 3 files changed, 468 insertions(+), 122 deletions(-) create mode 100644 Documentation/filesystems/iomap.rst diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index fbb2b5ada95b..6186ab7c3ea8 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -34,6 +34,7 @@ algorithms work. seq_file sharedsubtree idmappings + iomap automount-support diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst new file mode 100644 index 000000000000..be487030fcff --- /dev/null +++ b/Documentation/filesystems/iomap.rst @@ -0,0 +1,253 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _iomap: + +.. + Mapping of heading styles within this document: + Heading 1 uses "====" above and below + Heading 2 uses "====" + Heading 3 uses "----" + Heading 4 uses "````" + Heading 5 uses "^^^^" + Heading 6 uses "~~~~" + Heading 7 uses "...." + + Sections are manually numbered because apparently that's what everyone + does in the kernel. +.. contents:: Table of Contents + :local: + +===== +iomap +===== + +.. kernel-doc:: include/linux/iomap.h + +A modern block abstraction +========================== + +**iomap** allows filesystems to query storage media for data using *byte +ranges*. Since block mapping are provided for a *byte ranges* for cache data in +memory, in the page cache, naturally this implies operations on block ranges +will also deal with *multipage* operations in the page cache. **Folios** are +used to help provide *multipage* operations in memory for the *byte ranges* +being worked on. + + +iomap IO interfaces +=================== + +You call **iomap** depending on the type of filesystem operation you are working +on. We detail some of these interactions below. + +iomap for bufferred IO writes +----------------------------- + +You call **iomap** for buffered IO with: + + * ``iomap_file_buffered_write()`` - for buffered writes + * ``iomap_page_mkwrite()`` - when dealing callbacks for + ``struct vm_operations_struct`` + + * ``struct vm_operations_struct.page_mkwrite()`` + * ``struct vm_operations_struct.fault()`` + * ``struct vm_operations_struct.huge_fault()`` + * ``struct vm_operations_struct`.pfn_mkwrite()`` + +You *may* use buffered writes to also deal with ``fallocate()``: + + * ``iomap_zero_range()`` on fallocate for zeroing + * ``iomap_truncate_page()`` on fallocate for truncation + +Typically you'd also happen to use these on paths when updating an inode's size. + +iomap for direct IO +------------------- + +You call **iomap** for direct IO with: + + * ``iomap_dio_rw()`` + +You **may** use direct IO writes to also deal with ``fallocate()``: + + * ``iomap_zero_range()`` on fallocate for zeroing + * ``iomap_truncate_page()`` on fallocate for truncation + +Typically you'd also happen to use these on paths when updating an inode's size. + +iomap for reads +--------------- + +You can call into **iomap** for reading, ie, dealing with the filesystems's +``struct file_operations``: + + * ``struct file_operations.read_iter()``: note that depending on the type of + read your filesystem might use ``iomap_dio_rw()`` for direct IO, + generic_file_read_iter() for buffered IO and + ``dax_iomap_rw()`` for DAX. + * ``struct file_operations.remap_file_range()`` - currently the special + ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads. + +iomap for userspace file extent mapping +--------------------------------------- + +The ``fiemap`` ioctl can be used to allow userspace to get a file extent +mapping. The older ``bmap()`` (aka ``FIBMAP``) allows the VM to map logical +block offset to physical block number. ``bmap()`` is a legacy block mapping +operation supported only for the ioctl and two areas in the kernel which likely +are broken (the default swapfile implementation and odd md bitmap code). +``bmap()`` was only useful in the days of ext2 when there were no support for +delalloc or unwritten extents. Consequently, the interface reports nothing for +those types of mappings. Because of this we don't want filesystems to start +exporting this interface if they don't already do so. + +The ``fiemap`` ioctl is supported through an inode ``struct +inode_operations.fiemap()`` callback. + +You would use ``iomap_fiemap()`` to provide the mapping. You could use two +seperate ``struct iomap_ops`` one for when requested to also map extended +attributes (``FIEMAP_FLAG_XATTR``) and your another ``struct iomap_ops`` for +regular read ``struct iomap_ops`` when there is no need for extended attributes. +In the future **iomap** may provide its own dedicated ops structure for +``fiemap``. + +``iomap_bmap()`` exists and should *only be used* by filesystems that +**already** supported ``FIBMAP``. ``FIBMAP`` **should not be used** with the +address_space -- we have iomap readpages and writepages for that. + +iomap for assisting the VFS +--------------------------- + +A filesystem also needs to call **iomap** when assisting the VFS manipulating a +file into the page cache. + +iomap for VFS reading +--------------------- + +A filesystem can call **iomap** to deal with the VFS reading a file into folios +with: + + * ``iomap_bmap()`` - called to assist the VFS when manipulating page cache with + ``struct address_space_operations.bmap()``, to help the VFS map a logical + block offset to physical block number. + * ``iomap_read_folio()`` - called to assist the page cache with + ``struct address_space_operations.read_folio()`` + * ``iomap_readahead()`` - called to assist the page cache with + ``struct address_space_operations.readahead()`` + +iomap for VFS writepages +------------------------ + +A filesystem can call **iomap** to deal with the VFS write out of pages back to +backing store, that is to help deal with a filesystems's ``struct +address_space_operations.writepages()``. The special ``iomap_writepages()`` is +used for this case with its own respective filestems's ``struct iomap_ops`` for +this. + +iomap for VFS llseek +-------------------- + +A filesystem ``struct address_space_operations.llseek()`` is used by the VFS +when it needs to move the current file offset, the file offset is in ``struct +file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE`` or +``SEEK_DATA`` interfaces: + + * ``iomap_seek_hole()``: for when the + ``struct address_space_operations.llseek()`` *whence* argument is + ``SEEK_HOLE``, when looking for the file's next hole. + * ``iomap_seek_data()``: for when the + ``struct address_space_operations.llseek()`` *whence* argument isj + ``SEEK_DATA`` when looking for the file's next data area. + +Your own ``struct iomap_ops`` for this is encouraged. + +iomap for DAX +------------- +You can use ``dax_iomap_rw()`` when calling iomap from a DAX context, this is +typically from the filesystems's ``struct file_operations.write_iter()`` +callback. + +Converting filesystems from buffer-head to iomap guide +====================================================== + +These are generic guidelines on converting a filesystem over to **iomap** from +'''buffer-heads'''. + +One op at at time +----------------- + +You may try to convert a filesystem with different clustered set of operations +at time, below are a generic order you may strive to target: + + * direct io + * miscellaneous helpers (seek/fiemap/bmap) + * buffered io + +Defining a simple filesystem +---------------------------- + +A simple filesystem is perhaps the easiest to convert over to **iomap**, a +simple filesystem is one which: + + * does not use fsverify, fscrypt, compression + * has no Copy on Write support (reflinks) + +Converting a simple filesystem to iomap +--------------------------------------- + +Simple filesystems should covert to IOMAP piecemeal wise first converting over +**direct IO**, then the miscellaneous helpers (seek/fiemap/bmap) and last +should be buffered IO. + +Converting shared filesystem features +------------------------------------- + +Shared filesystems features such as fscrypt, compression, erasure coding, and +any other data transformations need to be ported to **iomap** first, as none of +the current **iomap** users require any of this functionality. + +Converting complex filesystems +------------------------------ + +If your filesystem relies on any shared filesystem features mentioned above +those would need to be converted piecemeal wise. If reflinks are supported you +need to first ensure proper locking sanity in order to be able to address byte +ranges can be handled properly through **iomap** operations. An example +filesystem where this work is taking place is btrfs. + +IOMAP_F_BUFFER_HEAD considerations +---------------------------------- + +``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystem fully +converted away from **buffer-heads**, and this could be never. + +``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone / to port +filesystems over to **iomap** as it's support for **buffer-heads** only apply to +the buffered write path and nothing else including the read_folio/readahead and +writepages aops. + +Testing Direct IO +================= + +Other than fstests you can use LTP's dio, however this tests is limited as it +does not test stale data. + +{{{ +./runltp -f dio -d /mnt1/scratch/tmp/ +}}} + +Known issues and future improvements +==================================== + +We try to document known issues that folks should be aware of with **iomap** here. + + * write amplification on IOMAP when bs < ps: **iomap** needs improvements for + large folios for dirty bitmap tracking + * filesystems which use buffer head helpers such as ``sb_bread()`` and friends + will have to continue to use buffer heads as there is no generic iomap + metadata read/write library yet. + +References +========== + + * `Presentation on iomap evolution`` + * `LWN review on deprecating buffer-heads ` diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e2b836c2e119..ee4b026995ac 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -10,6 +10,30 @@ #include #include +/** + * DOC: Introduction + * + * iomap allows filesystems to sequentially iterate over byte addressable block + * ranges on an inode and apply operations to it. + * + * iomap grew out of the need to provide a modern block mapping abstraction for + * filesystems with the different IO access methods they support and assisting + * the VFS with manipulating files into the page cache. iomap helpers are + * provided for each of these mechanisms. However, block mapping is just one of + * the features of iomap, given iomap supports DAX IO for filesystems and also + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` + * interfaces. + * + * Block mapping provides a mapping between data cached in memory and the + * location on persistent storage where that data lives. `LWN has an great + * review of the old buffer-heads block-mapping and why they are inefficient + * `, since the inception of Linux. Since + * **buffer-heads** work on a 512-byte block based paradigm, it creates an + * overhead for modern storage media which no longer necessarily works only on + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with + * the support of folios, provides a modern replacement for **buffer-heads**. + */ + struct address_space; struct fiemap_extent_info; struct inode; @@ -22,37 +46,43 @@ struct page; struct vm_area_struct; struct vm_fault; -/* - * Types of block ranges for iomap mappings: +/** + * DOC: iomap block ranges types + * + * * IOMAP_HOLE - no blocks allocated, need allocation + * * IOMAP_DELALLOC - delayed allocation blocks + * * IOMAP_MAPPED - blocks allocated at @addr + * * IOMAP_UNWRITTEN - blocks allocated at @addr in unwritten state + * * IOMAP_INLINE - data inline in the inode */ -#define IOMAP_HOLE 0 /* no blocks allocated, need allocation */ -#define IOMAP_DELALLOC 1 /* delayed allocation blocks */ -#define IOMAP_MAPPED 2 /* blocks allocated at @addr */ -#define IOMAP_UNWRITTEN 3 /* blocks allocated at @addr in unwritten state */ -#define IOMAP_INLINE 4 /* data inline in the inode */ +#define IOMAP_HOLE 0 +#define IOMAP_DELALLOC 1 +#define IOMAP_MAPPED 2 +#define IOMAP_UNWRITTEN 3 +#define IOMAP_INLINE 4 -/* - * Flags reported by the file system from iomap_begin: +/** + * DOC: Flags reported by the file system from iomap_begin * - * IOMAP_F_NEW indicates that the blocks have been newly allocated and need - * zeroing for areas that no data is copied to. + * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and need + * zeroing for areas that no data is copied to. * - * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access - * written data and requires fdatasync to commit them to persistent storage. - * This needs to take into account metadata changes that *may* be made at IO - * completion, such as file size updates from direct IO. + * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access + * written data and requires fdatasync to commit them to persistent storage. + * This needs to take into account metadata changes that *may* be made at IO + * completion, such as file size updates from direct IO. * - * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be - * unshared as part a write. + * * IOMAP_F_SHARED: indicates that the blocks are shared, and will need to be + * unshared as part a write. * - * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block - * mappings. + * * IOMAP_F_MERGED: indicates that the iomap contains the merge of multiple + * block mappings. * - * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of - * buffer heads for this mapping. + * * IOMAP_F_BUFFER_HEAD: indicates that the file system requires the use of + * buffer heads for this mapping. * - * IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent - * rather than a file data extent. + * * IOMAP_F_XATTR: indicates that the iomap is for an extended attribute extent + * rather than a file data extent. */ #define IOMAP_F_NEW (1U << 0) #define IOMAP_F_DIRTY (1U << 1) @@ -61,22 +91,20 @@ struct vm_fault; #define IOMAP_F_BUFFER_HEAD (1U << 4) #define IOMAP_F_XATTR (1U << 5) -/* - * Flags set by the core iomap code during operations: +/** + * DOC: Flags set by the core iomap code during operations + * + * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size + * has changed as the result of this write operation. * - * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size - * has changed as the result of this write operation. + * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file + * range it covers needs to be remapped by the high level before the + * operation can proceed. * - * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file - * range it covers needs to be remapped by the high level before the operation - * can proceed. + * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage */ #define IOMAP_F_SIZE_CHANGED (1U << 8) #define IOMAP_F_STALE (1U << 9) - -/* - * Flags from 0x1000 up are for file system specific usage: - */ #define IOMAP_F_PRIVATE (1U << 12) @@ -124,73 +152,119 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap) return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data); } -/* - * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio - * and put_folio will be called for each folio written to. This only applies - * to buffered writes as unbuffered writes will not typically have folios - * associated with them. - * - * When get_folio succeeds, put_folio will always be called to do any - * cleanup work necessary. put_folio is responsible for unlocking and putting - * @folio. +/** + * struct iomap_folio_ops - buffered writes folio folio reference count helpers + * + * A filesystem can optionally set folio_ops in a &struct iomap mapping it + * returns to override the default get_folio and put_folio for each folio + * written to. This only applies to buffered writes as unbuffered writes will + * not typically have folios associated with them. + * + * @get_folio: iomap defaults to iomap_get_folio() (which calls + * __filemap_get_folio()) if the filesystem did not provide a get folio op. + * + * @put_folio: when get_folio succeeds, put_folio will always be called to do + * any cleanup work necessary. put_folio is responsible for unlocking and + * putting @folio. + * + * @iomap_valid: check that the cached iomap still maps correctly to the + * filesystem's internal extent map. FS internal extent maps can change + * while iomap is iterating a cached iomap, so this hook allows iomap to + * detect that the iomap needs to be refreshed during a long running write operation. + * + * The filesystem can store internal state (e.g. a sequence number) in + * iomap->validity_cookie when the iomap is first mapped to be able to + * detect changes between mapping time and whenever .iomap_valid() is + * called. + * + * This is called with the folio over the specified file position held + * locked by the iomap code. This is useful for filesystems that have + * dynamic mappings (e.g. anything other than zonefs). An example reason + * as to why this is necessary is writeback doesn't take the vfs locks. */ struct iomap_folio_ops { struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos, unsigned len); void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied, struct folio *folio); - - /* - * Check that the cached iomap still maps correctly to the filesystem's - * internal extent map. FS internal extent maps can change while iomap - * is iterating a cached iomap, so this hook allows iomap to detect that - * the iomap needs to be refreshed during a long running write - * operation. - * - * The filesystem can store internal state (e.g. a sequence number) in - * iomap->validity_cookie when the iomap is first mapped to be able to - * detect changes between mapping time and whenever .iomap_valid() is - * called. - * - * This is called with the folio over the specified file position held - * locked by the iomap code. - */ bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap); }; -/* - * Flags for iomap_begin / iomap_end. No flag implies a read. +/** + * DOC: Flags for iomap_begin / iomap_end. No flag implies a read. + * + * * IOMAP_WRITE: writing, must allocate blocks + * * IOMAP_ZERO: zeroing operation, may skip holes + * * IOMAP_REPORT: report extent status, e.g. FIEMAP + * * IOMAP_FAULT: mapping for page fault + * * IOMAP_DIRECT: direct I/O + * * IOMAP_NOWAIT: do not block + * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed + * * IOMAP_UNSHARE: unshare_file_range + * * IOMAP_DAX: DAX mapping */ -#define IOMAP_WRITE (1 << 0) /* writing, must allocate blocks */ -#define IOMAP_ZERO (1 << 1) /* zeroing operation, may skip holes */ -#define IOMAP_REPORT (1 << 2) /* report extent status, e.g. FIEMAP */ -#define IOMAP_FAULT (1 << 3) /* mapping for page fault */ -#define IOMAP_DIRECT (1 << 4) /* direct I/O */ -#define IOMAP_NOWAIT (1 << 5) /* do not block */ -#define IOMAP_OVERWRITE_ONLY (1 << 6) /* only pure overwrites allowed */ -#define IOMAP_UNSHARE (1 << 7) /* unshare_file_range */ +#define IOMAP_WRITE (1 << 0) +#define IOMAP_ZERO (1 << 1) +#define IOMAP_REPORT (1 << 2) +#define IOMAP_FAULT (1 << 3) +#define IOMAP_DIRECT (1 << 4) +#define IOMAP_NOWAIT (1 << 5) +#define IOMAP_OVERWRITE_ONLY (1 << 6) +#define IOMAP_UNSHARE (1 << 7) #ifdef CONFIG_FS_DAX -#define IOMAP_DAX (1 << 8) /* DAX mapping */ +#define IOMAP_DAX (1 << 8) #else #define IOMAP_DAX 0 #endif /* CONFIG_FS_DAX */ +/** + * struct iomap_ops - IO interface specific operations + * + * A filesystem is must provide a &struct iomap_ops for to deal with the + * beginning an IO operation, iomap_begin(), and ending an IO operation on a + * block range, ``iomap_end()``. You would call iomap with a specialized iomap + * operation depending on its filesystem or the VFS needs. + * + * For example iomap_dio_rw() would be used for for a filesystem when doing a + * block range read or write operation with direct IO. In this case your + * filesystem's respective &struct file_operations.write_iter() would eventually + * call iomap_dio_rw() on the filesystem's &struct file_operations.write_iter(). + * + * For buffered IO a filesystem would use iomap_file_buffered_write() on the + * same &struct file_operations.write_iter(). But that is not the only situation + * in which a filesystem would deal with buffered writes, you could also use + * buffered writes when a filesystem has to deal with &struct + * file_operations.fallocate(). However fallocate() can be used for *zeroing* or + * for *truncation* purposes. A special respective iomap_zero_range() would be + * used for *zeroing* and a iomap_truncate_page() would be used for + * *truncation*. + * + * Experience with adopting iomap on filesystems have shown that the filesystem + * implementation of these operations can be simplified considerably if one + * &struct iomap_ops is provided per major filesystem IO operation: + * + * * buffered io + * * direct io + * * DAX io + * * fiemap for with extended attributes (``FIEMAP_FLAG_XATTR``) + * * lseek + * + * @iomap_begin: return the existing mapping at pos, or reserve space starting + * at pos for up to length, as long as we can do it as a single mapping. The + * actual length is returned in iomap->length. The &struct iomap iomap must + * always be set. The &struct iomap srcmap should be set if the range is + * CoW. + * + * @iomap_end: commit and/or unreserve space previous allocated using + * iomap_begin. Written indicates the length of the successful write + * operation which needs to be committed, while the rest needs to be + * unreserved. Written might be zero if no data was written. + */ struct iomap_ops { - /* - * Return the existing mapping at pos, or reserve space starting at - * pos for up to length, as long as we can do it as a single mapping. - * The actual length is returned in iomap->length. - */ int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length, unsigned flags, struct iomap *iomap, struct iomap *srcmap); - /* - * Commit and/or unreserve space previous allocated using iomap_begin. - * Written indicates the length of the successful write operation which - * needs to be commited, while the rest needs to be unreserved. - * Written might be zero if no data was written. - */ int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length, ssize_t written, unsigned flags, struct iomap *iomap); }; @@ -207,6 +281,7 @@ struct iomap_ops { * @flags: Zero or more of the iomap_begin flags above. * @iomap: Map describing the I/O iteration * @srcmap: Source map for COW operations + * @private: internal use */ struct iomap_iter { struct inode *inode; @@ -241,7 +316,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter) * @i: iteration structure * * Write operations on file systems with reflink support might require a - * source and a destination map. This function retourns the source map + * source and a destination map. This function returns the source map * for a given operation, which may or may no be identical to the destination * map in &i->iomap. */ @@ -281,42 +356,52 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset, sector_t iomap_bmap(struct address_space *mapping, sector_t bno, const struct iomap_ops *ops); -/* - * Structure for writeback I/O completions. +/** + * struct iomap_ioend - for writeback I/O completions + * + * @io_list: next ioend in chain + * @io_type: + * @io_flags: IOMAP_F_* + * @io_folios: folios added to ioend + * @io_inode: file being written to + * @io_size: size of the extent + * @io_offset: offset in the file + * @io_sector: start sector of ioend + * @io_bio: bio being built + * @io_inline_bio: MUST BE LAST! */ struct iomap_ioend { - struct list_head io_list; /* next ioend in chain */ + struct list_head io_list; u16 io_type; - u16 io_flags; /* IOMAP_F_* */ - u32 io_folios; /* folios added to ioend */ - struct inode *io_inode; /* file being written to */ - size_t io_size; /* size of the extent */ - loff_t io_offset; /* offset in the file */ - sector_t io_sector; /* start sector of ioend */ - struct bio *io_bio; /* bio being built */ - struct bio io_inline_bio; /* MUST BE LAST! */ + u16 io_flags; + u32 io_folios; + struct inode *io_inode; + size_t io_size; + loff_t io_offset + sector_t io_sector; + struct bio *io_bio; + struct bio io_inline_bio; }; +/** + * struct iomap_writeback_ops - used for writeback + * + * This structure is used to support dealing with a filesystem + * ``struct address_space_operations.writepages()``, for writeback. + * + * @map_blocks: required, maps the blocks so that writeback can be performed on + * the range starting at offset. + * @prepare_ioend: optional, allows the file systems to perform actions just + * before submitting the bio and/or override the bio end_io handler for + * complex operations like copy on write extent manipulation or unwritten + * extent conversions. + * @discard_folio: optional, allows the file system to discard state on a page where + * we failed to submit any I/O. + */ struct iomap_writeback_ops { - /* - * Required, maps the blocks so that writeback can be performed on - * the range starting at offset. - */ int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, loff_t offset); - - /* - * Optional, allows the file systems to perform actions just before - * submitting the bio and/or override the bio end_io handler for complex - * operations like copy on write extent manipulation or unwritten extent - * conversions. - */ int (*prepare_ioend)(struct iomap_ioend *ioend, int status); - - /* - * Optional, allows the file system to discard state on a page where - * we failed to submit any I/O. - */ void (*discard_folio)(struct folio *folio, loff_t pos); }; @@ -334,26 +419,33 @@ int iomap_writepages(struct address_space *mapping, struct writeback_control *wbc, struct iomap_writepage_ctx *wpc, const struct iomap_writeback_ops *ops); -/* - * Flags for direct I/O ->end_io: +/** + * DOC: Flags for direct I/O ->end_io + * + * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s) + * * IOMAP_DIO_COW: covers COW extent(s) */ -#define IOMAP_DIO_UNWRITTEN (1 << 0) /* covers unwritten extent(s) */ -#define IOMAP_DIO_COW (1 << 1) /* covers COW extent(s) */ +#define IOMAP_DIO_UNWRITTEN (1 << 0) +#define IOMAP_DIO_COW (1 << 1) +/** + * struct iomap_dio_ops - used for direct IO + * + * This is used to support direct IO. + * + * @end_io: + * @submit_io: + * @bio_set: Filesystems wishing to attach private information to a direct io + * bio must provide a ->submit_io method that attaches the additional + * information to the bio and changes the ->bi_end_io callback to a custom + * function. This function should, at a minimum, perform any relevant + * post-processing of the bio and end with a call to iomap_dio_bio_end_io. + */ struct iomap_dio_ops { int (*end_io)(struct kiocb *iocb, ssize_t size, int error, unsigned flags); void (*submit_io)(const struct iomap_iter *iter, struct bio *bio, loff_t file_offset); - - /* - * Filesystems wishing to attach private information to a direct io bio - * must provide a ->submit_io method that attaches the additional - * information to the bio and changes the ->bi_end_io callback to a - * custom function. This function should, at a minimum, perform any - * relevant post-processing of the bio and end with a call to - * iomap_dio_bio_end_io. - */ struct bio_set *bio_set; };