
[v2] Documentation: add initial iomap kdoc

Message ID 20230518150105.3160445-1-mcgrof@kernel.org (mailing list archive)
State New, archived

Commit Message

Luis Chamberlain May 18, 2023, 3:01 p.m. UTC
To help with iomap adoption / porting I set out to improve the iomap
documentation and provide general guidance for filesystem conversions
over from buffer-heads in time for this year's LSFMM. The end result,
thanks to the reviews of Darrick, Christoph and others, is on the
kernelnewbies wiki [0].

This brings forward a relevant subset of that documentation to the
kernel in kdoc format and also kdoc'ifies the existing documentation
in iomap.h.

Tested with:

make htmldocs SPHINXDIRS="filesystems"

Then looking at the docs produced on:

Documentation/output/filesystems/iomap.html

[0] https://kernelnewbies.org/KernelProjects/iomap

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---

Changes on v2:

  * use 80 char length as if we're in the 1980's

 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/iomap.rst | 253 +++++++++++++++++++++
 include/linux/iomap.h               | 336 ++++++++++++++++++----------
 3 files changed, 468 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/filesystems/iomap.rst

Comments

Randy Dunlap May 18, 2023, 3:49 p.m. UTC | #1
On 5/18/23 08:01, Luis Chamberlain wrote:
> To help with iomap adoption / porting I set out the goal to try to
> help improve the iomap documentation and get general guidance for
> filesystem conversions over from buffer-head in time for this year's
> LSFMM. The end results thanks to the review of Darrick, Christoph and
> others is on the kernelnewbies wiki [0].
> 
> This brings this forward a relevant subset of that documentation to
> the kernel in kdoc format and also kdoc'ifies the existing documentation
> on iomap.h.
> 
> Tested with:
> 
> make htmldocs SPHINXDIRS="filesystems"
> 
> Then looking at the docs produced on:
> 
> Documentation/output/filesystems/iomap.html
> 
> [0] https://kernelnewbies.org/KernelProjects/iomap
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> Changes on v2:
> 
>   * use 80 char length as if we're in the 1980's

Well, like Jon said, long lines are difficult to read, even on printed paper.
That's (at least one reason) why newspapers(!) have narrow columns of print.

Anyway, thanks for doing it.

> 
>  Documentation/filesystems/index.rst |   1 +
>  Documentation/filesystems/iomap.rst | 253 +++++++++++++++++++++
>  include/linux/iomap.h               | 336 ++++++++++++++++++----------
>  3 files changed, 468 insertions(+), 122 deletions(-)
>  create mode 100644 Documentation/filesystems/iomap.rst
> 
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index fbb2b5ada95b..6186ab7c3ea8 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -34,6 +34,7 @@ algorithms work.
>     seq_file
>     sharedsubtree
>     idmappings
> +   iomap
>  
>     automount-support
>  
> diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst
> new file mode 100644
> index 000000000000..be487030fcff
> --- /dev/null
> +++ b/Documentation/filesystems/iomap.rst
> @@ -0,0 +1,253 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _iomap:
> +
> +..
> +        Mapping of heading styles within this document:
> +        Heading 1 uses "====" above and below
> +        Heading 2 uses "===="
> +        Heading 3 uses "----"
> +        Heading 4 uses "````"
> +        Heading 5 uses "^^^^"
> +        Heading 6 uses "~~~~"
> +        Heading 7 uses "...."
> +
> +        Sections are manually numbered because apparently that's what everyone
> +        does in the kernel.
> +.. contents:: Table of Contents
> +   :local:
> +
> +=====
> +iomap
> +=====
> +
> +.. kernel-doc:: include/linux/iomap.h
> +
> +A modern block abstraction
> +==========================
> +
> +**iomap** allows filesystems to query storage media for data using *byte
> +ranges*. Since block mapping are provided for a *byte ranges* for cache data in

                        mappings

> +memory, in the page cache, naturally this implies operations on block ranges
> +will also deal with *multipage* operations in the page cache. **Folios** are
> +used to help provide *multipage* operations in memory for the *byte ranges*
> +being worked on.
> +
> +
> +iomap IO interfaces
> +===================
> +
> +You call **iomap** depending on the type of filesystem operation you are working
> +on. We detail some of these interactions below.
> +
> +iomap for bufferred IO writes

             buffered

> +-----------------------------
> +
> +You call **iomap** for buffered IO with:
> +
> + * ``iomap_file_buffered_write()`` - for buffered writes
> + * ``iomap_page_mkwrite()`` - when dealing callbacks for

                                      dealing with callbacks for

> +    ``struct vm_operations_struct``
> +
> +  * ``struct vm_operations_struct.page_mkwrite()``
> +  * ``struct vm_operations_struct.fault()``
> +  * ``struct vm_operations_struct.huge_fault()``
> +  * ``struct vm_operations_struct.pfn_mkwrite()``

What are these 4 structs for?  Or why the blank line above them?
Confusing.

> +
> +You *may* use buffered writes to also deal with ``fallocate()``:
> +
> + * ``iomap_zero_range()`` on fallocate for zeroing
> + * ``iomap_truncate_page()`` on fallocate for truncation
> +
> +Typically you'd also happen to use these on paths when updating an inode's size.
> +
> +iomap for direct IO
> +-------------------
> +
> +You call **iomap** for direct IO with:
> +
> + * ``iomap_dio_rw()``
> +
> +You **may** use direct IO writes to also deal with ``fallocate()``:
> +
> + * ``iomap_zero_range()`` on fallocate for zeroing
> + * ``iomap_truncate_page()`` on fallocate for truncation
> +
> +Typically you'd also happen to use these on paths when updating an inode's size.

                   also use these on paths ...

> +
> +iomap for reads
> +---------------
> +
> +You can call into **iomap** for reading, ie, dealing with the filesystem's

                                            i.e.,

> +``struct file_operations``:
> +
> + * ``struct file_operations.read_iter()``: note that depending on the type of
> +   read your filesystem might use ``iomap_dio_rw()`` for direct IO,
> +   ``generic_file_read_iter()`` for buffered IO and
> +   ``dax_iomap_rw()`` for DAX.
> + * ``struct file_operations.remap_file_range()`` - currently the special
> +   ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads.
> +
> +iomap for userspace file extent mapping
> +---------------------------------------
> +
> +The ``fiemap`` ioctl can be used to allow userspace to get a file extent
> +mapping. The older ``bmap()`` (aka ``FIBMAP``) allows the VM to map a logical
> +block offset to a physical block number.  ``bmap()`` is a legacy block mapping
> +operation supported only for the ioctl and two areas in the kernel which are
> +likely broken (the default swapfile implementation and odd md bitmap code).
> +``bmap()`` was only useful in the days of ext2 when there was no support for
> +delalloc or unwritten extents. Consequently, the interface reports nothing for
> +those types of mappings. Because of this we don't want filesystems to start
> +exporting this interface if they don't already do so.
> +
> +The ``fiemap`` ioctl is supported through an inode ``struct
> +inode_operations.fiemap()`` callback.
> +
> +You would use ``iomap_fiemap()`` to provide the mapping. You could use two
> +seperate ``struct iomap_ops`` one for when requested to also map extended

   separate

> +attributes (``FIEMAP_FLAG_XATTR``) and your another ``struct iomap_ops`` for

                                          yet

> +regular read ``struct iomap_ops`` when there is no need for extended attributes.
> +In the future **iomap** may provide its own dedicated ops structure for
> +``fiemap``.
> +
> +``iomap_bmap()`` exists and should *only be used* by filesystems that
> +**already** supported ``FIBMAP``.  ``FIBMAP`` **should not be used** with the

               support

> +address_space -- we have iomap readpages and writepages for that.
> +
> +iomap for assisting the VFS
> +---------------------------
> +
> +A filesystem also needs to call **iomap** when assisting the VFS with
> +manipulating a file in the page cache.
> +
> +iomap for VFS reading
> +---------------------
> +
> +A filesystem can call **iomap** to deal with the VFS reading a file into folios
> +with:
> +
> + * ``iomap_bmap()`` - called to assist the VFS when manipulating page cache with
> +   ``struct address_space_operations.bmap()``, to help the VFS map a logical
> +   block offset to physical block number.
> + * ``iomap_read_folio()`` - called to assist the page cache with
> +   ``struct address_space_operations.read_folio()``
> + * ``iomap_readahead()`` - called to assist the page cache with
> +   ``struct address_space_operations.readahead()``
> +
> +iomap for VFS writepages
> +------------------------
> +
> +A filesystem can call **iomap** to deal with the VFS write out of pages back
> +to the backing store, that is, to help deal with a filesystem's ``struct
> +address_space_operations.writepages()``. ``iomap_writepages()`` is used for
> +this case with the filesystem's own respective ``struct iomap_ops``.
> +
> +iomap for VFS llseek
> +--------------------
> +
> +A filesystem ``struct address_space_operations.llseek()`` is used by the VFS
> +when it needs to move the current file offset, the file offset is in ``struct

                                          offset;
or
                                          offset. The

> +file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE`` or
> +``SEEK_DATA`` interfaces:
> +
> + * ``iomap_seek_hole()``: for when the
> +   ``struct address_space_operations.llseek()`` *whence* argument is
> +   ``SEEK_HOLE``, when looking for the file's next hole.
> + * ``iomap_seek_data()``: for when the
> +   ``struct address_space_operations.llseek()`` *whence* argument isj

                                                                     is

> +   ``SEEK_DATA`` when looking for the file's next data area.
> +
> +Your own ``struct iomap_ops`` for this is encouraged.
> +
> +iomap for DAX
> +-------------
> +You can use ``dax_iomap_rw()`` when calling iomap from a DAX context, this is

                                                                context. This is

> +typically from the filesystem's ``struct file_operations.write_iter()``
> +callback.
> +
> +Converting filesystems from buffer-head to iomap guide
> +======================================================
> +
> +These are generic guidelines on converting a filesystem over to **iomap** from
> +**buffer-heads**.
> +
> +One op at a time
> +-----------------
> +
> +You may try to convert a filesystem with different clustered set of operations
> +at time, below are a generic order you may strive to target:

  on at a time. Below is

> +
> + * direct io

             IO

> + * miscellaneous helpers (seek/fiemap/bmap)
> + * buffered io

               IO

> +
> +Defining a simple filesystem
> +----------------------------
> +
> +A simple filesystem is perhaps the easiest to convert over to **iomap**, a

                                                                 **iomap**. A

> +simple filesystem is one which:
> +
> + * does not use fsverity, fscrypt, compression
> + * has no Copy on Write support (reflinks)
> +
> +Converting a simple filesystem to iomap
> +---------------------------------------
> +
> +Simple filesystems should convert to IOMAP piecemeal wise first converting over

                                                       wise, first

> +**direct IO**, then the miscellaneous helpers  (seek/fiemap/bmap) and last

   No need for **..** on direct IO if not also used on buffered IO below.

s/  / /

> +should be buffered IO.
> +
> +Converting shared filesystem features
> +-------------------------------------
> +
> +Shared filesystem features such as fscrypt, compression, erasure coding, and
> +any other data transformations need to be ported to **iomap** first, as none of
> +the current **iomap** users require any of this functionality.
> +
> +Converting complex filesystems
> +------------------------------
> +
> +If your filesystem relies on any shared filesystem features mentioned above

                                                                         above,

> +those would need to be converted piecemeal wise. If reflinks are supported you

IMO "wise" is not needed here or above when piecemeal is used.

> +need to first ensure proper locking sanity so that byte ranges can be
> +handled properly through **iomap** operations.  An example
> +filesystem where this work is taking place is btrfs.
> +
> +IOMAP_F_BUFFER_HEAD considerations
> +----------------------------------
> +
> +``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystems fully
> +converted away from **buffer-heads**, and that may never happen.
> +
> +``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone to port
> +filesystems over to **iomap** as it's support for **buffer-heads** only apply to

                                    its                                    applies to

> +the buffered write path and nothing else including the read_folio/readahead and
> +writepages aops.
> +
> +Testing Direct IO
> +=================
> +
> +Other than fstests you can use LTP's dio, however this tests is limited as it

                                        dio. However, this test

> +does not test stale data.
> +
> +::
> +
> +  ./runltp -f dio -d /mnt1/scratch/tmp/
> +
> +Known issues and future improvements
> +====================================
> +
> +Known issues that folks should be aware of with **iomap** are documented here.
> +
> + * write amplification on IOMAP when bs < ps: **iomap** needs improvements for

Is that buffer size < page size?  Preferably don't use such cryptic abbreviations.

> +   large folios for dirty bitmap tracking
> + * filesystems which use buffer head helpers such as ``sb_bread()`` and friends
> +   will have to continue to use buffer heads as there is no generic iomap
> +   metadata read/write library yet.
> +
> +References
> +==========
> +
> +  * `Presentation on iomap evolution <https://docs.google.com/presentation/d/e/2PACX-1vSN4TmhiTu1c6HNv6_gJZFqbFZpbF7GkABllSwJw5iLnSYKkkO-etQJ3AySYEbgJA/pub?start=true&loop=false&delayms=3000&slide=id.g189cfd05063_0_185>`_
> +  * `LWN review on deprecating buffer-heads <https://lwn.net/Articles/930173/>`_
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e2b836c2e119..ee4b026995ac 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -10,6 +10,30 @@
>  #include <linux/mm_types.h>
>  #include <linux/blkdev.h>
>  
> +/**
> + * DOC: Introduction
> + *
> + * iomap allows filesystems to sequentially iterate over byte-addressable block
> + * ranges of an inode and apply operations to them.
> + *
> + * iomap grew out of the need to provide a modern block mapping abstraction for
> + * filesystems with the different IO access methods they support and assisting
> + * the VFS with manipulating files into the page cache. iomap helpers are
> + * provided for each of these mechanisms. However, block mapping is just one of
> + * the features of iomap, given iomap supports DAX IO for filesystems and also
> + * supports the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE``
> + * interfaces.
> + *
> + * Block mapping provides a mapping between data cached in memory and the
> + * location on persistent storage where that data lives. `LWN has a great
> + * review of the old buffer-head block-mapping, in use since the inception of
> + * Linux, and why it is inefficient <https://lwn.net/Articles/930173/>`_. Since
> + * **buffer-heads** work on a 512-byte block based paradigm, it creates an

                                          block-based

> + * overhead for modern storage media which no longer necessarily works only on
> + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with

      512-byte blocks.

> + * the support of folios, provides a modern replacement for **buffer-heads**.
> + */
> +
>  struct address_space;
>  struct fiemap_extent_info;
>  struct inode;
> @@ -22,37 +46,43 @@ struct page;
>  struct vm_area_struct;
>  struct vm_fault;
>  
> -/*
> - * Types of block ranges for iomap mappings:
> +/**
> + * DOC: iomap block ranges types
> + *
> + * * IOMAP_HOLE		- no blocks allocated, need allocation
> + * * IOMAP_DELALLOC	- delayed allocation blocks
> + * * IOMAP_MAPPED	- blocks allocated at @addr
> + * * IOMAP_UNWRITTEN	- blocks allocated at @addr in unwritten state
> + * * IOMAP_INLINE	- data inline in the inode
>   */
> -#define IOMAP_HOLE	0	/* no blocks allocated, need allocation */
> -#define IOMAP_DELALLOC	1	/* delayed allocation blocks */
> -#define IOMAP_MAPPED	2	/* blocks allocated at @addr */
> -#define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
> -#define IOMAP_INLINE	4	/* data inline in the inode */
> +#define IOMAP_HOLE	0
> +#define IOMAP_DELALLOC	1
> +#define IOMAP_MAPPED	2
> +#define IOMAP_UNWRITTEN	3
> +#define IOMAP_INLINE	4
>  
> -/*
> - * Flags reported by the file system from iomap_begin:
> +/**
> + * DOC: Flags reported by the file system from iomap_begin
>   *
> - * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
> - * zeroing for areas that no data is copied to.
> + * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and need
> + *	zeroing for areas that no data is copied to.
>   *
> - * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
> - * written data and requires fdatasync to commit them to persistent storage.
> - * This needs to take into account metadata changes that *may* be made at IO
> - * completion, such as file size updates from direct IO.
> + * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access
> + *	written data and requires fdatasync to commit them to persistent storage.
> + *	This needs to take into account metadata changes that *may* be made at IO
> + *	completion, such as file size updates from direct IO.
>   *
> - * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
> - * unshared as part a write.
> + * * IOMAP_F_SHARED: indicates that the blocks are shared, and will need to be
> + *	unshared as part of a write.
>   *
> - * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
> - * mappings.
> + * * IOMAP_F_MERGED: indicates that the iomap contains the merge of multiple
> + *	block mappings.
>   *
> - * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
> - * buffer heads for this mapping.
> + * * IOMAP_F_BUFFER_HEAD: indicates that the file system requires the use of
> + *	buffer heads for this mapping.
>   *
> - * IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent
> - * rather than a file data extent.
> + * * IOMAP_F_XATTR: indicates that the iomap is for an extended attribute extent
> + *	rather than a file data extent.
>   */
>  #define IOMAP_F_NEW		(1U << 0)
>  #define IOMAP_F_DIRTY		(1U << 1)
> @@ -61,22 +91,20 @@ struct vm_fault;
>  #define IOMAP_F_BUFFER_HEAD	(1U << 4)
>  #define IOMAP_F_XATTR		(1U << 5)
>  
> -/*
> - * Flags set by the core iomap code during operations:
> +/**
> + * DOC: Flags set by the core iomap code during operations
> + *
> + * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size
> + *	has changed as the result of this write operation.
>   *
> - * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
> - * has changed as the result of this write operation.
> + * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file
> + *	range it covers needs to be remapped by the high level before the
> + *	operation can proceed.
>   *
> - * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
> - * range it covers needs to be remapped by the high level before the operation
> - * can proceed.
> + * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage
>   */
>  #define IOMAP_F_SIZE_CHANGED	(1U << 8)
>  #define IOMAP_F_STALE		(1U << 9)
> -
> -/*
> - * Flags from 0x1000 up are for file system specific usage:
> - */
>  #define IOMAP_F_PRIVATE		(1U << 12)
>  
>  
> @@ -124,73 +152,119 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
>  	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
>  }
>  
> -/*
> - * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio
> - * and put_folio will be called for each folio written to.  This only applies
> - * to buffered writes as unbuffered writes will not typically have folios
> - * associated with them.
> - *
> - * When get_folio succeeds, put_folio will always be called to do any
> - * cleanup work necessary.  put_folio is responsible for unlocking and putting
> - * @folio.
> +/**
> + * struct iomap_folio_ops - buffered writes folio folio reference count helpers

                                               double folio?

> + *
> + * A filesystem can optionally set folio_ops in a &struct iomap mapping it
> + * returns to override the default get_folio and put_folio for each folio
> + * written to.  This only applies to buffered writes as unbuffered writes will
> + * not typically have folios associated with them.
> + *
> + * @get_folio: iomap defaults to iomap_get_folio() (which calls
> + *	__filemap_get_folio()) if the filesystem did not provide a get folio op.
> + *
> + * @put_folio: when get_folio succeeds, put_folio will always be called to do
> + *	any cleanup work necessary. put_folio is responsible for unlocking and
> + *	putting @folio.
> + *
> + * @iomap_valid: check that the cached iomap still maps correctly to the
> + *	filesystem's internal extent map. FS internal extent maps can change
> + *	while iomap is iterating a cached iomap, so this hook allows iomap to
> + *	detect that the iomap needs to be refreshed during a long running
> + *	write operation.
> + *
> + *	The filesystem can store internal state (e.g. a sequence number) in
> + *	iomap->validity_cookie when the iomap is first mapped to be able to
> + *	detect changes between mapping time and whenever .iomap_valid() is
> + *	called.
> + *
> + *	This is called with the folio over the specified file position held
> + *	locked by the iomap code.  This is useful for filesystems that have
> + *	dynamic mappings (e.g. anything other than zonefs).  An example reason
> + *	as to why this is necessary is writeback doesn't take the vfs locks.

                                                                  VFS

>   */
>  struct iomap_folio_ops {
>  	struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
>  			unsigned len);
>  	void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
>  			struct folio *folio);
> -
> -	/*
> -	 * Check that the cached iomap still maps correctly to the filesystem's
> -	 * internal extent map. FS internal extent maps can change while iomap
> -	 * is iterating a cached iomap, so this hook allows iomap to detect that
> -	 * the iomap needs to be refreshed during a long running write
> -	 * operation.
> -	 *
> -	 * The filesystem can store internal state (e.g. a sequence number) in
> -	 * iomap->validity_cookie when the iomap is first mapped to be able to
> -	 * detect changes between mapping time and whenever .iomap_valid() is
> -	 * called.
> -	 *
> -	 * This is called with the folio over the specified file position held
> -	 * locked by the iomap code.
> -	 */
>  	bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
>  };
>  
> -/*
> - * Flags for iomap_begin / iomap_end.  No flag implies a read.
> +/**
> + * DOC: Flags for iomap_begin / iomap_end.  No flag implies a read.
> + *
> + * * IOMAP_WRITE: writing, must allocate blocks
> + * * IOMAP_ZERO: zeroing operation, may skip holes
> + * * IOMAP_REPORT: report extent status, e.g. FIEMAP
> + * * IOMAP_FAULT: mapping for page fault
> + * * IOMAP_DIRECT: direct I/O
> + * * IOMAP_NOWAIT: do not block
> + * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed
> + * * IOMAP_UNSHARE: unshare_file_range
> + * * IOMAP_DAX: DAX mapping
>   */
> -#define IOMAP_WRITE		(1 << 0) /* writing, must allocate blocks */
> -#define IOMAP_ZERO		(1 << 1) /* zeroing operation, may skip holes */
> -#define IOMAP_REPORT		(1 << 2) /* report extent status, e.g. FIEMAP */
> -#define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
> -#define IOMAP_DIRECT		(1 << 4) /* direct I/O */
> -#define IOMAP_NOWAIT		(1 << 5) /* do not block */
> -#define IOMAP_OVERWRITE_ONLY	(1 << 6) /* only pure overwrites allowed */
> -#define IOMAP_UNSHARE		(1 << 7) /* unshare_file_range */
> +#define IOMAP_WRITE		(1 << 0)
> +#define IOMAP_ZERO		(1 << 1)
> +#define IOMAP_REPORT		(1 << 2)
> +#define IOMAP_FAULT		(1 << 3)
> +#define IOMAP_DIRECT		(1 << 4)
> +#define IOMAP_NOWAIT		(1 << 5)
> +#define IOMAP_OVERWRITE_ONLY	(1 << 6)
> +#define IOMAP_UNSHARE		(1 << 7)
>  #ifdef CONFIG_FS_DAX
> -#define IOMAP_DAX		(1 << 8) /* DAX mapping */
> +#define IOMAP_DAX		(1 << 8)
>  #else
>  #define IOMAP_DAX		0
>  #endif /* CONFIG_FS_DAX */
>  
> +/**
> + * struct iomap_ops - IO interface specific operations
> + *
> + * A filesystem is must provide a &struct iomap_ops for to deal with the

s/is //
s/for //
s/the//

> + * beginning an IO operation, iomap_begin(), and ending an IO operation on as

                                                                              a

> + * block range, ``iomap_end()``. You would call iomap with a specialized iomap
> + * operation depending on the filesystem's or the VFS's needs.
> + *
> + * For example iomap_dio_rw() would be used for for a filesystem when doing a

                                               double for
checkpatch should have caught that.

> + * block range read or write operation with direct IO. In this case your
> + * filesystem's respective &struct file_operations.write_iter() would
> + * eventually call iomap_dio_rw().
> + *
> + * For buffered IO a filesystem would use iomap_file_buffered_write() on the
> + * same &struct file_operations.write_iter(). But that is not the only situation
> + * in which a filesystem would deal with buffered writes, you could also use

                                                     writes. You

> + * buffered writes when a filesystem has to deal with &struct
> + * file_operations.fallocate(). However fallocate() can be used for *zeroing* or

                                   However,

> + * for *truncation* purposes. A special respective iomap_zero_range() would be
> + * used for *zeroing* and a iomap_truncate_page() would be used for

                         and an

> + * *truncation*.
> + *
> + * Experience with adopting iomap on filesystems have shown that the filesystem

                                                    has

> + * implementation of these operations can be simplified considerably if one
> + * &struct iomap_ops is provided per major filesystem IO operation:
> + *
> + * * buffered io

                 IO

> + * * direct io

               IO

> + * * DAX io

            IO

> + * * fiemap with extended attributes (``FIEMAP_FLAG_XATTR``)
> + * * lseek
> + *
> + * @iomap_begin: return the existing mapping at pos, or reserve space starting
> + *	at pos for up to length, as long as we can do it as a single mapping. The
> + *	actual length is returned in iomap->length. The &struct iomap iomap must
> + *	always be set. The &struct iomap srcmap should be set if the range is
> + *	CoW.
> + *
> + * @iomap_end: commit and/or unreserve space previous allocated using
> + *	iomap_begin. Written indicates the length of the successful write
> + *	operation which needs to be committed, while the rest needs to be
> + *	unreserved. Written might be zero if no data was written.
> + */
>  struct iomap_ops {
> -	/*
> -	 * Return the existing mapping at pos, or reserve space starting at
> -	 * pos for up to length, as long as we can do it as a single mapping.
> -	 * The actual length is returned in iomap->length.
> -	 */
>  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
>  			unsigned flags, struct iomap *iomap,
>  			struct iomap *srcmap);
>  
> -	/*
> -	 * Commit and/or unreserve space previous allocated using iomap_begin.
> -	 * Written indicates the length of the successful write operation which
> -	 * needs to be commited, while the rest needs to be unreserved.
> -	 * Written might be zero if no data was written.
> -	 */
>  	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
>  			ssize_t written, unsigned flags, struct iomap *iomap);
>  };
> @@ -207,6 +281,7 @@ struct iomap_ops {
>   * @flags: Zero or more of the iomap_begin flags above.
>   * @iomap: Map describing the I/O iteration
>   * @srcmap: Source map for COW operations
> + * @private: internal use
>   */
>  struct iomap_iter {
>  	struct inode *inode;
> @@ -241,7 +316,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter)
>   * @i: iteration structure
>   *
>   * Write operations on file systems with reflink support might require a
> - * source and a destination map.  This function retourns the source map
> + * source and a destination map.  This function returns the source map
>   * for a given operation, which may or may no be identical to the destination
>   * map in &i->iomap.
>   */
> @@ -281,42 +356,52 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset,
>  sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
>  		const struct iomap_ops *ops);
>  
> -/*
> - * Structure for writeback I/O completions.
> +/**
> + * struct iomap_ioend - for writeback I/O completions
> + *
> + * @io_list: next ioend in chain
> + * @io_type:

missing field description.

> + * @io_flags: IOMAP_F_*
> + * @io_folios: folios added to ioend
> + * @io_inode: file being written to
> + * @io_size: size of the extent
> + * @io_offset: offset in the file
> + * @io_sector: start sector of ioend
> + * @io_bio: bio being built
> + * @io_inline_bio: MUST BE LAST!
>   */
>  struct iomap_ioend {
> -	struct list_head	io_list;	/* next ioend in chain */
> +	struct list_head	io_list;
>  	u16			io_type;
> -	u16			io_flags;	/* IOMAP_F_* */
> -	u32			io_folios;	/* folios added to ioend */
> -	struct inode		*io_inode;	/* file being written to */
> -	size_t			io_size;	/* size of the extent */
> -	loff_t			io_offset;	/* offset in the file */
> -	sector_t		io_sector;	/* start sector of ioend */
> -	struct bio		*io_bio;	/* bio being built */
> -	struct bio		io_inline_bio;	/* MUST BE LAST! */
> +	u16			io_flags;
> +	u32			io_folios;
> +	struct inode		*io_inode;
> +	size_t			io_size;
> +	loff_t			io_offset;
> +	sector_t		io_sector;
> +	struct bio		*io_bio;
> +	struct bio		io_inline_bio;
>  };
>  
> +/**
> + * struct iomap_writeback_ops - used for writeback
> + *
> + * This structure is used to support dealing with a filesystem
> + * ``struct address_space_operations.writepages()``, for writeback.
> + *
> + * @map_blocks: required, maps the blocks so that writeback can be performed on
> + *	the range starting at offset.
> + * @prepare_ioend: optional, allows the file systems to perform actions just
> + *	before submitting the bio and/or override the bio end_io handler for
> + *	complex operations like copy on write extent manipulation or unwritten
> + *	extent conversions.
> + * @discard_folio: optional, allows the file system to discard state on a page where
> + *	we failed to submit any I/O.
> + */
>  struct iomap_writeback_ops {
> -	/*
> -	 * Required, maps the blocks so that writeback can be performed on
> -	 * the range starting at offset.
> -	 */
>  	int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
>  				loff_t offset);
> -
> -	/*
> -	 * Optional, allows the file systems to perform actions just before
> -	 * submitting the bio and/or override the bio end_io handler for complex
> -	 * operations like copy on write extent manipulation or unwritten extent
> -	 * conversions.
> -	 */
>  	int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
> -
> -	/*
> -	 * Optional, allows the file system to discard state on a page where
> -	 * we failed to submit any I/O.
> -	 */
>  	void (*discard_folio)(struct folio *folio, loff_t pos);
>  };
>  
> @@ -334,26 +419,33 @@ int iomap_writepages(struct address_space *mapping,
>  		struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
>  		const struct iomap_writeback_ops *ops);
>  
> -/*
> - * Flags for direct I/O ->end_io:
> +/**
> + * DOC: Flags for direct I/O ->end_io
> + *
> + * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s)
> + * * IOMAP_DIO_COW: covers COW extent(s)
>   */
> -#define IOMAP_DIO_UNWRITTEN	(1 << 0)	/* covers unwritten extent(s) */
> -#define IOMAP_DIO_COW		(1 << 1)	/* covers COW extent(s) */
> +#define IOMAP_DIO_UNWRITTEN	(1 << 0)
> +#define IOMAP_DIO_COW		(1 << 1)
>  
> +/**
> + * struct iomap_dio_ops - used for direct IO
> + *
> + * This is used to support direct IO.
> + *
> + * @end_io:
> + * @submit_io:

missing field descriptions.

> + * @bio_set: Filesystems wishing to attach private information to a direct io

                                                                              IO

> + *	bio must provide a ->submit_io method that attaches the additional
> + *	information to the bio and changes the ->bi_end_io callback to a custom
> + *	function.  This function should, at a minimum, perform any relevant
> + *	post-processing of the bio and end with a call to iomap_dio_bio_end_io.
> + */
>  struct iomap_dio_ops {
>  	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
>  		      unsigned flags);
>  	void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
>  		          loff_t file_offset);
> -
> -	/*
> -	 * Filesystems wishing to attach private information to a direct io bio
> -	 * must provide a ->submit_io method that attaches the additional
> -	 * information to the bio and changes the ->bi_end_io callback to a
> -	 * custom function.  This function should, at a minimum, perform any
> -	 * relevant post-processing of the bio and end with a call to
> -	 * iomap_dio_bio_end_io.
> -	 */
>  	struct bio_set *bio_set;
>  };
>  


Thanks for doing all this work.
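
For the fields flagged above as missing descriptions, a possible wording —
a sketch inferred from how the fields are used in fs/iomap/, so please
verify the semantics before committing anything:

```c
/*
 * Sketch only: candidate wording for the missing kernel-doc field
 * descriptions (inferred from usage, not authoritative).
 */

/**
 * struct iomap_ioend - ...
 * @io_type: type of the mapping being written back (an IOMAP_* value,
 *	e.g. IOMAP_MAPPED or IOMAP_UNWRITTEN)
 */

/**
 * struct iomap_dio_ops - ...
 * @end_io: optional, called on completion of the direct IO so the
 *	filesystem can perform post-processing (e.g. unwritten extent
 *	conversion)
 * @submit_io: optional, submits the bio on behalf of iomap; see the
 *	@bio_set description for when a filesystem needs this
 */
```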
kernel test robot May 18, 2023, 8:15 p.m. UTC | #2
Hi Luis,

kernel test robot noticed the following build errors:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on lwn/docs-next linus/master v6.4-rc2 next-20230518]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20230518150105.3160445-1-mcgrof%40kernel.org
patch subject: [PATCH v2] Documentation: add initial iomap kdoc
config: i386-randconfig-a003
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/4db9a33eed82de4cb27f6e2890110a39bb4859af
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530
        git checkout 4db9a33eed82de4cb27f6e2890110a39bb4859af
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202305190340.ukIywuHk-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from fs/gfs2/bmap.c:13:
>> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t'
     381 |         sector_t                io_sector;
         |         ^~~~~~~~
--
   In file included from fs/iomap/buffered-io.c:9:
>> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t'
     381 |         sector_t                io_sector;
         |         ^~~~~~~~
   fs/iomap/buffered-io.c: In function 'iomap_finish_ioend':
>> fs/iomap/buffered-io.c:1323:33: error: 'struct iomap_ioend' has no member named 'io_inline_bio'
    1323 |         struct bio *bio = &ioend->io_inline_bio;
         |                                 ^~
>> fs/iomap/buffered-io.c:1324:33: error: 'struct iomap_ioend' has no member named 'io_bio'
    1324 |         struct bio *last = ioend->io_bio, *next;
         |                                 ^~
>> fs/iomap/buffered-io.c:1326:30: error: 'struct iomap_ioend' has no member named 'io_offset'
    1326 |         loff_t offset = ioend->io_offset;
         |                              ^~
   fs/iomap/buffered-io.c:1330:26: error: 'struct iomap_ioend' has no member named 'io_inline_bio'
    1330 |         for (bio = &ioend->io_inline_bio; bio; bio = next) {
         |                          ^~
   fs/iomap/buffered-io.c: In function 'iomap_ioend_can_merge':
   fs/iomap/buffered-io.c:1397:18: error: 'struct iomap_ioend' has no member named 'io_bio'
    1397 |         if (ioend->io_bio->bi_status != next->io_bio->bi_status)
         |                  ^~
   fs/iomap/buffered-io.c:1397:45: error: 'struct iomap_ioend' has no member named 'io_bio'
    1397 |         if (ioend->io_bio->bi_status != next->io_bio->bi_status)
         |                                             ^~
   fs/iomap/buffered-io.c:1405:18: error: 'struct iomap_ioend' has no member named 'io_offset'
    1405 |         if (ioend->io_offset + ioend->io_size != next->io_offset)
         |                  ^~
   fs/iomap/buffered-io.c:1405:54: error: 'struct iomap_ioend' has no member named 'io_offset'
    1405 |         if (ioend->io_offset + ioend->io_size != next->io_offset)
         |                                                      ^~
>> fs/iomap/buffered-io.c:1417:18: error: 'struct iomap_ioend' has no member named 'io_sector'
    1417 |         if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector)
         |                  ^~
   fs/iomap/buffered-io.c:1417:61: error: 'struct iomap_ioend' has no member named 'io_sector'
    1417 |         if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector)
         |                                                             ^~
   fs/iomap/buffered-io.c: In function 'iomap_ioend_compare':
   fs/iomap/buffered-io.c:1446:15: error: 'struct iomap_ioend' has no member named 'io_offset'
    1446 |         if (ia->io_offset < ib->io_offset)
         |               ^~
   fs/iomap/buffered-io.c:1446:31: error: 'struct iomap_ioend' has no member named 'io_offset'
    1446 |         if (ia->io_offset < ib->io_offset)
         |                               ^~
   fs/iomap/buffered-io.c:1448:15: error: 'struct iomap_ioend' has no member named 'io_offset'
    1448 |         if (ia->io_offset > ib->io_offset)
         |               ^~
   fs/iomap/buffered-io.c:1448:31: error: 'struct iomap_ioend' has no member named 'io_offset'
    1448 |         if (ia->io_offset > ib->io_offset)
         |                               ^~
   fs/iomap/buffered-io.c: In function 'iomap_submit_ioend':
   fs/iomap/buffered-io.c:1479:14: error: 'struct iomap_ioend' has no member named 'io_bio'
    1479 |         ioend->io_bio->bi_private = ioend;
         |              ^~
   fs/iomap/buffered-io.c:1480:14: error: 'struct iomap_ioend' has no member named 'io_bio'
    1480 |         ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
         |              ^~
   fs/iomap/buffered-io.c:1491:22: error: 'struct iomap_ioend' has no member named 'io_bio'
    1491 |                 ioend->io_bio->bi_status = errno_to_blk_status(error);
         |                      ^~
   fs/iomap/buffered-io.c:1492:32: error: 'struct iomap_ioend' has no member named 'io_bio'
    1492 |                 bio_endio(ioend->io_bio);
         |                                ^~
   fs/iomap/buffered-io.c:1496:25: error: 'struct iomap_ioend' has no member named 'io_bio'
    1496 |         submit_bio(ioend->io_bio);
         |                         ^~
   In file included from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/module.h:12,
                    from fs/iomap/buffered-io.c:6:
   fs/iomap/buffered-io.c: In function 'iomap_alloc_ioend':
>> include/linux/container_of.h:20:54: error: 'struct iomap_ioend' has no member named 'io_inline_bio'
      20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
         |                                                      ^~
   include/linux/build_bug.h:78:56: note: in definition of macro '__static_assert'
      78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
         |                                                        ^~~~
   include/linux/container_of.h:20:9: note: in expansion of macro 'static_assert'
      20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
         |         ^~~~~~~~~~~~~
   include/linux/container_of.h:20:23: note: in expansion of macro '__same_type'
      20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
         |                       ^~~~~~~~~~~
   fs/iomap/buffered-io.c:1513:17: note: in expansion of macro 'container_of'
    1513 |         ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
         |                 ^~~~~~~~~~~~
>> include/linux/compiler_types.h:338:27: error: expression in static assertion is not an integer
     338 | #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:78:56: note: in definition of macro '__static_assert'
      78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
         |                                                        ^~~~
   include/linux/container_of.h:20:9: note: in expansion of macro 'static_assert'
      20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
         |         ^~~~~~~~~~~~~
   include/linux/container_of.h:20:23: note: in expansion of macro '__same_type'
      20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
         |                       ^~~~~~~~~~~
   fs/iomap/buffered-io.c:1513:17: note: in expansion of macro 'container_of'
    1513 |         ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
         |                 ^~~~~~~~~~~~
   In file included from include/uapi/linux/posix_types.h:5,
                    from include/uapi/linux/types.h:14,
                    from include/linux/types.h:6,
                    from include/linux/kasan-checks.h:5,
                    from include/asm-generic/rwonce.h:26,
                    from ./arch/x86/include/generated/asm/rwonce.h:1,
                    from include/linux/compiler.h:247,
                    from include/linux/build_bug.h:5,
                    from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/module.h:12,
                    from fs/iomap/buffered-io.c:6:
>> include/linux/stddef.h:16:33: error: 'struct iomap_ioend' has no member named 'io_inline_bio'
      16 | #define offsetof(TYPE, MEMBER)  __builtin_offsetof(TYPE, MEMBER)
         |                                 ^~~~~~~~~~~~~~~~~~
   include/linux/container_of.h:23:28: note: in expansion of macro 'offsetof'
      23 |         ((type *)(__mptr - offsetof(type, member))); })
         |                            ^~~~~~~~
   fs/iomap/buffered-io.c:1513:17: note: in expansion of macro 'container_of'
    1513 |         ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
         |                 ^~~~~~~~~~~~
   fs/iomap/buffered-io.c:1520:14: error: 'struct iomap_ioend' has no member named 'io_offset'
    1520 |         ioend->io_offset = offset;
         |              ^~
   fs/iomap/buffered-io.c:1521:14: error: 'struct iomap_ioend' has no member named 'io_bio'
    1521 |         ioend->io_bio = bio;
         |              ^~
   fs/iomap/buffered-io.c:1522:14: error: 'struct iomap_ioend' has no member named 'io_sector'
    1522 |         ioend->io_sector = sector;
         |              ^~
   fs/iomap/buffered-io.c: In function 'iomap_can_add_to_ioend':
   fs/iomap/buffered-io.c:1557:33: error: 'struct iomap_ioend' has no member named 'io_offset'
    1557 |         if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
         |                                 ^~
   In file included from include/linux/blkdev.h:17,
                    from include/linux/iomap.h:11,
                    from fs/iomap/buffered-io.c:9:
   fs/iomap/buffered-io.c:1559:48: error: 'struct iomap_ioend' has no member named 'io_bio'
    1559 |         if (sector != bio_end_sector(wpc->ioend->io_bio))
         |                                                ^~
   include/linux/bio.h:40:38: note: in definition of macro 'bvec_iter_end_sector'
      40 | #define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter)))
         |                                      ^~~~
   fs/iomap/buffered-io.c:1559:23: note: in expansion of macro 'bio_end_sector'
    1559 |         if (sector != bio_end_sector(wpc->ioend->io_bio))
         |                       ^~~~~~~~~~~~~~
   fs/iomap/buffered-io.c:1559:48: error: 'struct iomap_ioend' has no member named 'io_bio'
    1559 |         if (sector != bio_end_sector(wpc->ioend->io_bio))
         |                                                ^~
   include/linux/bio.h:39:35: note: in definition of macro 'bvec_iter_sectors'
      39 | #define bvec_iter_sectors(iter) ((iter).bi_size >> 9)
         |                                   ^~~~
   include/linux/bio.h:43:33: note: in expansion of macro 'bvec_iter_end_sector'
      43 | #define bio_end_sector(bio)     bvec_iter_end_sector((bio)->bi_iter)
         |                                 ^~~~~~~~~~~~~~~~~~~~
   fs/iomap/buffered-io.c:1559:23: note: in expansion of macro 'bio_end_sector'
    1559 |         if (sector != bio_end_sector(wpc->ioend->io_bio))
         |                       ^~~~~~~~~~~~~~
   fs/iomap/buffered-io.c: In function 'iomap_add_to_ioend':
   fs/iomap/buffered-io.c:1590:38: error: 'struct iomap_ioend' has no member named 'io_bio'
    1590 |         if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
         |                                      ^~
   fs/iomap/buffered-io.c:1591:27: error: 'struct iomap_ioend' has no member named 'io_bio'
    1591 |                 wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
         |                           ^~
   fs/iomap/buffered-io.c:1591:64: error: 'struct iomap_ioend' has no member named 'io_bio'
    1591 |                 wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
         |                                                                ^~
   fs/iomap/buffered-io.c:1592:41: error: 'struct iomap_ioend' has no member named 'io_bio'
    1592 |                 bio_add_folio(wpc->ioend->io_bio, folio, len, poff);
         |                                         ^~
   In file included from include/uapi/linux/posix_types.h:5,
                    from include/uapi/linux/types.h:14,
                    from include/linux/types.h:6,
                    from include/linux/kasan-checks.h:5,
                    from include/asm-generic/rwonce.h:26,
                    from ./arch/x86/include/generated/asm/rwonce.h:1,
                    from include/linux/compiler.h:247,
                    from include/linux/build_bug.h:5,
                    from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/module.h:12,
                    from fs/iomap/buffered-io.c:6:
   fs/iomap/buffered-io.c: In function 'iomap_init':
>> include/linux/stddef.h:16:33: error: 'struct iomap_ioend' has no member named 'io_inline_bio'
      16 | #define offsetof(TYPE, MEMBER)  __builtin_offsetof(TYPE, MEMBER)
         |                                 ^~~~~~~~~~~~~~~~~~
   fs/iomap/buffered-io.c:1830:28: note: in expansion of macro 'offsetof'
    1830 |                            offsetof(struct iomap_ioend, io_inline_bio),
         |                            ^~~~~~~~
   fs/iomap/buffered-io.c:1832:1: error: control reaches end of non-void function [-Werror=return-type]
    1832 | }
         | ^
   cc1: some warnings being treated as errors
--
   In file included from fs/gfs2/bmap.h:10,
                    from fs/gfs2/glock.c:50:
>> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t'
     381 |         sector_t                io_sector;
         |         ^~~~~~~~
   In file included from fs/gfs2/trace_gfs2.h:641,
                    from fs/gfs2/glock.c:52:
   include/trace/define_trace.h:95:42: fatal error: ./trace_gfs2.h: No such file or directory
      95 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
         |                                          ^
   compilation terminated.
--
   In file included from fs/xfs/xfs_iomap.h:9,
                    from fs/xfs/xfs_aops.c:15:
>> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t'
     381 |         sector_t                io_sector;
         |         ^~~~~~~~
   fs/xfs/xfs_aops.c: In function 'xfs_ioend_is_append':
>> fs/xfs/xfs_aops.c:40:21: error: 'struct iomap_ioend' has no member named 'io_offset'
      40 |         return ioend->io_offset + ioend->io_size >
         |                     ^~
   fs/xfs/xfs_aops.c: In function 'xfs_end_ioend':
   fs/xfs/xfs_aops.c:88:47: error: 'struct iomap_ioend' has no member named 'io_offset'
      88 |         xfs_off_t               offset = ioend->io_offset;
         |                                               ^~
>> fs/xfs/xfs_aops.c:115:42: error: 'struct iomap_ioend' has no member named 'io_bio'
     115 |         error = blk_status_to_errno(ioend->io_bio->bi_status);
         |                                          ^~
   fs/xfs/xfs_aops.c:134:50: error: 'struct iomap_ioend' has no member named 'io_offset'
     134 |                 error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
         |                                                  ^~
   fs/xfs/xfs_aops.c: In function 'xfs_prepare_ioend':
   fs/xfs/xfs_aops.c:439:38: error: 'struct iomap_ioend' has no member named 'io_offset'
     439 |                                 ioend->io_offset, ioend->io_size);
         |                                      ^~
   fs/xfs/xfs_aops.c:447:22: error: 'struct iomap_ioend' has no member named 'io_bio'
     447 |                 ioend->io_bio->bi_end_io = xfs_end_bio;
         |                      ^~
   fs/xfs/xfs_aops.c: In function 'xfs_ioend_is_append':
   fs/xfs/xfs_aops.c:42:1: error: control reaches end of non-void function [-Werror=return-type]
      42 | }
         | ^
   cc1: some warnings being treated as errors


vim +381 include/linux/iomap.h

   329	
   330	ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
   331			const struct iomap_ops *ops);
   332	int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
   333			struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
   334			int (*punch)(struct inode *inode, loff_t pos, loff_t length));
   335	
   336	int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
   337	void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
   338	bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
   339	struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
   340	bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
   341	void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
   342	int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
   343			const struct iomap_ops *ops);
   344	int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
   345			bool *did_zero, const struct iomap_ops *ops);
   346	int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
   347			const struct iomap_ops *ops);
   348	vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
   349				const struct iomap_ops *ops);
   350	int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
   351			u64 start, u64 len, const struct iomap_ops *ops);
   352	loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
   353			const struct iomap_ops *ops);
   354	loff_t iomap_seek_data(struct inode *inode, loff_t offset,
   355			const struct iomap_ops *ops);
   356	sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
   357			const struct iomap_ops *ops);
   358	
   359	/**
   360	 * struct iomap_ioend - for writeback I/O completions
   361	 *
   362	 * @io_list: next ioend in chain
   363	 * @io_type:
   364	 * @io_flags: IOMAP_F_*
   365	 * @io_folios: folios added to ioend
   366	 * @io_inode: file being written to
   367	 * @io_size: size of the extent
   368	 * @io_offset: offset in the file
   369	 * @io_sector: start sector of ioend
   370	 * @io_bio: bio being built
   371	 * @io_inline_bio: MUST BE LAST!
   372	 */
   373	struct iomap_ioend {
   374		struct list_head	io_list;
   375		u16			io_type;
   376		u16			io_flags;
   377		u32			io_folios;
   378		struct inode		*io_inode;
   379		size_t			io_size;
   380		loff_t			io_offset
 > 381		sector_t		io_sector;
   382		struct bio		*io_bio;
   383		struct bio		io_inline_bio;
   384	};
   385
kernel test robot May 18, 2023, 8:38 p.m. UTC | #3
Hi Luis,

kernel test robot noticed the following build errors:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on lwn/docs-next linus/master v6.4-rc2 next-20230518]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20230518150105.3160445-1-mcgrof%40kernel.org
patch subject: [PATCH v2] Documentation: add initial iomap kdoc
config: i386-randconfig-a002
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/4db9a33eed82de4cb27f6e2890110a39bb4859af
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530
        git checkout 4db9a33eed82de4cb27f6e2890110a39bb4859af
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202305190426.ohhxM3oA-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from fs/f2fs/file.c:27:
>> include/linux/iomap.h:380:20: error: expected ';' at end of declaration list
           loff_t                  io_offset
                                            ^
                                            ;
   1 error generated.
--
   In file included from fs/ext4/file.c:24:
>> include/linux/iomap.h:380:20: error: expected ';' at end of declaration list
           loff_t                  io_offset
                                            ^
                                            ;
   In file included from fs/ext4/file.c:31:
   include/linux/mman.h:154:9: warning: division by zero is undefined [-Wdivision-by-zero]
                  _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mman.h:132:21: note: expanded from macro '_calc_vm_trans'
      : ((x) & (bit1)) / ((bit1) / (bit2))))
                       ^ ~~~~~~~~~~~~~~~~~
   1 warning and 1 error generated.
--
   In file included from fs/iomap/buffered-io.c:9:
>> include/linux/iomap.h:380:20: error: expected ';' at end of declaration list
           loff_t                  io_offset
                                            ^
                                            ;
>> fs/iomap/buffered-io.c:1417:13: error: no member named 'io_sector' in 'struct iomap_ioend'
           if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector)
               ~~~~~  ^
   fs/iomap/buffered-io.c:1417:56: error: no member named 'io_sector' in 'struct iomap_ioend'
           if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector)
                                                           ~~~~  ^
   fs/iomap/buffered-io.c:1522:9: error: no member named 'io_sector' in 'struct iomap_ioend'
           ioend->io_sector = sector;
           ~~~~~  ^
   4 errors generated.


vim +380 include/linux/iomap.h

   329	
   330	ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
   331			const struct iomap_ops *ops);
   332	int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
   333			struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
   334			int (*punch)(struct inode *inode, loff_t pos, loff_t length));
   335	
   336	int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
   337	void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
   338	bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
   339	struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
   340	bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
   341	void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
   342	int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
   343			const struct iomap_ops *ops);
   344	int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
   345			bool *did_zero, const struct iomap_ops *ops);
   346	int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
   347			const struct iomap_ops *ops);
   348	vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
   349				const struct iomap_ops *ops);
   350	int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
   351			u64 start, u64 len, const struct iomap_ops *ops);
   352	loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
   353			const struct iomap_ops *ops);
   354	loff_t iomap_seek_data(struct inode *inode, loff_t offset,
   355			const struct iomap_ops *ops);
   356	sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
   357			const struct iomap_ops *ops);
   358	
   359	/**
   360	 * struct iomap_ioend - for writeback I/O completions
   361	 *
   362	 * @io_list: next ioend in chain
   363	 * @io_type:
   364	 * @io_flags: IOMAP_F_*
   365	 * @io_folios: folios added to ioend
   366	 * @io_inode: file being written to
   367	 * @io_size: size of the extent
   368	 * @io_offset: offset in the file
   369	 * @io_sector: start sector of ioend
   370	 * @io_bio: bio being built
   371	 * @io_inline_bio: MUST BE LAST!
   372	 */
   373	struct iomap_ioend {
   374		struct list_head	io_list;
   375		u16			io_type;
   376		u16			io_flags;
   377		u32			io_folios;
   378		struct inode		*io_inode;
   379		size_t			io_size;
 > 380		loff_t			io_offset
   381		sector_t		io_sector;
   382		struct bio		*io_bio;
   383		struct bio		io_inline_bio;
   384	};
   385
Luis Chamberlain May 18, 2023, 8:55 p.m. UTC | #4
On Thu, May 18, 2023 at 08:49:26AM -0700, Randy Dunlap wrote:
> On 5/18/23 08:01, Luis Chamberlain wrote:
> >   * use 80 char length as if we're in the 1980's
> 
> Well, like Jon said, long lines are difficult to read, even on printed paper.
> That's (at least one reason) why newspapers(!) have narrow columns of print.

Makes sense!

> Anyway, thanks for doing it.

Heh yeah, I was just trying to follow the convention, but I didn't know it
was a special case for bumping up to 100 only, and that it was definitely
not good for docs. It's easy to lose track of, though: we have one limit for
commit logs, one for code (with an exception), and a clear preference
for docs. Makes me wonder if editors could pick up on project-specific
requirements somehow?

So for instance I used to have incorrectly;

set textwidth=100
autocmd FileType gitcommit set textwidth=72
set colorcolumn=+1

Clearly that 100 is wrong now, and I've updated it back to 80.
Could one be used for FileType for rst and plain-text files?
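
To sketch what that could look like (untested; 'rst' and 'text' are the
filetype names vim assigns to .rst and plain-text files by default):

```vim
" Sketch: per-filetype width rules in ~/.vimrc
set textwidth=80
autocmd FileType gitcommit setlocal textwidth=72
autocmd FileType rst,text  setlocal textwidth=80
set colorcolumn=+1
```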

If we shared something at the top level that let developers optionally
pick up project-specific guidelines, it would be a less common problem
to ping back and forth about. I'm curious how often line length is the
reason a patch incurs extra latency getting upstream. You'd figure this
would be a simple fix in the year 2023 :P

Thanks for the fix recommendations! I'll wait and see if others find
more!

  Luis
Jonathan Corbet May 18, 2023, 11:09 p.m. UTC | #5
Luis Chamberlain <mcgrof@kernel.org> writes:

> To help with iomap adoption / porting I set out the goal to try to
> help improve the iomap documentation and get general guidance for
> filesystem conversions over from buffer-head in time for this year's
> LSFMM. The end results thanks to the review of Darrick, Christoph and
> others is on the kernelnewbies wiki [0].
>
> This brings this forward a relevant subset of that documentation to
> the kernel in kdoc format and also kdoc'ifies the existing documentation
> on iomap.h.

OK, I've had a read through it.  Thanks again for doing it, we
definitely need this.  There are typos and such that Randy has already
pointed out, so I won't bother with those.  My main comment is a
high-level one, along with a handful of nits.

My high-level question is: who is the audience for this document?  I'm
guessing it's filesystem developers?  Whoever it is, the document could
benefit from an introductory section, aimed at that audience, saying how
to get *started* with iomap.  What include files do I need?  How do I
provide a set of iomap callbacks and make them visible to the VFS?
Without that sort of stuff, it makes for a rough jumping-in experience.

The nits:

> diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst
> new file mode 100644
> index 000000000000..be487030fcff
> --- /dev/null
> +++ b/Documentation/filesystems/iomap.rst
> @@ -0,0 +1,253 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _iomap:

I don't think you use this label anywhere, so it doesn't need to be here.

> +..
> +        Mapping of heading styles within this document:
> +        Heading 1 uses "====" above and below
> +        Heading 2 uses "===="
> +        Heading 3 uses "----"
> +        Heading 4 uses "````"
> +        Heading 5 uses "^^^^"
> +        Heading 6 uses "~~~~"
> +        Heading 7 uses "...."

We have a set of conventions for section headings, nicely documented in
Documentation/doc-guide/sphinx.rst.  This hierarchy doesn't quite match
it, but you don't get far enough into it to hit the differences.  I'd
just take this out.

> +
> +        Sections are manually numbered because apparently that's what everyone
> +        does in the kernel.

The sections are *not* manually numbered, which I think is entirely
fine.  But that makes this comment a bit weird.

> +.. contents:: Table of Contents
> +   :local:
> +
> +=====
> +iomap
> +=====
> +
> +.. kernel-doc:: include/linux/iomap.h
> +
> +A modern block abstraction
> +==========================
> +
> +**iomap** allows filesystems to query storage media for data using *byte
> +ranges*. Since block mapping are provided for a *byte ranges* for cache data in
> +memory, in the page cache, naturally this implies operations on block ranges
> +will also deal with *multipage* operations in the page cache. **Folios** are
> +used to help provide *multipage* operations in memory for the *byte ranges*
> +being worked on.

This text (and the document as a whole) is a bit heavy on the markup.
I'd consider taking some of it out to improve the plain-text readability.

[...]

> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e2b836c2e119..ee4b026995ac 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -10,6 +10,30 @@
>  #include <linux/mm_types.h>
>  #include <linux/blkdev.h>
>  
> +/**
> + * DOC: Introduction
> + *
> + * iomap allows filesystems to sequentially iterate over byte addressable block
> + * ranges on an inode and apply operations to it.
> + *
> + * iomap grew out of the need to provide a modern block mapping abstraction for
> + * filesystems with the different IO access methods they support and assisting
> + * the VFS with manipulating files into the page cache. iomap helpers are
> + * provided for each of these mechanisms. However, block mapping is just one of
> + * the features of iomap, given iomap supports DAX IO for filesystems and also
> + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE``
> + * interfaces.
> + *
> + * Block mapping provides a mapping between data cached in memory and the
> + * location on persistent storage where that data lives. `LWN has an great
> + * review of the old buffer-heads block-mapping and why they are inefficient
> + * <https://lwn.net/Articles/930173/>`, since the inception of Linux.  Since
> + * **buffer-heads** work on a 512-byte block based paradigm, it creates an
> + * overhead for modern storage media which no longer necessarily works only on
> + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with
> + * the support of folios, provides a modern replacement for **buffer-heads**.

As much as I love to see LWN references embedded in as many kernel files
as possible, I'm honestly not sure that the iomap documentation needs to
talk about buffer heads at all - except maybe for help in replacing
them.  In particular, I don't think that the documentation needs to
justify iomap's existence; we can look at the commit logs if we need to
remind ourselves of that.

Thanks,

jon
Dave Chinner May 19, 2023, 1:48 a.m. UTC | #6
On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote:
> To help with iomap adoption / porting I set out the goal to try to
> help improve the iomap documentation and get general guidance for
> filesystem conversions over from buffer-head in time for this year's
> LSFMM. The end results thanks to the review of Darrick, Christoph and
> others is on the kernelnewbies wiki [0].
> 
> This brings this forward a relevant subset of that documentation to
> the kernel in kdoc format and also kdoc'ifies the existing documentation
> on iomap.h.
> 
> Tested with:
> 
> make htmldocs SPHINXDIRS="filesystems"
> 
> Then looking at the docs produced on:
> 
> Documentation/output/filesystems/iomap.html
> 
> [0] https://kernelnewbies.org/KernelProjects/iomap
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> Changes on v2:
> 
>   * use 80 char length as if we're in the 1980's
> 
>  Documentation/filesystems/index.rst |   1 +
>  Documentation/filesystems/iomap.rst | 253 +++++++++++++++++++++
>  include/linux/iomap.h               | 336 ++++++++++++++++++----------
>  3 files changed, 468 insertions(+), 122 deletions(-)
>  create mode 100644 Documentation/filesystems/iomap.rst
> 
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index fbb2b5ada95b..6186ab7c3ea8 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -34,6 +34,7 @@ algorithms work.
>     seq_file
>     sharedsubtree
>     idmappings
> +   iomap
>  
>     automount-support
>  
> diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst
> new file mode 100644
> index 000000000000..be487030fcff
> --- /dev/null
> +++ b/Documentation/filesystems/iomap.rst
> @@ -0,0 +1,253 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _iomap:
> +
> +..
> +        Mapping of heading styles within this document:
> +        Heading 1 uses "====" above and below
> +        Heading 2 uses "===="
> +        Heading 3 uses "----"
> +        Heading 4 uses "````"
> +        Heading 5 uses "^^^^"
> +        Heading 6 uses "~~~~"
> +        Heading 7 uses "...."
> +
> +        Sections are manually numbered because apparently that's what everyone
> +        does in the kernel.
> +.. contents:: Table of Contents
> +   :local:
> +
> +=====
> +iomap
> +=====
> +
> +.. kernel-doc:: include/linux/iomap.h
> +
> +A modern block abstraction
> +==========================

iomap is not a "block" abstraction layer. It's a mapping layer that
maps -file offset- ranges to -extents- that describe a LBA range in
a block device address space.

> +**iomap** allows filesystems to query storage media for data using *byte
> +ranges*. Since block mapping are provided for a *byte ranges* for cache data in
> +memory, in the page cache, naturally this implies operations on block ranges
> +will also deal with *multipage* operations in the page cache.

I don't know what this means.

> **Folios** are
> +used to help provide *multipage* operations in memory for the *byte ranges*
> +being worked on.

No. iomap naturally does multi-page operations that span the mapped
extent that is returned. We do not need folios for this (never have)
but folios can optimise it.

Any discussion that tries to describe iomap in page cache terms has
started off in completely the wrong direction. The whole point of
iomap is that it is filesystem centric rather than page cache or mm
centric. i.e. the fundamental structure of the iomap infrastructure
is this:

user IO
  loop for file IO range
    loop for each mapped extent
      if (buffered) {
        loop for each page/folio {
	  instantiate page cache
	  copy data to/from page cache
	  update page cache state
	}
      } else { /* direct IO */
        loop for each bio {
	  pack user pages into bio
	  submit bio
	}
      }
    }
  }

IOWs, if you are trying to describe iomap in terms of page cache
behaviour, then you've got it completely back to front.  The whole
point of iomap is that it is "completely back to front" w.r.t. to
the old "get_blocks()" per page callback mechanism the page cache
used to use to interact with filesystems.

The introduction should actually give some grounding in how the
infrastructure works - how it iterates, when mapping callbacks run,
why we have iomap_begin and iomap_end operations and when it is
necessary to supply both, etc. Spreading all that information
across this file and structure and function definitions in the code
does not make for coherent, readable or understandable
documentation...

Also, describing the required locking hierarchy helps to explain
how things work. iomap assumes two layers of locking. It requires
locking above the iomap layer for IO serialisation (i_rwsem,
invalidation lock) which is generally taken before calling into
iomap functions.  There is also locking below iomap for
mapping/allocation serialisation on an inode (e.g. XFS_ILOCK,
i_data_sem in ext4, etc) that is usually taken inside the mapping
methods filesystems supply to the iomap infrastructure. This layer
of locking needs to be independent of the IO path serialisation
locking, as it nests inside the IO path locking but is also used
without the filesystem IO path locking protecting it (e.g. in the
iomap writeback path).
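
To make the two-layer scheme concrete, here is a rough sketch -- every
myfs_* name is invented for illustration and it is not compile-tested.
The IO path lock (i_rwsem) is held by the caller; the mapping lock is
taken inside the method itself:

```c
/*
 * Illustrative sketch only: the caller (e.g. the buffered write path)
 * already holds i_rwsem, while the lock serialising extent lookups and
 * allocations (cf. XFS_ILOCK, ext4's i_data_sem) is taken here.
 */
static int myfs_iomap_begin(struct inode *inode, loff_t offset,
		loff_t length, unsigned flags, struct iomap *iomap,
		struct iomap *srcmap)
{
	struct myfs_inode *mi = MYFS_I(inode);
	int error;

	down_read(&mi->mapping_lock);
	error = myfs_lookup_extent(inode, offset, length, iomap);
	up_read(&mi->mapping_lock);

	return error;
}
```

The same mapping lock is then taken again, independently of i_rwsem,
when writeback calls back into the mapping method.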

> +iomap IO interfaces
> +===================
> +
> +You call **iomap** depending on the type of filesystem operation you are working
> +on. We detail some of these interactions below.
> +
> +iomap for bufferred IO writes
> +-----------------------------
> +
> +You call **iomap** for buffered IO with:
> +
> + * ``iomap_file_buffered_write()`` - for buffered writes

This is not sufficient to implement a filesystem based on iomap.
There's no discussion of:

- locking requirements
- IOCB_NOWAIT requirements
- that write validity checking must have already been performed
- it doesn't do things like update timestamps or strip privileges
- the ops structure that needs to be supplied to this function needs
  to be documented as well as when those callbacks are run and what
  they are supposed to do.

> + * ``iomap_page_mkwrite()`` - when dealing callbacks for
> +    ``struct vm_operations_struct``

Similar issues here.

> +
> +  * ``struct vm_operations_struct.page_mkwrite()``
> +  * ``struct vm_operations_struct.fault()``
> +  * ``struct vm_operations_struct.huge_fault()``
> +  * ``struct vm_operations_struct`.pfn_mkwrite()``

No idea what this is doing in an iomap document. :/

> +You *may* use buffered writes to also deal with ``fallocate()``:
> +
> + * ``iomap_zero_range()`` on fallocate for zeroing
> + * ``iomap_truncate_page()`` on fallocate for truncation

These are *not* fallocate() operations. fallocate() implementations
may use iomap_zero_range(), but that's because fallocate() ops might
need to -zero file offset ranges within EOF-

IOWs, iomap_zero_range() is a standalone helper that converts a
range of the file to contain zeros. That's all it does. It also
needs documenting about what locks need to be held when it's called,
along with a discussion of how it handles partial block zeroing and
what happens when the range spans unwritten extents..

iomap_truncate_page() is only used when removing file data from the
file. It is a destructive operation, and it is typically used to
remove cached data from beyond EOF when a truncate down operation is
making the file smaller. It has nothing to do with fallocate() at
all....

> +Typically you'd also happen to use these on paths when updating an inode's size.
> +
> +iomap for direct IO
> +-------------------
> +
> +You call **iomap** for direct IO with:
> +
> + * ``iomap_dio_rw()``

All the same comments as for buffered IO.

There are also addition data integrity concerns with serialising
concurrent sub-block DIO constraints that the filesystem has to
handle itself.

> +You **may** use direct IO writes to also deal with ``fallocate()``:
> +
> + * ``iomap_zero_range()`` on fallocate for zeroing
> + * ``iomap_truncate_page()`` on fallocate for truncation

This makes no sense at all.

iomap_dio_rw() does its own sub-block zeroing if the mapping
returned to it by the filesystem requires it. That's where all the
additional data integrity concerns with concurrent sub-block DIO
come from.
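
To illustrate what the filesystem still has to do itself around
iomap_dio_rw() -- IO serialisation, IOCB_NOWAIT handling, write
validity checks -- here is a stripped-down sketch. The myfs_* names
are invented, error handling is trimmed, and it is not compile-tested:

```c
/* Sketch only: timestamp updates, privilege stripping, etc. omitted. */
static ssize_t myfs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	struct inode *inode = file_inode(iocb->ki_filp);
	ssize_t ret;

	if (iocb->ki_flags & IOCB_NOWAIT) {
		if (!inode_trylock(inode))
			return -EAGAIN;
	} else {
		inode_lock(inode);
	}

	ret = generic_write_checks(iocb, from);
	if (ret > 0)
		ret = iomap_dio_rw(iocb, from, &myfs_iomap_ops, NULL,
				   0, NULL, 0);
	inode_unlock(inode);
	return ret;
}
```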

> +Typically you'd also happen to use these on paths when updating an inode's size.

What has that got to do with discussing how iomap_dio_rw() works?

> +iomap for reads
> +---------------
> +
> +You can call into **iomap** for reading, ie, dealing with the filesystems's
> +``struct file_operations``:
> +
> + * ``struct file_operations.read_iter()``: note that depending on the type of
> +   read your filesystem might use ``iomap_dio_rw()`` for direct IO,
> +   generic_file_read_iter() for buffered IO and
> +   ``dax_iomap_rw()`` for DAX.
> + * ``struct file_operations.remap_file_range()`` - currently the special
> +   ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads.

This is confusing. Why mention struct file_operations here and
not in the write path? Why mention DAX here, and not in the write
path? Why mention remap_file_range here, when that is an operation
that simply layers over the read and write IO paths?

Indeed, you don't discuss anything about the IO path, locking
requirements, how reads and writes can be serialised against each
other with iomap (e.g. via coarse grained per-IO exclusion via
i_rwsem or fine-grained via folio locks), etc. Those are things that
people implementing filesystem actually need to know....

> +iomap for userspace file extent mapping
> +---------------------------------------
> +
> +The ``fiemap`` ioctl can be used to allow userspace to get a file extent
> +mapping. The older ``bmap()`` (aka ``FIBMAP``)  allows the VM to map logical
> +block offset to physical block number.  ``bmap()`` is a legacy block mapping
> +operation supported only for the ioctl and two areas in the kernel which likely
> +are broken (the default swapfile implementation and odd md bitmap code).
> +``bmap()`` was only useful in the days of ext2 when there were no support for
> +delalloc or unwritten extents. Consequently, the interface reports nothing for
> +those types of mappings. Because of this we don't want filesystems to start
> +exporting this interface if they don't already do so.

Discussion of FIBMAP problems has no place in a description of how
to use iomap to support FIEMAP.

> +The ``fiemap`` ioctl is supported through an inode ``struct
> +inode_operations.fiemap()`` callback.

Well, yes, but that's largely irrelevant to how to use iomap to
implement it?

> +You would use ``iomap_fiemap()`` to provide the mapping. You could use two
> +seperate ``struct iomap_ops`` one for when requested to also map extended
> +attributes (``FIEMAP_FLAG_XATTR``) and your another ``struct iomap_ops`` for
> +regular read ``struct iomap_ops`` when there is no need for extended attributes.

Yay, finally the 'struct iomap_ops' is mentioned, but without any
description of what it is or how it gets used, this is just
confusing.

"iomap_fiemap() will return the fiemap data ready to be returned to
userspace based on the mapping function provided by the caller. Note
that FIEMAP may ask for different mappings from the same inode (e.g.
attribute vs data) so care must be taken to ensure the mapping
function provided returns the information the caller asked for."
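
A sketch of such a mapping-function selection (the two myfs_* ops
structures are invented for illustration):

```c
static int myfs_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
		u64 start, u64 len)
{
	const struct iomap_ops *ops = &myfs_read_iomap_ops;

	/* FIEMAP may ask for the xattr mapping rather than file data. */
	if (fi->fi_flags & FIEMAP_FLAG_XATTR)
		ops = &myfs_xattr_iomap_ops;

	return iomap_fiemap(inode, fi, start, len, ops);
}
```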


> +In the future **iomap** may provide its own dedicated ops structure for
> +``fiemap``.

unnecessary.

> +``iomap_bmap()`` exists and should *only be used* by filesystems that
> +**already** supported ``FIBMAP``.  ``FIBMAP`` **should not be used** with the
> +address_space -- we have iomap readpages and writepages for that.

I really don't understand what you are trying to say "do not do"
here.  Nothing in a filesystem implementation needs to call
iomap_bmap(); it's a function to implement the fs ->bmap method and
nothing else.

> +iomap for assisting the VFS
> +---------------------------
> +
> +A filesystem also needs to call **iomap** when assisting the VFS manipulating a
> +file into the page cache.

I have no idea what this means.

> +iomap for VFS reading
> +---------------------
> +
> +A filesystem can call **iomap** to deal with the VFS reading a file into folios
> +with:
> +
> + * ``iomap_bmap()`` - called to assist the VFS when manipulating page cache with
> +   ``struct address_space_operations.bmap()``, to help the VFS map a logical
> +   block offset to physical block number.

What? Nobody should be using ->bmap() except for FIBMAP and the
legacy swapfile mapping path. Any filesystem that is implementing
iomap that supports swapfiles should be using
iomap_swapfile_activate(). Swapfiles are most definitely not
a "VFS reading a file into folios" situation.

> + * ``iomap_read_folio()`` - called to assist the page cache with
> +   ``struct address_space_operations.read_folio()``
> + * ``iomap_readahead()`` - called to assist the page cache with
> +   ``struct address_space_operations.readahead()``

Oh, these aren't "VFS reading" operations. These are page cache IO
interfaces.

> +
> +iomap for VFS writepages
> +------------------------
> +
> +A filesystem can call **iomap** to deal with the VFS write out of pages back to
> +backing store, that is to help deal with a filesystems's ``struct
> +address_space_operations.writepages()``. The special ``iomap_writepages()`` is
> +used for this case with its own respective filestems's ``struct iomap_ops`` for
> +this.

That is .... very brief.

There's no discussion of:
- the iomap_writepage_ctx and how to use that,
- calling context constraints (e.g. no direct reclaim calls),
- what happens when the page spans or is beyond EOF,
- what the ->map_blocks() callback is supposed to do,
- how errors during write submission are handled,
- when ->discard_folio() should be implemented,
- data integrity requirements,
- how ioends work
- when a filesystem might need to implement ->prepare_ioend() (e.g.
  for COW state updates)
- how ioend completion merging can be implemented
- how to attach custom IO completion handlers (e.g. for unwritten
  extent conversion, COW state updates, file size updates, etc)
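
For contrast, the bare minimum wiring -- which is about all the
current text describes -- really is this small; everything with a
myfs_ prefix is invented, and all of the subtleties above hide behind
->map_blocks():

```c
static int myfs_map_blocks(struct iomap_writepage_ctx *wpc,
		struct inode *inode, loff_t offset)
{
	/* Fill wpc->iomap with the extent covering @offset. */
	return myfs_lookup_extent(inode, offset, &wpc->iomap);
}

static const struct iomap_writeback_ops myfs_writeback_ops = {
	.map_blocks	= myfs_map_blocks,
};

static int myfs_writepages(struct address_space *mapping,
		struct writeback_control *wbc)
{
	struct iomap_writepage_ctx wpc = { };

	return iomap_writepages(mapping, wbc, &wpc, &myfs_writeback_ops);
}
```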


> +iomap for VFS llseek
> +--------------------
> +
> +A filesystem ``struct address_space_operations.llseek()`` is used by the VFS
> +when it needs to move the current file offset, the file offset is in ``struct
> +file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE`` or
> +``SEEK_DATA`` interfaces:

All this needs to say is:

"iomap provides generic support for SEEK_HOLE and SEEK_DATA via
iomap_seek_hole() and iomap_seek_data(). This requires the
filesystem to supply a mapping callback to allow targeted
iteration of the current inode's extent map."
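
i.e. in practice it boils down to something like this sketch, with an
invented myfs_seek_iomap_ops supplying just ->iomap_begin:

```c
static loff_t myfs_llseek(struct file *file, loff_t offset, int whence)
{
	struct inode *inode = file_inode(file);

	switch (whence) {
	case SEEK_HOLE:
		return iomap_seek_hole(inode, offset, &myfs_seek_iomap_ops);
	case SEEK_DATA:
		return iomap_seek_data(inode, offset, &myfs_seek_iomap_ops);
	}
	return generic_file_llseek(file, offset, whence);
}
```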

> +
> + * ``iomap_seek_hole()``: for when the
> +   ``struct address_space_operations.llseek()`` *whence* argument is
> +   ``SEEK_HOLE``, when looking for the file's next hole.
> + * ``iomap_seek_data()``: for when the
> +   ``struct address_space_operations.llseek()`` *whence* argument isj
> +   ``SEEK_DATA`` when looking for the file's next data area.
> +
> +Your own ``struct iomap_ops`` for this is encouraged.

What does that mean? The structure with a valid ->iomap_begin method
is required for correct operation....

> +iomap for DAX
> +-------------
> +You can use ``dax_iomap_rw()`` when calling iomap from a DAX context, this is
> +typically from the filesystems's ``struct file_operations.write_iter()``
> +callback.

Ugh. That is *not useful*. It's worse than not documenting it at
all.

DAX has a separate iomap API and iterator. It implements many of the
same functions using the same API. e.g. dax_zero_range(),
dax_truncate_page(), dax_iomap_fault(), dax_file_unshare(),
dax_remap_file_range_prep(), etc.

Any filesystem that uses the iomap API should be able to branch
operations to the DAX iomap operations if IS_DAX() is set on the
inode. Other than that, implementation of DAX support is way outside
the scope of implementing iomap support in a filesystem.

> +Converting filesystems from buffer-head to iomap guide
> +======================================================
> +
> +These are generic guidelines on converting a filesystem over to **iomap** from
> +'''buffer-heads'''.
> +
> +One op at at time
> +-----------------
> +
> +You may try to convert a filesystem with different clustered set of operations
> +at time, below are a generic order you may strive to target:
> +
> + * direct io
> + * miscellaneous helpers (seek/fiemap/bmap)
> + * buffered io

That doesn't tell anyone how to actually go about this.

The right approach is to first implement ->iomap_begin and (if
necessary) ->iomap_end to allow iomap to obtain a read-only mapping
of a file range.  In most cases, this is a relatively trivial
conversion of the existing get_block() callback for read-only
mappings.

FIEMAP is a really good first target because it is trivial to
implement support for it and then to determine that the extent map
iteration is correct from userspace.  i.e. if FIEMAP is returning
the correct information, it's a good sign that other read-only
mapping operations will also do the right thing.

Next, rewrite the filesystem's get_block(create = false)
implementation to use the new ->iomap_begin() implementation. i.e.
get_block() wraps around the outside and converts the information in
the iomap to the bufferhead-based map that get_block() returns. This
will convert all the existing read-only mapping users to use the new
iomap mapping function internally. This way the iomap mapping
function can be further tested without needing to implement any
other iomap APIs.
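
A sketch of that transitional wrapper -- myfs_* names invented, only
the mapped case handled, not compile-tested:

```c
static int myfs_get_block(struct inode *inode, sector_t block,
		struct buffer_head *bh, int create)
{
	loff_t pos = (loff_t)block << inode->i_blkbits;
	struct iomap iomap = { };
	struct iomap srcmap = { };
	int error;

	error = myfs_iomap_begin(inode, pos, bh->b_size, 0,
				 &iomap, &srcmap);
	if (error)
		return error;

	/* Convert the byte-based iomap back to a bufferhead map. */
	if (iomap.type == IOMAP_MAPPED)
		map_bh(bh, inode->i_sb, (iomap.addr +
			(pos - iomap.offset)) >> inode->i_blkbits);
	return 0;
}
```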

Once everything is working like this, then convert all the other
read-only mapping operations to use iomap. Done one at a time,
regressions should be self-evident. The only likely complexity
at this point will be the buffered read IO path, because of
bufferheads. The buffered read IO paths don't need to be converted
yet, though the direct IO read path should be converted.

The next thing to do is implement get_blocks(create = true)
functionality in the ->iomap_begin/end() methods. Then convert the
direct IO write path to iomap, and start running fsx w/ DIO enabled
in earnest on the filesystem. This will flush out lots of data
integrity corner case bugs that the new write mapping implementation
introduces.

At this point, converting the entire get_blocks() path to call the
iomap functions and convert the iomaps to bufferhead maps is
possible. This will get the entire filesystem using the new mapping
functions, and they should largely be debugged and working correctly
after this step.

This now largely leaves the buffered read and write paths to be
converted. The mapping functions should all work correctly,
so all that needs to be done is rewriting all the code that
interfaces with bufferheads to interface with iomap and folios. This
is left as an exercise for the reader, as it will be different for
every filesystem.


> +Defining a simple filesystem
> +----------------------------
> +
> +A simple filesystem is perhaps the easiest to convert over to **iomap**, a
> +simple filesystem is one which:
> +
> + * does not use fsverify, fscrypt, compression
> + * has no Copy on Write support (reflinks)
> +
> +Converting a simple filesystem to iomap
> +---------------------------------------
> +
> +Simple filesystems should covert to IOMAP piecemeal wise first converting over
> +**direct IO**, then the miscellaneous helpers  (seek/fiemap/bmap) and last
> +should be buffered IO.
> +
> +Converting shared filesystem features
> +-------------------------------------
> +
> +Shared filesystems features such as fscrypt, compression, erasure coding, and
> +any other data transformations need to be ported to **iomap** first, as none of
> +the current **iomap** users require any of this functionality.
> +
> +Converting complex filesystems
> +------------------------------
> +
> +If your filesystem relies on any shared filesystem features mentioned above
> +those would need to be converted piecemeal wise.

"piecemeal wise"? Anyway, didn't you just address this in the
previous section? Why mention it again?

> If reflinks are supported you
> +need to first ensure proper locking sanity in order to be able to address byte
> +ranges can be handled properly through **iomap** operations.  An example
> +filesystem where this work is taking place is btrfs.

Ah, what? To use iomap at all, you first need to ensure proper
locking sanity across the filesystem operations. Reflink or any
other filesystem functionality doesn't magically create locking
issues with iomap - the iomap infrastructure *always* assumes the
caller is holding the correct locks when iomap functions are called,
and that the methods will take the right locks to ensure filesystem
mapping operations are serialised against each other correctly. i.e.
there is no locking in the iomap infrastructure except for internal
state protection....


> +IOMAP_F_BUFFER_HEAD considerations
> +----------------------------------
> +
> +``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystem fully
> +converted away from **buffer-heads**, and this could be never.
> +
> +``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone / to port
> +filesystems over to **iomap** as it's support for **buffer-heads** only apply to
> +the buffered write path and nothing else including the read_folio/readahead and
> +writepages aops.
> +
> +Testing Direct IO
> +=================
> +
> +Other than fstests you can use LTP's dio, however this tests is limited as it
> +does not test stale data.
> +
> +{{{
> +./runltp -f dio -d /mnt1/scratch/tmp/
> +}}}

This should point to all the data integrity tests and IO path stress
tests in fstests (i.e. '-g rw -g aio -g fiemap -g seek'). I think
those cover far more cases (including stale data and buffered/mmap
IO) than what you are suggesting here.

> +
> +Known issues and future improvements
> +====================================
> +
> +We try to document known issues that folks should be aware of with **iomap** here.
> +
> + * write amplification on IOMAP when bs < ps: **iomap** needs improvements for
> +   large folios for dirty bitmap tracking
> + * filesystems which use buffer head helpers such as ``sb_bread()`` and friends
> +   will have to continue to use buffer heads as there is no generic iomap
> +   metadata read/write library yet.

This will get out of date rapidly. I would not bother to document
stuff like this - if it needs to be documented it should be in the
code where the problems are being fixed, not in the documentation.

> +References
> +==========
> +
> +  *  `Presentation on iomap evolution`<https://docs.google.com/presentation/d/e/2PACX-1vSN4TmhiTu1c6HNv6_gJZFqbFZpbF7GkABllSwJw5iLnSYKkkO-etQJ3AySYEbgJA/pub?start=true&loop=false&delayms=3000&slide=id.g189cfd05063_0_185>`

Someone did "git grep iomap fs/xfs" and turned it into a slide set?

Yeah, they did. The total contents of Slide 11:

| XFS IOMAP usage today
|
| git grep "struct buffer_head" fs/xfs |wc -l
| 0

That's really not a useful reference at all. Maybe what was said
during the talk was more insightful, but as reference material it
is not useful at all.

> +  * `LWN review on deprecating buffer-heads <https://lwn.net/Articles/930173/>`
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e2b836c2e119..ee4b026995ac 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -10,6 +10,30 @@
>  #include <linux/mm_types.h>
>  #include <linux/blkdev.h>
>  
> +/**
> + * DOC: Introduction
> + *
> + * iomap allows filesystems to sequentially iterate over byte addressable block
> + * ranges on an inode and apply operations to it.
> + *
> + * iomap grew out of the need to provide a modern block mapping abstraction for
> + * filesystems with the different IO access methods they support and assisting
> + * the VFS with manipulating files into the page cache. iomap helpers are
> + * provided for each of these mechanisms. However, block mapping is just one of
> + * the features of iomap, given iomap supports DAX IO for filesystems and also
> + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE``
> + * interfaces.
> + *
> + * Block mapping provides a mapping between data cached in memory and the
> + * location on persistent storage where that data lives. `LWN has an great
> + * review of the old buffer-heads block-mapping and why they are inefficient
> + * <https://lwn.net/Articles/930173/>`, since the inception of Linux.  Since
> + * **buffer-heads** work on a 512-byte block based paradigm, it creates an
> + * overhead for modern storage media which no longer necessarily works only on
> + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with
> + * the support of folios, provides a modern replacement for **buffer-heads**.
> + */

This commentary belongs in the documentation, not the code. If you
are going to document anything in the code, it should be the
requirements the code is implementing, the design behind the
implementation in the code, implementation constraints and
assumptions, etc. 

> +
>  struct address_space;
>  struct fiemap_extent_info;
>  struct inode;
> @@ -22,37 +46,43 @@ struct page;
>  struct vm_area_struct;
>  struct vm_fault;
>  
> -/*
> - * Types of block ranges for iomap mappings:
> +/**
> + * DOC: iomap block ranges types

I seriously dislike this "DOC:" keyword appearing everywhere.
We've already got a "this is a comment for documentation" annotation
in the "/**" comment prefix, having to add "DOC:" is entirely
redudant and unnecessary noise.

> + *
> + * * IOMAP_HOLE		- no blocks allocated, need allocation
> + * * IOMAP_DELALLOC	- delayed allocation blocks
> + * * IOMAP_MAPPED	- blocks allocated at @addr
> + * * IOMAP_UNWRITTEN	- blocks allocated at @addr in unwritten state
> + * * IOMAP_INLINE	- data inline in the inode

Why do we need the "double *" here? This just makes the comment look weird.

>   */
> -#define IOMAP_HOLE	0	/* no blocks allocated, need allocation */
> -#define IOMAP_DELALLOC	1	/* delayed allocation blocks */
> -#define IOMAP_MAPPED	2	/* blocks allocated at @addr */
> -#define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
> -#define IOMAP_INLINE	4	/* data inline in the inode */
> +#define IOMAP_HOLE	0
> +#define IOMAP_DELALLOC	1
> +#define IOMAP_MAPPED	2
> +#define IOMAP_UNWRITTEN	3
> +#define IOMAP_INLINE	4
>  
> -/*
> - * Flags reported by the file system from iomap_begin:
> +/**
> + * DOC:  Flags reported by the file system from iomap_begin
>   *
> - * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
> - * zeroing for areas that no data is copied to.
> + * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and need
> + *	zeroing for areas that no data is copied to.
>   *
> - * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
> - * written data and requires fdatasync to commit them to persistent storage.
> - * This needs to take into account metadata changes that *may* be made at IO
> - * completion, such as file size updates from direct IO.
+ * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access
> + *	written data and requires fdatasync to commit them to persistent storage.
> + *	This needs to take into account metadata changes that *may* be made at IO
> + *	completion, such as file size updates from direct IO.

This breaks all the 80 column wrapping in this file. If you are
going to indent comments, they need to be reflowed to stay within
80 columns.

Which brings me back to: why do we want all these extra comment
formatting constraints?

Next question: Why is the comment formatting here different to the
first set of "DOC" comments you added?


> @@ -61,22 +91,20 @@ struct vm_fault;
>  #define IOMAP_F_BUFFER_HEAD	(1U << 4)
>  #define IOMAP_F_XATTR		(1U << 5)
>  
> -/*
> - * Flags set by the core iomap code during operations:
> +/**
> + * DOC: Flags set by the core iomap code during operations
> + *
> + * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size
> + *	has changed as the result of this write operation.
>   *
> - * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
> - * has changed as the result of this write operation.
> + * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file
> + *	range it covers needs to be remapped by the high level before the
> + *	operation can proceed.

Again beyond 80 columns.

>   *
> - * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
> - * range it covers needs to be remapped by the high level before the operation
> - * can proceed.
> + * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage
>   */
>  #define IOMAP_F_SIZE_CHANGED	(1U << 8)
>  #define IOMAP_F_STALE		(1U << 9)
> -
> -/*
> - * Flags from 0x1000 up are for file system specific usage:
> - */
>  #define IOMAP_F_PRIVATE		(1U << 12)
>  
>  
> @@ -124,73 +152,119 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
>  	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
>  }
>  
> -/*
> - * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio
> - * and put_folio will be called for each folio written to.  This only applies
> - * to buffered writes as unbuffered writes will not typically have folios
> - * associated with them.
> - *
> - * When get_folio succeeds, put_folio will always be called to do any
> - * cleanup work necessary.  put_folio is responsible for unlocking and putting
> - * @folio.
> +/**
> + * struct iomap_folio_ops - buffered writes folio folio reference count helpers

folio folio folio folio....

> + *
> + * A filesystem can optionally set folio_ops in a &struct iomap mapping it
> + * returns to override the default get_folio and put_folio for each folio
> + * written to.

That's much worse than the comment it replaces. "get_folio and
put_folio will be called for each folio written to" is *much
clearer* compared to "override the default".
>
> + * This only applies to buffered writes as unbuffered writes will
> + * not typically have folios associated with them.
> + *
> + * @get_folio: iomap defaults to iomap_get_folio() (which calls
> + *	__filemap_get_folio()) if the filesystem did not provide a get folio op.

Nope. Document what the method is provided with and is expected to
perform, not what iomap_get_folio() does.


> + * @put_folio: when get_folio succeeds, put_folio will always be called to do
> + *	any cleanup work necessary. put_folio is responsible for unlocking and
> + *	putting @folio.

See, you didn't mention that get_folio is responsible for returning a
locked, referenced folio....

> + * @iomap_valid: check that the cached iomap still maps correctly to the
> + *	filesystem's internal extent map. FS internal extent maps can change
> + *	while iomap is iterating a cached iomap, so this hook allows iomap to
> + *	detect that the iomap needs to be refreshed during a long running write operation.

Line length.

> + *
> + *	The filesystem can store internal state (e.g. a sequence number) in
> + *	iomap->validity_cookie when the iomap is first mapped to be able to
> + *	detect changes between mapping time and whenever .iomap_valid() is
> + *	called.
> + *
> + *	This is called with the folio over the specified file position held
> + *	locked by the iomap code.  This is useful for filesystems that have
> + *	dynamic mappings (e.g. anything other than zonefs).  An example reason
> + *	as to why this is necessary is writeback doesn't take the vfs locks.

I don't think these last two sentences add any value here. The first
paragraph makes it pretty clear that it is related to dynamic
mappings, and not taking VFS locks in writeback does not mean a
filesystem needs to implement ->iomap_valid(). Indeed, the problem
this fixes with XFS is related to stale unwritten extent mappings
and memory reclaim racing with write() - it has nothing to do
with the locking in ->writepages so at minimum the comment doesn't
reflect the actual reason for this method existing....

> -/*
> - * Flags for iomap_begin / iomap_end.  No flag implies a read.
> +/**
> + * DOC:  Flags for iomap_begin / iomap_end.  No flag implies a read.
> + *
> + * * IOMAP_WRITE: writing, must allocate blocks
> + * * IOMAP_ZERO: zeroing operation, may skip holes
> + * * IOMAP_REPORT: report extent status, e.g. FIEMAP
> + * * IOMAP_FAULT: mapping for page fault
> + * * IOMAP_DIRECT: direct I/O
> + * * IOMAP_NOWAIT: do not block
> + * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed
> + * * IOMAP_UNSHARE: unshare_file_range
> + * * IOMAP_DAX: DAX mapping

Yet another different doc comment style.

> +/**
> + * struct iomap_ops - IO interface specific operations
> + *
> + * A filesystem is must provide a &struct iomap_ops for to deal with the
> + * beginning an IO operation, iomap_begin(), and ending an IO operation on a
> + * block range, ``iomap_end()``. You would call iomap with a specialized iomap
> + * operation depending on its filesystem or the VFS needs.
> + *
> + * For example iomap_dio_rw() would be used for for a filesystem when doing a
> + * block range read or write operation with direct IO. In this case your
> + * filesystem's respective &struct file_operations.write_iter() would eventually
> + * call iomap_dio_rw() on the filesystem's &struct file_operations.write_iter().
> + *
> + * For buffered IO a filesystem would use iomap_file_buffered_write() on the
> + * same &struct file_operations.write_iter(). But that is not the only situation
> + * in which a filesystem would deal with buffered writes, you could also use
> + * buffered writes when a filesystem has to deal with &struct
> + * file_operations.fallocate(). However fallocate() can be used for *zeroing* or
> + * for *truncation* purposes. A special respective iomap_zero_range() would be
> + * used for *zeroing* and a iomap_truncate_page() would be used for
> + * *truncation*.

This isn't useful information. It's commentary - it doesn't document
what the methods are supposed to do or provide the iomap
infrastructure. This sort of stuff belongs in the documentation, not
the code.

> + *
> + * Experience with adopting iomap on filesystems have shown that the filesystem
> + * implementation of these operations can be simplified considerably if one
> + * &struct iomap_ops is provided per major filesystem IO operation:
> + *
> + * * buffered io
> + * * direct io
> + * * DAX io
> + * * fiemap for with extended attributes (``FIEMAP_FLAG_XATTR``)
> + * * lseek

And this most definitely doesn't belong in the code. Put the "how to
implement a filesystem" commentary in the documentation, leave the
comments for the API to *describe the API*.

/**
 * struct iomap_ops - IO interface specific operations
 *
 * This structure provides the filesystem methods for mapping a file
 * offset range to a mapped extent that an IO operation could be
 * performed on.
 *
 * An iomap reflects a single contiguous range of filesystem address
 * space that either exists in memory or on a block device. The type
 * of the iomap determines what the range maps to, and there are
 * several different state flags that can be returned that indicate
 * how the iomap infrastructure should modify its behaviour to do
 * the right thing.
 *
 * The methods are designed to be used as pairs. The begin method
 * creates the iomap and attaches all the necessary state and
 * information subsequent iomap methods and callbacks might need.
 * Once the iomap infrastructure has finished working on the
 * iomap it will call the end method to allow the filesystem to tear
 * down any unused space and/or structures it created for the
 * specific iomap context.
 *
 .....

> + *
> + * @iomap_begin: return the existing mapping at pos, or reserve space starting
> + *	at pos for up to length, as long as we can do it as a single mapping. The
> + *	actual length is returned in iomap->length. The &struct iomap iomap must
> + *	always be set. The &struct iomap srcmap should be set if the range is
> + *	CoW.

Discuss what @flags contains and what the flags mean.

Describe what the @srcmap is used for when CoW is being set up.

Explain constraints on @srcmap because it is not passed to the
->iomap_end() method.

> + *
> + * @iomap_end: commit and/or unreserve space previous allocated using
> + *	iomap_begin. Written indicates the length of the successful write

Shouldn't this be @written

> + *	operation which needs to be committed, while the rest needs to be
> + *	unreserved. Written might be zero if no data was written.

@flags is undocumented.

> + */
>  struct iomap_ops {
> -	/*
> -	 * Return the existing mapping at pos, or reserve space starting at
> -	 * pos for up to length, as long as we can do it as a single mapping.
> -	 * The actual length is returned in iomap->length.
> -	 */
>  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
>  			unsigned flags, struct iomap *iomap,
>  			struct iomap *srcmap);
>  
> -	/*
> -	 * Commit and/or unreserve space previous allocated using iomap_begin.
> -	 * Written indicates the length of the successful write operation which
> -	 * needs to be commited, while the rest needs to be unreserved.
> -	 * Written might be zero if no data was written.
> -	 */
>  	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
>  			ssize_t written, unsigned flags, struct iomap *iomap);
>  };
> @@ -207,6 +281,7 @@ struct iomap_ops {
>   * @flags: Zero or more of the iomap_begin flags above.
>   * @iomap: Map describing the I/O iteration
>   * @srcmap: Source map for COW operations
> + * @private: internal use

Completely useless description. Wrong as well.

iomap_dio_rw() allows the caller to pass a filesystem private
pointer which gets stored in the iomap_iter. The iomap_iter is then
passed to iomap_dio_ops->submit_io, and the filesystem can then grab
whatever private structure it passed to the DIO from the iter
and make use of it.

i.e. if you are going to document this, it is a pointer to a
filesystem owned private structure that was passed to the high level
iomap IO function so it is available to individual filesystem
methods that are passed the iomap_iter structure as it is iterated.

>   */
>  struct iomap_iter {
>  	struct inode *inode;
> @@ -241,7 +316,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter)
>   * @i: iteration structure
>   *
>   * Write operations on file systems with reflink support might require a
> - * source and a destination map.  This function retourns the source map
> + * source and a destination map.  This function returns the source map
>   * for a given operation, which may or may no be identical to the destination
>   * map in &i->iomap.
>   */
> @@ -281,42 +356,52 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset,
>  sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
>  		const struct iomap_ops *ops);
>  
> -/*
> - * Structure for writeback I/O completions.
> +/**
> + * struct iomap_ioend - for writeback I/O completions
> + *
> + * @io_list: next ioend in chain
> + * @io_type:

Care to update this?

> + * @io_flags: IOMAP_F_*
> + * @io_folios: folios added to ioend
> + * @io_inode: file being written to
> + * @io_size: size of the extent
> + * @io_offset: offset in the file
> + * @io_sector: start sector of ioend
> + * @io_bio: bio being built
> + * @io_inline_bio: MUST BE LAST!

Yet another different doc comment format....

That said, these comments lose a lot of their relevance when the
variable type is stripped out, e.g. "io_folios" - is that a pointer to
a list of folios, an array of folios, or something else? it's
something else - it's a counter, and that's obvious from the "u32"
type. It's not obvious from the comment in isolation.

Hence I think just lifting the comments as "documentation" is less
useful than not documenting them at all. I'd still have to go look
at the code to know what these variables mean, and at that point
there's no reason to look at the documentation....

>   */
>  struct iomap_ioend {
> -	struct list_head	io_list;	/* next ioend in chain */
> +	struct list_head	io_list;
>  	u16			io_type;
> -	u16			io_flags;	/* IOMAP_F_* */
> -	u32			io_folios;	/* folios added to ioend */
> -	struct inode		*io_inode;	/* file being written to */
> -	size_t			io_size;	/* size of the extent */
> -	loff_t			io_offset;	/* offset in the file */
> -	sector_t		io_sector;	/* start sector of ioend */
> -	struct bio		*io_bio;	/* bio being built */
> -	struct bio		io_inline_bio;	/* MUST BE LAST! */
> +	u16			io_flags;
> +	u32			io_folios;
> +	struct inode		*io_inode;
> +	size_t			io_size;
> +	loff_t			io_offset;
> +	sector_t		io_sector;
> +	struct bio		*io_bio;
> +	struct bio		io_inline_bio;
>  };
>  
> +/**
> + * struct iomap_writeback_ops - used for writeback
> + *
> + * This structure is used to support dealing with a filesystem
> + * ``struct address_space_operations.writepages()``, for writeback.
> + *
> + * @map_blocks: required, maps the blocks so that writeback can be performed on
> + *	the range starting at offset.
> + * @prepare_ioend: optional, allows the file systems to perform actions just
> + *	before submitting the bio and/or override the bio end_io handler for
> + *	complex operations like copy on write extent manipulation or unwritten
> + *	extent conversions.
> + * @discard_folio: optional, allows the file system to discard state on a page where
> + *	we failed to submit any I/O.
> + */

And yet another different document comment format. Lines are still
too long.

This is also much harder to read than the original comments. The
layout of the original comments make it very clear what is required
or optional, now it's hidden in a big chunk of dense text.

>  struct iomap_writeback_ops {
> -	/*
> -	 * Required, maps the blocks so that writeback can be performed on
> -	 * the range starting at offset.
> -	 */
>  	int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
>  				loff_t offset);
> -
> -	/*
> -	 * Optional, allows the file systems to perform actions just before
> -	 * submitting the bio and/or override the bio end_io handler for complex
> -	 * operations like copy on write extent manipulation or unwritten extent
> -	 * conversions.
> -	 */
>  	int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
> -
> -	/*
> -	 * Optional, allows the file system to discard state on a page where
> -	 * we failed to submit any I/O.
> -	 */
>  	void (*discard_folio)(struct folio *folio, loff_t pos);
>  };
>  
> @@ -334,26 +419,33 @@ int iomap_writepages(struct address_space *mapping,
>  		struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
>  		const struct iomap_writeback_ops *ops);
>  
> -/*
> - * Flags for direct I/O ->end_io:
> +/**
> + * DOC: Flags for direct I/O ->end_io
> + *
> + * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s)
> + * * IOMAP_DIO_COW: covers COW extent(s)
>   */
> -#define IOMAP_DIO_UNWRITTEN	(1 << 0)	/* covers unwritten extent(s) */
> -#define IOMAP_DIO_COW		(1 << 1)	/* covers COW extent(s) */
> +#define IOMAP_DIO_UNWRITTEN	(1 << 0)
> +#define IOMAP_DIO_COW		(1 << 1)
>  
> +/**
> + * struct iomap_dio_ops - used for direct IO
> + *
> + * This is used to support direct IO.
> + *
> + * @end_io:
> + * @submit_io:
> + * @bio_set: Filesystems wishing to attach private information to a direct io
> + *	bio must provide a ->submit_io method that attaches the additional
> + *	information to the bio and changes the ->bi_end_io callback to a custom
> + *	function.  This function should, at a minimum, perform any relevant
> + *	post-processing of the bio and end with a call to iomap_dio_bio_end_io.
> + */

Why does the documentation of bio_set document what end_io and
submit_io do, and provide no information about what @bio_set
actually does?

Blindly lifting comments and making them documentation doesn't make
for good documentation....

Cheers,

Dave.
Christoph Hellwig May 19, 2023, 5:04 a.m. UTC | #7
On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote:
> +        Mapping of heading styles within this document:
> +        Heading 1 uses "====" above and below
> +        Heading 2 uses "===="
> +        Heading 3 uses "----"
> +        Heading 4 uses "````"
> +        Heading 5 uses "^^^^"
> +        Heading 6 uses "~~~~"
> +        Heading 7 uses "...."
> +
> +        Sections are manually numbered because apparently that's what everyone

Why are you picking different defaults than the rest of the kernel
documentation?

> +
> +A modern block abstraction
> +==========================
> +
> +**iomap** allows filesystems to query storage media for data using *byte
> +ranges*. Since block mapping are provided for a *byte ranges* for cache data in
> +memory, in the page cache, naturally this implies operations on block ranges
> +will also deal with *multipage* operations in the page cache. **Folios** are
> +used to help provide *multipage* operations in memory for the *byte ranges*
> +being worked on.

As mentioned the last time this information was circulated, this is not
true.  iomap itself has nothing to do with blocks, and even less so with
the page cache per se.  It just iterates over ranges of file data and
applies work to it.

> +iomap IO interfaces
> +===================
> +
> +You call **iomap** depending on the type of filesystem operation you are working
> +on. We detail some of these interactions below.

Who is you?

> +
> +iomap for bufferred IO writes
> +-----------------------------
> +
> +You call **iomap** for buffered IO with:
> +
> + * ``iomap_file_buffered_write()`` - for buffered writes
> + * ``iomap_page_mkwrite()`` - when dealing callbacks for
> +    ``struct vm_operations_struct``
> +
> +  * ``struct vm_operations_struct.page_mkwrite()``
> +  * ``struct vm_operations_struct.fault()``
> +  * ``struct vm_operations_struct.huge_fault()``
> +  * ``struct vm_operations_struct`.pfn_mkwrite()``
> +
> +You *may* use buffered writes to also deal with ``fallocate()``:
> +
> + * ``iomap_zero_range()`` on fallocate for zeroing
> + * ``iomap_truncate_page()`` on fallocate for truncation
> +
> +Typically you'd also happen to use these on paths when updating an inode's size.

I'm not really sure what this is trying to explain.  It basically looks
like filler text generated by machine learning algorithms..

The same is true for a large part of this document.

> +A filesystem also needs to call **iomap** when assisting the VFS manipulating a
> +file into the page cache.

A file system doesn't _need_ to do anything.  It may choose to do
things, and the iomap based helpers might be useful for that.  But
again, I'm still not getting what this document is even trying to
explain, as "to implement the method foo, use the iomap_foo" isn't
really helping anyone.

> +Converting filesystems from buffer-head to iomap guide
> +======================================================

If you want such a guide, please keep it in a separate file from the
iomap API documentation.  I'd also suggest that you actually try such
a conversion first, as that might help shaping the documentation :)

> +Testing Direct IO
> +=================
> +
> +Other than fstests you can use LTP's dio, however this tests is limited as it
> +does not test stale data.
> +
> +{{{
> +./runltp -f dio -d /mnt1/scratch/tmp/
> +}}}

How does this belong into an iomap documentation?  If LTPs dio is really
all that useful we should import it into xfstests, btw.  I'm not sure it
is, though.

> +We try to document known issues that folks should be aware of with **iomap** here.

Who is "we"?

> + * DOC: Introduction
> + *
> + * iomap allows filesystems to sequentially iterate over byte addressable block
> + * ranges on an inode and apply operations to it.
> + *
> + * iomap grew out of the need to provide a modern block mapping abstraction for
> + * filesystems with the different IO access methods they support and assisting
> + * the VFS with manipulating files into the page cache. iomap helpers are
> + * provided for each of these mechanisms. However, block mapping is just one of
> + * the features of iomap, given iomap supports DAX IO for filesystems and also
> + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE``
> + * interfaces.
> + *
> + * Block mapping provides a mapping between data cached in memory and the
> + * location on persistent storage where that data lives. `LWN has an great
> + * review of the old buffer-heads block-mapping and why they are inefficient
> + * <https://lwn.net/Articles/930173/>`, since the inception of Linux.  Since
> + * **buffer-heads** work on a 512-byte block based paradigm, it creates an
> + * overhead for modern storage media which no longer necessarily works only on
> + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with
> + * the support of folios, provides a modern replacement for **buffer-heads**.
> + */

I really don't want random blurbs and links like this in the main
header.  If you want to ramble in a little howto that's fine, but the
main header is not the place for it.

Also please keep improvements to the header in a separate patch from
adding Documentation/ documents.
Jonathan Corbet May 19, 2023, 12:41 p.m. UTC | #8
Dave Chinner <david@fromorbit.com> writes:

One little point among all the more substantial stuff:

>> -/*
>> - * Types of block ranges for iomap mappings:
>> +/**
>> + * DOC: iomap block ranges types
>
> I seriously dislike this "DOC:" keyword appearing everywhere.
> We've already got a "this is a comment for documentation" annotation
> in the "/**" comment prefix, having to add "DOC:" is entirely
> redudant and unnecessary noise.

DOC: actually isn't redundant, it causes the kernel-doc directive to
pull that text into the rendered documentation.

This document shows both the advantages and disadvantages of that
mechanism, IMO.  It allows the documentation to be kept with the code,
where optimistic people think it is more likely to be updated.  But it
also scatters the material to the detriment of readers of the plain-text
documentation.  The rendered version of iomap.rst is rather more
complete and comprehensible than the RST file itself.
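For readers unfamiliar with the mechanism Jon mentions: a `DOC:` block in a C
file holds free-form prose, and an .rst file pulls it into the rendered output
with a `kernel-doc` directive naming that section. A minimal hypothetical
pairing (the file path and section title here are made up):

```c
/**
 * DOC: example section title
 *
 * Free-form prose that lives next to the code.  It is rendered only
 * where a .rst file requests it, e.g.:
 *
 *   .. kernel-doc:: include/linux/example.h
 *      :doc: example section title
 */
```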

Thanks,

jon
Darrick J. Wong May 23, 2023, 1:20 a.m. UTC | #9
On Thu, May 18, 2023 at 10:04:43PM -0700, Christoph Hellwig wrote:
> On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote:
> > +        Mapping of heading styles within this document:
> > +        Heading 1 uses "====" above and below
> > +        Heading 2 uses "===="
> > +        Heading 3 uses "----"
> > +        Heading 4 uses "````"
> > +        Heading 5 uses "^^^^"
> > +        Heading 6 uses "~~~~"
> > +        Heading 7 uses "...."
> > +
> > +        Sections are manually numbered because apparently that's what everyone
> 
> Why are you picking different defaults than the rest of the kernel
> documentation?

I bet Luis copied that from the online fsck document.

IIRC the doc generator is smart enough to handle per-file heading usage.
The rst parser source code doesn't seem to have hardcoded defaults; every
time it sees an unfamiliar heading style in a .rst file, it adds that as
the next level down in the hierarchy.

Also, where are the "proper" headings documented for Documentation/?

(Skip to the end; it's late and I don't have time right now to read the
content of this patch.)

> > +
> > +A modern block abstraction
> > +==========================
> > +
> > +**iomap** allows filesystems to query storage media for data using *byte
> > +ranges*. Since block mapping are provided for a *byte ranges* for cache data in
> > +memory, in the page cache, naturally this implies operations on block ranges
> > +will also deal with *multipage* operations in the page cache. **Folios** are
> > +used to help provide *multipage* operations in memory for the *byte ranges*
> > +being worked on.
> 
> As mentioned the last time this information was circulated, this is not
> true.  iomap itself has nothing to do with blocks, and even less so with
> the page cache per se.  It just iterates over ranges of file data and
> applies work to it.
> 
> > +iomap IO interfaces
> > +===================
> > +
> > +You call **iomap** depending on the type of filesystem operation you are working
> > +on. We detail some of these interactions below.
> 
> Who is you?
> 
> > +
> > +iomap for bufferred IO writes
> > +-----------------------------
> > +
> > +You call **iomap** for buffered IO with:
> > +
> > + * ``iomap_file_buffered_write()`` - for buffered writes
> > + * ``iomap_page_mkwrite()`` - when dealing callbacks for
> > +    ``struct vm_operations_struct``
> > +
> > +  * ``struct vm_operations_struct.page_mkwrite()``
> > +  * ``struct vm_operations_struct.fault()``
> > +  * ``struct vm_operations_struct.huge_fault()``
> > +  * ``struct vm_operations_struct`.pfn_mkwrite()``
> > +
> > +You *may* use buffered writes to also deal with ``fallocate()``:
> > +
> > + * ``iomap_zero_range()`` on fallocate for zeroing
> > + * ``iomap_truncate_page()`` on fallocate for truncation
> > +
> > +Typically you'd also happen to use these on paths when updating an inode's size.
> 
> I'm not really sure what this is trying to explain.  It basically looks
> like filler text generated by machine learning algorithms..
> 
> The same is true for a large part of this document.
> 
> > +A filesystem also needs to call **iomap** when assisting the VFS manipulating a
> > +file into the page cache.
> 
> A file system doesn't _need_ to do anything.  It may choose to do
> things, and the iomap based helpers might be useful for that.  But
> again, I'm still not getting what this document is even trying to
> explain, as "to implement the method foo, use the iomap_foo" isn't
> really helping anyone.
> 
> > +Converting filesystems from buffer-head to iomap guide
> > +======================================================
> 
> If you want such a guide, please keep it in a separate file from the
> iomap API documentation.  I'd also suggest that you actually try such
> a conversion first, as that might help shaping the documentation :)
> 
> > +Testing Direct IO
> > +=================
> > +
> > +Other than fstests you can use LTP's dio, however this tests is limited as it
> > +does not test stale data.
> > +
> > +{{{
> > +./runltp -f dio -d /mnt1/scratch/tmp/
> > +}}}
> 
> How does this belong into an iomap documentation?  If LTPs dio is really
> all that useful we should import it into xfstests, btw.  I'm not sure it
> is, though.
> 
> > +We try to document known issues that folks should be aware of with **iomap** here.
> 
> Who is "we"?
> 
> > + * DOC: Introduction
> > + *
> > + * iomap allows filesystems to sequentially iterate over byte addressable block
> > + * ranges on an inode and apply operations to it.
> > + *
> > + * iomap grew out of the need to provide a modern block mapping abstraction for
> > + * filesystems with the different IO access methods they support and assisting
> > + * the VFS with manipulating files into the page cache. iomap helpers are
> > + * provided for each of these mechanisms. However, block mapping is just one of
> > + * the features of iomap, given iomap supports DAX IO for filesystems and also
> > + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE``
> > + * interfaces.
> > + *
> > + * Block mapping provides a mapping between data cached in memory and the
> > + * location on persistent storage where that data lives. `LWN has an great
> > + * review of the old buffer-heads block-mapping and why they are inefficient
> > + * <https://lwn.net/Articles/930173/>`, since the inception of Linux.  Since
> > + * **buffer-heads** work on a 512-byte block based paradigm, it creates an
> > + * overhead for modern storage media which no longer necessarily works only on
> > + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with
> > + * the support of folios, provides a modern replacement for **buffer-heads**.
> > + */
> 
> I really don't want random blurbs and links like this in the main
> header.  If you want to ramble in a little howto that's fine, but the
> main header is not the place for it.
> 
> Also please keep improvements to the header in a separate patch from
> adding Documentation/ documents.

Frankly I don't really like the iomap.h changes -- that's going to blow
up the git blame on that file, just to produce a stilted-language
manpage.

Someone who wants to port a filesystem to iomap (or write a new fs) will
need a coherent narrative (you know, with paragraphs and sentences)
about how to build this piece and that.  The rst file under Documentation/
is the place for that, not trying to mash it into a C header.

--D
Randy Dunlap May 23, 2023, 2:11 a.m. UTC | #10
On 5/22/23 18:20, Darrick J. Wong wrote:
> On Thu, May 18, 2023 at 10:04:43PM -0700, Christoph Hellwig wrote:
>> On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote:
>>> +        Mapping of heading styles within this document:
>>> +        Heading 1 uses "====" above and below
>>> +        Heading 2 uses "===="
>>> +        Heading 3 uses "----"
>>> +        Heading 4 uses "````"
>>> +        Heading 5 uses "^^^^"
>>> +        Heading 6 uses "~~~~"
>>> +        Heading 7 uses "...."
>>> +
>>> +        Sections are manually numbered because apparently that's what everyone
>>
>> Why are you picking different defaults than the rest of the kernel
>> documentation?
> 
> I bet Luis copied that from the online fsck document.
> 
> IIRC the doc generator is smart enough to handle per-file heading usage.
> The rst parser sourcecode doesn't seem to have harcoded defaults; every
> time it sees an unfamiliar heading style in a .rst file, it adds that as
> the next level down in the hierarchy.
> 
> Also, where are the "proper" headings documented for Documentation/?

Documentation/doc-guide/sphinx.rst:

* Please stick to this order of heading adornments:

and following lines.
diff mbox series

Patch

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fbb2b5ada95b..6186ab7c3ea8 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -34,6 +34,7 @@  algorithms work.
    seq_file
    sharedsubtree
    idmappings
+   iomap
 
    automount-support
 
diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst
new file mode 100644
index 000000000000..be487030fcff
--- /dev/null
+++ b/Documentation/filesystems/iomap.rst
@@ -0,0 +1,253 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+.. _iomap:
+
+..
+        Mapping of heading styles within this document:
+        Heading 1 uses "====" above and below
+        Heading 2 uses "===="
+        Heading 3 uses "----"
+        Heading 4 uses "````"
+        Heading 5 uses "^^^^"
+        Heading 6 uses "~~~~"
+        Heading 7 uses "...."
+
+        Sections are manually numbered because apparently that's what everyone
+        does in the kernel.
+.. contents:: Table of Contents
+   :local:
+
+=====
+iomap
+=====
+
+.. kernel-doc:: include/linux/iomap.h
+
+A modern block abstraction
+==========================
+
+**iomap** allows filesystems to query storage media for data using *byte
+ranges*. Since block mapping are provided for a *byte ranges* for cache data in
+memory, in the page cache, naturally this implies operations on block ranges
+will also deal with *multipage* operations in the page cache. **Folios** are
+used to help provide *multipage* operations in memory for the *byte ranges*
+being worked on.
+
+
+iomap IO interfaces
+===================
+
+You call **iomap** depending on the type of filesystem operation you are working
+on. We detail some of these interactions below.
+
+iomap for bufferred IO writes
+-----------------------------
+
+You call **iomap** for buffered IO with:
+
+ * ``iomap_file_buffered_write()`` - for buffered writes
+ * ``iomap_page_mkwrite()`` - when dealing callbacks for
+    ``struct vm_operations_struct``
+
+  * ``struct vm_operations_struct.page_mkwrite()``
+  * ``struct vm_operations_struct.fault()``
+  * ``struct vm_operations_struct.huge_fault()``
+  * ``struct vm_operations_struct`.pfn_mkwrite()``
+
+You *may* use buffered writes to also deal with ``fallocate()``:
+
+ * ``iomap_zero_range()`` on fallocate for zeroing
+ * ``iomap_truncate_page()`` on fallocate for truncation
+
+Typically you would also use these on paths that update an inode's size.
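+
+As a sketch only, assuming hypothetical ``my_fs_*`` names and a hypothetical
+``my_fs_buffered_write_iomap_ops`` (your filesystem would use its own naming
+and locking), a buffered ``write_iter`` might look like::
+
+  static ssize_t my_fs_write_iter(struct kiocb *iocb, struct iov_iter *from)
+  {
+          struct inode *inode = file_inode(iocb->ki_filp);
+          ssize_t ret;
+
+          inode_lock(inode);
+          ret = generic_write_checks(iocb, from);
+          if (ret > 0)
+                  ret = iomap_file_buffered_write(iocb, from,
+                                  &my_fs_buffered_write_iomap_ops);
+          inode_unlock(inode);
+          if (ret > 0)
+                  ret = generic_write_sync(iocb, ret);
+          return ret;
+  }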
+
+iomap for direct IO
+-------------------
+
+You call **iomap** for direct IO with:
+
+ * ``iomap_dio_rw()``
+
+You **may** use direct IO writes to also deal with ``fallocate()``:
+
+ * ``iomap_zero_range()`` on fallocate for zeroing
+ * ``iomap_truncate_page()`` on fallocate for truncation
+
+Typically you would also use these on paths that update an inode's size.
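+
+As a hedged sketch, with ``my_fs_direct_write_iomap_ops`` and
+``my_fs_dio_write_ops`` as hypothetical names, a direct IO write path could
+boil down to::
+
+  static ssize_t my_fs_dio_write(struct kiocb *iocb, struct iov_iter *from)
+  {
+          return iomap_dio_rw(iocb, from, &my_fs_direct_write_iomap_ops,
+                              &my_fs_dio_write_ops, 0, NULL, 0);
+  }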
+
+iomap for reads
+---------------
+
+You can call into **iomap** for reading, i.e., dealing with the filesystem's
+``struct file_operations``:
+
+ * ``struct file_operations.read_iter()``: note that depending on the type of
+   read your filesystem might use ``iomap_dio_rw()`` for direct IO,
+   ``generic_file_read_iter()`` for buffered IO and ``dax_iomap_rw()`` for
+   DAX.
+ * ``struct file_operations.remap_file_range()`` - currently the special
+   ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads.
+
+iomap for userspace file extent mapping
+---------------------------------------
+
+The ``fiemap`` ioctl can be used to allow userspace to get a file extent
+mapping. The older ``bmap()`` (aka ``FIBMAP``) allows the VM to map a logical
+block offset to a physical block number. ``bmap()`` is a legacy block mapping
+operation supported only for the ioctl and two areas in the kernel which are
+likely broken (the default swapfile implementation and odd md bitmap code).
+``bmap()`` was only useful in the days of ext2, when there was no support for
+delalloc or unwritten extents. Consequently, the interface reports nothing for
+those types of mappings. Because of this we don't want filesystems to start
+exporting this interface if they don't already do so.
+
+The ``fiemap`` ioctl is supported through an inode ``struct
+inode_operations.fiemap()`` callback.
+
+You would use ``iomap_fiemap()`` to provide the mapping. You could use two
+separate ``struct iomap_ops``: one for when you are requested to also map
+extended attributes (``FIEMAP_FLAG_XATTR``) and another for regular reads
+when there is no need for extended attributes. In the future **iomap** may
+provide its own dedicated ops structure for ``fiemap``.
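+
+A sketch of such a ``fiemap`` callback, assuming hypothetical
+``my_fs_iomap_ops`` and ``my_fs_xattr_iomap_ops`` names::
+
+  static int my_fs_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
+                          u64 start, u64 len)
+  {
+          if (fi->fi_flags & FIEMAP_FLAG_XATTR)
+                  return iomap_fiemap(inode, fi, start, len,
+                                      &my_fs_xattr_iomap_ops);
+          return iomap_fiemap(inode, fi, start, len, &my_fs_iomap_ops);
+  }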
+
+``iomap_bmap()`` exists and should *only be used* by filesystems that
+**already** supported ``FIBMAP``. ``FIBMAP`` **should not be used** with the
+address_space -- we have iomap ``read_folio``/``readahead`` and ``writepages``
+for that.
+
+iomap for assisting the VFS
+---------------------------
+
+A filesystem also needs to call **iomap** when assisting the VFS with
+manipulating a file in the page cache.
+
+iomap for VFS reading
+---------------------
+
+A filesystem can call **iomap** to deal with the VFS reading a file into folios
+with:
+
+ * ``iomap_bmap()`` - called to assist the VFS when manipulating the page
+   cache with ``struct address_space_operations.bmap()``, to help the VFS map
+   a logical block offset to a physical block number.
+ * ``iomap_read_folio()`` - called to assist the page cache with
+   ``struct address_space_operations.read_folio()``
+ * ``iomap_readahead()`` - called to assist the page cache with
+   ``struct address_space_operations.readahead()``
+
+iomap for VFS writepages
+------------------------
+
+A filesystem can call **iomap** to deal with the VFS writing out pages back
+to the backing store, that is, to help deal with a filesystem's ``struct
+address_space_operations.writepages()``. The special ``iomap_writepages()``
+helper is used for this case, with the filesystem's own respective ``struct
+iomap_ops``.
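+
+A sketch, assuming a hypothetical ``my_fs_writeback_ops``::
+
+  static int my_fs_writepages(struct address_space *mapping,
+                              struct writeback_control *wbc)
+  {
+          struct iomap_writepage_ctx wpc = { };
+
+          return iomap_writepages(mapping, wbc, &wpc, &my_fs_writeback_ops);
+  }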
+
+iomap for VFS llseek
+--------------------
+
+A filesystem's ``struct file_operations.llseek()`` is used by the VFS when it
+needs to move the current file offset; the file offset is in ``struct
+file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE``
+and ``SEEK_DATA`` interfaces:
+
+ * ``iomap_seek_hole()``: for when the
+   ``struct file_operations.llseek()`` *whence* argument is
+   ``SEEK_HOLE``, when looking for the file's next hole.
+ * ``iomap_seek_data()``: for when the
+   ``struct file_operations.llseek()`` *whence* argument is
+   ``SEEK_DATA``, when looking for the file's next data area.
+
+Providing your own ``struct iomap_ops`` for this is encouraged.
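+
+A sketch of an ``llseek`` callback wiring these up, with
+``my_fs_seek_iomap_ops`` as a hypothetical name::
+
+  static loff_t my_fs_llseek(struct file *file, loff_t offset, int whence)
+  {
+          struct inode *inode = file_inode(file);
+
+          switch (whence) {
+          case SEEK_HOLE:
+                  return iomap_seek_hole(inode, offset,
+                                         &my_fs_seek_iomap_ops);
+          case SEEK_DATA:
+                  return iomap_seek_data(inode, offset,
+                                         &my_fs_seek_iomap_ops);
+          }
+          return generic_file_llseek(file, offset, whence);
+  }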
+
+iomap for DAX
+-------------
+
+You can use ``dax_iomap_rw()`` when calling iomap from a DAX context; this is
+typically from the filesystem's ``struct file_operations.write_iter()``
+callback.
+
+Converting filesystems from buffer-head to iomap guide
+======================================================
+
+These are generic guidelines on converting a filesystem over to **iomap**
+from **buffer-heads**.
+
+One op at a time
+----------------
+
+You may try to convert a filesystem one clustered set of operations at a
+time. Below is a generic order you may strive to target:
+
+ * direct io
+ * miscellaneous helpers (seek/fiemap/bmap)
+ * buffered io
+
+Defining a simple filesystem
+----------------------------
+
+A simple filesystem is perhaps the easiest to convert over to **iomap**. A
+simple filesystem is one which:
+
+ * does not use fsverity, fscrypt, or compression
+ * has no Copy on Write support (reflinks)
+
+Converting a simple filesystem to iomap
+---------------------------------------
+
+Simple filesystems should convert to **iomap** piecemeal, first converting
+over **direct IO**, then the miscellaneous helpers (seek/fiemap/bmap), and
+last buffered IO.
+
+Converting shared filesystem features
+-------------------------------------
+
+Shared filesystem features such as fscrypt, compression, erasure coding, and
+any other data transformations need to be ported to **iomap** first, as none
+of the current **iomap** users require any of this functionality.
+
+Converting complex filesystems
+------------------------------
+
+If your filesystem relies on any of the shared filesystem features mentioned
+above, those need to be converted piecemeal as well. If reflinks are
+supported, you first need to ensure proper locking sanity so that byte ranges
+can be handled properly through **iomap** operations. An example filesystem
+where this work is taking place is btrfs.
+
+IOMAP_F_BUFFER_HEAD considerations
+----------------------------------
+
+``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystems fully
+converted away from **buffer-heads**, and this may never happen.
+
+``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone for porting
+filesystems over to **iomap**, as its support for **buffer-heads** only
+applies to the buffered write path and nothing else, including the
+``read_folio``/``readahead`` and ``writepages`` aops.
+
+Testing Direct IO
+=================
+
+Other than fstests you can use LTP's dio tests, though these are limited as
+they do not test for stale data::
+
+  ./runltp -f dio -d /mnt1/scratch/tmp/
+
+Known issues and future improvements
+====================================
+
+We try to document known issues that folks should be aware of with **iomap**
+here.
+
+ * write amplification with **iomap** when bs < ps: **iomap** needs
+   improvements for large folios for dirty bitmap tracking
+ * filesystems which use buffer head helpers such as ``sb_bread()`` and friends
+   will have to continue to use buffer heads as there is no generic iomap
+   metadata read/write library yet.
+
+References
+==========
+
+ * `Presentation on iomap evolution
+   <https://docs.google.com/presentation/d/e/2PACX-1vSN4TmhiTu1c6HNv6_gJZFqbFZpbF7GkABllSwJw5iLnSYKkkO-etQJ3AySYEbgJA/pub?start=true&loop=false&delayms=3000&slide=id.g189cfd05063_0_185>`_
+ * `LWN review on deprecating buffer-heads <https://lwn.net/Articles/930173/>`_
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e2b836c2e119..ee4b026995ac 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -10,6 +10,30 @@ 
 #include <linux/mm_types.h>
 #include <linux/blkdev.h>
 
+/**
+ * DOC: Introduction
+ *
+ * iomap allows filesystems to sequentially iterate over byte addressable
+ * block ranges on an inode and apply operations to them.
+ *
+ * iomap grew out of the need to provide a modern block mapping abstraction
+ * for filesystems with the different IO access methods they support and for
+ * assisting the VFS with manipulating files in the page cache. iomap helpers
+ * are provided for each of these mechanisms. However, block mapping is just
+ * one of the features of iomap, given iomap also supports DAX IO for
+ * filesystems as well as the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE``
+ * interfaces.
+ *
+ * Block mapping provides a mapping between data cached in memory and the
+ * location on persistent storage where that data lives. `LWN has a great
+ * review of the old buffer-heads block mapping and why it is inefficient
+ * <https://lwn.net/Articles/930173/>`_. Since **buffer-heads**, in use since
+ * the inception of Linux, work on a 512-byte block based paradigm, they
+ * create overhead for modern storage media which no longer necessarily works
+ * only on 512-byte blocks. iomap is flexible, providing block ranges in
+ * *bytes*. iomap, with the support of folios, provides a modern replacement
+ * for **buffer-heads**.
+ */
+
 struct address_space;
 struct fiemap_extent_info;
 struct inode;
@@ -22,37 +46,43 @@  struct page;
 struct vm_area_struct;
 struct vm_fault;
 
-/*
- * Types of block ranges for iomap mappings:
+/**
+ * DOC: iomap block ranges types
+ *
+ * * IOMAP_HOLE		- no blocks allocated, need allocation
+ * * IOMAP_DELALLOC	- delayed allocation blocks
+ * * IOMAP_MAPPED	- blocks allocated at @addr
+ * * IOMAP_UNWRITTEN	- blocks allocated at @addr in unwritten state
+ * * IOMAP_INLINE	- data inline in the inode
  */
-#define IOMAP_HOLE	0	/* no blocks allocated, need allocation */
-#define IOMAP_DELALLOC	1	/* delayed allocation blocks */
-#define IOMAP_MAPPED	2	/* blocks allocated at @addr */
-#define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
-#define IOMAP_INLINE	4	/* data inline in the inode */
+#define IOMAP_HOLE	0
+#define IOMAP_DELALLOC	1
+#define IOMAP_MAPPED	2
+#define IOMAP_UNWRITTEN	3
+#define IOMAP_INLINE	4
 
-/*
- * Flags reported by the file system from iomap_begin:
+/**
+ * DOC: Flags reported by the file system from iomap_begin
  *
- * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
- * zeroing for areas that no data is copied to.
+ * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and
+ *	need zeroing for areas to which no data is copied.
  *
- * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
- * written data and requires fdatasync to commit them to persistent storage.
- * This needs to take into account metadata changes that *may* be made at IO
- * completion, such as file size updates from direct IO.
+ * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access
+ *	written data and requires fdatasync to commit them to persistent storage.
+ *	This needs to take into account metadata changes that *may* be made at IO
+ *	completion, such as file size updates from direct IO.
  *
- * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
- * unshared as part a write.
+ * * IOMAP_F_SHARED: indicates that the blocks are shared, and will need to be
+ *	unshared as part of a write.
  *
- * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
- * mappings.
+ * * IOMAP_F_MERGED: indicates that the iomap contains the merge of multiple
+ *	block mappings.
  *
- * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
- * buffer heads for this mapping.
+ * * IOMAP_F_BUFFER_HEAD: indicates that the file system requires the use of
+ *	buffer heads for this mapping.
  *
- * IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent
- * rather than a file data extent.
+ * * IOMAP_F_XATTR: indicates that the iomap is for an extended attribute extent
+ *	rather than a file data extent.
  */
 #define IOMAP_F_NEW		(1U << 0)
 #define IOMAP_F_DIRTY		(1U << 1)
@@ -61,22 +91,20 @@  struct vm_fault;
 #define IOMAP_F_BUFFER_HEAD	(1U << 4)
 #define IOMAP_F_XATTR		(1U << 5)
 
-/*
- * Flags set by the core iomap code during operations:
+/**
+ * DOC: Flags set by the core iomap code during operations
+ *
+ * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size
+ *	has changed as the result of this write operation.
  *
- * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
- * has changed as the result of this write operation.
+ * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file
+ *	range it covers needs to be remapped by the high level before the
+ *	operation can proceed.
  *
- * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
- * range it covers needs to be remapped by the high level before the operation
- * can proceed.
+ * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage
  */
 #define IOMAP_F_SIZE_CHANGED	(1U << 8)
 #define IOMAP_F_STALE		(1U << 9)
-
-/*
- * Flags from 0x1000 up are for file system specific usage:
- */
 #define IOMAP_F_PRIVATE		(1U << 12)
 
 
@@ -124,73 +152,119 @@  static inline bool iomap_inline_data_valid(const struct iomap *iomap)
 	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
 }
 
-/*
- * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio
- * and put_folio will be called for each folio written to.  This only applies
- * to buffered writes as unbuffered writes will not typically have folios
- * associated with them.
- *
- * When get_folio succeeds, put_folio will always be called to do any
- * cleanup work necessary.  put_folio is responsible for unlocking and putting
- * @folio.
+/**
+ * struct iomap_folio_ops - buffered write folio reference count helpers
+ *
+ * A filesystem can optionally set folio_ops in a &struct iomap mapping it
+ * returns to override the default get_folio and put_folio for each folio
+ * written to.  This only applies to buffered writes as unbuffered writes will
+ * not typically have folios associated with them.
+ *
+ * @get_folio: iomap defaults to iomap_get_folio() (which calls
+ *	__filemap_get_folio()) if the filesystem did not provide a get folio op.
+ *
+ * @put_folio: when get_folio succeeds, put_folio will always be called to do
+ *	any cleanup work necessary. put_folio is responsible for unlocking and
+ *	putting @folio.
+ *
+ * @iomap_valid: check that the cached iomap still maps correctly to the
+ *	filesystem's internal extent map. FS internal extent maps can change
+ *	while iomap is iterating a cached iomap, so this hook allows iomap to
+ *	detect that the iomap needs to be refreshed during a long running
+ *	write operation.
+ *
+ *	The filesystem can store internal state (e.g. a sequence number) in
+ *	iomap->validity_cookie when the iomap is first mapped to be able to
+ *	detect changes between mapping time and whenever .iomap_valid() is
+ *	called.
+ *
+ *	This is called with the folio over the specified file position held
+ *	locked by the iomap code.  This is useful for filesystems that have
+ *	dynamic mappings (e.g. anything other than zonefs).  An example of why
+ *	this is necessary is that writeback doesn't take the VFS locks.
  */
 struct iomap_folio_ops {
 	struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
 			unsigned len);
 	void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
 			struct folio *folio);
-
-	/*
-	 * Check that the cached iomap still maps correctly to the filesystem's
-	 * internal extent map. FS internal extent maps can change while iomap
-	 * is iterating a cached iomap, so this hook allows iomap to detect that
-	 * the iomap needs to be refreshed during a long running write
-	 * operation.
-	 *
-	 * The filesystem can store internal state (e.g. a sequence number) in
-	 * iomap->validity_cookie when the iomap is first mapped to be able to
-	 * detect changes between mapping time and whenever .iomap_valid() is
-	 * called.
-	 *
-	 * This is called with the folio over the specified file position held
-	 * locked by the iomap code.
-	 */
 	bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
 };
 
-/*
- * Flags for iomap_begin / iomap_end.  No flag implies a read.
+/**
+ * DOC: Flags for iomap_begin / iomap_end.  No flag implies a read.
+ *
+ * * IOMAP_WRITE: writing, must allocate blocks
+ * * IOMAP_ZERO: zeroing operation, may skip holes
+ * * IOMAP_REPORT: report extent status, e.g. FIEMAP
+ * * IOMAP_FAULT: mapping for page fault
+ * * IOMAP_DIRECT: direct I/O
+ * * IOMAP_NOWAIT: do not block
+ * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed
+ * * IOMAP_UNSHARE: unshare_file_range
+ * * IOMAP_DAX: DAX mapping
  */
-#define IOMAP_WRITE		(1 << 0) /* writing, must allocate blocks */
-#define IOMAP_ZERO		(1 << 1) /* zeroing operation, may skip holes */
-#define IOMAP_REPORT		(1 << 2) /* report extent status, e.g. FIEMAP */
-#define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
-#define IOMAP_DIRECT		(1 << 4) /* direct I/O */
-#define IOMAP_NOWAIT		(1 << 5) /* do not block */
-#define IOMAP_OVERWRITE_ONLY	(1 << 6) /* only pure overwrites allowed */
-#define IOMAP_UNSHARE		(1 << 7) /* unshare_file_range */
+#define IOMAP_WRITE		(1 << 0)
+#define IOMAP_ZERO		(1 << 1)
+#define IOMAP_REPORT		(1 << 2)
+#define IOMAP_FAULT		(1 << 3)
+#define IOMAP_DIRECT		(1 << 4)
+#define IOMAP_NOWAIT		(1 << 5)
+#define IOMAP_OVERWRITE_ONLY	(1 << 6)
+#define IOMAP_UNSHARE		(1 << 7)
 #ifdef CONFIG_FS_DAX
-#define IOMAP_DAX		(1 << 8) /* DAX mapping */
+#define IOMAP_DAX		(1 << 8)
 #else
 #define IOMAP_DAX		0
 #endif /* CONFIG_FS_DAX */
 
+/**
+ * struct iomap_ops - IO interface specific operations
+ *
+ * A filesystem must provide a &struct iomap_ops to deal with beginning an IO
+ * operation, iomap_begin(), and ending an IO operation on a block range,
+ * iomap_end(). You would call iomap with a specialized iomap operation
+ * depending on the filesystem's or the VFS's needs.
+ *
+ * For example iomap_dio_rw() would be used by a filesystem when doing a
+ * block range read or write operation with direct IO. In this case your
+ * filesystem's respective &struct file_operations.write_iter() would
+ * eventually call iomap_dio_rw().
+ *
+ * For buffered IO a filesystem would use iomap_file_buffered_write() on the
+ * same &struct file_operations.write_iter(). But that is not the only
+ * situation in which a filesystem would deal with buffered writes; you could
+ * also use buffered writes when a filesystem has to deal with &struct
+ * file_operations.fallocate(). However, fallocate() can be used for
+ * *zeroing* or for *truncation* purposes. The respective iomap_zero_range()
+ * would be used for *zeroing* and iomap_truncate_page() would be used for
+ * *truncation*.
+ *
+ * Experience with adopting iomap on filesystems has shown that the filesystem
+ * implementation of these operations can be simplified considerably if one
+ * &struct iomap_ops is provided per major filesystem IO operation:
+ *
+ * * buffered io
+ * * direct io
+ * * DAX io
+ * * fiemap with extended attributes (``FIEMAP_FLAG_XATTR``)
+ * * lseek
+ *
+ * @iomap_begin: return the existing mapping at pos, or reserve space starting
+ *	at pos for up to length, as long as we can do it as a single mapping. The
+ *	actual length is returned in iomap->length. The &struct iomap iomap must
+ *	always be set. The &struct iomap srcmap should be set if the range is
+ *	CoW.
+ *
+ * @iomap_end: commit and/or unreserve space previously allocated using
+ *	iomap_begin. Written indicates the length of the successful write
+ *	operation which needs to be committed, while the rest needs to be
+ *	unreserved. Written might be zero if no data was written.
+ */
 struct iomap_ops {
-	/*
-	 * Return the existing mapping at pos, or reserve space starting at
-	 * pos for up to length, as long as we can do it as a single mapping.
-	 * The actual length is returned in iomap->length.
-	 */
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
 			unsigned flags, struct iomap *iomap,
 			struct iomap *srcmap);
 
-	/*
-	 * Commit and/or unreserve space previous allocated using iomap_begin.
-	 * Written indicates the length of the successful write operation which
-	 * needs to be commited, while the rest needs to be unreserved.
-	 * Written might be zero if no data was written.
-	 */
 	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
@@ -207,6 +281,7 @@  struct iomap_ops {
  * @flags: Zero or more of the iomap_begin flags above.
  * @iomap: Map describing the I/O iteration
  * @srcmap: Source map for COW operations
+ * @private: internal use
  */
 struct iomap_iter {
 	struct inode *inode;
@@ -241,7 +316,7 @@  static inline u64 iomap_length(const struct iomap_iter *iter)
  * @i: iteration structure
  *
  * Write operations on file systems with reflink support might require a
- * source and a destination map.  This function retourns the source map
+ * source and a destination map.  This function returns the source map
  * for a given operation, which may or may no be identical to the destination
  * map in &i->iomap.
  */
@@ -281,42 +356,52 @@  loff_t iomap_seek_data(struct inode *inode, loff_t offset,
 sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
 		const struct iomap_ops *ops);
 
-/*
- * Structure for writeback I/O completions.
+/**
+ * struct iomap_ioend - for writeback I/O completions
+ *
+ * @io_list: next ioend in chain
+ * @io_type: IOMAP_* mapping type of the extent
+ * @io_flags: IOMAP_F_*
+ * @io_folios: folios added to ioend
+ * @io_inode: file being written to
+ * @io_size: size of the extent
+ * @io_offset: offset in the file
+ * @io_sector: start sector of ioend
+ * @io_bio: bio being built
+ * @io_inline_bio: embedded bio; MUST BE LAST!
  */
 struct iomap_ioend {
-	struct list_head	io_list;	/* next ioend in chain */
+	struct list_head	io_list;
 	u16			io_type;
-	u16			io_flags;	/* IOMAP_F_* */
-	u32			io_folios;	/* folios added to ioend */
-	struct inode		*io_inode;	/* file being written to */
-	size_t			io_size;	/* size of the extent */
-	loff_t			io_offset;	/* offset in the file */
-	sector_t		io_sector;	/* start sector of ioend */
-	struct bio		*io_bio;	/* bio being built */
-	struct bio		io_inline_bio;	/* MUST BE LAST! */
+	u16			io_flags;
+	u32			io_folios;
+	struct inode		*io_inode;
+	size_t			io_size;
+	loff_t			io_offset;
+	sector_t		io_sector;
+	struct bio		*io_bio;
+	struct bio		io_inline_bio;
 };
 
+/**
+ * struct iomap_writeback_ops - used for writeback
+ *
+ * This structure is used to support dealing with a filesystem
+ * ``struct address_space_operations.writepages()``, for writeback.
+ *
+ * @map_blocks: required, maps the blocks so that writeback can be performed on
+ *	the range starting at offset.
+ * @prepare_ioend: optional, allows the file systems to perform actions just
+ *	before submitting the bio and/or override the bio end_io handler for
+ *	complex operations like copy on write extent manipulation or unwritten
+ *	extent conversions.
+ * @discard_folio: optional, allows the file system to discard state on a
+ *	page where we failed to submit any I/O.
+ */
 struct iomap_writeback_ops {
-	/*
-	 * Required, maps the blocks so that writeback can be performed on
-	 * the range starting at offset.
-	 */
 	int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
 				loff_t offset);
-
-	/*
-	 * Optional, allows the file systems to perform actions just before
-	 * submitting the bio and/or override the bio end_io handler for complex
-	 * operations like copy on write extent manipulation or unwritten extent
-	 * conversions.
-	 */
 	int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
-
-	/*
-	 * Optional, allows the file system to discard state on a page where
-	 * we failed to submit any I/O.
-	 */
 	void (*discard_folio)(struct folio *folio, loff_t pos);
 };
 
@@ -334,26 +419,33 @@  int iomap_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
 		const struct iomap_writeback_ops *ops);
 
-/*
- * Flags for direct I/O ->end_io:
+/**
+ * DOC: Flags for direct I/O ->end_io
+ *
+ * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s)
+ * * IOMAP_DIO_COW: covers COW extent(s)
  */
-#define IOMAP_DIO_UNWRITTEN	(1 << 0)	/* covers unwritten extent(s) */
-#define IOMAP_DIO_COW		(1 << 1)	/* covers COW extent(s) */
+#define IOMAP_DIO_UNWRITTEN	(1 << 0)
+#define IOMAP_DIO_COW		(1 << 1)
 
+/**
+ * struct iomap_dio_ops - direct IO operations
+ *
+ * @end_io: optional, called on completion of the direct IO
+ * @submit_io: optional, called to submit the bio for the direct IO
+ * @bio_set: Filesystems wishing to attach private information to a direct io
+ *	bio must provide a ->submit_io method that attaches the additional
+ *	information to the bio and changes the ->bi_end_io callback to a custom
+ *	function.  This function should, at a minimum, perform any relevant
+ *	post-processing of the bio and end with a call to iomap_dio_bio_end_io.
+ */
 struct iomap_dio_ops {
 	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
 		      unsigned flags);
 	void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
 		          loff_t file_offset);
-
-	/*
-	 * Filesystems wishing to attach private information to a direct io bio
-	 * must provide a ->submit_io method that attaches the additional
-	 * information to the bio and changes the ->bi_end_io callback to a
-	 * custom function.  This function should, at a minimum, perform any
-	 * relevant post-processing of the bio and end with a call to
-	 * iomap_dio_bio_end_io.
-	 */
 	struct bio_set *bio_set;
 };