diff mbox series

[man-pages,RFC,v6] statx, inode: document the new STATX_VERSION field

Message ID 20220928134200.28741-1-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show
Series [man-pages,RFC,v6] statx, inode: document the new STATX_VERSION field | expand

Commit Message

Jeffrey Layton Sept. 28, 2022, 1:42 p.m. UTC
I'm proposing to expose the inode change attribute via statx [1]. Document
what this value means and what an observer can infer from it changing.

NB: this will probably have conflicts with the STATX_DIOALIGN doc
patches, but we should be able to resolve those before merging anything.

Signed-off-by: Jeff Layton <jlayton@kernel.org>

[1]: https://lore.kernel.org/linux-nfs/20220826214703.134870-1-jlayton@kernel.org/T/#t
---
 man2/statx.2 | 13 +++++++++++++
 man7/inode.7 | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

v6: incorporate Neil's suggestions
    clarify how well-behaved filesystems should order things

Comments

Alejandro Colomar Oct. 8, 2022, 2:03 a.m. UTC | #1
Hi Jeff,

On 9/28/22 15:42, Jeff Layton wrote:
> I'm proposing to expose the inode change attribute via statx [1]. Document
> what this value means and what an observer can infer from it changing.
> 
> NB: this will probably have conflicts with the STATX_DIOALIGN doc
> patches, but we should be able to resolve those before merging anything.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> [1]: https://lore.kernel.org/linux-nfs/20220826214703.134870-1-jlayton@kernel.org/T/#t

Thanks! Please see some formatting comments below.

Cheers,

Alex

> ---
>   man2/statx.2 | 13 +++++++++++++
>   man7/inode.7 | 36 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 49 insertions(+)
> 
> v6: incorporate Neil's suggestions
>      clarify how well-behaved filesystems should order things
> 
> diff --git a/man2/statx.2 b/man2/statx.2
> index 0d1b4591f74c..ee7005334a2f 100644
> --- a/man2/statx.2
> +++ b/man2/statx.2
> @@ -62,6 +62,7 @@ struct statx {
>       __u32 stx_dev_major;   /* Major ID */
>       __u32 stx_dev_minor;   /* Minor ID */
>       __u64 stx_mnt_id;      /* Mount ID */
> +    __u64 stx_version;     /* Inode change attribute */
>   };
>   .EE
>   .in
> @@ -247,6 +248,7 @@ STATX_BTIME	Want stx_btime
>   STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
>   	It is deprecated and should not be used.
>   STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
> +STATX_VERSION	Want stx_version (DRAFT)
>   .TE
>   .in
>   .PP
> @@ -407,10 +409,16 @@ This is the same number reported by
>   .BR name_to_handle_at (2)
>   and corresponds to the number in the first field in one of the records in
>   .IR /proc/self/mountinfo .
> +.TP
> +.I stx_version
> +The inode version, also known as the inode change attribute. See

Please use semantic newlines.

See man-pages(7):
    Use semantic newlines
        In the source of a manual page, new sentences  should  be
        started on new lines, long sentences should be split into
        lines  at  clause breaks (commas, semicolons, colons, and
        so on), and long clauses should be split at phrase bound‐
        aries.  This convention,  sometimes  known  as  "semantic
        newlines",  makes it easier to see the effect of patches,
        which often operate at the level of individual sentences,
        clauses, or phrases.


> +.BR inode (7)
> +for details.
>   .PP
>   For further information on the above fields, see
>   .BR inode (7).
>   .\"
> +.TP

Why?  .TP is used to start tagged paragraphs.  But .SS is used to start 
subsections.

>   .SS File attributes
>   The
>   .I stx_attributes
> @@ -489,6 +497,11 @@ without an explicit
>   See
>   .BR mmap (2)
>   for more information.
> +.TP
> +.BR STATX_ATTR_VERSION_MONOTONIC " (since Linux 6.?)"
> +The stx_version value monotonically increases over time and will never appear
> +to go backward, even in the event of a crash. This can allow an application to
> +make a better determination about ordering when viewing different versions.
>   .SH RETURN VALUE
>   On success, zero is returned.
>   On error, \-1 is returned, and
> diff --git a/man7/inode.7 b/man7/inode.7
> index 9b255a890720..e8adb63b1f6a 100644
> --- a/man7/inode.7
> +++ b/man7/inode.7
> @@ -184,6 +184,12 @@ Last status change timestamp (ctime)
>   This is the file's last status change timestamp.
>   It is changed by writing or by setting inode information
>   (i.e., owner, group, link count, mode, etc.).
> +.TP
> +Inode version (version)
> +(not returned in the \fIstat\fP structure); \fIstatx.stx_version\fP

Please use .I and .B macros instead of inline formatting.  The above 
line could be rewritten as:

(not returned in the
.I stat
structure);
.I statx.stx_version


> +.IP
> +This is the inode change counter. See the discussion of
> +\fBthe inode version counter\fP, below.
>   .PP
>   The timestamp fields report time measured with a zero point at the
>   .IR Epoch ,
> @@ -424,6 +430,36 @@ on a directory means that a file
>   in that directory can be renamed or deleted only by the owner
>   of the file, by the owner of the directory, and by a privileged
>   process.
> +.SS The inode version counter
> +.PP

.PP should not be used after .SS.  We use it to separate paragraphs 
between themselves, but [sub]section titles are put next to the first 
paragraph.

> +The \fIstatx.stx_version\fP field is the inode change counter. Any operation
> +that could result in a change to \fIstatx.stx_ctime\fP must result in an
> +increase to this value. Soon after a change has been made, an stx_version value
> +should appear to be larger than previous readings. This is the case even
> +when a ctime change is not evident due to coarse timestamp granularity.
> +.PP
> +An observer cannot infer anything from amount of increase about the
> +nature or magnitude of the change. In fact, a single increment can reflect
> +multiple discrete changes if the value was not checked while those changes
> +were being processed.
> +.PP
> +Changes to stx_version are not necessarily atomic with the change itself, but
> +well-behaved filesystems should increment stx_version after a change has been
> +made visible to observers rather than before. This is especially important for
> +read-caching algorithms which could be fooled into associating a newer
> +stx_version with an older version of data. Note that this does leave a window
> +of time where a change may be visible, but the old stx_version is still being
> +reported.
> +.PP
> +In the event of a system crash, this value can appear to go backward if it was
> +queried before ever being written to the backing store. Applications that
> +persist stx_version values across a reboot should take care to mitigate this.
> +If the filesystem reports \fISTATX_ATTR_VERSION_MONOTONIC\fP in
> +\fIstatx.stx_attributes\fP, then it is not subject to this problem.
> +.PP
> +The stx_version is a Linux extension and is not supported by all filesystems.
> +The application must verify that the \fISTATX_VERSION\fP bit is set in the
> +returned \fIstatx.stx_mask\fP before relying on this field.
>   .SH STANDARDS
>   If you need to obtain the definition of the
>   .I blkcnt_t
diff mbox series

Patch

diff --git a/man2/statx.2 b/man2/statx.2
index 0d1b4591f74c..ee7005334a2f 100644
--- a/man2/statx.2
+++ b/man2/statx.2
@@ -62,6 +62,7 @@  struct statx {
     __u32 stx_dev_major;   /* Major ID */
     __u32 stx_dev_minor;   /* Minor ID */
     __u64 stx_mnt_id;      /* Mount ID */
+    __u64 stx_version;     /* Inode change attribute */
 };
 .EE
 .in
@@ -247,6 +248,7 @@  STATX_BTIME	Want stx_btime
 STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
 	It is deprecated and should not be used.
 STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
+STATX_VERSION	Want stx_version (DRAFT)
 .TE
 .in
 .PP
@@ -407,10 +409,16 @@  This is the same number reported by
 .BR name_to_handle_at (2)
 and corresponds to the number in the first field in one of the records in
 .IR /proc/self/mountinfo .
+.TP
+.I stx_version
+The inode version, also known as the inode change attribute. See
+.BR inode (7)
+for details.
 .PP
 For further information on the above fields, see
 .BR inode (7).
 .\"
+.TP
 .SS File attributes
 The
 .I stx_attributes
@@ -489,6 +497,11 @@  without an explicit
 See
 .BR mmap (2)
 for more information.
+.TP
+.BR STATX_ATTR_VERSION_MONOTONIC " (since Linux 6.?)"
+The stx_version value monotonically increases over time and will never appear
+to go backward, even in the event of a crash. This can allow an application to
+make a better determination about ordering when viewing different versions.
 .SH RETURN VALUE
 On success, zero is returned.
 On error, \-1 is returned, and
diff --git a/man7/inode.7 b/man7/inode.7
index 9b255a890720..e8adb63b1f6a 100644
--- a/man7/inode.7
+++ b/man7/inode.7
@@ -184,6 +184,12 @@  Last status change timestamp (ctime)
 This is the file's last status change timestamp.
 It is changed by writing or by setting inode information
 (i.e., owner, group, link count, mode, etc.).
+.TP
+Inode version (version)
+(not returned in the \fIstat\fP structure); \fIstatx.stx_version\fP
+.IP
+This is the inode change counter. See the discussion of
+\fBthe inode version counter\fP, below.
 .PP
 The timestamp fields report time measured with a zero point at the
 .IR Epoch ,
@@ -424,6 +430,36 @@  on a directory means that a file
 in that directory can be renamed or deleted only by the owner
 of the file, by the owner of the directory, and by a privileged
 process.
+.SS The inode version counter
+.PP
+The \fIstatx.stx_version\fP field is the inode change counter. Any operation
+that could result in a change to \fIstatx.stx_ctime\fP must result in an
+increase to this value. Soon after a change has been made, an stx_version value
+should appear to be larger than previous readings. This is the case even
+when a ctime change is not evident due to coarse timestamp granularity.
+.PP
+An observer cannot infer anything from amount of increase about the
+nature or magnitude of the change. In fact, a single increment can reflect
+multiple discrete changes if the value was not checked while those changes
+were being processed.
+.PP
+Changes to stx_version are not necessarily atomic with the change itself, but
+well-behaved filesystems should increment stx_version after a change has been
+made visible to observers rather than before. This is especially important for
+read-caching algorithms which could be fooled into associating a newer
+stx_version with an older version of data. Note that this does leave a window
+of time where a change may be visible, but the old stx_version is still being
+reported.
+.PP
+In the event of a system crash, this value can appear to go backward if it was
+queried before ever being written to the backing store. Applications that
+persist stx_version values across a reboot should take care to mitigate this.
+If the filesystem reports \fISTATX_ATTR_VERSION_MONOTONIC\fP in
+\fIstatx.stx_attributes\fP, then it is not subject to this problem.
+.PP
+The stx_version is a Linux extension and is not supported by all filesystems.
+The application must verify that the \fISTATX_VERSION\fP bit is set in the
+returned \fIstatx.stx_mask\fP before relying on this field.
 .SH STANDARDS
 If you need to obtain the definition of the
 .I blkcnt_t