xfs: report crtime and attribute flags to statx

Message ID	20170307000609.GG5280@birch.djwong.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> Date: Mon, 6 Mar 2017 16:06:09 -0800 From: "Darrick J. Wong" <darrick.wong@oracle.com> To: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: dhowells@redhat.com, linux-fsdevel <linux-fsdevel@vger.kernel.org>, xfs <linux-xfs@vger.kernel.org> Subject: [PATCH] xfs: report crtime and attribute flags to statx Message-ID: <20170307000609.GG5280@birch.djwong.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk

Darrick J. Wong March 7, 2017, 12:06 a.m. UTC

statx has the ability to report inode creation times and inode flags, so
hook up di_crtime and di_flags to that functionality.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iops.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Christoph Hellwig March 7, 2017, 5:01 a.m. UTC | #1

On Mon, Mar 06, 2017 at 04:06:09PM -0800, Darrick J. Wong wrote:
> statx has the ability to report inode creation times and inode flags, so
> hook up di_crtime and di_flags to that functionality.

This would be great to have in 4.11 together with the initial statx
implementation.  But until I see documentation and testcases for statx
I don't really feel comfortable reviewing anything related to it.

David Howells March 7, 2017, 5:22 p.m. UTC | #2

Christoph Hellwig <hch@infradead.org> wrote:

> This would be great to have in 4.11 together with the initial statx
> implementation.  But until I see documentation and testcases for statx
> I don't really feel comfortable reviewing anything related to it.

Well, since you asked for documentation, here's a manual page for you to
review:-)

Note that as it isn't in glibc yet, I've left out all the
set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent
to particular constants.

I don't suppose you know where the documentation on writing xfstests tests is?
xfstests-dev/doc/ only contains an old and out of date changelog.

David
---
'\" t
.\" Copyright (c) 1992 Drew Eckhardt (drew@cs.colorado.edu), March 28, 1992
.\" Parts Copyright (c) 1995 Nicolai Langfeldt (janl@ifi.uio.no), 1/1/95
.\" and Copyright (c) 2006, 2007, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2017 David Howells <dhowells@redhat.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date.  The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.  The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH STATX 2 2017-03-07 "Linux" "Linux Programmer's Manual"
.SH NAME
statx \- Get file status (extended)
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.br
.B #include <sys/stat.h>
.br
.B #include <unistd.h>
.br
.BR "#include <fcntl.h>           " "/* Definition of AT_* constants */"
.sp
.BI "int statx(int " dirfd ", const char *" pathname ", int " flags ","
.BI "          unsigned int " mask ", struct statx *" buf );
.fi
.sp
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.ad l
.PD 0
.sp
.RS 4
<unknown as yet>
.RE
.PD
.ad
.SH DESCRIPTION
.PP
This function returns information about a file, storing it in the buffer
pointed to by
.IR buf .
The buffer is filled in according to the following type:
.PP
.in +4n
.nf
struct statx {
    __u32 stx_mask;          -- Mask of bits indicating filled fields
    __u32 stx_blksize;       -- Block size for filesystem I/O
    __u64 stx_attributes;    -- Extra file attribute indicators
    __u32 stx_nlink;         -- Number of hard links
    __u32 stx_uid;           -- User ID of owner
    __u32 stx_gid;           -- Group ID of owner
    __u16 stx_mode;          -- File type and mode
    __u64 stx_ino;           -- Inode number
    __u64 stx_size;          -- Total size in bytes
    __u64 stx_blocks;        -- Number of 512B blocks allocated
    struct statx_timestamp stx_atime;  -- Time of last access
    struct statx_timestamp stx_btime;  -- Time of creation
    struct statx_timestamp stx_ctime;  -- Time of last status change
    struct statx_timestamp stx_mtime;  -- Time of last modification
    __u32 stx_rdev_major;    } Device number if device file
    __u32 stx_rdev_minor;    }
    __u32 stx_dev_major;      } Device number of containing file
    __u32 stx_dev_minor;      }
};
.fi
.in
.PP
Where the timestamps are defined as:
.PP
.in +4n
.nf
struct statx_timestamp {
    __s64 tv_sec;    -- Number of seconds before or since 1970
    __s32 tv_nsec;   -- Number of nanoseconds before or since tv_sec
};
.fi
.in
.PP
(Note that reserved space and padding is ommitted)
.SS
Invoking \fBstatx\fR():
.PP
To access a file's status, no permissions are required on the file itself, but
in the case of
.BR statx ()
with a path, execute (search) permission is required on all of the directories
in
.I pathname
that lead to the file.
.PP
.BR statx ()
uses
.IR pathname ", " dirfd " and " flags
to locate the target file in one of a variety of ways:
.TP
[*] By absolute path.
.I pathname
points to an absolute path and
.I dirfd
is ignored.  The file is looked up by name, starting from the root of the
filesystem as seen by the calling process.
.TP
[*] By cwd-relative path.
.I pathname
points to a relative path and
.IR dirfd " is " AT_FDCWD .
The file is looked up by name, starting from the current working directory.
.TP
[*] By dir-relative path.
.I pathname
points to relative path and
.I dirfd
indicates a file descriptor pointing to a directory.  The file is looked up by
name, starting from the directory specified by
.IR dirfd .
.TP
[*] By file descriptor.
.IR pathname " is " NULL " and " dirfd
indicates a file descriptor.  The file attached to the file descriptor is
queried directly.  The file descriptor may point to any type of file, not just
a directory.
.PP
.I flags
can be used to influence a path-based lookup.  A value for
.I flags
is constructed by OR'ing together zero or more of the following constants:
.TP
.BR AT_EMPTY_PATH " (since Linux 2.6.39)"
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
If
.I pathname
is an empty string, operate on the file referred to by
.IR dirfd
(which may have been obtained using the
.BR open (2)
.B O_PATH
flag).
If
.I dirfd
is
.BR AT_FDCWD ,
the call operates on the current working directory.
In this case,
.I dirfd
can refer to any type of file, not just a directory.
This flag is Linux-specific; define
.B _GNU_SOURCE
.\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
to obtain its definition.
.TP
.BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)"
Don't automount the terminal ("basename") component of
.I pathname
if it is a directory that is an automount point.
This allows the caller to gather attributes of an automount point
(rather than the location it would mount).
This flag can be used in tools that scan directories
to prevent mass-automounting of a directory of automount points.
The
.B AT_NO_AUTOMOUNT
flag has no effect if the mount point has already been mounted over.
This flag is Linux-specific; define
.B _GNU_SOURCE
.\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
to obtain its definition.
.TP
.B AT_SYMLINK_NOFOLLOW
If
.I pathname
is a symbolic link, do not dereference it:
instead return information about the link itself, like
.BR lstat ().
.PP
.I flags
can also be used to control what sort of synchronisation the kernel will do
when querying a file on a remote filesystem.  This is done by OR'ing in one of
the following values:
.TP
AT_STATX_SYNC_AS_STAT
Do whatever
.BR stat ()
does.  This is the default and is very much filesystem specific.
.TP
AT_STATX_FORCE_SYNC
Force the attributes to be synchronised with the server.  This may require that
a network filesystem perform a data writeback to get the timestamps correct.
.TP
AT_STATX_DONT_SYNC
Don't synchronise anything, but rather just take whatever the system has cached
if possible.  This may mean that the information returned is approximate, but,
on a network filesystem, it may not involve a round trip to the server - even
if no lease is held.
.PP
The
.I mask
argument to
.BR statx ()
is used to tell the kernel which fields the caller is interested in
.I mask
is an OR'ed combination of the following constants:
.PP
.in +4n
.TS
lB l.
STATX_TYPE	Want stx_mode & S_IFMT
STATX_MODE	Want stx_mode & ~S_IFMT
STATX_NLINK	Want stx_nlink
STATX_UID	Want stx_uid
STATX_GID	Want stx_gid
STATX_ATIME	Want stx_atime{,_ns}
STATX_MTIME	Want stx_mtime{,_ns}
STATX_CTIME	Want stx_ctime{,_ns}
STATX_INO	Want stx_ino
STATX_SIZE	Want stx_size
STATX_BLOCKS	Want stx_blocks
STATX_BASIC_STATS	[The stuff in the normal stat struct]
STATX_BTIME	Want stx_btime{,_ns}
STATX_ALL	[All currently available stuff]
.TE
.in
.PP
.B "Do not"
simply set
.I mask
to UINT_MAX as one or more bits may, in future, be used to specify an extension
to the buffer.
.SS
The returned information
.PP
The status information for the target file is returned in the
.I statx
structure pointed to by
.IR buf .
Included in this is
.I stx_mask
which indicates what other information has been returned.
.I stx_mask
has the same format as the mask argument and bits are set in it to indicate
which fields have been filled in.
.PP
It should be noted that the kernel may return fields that weren't requested and
may fail to return fields that were requested, depending on what the backing
filesystem supports.  In either case,
.I stx_mask
will not be equal
.IR mask .
.PP
If a filesystem does not support a field or if it has an unrepresentable value
(for instance, a file with an exotic type), then the mask bit corresponding to
that field will be cleared in
.I stx_mask
even if the user asked for it and a dummy value will be filled in for
compatibility purposes if one is available (e.g. a dummy uid and gid may be
specified to mount under some circumstances).
.PP
A filesystem may also fill in fields that the caller didn't ask for if it has
values for them available at no extra cost.  If this happens, the corresponding
bits will be set in
.IR stx_mask .
.PP

.\" Background: inode attributes are modified with i_mutex held, but
.\" read by stat() without taking the mutex.
.I Note:
For performance and simplicity reasons, different fields in the
.I statx
structure may contain state information from different moments
during the execution of the system call.
For example, if
.IR stx_mode
or
.IR stx_uid
is changed by another process by calling
.BR chmod (2)
or
.BR chown (2),
.BR stat ()
might return the old
.I stx_mode
together with the new
.IR stx_uid ,
or the old
.I stx_uid
together with the new
.IR stx_mode .
.PP
Apart from stx_mask (which is described above), the fields in the
.I statx
structure are:
.TP
.I stx_mode
The file type and mode.  This is described in more detail below.
.TP
.I stx_size
The size of the file (if it is a regular file or a symbolic link) in bytes.
The size of a symbolic link is the length of the pathname it contains, without
a terminating null byte.
.TP
.I stx_blocks
The number of blocks allocated to the file on the medium, in 512-byte units.
(This may be smaller than
.IR stx_size /512
when the file has holes.)
.TP
.I stx_blksize
The "preferred" blocksize for efficient filesystem I/O.  (Writing to a file in
smaller chunks may cause an inefficient read-modify-rewrite.)
.TP
.I stx_nlink
The number of hard links on a file.
.TP
.I stx_uid
The user ID of the file's owner.
.TP
.I stx_gid
The ID of the group that may access the file.
.TP
.IR stx_dev_major " and "  stx_dev_minor
The device on which this file (inode) resides.
.TP
.IR stx_rdev_major " and "  stx_rdev_minor
The device that this file (inode) represents if the file is of block or
character device type.
.TP
.I stx_attributes
Further status information about the file.  This consists of zero or more of
the following constants OR'ed together:
.in +4n
.TS
lB l.
STATX_ATTR_COMPRESSED	File is compressed by the fs
STATX_ATTR_IMMUTABLE	File is marked immutable
STATX_ATTR_APPEND	File is append-only
STATX_ATTR_NODUMP	File is not to be dumped
STATX_ATTR_ENCRYPTED	File requires key to decrypt in fs
.TE
.in
.TP
.I stx_atime
The file's last access timestamp.
This field is changed by file accesses, for example, by
.BR execve (2),
.BR mknod (2),
.BR pipe (2),
.BR utime (2),
and
.BR read (2)
(of more than zero bytes).
Other routines, such as
.BR mmap (2),
may or may not update it.
.TP
.I stx_btime
The file's creation timestamp.  This is set on file creation and not changed
subsequently.
.TP
.I stx_ctime
The file's last status change timestamp.  This field is changed by writing or
by setting inode information (i.e., owner, group, link count, mode, etc.).
.TP
.I stx_mtime
The file's last modification timestamp.  This is changed by file modifications,
for example, by
.BR mknod (2),
.BR truncate (2),
.BR utime (2),
and
.BR write (2)
(of more than zero bytes).  Moreover, the modification time of a directory is
changed by the creation or deletion of files in that directory.  This field is
.I not
changed for changes in owner, group, hard link count, or mode.

.PP
Not all of the Linux filesystems implement all of the timestamp fields.  Some
filesystems allow mounting in such a way that file and/or directory accesses do
not cause an update of the
.I stx_atime
field.
(See
.IR noatime ,
.IR nodiratime ,
and
.I relatime
in
.BR mount (8),
and related information in
.BR mount (2).)
In addition,
.I stx_atime
is not updated if a file is opened with the
.BR O_NOATIME ;
see
.BR open (2).

.SS File type and mode
.PP
The
.I stx_mode
field contains the combined file type and mode.  POSIX refers to the bits in
this field corresponding to the mask
.B S_IFMT
(see below) as the
.IR "file type" ,
the 12 bits corresponding to the mask 07777 as the
.IR "file mode bits"
and the least significant 9 bits (0777) as the
.IR "file permission bits" .
.IP
The following mask values are defined for the file type of the
.I stx_mode
field:
.in +4n
.TS
lB l l.
S_IFMT	0170000	bit mask for the file type bit field

S_IFSOCK	0140000	socket
S_IFLNK	0120000	symbolic link
S_IFREG	0100000	regular file
S_IFBLK	0060000	block device
S_IFDIR	0040000	directory
S_IFCHR	0020000	character device
S_IFIFO	0010000	FIFO
.TE
.in
.IP
Note that
.I stx_mode
has two mask flags covering it: one for the type and one for the mode bits.
.PP
To test for a regular file (for example), one could write:
.nf
.in +4n
statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb);
if ((sb.stx_mode & S_IFMT) == S_IFREG) {
    /* Handle regular file */
}
.in
.fi
.PP
Because tests of the above form are common, additional macros are defined by
POSIX to allow the test of the file type in
.I stx_mode
to be written more concisely:
.RS 4
.TS
lB l.
\fBS_ISREG\fR(m)	Is it a regular file?
\fBS_ISDIR\fR(m)	Is it a directory?
\fBS_ISCHR\fR(m)	Is it a character device?
\fBS_ISBLK\fR(m)	Is it a block device?
\fBS_ISFIFO\fR(m)	Is it a FIFO (named pipe)?
\fBS_ISLNK\fR(m)	Is it a symbolic link?  (Not in POSIX.1-1996.)
\fBS_ISSOCK\fR(m)	Is it a socket?  (Not in POSIX.1-1996.)
.TE
.RE
.PP
The preceding code snippet could thus be rewritten as:

.nf
.in +4n
statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb);
if (S_ISREG(sb.stx_mode)) {
    /* Handle regular file */
}
.in
.fi
.PP
The definitions of most of the above file type test macros
are provided if any of the following feature test macros is defined:
.BR _BSD_SOURCE
(in glibc 2.19 and earlier),
.BR _SVID_SOURCE
(in glibc 2.19 and earlier),
or
.BR _DEFAULT_SOURCE
(in glibc 2.20 and later).
In addition, definitions of all of the above macros except
.BR S_IFSOCK
and
.BR S_ISSOCK ()
are provided if
.BR _XOPEN_SOURCE
is defined.
The definition of
.BR S_IFSOCK
can also be exposed by defining
.BR _XOPEN_SOURCE
with a value of 500 or greater.

The definition of
.BR S_ISSOCK ()
is exposed if any of the following feature test macros is defined:
.BR _BSD_SOURCE
(in glibc 2.19 and earlier),
.BR _DEFAULT_SOURCE
(in glibc 2.20 and later),
.BR _XOPEN_SOURCE
with a value of 500 or greater, or
.BR _POSIX_C_SOURCE
with a value of 200112L or greater.
.PP
The following mask values are defined for
the file mode component of the
.I stx_mode
field:
.in +4n
.TS
lB l l.
S_ISUID	  04000	set-user-ID bit
S_ISGID	  02000	set-group-ID bit (see below)
S_ISVTX	  01000	sticky bit (see below)

S_IRWXU	  00700	owner has read, write, and execute permission
S_IRUSR	  00400	owner has read permission
S_IWUSR	  00200	owner has write permission
S_IXUSR	  00100	owner has execute permission

S_IRWXG	  00070	group has read, write, and execute permission
S_IRGRP	  00040	group has read permission
S_IWGRP	  00020	group has write permission
S_IXGRP	  00010	group has execute permission

S_IRWXO	  00007	T{
others (not in group) have read, write, and execute permission
T}
S_IROTH	  00004	others have read permission
S_IWOTH	  00002	others have write permission
S_IXOTH	  00001	others have execute permission
.TE
.in
.P
The set-group-ID bit
.RB ( S_ISGID )
has several special uses.
For a directory, it indicates that BSD semantics is to be used
for that directory: files created there inherit their group ID from
the directory, not from the effective group ID of the creating process,
and directories created there will also get the
.B S_ISGID
bit set.
For a file that does not have the group execution bit
.RB ( S_IXGRP )
set,
the set-group-ID bit indicates mandatory file/record locking.
.P
The sticky bit
.RB ( S_ISVTX )
on a directory means that a file
in that directory can be renamed or deleted only by the owner
of the file, by the owner of the directory, and by a privileged
process.

.SH RETURN VALUE
On success, zero is returned.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EINVAL
Invalid flag specified in
.IR flags .
.TP
.B EACCES
Search permission is denied for one of the directories
in the path prefix of
.IR pathname .
(See also
.BR path_resolution (7).)
.TP
.B EBADF
.I dirfd
is not a valid open file descriptor.
.TP
.B EFAULT
Bad address.
.TP
.B ELOOP
Too many symbolic links encountered while traversing the path.
.TP
.B ENAMETOOLONG
.I pathname
is too long.
.TP
.B ENOENT
A component of
.I pathname
does not exist, or
.I pathname
is an empty string.
.TP
.B ENOMEM
Out of memory (i.e., kernel memory).
.TP
.B ENOTDIR
A component of the path prefix of
.I pathname
is not a directory or
.I pathname
is relative and
.I dirfd
is a file descriptor referring to a file other than a directory.
.SH VERSIONS
.BR statx ()
was added to Linux in kernel 4.11;
library support is not yet added to glibc.
.SH SEE ALSO
.BR ls (1),
.BR stat (1),
.BR access (2),
.BR chmod (2),
.BR chown (2),
.BR readlink (2),
.BR utime (2),
.BR capabilities (7),
.BR symlink (7)

Darrick J. Wong March 7, 2017, 5:23 p.m. UTC | #3

On Mon, Mar 06, 2017 at 09:01:40PM -0800, Christoph Hellwig wrote:
> On Mon, Mar 06, 2017 at 04:06:09PM -0800, Darrick J. Wong wrote:
> > statx has the ability to report inode creation times and inode flags, so
> > hook up di_crtime and di_flags to that functionality.
> 
> This would be great to have in 4.11 together with the initial statx
> implementation.  But until I see documentation and testcases for statx
> I don't really feel comfortable reviewing anything related to it.

Same feeling here -- aside from cargo-culting from the proposed ext4
patch I don't have much to go on other than floating a test balloon to
see if Mr. Howells responds.

(FWIW the samples/test-statx.c program reported the flags & btime.)

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong March 7, 2017, 6:02 p.m. UTC | #4

On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > This would be great to have in 4.11 together with the initial statx
> > implementation.  But until I see documentation and testcases for statx
> > I don't really feel comfortable reviewing anything related to it.
> 
> Well, since you asked for documentation, here's a manual page for you to
> review:-)

HA, ok.  This came in while I was scribbling a reply to Christoph's email.
I'll have a look.

> Note that as it isn't in glibc yet, I've left out all the
> set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent
> to particular constants.
> 
> I don't suppose you know where the documentation on writing xfstests tests is?
> xfstests-dev/doc/ only contains an old and out of date changelog.

/me usually just copies one of the newer tests and changes it. <cough>

> David
> ---
> '\" t
> .\" Copyright (c) 1992 Drew Eckhardt (drew@cs.colorado.edu), March 28, 1992
> .\" Parts Copyright (c) 1995 Nicolai Langfeldt (janl@ifi.uio.no), 1/1/95
> .\" and Copyright (c) 2006, 2007, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
> .\" and Copyright (c) 2017 David Howells <dhowells@redhat.com>
> .\"
> .\" %%%LICENSE_START(VERBATIM)
> .\" Permission is granted to make and distribute verbatim copies of this
> .\" manual provided the copyright notice and this permission notice are
> .\" preserved on all copies.
> .\"
> .\" Permission is granted to copy and distribute modified versions of this
> .\" manual under the conditions for verbatim copying, provided that the
> .\" entire resulting derived work is distributed under the terms of a
> .\" permission notice identical to this one.
> .\"
> .\" Since the Linux kernel and libraries are constantly changing, this
> .\" manual page may be incorrect or out-of-date.  The author(s) assume no
> .\" responsibility for errors or omissions, or for damages resulting from
> .\" the use of the information contained herein.  The author(s) may not
> .\" have taken the same level of care in the production of this manual,
> .\" which is licensed free of charge, as they might when working
> .\" professionally.
> .\"
> .\" Formatted or processed versions of this manual, if unaccompanied by
> .\" the source, must acknowledge the copyright and authors of this work.
> .\" %%%LICENSE_END
> .\"
> .TH STATX 2 2017-03-07 "Linux" "Linux Programmer's Manual"
> .SH NAME
> statx \- Get file status (extended)
> .SH SYNOPSIS
> .nf
> .B #include <sys/types.h>
> .br
> .B #include <sys/stat.h>
> .br
> .B #include <unistd.h>
> .br
> .BR "#include <fcntl.h>           " "/* Definition of AT_* constants */"
> .sp
> .BI "int statx(int " dirfd ", const char *" pathname ", int " flags ","
> .BI "          unsigned int " mask ", struct statx *" buf );
> .fi
> .sp
> .in -4n
> Feature Test Macro Requirements for glibc (see
> .BR feature_test_macros (7)):
> .in
> .ad l
> .PD 0
> .sp
> .RS 4
> <unknown as yet>
> .RE
> .PD
> .ad
> .SH DESCRIPTION
> .PP
> This function returns information about a file, storing it in the buffer
> pointed to by
> .IR buf .
> The buffer is filled in according to the following type:
> .PP
> .in +4n
> .nf
> struct statx {
>     __u32 stx_mask;          -- Mask of bits indicating filled fields
>     __u32 stx_blksize;       -- Block size for filesystem I/O
>     __u64 stx_attributes;    -- Extra file attribute indicators
>     __u32 stx_nlink;         -- Number of hard links
>     __u32 stx_uid;           -- User ID of owner
>     __u32 stx_gid;           -- Group ID of owner
>     __u16 stx_mode;          -- File type and mode
>     __u64 stx_ino;           -- Inode number
>     __u64 stx_size;          -- Total size in bytes
>     __u64 stx_blocks;        -- Number of 512B blocks allocated
>     struct statx_timestamp stx_atime;  -- Time of last access
>     struct statx_timestamp stx_btime;  -- Time of creation
>     struct statx_timestamp stx_ctime;  -- Time of last status change
>     struct statx_timestamp stx_mtime;  -- Time of last modification
>     __u32 stx_rdev_major;    } Device number if device file

"of device file" ?

>     __u32 stx_rdev_minor;    }
>     __u32 stx_dev_major;      } Device number of containing file
>     __u32 stx_dev_minor;      }

"Device number of device containing file"?

Or perhaps just "ID of device containing file" from the stat(2)
manpage?

> };
> .fi
> .in
> .PP
> Where the timestamps are defined as:
> .PP
> .in +4n
> .nf
> struct statx_timestamp {
>     __s64 tv_sec;    -- Number of seconds before or since 1970
>     __s32 tv_nsec;   -- Number of nanoseconds before or since tv_sec
> };
> .fi
> .in
> .PP
> (Note that reserved space and padding is ommitted)
> .SS
> Invoking \fBstatx\fR():
> .PP
> To access a file's status, no permissions are required on the file itself, but
> in the case of
> .BR statx ()
> with a path, execute (search) permission is required on all of the directories
> in
> .I pathname
> that lead to the file.
> .PP
> .BR statx ()
> uses
> .IR pathname ", " dirfd " and " flags
> to locate the target file in one of a variety of ways:
> .TP
> [*] By absolute path.
> .I pathname
> points to an absolute path and
> .I dirfd
> is ignored.  The file is looked up by name, starting from the root of the
> filesystem as seen by the calling process.
> .TP
> [*] By cwd-relative path.
> .I pathname
> points to a relative path and
> .IR dirfd " is " AT_FDCWD .
> The file is looked up by name, starting from the current working directory.
> .TP
> [*] By dir-relative path.
> .I pathname
> points to relative path and
> .I dirfd
> indicates a file descriptor pointing to a directory.  The file is looked up by
> name, starting from the directory specified by
> .IR dirfd .
> .TP
> [*] By file descriptor.
> .IR pathname " is " NULL " and " dirfd
> indicates a file descriptor.  The file attached to the file descriptor is
> queried directly.  The file descriptor may point to any type of file, not just
> a directory.
> .PP
> .I flags
> can be used to influence a path-based lookup.  A value for
> .I flags
> is constructed by OR'ing together zero or more of the following constants:
> .TP
> .BR AT_EMPTY_PATH " (since Linux 2.6.39)"
> .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
> If
> .I pathname
> is an empty string, operate on the file referred to by
> .IR dirfd
> (which may have been obtained using the
> .BR open (2)
> .B O_PATH
> flag).
> If
> .I dirfd
> is
> .BR AT_FDCWD ,
> the call operates on the current working directory.
> In this case,
> .I dirfd
> can refer to any type of file, not just a directory.

Is (flags & AT_EMPTY_PATH) is the same as (pathname == NULL)?

> This flag is Linux-specific; define
> .B _GNU_SOURCE
> .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
> to obtain its definition.
> .TP
> .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)"
> Don't automount the terminal ("basename") component of
> .I pathname
> if it is a directory that is an automount point.
> This allows the caller to gather attributes of an automount point
> (rather than the location it would mount).
> This flag can be used in tools that scan directories
> to prevent mass-automounting of a directory of automount points.
> The
> .B AT_NO_AUTOMOUNT
> flag has no effect if the mount point has already been mounted over.
> This flag is Linux-specific; define
> .B _GNU_SOURCE
> .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
> to obtain its definition.
> .TP
> .B AT_SYMLINK_NOFOLLOW
> If
> .I pathname
> is a symbolic link, do not dereference it:
> instead return information about the link itself, like
> .BR lstat ().
> .PP
> .I flags
> can also be used to control what sort of synchronisation the kernel will do
> when querying a file on a remote filesystem.  This is done by OR'ing in one of
> the following values:
> .TP
> AT_STATX_SYNC_AS_STAT
> Do whatever
> .BR stat ()
> does.  This is the default and is very much filesystem specific.
> .TP
> AT_STATX_FORCE_SYNC
> Force the attributes to be synchronised with the server.  This may require that
> a network filesystem perform a data writeback to get the timestamps correct.
> .TP
> AT_STATX_DONT_SYNC
> Don't synchronise anything, but rather just take whatever the system has cached
> if possible.  This may mean that the information returned is approximate, but,
> on a network filesystem, it may not involve a round trip to the server - even
> if no lease is held.
> .PP
> The
> .I mask
> argument to
> .BR statx ()
> is used to tell the kernel which fields the caller is interested in
> .I mask
> is an OR'ed combination of the following constants:
> .PP
> .in +4n
> .TS
> lB l.
> STATX_TYPE	Want stx_mode & S_IFMT
> STATX_MODE	Want stx_mode & ~S_IFMT
> STATX_NLINK	Want stx_nlink
> STATX_UID	Want stx_uid
> STATX_GID	Want stx_gid
> STATX_ATIME	Want stx_atime{,_ns}
> STATX_MTIME	Want stx_mtime{,_ns}
> STATX_CTIME	Want stx_ctime{,_ns}
> STATX_INO	Want stx_ino
> STATX_SIZE	Want stx_size
> STATX_BLOCKS	Want stx_blocks
> STATX_BASIC_STATS	[The stuff in the normal stat struct]
> STATX_BTIME	Want stx_btime{,_ns}
> STATX_ALL	[All currently available stuff]
> .TE
> .in
> .PP
> .B "Do not"
> simply set
> .I mask
> to UINT_MAX as one or more bits may, in future, be used to specify an extension
> to the buffer.
> .SS
> The returned information
> .PP
> The status information for the target file is returned in the
> .I statx
> structure pointed to by
> .IR buf .
> Included in this is
> .I stx_mask
> which indicates what other information has been returned.
> .I stx_mask
> has the same format as the mask argument and bits are set in it to indicate
> which fields have been filled in.
> .PP
> It should be noted that the kernel may return fields that weren't requested and
> may fail to return fields that were requested, depending on what the backing
> filesystem supports.  In either case,
> .I stx_mask
> will not be equal
> .IR mask .
> .PP
> If a filesystem does not support a field or if it has an unrepresentable value
> (for instance, a file with an exotic type), then the mask bit corresponding to
> that field will be cleared in
> .I stx_mask
> even if the user asked for it and a dummy value will be filled in for
> compatibility purposes if one is available (e.g. a dummy uid and gid may be
> specified to mount under some circumstances).
> .PP
> A filesystem may also fill in fields that the caller didn't ask for if it has
> values for them available at no extra cost.  If this happens, the corresponding
> bits will be set in
> .IR stx_mask .
> .PP
> 
> .\" Background: inode attributes are modified with i_mutex held, but
> .\" read by stat() without taking the mutex.
> .I Note:
> For performance and simplicity reasons, different fields in the
> .I statx
> structure may contain state information from different moments
> during the execution of the system call.

Hm.  Judging from the ext4 patch you proposed, I gather this is
expected, at least in the btime case.

--D

> For example, if
> .IR stx_mode
> or
> .IR stx_uid
> is changed by another process by calling
> .BR chmod (2)
> or
> .BR chown (2),
> .BR stat ()
> might return the old
> .I stx_mode
> together with the new
> .IR stx_uid ,
> or the old
> .I stx_uid
> together with the new
> .IR stx_mode .
> .PP
> Apart from stx_mask (which is described above), the fields in the
> .I statx
> structure are:
> .TP
> .I stx_mode
> The file type and mode.  This is described in more detail below.
> .TP
> .I stx_size
> The size of the file (if it is a regular file or a symbolic link) in bytes.
> The size of a symbolic link is the length of the pathname it contains, without
> a terminating null byte.
> .TP
> .I stx_blocks
> The number of blocks allocated to the file on the medium, in 512-byte units.
> (This may be smaller than
> .IR stx_size /512
> when the file has holes.)
> .TP
> .I stx_blksize
> The "preferred" blocksize for efficient filesystem I/O.  (Writing to a file in
> smaller chunks may cause an inefficient read-modify-rewrite.)
> .TP
> .I stx_nlink
> The number of hard links on a file.
> .TP
> .I stx_uid
> The user ID of the file's owner.
> .TP
> .I stx_gid
> The ID of the group that may access the file.
> .TP
> .IR stx_dev_major " and "  stx_dev_minor
> The device on which this file (inode) resides.
> .TP
> .IR stx_rdev_major " and "  stx_rdev_minor
> The device that this file (inode) represents if the file is of block or
> character device type.
> .TP
> .I stx_attributes
> Further status information about the file.  This consists of zero or more of
> the following constants OR'ed together:
> .in +4n
> .TS
> lB l.
> STATX_ATTR_COMPRESSED	File is compressed by the fs
> STATX_ATTR_IMMUTABLE	File is marked immutable
> STATX_ATTR_APPEND	File is append-only
> STATX_ATTR_NODUMP	File is not to be dumped
> STATX_ATTR_ENCRYPTED	File requires key to decrypt in fs
> .TE
> .in
> .TP
> .I stx_atime
> The file's last access timestamp.
> This field is changed by file accesses, for example, by
> .BR execve (2),
> .BR mknod (2),
> .BR pipe (2),
> .BR utime (2),
> and
> .BR read (2)
> (of more than zero bytes).
> Other routines, such as
> .BR mmap (2),
> may or may not update it.
> .TP
> .I stx_btime
> The file's creation timestamp.  This is set on file creation and not changed
> subsequently.
> .TP
> .I stx_ctime
> The file's last status change timestamp.  This field is changed by writing or
> by setting inode information (i.e., owner, group, link count, mode, etc.).
> .TP
> .I stx_mtime
> The file's last modification timestamp.  This is changed by file modifications,
> for example, by
> .BR mknod (2),
> .BR truncate (2),
> .BR utime (2),
> and
> .BR write (2)
> (of more than zero bytes).  Moreover, the modification time of a directory is
> changed by the creation or deletion of files in that directory.  This field is
> .I not
> changed for changes in owner, group, hard link count, or mode.
> 
> 
> 
> .PP
> Not all of the Linux filesystems implement all of the timestamp fields.  Some
> filesystems allow mounting in such a way that file and/or directory accesses do
> not cause an update of the
> .I stx_atime
> field.
> (See
> .IR noatime ,
> .IR nodiratime ,
> and
> .I relatime
> in
> .BR mount (8),
> and related information in
> .BR mount (2).)
> In addition,
> .I stx_atime
> is not updated if a file is opened with the
> .BR O_NOATIME ;
> see
> .BR open (2).
> 
> .SS File type and mode
> .PP
> The
> .I stx_mode
> field contains the combined file type and mode.  POSIX refers to the bits in
> this field corresponding to the mask
> .B S_IFMT
> (see below) as the
> .IR "file type" ,
> the 12 bits corresponding to the mask 07777 as the
> .IR "file mode bits"
> and the least significant 9 bits (0777) as the
> .IR "file permission bits" .
> .IP
> The following mask values are defined for the file type of the
> .I stx_mode
> field:
> .in +4n
> .TS
> lB l l.
> S_IFMT	0170000	bit mask for the file type bit field
> 
> S_IFSOCK	0140000	socket
> S_IFLNK	0120000	symbolic link
> S_IFREG	0100000	regular file
> S_IFBLK	0060000	block device
> S_IFDIR	0040000	directory
> S_IFCHR	0020000	character device
> S_IFIFO	0010000	FIFO
> .TE
> .in
> .IP
> Note that
> .I stx_mode
> has two mask flags covering it: one for the type and one for the mode bits.
> .PP
> To test for a regular file (for example), one could write:
> .nf
> .in +4n
> statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb);
> if ((sb.stx_mode & S_IFMT) == S_IFREG) {
>     /* Handle regular file */
> }
> .in
> .fi
> .PP
> Because tests of the above form are common, additional macros are defined by
> POSIX to allow the test of the file type in
> .I stx_mode
> to be written more concisely:
> .RS 4
> .TS
> lB l.
> \fBS_ISREG\fR(m)	Is it a regular file?
> \fBS_ISDIR\fR(m)	Is it a directory?
> \fBS_ISCHR\fR(m)	Is it a character device?
> \fBS_ISBLK\fR(m)	Is it a block device?
> \fBS_ISFIFO\fR(m)	Is it a FIFO (named pipe)?
> \fBS_ISLNK\fR(m)	Is it a symbolic link?  (Not in POSIX.1-1996.)
> \fBS_ISSOCK\fR(m)	Is it a socket?  (Not in POSIX.1-1996.)
> .TE
> .RE
> .PP
> The preceding code snippet could thus be rewritten as:
> 
> .nf
> .in +4n
> statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb);
> if (S_ISREG(sb.stx_mode)) {
>     /* Handle regular file */
> }
> .in
> .fi
> .PP
> The definitions of most of the above file type test macros
> are provided if any of the following feature test macros is defined:
> .BR _BSD_SOURCE
> (in glibc 2.19 and earlier),
> .BR _SVID_SOURCE
> (in glibc 2.19 and earlier),
> or
> .BR _DEFAULT_SOURCE
> (in glibc 2.20 and later).
> In addition, definitions of all of the above macros except
> .BR S_IFSOCK
> and
> .BR S_ISSOCK ()
> are provided if
> .BR _XOPEN_SOURCE
> is defined.
> The definition of
> .BR S_IFSOCK
> can also be exposed by defining
> .BR _XOPEN_SOURCE
> with a value of 500 or greater.
> 
> The definition of
> .BR S_ISSOCK ()
> is exposed if any of the following feature test macros is defined:
> .BR _BSD_SOURCE
> (in glibc 2.19 and earlier),
> .BR _DEFAULT_SOURCE
> (in glibc 2.20 and later),
> .BR _XOPEN_SOURCE
> with a value of 500 or greater, or
> .BR _POSIX_C_SOURCE
> with a value of 200112L or greater.
> .PP
> The following mask values are defined for
> the file mode component of the
> .I stx_mode
> field:
> .in +4n
> .TS
> lB l l.
> S_ISUID	  04000	set-user-ID bit
> S_ISGID	  02000	set-group-ID bit (see below)
> S_ISVTX	  01000	sticky bit (see below)
> 
> S_IRWXU	  00700	owner has read, write, and execute permission
> S_IRUSR	  00400	owner has read permission
> S_IWUSR	  00200	owner has write permission
> S_IXUSR	  00100	owner has execute permission
> 
> S_IRWXG	  00070	group has read, write, and execute permission
> S_IRGRP	  00040	group has read permission
> S_IWGRP	  00020	group has write permission
> S_IXGRP	  00010	group has execute permission
> 
> S_IRWXO	  00007	T{
> others (not in group) have read, write, and execute permission
> T}
> S_IROTH	  00004	others have read permission
> S_IWOTH	  00002	others have write permission
> S_IXOTH	  00001	others have execute permission
> .TE
> .in
> .P
> The set-group-ID bit
> .RB ( S_ISGID )
> has several special uses.
> For a directory, it indicates that BSD semantics is to be used
> for that directory: files created there inherit their group ID from
> the directory, not from the effective group ID of the creating process,
> and directories created there will also get the
> .B S_ISGID
> bit set.
> For a file that does not have the group execution bit
> .RB ( S_IXGRP )
> set,
> the set-group-ID bit indicates mandatory file/record locking.
> .P
> The sticky bit
> .RB ( S_ISVTX )
> on a directory means that a file
> in that directory can be renamed or deleted only by the owner
> of the file, by the owner of the directory, and by a privileged
> process.
> 
> 
> .SH RETURN VALUE
> On success, zero is returned.
> On error, \-1 is returned, and
> .I errno
> is set appropriately.
> .SH ERRORS
> .TP
> .B EINVAL
> Invalid flag specified in
> .IR flags .
> .TP
> .B EACCES
> Search permission is denied for one of the directories
> in the path prefix of
> .IR pathname .
> (See also
> .BR path_resolution (7).)
> .TP
> .B EBADF
> .I dirfd
> is not a valid open file descriptor.
> .TP
> .B EFAULT
> Bad address.
> .TP
> .B ELOOP
> Too many symbolic links encountered while traversing the path.
> .TP
> .B ENAMETOOLONG
> .I pathname
> is too long.
> .TP
> .B ENOENT
> A component of
> .I pathname
> does not exist, or
> .I pathname
> is an empty string.
> .TP
> .B ENOMEM
> Out of memory (i.e., kernel memory).
> .TP
> .B ENOTDIR
> A component of the path prefix of
> .I pathname
> is not a directory or
> .I pathname
> is relative and
> .I dirfd
> is a file descriptor referring to a file other than a directory.
> .SH VERSIONS
> .BR statx ()
> was added to Linux in kernel 4.11;
> library support is not yet added to glibc.
> .SH SEE ALSO
> .BR ls (1),
> .BR stat (1),
> .BR access (2),
> .BR chmod (2),
> .BR chown (2),
> .BR readlink (2),
> .BR utime (2),
> .BR capabilities (7),
> .BR symlink (7)

David Howells March 7, 2017, 6:39 p.m. UTC | #5

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> Is (flags & AT_EMPTY_PATH) is the same as (pathname == NULL)?

I'm not sure.  I was wondering about that whilst writing this.  The text
describing the option is copied from the stat(2) manpage.  It specified an
empty path (presumably "") rather than NULL.  I'm also not sure whether it has
other ramifications.

> > .I Note:
> > For performance and simplicity reasons, different fields in the
> > .I statx
> > structure may contain state information from different moments
> > during the execution of the system call.
> 
> Hm.  Judging from the ext4 patch you proposed, I gather this is
> expected, at least in the btime case.

This is copied from the stat(2) manpage.  The comment in the source there says
it's because there's no locking in the getattr path against the setattr path.

David

Darrick J. Wong March 7, 2017, 6:44 p.m. UTC | #6

On Tue, Mar 07, 2017 at 06:39:42PM +0000, David Howells wrote:
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > Is (flags & AT_EMPTY_PATH) is the same as (pathname == NULL)?
> 
> I'm not sure.  I was wondering about that whilst writing this.  The text
> describing the option is copied from the stat(2) manpage.  It specified an
> empty path (presumably "") rather than NULL.  I'm also not sure whether it has
> other ramifications.

<nod>

> > > .I Note:
> > > For performance and simplicity reasons, different fields in the
> > > .I statx
> > > structure may contain state information from different moments
> > > during the execution of the system call.
> > 
> > Hm.  Judging from the ext4 patch you proposed, I gather this is
> > expected, at least in the btime case.

Ugh, I put that in the wrong part -- this was supposed to be a comment
about the part where the manpage states that filesystems can return more
than userspace asked for.  My eyes are bad at reading manpage source. :/

(Sorry for the noise.)

--D

> This is copied from the stat(2) manpage.  The comment in the source there says
> it's because there's no locking in the getattr path against the setattr path.
> 
> David

David Howells March 7, 2017, 6:55 p.m. UTC | #7

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> > > Hm.  Judging from the ext4 patch you proposed, I gather this is
> > > expected, at least in the btime case.
> 
> Ugh, I put that in the wrong part -- this was supposed to be a comment
> about the part where the manpage states that filesystems can return more
> than userspace asked for.

In that case, sort of.  You could just ask for STATX_TYPE | STATX_MODE, for
example, but the kernel is perfectly at liberty to add STATX_MTIME,
STATX_SIZE, STATX_INO, etc. even if it doesn't add STATX_BTIME as well.

> My eyes are bad at reading manpage source. :/

I know what you mean.

David

Andreas Dilger March 7, 2017, 9:44 p.m. UTC | #8

> On Mar 7, 2017, at 10:22 AM, David Howells <dhowells@redhat.com> wrote:
> 
> Christoph Hellwig <hch@infradead.org> wrote:
> 
>> This would be great to have in 4.11 together with the initial statx
>> implementation.  But until I see documentation and testcases for statx
>> I don't really feel comfortable reviewing anything related to it.
> 
> Well, since you asked for documentation, here's a manual page for you to
> review:-)
> 
> Note that as it isn't in glibc yet, I've left out all the
> set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent
> to particular constants.
> 
> I don't suppose you know where the documentation on writing xfstests tests is?
> xfstests-dev/doc/ only contains an old and out of date changelog.
> 
> David
> ---
> '\" t
> .\" Copyright (c) 1992 Drew Eckhardt (drew@cs.colorado.edu), March 28, 1992
> .\" Parts Copyright (c) 1995 Nicolai Langfeldt (janl@ifi.uio.no), 1/1/95
> .\" and Copyright (c) 2006, 2007, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
> .\" and Copyright (c) 2017 David Howells <dhowells@redhat.com>
> .\"
> .\" %%%LICENSE_START(VERBATIM)
> .\" Permission is granted to make and distribute verbatim copies of this
> .\" manual provided the copyright notice and this permission notice are
> .\" preserved on all copies.
> .\"
> .\" Permission is granted to copy and distribute modified versions of this
> .\" manual under the conditions for verbatim copying, provided that the
> .\" entire resulting derived work is distributed under the terms of a
> .\" permission notice identical to this one.
> .\"
> .\" Since the Linux kernel and libraries are constantly changing, this
> .\" manual page may be incorrect or out-of-date.  The author(s) assume no
> .\" responsibility for errors or omissions, or for damages resulting from
> .\" the use of the information contained herein.  The author(s) may not
> .\" have taken the same level of care in the production of this manual,
> .\" which is licensed free of charge, as they might when working
> .\" professionally.
> .\"
> .\" Formatted or processed versions of this manual, if unaccompanied by
> .\" the source, must acknowledge the copyright and authors of this work.
> .\" %%%LICENSE_END
> .\"
> .TH STATX 2 2017-03-07 "Linux" "Linux Programmer's Manual"
> .SH NAME
> statx \- Get file status (extended)
> .SH SYNOPSIS
> .nf
> .B #include <sys/types.h>
> .br
> .B #include <sys/stat.h>
> .br
> .B #include <unistd.h>
> .br
> .BR "#include <fcntl.h>           " "/* Definition of AT_* constants */"
> .sp
> .BI "int statx(int " dirfd ", const char *" pathname ", int " flags ","
> .BI "          unsigned int " mask ", struct statx *" buf );
> .fi
> .sp
> .in -4n
> Feature Test Macro Requirements for glibc (see
> .BR feature_test_macros (7)):
> .in
> .ad l
> .PD 0
> .sp
> .RS 4
> <unknown as yet>
> .RE
> .PD
> .ad
> .SH DESCRIPTION
> .PP
> This function returns information about a file, storing it in the buffer
> pointed to by
> .IR buf .
> The buffer is filled in according to the following type:
> .PP
> .in +4n
> .nf
> struct statx {
>    __u32 stx_mask;          -- Mask of bits indicating filled fields
>    __u32 stx_blksize;       -- Block size for filesystem I/O
>    __u64 stx_attributes;    -- Extra file attribute indicators
>    __u32 stx_nlink;         -- Number of hard links
>    __u32 stx_uid;           -- User ID of owner
>    __u32 stx_gid;           -- Group ID of owner
>    __u16 stx_mode;          -- File type and mode
>    __u64 stx_ino;           -- Inode number
>    __u64 stx_size;          -- Total size in bytes
>    __u64 stx_blocks;        -- Number of 512B blocks allocated
>    struct statx_timestamp stx_atime;  -- Time of last access
>    struct statx_timestamp stx_btime;  -- Time of creation
>    struct statx_timestamp stx_ctime;  -- Time of last status change
>    struct statx_timestamp stx_mtime;  -- Time of last modification
>    __u32 stx_rdev_major;    } Device number if device file
>    __u32 stx_rdev_minor;    }
>    __u32 stx_dev_major;      } Device number of containing file
>    __u32 stx_dev_minor;      }
> };
> .fi
> .in
> .PP
> Where the timestamps are defined as:
> .PP
> .in +4n
> .nf
> struct statx_timestamp {
>    __s64 tv_sec;    -- Number of seconds before or since 1970
>    __s32 tv_nsec;   -- Number of nanoseconds before or since tv_sec
> };
> .fi
> .in
> .PP
> (Note that reserved space and padding is ommitted)

Do you think that not including the padding could be problematic for users?

> .SS
> Invoking \fBstatx\fR():
> .PP
> To access a file's status, no permissions are required on the file itself, but
> in the case of
> .BR statx ()
> with a path, execute (search) permission is required on all of the directories
> in
> .I pathname
> that lead to the file.
> .PP
> .BR statx ()
> uses
> .IR pathname ", " dirfd " and " flags
> to locate the target file in one of a variety of ways:
> .TP
> [*] By absolute path.
> .I pathname
> points to an absolute path and
> .I dirfd
> is ignored.  The file is looked up by name, starting from the root of the
> filesystem as seen by the calling process.
> .TP
> [*] By cwd-relative path.
> .I pathname
> points to a relative path and
> .IR dirfd " is " AT_FDCWD .
> The file is looked up by name, starting from the current working directory.
> .TP
> [*] By dir-relative path.
> .I pathname
> points to relative path and
> .I dirfd
> indicates a file descriptor pointing to a directory.  The file is looked up by
> name, starting from the directory specified by
> .IR dirfd .
> .TP
> [*] By file descriptor.
> .IR pathname " is " NULL " and " dirfd
> indicates a file descriptor.  The file attached to the file descriptor is
> queried directly.  The file descriptor may point to any type of file, not just
> a directory.
> .PP
> .I flags
> can be used to influence a path-based lookup.  A value for
> .I flags
> is constructed by OR'ing together zero or more of the following constants:
> .TP
> .BR AT_EMPTY_PATH " (since Linux 2.6.39)"
> .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
> If
> .I pathname
> is an empty string, operate on the file referred to by
> .IR dirfd
> (which may have been obtained using the
> .BR open (2)
> .B O_PATH
> flag).
> If
> .I dirfd
> is
> .BR AT_FDCWD ,
> the call operates on the current working directory.
> In this case,
> .I dirfd
> can refer to any type of file, not just a directory.
> This flag is Linux-specific; define
> .B _GNU_SOURCE
> .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
> to obtain its definition.
> .TP
> .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)"
> Don't automount the terminal ("basename") component of
> .I pathname
> if it is a directory that is an automount point.
> This allows the caller to gather attributes of an automount point
> (rather than the location it would mount).
> This flag can be used in tools that scan directories
> to prevent mass-automounting of a directory of automount points.
> The
> .B AT_NO_AUTOMOUNT
> flag has no effect if the mount point has already been mounted over.
> This flag is Linux-specific; define
> .B _GNU_SOURCE
> .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
> to obtain its definition.
> .TP
> .B AT_SYMLINK_NOFOLLOW
> If
> .I pathname
> is a symbolic link, do not dereference it:
> instead return information about the link itself, like
> .BR lstat ().
> .PP
> .I flags
> can also be used to control what sort of synchronisation the kernel will do
> when querying a file on a remote filesystem.  This is done by OR'ing in one of
> the following values:
> .TP
> AT_STATX_SYNC_AS_STAT
> Do whatever
> .BR stat ()
> does.  This is the default and is very much filesystem specific.
> .TP
> AT_STATX_FORCE_SYNC
> Force the attributes to be synchronised with the server.  This may require that
> a network filesystem perform a data writeback to get the timestamps correct.
> .TP
> AT_STATX_DONT_SYNC
> Don't synchronise anything, but rather just take whatever the system has cached
> if possible.  This may mean that the information returned is approximate, but,
> on a network filesystem, it may not involve a round trip to the server - even
> if no lease is held.
> .PP
> The
> .I mask
> argument to
> .BR statx ()
> is used to tell the kernel which fields the caller is interested in
> .I mask
> is an OR'ed combination of the following constants:
> .PP
> .in +4n
> .TS
> lB l.
> STATX_TYPE	Want stx_mode & S_IFMT
> STATX_MODE	Want stx_mode & ~S_IFMT
> STATX_NLINK	Want stx_nlink
> STATX_UID	Want stx_uid
> STATX_GID	Want stx_gid
> STATX_ATIME	Want stx_atime{,_ns}
> STATX_MTIME	Want stx_mtime{,_ns}
> STATX_CTIME	Want stx_ctime{,_ns}
> STATX_INO	Want stx_ino
> STATX_SIZE	Want stx_size
> STATX_BLOCKS	Want stx_blocks
> STATX_BASIC_STATS	[The stuff in the normal stat struct]
> STATX_BTIME	Want stx_btime{,_ns}
> STATX_ALL	[All currently available stuff]
> .TE
> .in
> .PP
> .B "Do not"
> simply set
> .I mask
> to UINT_MAX as one or more bits may, in future, be used to specify an extension
> to the buffer.
> .SS
> The returned information
> .PP
> The status information for the target file is returned in the
> .I statx
> structure pointed to by
> .IR buf .
> Included in this is
> .I stx_mask
> which indicates what other information has been returned.
> .I stx_mask
> has the same format as the mask argument and bits are set in it to indicate
> which fields have been filled in.
> .PP
> It should be noted that the kernel may return fields that weren't requested and
> may fail to return fields that were requested, depending on what the backing
> filesystem supports.  In either case,
> .I stx_mask
> will not be equal
> .IR mask .
> .PP
> If a filesystem does not support a field or if it has an unrepresentable value
> (for instance, a file with an exotic type), then the mask bit corresponding to
> that field will be cleared in
> .I stx_mask
> even if the user asked for it and a dummy value will be filled in for
> compatibility purposes if one is available (e.g. a dummy uid and gid may be
> specified to mount under some circumstances).
> .PP
> A filesystem may also fill in fields that the caller didn't ask for if it has
> values for them available at no extra cost.  If this happens, the corresponding
> bits will be set in
> .IR stx_mask .
> .PP
> 
> .\" Background: inode attributes are modified with i_mutex held, but
> .\" read by stat() without taking the mutex.
> .I Note:
> For performance and simplicity reasons, different fields in the
> .I statx
> structure may contain state information from different moments
> during the execution of the system call.
> For example, if
> .IR stx_mode
> or
> .IR stx_uid
> is changed by another process by calling
> .BR chmod (2)
> or
> .BR chown (2),
> .BR stat ()
> might return the old
> .I stx_mode
> together with the new
> .IR stx_uid ,
> or the old
> .I stx_uid
> together with the new
> .IR stx_mode .
> .PP
> Apart from stx_mask (which is described above), the fields in the
> .I statx
> structure are:
> .TP
> .I stx_mode
> The file type and mode.  This is described in more detail below.
> .TP
> .I stx_size
> The size of the file (if it is a regular file or a symbolic link) in bytes.
> The size of a symbolic link is the length of the pathname it contains, without
> a terminating null byte.
> .TP
> .I stx_blocks
> The number of blocks allocated to the file on the medium, in 512-byte units.
> (This may be smaller than
> .IR stx_size /512
> when the file has holes.)
> .TP
> .I stx_blksize
> The "preferred" blocksize for efficient filesystem I/O.  (Writing to a file in
> smaller chunks may cause an inefficient read-modify-rewrite.)
> .TP
> .I stx_nlink
> The number of hard links on a file.
> .TP
> .I stx_uid
> The user ID of the file's owner.
> .TP
> .I stx_gid
> The ID of the group that may access the file.
> .TP
> .IR stx_dev_major " and "  stx_dev_minor
> The device on which this file (inode) resides.
> .TP
> .IR stx_rdev_major " and "  stx_rdev_minor
> The device that this file (inode) represents if the file is of block or
> character device type.
> .TP
> .I stx_attributes
> Further status information about the file.  This consists of zero or more of
> the following constants OR'ed together:
> .in +4n
> .TS
> lB l.
> STATX_ATTR_COMPRESSED	File is compressed by the fs
> STATX_ATTR_IMMUTABLE	File is marked immutable

This definition is circular...  Maybe "File cannot be changed once closed"?

> STATX_ATTR_APPEND	File is append-only

This is also circular.  Maybe "file can only be written after .B stx_size"?

> STATX_ATTR_NODUMP	File is not to be dumped

This one is also circular.  Maybe "not to be included in backups"?

> STATX_ATTR_ENCRYPTED	File requires key to decrypt in fs
> .TE
> .in
> .TP
> .I stx_atime
> The file's last access timestamp.
> This field is changed by file accesses, for example, by
> .BR execve (2),
> .BR mknod (2),
> .BR pipe (2),
> .BR utime (2),
> and
> .BR read (2)
> (of more than zero bytes).
> Other routines, such as
> .BR mmap (2),
> may or may not update it.
> .TP
> .I stx_btime
> The file's creation timestamp.  This is set on file creation and not changed
> subsequently.
> .TP
> .I stx_ctime
> The file's last status change timestamp.  This field is changed by writing or
> by setting inode information (i.e., owner, group, link count, mode, etc.).
> .TP
> .I stx_mtime
> The file's last modification timestamp.  This is changed by file modifications,
> for example, by
> .BR mknod (2),
> .BR truncate (2),
> .BR utime (2),
> and
> .BR write (2)
> (of more than zero bytes).  Moreover, the modification time of a directory is
> changed by the creation or deletion of files in that directory.  This field is
> .I not
> changed for changes in owner, group, hard link count, or mode.
> 
> 
> 
> .PP
> Not all of the Linux filesystems implement all of the timestamp fields.  Some
> filesystems allow mounting in such a way that file and/or directory accesses do
> not cause an update of the
> .I stx_atime
> field.
> (See
> .IR noatime ,
> .IR nodiratime ,
> and
> .I relatime
> in
> .BR mount (8),
> and related information in
> .BR mount (2).)
> In addition,
> .I stx_atime
> is not updated if a file is opened with the
> .BR O_NOATIME ;
> see
> .BR open (2).
> 
> .SS File type and mode
> .PP
> The
> .I stx_mode
> field contains the combined file type and mode.  POSIX refers to the bits in
> this field corresponding to the mask
> .B S_IFMT
> (see below) as the
> .IR "file type" ,
> the 12 bits corresponding to the mask 07777 as the
> .IR "file mode bits"
> and the least significant 9 bits (0777) as the
> .IR "file permission bits" .
> .IP
> The following mask values are defined for the file type of the
> .I stx_mode
> field:
> .in +4n
> .TS
> lB l l.
> S_IFMT	0170000	bit mask for the file type bit field
> 
> S_IFSOCK	0140000	socket
> S_IFLNK	0120000	symbolic link
> S_IFREG	0100000	regular file
> S_IFBLK	0060000	block device
> S_IFDIR	0040000	directory
> S_IFCHR	0020000	character device
> S_IFIFO	0010000	FIFO
> .TE
> .in
> .IP
> Note that
> .I stx_mode
> has two mask flags covering it: one for the type and one for the mode bits.
> .PP
> To test for a regular file (for example), one could write:
> .nf
> .in +4n
> statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb);
> if ((sb.stx_mode & S_IFMT) == S_IFREG) {
>    /* Handle regular file */
> }

Two notes on these two examples:
- users unfamiliar with statx() may benefit from seeing STATX_TYPE used
  here instead of STATX_BASIC_STATS
- the check should also look for the presence of STATX_TYPE in the
  returned stx_mask to ensure it is valid before use?


   To test whether a path is a regular file (for example), one could write:
   .nf
   .in +4n
   rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx);
   if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) {
      /* Handle regular file */
   }



> Because tests of the above form are common, additional macros are defined by
> POSIX to allow the test of the file type in
> .I stx_mode
> to be written more concisely:

Should this all just reference the existing stat(2) man page instead of
duplicating the whole contents here?  This is spending a lot of space
discussing the stx_mode field which could be avoided.

> .RS 4
> .TS
> lB l.
> \fBS_ISREG\fR(m)	Is it a regular file?
> \fBS_ISDIR\fR(m)	Is it a directory?
> \fBS_ISCHR\fR(m)	Is it a character device?
> \fBS_ISBLK\fR(m)	Is it a block device?
> \fBS_ISFIFO\fR(m)	Is it a FIFO (named pipe)?
> \fBS_ISLNK\fR(m)	Is it a symbolic link?  (Not in POSIX.1-1996.)
> \fBS_ISSOCK\fR(m)	Is it a socket?  (Not in POSIX.1-1996.)
> .TE
> .RE
> .PP
> The preceding code snippet could thus be rewritten as:
> 
> .nf
> .in +4n
> statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb);
> if (S_ISREG(sb.stx_mode)) {
>    /* Handle regular file */
> }
> .in
> .fi
> .PP
> The definitions of most of the above file type test macros
> are provided if any of the following feature test macros is defined:
> .BR _BSD_SOURCE
> (in glibc 2.19 and earlier),
> .BR _SVID_SOURCE
> (in glibc 2.19 and earlier),
> or
> .BR _DEFAULT_SOURCE
> (in glibc 2.20 and later).
> In addition, definitions of all of the above macros except
> .BR S_IFSOCK
> and
> .BR S_ISSOCK ()
> are provided if
> .BR _XOPEN_SOURCE
> is defined.
> The definition of
> .BR S_IFSOCK
> can also be exposed by defining
> .BR _XOPEN_SOURCE
> with a value of 500 or greater.
> 
> The definition of
> .BR S_ISSOCK ()
> is exposed if any of the following feature test macros is defined:
> .BR _BSD_SOURCE
> (in glibc 2.19 and earlier),
> .BR _DEFAULT_SOURCE
> (in glibc 2.20 and later),
> .BR _XOPEN_SOURCE
> with a value of 500 or greater, or
> .BR _POSIX_C_SOURCE
> with a value of 200112L or greater.
> .PP
> The following mask values are defined for
> the file mode component of the
> .I stx_mode
> field:
> .in +4n
> .TS
> lB l l.
> S_ISUID	  04000	set-user-ID bit
> S_ISGID	  02000	set-group-ID bit (see below)
> S_ISVTX	  01000	sticky bit (see below)
> 
> S_IRWXU	  00700	owner has read, write, and execute permission
> S_IRUSR	  00400	owner has read permission
> S_IWUSR	  00200	owner has write permission
> S_IXUSR	  00100	owner has execute permission
> 
> S_IRWXG	  00070	group has read, write, and execute permission
> S_IRGRP	  00040	group has read permission
> S_IWGRP	  00020	group has write permission
> S_IXGRP	  00010	group has execute permission
> 
> S_IRWXO	  00007	T{
> others (not in group) have read, write, and execute permission
> T}
> S_IROTH	  00004	others have read permission
> S_IWOTH	  00002	others have write permission
> S_IXOTH	  00001	others have execute permission
> .TE
> .in
> .P
> The set-group-ID bit
> .RB ( S_ISGID )
> has several special uses.
> For a directory, it indicates that BSD semantics is to be used
> for that directory: files created there inherit their group ID from
> the directory, not from the effective group ID of the creating process,
> and directories created there will also get the
> .B S_ISGID
> bit set.
> For a file that does not have the group execution bit
> .RB ( S_IXGRP )
> set,
> the set-group-ID bit indicates mandatory file/record locking.
> .P
> The sticky bit
> .RB ( S_ISVTX )
> on a directory means that a file
> in that directory can be renamed or deleted only by the owner
> of the file, by the owner of the directory, and by a privileged
> process.
> 
> 
> .SH RETURN VALUE
> On success, zero is returned.
> On error, \-1 is returned, and
> .I errno
> is set appropriately.
> .SH ERRORS
> .TP
> .B EINVAL
> Invalid flag specified in
> .IR flags .
> .TP
> .B EACCES
> Search permission is denied for one of the directories
> in the path prefix of
> .IR pathname .
> (See also
> .BR path_resolution (7).)
> .TP
> .B EBADF
> .I dirfd
> is not a valid open file descriptor.
> .TP
> .B EFAULT
> Bad address.
> .TP
> .B ELOOP
> Too many symbolic links encountered while traversing the path.
> .TP
> .B ENAMETOOLONG
> .I pathname
> is too long.
> .TP
> .B ENOENT
> A component of
> .I pathname
> does not exist, or
> .I pathname
> is an empty string.
> .TP
> .B ENOMEM
> Out of memory (i.e., kernel memory).
> .TP
> .B ENOTDIR
> A component of the path prefix of
> .I pathname
> is not a directory or
> .I pathname
> is relative and
> .I dirfd
> is a file descriptor referring to a file other than a directory.
> .SH VERSIONS
> .BR statx ()
> was added to Linux in kernel 4.11;
> library support is not yet added to glibc.
> .SH SEE ALSO
> .BR ls (1),
> .BR stat (1),

Maybe stat(2) ? :-)

> .BR access (2),
> .BR chmod (2),
> .BR chown (2),
> .BR readlink (2),
> .BR utime (2),
> .BR capabilities (7),
> .BR symlink (7)


Cheers, Andreas

Eric Biggers March 7, 2017, 10:55 p.m. UTC | #9

On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> .fi
> .in
> .PP
> (Note that reserved space and padding is ommitted)

ommitted => omitted

> .TP
> .BR AT_EMPTY_PATH " (since Linux 2.6.39)"
...
> .TP
> .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)"

Given that statx is just being added now in 4.11, saying "since Linux 2.6.39"
and "since Linux 2.6.38" for these flags seems unneeded.

> The
> .I mask
> argument to
> .BR statx ()
> is used to tell the kernel which fields the caller is interested in
> .I mask
> is an OR'ed combination of the following constants:

Missing a period between "interested in" and "mask".

> .B "Do not"
> simply set
> .I mask
> to UINT_MAX as one or more bits may, in future, be used to specify an extension
> to the buffer.

"in future" => "in the future".

> .I Note:
> For performance and simplicity reasons, different fields in the
> .I statx
> structure may contain state information from different moments
> during the execution of the system call.
> For example, if
> .IR stx_mode
> or
> .IR stx_uid
> is changed by another process by calling
> .BR chmod (2)
> or
> .BR chown (2),
> .BR stat ()
> might return the old
> .I stx_mode
> together with the new
> .IR stx_uid ,
> or the old
> .I stx_uid
> together with the new
> .IR stx_mode .

This example is confusing because chmod() doesn't change st_uid, and chown()
doesn't ordinarily change st_mode.  A better example would be that if chown() is
used to change both the owner and group, it's possible that statx() would return
the new owner paired with the old group, or the new group paired with the old
owner.  Or, the new owner and group could be observed before the new ctime.
Also, it seems this note belongs in a BUGS section of the man page, since it is
a bug.

> .SH ERRORS
> .TP
> .B EINVAL
> Invalid flag specified in
> .IR flags .

Actually, the behavior when specifying invalid flags differs depending on
whether a path is specified.

If a path is specified, then statx() fails with EINVAL if unrecognized flags are
specified.

However, the behavior differs when no path is specified.  For example, the the
following call with invalid flags succeeds:

	statx(fd, NULL, 0xFFFF0000, 0, &stbuf);

I think this is a bug, and it should be fixed by applying the same 'flags'
validation regardless of whether a path is specified.

> .B ENOENT
> A component of
> .I pathname
> does not exist, or
> .I pathname
> is an empty string.

is an empty string and AT_EMPTY_PATH was not specified in flags.

> .SH SEE ALSO
> .BR ls (1),
> .BR stat (1),

Shouldn't this link to stat (2), not stat (1)?

- Eric

Eryu Guan March 8, 2017, 3:45 a.m. UTC | #10

On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > This would be great to have in 4.11 together with the initial statx
> > implementation.  But until I see documentation and testcases for statx
> > I don't really feel comfortable reviewing anything related to it.
> 
> Well, since you asked for documentation, here's a manual page for you to
> review:-)
> 
> Note that as it isn't in glibc yet, I've left out all the
> set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent
> to particular constants.
> 
> I don't suppose you know where the documentation on writing xfstests tests is?
> xfstests-dev/doc/ only contains an old and out of date changelog.

There's a "ADDING TO THE FSQA SUITE" section in README file, basically
it's telling you to use './new' script to generate new test template. So
based on what type of new test you're going to create, you could run:

./new generic	# new generic test
./new xfs	# new xfs test
./new ext4	# new ext4 test
./new shared	# new shared test
...

It finds the first free seq number and creates test template and a
default .out file in tests/<type> dir and add a new entry in
tests/<type>/group file.  You could just edit the template and update
the .out file with expected output from the test run.

E.g. currently 416 is the first free seq number in generic tests,
./new generic will create

./tests/generic/416
./tests/generic/416.out

I think statx test should be a new generic test.

Thanks,
Eryu

David Howells March 8, 2017, 9:24 a.m. UTC | #11

Andreas Dilger <adilger@dilger.ca> wrote:

> > (Note that reserved space and padding is ommitted)
> 
> Do you think that not including the padding could be problematic for users?

Interesting question.  Do you think it would be?  I don't see why it should be
a problem, since they shouldn't be touching it anyway.  Also, the stat(2)
manpage also gives the same warning and omits padding and doesn't seem to have
been amended.

But if you would prefer, I can put it back.

The only reason I can really see for explicitly including it is to say that
this is where future extensions will go.

> > STATX_ATTR_IMMUTABLE	File is marked immutable
> ...
> > STATX_ATTR_APPEND	File is append-only

I should copy the definitions in chattr(1).

>    To test whether a path is a regular file (for example), one could write:
>    .nf
>    .in +4n
>    rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx);
>    if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) {
>       /* Handle regular file */
>    }

Good idea.

> > Because tests of the above form are common, additional macros are defined by
> > POSIX to allow the test of the file type in
> > .I stx_mode
> > to be written more concisely:
> 
> Should this all just reference the existing stat(2) man page instead of
> duplicating the whole contents here?  This is spending a lot of space
> discussing the stx_mode field which could be avoided.

Possibly.  On the other hand, it means that everything you need to refer to is
in one page from the user's PoV.  I'm not sure what the best policy on this
is.

If I do defer to stat(2), I do need to make some sort of note that the
examples are different and the field name is stx_mask, not st_mask.

David

David Howells March 8, 2017, 9:41 a.m. UTC | #12

Eric Biggers <ebiggers3@gmail.com> wrote:

> > .I Note:
> > For performance and simplicity reasons, different fields in the
> > .I statx
> > structure may contain state information from different moments
> > during the execution of the system call.
> > For example, if
> > .IR stx_mode
> > or
> > .IR stx_uid
> > is changed by another process by calling
> > .BR chmod (2)
> > or
> > .BR chown (2),
> > .BR stat ()
> > might return the old
> > .I stx_mode
> > together with the new
> > .IR stx_uid ,
> > or the old
> > .I stx_uid
> > together with the new
> > .IR stx_mode .
> 
> This example is confusing because chmod() doesn't change st_uid, and chown()
> doesn't ordinarily change st_mode.

Doesn't chown() clear the S_ISUID and S_ISGID flags?

> A better example would be that if chown() is used to change both the owner
> and group, it's possible that statx() would return the new owner paired with
> the old group, or the new group paired with the old owner.  Or, the new
> owner and group could be observed before the new ctime.

This is copied more or less verbatim from stat(2), so what you said applies
there too.

> Also, it seems this note belongs in a BUGS section of the man page, since it
> is a bug.

No, it's not a bug, exactly.  It's a design decision.  It might be worth
adding an AT_STAT_LOCK flag to synchronise against attribute setting, but it
will cost you something in terms of performance - and even then, over a
network filesystem it might not achieve anything.

> > .SH ERRORS
> > .TP
> > .B EINVAL
> > Invalid flag specified in
> > .IR flags .
> 
> Actually, the behavior when specifying invalid flags differs depending on
> whether a path is specified.

Good point.  Should I reject AT_EMPTY_PATH, AT_NO_AUTOMOUNT and
AT_SYMLINK_NOFOLLOW if pathname is NULL, I wonder?

> > .B ENOENT
> > A component of
> > .I pathname
> > does not exist, or
> > .I pathname
> > is an empty string.
> 
> is an empty string and AT_EMPTY_PATH was not specified in flags.

That needs fixing in stat(2) also.

> > .SH SEE ALSO
> > .BR ls (1),
> > .BR stat (1),
> 
> Shouldn't this link to stat (2), not stat (1)?

Depends whether stat(1) ends up using statx() or not.  I can take it out for
now.

David

Jeff Layton March 8, 2017, 3:26 p.m. UTC | #13

On Wed, 2017-03-08 at 09:24 +0000, David Howells wrote:
> Andreas Dilger <adilger@dilger.ca> wrote:
> 
> > > (Note that reserved space and padding is ommitted)
> > 
> > Do you think that not including the padding could be problematic for users?
> 
> Interesting question.  Do you think it would be?  I don't see why it should be
> a problem, since they shouldn't be touching it anyway.  Also, the stat(2)
> manpage also gives the same warning and omits padding and doesn't seem to have
> been amended.
> 
> But if you would prefer, I can put it back.
> 
> The only reason I can really see for explicitly including it is to say that
> this is where future extensions will go.
> 

It seems like we really ought to have 2 statx manpages:

A small statx(2) manpage that describes the kernel<->userland interface
(showing structs with padding), and then the "real" statx(3) manpage
that shows the glibc userland API.

The kernel<->userland one could be very small, and just say "This is not
the function you're interested in. Look at statx(3) for the C library
interface.", similar to what readdir(2) does.

> > > STATX_ATTR_IMMUTABLE	File is marked immutable
> > 
> > ...
> > > STATX_ATTR_APPEND	File is append-only
> 
> I should copy the definitions in chattr(1).
> 
> >    To test whether a path is a regular file (for example), one could write:
> >    .nf
> >    .in +4n
> >    rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx);
> >    if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) {
> >       /* Handle regular file */
> >    }
> 
> Good idea.

Note that & and && have similar precedence, IIRC, so you probably want
to wrap that flag check in parenthesis.

> 
> > > Because tests of the above form are common, additional macros are defined by
> > > POSIX to allow the test of the file type in
> > > .I stx_mode
> > > to be written more concisely:
> > 
> > Should this all just reference the existing stat(2) man page instead of
> > duplicating the whole contents here?  This is spending a lot of space
> > discussing the stx_mode field which could be avoided.
> 
> Possibly.  On the other hand, it means that everything you need to refer to is
> in one page from the user's PoV.  I'm not sure what the best policy on this
> is.
> 
> If I do defer to stat(2), I do need to make some sort of note that the
> examples are different and the field name is stx_mask, not st_mask.
> 

If the plan is to eventually (in some far away future) to deprecate
stat(), then we probably don't want this manpage to refer to it as a
canonical source of information. I say duplicate it here.

It might also not hurt to consider updating the stat(2) manpage with a
NOTE that points to the statx manpage as well.

David Howells March 9, 2017, 6:46 a.m. UTC | #14

Eryu Guan <eguan@redhat.com> wrote:

> There's a "ADDING TO THE FSQA SUITE" section in README file, basically
> it's telling you to use './new' script to generate new test template. So
> based on what type of new test you're going to create, you could run:
> 
> ./new generic	# new generic test
> ./new xfs	# new xfs test
> ./new ext4	# new ext4 test
> ./new shared	# new shared test
> ...
> 
> It finds the first free seq number and creates test template and a
> default .out file in tests/<type> dir and add a new entry in
> tests/<type>/group file.  You could just edit the template and update
> the .out file with expected output from the test run.

Yes, I'd spotted the ./new script, but it's not as simple as that since shell
scripts can't call system calls directly.

David

Eryu Guan March 9, 2017, 6:59 a.m. UTC | #15

On Thu, Mar 09, 2017 at 06:46:48AM +0000, David Howells wrote:
> Eryu Guan <eguan@redhat.com> wrote:
> 
> > There's a "ADDING TO THE FSQA SUITE" section in README file, basically
> > it's telling you to use './new' script to generate new test template. So
> > based on what type of new test you're going to create, you could run:
> > 
> > ./new generic	# new generic test
> > ./new xfs	# new xfs test
> > ./new ext4	# new ext4 test
> > ./new shared	# new shared test
> > ...
> > 
> > It finds the first free seq number and creates test template and a
> > default .out file in tests/<type> dir and add a new entry in
> > tests/<type>/group file.  You could just edit the template and update
> > the .out file with expected output from the test run.
> 
> Yes, I'd spotted the ./new script, but it's not as simple as that since shell
> scripts can't call system calls directly.

A normal way we take for fstests is adding new syscall support to xfs_io
first, then writing new tests based on it, as some other new
syscalls/utilities do, e.g. set|get_encpolicy were added to xfs_io
in January to set|get encryption policy for ext4 encryption testing.

Thanks,
Eryu

Darrick J. Wong March 9, 2017, 6:59 a.m. UTC | #16

On Thu, Mar 09, 2017 at 06:46:48AM +0000, David Howells wrote:
> Eryu Guan <eguan@redhat.com> wrote:
> 
> > There's a "ADDING TO THE FSQA SUITE" section in README file, basically
> > it's telling you to use './new' script to generate new test template. So
> > based on what type of new test you're going to create, you could run:
> > 
> > ./new generic	# new generic test
> > ./new xfs	# new xfs test
> > ./new ext4	# new ext4 test
> > ./new shared	# new shared test
> > ...
> > 
> > It finds the first free seq number and creates test template and a
> > default .out file in tests/<type> dir and add a new entry in
> > tests/<type>/group file.  You could just edit the template and update
> > the .out file with expected output from the test run.
> 
> Yes, I'd spotted the ./new script, but it's not as simple as that since shell
> scripts can't call system calls directly.

Probably best to place a program in src/ so that xfstests can make the
raw syscalls early while we wait for glibc to invent a wrapper.

--D

> 
> David

David Howells March 9, 2017, 7:45 a.m. UTC | #17

Eryu Guan <eguan@redhat.com> wrote:

> A normal way we take for fstests is adding new syscall support to xfs_io
> first, then writing new tests based on it, as some other new
> syscalls/utilities do, e.g. set|get_encpolicy were added to xfs_io
> in January to set|get encryption policy for ext4 encryption testing.

Please add this to the docs.

Thanks,
David

Christoph Hellwig March 9, 2017, 2:01 p.m. UTC | #18

On Wed, Mar 08, 2017 at 10:59:49PM -0800, Darrick J. Wong wrote:
> Probably best to place a program in src/ so that xfstests can make the
> raw syscalls early while we wait for glibc to invent a wrapper.

Іn general don't invent new programs if we can avoid it and add a statx
command to xfs_io, that all the tests can be built around.

Eric Biggers March 10, 2017, 5:01 a.m. UTC | #19

On Wed, Mar 08, 2017 at 09:41:00AM +0000, David Howells wrote:
> > 
> > This example is confusing because chmod() doesn't change st_uid, and chown()
> > doesn't ordinarily change st_mode.
> 
> Doesn't chown() clear the S_ISUID and S_ISGID flags?
> 

It can, but it's not a straightforward example.

> 
> No, it's not a bug, exactly.  It's a design decision.  It might be worth
> adding an AT_STAT_LOCK flag to synchronise against attribute setting, but it
> will cost you something in terms of performance - and even then, over a
> network filesystem it might not achieve anything.
> 

Well, regardless of whether people want to fix it or not, I do think it's a bug.
Currently even the update of a timestamp is non-atomic, since that involves
updating both tv_sec and tv_nsec.  Therefore, stat() might return the new tv_sec
along with the old tv_nsec, or vice versa.  This can cause some very strange
behavior, such as two successively observed timestamps going backwards in time.

I expect that historically people haven't complained enough for this to be
worthwhile to fix.  But in theory I think it could be fixed with little
performance loss by protecting all the stat data with a sequence count, similar
to how reads of i_size are made atomic on 32-bit platforms by using
i_size_seqcount.

> 
> Good point.  Should I reject AT_EMPTY_PATH, AT_NO_AUTOMOUNT and
> AT_SYMLINK_NOFOLLOW if pathname is NULL, I wonder?
> 

It could be argued either way, but I was thinking those particular flags should
just be ignored, as they have known meanings but simply aren't relevant.  It
could also cause confusion for

	statx(dfd, "", AT_EMPTY_PATH, 0, buffer)

to succeed but for

	statx(dfd, NULL, AT_EMPTY_PATH, 0, buffer)

to fail (for example).

- Eric

Matthew Wilcox March 20, 2017, 4:01 p.m. UTC | #20

On Wed, Mar 08, 2017 at 10:26:14AM -0500, Jeff Layton wrote:
> On Wed, 2017-03-08 at 09:24 +0000, David Howells wrote:
> > Andreas Dilger <adilger@dilger.ca> wrote:
> > >    To test whether a path is a regular file (for example), one could write:
> > >    .nf
> > >    .in +4n
> > >    rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx);
> > >    if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) {
> > >       /* Handle regular file */
> > >    }
> > 
> > Good idea.
> 
> Note that & and && have similar precedence, IIRC, so you probably want
> to wrap that flag check in parenthesis.

They're actually pretty far apart.  & is priority 8 and && is priority 11.
But the fact that you thought they were similar shows that most of
us don't really know what C's operator precedence is (beyond + and *)
and we should err on the side of clarity by using the unnecessary parens.

> If the plan is to eventually (in some far away future) to deprecate
> stat(), then we probably don't want this manpage to refer to it as a
> canonical source of information. I say duplicate it here.

We haven't even deprecated select() in favour of poll() ... indeed,
pselect() was added.  I can't imagine stat() ever being deprecated.

Jeff Layton March 22, 2017, 10:55 a.m. UTC | #21

On Mon, 2017-03-20 at 09:01 -0700, Matthew Wilcox wrote:
> On Wed, Mar 08, 2017 at 10:26:14AM -0500, Jeff Layton wrote:
> > On Wed, 2017-03-08 at 09:24 +0000, David Howells wrote:
> > > Andreas Dilger <adilger@dilger.ca> wrote:
> > > >    To test whether a path is a regular file (for example), one could write:
> > > >    .nf
> > > >    .in +4n
> > > >    rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx);
> > > >    if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) {
> > > >       /* Handle regular file */
> > > >    }
> > > 
> > > Good idea.
> > 
> > Note that & and && have similar precedence, IIRC, so you probably want
> > to wrap that flag check in parenthesis.
> 
> They're actually pretty far apart.  & is priority 8 and && is priority 11.
> But the fact that you thought they were similar shows that most of
> us don't really know what C's operator precedence is (beyond + and *)
> and we should err on the side of clarity by using the unnecessary parens.
> 

Ahh, good to know. My rules are generally: multiplication and division
come before addition and subtraction, and put parenthesis around
everything else. :)

> > If the plan is to eventually (in some far away future) to deprecate
> > stat(), then we probably don't want this manpage to refer to it as a
> > canonical source of information. I say duplicate it here.
> 
> We haven't even deprecated select() in favour of poll() ... indeed,
> pselect() was added.  I can't imagine stat() ever being deprecated.

Ok, deprecated is probably the wrong choice of words. You're correct
that we'll never be able to fully rip it out just due to the volume of
old programs that use it.

My thinking is that statx will come to supplant stat in newer code, and
stat will eventually become a "legacy" syscall. Typically we don't want
to have to refer to "legacy" manpages for canonical info in new
interfaces. If anything we may want to remove that info from stat(2)
manpage and have it refer to statx(2) for it.

Eric Biggers March 24, 2017, 8:53 p.m. UTC | #22

On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> STATX_ALL	[All currently available stuff]
> .TE
> .in
> .PP
> .B "Do not"
> simply set
> .I mask
> to UINT_MAX as one or more bits may, in future, be used to specify an extension
> to the buffer.

To clarify, will an "extension to the buffer" be an increase in the size of
struct statx?  I think it would have to be, otherwise programs filling a struct
statx with STATX_ALL would start breaking as soon as they're rebuilt with the
new value of STATX_ALL, no?  Or would these "extension to the buffer" bits not
be added to STATX_ALL ...?

And I don't suppose there's anything we can do to stop programs from asking for
mask bits that haven't been defined yet, then breaking later if they happen to
be defined as "extensions"?  Maybe adding an extra "buffer size" argument to the
syscall?

I'm concerned that the idea of "extensions" isn't well thought out, and in
practice we'll just be stuck with the current struct size (256 bytes) forever.

- Eric

Andreas Dilger March 27, 2017, 12:46 a.m. UTC | #23

> On Mar 24, 2017, at 2:53 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> 
> On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
>> STATX_ALL	[All currently available stuff]
>> .TE
>> .in
>> .PP
>> .B "Do not"
>> simply set
>> .I mask
>> to UINT_MAX as one or more bits may, in future, be used to specify an extension
>> to the buffer.
> 
> To clarify, will an "extension to the buffer" be an increase in the size of
> struct statx?  I think it would have to be, otherwise programs filling a struct
> statx with STATX_ALL would start breaking as soon as they're rebuilt with the
> new value of STATX_ALL, no?  Or would these "extension to the buffer" bits not
> be added to STATX_ALL ...?

The value of STATX_ALL would match the size of struct statx in the header at
compilation time, so this would always be consistent.
> 
> And I don't suppose there's anything we can do to stop programs from asking
> for mask bits that haven't been defined yet, then breaking later if they
> happen to be defined as "extensions"?  Maybe adding an extra "buffer size"
> argument to the syscall?

You can't stop applications from doing dumb things, like asking to read 1MB
of data into a buffer that is only 512KB in size.  That will also work fine
as long as the application only reads a files smaller than 512KB.

Similarly, if the statx() API says that the STATX_ALL mask is the list of
currently-supported bits, but the app asks for more bits than it allocates
a buffer for, there isn't much that the kernel can do.

> I'm concerned that the idea of "extensions" isn't well thought out, and in
> practice we'll just be stuck with the current struct size (256 bytes) forever.

The extensions work exactly as they should - the client sets bits for fields
that it needs (and by definition it shouldn't ask for anything that it doesn't
understand), and the kernel masks this down to the bits that it understands.

If the client asks for more bits than the kernel understands, it is likely a
newer application on an older kernel, and it will only get back the fields
that the kernel understands.  The reverse (client asking for fewer bits than
the kernel understands) is normal behaviour for this interface.  The kernel should only fill in fields that the client requested and for which there is
space in the struct.

Cheers, Andreas

David Howells March 27, 2017, 9:55 a.m. UTC | #24

Eric Biggers <ebiggers3@gmail.com> wrote:

> On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> > STATX_ALL	[All currently available stuff]
> > .TE
> > .in
> > .PP
> > .B "Do not"
> > simply set
> > .I mask
> > to UINT_MAX as one or more bits may, in future, be used to specify an extension
> > to the buffer.
> 
> To clarify, will an "extension to the buffer" be an increase in the size of
> struct statx?

Yes.  I can make that more explicit, say perhaps "... specify an extension to
the size of the buffer".

> I think it would have to be, otherwise programs filling a struct statx with
> STATX_ALL would start breaking as soon as they're rebuilt with the new value
> of STATX_ALL, no?  Or would these "extension to the buffer" bits not be
> added to STATX_ALL ...?

It would have to work that way.  I can put this in a comment for future
guidance in the kernel header file if you'd prefer.

> And I don't suppose there's anything we can do to stop programs from asking
> for mask bits that haven't been defined yet, then breaking later if they
> happen to be defined as "extensions"?

If someone decides not to read the documentation, one could argue they get
what they deserve.

> Maybe adding an extra "buffer size" argument to the syscall?

This idea has already been rejected.

> I'm concerned that the idea of "extensions" isn't well thought out, and in
> practice we'll just be stuck with the current struct size (256 bytes) forever.

One or more bits (probably just one) are tentatively reserved to indicate that
the buffer is now bigger.  How much bigger I cannot say at this point, not
knowing the circumstances yet that will require the extension.

David

Darrick J. Wong March 27, 2017, 3:40 p.m. UTC | #25

On Sun, Mar 26, 2017 at 06:46:43PM -0600, Andreas Dilger wrote:
> 
> > On Mar 24, 2017, at 2:53 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> > 
> > On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> >> STATX_ALL	[All currently available stuff]
> >> .TE
> >> .in
> >> .PP
> >> .B "Do not"
> >> simply set
> >> .I mask
> >> to UINT_MAX as one or more bits may, in future, be used to specify an extension
> >> to the buffer.
> > 
> > To clarify, will an "extension to the buffer" be an increase in the size of
> > struct statx?  I think it would have to be, otherwise programs filling a struct
> > statx with STATX_ALL would start breaking as soon as they're rebuilt with the
> > new value of STATX_ALL, no?  Or would these "extension to the buffer" bits not
> > be added to STATX_ALL ...?
> 
> The value of STATX_ALL would match the size of struct statx in the header at
> compilation time, so this would always be consistent.
> > 
> > And I don't suppose there's anything we can do to stop programs from asking
> > for mask bits that haven't been defined yet, then breaking later if they
> > happen to be defined as "extensions"?  Maybe adding an extra "buffer size"
> > argument to the syscall?
> 
> You can't stop applications from doing dumb things, like asking to read 1MB
> of data into a buffer that is only 512KB in size.  That will also work fine
> as long as the application only reads a files smaller than 512KB.
> 
> Similarly, if the statx() API says that the STATX_ALL mask is the list of
> currently-supported bits, but the app asks for more bits than it allocates
> a buffer for, there isn't much that the kernel can do.
> 
> > I'm concerned that the idea of "extensions" isn't well thought out, and in
> > practice we'll just be stuck with the current struct size (256 bytes) forever.
> 
> The extensions work exactly as they should - the client sets bits for fields
> that it needs (and by definition it shouldn't ask for anything that it doesn't
> understand), and the kernel masks this down to the bits that it understands.
> 
> If the client asks for more bits than the kernel understands, it is likely a
> newer application on an older kernel, and it will only get back the fields
> that the kernel understands.  The reverse (client asking for fewer bits than
> the kernel understands) is normal behaviour for this interface.  The kernel
> should only fill in fields that the client requested and for which there is
> space in the struct.

During the statx session at LSF last week I asked if filesystems ought
to fill in fields that weren't asked for (btime, specifically) and the
impression I got was that it's ok to go ahead and fill out fields that
weren't asked for if we already have the data.

Since statx backends can do that, they'll have to check the structure
size, and not rely on "you asked for this field so we assume that you
allocated enough memory in userspace to hold it".

Or we could just shift the precedent now -- programs only get the
information they ask for, and in asking for it we assume that we can
write to that part of the buffer.  Frankly I'd prefer that behavior (see
the XFS statx patch), but I don't own the interface. :)

--D

> 
> Cheers, Andreas
> 
> 
> 
> 
>

David Howells March 27, 2017, 4:25 p.m. UTC | #26

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> During the statx session at LSF last week I asked if filesystems ought
> to fill in fields that weren't asked for (btime, specifically) and the
> impression I got was that it's ok to go ahead and fill out fields that
> weren't asked for if we already have the data.

Yes.  The filesystem may return data that it wasn't asked for.

> Since statx backends can do that, they'll have to check the structure
> size, and not rely on "you asked for this field so we assume that you
> allocated enough memory in userspace to hold it".

Kind of.  I'm assuming that the extension fields will be in struct kstat and
can be set directly by the filesystem unconditionally, and the core can choose
whether to copy them or not.  However, assuming that one of the mask bits is
reserved to indicate a buffer extension, this will be made available to the fs
to let the fs know whether it's pointless to return it.

However, at the moment I have no extensions to play with.

Would it help you if I marked one of the mask bits reserved for this purpose?

> Or we could just shift the precedent now -- programs only get the
> information they ask for, and in asking for it we assume that we can
> write to that part of the buffer.  Frankly I'd prefer that behavior (see
> the XFS statx patch), but I don't own the interface. :)

That means a filesystem can't simply return non-basic data unconditionally if
possible.  I prefer letting it do so if it doesn't incur any extra I/O
overheads.

David

Christoph Hellwig March 27, 2017, 4:46 p.m. UTC | #27

On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote:
> That means a filesystem can't simply return non-basic data unconditionally if
> possible.  I prefer letting it do so if it doesn't incur any extra I/O
> overheads.

This seems like it will lead to userspace expecting certain fields to
just be there, and a lot harder to properly verify for tests.  Which btw
we all need for these odd behaviors.  If we can't get them any time soon
(e.g. before -rc6) I think we'll simply have to revert statx instead of
leaving this untested mess in the tree.

David Howells March 27, 2017, 6:57 p.m. UTC | #28

Christoph Hellwig <hch@infradead.org> wrote:

> On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote:
> > That means a filesystem can't simply return non-basic data unconditionally if
> > possible.  I prefer letting it do so if it doesn't incur any extra I/O
> > overheads.
> 
> This seems like it will lead to userspace expecting certain fields to
> just be there, and a lot harder to properly verify for tests.  Which btw
> we all need for these odd behaviors.  If we can't get them any time soon
> (e.g. before -rc6) I think we'll simply have to revert statx instead of
> leaving this untested mess in the tree.

Have you reviewed the manpage yet?

David

Jeff Layton March 27, 2017, 7:04 p.m. UTC | #29

On Mon, 2017-03-27 at 09:46 -0700, Christoph Hellwig wrote:
> On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote:
> > That means a filesystem can't simply return non-basic data unconditionally if
> > possible.  I prefer letting it do so if it doesn't incur any extra I/O
> > overheads.
> 
> This seems like it will lead to userspace expecting certain fields to
> just be there, and a lot harder to properly verify for tests.  Which btw
> we all need for these odd behaviors.  If we can't get them any time soon
> (e.g. before -rc6) I think we'll simply have to revert statx instead of
> leaving this untested mess in the tree.

I don't think so.

I think we just have to clearly document that that will not be the
case. If they really expect the field to be there, then they need to
set the bit in the "want" mask -- it's really as simple as that.

Andreas Dilger March 28, 2017, 7:39 p.m. UTC | #30

On Mar 27, 2017, at 1:04 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> 
> On Mon, 2017-03-27 at 09:46 -0700, Christoph Hellwig wrote:
>> On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote:
>>> That means a filesystem can't simply return non-basic data unconditionally if
>>> possible.  I prefer letting it do so if it doesn't incur any extra I/O
>>> overheads.
>> 
>> This seems like it will lead to userspace expecting certain fields to
>> just be there, and a lot harder to properly verify for tests.  Which btw
>> we all need for these odd behaviors.  If we can't get them any time soon
>> (e.g. before -rc6) I think we'll simply have to revert statx instead of
>> leaving this untested mess in the tree.
> 
> I don't think so.
> 
> I think we just have to clearly document that that will not be the
> case. If they really expect the field to be there, then they need to
> set the bit in the "want" mask -- it's really as simple as that.

It would be my preference to only return attributes which are explicitly
requested (Lustre won't return fields for which the bit is not set), but
there was a request for this behaviour in NFS I recall.

That said, there are also flags for "lazy" attributes for NFS. I wonder if
it is enough to pass AT_STATX_DONT_SYNC and request STATX_SIZE | STATX_CTIME,
or whatever NFS is worried about, and return the client-local version of the
attributes?

I guess the real question is what an application is going to do with fields
that it didn't actually request, and has no way to know if they are valid or
contain garbage?

Cheers, Andreas

Jeff Layton March 28, 2017, 8:22 p.m. UTC | #31

On Tue, 2017-03-28 at 13:39 -0600, Andreas Dilger wrote:
> On Mar 27, 2017, at 1:04 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> > 
> > On Mon, 2017-03-27 at 09:46 -0700, Christoph Hellwig wrote:
> > > On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote:
> > > > That means a filesystem can't simply return non-basic data unconditionally if
> > > > possible.  I prefer letting it do so if it doesn't incur any extra I/O
> > > > overheads.
> > > 
> > > This seems like it will lead to userspace expecting certain fields to
> > > just be there, and a lot harder to properly verify for tests.  Which btw
> > > we all need for these odd behaviors.  If we can't get them any time soon
> > > (e.g. before -rc6) I think we'll simply have to revert statx instead of
> > > leaving this untested mess in the tree.
> > 
> > I don't think so.
> > 
> > I think we just have to clearly document that that will not be the
> > case. If they really expect the field to be there, then they need to
> > set the bit in the "want" mask -- it's really as simple as that.
> 
> It would be my preference to only return attributes which are explicitly
> requested (Lustre won't return fields for which the bit is not set), but
> there was a request for this behaviour in NFS I recall.
> 
> That said, there are also flags for "lazy" attributes for NFS. I wonder if
> it is enough to pass AT_STATX_DONT_SYNC and request STATX_SIZE | STATX_CTIME,
> or whatever NFS is worried about, and return the client-local version of the
> attributes?
> 
> I guess the real question is what an application is going to do with fields
> that it didn't actually request, and has no way to know if they are valid or
> contain garbage?
> 

You certainly do have a way to know if they're valid. You can check the
stx_mask.

Basically, if you request something, you're guaranteed to get it, or an
error. If you don't request it, you might get it but you had better
check the stx_mask before assuming that you did.

TBQH though, I don't _really_ expect userland to use this that often.

The main reason I'm arguing for this is for simplicity of the in-kernel 
implementations. Having to check each bit in the mask before filling
out its field is cumbersome. It's much simpler to fill out the kstat
struct with what you have and set the stx_mask unconditionally.

It's also not 100% clear to me how the unrequested fields would be set
if we don't allow this. zeroing them out seems like the obvious thing
to do, but that doesn't really provide any guarantee that someone won't
try to interpret it as valid.

Christoph Hellwig March 31, 2017, 3:49 p.m. UTC | #32

On Mon, Mar 27, 2017 at 03:04:11PM -0400, Jeff Layton wrote:
> I think we just have to clearly document that that will not be the
> case. If they really expect the field to be there, then they need to
> set the bit in the "want" mask -- it's really as simple as that.

If the usual local file systems set it but say NFS doesn't I will
very much expect that people don't set the flag and create buggy
programs a lot.  An API that doesn't give you more than you ask
for is a lot more obvious.

Christoph Hellwig March 31, 2017, 3:56 p.m. UTC | #33

On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > This would be great to have in 4.11 together with the initial statx
> > implementation.  But until I see documentation and testcases for statx
> > I don't really feel comfortable reviewing anything related to it.
> 
> Well, since you asked for documentation, here's a manual page for you to
> review:-)
> 
> Note that as it isn't in glibc yet, I've left out all the
> set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent
> to particular constants.

I think we'll initially need to make this a section 2 page as libcs
will take a while to pick things up.  But in the end Michael will have
to decide.  Adding linux-man to the Cc list so that the right people
are involved.

Quoting the rendered page below:

>   Invoking statx():

 This is an odd header for a man page.  Usually this would start as

DESCRIPTION

(in bold).

>       [*] By absolute path.

Odd way to enumerate in a man page - I'll leave it to the linux-man regulars
to point out the right way, because I don't know it off-hand.

>       AT_EMPTY_PATH (since Linux 2.6.39)

All these have been there since statx was added, so we can remove the annotation.

>           STATX_BASIC_STATS   [The stuff in the normal stat struct]

Please enumerate the actual fields.

>           STATX_ALL           [All currently available stuff]

How about "All currently available fields"

David Howells March 31, 2017, 4:43 p.m. UTC | #34

Christoph Hellwig <hch@infradead.org> wrote:

> Quoting the rendered page below:
> 
> >   Invoking statx():
>  
>  This is an odd header for a man page.  Usually this would start as
> 
> DESCRIPTION
> 
> (in bold).

Yes, I know.  But it seems to make more sense this way round.  Putting the
struct definition before the description of how to access it means that I can
refer to it in the definitions of the flags that you OR into the mask and
means you get to the struct much faster.  Maybe I should put it under the
SYNOPSIS section?

> >       [*] By absolute path.
> 
> Odd way to enumerate in a man page - I'll leave it to the linux-man regulars
> to point out the right way, because I don't know it off-hand.

Me neither.

> >       AT_EMPTY_PATH (since Linux 2.6.39)
> 
> All these have been there since statx was added, so we can remove the
> annotation.

You're not the first to note this.  I've already taken that out.

> >           STATX_BASIC_STATS   [The stuff in the normal stat struct]
> 
> Please enumerate the actual fields.

They're enumerated immediately prior.  I've changed it to:

	STATX_BASIC_STATS   [All of the above]

> >           STATX_ALL           [All currently available stuff]
> 
> How about "All currently available fields"

Sure.

David

xfs: report crtime and attribute flags to statx

Commit Message

Comments

Patch