Message ID | 20170307000609.GG5280@birch.djwong.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Mar 06, 2017 at 04:06:09PM -0800, Darrick J. Wong wrote: > statx has the ability to report inode creation times and inode flags, so > hook up di_crtime and di_flags to that functionality. This would be great to have in 4.11 together with the initial statx implementation. But until I see documentation and testcases for statx I don't really feel comfortable reviewing anything related to it.
Christoph Hellwig <hch@infradead.org> wrote: > This would be great to have in 4.11 together with the initial statx > implementation. But until I see documentation and testcases for statx > I don't really feel comfortable reviewing anything related to it. Well, since you asked for documentation, here's a manual page for you to review:-) Note that as it isn't in glibc yet, I've left out all the set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent to particular constants. I don't suppose you know where the documentation on writing xfstests tests is? xfstests-dev/doc/ only contains an old and out of date changelog. David --- '\" t .\" Copyright (c) 1992 Drew Eckhardt (drew@cs.colorado.edu), March 28, 1992 .\" Parts Copyright (c) 1995 Nicolai Langfeldt (janl@ifi.uio.no), 1/1/95 .\" and Copyright (c) 2006, 2007, 2014 Michael Kerrisk <mtk.manpages@gmail.com> .\" and Copyright (c) 2017 David Howells <dhowells@redhat.com> .\" .\" %%%LICENSE_START(VERBATIM) .\" Permission is granted to make and distribute verbatim copies of this .\" manual provided the copyright notice and this permission notice are .\" preserved on all copies. .\" .\" Permission is granted to copy and distribute modified versions of this .\" manual under the conditions for verbatim copying, provided that the .\" entire resulting derived work is distributed under the terms of a .\" permission notice identical to this one. .\" .\" Since the Linux kernel and libraries are constantly changing, this .\" manual page may be incorrect or out-of-date. The author(s) assume no .\" responsibility for errors or omissions, or for damages resulting from .\" the use of the information contained herein. The author(s) may not .\" have taken the same level of care in the production of this manual, .\" which is licensed free of charge, as they might when working .\" professionally. .\" .\" Formatted or processed versions of this manual, if unaccompanied by .\" the source, must acknowledge the copyright and authors of this work. .\" %%%LICENSE_END .\" .TH STATX 2 2017-03-07 "Linux" "Linux Programmer's Manual" .SH NAME statx \- Get file status (extended) .SH SYNOPSIS .nf .B #include <sys/types.h> .br .B #include <sys/stat.h> .br .B #include <unistd.h> .br .BR "#include <fcntl.h> " "/* Definition of AT_* constants */" .sp .BI "int statx(int " dirfd ", const char *" pathname ", int " flags "," .BI " unsigned int " mask ", struct statx *" buf ); .fi .sp .in -4n Feature Test Macro Requirements for glibc (see .BR feature_test_macros (7)): .in .ad l .PD 0 .sp .RS 4 <unknown as yet> .RE .PD .ad .SH DESCRIPTION .PP This function returns information about a file, storing it in the buffer pointed to by .IR buf . The buffer is filled in according to the following type: .PP .in +4n .nf struct statx { __u32 stx_mask; -- Mask of bits indicating filled fields __u32 stx_blksize; -- Block size for filesystem I/O __u64 stx_attributes; -- Extra file attribute indicators __u32 stx_nlink; -- Number of hard links __u32 stx_uid; -- User ID of owner __u32 stx_gid; -- Group ID of owner __u16 stx_mode; -- File type and mode __u64 stx_ino; -- Inode number __u64 stx_size; -- Total size in bytes __u64 stx_blocks; -- Number of 512B blocks allocated struct statx_timestamp stx_atime; -- Time of last access struct statx_timestamp stx_btime; -- Time of creation struct statx_timestamp stx_ctime; -- Time of last status change struct statx_timestamp stx_mtime; -- Time of last modification __u32 stx_rdev_major; } Device number if device file __u32 stx_rdev_minor; } __u32 stx_dev_major; } Device number of containing file __u32 stx_dev_minor; } }; .fi .in .PP Where the timestamps are defined as: .PP .in +4n .nf struct statx_timestamp { __s64 tv_sec; -- Number of seconds before or since 1970 __s32 tv_nsec; -- Number of nanoseconds before or since tv_sec }; .fi .in .PP (Note that reserved space and padding is ommitted) .SS Invoking \fBstatx\fR(): .PP To access a file's status, no permissions are required on the file itself, but in the case of .BR statx () with a path, execute (search) permission is required on all of the directories in .I pathname that lead to the file. .PP .BR statx () uses .IR pathname ", " dirfd " and " flags to locate the target file in one of a variety of ways: .TP [*] By absolute path. .I pathname points to an absolute path and .I dirfd is ignored. The file is looked up by name, starting from the root of the filesystem as seen by the calling process. .TP [*] By cwd-relative path. .I pathname points to a relative path and .IR dirfd " is " AT_FDCWD . The file is looked up by name, starting from the current working directory. .TP [*] By dir-relative path. .I pathname points to relative path and .I dirfd indicates a file descriptor pointing to a directory. The file is looked up by name, starting from the directory specified by .IR dirfd . .TP [*] By file descriptor. .IR pathname " is " NULL " and " dirfd indicates a file descriptor. The file attached to the file descriptor is queried directly. The file descriptor may point to any type of file, not just a directory. .PP .I flags can be used to influence a path-based lookup. A value for .I flags is constructed by OR'ing together zero or more of the following constants: .TP .BR AT_EMPTY_PATH " (since Linux 2.6.39)" .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d If .I pathname is an empty string, operate on the file referred to by .IR dirfd (which may have been obtained using the .BR open (2) .B O_PATH flag). If .I dirfd is .BR AT_FDCWD , the call operates on the current working directory. In this case, .I dirfd can refer to any type of file, not just a directory. This flag is Linux-specific; define .B _GNU_SOURCE .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed to obtain its definition. .TP .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)" Don't automount the terminal ("basename") component of .I pathname if it is a directory that is an automount point. This allows the caller to gather attributes of an automount point (rather than the location it would mount). This flag can be used in tools that scan directories to prevent mass-automounting of a directory of automount points. The .B AT_NO_AUTOMOUNT flag has no effect if the mount point has already been mounted over. This flag is Linux-specific; define .B _GNU_SOURCE .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed to obtain its definition. .TP .B AT_SYMLINK_NOFOLLOW If .I pathname is a symbolic link, do not dereference it: instead return information about the link itself, like .BR lstat (). .PP .I flags can also be used to control what sort of synchronisation the kernel will do when querying a file on a remote filesystem. This is done by OR'ing in one of the following values: .TP AT_STATX_SYNC_AS_STAT Do whatever .BR stat () does. This is the default and is very much filesystem specific. .TP AT_STATX_FORCE_SYNC Force the attributes to be synchronised with the server. This may require that a network filesystem perform a data writeback to get the timestamps correct. .TP AT_STATX_DONT_SYNC Don't synchronise anything, but rather just take whatever the system has cached if possible. This may mean that the information returned is approximate, but, on a network filesystem, it may not involve a round trip to the server - even if no lease is held. .PP The .I mask argument to .BR statx () is used to tell the kernel which fields the caller is interested in .I mask is an OR'ed combination of the following constants: .PP .in +4n .TS lB l. STATX_TYPE Want stx_mode & S_IFMT STATX_MODE Want stx_mode & ~S_IFMT STATX_NLINK Want stx_nlink STATX_UID Want stx_uid STATX_GID Want stx_gid STATX_ATIME Want stx_atime{,_ns} STATX_MTIME Want stx_mtime{,_ns} STATX_CTIME Want stx_ctime{,_ns} STATX_INO Want stx_ino STATX_SIZE Want stx_size STATX_BLOCKS Want stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want stx_btime{,_ns} STATX_ALL [All currently available stuff] .TE .in .PP .B "Do not" simply set .I mask to UINT_MAX as one or more bits may, in future, be used to specify an extension to the buffer. .SS The returned information .PP The status information for the target file is returned in the .I statx structure pointed to by .IR buf . Included in this is .I stx_mask which indicates what other information has been returned. .I stx_mask has the same format as the mask argument and bits are set in it to indicate which fields have been filled in. .PP It should be noted that the kernel may return fields that weren't requested and may fail to return fields that were requested, depending on what the backing filesystem supports. In either case, .I stx_mask will not be equal .IR mask . .PP If a filesystem does not support a field or if it has an unrepresentable value (for instance, a file with an exotic type), then the mask bit corresponding to that field will be cleared in .I stx_mask even if the user asked for it and a dummy value will be filled in for compatibility purposes if one is available (e.g. a dummy uid and gid may be specified to mount under some circumstances). .PP A filesystem may also fill in fields that the caller didn't ask for if it has values for them available at no extra cost. If this happens, the corresponding bits will be set in .IR stx_mask . .PP .\" Background: inode attributes are modified with i_mutex held, but .\" read by stat() without taking the mutex. .I Note: For performance and simplicity reasons, different fields in the .I statx structure may contain state information from different moments during the execution of the system call. For example, if .IR stx_mode or .IR stx_uid is changed by another process by calling .BR chmod (2) or .BR chown (2), .BR stat () might return the old .I stx_mode together with the new .IR stx_uid , or the old .I stx_uid together with the new .IR stx_mode . .PP Apart from stx_mask (which is described above), the fields in the .I statx structure are: .TP .I stx_mode The file type and mode. This is described in more detail below. .TP .I stx_size The size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symbolic link is the length of the pathname it contains, without a terminating null byte. .TP .I stx_blocks The number of blocks allocated to the file on the medium, in 512-byte units. (This may be smaller than .IR stx_size /512 when the file has holes.) .TP .I stx_blksize The "preferred" blocksize for efficient filesystem I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.) .TP .I stx_nlink The number of hard links on a file. .TP .I stx_uid The user ID of the file's owner. .TP .I stx_gid The ID of the group that may access the file. .TP .IR stx_dev_major " and " stx_dev_minor The device on which this file (inode) resides. .TP .IR stx_rdev_major " and " stx_rdev_minor The device that this file (inode) represents if the file is of block or character device type. .TP .I stx_attributes Further status information about the file. This consists of zero or more of the following constants OR'ed together: .in +4n .TS lB l. STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs .TE .in .TP .I stx_atime The file's last access timestamp. This field is changed by file accesses, for example, by .BR execve (2), .BR mknod (2), .BR pipe (2), .BR utime (2), and .BR read (2) (of more than zero bytes). Other routines, such as .BR mmap (2), may or may not update it. .TP .I stx_btime The file's creation timestamp. This is set on file creation and not changed subsequently. .TP .I stx_ctime The file's last status change timestamp. This field is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.). .TP .I stx_mtime The file's last modification timestamp. This is changed by file modifications, for example, by .BR mknod (2), .BR truncate (2), .BR utime (2), and .BR write (2) (of more than zero bytes). Moreover, the modification time of a directory is changed by the creation or deletion of files in that directory. This field is .I not changed for changes in owner, group, hard link count, or mode. .PP Not all of the Linux filesystems implement all of the timestamp fields. Some filesystems allow mounting in such a way that file and/or directory accesses do not cause an update of the .I stx_atime field. (See .IR noatime , .IR nodiratime , and .I relatime in .BR mount (8), and related information in .BR mount (2).) In addition, .I stx_atime is not updated if a file is opened with the .BR O_NOATIME ; see .BR open (2). .SS File type and mode .PP The .I stx_mode field contains the combined file type and mode. POSIX refers to the bits in this field corresponding to the mask .B S_IFMT (see below) as the .IR "file type" , the 12 bits corresponding to the mask 07777 as the .IR "file mode bits" and the least significant 9 bits (0777) as the .IR "file permission bits" . .IP The following mask values are defined for the file type of the .I stx_mode field: .in +4n .TS lB l l. S_IFMT 0170000 bit mask for the file type bit field S_IFSOCK 0140000 socket S_IFLNK 0120000 symbolic link S_IFREG 0100000 regular file S_IFBLK 0060000 block device S_IFDIR 0040000 directory S_IFCHR 0020000 character device S_IFIFO 0010000 FIFO .TE .in .IP Note that .I stx_mode has two mask flags covering it: one for the type and one for the mode bits. .PP To test for a regular file (for example), one could write: .nf .in +4n statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb); if ((sb.stx_mode & S_IFMT) == S_IFREG) { /* Handle regular file */ } .in .fi .PP Because tests of the above form are common, additional macros are defined by POSIX to allow the test of the file type in .I stx_mode to be written more concisely: .RS 4 .TS lB l. \fBS_ISREG\fR(m) Is it a regular file? \fBS_ISDIR\fR(m) Is it a directory? \fBS_ISCHR\fR(m) Is it a character device? \fBS_ISBLK\fR(m) Is it a block device? \fBS_ISFIFO\fR(m) Is it a FIFO (named pipe)? \fBS_ISLNK\fR(m) Is it a symbolic link? (Not in POSIX.1-1996.) \fBS_ISSOCK\fR(m) Is it a socket? (Not in POSIX.1-1996.) .TE .RE .PP The preceding code snippet could thus be rewritten as: .nf .in +4n statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb); if (S_ISREG(sb.stx_mode)) { /* Handle regular file */ } .in .fi .PP The definitions of most of the above file type test macros are provided if any of the following feature test macros is defined: .BR _BSD_SOURCE (in glibc 2.19 and earlier), .BR _SVID_SOURCE (in glibc 2.19 and earlier), or .BR _DEFAULT_SOURCE (in glibc 2.20 and later). In addition, definitions of all of the above macros except .BR S_IFSOCK and .BR S_ISSOCK () are provided if .BR _XOPEN_SOURCE is defined. The definition of .BR S_IFSOCK can also be exposed by defining .BR _XOPEN_SOURCE with a value of 500 or greater. The definition of .BR S_ISSOCK () is exposed if any of the following feature test macros is defined: .BR _BSD_SOURCE (in glibc 2.19 and earlier), .BR _DEFAULT_SOURCE (in glibc 2.20 and later), .BR _XOPEN_SOURCE with a value of 500 or greater, or .BR _POSIX_C_SOURCE with a value of 200112L or greater. .PP The following mask values are defined for the file mode component of the .I stx_mode field: .in +4n .TS lB l l. S_ISUID 04000 set-user-ID bit S_ISGID 02000 set-group-ID bit (see below) S_ISVTX 01000 sticky bit (see below) S_IRWXU 00700 owner has read, write, and execute permission S_IRUSR 00400 owner has read permission S_IWUSR 00200 owner has write permission S_IXUSR 00100 owner has execute permission S_IRWXG 00070 group has read, write, and execute permission S_IRGRP 00040 group has read permission S_IWGRP 00020 group has write permission S_IXGRP 00010 group has execute permission S_IRWXO 00007 T{ others (not in group) have read, write, and execute permission T} S_IROTH 00004 others have read permission S_IWOTH 00002 others have write permission S_IXOTH 00001 others have execute permission .TE .in .P The set-group-ID bit .RB ( S_ISGID ) has several special uses. For a directory, it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective group ID of the creating process, and directories created there will also get the .B S_ISGID bit set. For a file that does not have the group execution bit .RB ( S_IXGRP ) set, the set-group-ID bit indicates mandatory file/record locking. .P The sticky bit .RB ( S_ISVTX ) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by a privileged process. .SH RETURN VALUE On success, zero is returned. On error, \-1 is returned, and .I errno is set appropriately. .SH ERRORS .TP .B EINVAL Invalid flag specified in .IR flags . .TP .B EACCES Search permission is denied for one of the directories in the path prefix of .IR pathname . (See also .BR path_resolution (7).) .TP .B EBADF .I dirfd is not a valid open file descriptor. .TP .B EFAULT Bad address. .TP .B ELOOP Too many symbolic links encountered while traversing the path. .TP .B ENAMETOOLONG .I pathname is too long. .TP .B ENOENT A component of .I pathname does not exist, or .I pathname is an empty string. .TP .B ENOMEM Out of memory (i.e., kernel memory). .TP .B ENOTDIR A component of the path prefix of .I pathname is not a directory or .I pathname is relative and .I dirfd is a file descriptor referring to a file other than a directory. .SH VERSIONS .BR statx () was added to Linux in kernel 4.11; library support is not yet added to glibc. .SH SEE ALSO .BR ls (1), .BR stat (1), .BR access (2), .BR chmod (2), .BR chown (2), .BR readlink (2), .BR utime (2), .BR capabilities (7), .BR symlink (7)
On Mon, Mar 06, 2017 at 09:01:40PM -0800, Christoph Hellwig wrote: > On Mon, Mar 06, 2017 at 04:06:09PM -0800, Darrick J. Wong wrote: > > statx has the ability to report inode creation times and inode flags, so > > hook up di_crtime and di_flags to that functionality. > > This would be great to have in 4.11 together with the initial statx > implementation. But until I see documentation and testcases for statx > I don't really feel comfortable reviewing anything related to it. Same feeling here -- aside from cargo-culting from the proposed ext4 patch I don't have much to go on other than floating a test balloon to see if Mr. Howells responds. (FWIW the samples/test-statx.c program reported the flags & btime.) --D > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > Christoph Hellwig <hch@infradead.org> wrote: > > > This would be great to have in 4.11 together with the initial statx > > implementation. But until I see documentation and testcases for statx > > I don't really feel comfortable reviewing anything related to it. > > Well, since you asked for documentation, here's a manual page for you to > review:-) HA, ok. This came in while I was scribbling a reply to Christoph's email. I'll have a look. > Note that as it isn't in glibc yet, I've left out all the > set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent > to particular constants. > > I don't suppose you know where the documentation on writing xfstests tests is? > xfstests-dev/doc/ only contains an old and out of date changelog. /me usually just copies one of the newer tests and changes it. <cough> > David > --- > '\" t > .\" Copyright (c) 1992 Drew Eckhardt (drew@cs.colorado.edu), March 28, 1992 > .\" Parts Copyright (c) 1995 Nicolai Langfeldt (janl@ifi.uio.no), 1/1/95 > .\" and Copyright (c) 2006, 2007, 2014 Michael Kerrisk <mtk.manpages@gmail.com> > .\" and Copyright (c) 2017 David Howells <dhowells@redhat.com> > .\" > .\" %%%LICENSE_START(VERBATIM) > .\" Permission is granted to make and distribute verbatim copies of this > .\" manual provided the copyright notice and this permission notice are > .\" preserved on all copies. > .\" > .\" Permission is granted to copy and distribute modified versions of this > .\" manual under the conditions for verbatim copying, provided that the > .\" entire resulting derived work is distributed under the terms of a > .\" permission notice identical to this one. > .\" > .\" Since the Linux kernel and libraries are constantly changing, this > .\" manual page may be incorrect or out-of-date. The author(s) assume no > .\" responsibility for errors or omissions, or for damages resulting from > .\" the use of the information contained herein. The author(s) may not > .\" have taken the same level of care in the production of this manual, > .\" which is licensed free of charge, as they might when working > .\" professionally. > .\" > .\" Formatted or processed versions of this manual, if unaccompanied by > .\" the source, must acknowledge the copyright and authors of this work. > .\" %%%LICENSE_END > .\" > .TH STATX 2 2017-03-07 "Linux" "Linux Programmer's Manual" > .SH NAME > statx \- Get file status (extended) > .SH SYNOPSIS > .nf > .B #include <sys/types.h> > .br > .B #include <sys/stat.h> > .br > .B #include <unistd.h> > .br > .BR "#include <fcntl.h> " "/* Definition of AT_* constants */" > .sp > .BI "int statx(int " dirfd ", const char *" pathname ", int " flags "," > .BI " unsigned int " mask ", struct statx *" buf ); > .fi > .sp > .in -4n > Feature Test Macro Requirements for glibc (see > .BR feature_test_macros (7)): > .in > .ad l > .PD 0 > .sp > .RS 4 > <unknown as yet> > .RE > .PD > .ad > .SH DESCRIPTION > .PP > This function returns information about a file, storing it in the buffer > pointed to by > .IR buf . > The buffer is filled in according to the following type: > .PP > .in +4n > .nf > struct statx { > __u32 stx_mask; -- Mask of bits indicating filled fields > __u32 stx_blksize; -- Block size for filesystem I/O > __u64 stx_attributes; -- Extra file attribute indicators > __u32 stx_nlink; -- Number of hard links > __u32 stx_uid; -- User ID of owner > __u32 stx_gid; -- Group ID of owner > __u16 stx_mode; -- File type and mode > __u64 stx_ino; -- Inode number > __u64 stx_size; -- Total size in bytes > __u64 stx_blocks; -- Number of 512B blocks allocated > struct statx_timestamp stx_atime; -- Time of last access > struct statx_timestamp stx_btime; -- Time of creation > struct statx_timestamp stx_ctime; -- Time of last status change > struct statx_timestamp stx_mtime; -- Time of last modification > __u32 stx_rdev_major; } Device number if device file "of device file" ? > __u32 stx_rdev_minor; } > __u32 stx_dev_major; } Device number of containing file > __u32 stx_dev_minor; } "Device number of device containing file"? Or perhaps just "ID of device containing file" from the stat(2) manpage? > }; > .fi > .in > .PP > Where the timestamps are defined as: > .PP > .in +4n > .nf > struct statx_timestamp { > __s64 tv_sec; -- Number of seconds before or since 1970 > __s32 tv_nsec; -- Number of nanoseconds before or since tv_sec > }; > .fi > .in > .PP > (Note that reserved space and padding is ommitted) > .SS > Invoking \fBstatx\fR(): > .PP > To access a file's status, no permissions are required on the file itself, but > in the case of > .BR statx () > with a path, execute (search) permission is required on all of the directories > in > .I pathname > that lead to the file. > .PP > .BR statx () > uses > .IR pathname ", " dirfd " and " flags > to locate the target file in one of a variety of ways: > .TP > [*] By absolute path. > .I pathname > points to an absolute path and > .I dirfd > is ignored. The file is looked up by name, starting from the root of the > filesystem as seen by the calling process. > .TP > [*] By cwd-relative path. > .I pathname > points to a relative path and > .IR dirfd " is " AT_FDCWD . > The file is looked up by name, starting from the current working directory. > .TP > [*] By dir-relative path. > .I pathname > points to relative path and > .I dirfd > indicates a file descriptor pointing to a directory. The file is looked up by > name, starting from the directory specified by > .IR dirfd . > .TP > [*] By file descriptor. > .IR pathname " is " NULL " and " dirfd > indicates a file descriptor. The file attached to the file descriptor is > queried directly. The file descriptor may point to any type of file, not just > a directory. > .PP > .I flags > can be used to influence a path-based lookup. A value for > .I flags > is constructed by OR'ing together zero or more of the following constants: > .TP > .BR AT_EMPTY_PATH " (since Linux 2.6.39)" > .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d > If > .I pathname > is an empty string, operate on the file referred to by > .IR dirfd > (which may have been obtained using the > .BR open (2) > .B O_PATH > flag). > If > .I dirfd > is > .BR AT_FDCWD , > the call operates on the current working directory. > In this case, > .I dirfd > can refer to any type of file, not just a directory. Is (flags & AT_EMPTY_PATH) is the same as (pathname == NULL)? > This flag is Linux-specific; define > .B _GNU_SOURCE > .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed > to obtain its definition. > .TP > .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)" > Don't automount the terminal ("basename") component of > .I pathname > if it is a directory that is an automount point. > This allows the caller to gather attributes of an automount point > (rather than the location it would mount). > This flag can be used in tools that scan directories > to prevent mass-automounting of a directory of automount points. > The > .B AT_NO_AUTOMOUNT > flag has no effect if the mount point has already been mounted over. > This flag is Linux-specific; define > .B _GNU_SOURCE > .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed > to obtain its definition. > .TP > .B AT_SYMLINK_NOFOLLOW > If > .I pathname > is a symbolic link, do not dereference it: > instead return information about the link itself, like > .BR lstat (). > .PP > .I flags > can also be used to control what sort of synchronisation the kernel will do > when querying a file on a remote filesystem. This is done by OR'ing in one of > the following values: > .TP > AT_STATX_SYNC_AS_STAT > Do whatever > .BR stat () > does. This is the default and is very much filesystem specific. > .TP > AT_STATX_FORCE_SYNC > Force the attributes to be synchronised with the server. This may require that > a network filesystem perform a data writeback to get the timestamps correct. > .TP > AT_STATX_DONT_SYNC > Don't synchronise anything, but rather just take whatever the system has cached > if possible. This may mean that the information returned is approximate, but, > on a network filesystem, it may not involve a round trip to the server - even > if no lease is held. > .PP > The > .I mask > argument to > .BR statx () > is used to tell the kernel which fields the caller is interested in > .I mask > is an OR'ed combination of the following constants: > .PP > .in +4n > .TS > lB l. > STATX_TYPE Want stx_mode & S_IFMT > STATX_MODE Want stx_mode & ~S_IFMT > STATX_NLINK Want stx_nlink > STATX_UID Want stx_uid > STATX_GID Want stx_gid > STATX_ATIME Want stx_atime{,_ns} > STATX_MTIME Want stx_mtime{,_ns} > STATX_CTIME Want stx_ctime{,_ns} > STATX_INO Want stx_ino > STATX_SIZE Want stx_size > STATX_BLOCKS Want stx_blocks > STATX_BASIC_STATS [The stuff in the normal stat struct] > STATX_BTIME Want stx_btime{,_ns} > STATX_ALL [All currently available stuff] > .TE > .in > .PP > .B "Do not" > simply set > .I mask > to UINT_MAX as one or more bits may, in future, be used to specify an extension > to the buffer. > .SS > The returned information > .PP > The status information for the target file is returned in the > .I statx > structure pointed to by > .IR buf . > Included in this is > .I stx_mask > which indicates what other information has been returned. > .I stx_mask > has the same format as the mask argument and bits are set in it to indicate > which fields have been filled in. > .PP > It should be noted that the kernel may return fields that weren't requested and > may fail to return fields that were requested, depending on what the backing > filesystem supports. In either case, > .I stx_mask > will not be equal > .IR mask . > .PP > If a filesystem does not support a field or if it has an unrepresentable value > (for instance, a file with an exotic type), then the mask bit corresponding to > that field will be cleared in > .I stx_mask > even if the user asked for it and a dummy value will be filled in for > compatibility purposes if one is available (e.g. a dummy uid and gid may be > specified to mount under some circumstances). > .PP > A filesystem may also fill in fields that the caller didn't ask for if it has > values for them available at no extra cost. If this happens, the corresponding > bits will be set in > .IR stx_mask . > .PP > > .\" Background: inode attributes are modified with i_mutex held, but > .\" read by stat() without taking the mutex. > .I Note: > For performance and simplicity reasons, different fields in the > .I statx > structure may contain state information from different moments > during the execution of the system call. Hm. Judging from the ext4 patch you proposed, I gather this is expected, at least in the btime case. --D > For example, if > .IR stx_mode > or > .IR stx_uid > is changed by another process by calling > .BR chmod (2) > or > .BR chown (2), > .BR stat () > might return the old > .I stx_mode > together with the new > .IR stx_uid , > or the old > .I stx_uid > together with the new > .IR stx_mode . > .PP > Apart from stx_mask (which is described above), the fields in the > .I statx > structure are: > .TP > .I stx_mode > The file type and mode. This is described in more detail below. > .TP > .I stx_size > The size of the file (if it is a regular file or a symbolic link) in bytes. > The size of a symbolic link is the length of the pathname it contains, without > a terminating null byte. > .TP > .I stx_blocks > The number of blocks allocated to the file on the medium, in 512-byte units. > (This may be smaller than > .IR stx_size /512 > when the file has holes.) > .TP > .I stx_blksize > The "preferred" blocksize for efficient filesystem I/O. (Writing to a file in > smaller chunks may cause an inefficient read-modify-rewrite.) > .TP > .I stx_nlink > The number of hard links on a file. > .TP > .I stx_uid > The user ID of the file's owner. > .TP > .I stx_gid > The ID of the group that may access the file. > .TP > .IR stx_dev_major " and " stx_dev_minor > The device on which this file (inode) resides. > .TP > .IR stx_rdev_major " and " stx_rdev_minor > The device that this file (inode) represents if the file is of block or > character device type. > .TP > .I stx_attributes > Further status information about the file. This consists of zero or more of > the following constants OR'ed together: > .in +4n > .TS > lB l. > STATX_ATTR_COMPRESSED File is compressed by the fs > STATX_ATTR_IMMUTABLE File is marked immutable > STATX_ATTR_APPEND File is append-only > STATX_ATTR_NODUMP File is not to be dumped > STATX_ATTR_ENCRYPTED File requires key to decrypt in fs > .TE > .in > .TP > .I stx_atime > The file's last access timestamp. > This field is changed by file accesses, for example, by > .BR execve (2), > .BR mknod (2), > .BR pipe (2), > .BR utime (2), > and > .BR read (2) > (of more than zero bytes). > Other routines, such as > .BR mmap (2), > may or may not update it. > .TP > .I stx_btime > The file's creation timestamp. This is set on file creation and not changed > subsequently. > .TP > .I stx_ctime > The file's last status change timestamp. This field is changed by writing or > by setting inode information (i.e., owner, group, link count, mode, etc.). > .TP > .I stx_mtime > The file's last modification timestamp. This is changed by file modifications, > for example, by > .BR mknod (2), > .BR truncate (2), > .BR utime (2), > and > .BR write (2) > (of more than zero bytes). Moreover, the modification time of a directory is > changed by the creation or deletion of files in that directory. This field is > .I not > changed for changes in owner, group, hard link count, or mode. > > > > .PP > Not all of the Linux filesystems implement all of the timestamp fields. Some > filesystems allow mounting in such a way that file and/or directory accesses do > not cause an update of the > .I stx_atime > field. > (See > .IR noatime , > .IR nodiratime , > and > .I relatime > in > .BR mount (8), > and related information in > .BR mount (2).) > In addition, > .I stx_atime > is not updated if a file is opened with the > .BR O_NOATIME ; > see > .BR open (2). > > .SS File type and mode > .PP > The > .I stx_mode > field contains the combined file type and mode. POSIX refers to the bits in > this field corresponding to the mask > .B S_IFMT > (see below) as the > .IR "file type" , > the 12 bits corresponding to the mask 07777 as the > .IR "file mode bits" > and the least significant 9 bits (0777) as the > .IR "file permission bits" . > .IP > The following mask values are defined for the file type of the > .I stx_mode > field: > .in +4n > .TS > lB l l. > S_IFMT 0170000 bit mask for the file type bit field > > S_IFSOCK 0140000 socket > S_IFLNK 0120000 symbolic link > S_IFREG 0100000 regular file > S_IFBLK 0060000 block device > S_IFDIR 0040000 directory > S_IFCHR 0020000 character device > S_IFIFO 0010000 FIFO > .TE > .in > .IP > Note that > .I stx_mode > has two mask flags covering it: one for the type and one for the mode bits. > .PP > To test for a regular file (for example), one could write: > .nf > .in +4n > statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb); > if ((sb.stx_mode & S_IFMT) == S_IFREG) { > /* Handle regular file */ > } > .in > .fi > .PP > Because tests of the above form are common, additional macros are defined by > POSIX to allow the test of the file type in > .I stx_mode > to be written more concisely: > .RS 4 > .TS > lB l. > \fBS_ISREG\fR(m) Is it a regular file? > \fBS_ISDIR\fR(m) Is it a directory? > \fBS_ISCHR\fR(m) Is it a character device? > \fBS_ISBLK\fR(m) Is it a block device? > \fBS_ISFIFO\fR(m) Is it a FIFO (named pipe)? > \fBS_ISLNK\fR(m) Is it a symbolic link? (Not in POSIX.1-1996.) > \fBS_ISSOCK\fR(m) Is it a socket? (Not in POSIX.1-1996.) > .TE > .RE > .PP > The preceding code snippet could thus be rewritten as: > > .nf > .in +4n > statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb); > if (S_ISREG(sb.stx_mode)) { > /* Handle regular file */ > } > .in > .fi > .PP > The definitions of most of the above file type test macros > are provided if any of the following feature test macros is defined: > .BR _BSD_SOURCE > (in glibc 2.19 and earlier), > .BR _SVID_SOURCE > (in glibc 2.19 and earlier), > or > .BR _DEFAULT_SOURCE > (in glibc 2.20 and later). > In addition, definitions of all of the above macros except > .BR S_IFSOCK > and > .BR S_ISSOCK () > are provided if > .BR _XOPEN_SOURCE > is defined. > The definition of > .BR S_IFSOCK > can also be exposed by defining > .BR _XOPEN_SOURCE > with a value of 500 or greater. > > The definition of > .BR S_ISSOCK () > is exposed if any of the following feature test macros is defined: > .BR _BSD_SOURCE > (in glibc 2.19 and earlier), > .BR _DEFAULT_SOURCE > (in glibc 2.20 and later), > .BR _XOPEN_SOURCE > with a value of 500 or greater, or > .BR _POSIX_C_SOURCE > with a value of 200112L or greater. > .PP > The following mask values are defined for > the file mode component of the > .I stx_mode > field: > .in +4n > .TS > lB l l. > S_ISUID 04000 set-user-ID bit > S_ISGID 02000 set-group-ID bit (see below) > S_ISVTX 01000 sticky bit (see below) > > S_IRWXU 00700 owner has read, write, and execute permission > S_IRUSR 00400 owner has read permission > S_IWUSR 00200 owner has write permission > S_IXUSR 00100 owner has execute permission > > S_IRWXG 00070 group has read, write, and execute permission > S_IRGRP 00040 group has read permission > S_IWGRP 00020 group has write permission > S_IXGRP 00010 group has execute permission > > S_IRWXO 00007 T{ > others (not in group) have read, write, and execute permission > T} > S_IROTH 00004 others have read permission > S_IWOTH 00002 others have write permission > S_IXOTH 00001 others have execute permission > .TE > .in > .P > The set-group-ID bit > .RB ( S_ISGID ) > has several special uses. > For a directory, it indicates that BSD semantics is to be used > for that directory: files created there inherit their group ID from > the directory, not from the effective group ID of the creating process, > and directories created there will also get the > .B S_ISGID > bit set. > For a file that does not have the group execution bit > .RB ( S_IXGRP ) > set, > the set-group-ID bit indicates mandatory file/record locking. > .P > The sticky bit > .RB ( S_ISVTX ) > on a directory means that a file > in that directory can be renamed or deleted only by the owner > of the file, by the owner of the directory, and by a privileged > process. > > > .SH RETURN VALUE > On success, zero is returned. > On error, \-1 is returned, and > .I errno > is set appropriately. > .SH ERRORS > .TP > .B EINVAL > Invalid flag specified in > .IR flags . > .TP > .B EACCES > Search permission is denied for one of the directories > in the path prefix of > .IR pathname . > (See also > .BR path_resolution (7).) > .TP > .B EBADF > .I dirfd > is not a valid open file descriptor. > .TP > .B EFAULT > Bad address. > .TP > .B ELOOP > Too many symbolic links encountered while traversing the path. > .TP > .B ENAMETOOLONG > .I pathname > is too long. > .TP > .B ENOENT > A component of > .I pathname > does not exist, or > .I pathname > is an empty string. > .TP > .B ENOMEM > Out of memory (i.e., kernel memory). > .TP > .B ENOTDIR > A component of the path prefix of > .I pathname > is not a directory or > .I pathname > is relative and > .I dirfd > is a file descriptor referring to a file other than a directory. > .SH VERSIONS > .BR statx () > was added to Linux in kernel 4.11; > library support is not yet added to glibc. > .SH SEE ALSO > .BR ls (1), > .BR stat (1), > .BR access (2), > .BR chmod (2), > .BR chown (2), > .BR readlink (2), > .BR utime (2), > .BR capabilities (7), > .BR symlink (7)
Darrick J. Wong <darrick.wong@oracle.com> wrote: > Is (flags & AT_EMPTY_PATH) is the same as (pathname == NULL)? I'm not sure. I was wondering about that whilst writing this. The text describing the option is copied from the stat(2) manpage. It specified an empty path (presumably "") rather than NULL. I'm also not sure whether it has other ramifications. > > .I Note: > > For performance and simplicity reasons, different fields in the > > .I statx > > structure may contain state information from different moments > > during the execution of the system call. > > Hm. Judging from the ext4 patch you proposed, I gather this is > expected, at least in the btime case. This is copied from the stat(2) manpage. The comment in the source there says it's because there's no locking in the getattr path against the setattr path. David
On Tue, Mar 07, 2017 at 06:39:42PM +0000, David Howells wrote: > Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > Is (flags & AT_EMPTY_PATH) is the same as (pathname == NULL)? > > I'm not sure. I was wondering about that whilst writing this. The text > describing the option is copied from the stat(2) manpage. It specified an > empty path (presumably "") rather than NULL. I'm also not sure whether it has > other ramifications. <nod> > > > .I Note: > > > For performance and simplicity reasons, different fields in the > > > .I statx > > > structure may contain state information from different moments > > > during the execution of the system call. > > > > Hm. Judging from the ext4 patch you proposed, I gather this is > > expected, at least in the btime case. Ugh, I put that in the wrong part -- this was supposed to be a comment about the part where the manpage states that filesystems can return more than userspace asked for. My eyes are bad at reading manpage source. :/ (Sorry for the noise.) --D > This is copied from the stat(2) manpage. The comment in the source there says > it's because there's no locking in the getattr path against the setattr path. > > David
Darrick J. Wong <darrick.wong@oracle.com> wrote: > > > Hm. Judging from the ext4 patch you proposed, I gather this is > > > expected, at least in the btime case. > > Ugh, I put that in the wrong part -- this was supposed to be a comment > about the part where the manpage states that filesystems can return more > than userspace asked for. In that case, sort of. You could just ask for STATX_TYPE | STATX_MODE, for example, but the kernel is perfectly at liberty to add STATX_MTIME, STATX_SIZE, STATX_INO, etc. even if it doesn't add STATX_BTIME as well. > My eyes are bad at reading manpage source. :/ I know what you mean. David
> On Mar 7, 2017, at 10:22 AM, David Howells <dhowells@redhat.com> wrote: > > Christoph Hellwig <hch@infradead.org> wrote: > >> This would be great to have in 4.11 together with the initial statx >> implementation. But until I see documentation and testcases for statx >> I don't really feel comfortable reviewing anything related to it. > > Well, since you asked for documentation, here's a manual page for you to > review:-) > > Note that as it isn't in glibc yet, I've left out all the > set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent > to particular constants. > > I don't suppose you know where the documentation on writing xfstests tests is? > xfstests-dev/doc/ only contains an old and out of date changelog. > > David > --- > '\" t > .\" Copyright (c) 1992 Drew Eckhardt (drew@cs.colorado.edu), March 28, 1992 > .\" Parts Copyright (c) 1995 Nicolai Langfeldt (janl@ifi.uio.no), 1/1/95 > .\" and Copyright (c) 2006, 2007, 2014 Michael Kerrisk <mtk.manpages@gmail.com> > .\" and Copyright (c) 2017 David Howells <dhowells@redhat.com> > .\" > .\" %%%LICENSE_START(VERBATIM) > .\" Permission is granted to make and distribute verbatim copies of this > .\" manual provided the copyright notice and this permission notice are > .\" preserved on all copies. > .\" > .\" Permission is granted to copy and distribute modified versions of this > .\" manual under the conditions for verbatim copying, provided that the > .\" entire resulting derived work is distributed under the terms of a > .\" permission notice identical to this one. > .\" > .\" Since the Linux kernel and libraries are constantly changing, this > .\" manual page may be incorrect or out-of-date. The author(s) assume no > .\" responsibility for errors or omissions, or for damages resulting from > .\" the use of the information contained herein. The author(s) may not > .\" have taken the same level of care in the production of this manual, > .\" which is licensed free of charge, as they might when working > .\" professionally. > .\" > .\" Formatted or processed versions of this manual, if unaccompanied by > .\" the source, must acknowledge the copyright and authors of this work. > .\" %%%LICENSE_END > .\" > .TH STATX 2 2017-03-07 "Linux" "Linux Programmer's Manual" > .SH NAME > statx \- Get file status (extended) > .SH SYNOPSIS > .nf > .B #include <sys/types.h> > .br > .B #include <sys/stat.h> > .br > .B #include <unistd.h> > .br > .BR "#include <fcntl.h> " "/* Definition of AT_* constants */" > .sp > .BI "int statx(int " dirfd ", const char *" pathname ", int " flags "," > .BI " unsigned int " mask ", struct statx *" buf ); > .fi > .sp > .in -4n > Feature Test Macro Requirements for glibc (see > .BR feature_test_macros (7)): > .in > .ad l > .PD 0 > .sp > .RS 4 > <unknown as yet> > .RE > .PD > .ad > .SH DESCRIPTION > .PP > This function returns information about a file, storing it in the buffer > pointed to by > .IR buf . > The buffer is filled in according to the following type: > .PP > .in +4n > .nf > struct statx { > __u32 stx_mask; -- Mask of bits indicating filled fields > __u32 stx_blksize; -- Block size for filesystem I/O > __u64 stx_attributes; -- Extra file attribute indicators > __u32 stx_nlink; -- Number of hard links > __u32 stx_uid; -- User ID of owner > __u32 stx_gid; -- Group ID of owner > __u16 stx_mode; -- File type and mode > __u64 stx_ino; -- Inode number > __u64 stx_size; -- Total size in bytes > __u64 stx_blocks; -- Number of 512B blocks allocated > struct statx_timestamp stx_atime; -- Time of last access > struct statx_timestamp stx_btime; -- Time of creation > struct statx_timestamp stx_ctime; -- Time of last status change > struct statx_timestamp stx_mtime; -- Time of last modification > __u32 stx_rdev_major; } Device number if device file > __u32 stx_rdev_minor; } > __u32 stx_dev_major; } Device number of containing file > __u32 stx_dev_minor; } > }; > .fi > .in > .PP > Where the timestamps are defined as: > .PP > .in +4n > .nf > struct statx_timestamp { > __s64 tv_sec; -- Number of seconds before or since 1970 > __s32 tv_nsec; -- Number of nanoseconds before or since tv_sec > }; > .fi > .in > .PP > (Note that reserved space and padding is ommitted) Do you think that not including the padding could be problematic for users? > .SS > Invoking \fBstatx\fR(): > .PP > To access a file's status, no permissions are required on the file itself, but > in the case of > .BR statx () > with a path, execute (search) permission is required on all of the directories > in > .I pathname > that lead to the file. > .PP > .BR statx () > uses > .IR pathname ", " dirfd " and " flags > to locate the target file in one of a variety of ways: > .TP > [*] By absolute path. > .I pathname > points to an absolute path and > .I dirfd > is ignored. The file is looked up by name, starting from the root of the > filesystem as seen by the calling process. > .TP > [*] By cwd-relative path. > .I pathname > points to a relative path and > .IR dirfd " is " AT_FDCWD . > The file is looked up by name, starting from the current working directory. > .TP > [*] By dir-relative path. > .I pathname > points to relative path and > .I dirfd > indicates a file descriptor pointing to a directory. The file is looked up by > name, starting from the directory specified by > .IR dirfd . > .TP > [*] By file descriptor. > .IR pathname " is " NULL " and " dirfd > indicates a file descriptor. The file attached to the file descriptor is > queried directly. The file descriptor may point to any type of file, not just > a directory. > .PP > .I flags > can be used to influence a path-based lookup. A value for > .I flags > is constructed by OR'ing together zero or more of the following constants: > .TP > .BR AT_EMPTY_PATH " (since Linux 2.6.39)" > .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d > If > .I pathname > is an empty string, operate on the file referred to by > .IR dirfd > (which may have been obtained using the > .BR open (2) > .B O_PATH > flag). > If > .I dirfd > is > .BR AT_FDCWD , > the call operates on the current working directory. > In this case, > .I dirfd > can refer to any type of file, not just a directory. > This flag is Linux-specific; define > .B _GNU_SOURCE > .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed > to obtain its definition. > .TP > .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)" > Don't automount the terminal ("basename") component of > .I pathname > if it is a directory that is an automount point. > This allows the caller to gather attributes of an automount point > (rather than the location it would mount). > This flag can be used in tools that scan directories > to prevent mass-automounting of a directory of automount points. > The > .B AT_NO_AUTOMOUNT > flag has no effect if the mount point has already been mounted over. > This flag is Linux-specific; define > .B _GNU_SOURCE > .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed > to obtain its definition. > .TP > .B AT_SYMLINK_NOFOLLOW > If > .I pathname > is a symbolic link, do not dereference it: > instead return information about the link itself, like > .BR lstat (). > .PP > .I flags > can also be used to control what sort of synchronisation the kernel will do > when querying a file on a remote filesystem. This is done by OR'ing in one of > the following values: > .TP > AT_STATX_SYNC_AS_STAT > Do whatever > .BR stat () > does. This is the default and is very much filesystem specific. > .TP > AT_STATX_FORCE_SYNC > Force the attributes to be synchronised with the server. This may require that > a network filesystem perform a data writeback to get the timestamps correct. > .TP > AT_STATX_DONT_SYNC > Don't synchronise anything, but rather just take whatever the system has cached > if possible. This may mean that the information returned is approximate, but, > on a network filesystem, it may not involve a round trip to the server - even > if no lease is held. > .PP > The > .I mask > argument to > .BR statx () > is used to tell the kernel which fields the caller is interested in > .I mask > is an OR'ed combination of the following constants: > .PP > .in +4n > .TS > lB l. > STATX_TYPE Want stx_mode & S_IFMT > STATX_MODE Want stx_mode & ~S_IFMT > STATX_NLINK Want stx_nlink > STATX_UID Want stx_uid > STATX_GID Want stx_gid > STATX_ATIME Want stx_atime{,_ns} > STATX_MTIME Want stx_mtime{,_ns} > STATX_CTIME Want stx_ctime{,_ns} > STATX_INO Want stx_ino > STATX_SIZE Want stx_size > STATX_BLOCKS Want stx_blocks > STATX_BASIC_STATS [The stuff in the normal stat struct] > STATX_BTIME Want stx_btime{,_ns} > STATX_ALL [All currently available stuff] > .TE > .in > .PP > .B "Do not" > simply set > .I mask > to UINT_MAX as one or more bits may, in future, be used to specify an extension > to the buffer. > .SS > The returned information > .PP > The status information for the target file is returned in the > .I statx > structure pointed to by > .IR buf . > Included in this is > .I stx_mask > which indicates what other information has been returned. > .I stx_mask > has the same format as the mask argument and bits are set in it to indicate > which fields have been filled in. > .PP > It should be noted that the kernel may return fields that weren't requested and > may fail to return fields that were requested, depending on what the backing > filesystem supports. In either case, > .I stx_mask > will not be equal > .IR mask . > .PP > If a filesystem does not support a field or if it has an unrepresentable value > (for instance, a file with an exotic type), then the mask bit corresponding to > that field will be cleared in > .I stx_mask > even if the user asked for it and a dummy value will be filled in for > compatibility purposes if one is available (e.g. a dummy uid and gid may be > specified to mount under some circumstances). > .PP > A filesystem may also fill in fields that the caller didn't ask for if it has > values for them available at no extra cost. If this happens, the corresponding > bits will be set in > .IR stx_mask . > .PP > > .\" Background: inode attributes are modified with i_mutex held, but > .\" read by stat() without taking the mutex. > .I Note: > For performance and simplicity reasons, different fields in the > .I statx > structure may contain state information from different moments > during the execution of the system call. > For example, if > .IR stx_mode > or > .IR stx_uid > is changed by another process by calling > .BR chmod (2) > or > .BR chown (2), > .BR stat () > might return the old > .I stx_mode > together with the new > .IR stx_uid , > or the old > .I stx_uid > together with the new > .IR stx_mode . > .PP > Apart from stx_mask (which is described above), the fields in the > .I statx > structure are: > .TP > .I stx_mode > The file type and mode. This is described in more detail below. > .TP > .I stx_size > The size of the file (if it is a regular file or a symbolic link) in bytes. > The size of a symbolic link is the length of the pathname it contains, without > a terminating null byte. > .TP > .I stx_blocks > The number of blocks allocated to the file on the medium, in 512-byte units. > (This may be smaller than > .IR stx_size /512 > when the file has holes.) > .TP > .I stx_blksize > The "preferred" blocksize for efficient filesystem I/O. (Writing to a file in > smaller chunks may cause an inefficient read-modify-rewrite.) > .TP > .I stx_nlink > The number of hard links on a file. > .TP > .I stx_uid > The user ID of the file's owner. > .TP > .I stx_gid > The ID of the group that may access the file. > .TP > .IR stx_dev_major " and " stx_dev_minor > The device on which this file (inode) resides. > .TP > .IR stx_rdev_major " and " stx_rdev_minor > The device that this file (inode) represents if the file is of block or > character device type. > .TP > .I stx_attributes > Further status information about the file. This consists of zero or more of > the following constants OR'ed together: > .in +4n > .TS > lB l. > STATX_ATTR_COMPRESSED File is compressed by the fs > STATX_ATTR_IMMUTABLE File is marked immutable This definition is circular... Maybe "File cannot be changed once closed"? > STATX_ATTR_APPEND File is append-only This is also circular. Maybe "file can only be written after .B stx_size"? > STATX_ATTR_NODUMP File is not to be dumped This one is also circular. Maybe "not to be included in backups"? > STATX_ATTR_ENCRYPTED File requires key to decrypt in fs > .TE > .in > .TP > .I stx_atime > The file's last access timestamp. > This field is changed by file accesses, for example, by > .BR execve (2), > .BR mknod (2), > .BR pipe (2), > .BR utime (2), > and > .BR read (2) > (of more than zero bytes). > Other routines, such as > .BR mmap (2), > may or may not update it. > .TP > .I stx_btime > The file's creation timestamp. This is set on file creation and not changed > subsequently. > .TP > .I stx_ctime > The file's last status change timestamp. This field is changed by writing or > by setting inode information (i.e., owner, group, link count, mode, etc.). > .TP > .I stx_mtime > The file's last modification timestamp. This is changed by file modifications, > for example, by > .BR mknod (2), > .BR truncate (2), > .BR utime (2), > and > .BR write (2) > (of more than zero bytes). Moreover, the modification time of a directory is > changed by the creation or deletion of files in that directory. This field is > .I not > changed for changes in owner, group, hard link count, or mode. > > > > .PP > Not all of the Linux filesystems implement all of the timestamp fields. Some > filesystems allow mounting in such a way that file and/or directory accesses do > not cause an update of the > .I stx_atime > field. > (See > .IR noatime , > .IR nodiratime , > and > .I relatime > in > .BR mount (8), > and related information in > .BR mount (2).) > In addition, > .I stx_atime > is not updated if a file is opened with the > .BR O_NOATIME ; > see > .BR open (2). > > .SS File type and mode > .PP > The > .I stx_mode > field contains the combined file type and mode. POSIX refers to the bits in > this field corresponding to the mask > .B S_IFMT > (see below) as the > .IR "file type" , > the 12 bits corresponding to the mask 07777 as the > .IR "file mode bits" > and the least significant 9 bits (0777) as the > .IR "file permission bits" . > .IP > The following mask values are defined for the file type of the > .I stx_mode > field: > .in +4n > .TS > lB l l. > S_IFMT 0170000 bit mask for the file type bit field > > S_IFSOCK 0140000 socket > S_IFLNK 0120000 symbolic link > S_IFREG 0100000 regular file > S_IFBLK 0060000 block device > S_IFDIR 0040000 directory > S_IFCHR 0020000 character device > S_IFIFO 0010000 FIFO > .TE > .in > .IP > Note that > .I stx_mode > has two mask flags covering it: one for the type and one for the mode bits. > .PP > To test for a regular file (for example), one could write: > .nf > .in +4n > statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb); > if ((sb.stx_mode & S_IFMT) == S_IFREG) { > /* Handle regular file */ > } Two notes on these two examples: - users unfamiliar with statx() may benefit from seeing STATX_TYPE used here instead of STATX_BASIC_STATS - the check should also look for the presence of STATX_TYPE in the returned stx_mask to ensure it is valid before use? To test whether a path is a regular file (for example), one could write: .nf .in +4n rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx); if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) { /* Handle regular file */ } > Because tests of the above form are common, additional macros are defined by > POSIX to allow the test of the file type in > .I stx_mode > to be written more concisely: Should this all just reference the existing stat(2) man page instead of duplicating the whole contents here? This is spending a lot of space discussing the stx_mode field which could be avoided. > .RS 4 > .TS > lB l. > \fBS_ISREG\fR(m) Is it a regular file? > \fBS_ISDIR\fR(m) Is it a directory? > \fBS_ISCHR\fR(m) Is it a character device? > \fBS_ISBLK\fR(m) Is it a block device? > \fBS_ISFIFO\fR(m) Is it a FIFO (named pipe)? > \fBS_ISLNK\fR(m) Is it a symbolic link? (Not in POSIX.1-1996.) > \fBS_ISSOCK\fR(m) Is it a socket? (Not in POSIX.1-1996.) > .TE > .RE > .PP > The preceding code snippet could thus be rewritten as: > > .nf > .in +4n > statx(AT_FDCWD, pathname, 0, STATX_BASIC_STATS, &sb); > if (S_ISREG(sb.stx_mode)) { > /* Handle regular file */ > } > .in > .fi > .PP > The definitions of most of the above file type test macros > are provided if any of the following feature test macros is defined: > .BR _BSD_SOURCE > (in glibc 2.19 and earlier), > .BR _SVID_SOURCE > (in glibc 2.19 and earlier), > or > .BR _DEFAULT_SOURCE > (in glibc 2.20 and later). > In addition, definitions of all of the above macros except > .BR S_IFSOCK > and > .BR S_ISSOCK () > are provided if > .BR _XOPEN_SOURCE > is defined. > The definition of > .BR S_IFSOCK > can also be exposed by defining > .BR _XOPEN_SOURCE > with a value of 500 or greater. > > The definition of > .BR S_ISSOCK () > is exposed if any of the following feature test macros is defined: > .BR _BSD_SOURCE > (in glibc 2.19 and earlier), > .BR _DEFAULT_SOURCE > (in glibc 2.20 and later), > .BR _XOPEN_SOURCE > with a value of 500 or greater, or > .BR _POSIX_C_SOURCE > with a value of 200112L or greater. > .PP > The following mask values are defined for > the file mode component of the > .I stx_mode > field: > .in +4n > .TS > lB l l. > S_ISUID 04000 set-user-ID bit > S_ISGID 02000 set-group-ID bit (see below) > S_ISVTX 01000 sticky bit (see below) > > S_IRWXU 00700 owner has read, write, and execute permission > S_IRUSR 00400 owner has read permission > S_IWUSR 00200 owner has write permission > S_IXUSR 00100 owner has execute permission > > S_IRWXG 00070 group has read, write, and execute permission > S_IRGRP 00040 group has read permission > S_IWGRP 00020 group has write permission > S_IXGRP 00010 group has execute permission > > S_IRWXO 00007 T{ > others (not in group) have read, write, and execute permission > T} > S_IROTH 00004 others have read permission > S_IWOTH 00002 others have write permission > S_IXOTH 00001 others have execute permission > .TE > .in > .P > The set-group-ID bit > .RB ( S_ISGID ) > has several special uses. > For a directory, it indicates that BSD semantics is to be used > for that directory: files created there inherit their group ID from > the directory, not from the effective group ID of the creating process, > and directories created there will also get the > .B S_ISGID > bit set. > For a file that does not have the group execution bit > .RB ( S_IXGRP ) > set, > the set-group-ID bit indicates mandatory file/record locking. > .P > The sticky bit > .RB ( S_ISVTX ) > on a directory means that a file > in that directory can be renamed or deleted only by the owner > of the file, by the owner of the directory, and by a privileged > process. > > > .SH RETURN VALUE > On success, zero is returned. > On error, \-1 is returned, and > .I errno > is set appropriately. > .SH ERRORS > .TP > .B EINVAL > Invalid flag specified in > .IR flags . > .TP > .B EACCES > Search permission is denied for one of the directories > in the path prefix of > .IR pathname . > (See also > .BR path_resolution (7).) > .TP > .B EBADF > .I dirfd > is not a valid open file descriptor. > .TP > .B EFAULT > Bad address. > .TP > .B ELOOP > Too many symbolic links encountered while traversing the path. > .TP > .B ENAMETOOLONG > .I pathname > is too long. > .TP > .B ENOENT > A component of > .I pathname > does not exist, or > .I pathname > is an empty string. > .TP > .B ENOMEM > Out of memory (i.e., kernel memory). > .TP > .B ENOTDIR > A component of the path prefix of > .I pathname > is not a directory or > .I pathname > is relative and > .I dirfd > is a file descriptor referring to a file other than a directory. > .SH VERSIONS > .BR statx () > was added to Linux in kernel 4.11; > library support is not yet added to glibc. > .SH SEE ALSO > .BR ls (1), > .BR stat (1), Maybe stat(2) ? :-) > .BR access (2), > .BR chmod (2), > .BR chown (2), > .BR readlink (2), > .BR utime (2), > .BR capabilities (7), > .BR symlink (7) Cheers, Andreas
On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > .fi > .in > .PP > (Note that reserved space and padding is ommitted) ommitted => omitted > .TP > .BR AT_EMPTY_PATH " (since Linux 2.6.39)" ... > .TP > .BR AT_NO_AUTOMOUNT " (since Linux 2.6.38)" Given that statx is just being added now in 4.11, saying "since Linux 2.6.39" and "since Linux 2.6.38" for these flags seems unneeded. > The > .I mask > argument to > .BR statx () > is used to tell the kernel which fields the caller is interested in > .I mask > is an OR'ed combination of the following constants: Missing a period between "interested in" and "mask". > .B "Do not" > simply set > .I mask > to UINT_MAX as one or more bits may, in future, be used to specify an extension > to the buffer. "in future" => "in the future". > .I Note: > For performance and simplicity reasons, different fields in the > .I statx > structure may contain state information from different moments > during the execution of the system call. > For example, if > .IR stx_mode > or > .IR stx_uid > is changed by another process by calling > .BR chmod (2) > or > .BR chown (2), > .BR stat () > might return the old > .I stx_mode > together with the new > .IR stx_uid , > or the old > .I stx_uid > together with the new > .IR stx_mode . This example is confusing because chmod() doesn't change st_uid, and chown() doesn't ordinarily change st_mode. A better example would be that if chown() is used to change both the owner and group, it's possible that statx() would return the new owner paired with the old group, or the new group paired with the old owner. Or, the new owner and group could be observed before the new ctime. Also, it seems this note belongs in a BUGS section of the man page, since it is a bug. > .SH ERRORS > .TP > .B EINVAL > Invalid flag specified in > .IR flags . Actually, the behavior when specifying invalid flags differs depending on whether a path is specified. If a path is specified, then statx() fails with EINVAL if unrecognized flags are specified. However, the behavior differs when no path is specified. For example, the the following call with invalid flags succeeds: statx(fd, NULL, 0xFFFF0000, 0, &stbuf); I think this is a bug, and it should be fixed by applying the same 'flags' validation regardless of whether a path is specified. > .B ENOENT > A component of > .I pathname > does not exist, or > .I pathname > is an empty string. is an empty string and AT_EMPTY_PATH was not specified in flags. > .SH SEE ALSO > .BR ls (1), > .BR stat (1), Shouldn't this link to stat (2), not stat (1)? - Eric
On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > Christoph Hellwig <hch@infradead.org> wrote: > > > This would be great to have in 4.11 together with the initial statx > > implementation. But until I see documentation and testcases for statx > > I don't really feel comfortable reviewing anything related to it. > > Well, since you asked for documentation, here's a manual page for you to > review:-) > > Note that as it isn't in glibc yet, I've left out all the > set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent > to particular constants. > > I don't suppose you know where the documentation on writing xfstests tests is? > xfstests-dev/doc/ only contains an old and out of date changelog. There's a "ADDING TO THE FSQA SUITE" section in README file, basically it's telling you to use './new' script to generate new test template. So based on what type of new test you're going to create, you could run: ./new generic # new generic test ./new xfs # new xfs test ./new ext4 # new ext4 test ./new shared # new shared test ... It finds the first free seq number and creates test template and a default .out file in tests/<type> dir and add a new entry in tests/<type>/group file. You could just edit the template and update the .out file with expected output from the test run. E.g. currently 416 is the first free seq number in generic tests, ./new generic will create ./tests/generic/416 ./tests/generic/416.out I think statx test should be a new generic test. Thanks, Eryu
Andreas Dilger <adilger@dilger.ca> wrote: > > (Note that reserved space and padding is ommitted) > > Do you think that not including the padding could be problematic for users? Interesting question. Do you think it would be? I don't see why it should be a problem, since they shouldn't be touching it anyway. Also, the stat(2) manpage also gives the same warning and omits padding and doesn't seem to have been amended. But if you would prefer, I can put it back. The only reason I can really see for explicitly including it is to say that this is where future extensions will go. > > STATX_ATTR_IMMUTABLE File is marked immutable > ... > > STATX_ATTR_APPEND File is append-only I should copy the definitions in chattr(1). > To test whether a path is a regular file (for example), one could write: > .nf > .in +4n > rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx); > if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) { > /* Handle regular file */ > } Good idea. > > Because tests of the above form are common, additional macros are defined by > > POSIX to allow the test of the file type in > > .I stx_mode > > to be written more concisely: > > Should this all just reference the existing stat(2) man page instead of > duplicating the whole contents here? This is spending a lot of space > discussing the stx_mode field which could be avoided. Possibly. On the other hand, it means that everything you need to refer to is in one page from the user's PoV. I'm not sure what the best policy on this is. If I do defer to stat(2), I do need to make some sort of note that the examples are different and the field name is stx_mask, not st_mask. David
Eric Biggers <ebiggers3@gmail.com> wrote: > > .I Note: > > For performance and simplicity reasons, different fields in the > > .I statx > > structure may contain state information from different moments > > during the execution of the system call. > > For example, if > > .IR stx_mode > > or > > .IR stx_uid > > is changed by another process by calling > > .BR chmod (2) > > or > > .BR chown (2), > > .BR stat () > > might return the old > > .I stx_mode > > together with the new > > .IR stx_uid , > > or the old > > .I stx_uid > > together with the new > > .IR stx_mode . > > This example is confusing because chmod() doesn't change st_uid, and chown() > doesn't ordinarily change st_mode. Doesn't chown() clear the S_ISUID and S_ISGID flags? > A better example would be that if chown() is used to change both the owner > and group, it's possible that statx() would return the new owner paired with > the old group, or the new group paired with the old owner. Or, the new > owner and group could be observed before the new ctime. This is copied more or less verbatim from stat(2), so what you said applies there too. > Also, it seems this note belongs in a BUGS section of the man page, since it > is a bug. No, it's not a bug, exactly. It's a design decision. It might be worth adding an AT_STAT_LOCK flag to synchronise against attribute setting, but it will cost you something in terms of performance - and even then, over a network filesystem it might not achieve anything. > > .SH ERRORS > > .TP > > .B EINVAL > > Invalid flag specified in > > .IR flags . > > Actually, the behavior when specifying invalid flags differs depending on > whether a path is specified. Good point. Should I reject AT_EMPTY_PATH, AT_NO_AUTOMOUNT and AT_SYMLINK_NOFOLLOW if pathname is NULL, I wonder? > > .B ENOENT > > A component of > > .I pathname > > does not exist, or > > .I pathname > > is an empty string. > > is an empty string and AT_EMPTY_PATH was not specified in flags. That needs fixing in stat(2) also. > > .SH SEE ALSO > > .BR ls (1), > > .BR stat (1), > > Shouldn't this link to stat (2), not stat (1)? Depends whether stat(1) ends up using statx() or not. I can take it out for now. David
On Wed, 2017-03-08 at 09:24 +0000, David Howells wrote: > Andreas Dilger <adilger@dilger.ca> wrote: > > > > (Note that reserved space and padding is ommitted) > > > > Do you think that not including the padding could be problematic for users? > > Interesting question. Do you think it would be? I don't see why it should be > a problem, since they shouldn't be touching it anyway. Also, the stat(2) > manpage also gives the same warning and omits padding and doesn't seem to have > been amended. > > But if you would prefer, I can put it back. > > The only reason I can really see for explicitly including it is to say that > this is where future extensions will go. > It seems like we really ought to have 2 statx manpages: A small statx(2) manpage that describes the kernel<->userland interface (showing structs with padding), and then the "real" statx(3) manpage that shows the glibc userland API. The kernel<->userland one could be very small, and just say "This is not the function you're interested in. Look at statx(3) for the C library interface.", similar to what readdir(2) does. > > > STATX_ATTR_IMMUTABLE File is marked immutable > > > > ... > > > STATX_ATTR_APPEND File is append-only > > I should copy the definitions in chattr(1). > > > To test whether a path is a regular file (for example), one could write: > > .nf > > .in +4n > > rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx); > > if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) { > > /* Handle regular file */ > > } > > Good idea. Note that & and && have similar precedence, IIRC, so you probably want to wrap that flag check in parenthesis. > > > > Because tests of the above form are common, additional macros are defined by > > > POSIX to allow the test of the file type in > > > .I stx_mode > > > to be written more concisely: > > > > Should this all just reference the existing stat(2) man page instead of > > duplicating the whole contents here? This is spending a lot of space > > discussing the stx_mode field which could be avoided. > > Possibly. On the other hand, it means that everything you need to refer to is > in one page from the user's PoV. I'm not sure what the best policy on this > is. > > If I do defer to stat(2), I do need to make some sort of note that the > examples are different and the field name is stx_mask, not st_mask. > If the plan is to eventually (in some far away future) to deprecate stat(), then we probably don't want this manpage to refer to it as a canonical source of information. I say duplicate it here. It might also not hurt to consider updating the stat(2) manpage with a NOTE that points to the statx manpage as well.
Eryu Guan <eguan@redhat.com> wrote: > There's a "ADDING TO THE FSQA SUITE" section in README file, basically > it's telling you to use './new' script to generate new test template. So > based on what type of new test you're going to create, you could run: > > ./new generic # new generic test > ./new xfs # new xfs test > ./new ext4 # new ext4 test > ./new shared # new shared test > ... > > It finds the first free seq number and creates test template and a > default .out file in tests/<type> dir and add a new entry in > tests/<type>/group file. You could just edit the template and update > the .out file with expected output from the test run. Yes, I'd spotted the ./new script, but it's not as simple as that since shell scripts can't call system calls directly. David
On Thu, Mar 09, 2017 at 06:46:48AM +0000, David Howells wrote: > Eryu Guan <eguan@redhat.com> wrote: > > > There's a "ADDING TO THE FSQA SUITE" section in README file, basically > > it's telling you to use './new' script to generate new test template. So > > based on what type of new test you're going to create, you could run: > > > > ./new generic # new generic test > > ./new xfs # new xfs test > > ./new ext4 # new ext4 test > > ./new shared # new shared test > > ... > > > > It finds the first free seq number and creates test template and a > > default .out file in tests/<type> dir and add a new entry in > > tests/<type>/group file. You could just edit the template and update > > the .out file with expected output from the test run. > > Yes, I'd spotted the ./new script, but it's not as simple as that since shell > scripts can't call system calls directly. A normal way we take for fstests is adding new syscall support to xfs_io first, then writing new tests based on it, as some other new syscalls/utilities do, e.g. set|get_encpolicy were added to xfs_io in January to set|get encryption policy for ext4 encryption testing. Thanks, Eryu
On Thu, Mar 09, 2017 at 06:46:48AM +0000, David Howells wrote: > Eryu Guan <eguan@redhat.com> wrote: > > > There's a "ADDING TO THE FSQA SUITE" section in README file, basically > > it's telling you to use './new' script to generate new test template. So > > based on what type of new test you're going to create, you could run: > > > > ./new generic # new generic test > > ./new xfs # new xfs test > > ./new ext4 # new ext4 test > > ./new shared # new shared test > > ... > > > > It finds the first free seq number and creates test template and a > > default .out file in tests/<type> dir and add a new entry in > > tests/<type>/group file. You could just edit the template and update > > the .out file with expected output from the test run. > > Yes, I'd spotted the ./new script, but it's not as simple as that since shell > scripts can't call system calls directly. Probably best to place a program in src/ so that xfstests can make the raw syscalls early while we wait for glibc to invent a wrapper. --D > > David
Eryu Guan <eguan@redhat.com> wrote: > A normal way we take for fstests is adding new syscall support to xfs_io > first, then writing new tests based on it, as some other new > syscalls/utilities do, e.g. set|get_encpolicy were added to xfs_io > in January to set|get encryption policy for ext4 encryption testing. Please add this to the docs. Thanks, David
On Wed, Mar 08, 2017 at 10:59:49PM -0800, Darrick J. Wong wrote: > Probably best to place a program in src/ so that xfstests can make the > raw syscalls early while we wait for glibc to invent a wrapper. Іn general don't invent new programs if we can avoid it and add a statx command to xfs_io, that all the tests can be built around.
On Wed, Mar 08, 2017 at 09:41:00AM +0000, David Howells wrote: > > > > This example is confusing because chmod() doesn't change st_uid, and chown() > > doesn't ordinarily change st_mode. > > Doesn't chown() clear the S_ISUID and S_ISGID flags? > It can, but it's not a straightforward example. > > No, it's not a bug, exactly. It's a design decision. It might be worth > adding an AT_STAT_LOCK flag to synchronise against attribute setting, but it > will cost you something in terms of performance - and even then, over a > network filesystem it might not achieve anything. > Well, regardless of whether people want to fix it or not, I do think it's a bug. Currently even the update of a timestamp is non-atomic, since that involves updating both tv_sec and tv_nsec. Therefore, stat() might return the new tv_sec along with the old tv_nsec, or vice versa. This can cause some very strange behavior, such as two successively observed timestamps going backwards in time. I expect that historically people haven't complained enough for this to be worthwhile to fix. But in theory I think it could be fixed with little performance loss by protecting all the stat data with a sequence count, similar to how reads of i_size are made atomic on 32-bit platforms by using i_size_seqcount. > > Good point. Should I reject AT_EMPTY_PATH, AT_NO_AUTOMOUNT and > AT_SYMLINK_NOFOLLOW if pathname is NULL, I wonder? > It could be argued either way, but I was thinking those particular flags should just be ignored, as they have known meanings but simply aren't relevant. It could also cause confusion for statx(dfd, "", AT_EMPTY_PATH, 0, buffer) to succeed but for statx(dfd, NULL, AT_EMPTY_PATH, 0, buffer) to fail (for example). - Eric
On Wed, Mar 08, 2017 at 10:26:14AM -0500, Jeff Layton wrote: > On Wed, 2017-03-08 at 09:24 +0000, David Howells wrote: > > Andreas Dilger <adilger@dilger.ca> wrote: > > > To test whether a path is a regular file (for example), one could write: > > > .nf > > > .in +4n > > > rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx); > > > if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) { > > > /* Handle regular file */ > > > } > > > > Good idea. > > Note that & and && have similar precedence, IIRC, so you probably want > to wrap that flag check in parenthesis. They're actually pretty far apart. & is priority 8 and && is priority 11. But the fact that you thought they were similar shows that most of us don't really know what C's operator precedence is (beyond + and *) and we should err on the side of clarity by using the unnecessary parens. > If the plan is to eventually (in some far away future) to deprecate > stat(), then we probably don't want this manpage to refer to it as a > canonical source of information. I say duplicate it here. We haven't even deprecated select() in favour of poll() ... indeed, pselect() was added. I can't imagine stat() ever being deprecated.
On Mon, 2017-03-20 at 09:01 -0700, Matthew Wilcox wrote: > On Wed, Mar 08, 2017 at 10:26:14AM -0500, Jeff Layton wrote: > > On Wed, 2017-03-08 at 09:24 +0000, David Howells wrote: > > > Andreas Dilger <adilger@dilger.ca> wrote: > > > > To test whether a path is a regular file (for example), one could write: > > > > .nf > > > > .in +4n > > > > rc = statx(AT_FDCWD, pathname, 0, STATX_TYPE, &stx); > > > > if (stx.stx_mask & STATX_TYPE && S_ISREG(sb.stx_mode)) { > > > > /* Handle regular file */ > > > > } > > > > > > Good idea. > > > > Note that & and && have similar precedence, IIRC, so you probably want > > to wrap that flag check in parenthesis. > > They're actually pretty far apart. & is priority 8 and && is priority 11. > But the fact that you thought they were similar shows that most of > us don't really know what C's operator precedence is (beyond + and *) > and we should err on the side of clarity by using the unnecessary parens. > Ahh, good to know. My rules are generally: multiplication and division come before addition and subtraction, and put parenthesis around everything else. :) > > If the plan is to eventually (in some far away future) to deprecate > > stat(), then we probably don't want this manpage to refer to it as a > > canonical source of information. I say duplicate it here. > > We haven't even deprecated select() in favour of poll() ... indeed, > pselect() was added. I can't imagine stat() ever being deprecated. Ok, deprecated is probably the wrong choice of words. You're correct that we'll never be able to fully rip it out just due to the volume of old programs that use it. My thinking is that statx will come to supplant stat in newer code, and stat will eventually become a "legacy" syscall. Typically we don't want to have to refer to "legacy" manpages for canonical info in new interfaces. If anything we may want to remove that info from stat(2) manpage and have it refer to statx(2) for it.
On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > STATX_ALL [All currently available stuff] > .TE > .in > .PP > .B "Do not" > simply set > .I mask > to UINT_MAX as one or more bits may, in future, be used to specify an extension > to the buffer. To clarify, will an "extension to the buffer" be an increase in the size of struct statx? I think it would have to be, otherwise programs filling a struct statx with STATX_ALL would start breaking as soon as they're rebuilt with the new value of STATX_ALL, no? Or would these "extension to the buffer" bits not be added to STATX_ALL ...? And I don't suppose there's anything we can do to stop programs from asking for mask bits that haven't been defined yet, then breaking later if they happen to be defined as "extensions"? Maybe adding an extra "buffer size" argument to the syscall? I'm concerned that the idea of "extensions" isn't well thought out, and in practice we'll just be stuck with the current struct size (256 bytes) forever. - Eric
> On Mar 24, 2017, at 2:53 PM, Eric Biggers <ebiggers3@gmail.com> wrote: > > On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: >> STATX_ALL [All currently available stuff] >> .TE >> .in >> .PP >> .B "Do not" >> simply set >> .I mask >> to UINT_MAX as one or more bits may, in future, be used to specify an extension >> to the buffer. > > To clarify, will an "extension to the buffer" be an increase in the size of > struct statx? I think it would have to be, otherwise programs filling a struct > statx with STATX_ALL would start breaking as soon as they're rebuilt with the > new value of STATX_ALL, no? Or would these "extension to the buffer" bits not > be added to STATX_ALL ...? The value of STATX_ALL would match the size of struct statx in the header at compilation time, so this would always be consistent. > > And I don't suppose there's anything we can do to stop programs from asking > for mask bits that haven't been defined yet, then breaking later if they > happen to be defined as "extensions"? Maybe adding an extra "buffer size" > argument to the syscall? You can't stop applications from doing dumb things, like asking to read 1MB of data into a buffer that is only 512KB in size. That will also work fine as long as the application only reads a files smaller than 512KB. Similarly, if the statx() API says that the STATX_ALL mask is the list of currently-supported bits, but the app asks for more bits than it allocates a buffer for, there isn't much that the kernel can do. > I'm concerned that the idea of "extensions" isn't well thought out, and in > practice we'll just be stuck with the current struct size (256 bytes) forever. The extensions work exactly as they should - the client sets bits for fields that it needs (and by definition it shouldn't ask for anything that it doesn't understand), and the kernel masks this down to the bits that it understands. If the client asks for more bits than the kernel understands, it is likely a newer application on an older kernel, and it will only get back the fields that the kernel understands. The reverse (client asking for fewer bits than the kernel understands) is normal behaviour for this interface. The kernel should only fill in fields that the client requested and for which there is space in the struct. Cheers, Andreas
Eric Biggers <ebiggers3@gmail.com> wrote: > On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > > STATX_ALL [All currently available stuff] > > .TE > > .in > > .PP > > .B "Do not" > > simply set > > .I mask > > to UINT_MAX as one or more bits may, in future, be used to specify an extension > > to the buffer. > > To clarify, will an "extension to the buffer" be an increase in the size of > struct statx? Yes. I can make that more explicit, say perhaps "... specify an extension to the size of the buffer". > I think it would have to be, otherwise programs filling a struct statx with > STATX_ALL would start breaking as soon as they're rebuilt with the new value > of STATX_ALL, no? Or would these "extension to the buffer" bits not be > added to STATX_ALL ...? It would have to work that way. I can put this in a comment for future guidance in the kernel header file if you'd prefer. > And I don't suppose there's anything we can do to stop programs from asking > for mask bits that haven't been defined yet, then breaking later if they > happen to be defined as "extensions"? If someone decides not to read the documentation, one could argue they get what they deserve. > Maybe adding an extra "buffer size" argument to the syscall? This idea has already been rejected. > I'm concerned that the idea of "extensions" isn't well thought out, and in > practice we'll just be stuck with the current struct size (256 bytes) forever. One or more bits (probably just one) are tentatively reserved to indicate that the buffer is now bigger. How much bigger I cannot say at this point, not knowing the circumstances yet that will require the extension. David
On Sun, Mar 26, 2017 at 06:46:43PM -0600, Andreas Dilger wrote: > > > On Mar 24, 2017, at 2:53 PM, Eric Biggers <ebiggers3@gmail.com> wrote: > > > > On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > >> STATX_ALL [All currently available stuff] > >> .TE > >> .in > >> .PP > >> .B "Do not" > >> simply set > >> .I mask > >> to UINT_MAX as one or more bits may, in future, be used to specify an extension > >> to the buffer. > > > > To clarify, will an "extension to the buffer" be an increase in the size of > > struct statx? I think it would have to be, otherwise programs filling a struct > > statx with STATX_ALL would start breaking as soon as they're rebuilt with the > > new value of STATX_ALL, no? Or would these "extension to the buffer" bits not > > be added to STATX_ALL ...? > > The value of STATX_ALL would match the size of struct statx in the header at > compilation time, so this would always be consistent. > > > > And I don't suppose there's anything we can do to stop programs from asking > > for mask bits that haven't been defined yet, then breaking later if they > > happen to be defined as "extensions"? Maybe adding an extra "buffer size" > > argument to the syscall? > > You can't stop applications from doing dumb things, like asking to read 1MB > of data into a buffer that is only 512KB in size. That will also work fine > as long as the application only reads a files smaller than 512KB. > > Similarly, if the statx() API says that the STATX_ALL mask is the list of > currently-supported bits, but the app asks for more bits than it allocates > a buffer for, there isn't much that the kernel can do. > > > I'm concerned that the idea of "extensions" isn't well thought out, and in > > practice we'll just be stuck with the current struct size (256 bytes) forever. > > The extensions work exactly as they should - the client sets bits for fields > that it needs (and by definition it shouldn't ask for anything that it doesn't > understand), and the kernel masks this down to the bits that it understands. > > If the client asks for more bits than the kernel understands, it is likely a > newer application on an older kernel, and it will only get back the fields > that the kernel understands. The reverse (client asking for fewer bits than > the kernel understands) is normal behaviour for this interface. The kernel > should only fill in fields that the client requested and for which there is > space in the struct. During the statx session at LSF last week I asked if filesystems ought to fill in fields that weren't asked for (btime, specifically) and the impression I got was that it's ok to go ahead and fill out fields that weren't asked for if we already have the data. Since statx backends can do that, they'll have to check the structure size, and not rely on "you asked for this field so we assume that you allocated enough memory in userspace to hold it". Or we could just shift the precedent now -- programs only get the information they ask for, and in asking for it we assume that we can write to that part of the buffer. Frankly I'd prefer that behavior (see the XFS statx patch), but I don't own the interface. :) --D > > Cheers, Andreas > > > > >
Darrick J. Wong <darrick.wong@oracle.com> wrote: > During the statx session at LSF last week I asked if filesystems ought > to fill in fields that weren't asked for (btime, specifically) and the > impression I got was that it's ok to go ahead and fill out fields that > weren't asked for if we already have the data. Yes. The filesystem may return data that it wasn't asked for. > Since statx backends can do that, they'll have to check the structure > size, and not rely on "you asked for this field so we assume that you > allocated enough memory in userspace to hold it". Kind of. I'm assuming that the extension fields will be in struct kstat and can be set directly by the filesystem unconditionally, and the core can choose whether to copy them or not. However, assuming that one of the mask bits is reserved to indicate a buffer extension, this will be made available to the fs to let the fs know whether it's pointless to return it. However, at the moment I have no extensions to play with. Would it help you if I marked one of the mask bits reserved for this purpose? > Or we could just shift the precedent now -- programs only get the > information they ask for, and in asking for it we assume that we can > write to that part of the buffer. Frankly I'd prefer that behavior (see > the XFS statx patch), but I don't own the interface. :) That means a filesystem can't simply return non-basic data unconditionally if possible. I prefer letting it do so if it doesn't incur any extra I/O overheads. David
On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote: > That means a filesystem can't simply return non-basic data unconditionally if > possible. I prefer letting it do so if it doesn't incur any extra I/O > overheads. This seems like it will lead to userspace expecting certain fields to just be there, and a lot harder to properly verify for tests. Which btw we all need for these odd behaviors. If we can't get them any time soon (e.g. before -rc6) I think we'll simply have to revert statx instead of leaving this untested mess in the tree.
Christoph Hellwig <hch@infradead.org> wrote: > On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote: > > That means a filesystem can't simply return non-basic data unconditionally if > > possible. I prefer letting it do so if it doesn't incur any extra I/O > > overheads. > > This seems like it will lead to userspace expecting certain fields to > just be there, and a lot harder to properly verify for tests. Which btw > we all need for these odd behaviors. If we can't get them any time soon > (e.g. before -rc6) I think we'll simply have to revert statx instead of > leaving this untested mess in the tree. Have you reviewed the manpage yet? David
On Mon, 2017-03-27 at 09:46 -0700, Christoph Hellwig wrote: > On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote: > > That means a filesystem can't simply return non-basic data unconditionally if > > possible. I prefer letting it do so if it doesn't incur any extra I/O > > overheads. > > This seems like it will lead to userspace expecting certain fields to > just be there, and a lot harder to properly verify for tests. Which btw > we all need for these odd behaviors. If we can't get them any time soon > (e.g. before -rc6) I think we'll simply have to revert statx instead of > leaving this untested mess in the tree. I don't think so. I think we just have to clearly document that that will not be the case. If they really expect the field to be there, then they need to set the bit in the "want" mask -- it's really as simple as that.
On Mar 27, 2017, at 1:04 PM, Jeff Layton <jlayton@poochiereds.net> wrote: > > On Mon, 2017-03-27 at 09:46 -0700, Christoph Hellwig wrote: >> On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote: >>> That means a filesystem can't simply return non-basic data unconditionally if >>> possible. I prefer letting it do so if it doesn't incur any extra I/O >>> overheads. >> >> This seems like it will lead to userspace expecting certain fields to >> just be there, and a lot harder to properly verify for tests. Which btw >> we all need for these odd behaviors. If we can't get them any time soon >> (e.g. before -rc6) I think we'll simply have to revert statx instead of >> leaving this untested mess in the tree. > > I don't think so. > > I think we just have to clearly document that that will not be the > case. If they really expect the field to be there, then they need to > set the bit in the "want" mask -- it's really as simple as that. It would be my preference to only return attributes which are explicitly requested (Lustre won't return fields for which the bit is not set), but there was a request for this behaviour in NFS I recall. That said, there are also flags for "lazy" attributes for NFS. I wonder if it is enough to pass AT_STATX_DONT_SYNC and request STATX_SIZE | STATX_CTIME, or whatever NFS is worried about, and return the client-local version of the attributes? I guess the real question is what an application is going to do with fields that it didn't actually request, and has no way to know if they are valid or contain garbage? Cheers, Andreas
On Tue, 2017-03-28 at 13:39 -0600, Andreas Dilger wrote: > On Mar 27, 2017, at 1:04 PM, Jeff Layton <jlayton@poochiereds.net> wrote: > > > > On Mon, 2017-03-27 at 09:46 -0700, Christoph Hellwig wrote: > > > On Mon, Mar 27, 2017 at 05:25:26PM +0100, David Howells wrote: > > > > That means a filesystem can't simply return non-basic data unconditionally if > > > > possible. I prefer letting it do so if it doesn't incur any extra I/O > > > > overheads. > > > > > > This seems like it will lead to userspace expecting certain fields to > > > just be there, and a lot harder to properly verify for tests. Which btw > > > we all need for these odd behaviors. If we can't get them any time soon > > > (e.g. before -rc6) I think we'll simply have to revert statx instead of > > > leaving this untested mess in the tree. > > > > I don't think so. > > > > I think we just have to clearly document that that will not be the > > case. If they really expect the field to be there, then they need to > > set the bit in the "want" mask -- it's really as simple as that. > > It would be my preference to only return attributes which are explicitly > requested (Lustre won't return fields for which the bit is not set), but > there was a request for this behaviour in NFS I recall. > > That said, there are also flags for "lazy" attributes for NFS. I wonder if > it is enough to pass AT_STATX_DONT_SYNC and request STATX_SIZE | STATX_CTIME, > or whatever NFS is worried about, and return the client-local version of the > attributes? > > I guess the real question is what an application is going to do with fields > that it didn't actually request, and has no way to know if they are valid or > contain garbage? > You certainly do have a way to know if they're valid. You can check the stx_mask. Basically, if you request something, you're guaranteed to get it, or an error. If you don't request it, you might get it but you had better check the stx_mask before assuming that you did. TBQH though, I don't _really_ expect userland to use this that often. The main reason I'm arguing for this is for simplicity of the in-kernel implementations. Having to check each bit in the mask before filling out its field is cumbersome. It's much simpler to fill out the kstat struct with what you have and set the stx_mask unconditionally. It's also not 100% clear to me how the unrequested fields would be set if we don't allow this. zeroing them out seems like the obvious thing to do, but that doesn't really provide any guarantee that someone won't try to interpret it as valid.
On Mon, Mar 27, 2017 at 03:04:11PM -0400, Jeff Layton wrote: > I think we just have to clearly document that that will not be the > case. If they really expect the field to be there, then they need to > set the bit in the "want" mask -- it's really as simple as that. If the usual local file systems set it but say NFS doesn't I will very much expect that people don't set the flag and create buggy programs a lot. An API that doesn't give you more than you ask for is a lot more obvious.
On Tue, Mar 07, 2017 at 05:22:55PM +0000, David Howells wrote: > Christoph Hellwig <hch@infradead.org> wrote: > > > This would be great to have in 4.11 together with the initial statx > > implementation. But until I see documentation and testcases for statx > > I don't really feel comfortable reviewing anything related to it. > > Well, since you asked for documentation, here's a manual page for you to > review:-) > > Note that as it isn't in glibc yet, I've left out all the > set-this-and-that-#define-to-make-it-appear stuff except where it is pertinent > to particular constants. I think we'll initially need to make this a section 2 page as libcs will take a while to pick things up. But in the end Michael will have to decide. Adding linux-man to the Cc list so that the right people are involved. Quoting the rendered page below: > Invoking statx(): This is an odd header for a man page. Usually this would start as DESCRIPTION (in bold). > [*] By absolute path. Odd way to enumerate in a man page - I'll leave it to the linux-man regulars to point out the right way, because I don't know it off-hand. > AT_EMPTY_PATH (since Linux 2.6.39) All these have been there since statx was added, so we can remove the annotation. > STATX_BASIC_STATS [The stuff in the normal stat struct] Please enumerate the actual fields. > STATX_ALL [All currently available stuff] How about "All currently available fields"
Christoph Hellwig <hch@infradead.org> wrote: > Quoting the rendered page below: > > > Invoking statx(): > > This is an odd header for a man page. Usually this would start as > > DESCRIPTION > > (in bold). Yes, I know. But it seems to make more sense this way round. Putting the struct definition before the description of how to access it means that I can refer to it in the definitions of the flags that you OR into the mask and means you get to the struct much faster. Maybe I should put it under the SYNOPSIS section? > > [*] By absolute path. > > Odd way to enumerate in a man page - I'll leave it to the linux-man regulars > to point out the right way, because I don't know it off-hand. Me neither. > > AT_EMPTY_PATH (since Linux 2.6.39) > > All these have been there since statx was added, so we can remove the > annotation. You're not the first to note this. I've already taken that out. > > STATX_BASIC_STATS [The stuff in the normal stat struct] > > Please enumerate the actual fields. They're enumerated immediately prior. I've changed it to: STATX_BASIC_STATS [All of the above] > > STATX_ALL [All currently available stuff] > > How about "All currently available fields" Sure. David
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 229cc6a..ebfc133 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -516,6 +516,20 @@ xfs_vn_getattr( stat->blocks = XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks); + if (ip->i_d.di_version == 3) { + if (request_mask & STATX_BTIME) { + stat->result_mask |= STATX_BTIME; + stat->btime.tv_sec = ip->i_d.di_crtime.t_sec; + stat->btime.tv_nsec = ip->i_d.di_crtime.t_nsec; + } + } + + if (ip->i_d.di_flags & XFS_DIFLAG_IMMUTABLE) + stat->attributes |= STATX_ATTR_IMMUTABLE; + if (ip->i_d.di_flags & XFS_DIFLAG_APPEND) + stat->attributes |= STATX_ATTR_APPEND; + if (ip->i_d.di_flags & XFS_DIFLAG_NODUMP) + stat->attributes |= STATX_ATTR_NODUMP; switch (inode->i_mode & S_IFMT) { case S_IFBLK:
statx has the ability to report inode creation times and inode flags, so hook up di_crtime and di_flags to that functionality. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/xfs/xfs_iops.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)