[08/11] fsinfo: Add API documentation [ver #15]

Message ID 156173671169.14042.2848093993797220779.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Series VFS: Introduce filesystem information query syscall [ver #15]

Commit Message

David Howells June 28, 2019, 3:45 p.m. UTC
Add API documentation for fsinfo.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/filesystems/fsinfo.rst |  561 ++++++++++++++++++++++++++++++++++
 1 file changed, 561 insertions(+)
 create mode 100644 Documentation/filesystems/fsinfo.rst
Patch

diff --git a/Documentation/filesystems/fsinfo.rst b/Documentation/filesystems/fsinfo.rst
new file mode 100644
index 000000000000..86c187a46396
--- /dev/null
+++ b/Documentation/filesystems/fsinfo.rst
@@ -0,0 +1,561 @@ 
+================================
+Filesystem Information Retrieval
+================================
+
+The fsinfo() system call allows the retrieval of filesystem and filesystem
+security information beyond what stat(), statx() and statfs() can query.  It
+does not require a file to be opened as does ioctl().
+
+fsinfo() may be called on a path, an open file descriptor, or a
+filesystem-context file descriptor as allocated by fsopen() or fspick().
+
+The fsinfo() system call must be enabled in the kernel configuration:
+
+	"File systems"/"Enable the fsinfo() system call" (CONFIG_FSINFO)
+
+This document has the following sections:
+
+.. contents:: :local:
+
+
+Overview
+========
+
+The fsinfo() system call retrieves one of a number of attributes, specified by
+the "fsinfo_attribute" enumeration::
+
+	FSINFO_ATTR_STATFS	- statfs()-style state
+	FSINFO_ATTR_FSINFO	- Information about fsinfo() itself
+	FSINFO_ATTR_IDS		- Filesystem IDs
+	FSINFO_ATTR_LIMITS	- Filesystem limits
+	...
+
+Each attribute can have a single value, a sequence of values or a
+sequence-of-sequences of values.  All of the values of an attribute must be of
+the same type - and this is an inherent property of the attribute.  The
+available types are:
+
+ * ``Struct``.  This is a structure with a version-dependent length.  New
+   versions of the kernel may append more fields, though they are not
+   permitted to remove or replace old ones.
+
+   Older applications, expecting an older version of the field, can ask for a
+   shorter struct and will only get the fields they requested; newer
+   applications running on an older kernel will get the extra fields they
+   requested filled with zeros.  Either way, the kernel returns the actual size
+   of the internal struct, regardless of how much data it returned.
+
+   This allows for struct-type fields to be extended in future.
+
+ * ``String``.  This is a variable-length string of up to 4096 characters (no
+   NUL character is included).  The returned string will be truncated if the
+   output buffer is too small.  The total size of the string is returned,
+   regardless of any truncation.
+
+ * ``Array``.  This is a variable-length array of fixed-size structures.  The
+   element size may not vary over time, so the element format must be designed
+   with care.  The maximum length is INT_MAX bytes, though this depends on the
+   kernel being able to allocate an internal buffer large enough.
+
+ * ``Opaque``.  This is a variable-length blob of indeterminate structure.  It
+   may be up to INT_MAX bytes in size.
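The return-size rules for the struct and string types above can be sketched in plain userspace C. This is only an illustration of the described semantics, not code from the patch; sketch_copy_struct() and sketch_copy_string() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a struct-type value.  A short caller buffer
 * gets only the leading fields; a long one gets the tail zero-filled.
 * The full internal size is returned either way.
 */
size_t sketch_copy_struct(void *user_buf, size_t user_len,
			  const void *kern_val, size_t kern_len)
{
	size_t n = user_len < kern_len ? user_len : kern_len;

	memcpy(user_buf, kern_val, n);
	if (user_len > n)
		memset((char *)user_buf + n, 0, user_len - n);
	return kern_len;
}

/*
 * Hypothetical helper: copy a string-type value.  The copy is truncated
 * to the caller's buffer, no NUL is appended, and the full length is
 * returned regardless of truncation.
 */
size_t sketch_copy_string(char *user_buf, size_t user_len,
			  const char *val, size_t val_len)
{
	size_t n = user_len < val_len ? user_len : val_len;

	memcpy(user_buf, val, n);
	return val_len;
}
```

A caller can compare the returned size with the buffer size it passed to detect truncation or to learn that the kernel knows about more fields than it asked for.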
+
+
+Filesystem API
+==============
+
+The filesystem is called through a super_operations method::
+
+	int (*fsinfo) (struct path *path, struct fsinfo_kparams *params);
+
+where "path" indicates the object to be queried and params indicates the
+parameters and the output buffer description.  The function should return the
+total size of the data it would like to produce or an error.
+
+The parameter struct looks like::
+
+	struct fsinfo_kparams {
+		enum fsinfo_attribute	request;
+		__u32			Nth;
+		__u32			Mth;
+		unsigned int		buf_size;
+		unsigned int		usage;
+		void			*buffer;
+		char			*scratch_buffer;
+		...
+	};
+
+The fields relevant to the filesystem are as follows:
+
+ * ``request``
+
+   Which attribute is being requested.  EOPNOTSUPP should be returned if the
+   attribute is not supported by the filesystem or the LSM.
+
+ * ``Nth`` and ``Mth``
+
+   Which value of an attribute is being requested.
+
+   For a single-value attribute Nth and Mth will both be 0.
+
+   For a "1D" attribute, Nth will indicate which value and Mth will always
+   be 0.  Take, for example, FSINFO_ATTR_SERVER_NAME - for a network
+   filesystem, the superblock will be backed by a number of servers.  This will
+   return the name of the Nth server.  ENODATA will be returned if Nth goes
+   beyond the end of the array.
+
+   For a "2D" attribute, Mth will indicate the index in the Nth set of values.
+   Take, for example, FSINFO_ATTR_SERVER_ADDRESS - each
+   server listed by FSINFO_ATTR_SERVER_NAME may have one or more addresses.
+   This will return the Mth address of the Nth server.  ENODATA will be
+   returned if the Nth set doesn't exist or the Mth element of the Nth set
+   doesn't exist.
+
+ * ``buf_size``
+
+   This indicates the current size of the buffer.  For the array type and the
+   opaque type this will be increased if the current buffer won't hold the
+   value and the filesystem will be called again.
+
+ * ``usage``
+
+   This indicates how much of the buffer has been used so far for an array or
+   opaque type attribute.  This is updated by the fsinfo_note_param*()
+   functions.
+
+ * ``buffer``
+
+   This points to the output buffer.  For struct-type and string-type
+   attributes it will always be big enough; for array- and opaque-type, it will
+   be buf_size in size and will be resized if the returned size is larger than
+   this.
+
+ * ``scratch_buffer``
+
+   For array- and opaque-type attributes, this will point to a 4096-byte
+   scratch buffer.  Sometimes the value needs to be generated by sprintf(),
+   say, to find out how big it is going to be, but that might not be possible
+   in the main buffer without risking an overrun.
+
+To simplify filesystem code, there will always be at least a minimal buffer
+available if the ->fsinfo() method gets called - and the filesystem should
+always write what it can into the buffer.  It's possible that the fsinfo()
+system call will then throw the contents away and just return the length.
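The Nth/Mth indexing and the ENODATA end-of-list convention described above can be modelled as follows. This is a userspace sketch under assumptions: the table sketch_addrs and both functions are hypothetical, and addresses are shown as strings for brevity although FSINFO_ATTR_SERVER_ADDRESS is a struct-type attribute.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in data: two servers, with two and one addresses. */
static const char *const sketch_addrs[2][2] = {
	{ "10.0.0.1", "10.0.0.2" },
	{ "10.0.1.1", NULL },
};

/* Return the length of the Mth address of the Nth server, or -ENODATA. */
int sketch_server_address(unsigned int Nth, unsigned int Mth,
			  char *buf, size_t buf_size)
{
	const char *addr;
	size_t len;

	if (Nth >= 2 || Mth >= 2 || !sketch_addrs[Nth][Mth])
		return -ENODATA;
	addr = sketch_addrs[Nth][Mth];
	len = strlen(addr);
	/* Write what fits, but always report the full size. */
	memcpy(buf, addr, len < buf_size ? len : buf_size);
	return (int)len;
}

/* Count every address by walking N and M until -ENODATA is seen. */
int sketch_count_addresses(void)
{
	char buf[64];
	unsigned int n, m;
	int total = 0;

	for (n = 0; sketch_server_address(n, 0, buf, sizeof(buf)) >= 0; n++)
		for (m = 0; sketch_server_address(n, m, buf, sizeof(buf)) >= 0; m++)
			total++;
	return total;
}
```

The outer loop here assumes every server has at least one address; a real caller would more likely enumerate servers with FSINFO_ATTR_SERVER_NAME first.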
+
+
+Helper Functions
+================
+
+The API includes a number of helper functions:
+
+ * ``int generic_fsinfo(struct path *path, struct fsinfo_kparams *params);``
+
+   This is the function that does default actions for filling out attribute
+   values from standard data, such as may be found in the file_system_type
+   struct and the super_block struct.  It also generates -EOPNOTSUPP for
+   unsupported attributes.
+
+   This should be called by a filesystem if it doesn't want to handle an
+   attribute.  The filesystem may also call this function and then adjust the
+   information returned, such as changing the listed capability flags.
+
+ * ``void fsinfo_set_cap(struct fsinfo_capabilities *c,
+			 enum fsinfo_capability cap);``
+
+   This function sets a capability flag.
+
+ * ``void fsinfo_clear_cap(struct fsinfo_capabilities *c,
+			   enum fsinfo_capability cap);``
+
+   This function clears a capability flag.
+
+ * ``void fsinfo_set_unix_caps(struct fsinfo_capabilities *caps);``
+
+   Set capability flags appropriate to the features of a standard UNIX
+   filesystem, such as having numeric UIDs and GIDs; allowing the creation of
+   directories, symbolic links, hard links, device files, FIFO and socket
+   files; permitting sparse files; and having access, change and modification
+   times.
+
+ * ``void fsinfo_note_sb_params(struct fsinfo_kparams *params,
+				unsigned int s_flags);``
+
+   This function notes the standard parameters corresponding to certain
+   ``SB_*`` flags in ``sb->s_flags`` into the parameter buffer.  The filesystem
+   is at liberty to adjust the s_flags mask as it sees fit.
+
+   This is intended for use with FSINFO_ATTR_PARAMETERS.
+
+ * ``void fsinfo_note_param(struct fsinfo_kparams *params, const char *key,
+			    const char *val);``
+
+   This function writes a pair of strings with prepended lengths into
+   params->buffer, if there's space, and always updates params->usage.  The
+   assumption is that the caller of s->s_op->fsinfo() will resize the buffer if
+   the usage grew too large and call again.
+
+   This is intended for use with FSINFO_ATTR_{,LSM_}PARAMETERS, but is not
+   limited to those.  The format allows binary data, though this API function
+   does not support anything with NUL characters in it.
+
+   Note that this function will not sleep, so it is safe to call with locks
+   held.
+
+ * ``void fsinfo_note_paramf(struct fsinfo_kparams *params, const char *key,
+			     const char *val_fmt, ...);``
+
+   This function is a simple wrapper around fsinfo_note_param(), writing the
+   value using vsnprintf() into params->scratch_buffer and then jumping to
+   fsinfo_note_param().
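The "write if there is space, always update usage, then resize and call again" contract described for fsinfo_note_param() might look roughly like this in a userspace model. Everything here is an assumption made for illustration: struct sketch_params mirrors only three fields of fsinfo_kparams, the length prefixes are omitted, and sketch_note(), sketch_fill() and sketch_query() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down version of the parameter struct. */
struct sketch_params {
	unsigned int buf_size;
	unsigned int usage;
	char *buffer;
};

/* Copy the string only if it fits, but always account for it. */
void sketch_note(struct sketch_params *p, const char *s)
{
	unsigned int len = (unsigned int)strlen(s);

	if (p->usage + len <= p->buf_size)
		memcpy(p->buffer + p->usage, s, len);
	p->usage += len;
}

/* Stand-in for ->fsinfo(): note some values, return the size wanted. */
unsigned int sketch_fill(struct sketch_params *p)
{
	p->usage = 0;
	sketch_note(p, "source=");
	sketch_note(p, "/dev/sda1");
	return p->usage;
}

/* Caller side: grow the buffer and call again until everything fits. */
int sketch_query(struct sketch_params *p)
{
	int calls = 0;

	for (;;) {
		unsigned int want = sketch_fill(p);
		char *bigger;

		calls++;
		if (want <= p->buf_size)
			return calls;
		bigger = realloc(p->buffer, want);
		if (!bigger)
			return -1;
		p->buffer = bigger;
		p->buf_size = want;
	}
}
```

Because usage is updated even when nothing is written, one pass is enough to learn the required size, so the retry loop normally finishes on the second call.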
+
+
+Attribute Summary
+=================
+
+To summarise the attributes that are defined::
+
+  Symbolic name				Type
+  =====================================	===============
+  FSINFO_ATTR_STATFS			struct
+  FSINFO_ATTR_FSINFO			struct
+  FSINFO_ATTR_IDS			struct
+  FSINFO_ATTR_LIMITS			struct
+  FSINFO_ATTR_SUPPORTS			struct
+  FSINFO_ATTR_CAPABILITIES		struct
+  FSINFO_ATTR_TIMESTAMP_INFO		struct
+  FSINFO_ATTR_VOLUME_ID			string
+  FSINFO_ATTR_VOLUME_UUID		struct
+  FSINFO_ATTR_VOLUME_NAME		string
+  FSINFO_ATTR_NAME_ENCODING		string
+  FSINFO_ATTR_NAME_CODEPAGE		string
+  FSINFO_ATTR_PARAM_DESCRIPTION		struct
+  FSINFO_ATTR_PARAM_SPECIFICATION	N × struct
+  FSINFO_ATTR_PARAM_ENUM		N × struct
+  FSINFO_ATTR_PARAMETERS		opaque
+  FSINFO_ATTR_LSM_PARAMETERS		opaque
+  FSINFO_ATTR_SERVER_NAME		N × string
+  FSINFO_ATTR_SERVER_ADDRESS		N × M × struct
+  FSINFO_ATTR_AFS_CELL_NAME		string
+
+
+Attribute Catalogue
+===================
+
+A number of the attributes convey information about a filesystem superblock:
+
+ *  ``FSINFO_ATTR_STATFS``
+
+    This struct-type attribute gives most of the equivalent data to statfs(),
+    but with all the fields as unconditional 64-bit or 128-bit integers.  Note
+    that static data like IDs that don't change are retrieved with
+    FSINFO_ATTR_IDS instead.
+
+    Further, superblock flags (such as MS_RDONLY) are not exposed by this
+    attribute; rather the parameters must be listed and the attributes picked
+    out from that.
+
+ *  ``FSINFO_ATTR_IDS``
+
+    This struct-type attribute conveys various identifiers used by the target
+    filesystem.  This includes the filesystem name, the NFS filesystem ID, the
+    superblock ID used in notifications, the filesystem magic type number and
+    the primary device ID.
+
+ *  ``FSINFO_ATTR_LIMITS``
+
+    This struct-type attribute conveys the limits on various aspects of a
+    filesystem, such as maximum file, symlink and xattr sizes, maximum filename
+    and xattr name length, maximum number of symlinks, maximum device major and
+    minor numbers and maximum UID, GID and project ID numbers.
+
+ *  ``FSINFO_ATTR_SUPPORTS``
+
+    This struct-type attribute conveys information about the support the
+    filesystem has for various UAPI features of a filesystem.  This includes
+    information about which bits are supported in various masks employed by the
+    statx system call, what FS_IOC_* flags are supported by ioctls and what
+    DOS/Windows file attribute flags are supported.
+
+ *  ``FSINFO_ATTR_CAPABILITIES``
+
+    This is a special attribute, being a set of single-bit capability flags,
+    formatted as a struct-type attribute.  The meanings of the capability bits
+    are listed below - see the "Capability Bit Catalogue" section.  The
+    capability bits are grouped numerically into bytes, such that capabilities
+    0-7 are in byte 0, 8-15 are in byte 1, 16-23 in byte 2 and so on.
+
+    Any capability bit that's not supported by the kernel will be set to false
+    if asked for.  The highest supported capability can be obtained from
+    attribute "FSINFO_ATTR_FSINFO".
+
+ *  ``FSINFO_ATTR_TIMESTAMP_INFO``
+
+    This struct-type attribute conveys information about the resolution and
+    range of the timestamps available in a filesystem.  The resolutions are
+    given as a mantissa and exponent (resolution = mantissa * 10^exponent
+    seconds), where the exponent can be negative to indicate a sub-second
+    resolution (-9 being nanoseconds, for example).
+
+ *  ``FSINFO_ATTR_VOLUME_ID``
+
+    This is a string-type attribute that conveys the superblock identifier for
+    the volume.  By default it will be filled in from the contents of s_id from
+    the superblock.  For a block-based filesystem, for example, this might be
+    the name of the primary block device.
+
+ *  ``FSINFO_ATTR_VOLUME_UUID``
+
+    This is a struct-type attribute that conveys the UUID identifier for the
+    volume.  By default it will be filled in from the contents of s_uuid from
+    the superblock.  If this doesn't exist, it will be entirely zeros.
+
+ *  ``FSINFO_ATTR_VOLUME_NAME``
+
+    This is a string-type attribute that conveys the name of the volume.  By
+    default it will return EOPNOTSUPP.  For a disk-based filesystem, it might
+    convey the partition label; for a network-based filesystem, it might convey
+    the name of the remote volume.
+
+ *  ``FSINFO_ATTR_NAME_ENCODING``
+
+    This is a string-type attribute that returns the type of encoding used for
+    filenames in the medium.  By default this will be filled in with "utf8".
+    Not all filesystems can support that, however, so this may indicate a
+    restriction on what characters can be used.
+
+ *  ``FSINFO_ATTR_NAME_CODEPAGE``
+
+    This is a string-type attribute that returns the name of the codepage used
+    to transliterate a Linux utf8 filename into whatever the medium supports.
+    By default it returns EOPNOTSUPP.
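The mantissa/exponent form used by FSINFO_ATTR_TIMESTAMP_INFO (resolution = mantissa * 10^exponent seconds) can be converted to nanoseconds as sketched below. The field widths and the function name sketch_resolution_ns() are assumptions; the struct layout is not shown in this document.

```c
#include <stdint.h>

/*
 * Convert mantissa * 10^exponent seconds into nanoseconds.
 * Returns 0 if the resolution is finer than 1ns or the result would
 * overflow 64 bits.
 */
uint64_t sketch_resolution_ns(uint64_t mantissa, int exponent)
{
	int e = exponent + 9;	/* seconds -> nanoseconds is 10^9 */
	uint64_t v = mantissa;

	if (e < 0)
		return 0;
	while (e-- > 0) {
		if (v > UINT64_MAX / 10)
			return 0;
		v *= 10;
	}
	return v;
}
```

For example, a mantissa of 1 with exponent -9 gives 1ns, and a mantissa of 2 with exponent 0 gives a two-second resolution.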
+
+
+The next attributes give information about the mount parameter parsers and the
+mount parameter values stored in a superblock and its security data.  The
+first few of these can be queried on the file descriptor returned by fsopen()
+before any superblock is attached:
+
+ *  ``FSINFO_ATTR_PARAM_DESCRIPTION``
+
+    This is a struct-type attribute that returns summary information about what
+    mount options are available on a filesystem, including the number of
+    parameters and the number of enum symbols.
+
+ *  ``FSINFO_ATTR_PARAM_SPECIFICATION``
+
+    This is a 1D array of struct-type attributes, indicating the type,
+    qualifiers, name and an option ID for the Nth mount parameter.  Parameters
+    that have the same option ID are presumed to be synonyms.
+
+ *  ``FSINFO_ATTR_PARAM_ENUM``
+
+    This is a 1D array of struct-type attributes, indicating the Nth value
+    symbol for the set of enumeration-type parameters.  All the values are in
+    the same table, so they can be matched to the parameter by option ID, and
+    each option ID may have several entries, each with a different name.
+
+ *  ``FSINFO_ATTR_PARAMETERS``
+ *  ``FSINFO_ATTR_LSM_PARAMETERS``
+
+    These are a pair of opaque blobs that list all the mount parameter values
+    currently set on a superblock.  The first set come from the filesystem and
+    the second is from the LSMs - and, as such, convey security information,
+    such as labelling.
+
+    Inside the filesystem or LSM, the parameter values should be read in one go
+    under lock to avoid races with remount if necessary.
+
+    Each opaque blob is encoded as a series of pairs of elements, where each
+    element begins with a length.  The first element of each pair is the key
+    name and the second is the value (which may contain commas, binary data,
+    NUL chars).
+
+    An element length is encoded as a series of bytes in most->least significant
+    order.  Each byte contributes 7 bits to the length.  The MSB in each byte
+    is set if there's another byte of length information following on (ie. all
+    but the last byte in the length have the MSB set).
+
+    A number of helper functions are provided to help record the parameters::
+
+	fsinfo_note_sb_params()
+	fsinfo_note_param()
+	fsinfo_note_paramf()
+
+    Note that the first is not applicable to LSM parameters.  It is called
+    automatically if the filesystem doesn't implement the attribute, but should
+    be called manually otherwise.  It should also be called first,
+    before noting any other parameters.
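The element-length encoding described above (7 bits per byte, most significant group first, MSB set on every byte except the last) can be sketched as an encoder/decoder pair. The function names and the 32-bit length limit are assumptions for illustration, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode len into out (at least 5 bytes); returns the bytes written. */
size_t sketch_encode_len(uint32_t len, uint8_t *out)
{
	uint8_t tmp[5];
	size_t n = 0, i;

	/* Collect 7-bit groups, least significant first. */
	do {
		tmp[n++] = len & 0x7f;
		len >>= 7;
	} while (len);

	/* Emit most significant first; all but the last carry the MSB. */
	for (i = 0; i < n; i++) {
		uint8_t b = tmp[n - 1 - i];

		if (i != n - 1)
			b |= 0x80;
		out[i] = b;
	}
	return n;
}

/* Decode a length; returns bytes consumed, or 0 if truncated/too long. */
size_t sketch_decode_len(const uint8_t *in, size_t avail, uint32_t *len)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < avail && i < 5; i++) {
		v = (v << 7) | (in[i] & 0x7f);
		if (!(in[i] & 0x80)) {
			*len = v;
			return i + 1;
		}
	}
	return 0;
}
```

A blob reader would call the decoder, skip that many bytes of key, then repeat for the value, and continue pair by pair until the blob is exhausted.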
+
+
+Then there are filesystem-specific attributes.
+
+ *  ``FSINFO_ATTR_SERVER_NAME``
+
+    This is a string-type attribute that conveys the name of the Nth server
+    backing a network-filesystem superblock.
+
+ *  ``FSINFO_ATTR_SERVER_ADDRESS``
+
+    This is a struct-type attribute that conveys the Mth address of the Nth
+    server, as returned by FSINFO_ATTR_SERVER_NAME.
+
+ *  ``FSINFO_ATTR_AFS_CELL_NAME``
+
+    This is a string-type attribute that retrieves the AFS cell name of the
+    target object.
+
+
+Lastly, one attribute gives information about fsinfo() itself:
+
+ *  ``FSINFO_ATTR_FSINFO``
+
+    This struct-type attribute gives information about the fsinfo() system call
+    itself, including the maximum number of attributes supported and the
+    maximum number of capability bits supported.
+
+
+Capability Bit Catalogue
+========================
+
+The capability bits convey single true/false assertions about a specific
+instance of a filesystem (ie. a specific superblock).  They are accessed using
+the "FSINFO_ATTR_CAPABILITIES" attribute:
+
+ *  ``FSINFO_CAP_IS_KERNEL_FS``
+ *  ``FSINFO_CAP_IS_BLOCK_FS``
+ *  ``FSINFO_CAP_IS_FLASH_FS``
+ *  ``FSINFO_CAP_IS_NETWORK_FS``
+ *  ``FSINFO_CAP_IS_AUTOMOUNTER_FS``
+ *  ``FSINFO_CAP_IS_MEMORY_FS``
+
+    These indicate what kind of filesystem the target is: kernel API (proc),
+    block-based (ext4), flash/nvm-based (jffs2), remote over the network (NFS),
+    local quasi-filesystem that acts as a tray of mountpoints (autofs), plain
+    in-memory filesystem (shmem).
+
+ *  ``FSINFO_CAP_AUTOMOUNTS``
+
+    This indicates if a filesystem may have objects that are automount points.
+
+ *  ``FSINFO_CAP_ADV_LOCKS``
+ *  ``FSINFO_CAP_MAND_LOCKS``
+ *  ``FSINFO_CAP_LEASES``
+
+    These indicate if a filesystem supports advisory locks, mandatory locks or
+    leases.
+
+ *  ``FSINFO_CAP_UIDS``
+ *  ``FSINFO_CAP_GIDS``
+ *  ``FSINFO_CAP_PROJIDS``
+
+    These indicate if a filesystem supports/stores/transports numeric user IDs,
+    group IDs or project IDs.  The "FSINFO_ATTR_LIMITS" attribute can be used
+    to find out the upper limits on the ID values.
+
+ *  ``FSINFO_CAP_STRING_USER_IDS``
+
+    This indicates if a filesystem supports/stores/transports string user
+    identifiers.
+
+ *  ``FSINFO_CAP_GUID_USER_IDS``
+
+    This indicates if a filesystem supports/stores/transports Windows GUIDs as
+    user identifiers (eg. ntfs).
+
+ *  ``FSINFO_CAP_WINDOWS_ATTRS``
+
+    This indicates if a filesystem supports Windows FILE_* attribute bits
+    (eg. cifs, jfs).  The "FSINFO_ATTR_SUPPORTS" attribute can be used to find
+    out which Windows file attributes are supported by the filesystem.
+
+ *  ``FSINFO_CAP_USER_QUOTAS``
+ *  ``FSINFO_CAP_GROUP_QUOTAS``
+ *  ``FSINFO_CAP_PROJECT_QUOTAS``
+
+    These indicate if a filesystem supports quotas for users, groups or
+    projects.
+
+ *  ``FSINFO_CAP_XATTRS``
+
+    This indicates if a filesystem supports extended attributes.  The
+    "FSINFO_ATTR_LIMITS" attribute can be used to find out the upper limits on
+    the supported name and body lengths.
+
+ *  ``FSINFO_CAP_JOURNAL``
+ *  ``FSINFO_CAP_DATA_IS_JOURNALLED``
+
+    These indicate whether the filesystem has a journal and whether data
+    changes are logged to it.
+
+ *  ``FSINFO_CAP_O_SYNC``
+ *  ``FSINFO_CAP_O_DIRECT``
+
+    These indicate whether the filesystem supports the O_SYNC and O_DIRECT
+    flags.
+
+ *  ``FSINFO_CAP_VOLUME_ID``
+ *  ``FSINFO_CAP_VOLUME_UUID``
+ *  ``FSINFO_CAP_VOLUME_NAME``
+ *  ``FSINFO_CAP_VOLUME_FSID``
+
+    These indicate whether ID, UUID, name and FSID identifiers actually exist
+    in the filesystem and thus might be considered persistent.
+
+ *  ``FSINFO_CAP_IVER_ALL_CHANGE``
+ *  ``FSINFO_CAP_IVER_DATA_CHANGE``
+ *  ``FSINFO_CAP_IVER_MONO_INCR``
+
+    These indicate whether i_version in the inode is supported and, if so, what
+    mode it operates in.  The first two indicate if it's changed for any data
+    or metadata change, or whether it's only changed for any data changes; the
+    last indicates whether or not it's monotonically increasing for each such
+    change.
+
+ *  ``FSINFO_CAP_HARD_LINKS``
+ *  ``FSINFO_CAP_HARD_LINKS_1DIR``
+
+    These indicate whether the filesystem can have hard links made in it, and
+    whether they can be made between directories or only within the same
+    directory.
+
+ *  ``FSINFO_CAP_DIRECTORIES``
+ *  ``FSINFO_CAP_SYMLINKS``
+ *  ``FSINFO_CAP_DEVICE_FILES``
+ *  ``FSINFO_CAP_UNIX_SPECIALS``
+
+    These indicate whether directories; symbolic links; device files; or pipes
+    and sockets can be made within the filesystem.
+
+ *  ``FSINFO_CAP_RESOURCE_FORKS``
+
+    This indicates if the filesystem supports resource forks.
+
+ *  ``FSINFO_CAP_NAME_CASE_INDEP``
+ *  ``FSINFO_CAP_NAME_NON_UTF8``
+ *  ``FSINFO_CAP_NAME_HAS_CODEPAGE``
+
+    These indicate if the filesystem supports case-independent file names,
+    whether the filenames are non-utf8 (see the "FSINFO_ATTR_NAME_ENCODING"
+    attribute) and whether a codepage is in use to transliterate them (see
+    the "FSINFO_ATTR_NAME_CODEPAGE" attribute).
+
+ *  ``FSINFO_CAP_SPARSE``
+
+    This indicates if a filesystem supports sparse files.
+
+ *  ``FSINFO_CAP_NOT_PERSISTENT``
+
+    This indicates if a filesystem is not persistent.
+
+ *  ``FSINFO_CAP_NO_UNIX_MODE``
+
+    This indicates if a filesystem doesn't support UNIX mode bits (though they
+    may be manufactured from other bits, such as Windows file attribute flags).
+
+ *  ``FSINFO_CAP_HAS_ATIME``
+ *  ``FSINFO_CAP_HAS_BTIME``
+ *  ``FSINFO_CAP_HAS_CTIME``
+ *  ``FSINFO_CAP_HAS_MTIME``
+
+    These indicate which timestamps a filesystem supports (access, birth,
+    change, modify).  The range and resolutions can be queried with the
+    "FSINFO_ATTR_TIMESTAMP_INFO" attribute.
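Since the capability flags are packed eight to a byte (capabilities 0-7 in byte 0, 8-15 in byte 1, and so on), testing one from a raw byte array could be sketched like this. The bit order within each byte is an assumption (lowest-numbered capability in the least significant bit), as are the helper names; the patch's own fsinfo_set_cap() and fsinfo_clear_cap() operate on struct fsinfo_capabilities instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test capability number cap in a packed bitmap of nbytes bytes. */
bool sketch_test_cap(const uint8_t *caps, size_t nbytes, unsigned int cap)
{
	/* Bits beyond what was returned read as false. */
	if (cap / 8 >= nbytes)
		return false;
	return (caps[cap / 8] & (1u << (cap % 8))) != 0;
}

/* Set capability number cap, ignoring out-of-range requests. */
void sketch_set_cap(uint8_t *caps, size_t nbytes, unsigned int cap)
{
	if (cap / 8 < nbytes)
		caps[cap / 8] |= (uint8_t)(1u << (cap % 8));
}
```

Treating out-of-range bits as false matches the documented behaviour that capability bits the kernel does not support are reported as false.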