Message ID | 20190620205043.64350-2-ebiggers@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | fs-verity: read-only file-based authenticity protection |
On 06/20, Eric Biggers wrote: > From: Eric Biggers <ebiggers@google.com> > > Add a documentation file for fs-verity, covering: > > - Introduction > - Use cases > - User API > - FS_IOC_ENABLE_VERITY > - FS_IOC_MEASURE_VERITY > - FS_IOC_GETFLAGS > - Accessing verity files > - File measurement computation > - Merkle tree > - fs-verity descriptor > - Built-in signature verification > - Filesystem support > - ext4 > - f2fs > - Implementation details > - Verifying data > - Pagecache > - Block device based filesystems > - Userspace utility > - Tests > - FAQ > > Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> > Signed-off-by: Eric Biggers <ebiggers@google.com> > --- > Documentation/filesystems/fsverity.rst | 710 +++++++++++++++++++++++++ > Documentation/filesystems/index.rst | 1 + > 2 files changed, 711 insertions(+) > create mode 100644 Documentation/filesystems/fsverity.rst > > diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst > new file mode 100644 > index 00000000000000..49524d7ea190e5 > --- /dev/null > +++ b/Documentation/filesystems/fsverity.rst > @@ -0,0 +1,710 @@ > +======================================================= > +fs-verity: read-only file-based authenticity protection > +======================================================= > + > +Introduction > +============ > + > +fs-verity (``fs/verity/``) is a support layer that filesystems can > +hook into to support transparent integrity and authenticity protection > +of read-only files. Currently, it is supported by the ext4 and f2fs > +filesystems. Like fscrypt, not too much filesystem-specific code is > +needed to support fs-verity. > + > +fs-verity is similar to `dm-verity > +<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_ > +but works on files rather than block devices. 
On regular files on > +filesystems supporting fs-verity, userspace can execute an ioctl that > +causes the filesystem to build a Merkle tree for the file and persist > +it to a filesystem-specific location associated with the file. > + > +After this, the file is made readonly, and all reads from the file are > +automatically verified against the file's Merkle tree. Reads of any > +corrupted data, including mmap reads, will fail. > + > +Userspace can use another ioctl to retrieve the root hash (actually > +the "file measurement", which is a hash that includes the root hash) > +that fs-verity is enforcing for the file. This ioctl executes in > +constant time, regardless of the file size. > + > +fs-verity is essentially a way to hash a file in constant time, > +subject to the caveat that reads which would violate the hash will > +fail at runtime. > + > +Use cases > +========= > + > +By itself, the base fs-verity feature only provides integrity > +protection, i.e. detection of accidental (non-malicious) corruption. > + > +However, because fs-verity makes retrieving the file hash extremely > +efficient, it's primarily meant to be used as a tool to support > +authentication (detection of malicious modifications) or auditing > +(logging file hashes before use). > + > +Trusted userspace code (e.g. operating system code running on a > +read-only partition that is itself authenticated by dm-verity) can > +authenticate the contents of an fs-verity file by using the > +`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a > +digital signature of it. > + > +A standard file hash could be used instead of fs-verity. However, > +this is inefficient if the file is large and only a small portion may > +be accessed. This is often the case for Android application package > +(APK) files, for example. These typically contain many translations, > +classes, and other resources that are infrequently or even never > +accessed on a particular device. 
It would be slow and wasteful to > +read and hash the entire file before starting the application. > + > +Unlike an ahead-of-time hash, fs-verity also re-verifies data each > +time it's paged in. This ensures that malicious disk firmware can't > +undetectably change the contents of the file at runtime. > + > +fs-verity does not replace or obsolete dm-verity. dm-verity should > +still be used on read-only filesystems. fs-verity is for files that > +must live on a read-write filesystem because they are independently > +updated and potentially user-installed, so dm-verity cannot be used. > + > +The base fs-verity feature is a hashing mechanism only; actually > +authenticating the files is up to userspace. However, to meet some > +users' needs, fs-verity optionally supports a simple signature > +verification mechanism where users can configure the kernel to require > +that all fs-verity files be signed by a key loaded into a keyring; see > +`Built-in signature verification`_. Support for fs-verity file hashes > +in IMA (Integrity Measurement Architecture) policies is also planned. > + > +User API > +======== > + > +FS_IOC_ENABLE_VERITY > +-------------------- > + > +The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes > +in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as > +follows:: > + > + struct fsverity_enable_arg { > + __u32 version; > + __u32 hash_algorithm; > + __u32 block_size; > + __u32 salt_size; > + __u64 salt_ptr; > + __u32 sig_size; > + __u32 __reserved1; > + __u64 sig_ptr; > + __u64 __reserved2[11]; > + }; > + > +This structure contains the parameters of the Merkle tree to build for > +the file, and optionally contains a signature. It must be initialized > +as follows: > + > +- ``version`` must be 1. > +- ``hash_algorithm`` must be the identifier for the hash algorithm to > + use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See > + ``include/uapi/linux/fsverity.h`` for the list of possible values. 
> +- ``block_size`` must be the Merkle tree block size. Currently, this > + must be equal to the system page size, which is usually 4096 bytes. > + Other sizes may be supported in the future. This value is not > + necessarily the same as the filesystem block size. > +- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is > + provided. The salt is a value that is prepended to every hashed > + block; it can be used to personalize the hashing for a particular > + file or device. Currently the maximum salt size is 32 bytes. > +- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is > + provided. > +- ``sig_size`` is the size of the signature in bytes, or 0 if no > + signature is provided. Currently the signature is (somewhat > + arbitrarily) limited to 16128 bytes. See `Built-in signature > + verification`_ for more information. > +- ``sig_ptr`` is the pointer to the signature, or NULL if no > + signature is provided. > +- All reserved fields must be zeroed. > + > +FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for > +the file and persist it to a filesystem-specific location associated > +with the file, then mark the file as a verity file. This ioctl may > +take a long time to execute on large files, and it is interruptible by > +fatal signals. > + > +FS_IOC_ENABLE_VERITY checks for write access to the inode. However, > +it must be executed on an O_RDONLY file descriptor and no processes > +can have the file open for writing. Attempts to open the file for > +writing while this ioctl is executing will fail with ETXTBSY. (This > +is necessary to guarantee that no writable file descriptors will exist > +after verity is enabled, and to guarantee that the file's contents are > +stable while the Merkle tree is being built over it.) > + > +On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a > +verity file. On failure (including the case of interruption by a > +fatal signal), no changes are made to the file. 
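A minimal sketch of the initialization rules quoted above, in C. To compile without kernel headers it uses a local copy of the structure under the hypothetical name `enable_arg_sketch`; real code would use `struct fsverity_enable_arg` from `<linux/fsverity.h>` and pass it to `ioctl(fd, FS_IOC_ENABLE_VERITY, &arg)` on an O_RDONLY file descriptor. The helper name `init_enable_arg()` and the numeric value 1 for SHA-256 are assumptions and should be checked against `include/uapi/linux/fsverity.h`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the layout quoted above (hypothetical name); real code
 * should use struct fsverity_enable_arg from <linux/fsverity.h>. */
struct enable_arg_sketch {
	uint32_t version;
	uint32_t hash_algorithm;
	uint32_t block_size;
	uint32_t salt_size;
	uint64_t salt_ptr;
	uint32_t sig_size;
	uint32_t reserved1;
	uint64_t sig_ptr;
	uint64_t reserved2[11];
};

#define SKETCH_HASH_ALG_SHA256 1     /* assumed value of FS_VERITY_HASH_ALG_SHA256 */
#define SKETCH_MAX_SALT_SIZE   32    /* documented maximum salt size */
#define SKETCH_MAX_SIG_SIZE    16128 /* documented maximum signature size */

/* Fill *arg as the text requires. Returns 0, or -EMSGSIZE if the salt or
 * signature is longer than the documented limits. */
int init_enable_arg(struct enable_arg_sketch *arg, uint32_t block_size,
		    const void *salt, uint32_t salt_size,
		    const void *sig, uint32_t sig_size)
{
	assert(arg != NULL);
	if (salt_size > SKETCH_MAX_SALT_SIZE || sig_size > SKETCH_MAX_SIG_SIZE)
		return -EMSGSIZE;

	memset(arg, 0, sizeof(*arg));   /* also zeroes every reserved field */
	arg->version = 1;
	arg->hash_algorithm = SKETCH_HASH_ALG_SHA256;
	arg->block_size = block_size;   /* currently must equal the page size */
	arg->salt_size = salt_size;
	arg->salt_ptr = salt_size ? (uint64_t)(uintptr_t)salt : 0;
	arg->sig_size = sig_size;
	arg->sig_ptr = sig_size ? (uint64_t)(uintptr_t)sig : 0;
	return 0;
}
```

Zeroing the whole structure first makes it hard to forget a reserved field, which would otherwise be rejected with EINVAL according to the error list that follows.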
> + > +FS_IOC_ENABLE_VERITY can fail with the following errors: > + > +- ``EACCES``: the process does not have write access to the file > +- ``EEXIST``: the file already has verity enabled > +- ``EFAULT``: the caller provided inaccessible memory > +- ``EINTR``: the operation was interrupted by a fatal signal > +- ``EINVAL``: unsupported version, hash algorithm, or block size; or > + reserved bits are set; or the file descriptor refers to neither a > + regular file nor a directory. > +- ``EISDIR``: the file descriptor refers to a directory > +- ``EMSGSIZE``: the salt or signature is too long > +- ``ENOENT``: fs-verity recognizes the hash algorithm, but it's not > + available in the kernel's crypto API as currently configured (e.g. > + for SHA-512, missing CONFIG_CRYPTO_SHA512). > +- ``ENOTTY``: this type of filesystem does not implement fs-verity > +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity > + support; or the filesystem superblock has not had the 'verity' > + feature enabled on it; or the filesystem does not support fs-verity > + on this file. (See `Filesystem support`_.) > +- ``EPERM``: the file is append-only > +- ``EROFS``: the filesystem is read-only > +- ``ETXTBSY``: someone has the file open for writing. This can be the > + caller's file descriptor, another open file descriptor, or the file > + reference held by a writable memory map. > + > +FS_IOC_MEASURE_VERITY > +--------------------- > + > +The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity > +file. The file measurement is a digest that cryptographically > +identifies the file contents that are being enforced on reads. > + > +This ioctl takes in a pointer to a variable-length structure:: > + > + struct fsverity_digest { > + __u16 digest_algorithm; > + __u16 digest_size; /* input/output */ > + __u8 digest[]; > + }; > + > +``digest_size`` is an input/output field. 
On input, it must be > +initialized to the number of bytes allocated for the variable-length > +``digest`` field. > + > +On success, 0 is returned and the kernel fills in the structure as > +follows: > + > +- ``digest_algorithm`` will be the hash algorithm used for the file > + measurement. It will match ``fsverity_enable_arg::hash_algorithm``. > +- ``digest_size`` will be the size of the digest in bytes, e.g. 32 > + for SHA-256. (This can be redundant with ``digest_algorithm``.) > +- ``digest`` will be the actual bytes of the digest. > + > +FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time, > +regardless of the size of the file. > + > +FS_IOC_MEASURE_VERITY can fail with the following errors: > + > +- ``EFAULT``: the caller provided inaccessible memory > +- ``ENODATA``: the file is not a verity file > +- ``ENOTTY``: this type of filesystem does not implement fs-verity > +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity > + support, or the filesystem superblock has not had the 'verity' > + feature enabled on it. (See `Filesystem support`_.) > +- ``EOVERFLOW``: the digest is longer than the specified > + ``digest_size`` bytes. Try providing a larger buffer. > + > +FS_IOC_GETFLAGS > +--------------- > + > +The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity) > +can also be used to check whether a file has fs-verity enabled or not. > +To do so, check for FS_VERITY_FL (0x00100000) in the returned flags. > + > +The verity flag is not settable via FS_IOC_SETFLAGS. You must use > +FS_IOC_ENABLE_VERITY instead, since parameters must be provided. > + > +Accessing verity files > +====================== > + > +Applications can transparently access a verity file just like a > +non-verity one, with the following exceptions: > + > +- Verity files are readonly. They cannot be opened for writing or > + truncate()d, even if the file mode bits allow it. Attempts to do > + one of these things will fail with EPERM. 
However, changes to > + metadata such as owner, mode, timestamps, and xattrs are still > + allowed, since these are not measured by fs-verity. Verity files > + can also still be renamed, deleted, and linked to. > + > +- Direct I/O is not supported on verity files. Attempts to use direct > + I/O on such files will fall back to buffered I/O. > + > +- DAX (Direct Access) is not supported on verity files, because this > + would circumvent the data verification. > + > +- Reads of data that doesn't match the verity Merkle tree will fail > + with EIO (for read()) or SIGBUS (for mmap() reads). > + > +- If the sysctl "fs.verity.require_signatures" is set to 1 and the > + file's verity measurement is not signed by a key in the fs-verity > + keyring, then opening the file will fail. See `Built-in signature > + verification`_. > + > +Direct access to the Merkle tree is not supported. Therefore, if a > +verity file is copied, or is backed up and restored, then it will lose > +its "verity"-ness. fs-verity is primarily meant for files like > +executables that are managed by a package manager. > + > +File measurement computation > +============================ > + > +This section describes how fs-verity hashes the file contents using a > +Merkle tree to produce the "file measurement" which cryptographically > +identifies the file contents. This algorithm is the same for all > +filesystems that support fs-verity. > + > +Userspace only needs to be aware of this algorithm if it needs to > +compute the file measurement itself, e.g. in order to sign the file. > + > +Merkle tree > +----------- > + > +The file contents is divided into blocks, where the block size is > +configurable but is usually 4096 bytes. The end of the last block is > +zero-padded if needed. Each block is then hashed, producing the first > +level of hashes. 
Then, the hashes in this first level are grouped > +into 'blocksize'-byte blocks (zero-padding the ends as needed) and > +these blocks are hashed, producing the second level of hashes. This > +proceeds up the tree until only a single block remains. The hash of > +this block is the "Merkle tree root hash". > + > +If the file is nonempty and fits in one block, then the "Merkle tree > +root hash" is simply the hash of the single data block. If the file > +is empty, then the "Merkle tree root hash" is all zeroes. > + > +The "blocks" here are not necessarily the same as "filesystem blocks". > + > +If a salt was specified, then it's zero-padded to the closest multiple > +of the input size of the hash algorithm's compression function, e.g. > +64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is > +prepended to every data or Merkle tree block that is hashed. > + > +The purpose of the block padding is to cause every hash to be taken > +over the same amount of data, which simplifies the implementation and > +keeps open more possibilities for hardware acceleration. The purpose > +of the salt padding is to make the salting "free" when the salted hash > +state is precomputed, then imported for each hash. > + > +Example: in the recommended configuration of SHA-256 and 4K blocks, > +128 hash values fit in each block. Thus, each level of the Merkle > +tree is approximately 128 times smaller than the previous, and for > +large files the Merkle tree's size converges to approximately 1/127 of > +the original file size. However, for small files, the padding is > +significant, making the space overhead proportionally more. > + > +fs-verity descriptor > +-------------------- > + > +By itself, the Merkle tree root hash is ambiguous. For example, it > +can't distinguish a large file from a small second file whose data > +is exactly the top-level hash block of the first file. Ambiguities > +also arise from the convention of padding to the next block boundary.
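Going back to the Merkle tree layout and the "1/127" example quoted above, the space overhead can be estimated with a few lines of C. This is a sketch of my reading of the text, not code taken from the patch; the function name `merkle_tree_blocks()` is hypothetical, and it ignores overflow for file sizes close to 2^64.

```c
#include <stdint.h>

/*
 * Number of Merkle tree blocks for a file of data_size bytes, following
 * the level-by-level description: hash the data blocks, pack the hashes
 * into blocks, and repeat until a single block remains.
 */
uint64_t merkle_tree_blocks(uint64_t data_size, uint32_t block_size,
			    uint32_t digest_size)
{
	uint64_t hashes_per_block = block_size / digest_size;
	/* Number of data blocks, rounding the last partial block up. */
	uint64_t blocks = (data_size + block_size - 1) / block_size;
	uint64_t total = 0;

	/* An empty or single-block file needs no tree blocks at all. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
	}
	return total;
}
```

With SHA-256 (32-byte digests) and 4096-byte blocks, a 1 GiB file has 262144 data blocks and needs 2048 + 16 + 1 = 2065 tree blocks, which is about 1/127 of the data. A file of 4097 bytes needs one full 4096-byte tree block, so the relative overhead is far larger, as the quoted text notes.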
> + > +To solve this problem, the verity file measurement is actually > +computed as a hash of the following structure, which contains the > +Merkle tree root hash as well as other fields such as the file size:: > + > + struct fsverity_descriptor { > + __u8 version; /* must be 1 */ > + __u8 hash_algorithm; /* Merkle tree hash algorithm */ > + __u8 log_blocksize; /* log2 of size of data and tree blocks */ > + __u8 salt_size; /* size of salt in bytes; 0 if none */ > + __le32 sig_size; /* must be 0 */ > + __le64 data_size; /* size of file the Merkle tree is built over */ > + __u8 root_hash[64]; /* Merkle tree root hash */ > + __u8 salt[32]; /* salt prepended to each hashed block */ > + __u8 __reserved[144]; /* must be 0's */ > + }; > + > +Note that the ``sig_size`` field must be set to 0 for the purpose of > +computing the file measurement, even if a signature was provided (or > +will be provided) to `FS_IOC_ENABLE_VERITY`_. > + > +Built-in signature verification > +=============================== > + > +With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting > +a portion of an authentication policy (see `Use cases`_) in the > +kernel. Specifically, it adds support for: > + > +1. At fs-verity module initialization time, a keyring ".fs-verity" is > + created. The root user can add trusted X.509 certificates to this > + keyring using the add_key() system call, then (when done) > + optionally use keyctl_restrict_keyring() to prevent additional > + certificates from being added. > + > +2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted > + signature in DER format of the file measurement. On success, this > + signature is persisted alongside the Merkle tree. Then, any time > + the file is opened, the kernel will verify this signature against > + the certificates in the ".fs-verity" keyring, and verify that it > + matches the actual file measurement. > + > +3. A new sysctl "fs.verity.require_signatures" is made available. 
> + When set to 1, the kernel requires that all verity files have a > + correctly signed file measurement as described in (2). > + > +File measurements must be signed in the following format, which is > +similar to the structure used by `FS_IOC_MEASURE_VERITY`_:: > + > + struct fsverity_signed_digest { > + char magic[8]; /* must be "FSVerity" */ > + __le16 digest_algorithm; > + __le16 digest_size; > + __u8 digest[]; > + }; > + > +fs-verity's built-in signature verification support is meant as a > +relatively simple mechanism that can be used to provide some level of > +authenticity protection for verity files, as an alternative to doing > +the signature verification in userspace or using IMA-appraisal. > +However, with this mechanism, userspace programs still need to check > +that the verity bit is set, and there is no protection against verity > +files being swapped around. > + > +Filesystem support > +================== > + > +fs-verity is currently supported by the ext4 and f2fs filesystems. > +The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity > +on either filesystem. > + > +``include/linux/fsverity.h`` declares the interface between the > +``fs/verity/`` support layer and filesystems. Briefly, filesystems > +must provide an ``fsverity_operations`` structure that provides > +methods to read and write the verity metadata to a filesystem-specific > +location, including the Merkle tree blocks and > +``fsverity_descriptor``. Filesystems must also call functions in > +``fs/verity/`` at certain times, such as when a file is opened or when > +pages have been read into the pagecache. (See `Verifying data`_.) > + > +ext4 > +---- > + > +ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2. > + > +To create verity files on an ext4 filesystem, the filesystem must have > +been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on > +it. 
"verity" is an RO_COMPAT filesystem feature, so once set, old > +kernels will only be able to mount the filesystem readonly, and old > +versions of e2fsck will be unable to check the filesystem. Moreover, > +currently ext4 only supports mounting a filesystem with the "verity" > +feature when its block size is equal to PAGE_SIZE (often 4096 bytes). > + > +ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It > +can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared. > + > +ext4 also supports encryption, which can be used simultaneously with > +fs-verity. In this case, the plaintext data is verified rather than > +the ciphertext. This is necessary in order to make the file > +measurement meaningful, since every file is encrypted differently. > + > +ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) > +past the end of the file, starting at the first 64K boundary beyond > +i_size. This approach works because (a) verity files are readonly, > +and (b) pages fully beyond i_size aren't visible to userspace but can > +be read/written internally by ext4 with only some relatively small > +changes to ext4. This approach avoids having to depend on the > +EA_INODE feature and on rearchitecturing ext4's xattr support to > +support paging multi-gigabyte xattrs into memory, and to support > +encrypting xattrs. Note that the verity metadata *must* be encrypted > +when the file is, since it contains hashes of the plaintext data. > + > +Currently, ext4 verity only supports the case where the Merkle tree > +block size, filesystem block size, and page size are all the same. It > +also only supports extent-based files. > + > +f2fs > +---- > + > +f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0. > + > +To create verity files on an f2fs filesystem, the filesystem must have > +been formatted with ``-O verity``. > + > +f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files. 
> +It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be > +cleared. > + > +Like ext4, f2fs stores the verity metadata (Merkle tree and > +fsverity_descriptor) past the end of the file, starting at the first > +64K boundary beyond i_size. See explanation for ext4 above. > +Moreover, f2fs supports at most 4096 bytes of xattr entries per inode > +which wouldn't be enough for even a single Merkle tree block. > + > +Currently, f2fs verity only supports a Merkle tree block size of 4096. > + > +Implementation details > +====================== > + > +Verifying data > +-------------- > + > +fs-verity ensures that all reads of a verity file's data are verified, > +regardless of which syscall is used to do the read (e.g. mmap(), > +read(), pread()) and regardless of whether it's the first read or a > +later read (unless the later read can return cached data that was > +already verified). Below, we describe how filesystems implement this. > + > +Pagecache > +~~~~~~~~~ > + > +For filesystems using Linux's pagecache, the ``->readpage()`` and > +``->readpages()`` methods must be modified to verify pages before they > +are marked Uptodate. Merely hooking ``->read_iter()`` would be > +insufficient, since ``->read_iter()`` is not used for memory maps. > + > +Therefore, fs/verity/ provides a function fsverity_verify_page() which > +verifies a page that has been read into the pagecache of a verity > +inode, but is still locked and not Uptodate, so it's not yet readable > +by userspace. As needed to do the verification, > +fsverity_verify_page() will call back into the filesystem to read > +Merkle tree pages via fsverity_operations::read_merkle_tree_page(). > + > +fsverity_verify_page() returns false if verification failed; in this > +case, the filesystem must not set the page Uptodate. 
Following this, > +as per the usual Linux pagecache behavior, attempts by userspace to > +read() from the part of the file containing the page will fail with > +EIO, and accesses to the page within a memory map will raise SIGBUS. > + > +fsverity_verify_page() currently only supports the case where the > +Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes). > + > +In principle, fsverity_verify_page() verifies the entire path in the > +Merkle tree from the data page to the root hash. However, for > +efficiency the filesystem may cache the hash pages. Therefore, > +fsverity_verify_page() only ascends the tree reading hash pages until > +an already-verified hash page is seen, as indicated by the PageChecked > +bit being set. It then verifies the path to that page. > + > +This optimization, which is also used by dm-verity, results in > +excellent sequential read performance. This is because usually (e.g. > +127 in 128 times for 4K blocks and SHA-256) the hash page from the > +bottom level of the tree will already be cached and checked from > +reading a previous data page. However, random reads perform worse. > + > +Block device based filesystems > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +Block device based filesystems (e.g. ext4 and f2fs) in Linux also use > +the pagecache, so the above subsection applies too. However, they > +also usually read many pages from a file at once, grouped into a > +structure called a "bio". To make it easier for these types of > +filesystems to support fs-verity, fs/verity/ also provides a function > +fsverity_verify_bio() which verifies all pages in a bio. > + > +ext4 and f2fs also support encryption. If a verity file is also > +encrypted, the pages must be decrypted before being verified. 
To > +support this, these filesystems allocate a "post-read context" for > +each bio and store it in ``->bi_private``:: > + > + struct bio_post_read_ctx { > + struct bio *bio; > + struct work_struct work; > + unsigned int cur_step; > + unsigned int enabled_steps; > + }; > + > +``enabled_steps`` is a bitmask that specifies whether decryption, > +verity, or both is enabled. After the bio completes, for each needed > +postprocessing step the filesystem enqueues the bio_post_read_ctx on a > +workqueue, and then the workqueue work does the decryption or > +verification. Finally, pages where no decryption or verity error > +occurred are marked Uptodate, and the pages are unlocked. > + > +Files on ext4 and f2fs may contain holes. Normally, ``->readpages()`` > +simply zeroes holes and sets the corresponding pages Uptodate; no bios > +are issued. To prevent this case from bypassing fs-verity, these > +filesystems use fsverity_verify_page() to verify hole pages. > + > +ext4 and f2fs disable direct I/O on verity files, since otherwise > +direct I/O would bypass fs-verity. (They also do the same for > +encrypted files.) > + > +Userspace utility > +================= > + > +This document focuses on the kernel, but a userspace utility for > +fs-verity can be found at: > + > + https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git > + > +See the README.md file in the fsverity-utils source tree for details, > +including examples of setting up fs-verity protected files. > + > +Tests > +===== > + > +To test fs-verity, use xfstests. For example, using `kvm-xfstests > +<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_:: > + > + kvm-xfstests -c ext4,f2fs -g verity > + > +FAQ > +=== > + > +This section answers frequently asked questions about fs-verity that > +weren't already directly answered in other parts of this document. > + > +:Q: Why isn't fs-verity part of IMA? 
> +:A: fs-verity and IMA (Integrity Measurement Architecture) have > + different focuses. fs-verity is a filesystem-level mechanism for > + hashing individual files using a Merkle tree. In contrast, IMA > + specifies a system-wide policy that specifies which files are > + hashed and what to do with those hashes, such as log them, > + authenticate them, or add them to a measurement list. > + > + IMA is planned to support the fs-verity hashing mechanism as an > + alternative to doing full file hashes, for people who want the > + performance and security benefits of the Merkle tree based hash. > + But it doesn't make sense to force all uses of fs-verity to be > + through IMA. As a standalone filesystem feature, fs-verity > + already meets many users' needs, and it's testable like other > + filesystem features e.g. with xfstests. > + > +:Q: Isn't fs-verity useless because the attacker can just modify the > + hashes in the Merkle tree, which is stored on-disk? > +:A: To verify the authenticity of an fs-verity file you must verify > + the authenticity of the "file measurement", which is basically the > + root hash of the Merkle tree. See `Use cases`_. > + > +:Q: Isn't fs-verity useless because the attacker can just replace a > + verity file with a non-verity one? > +:A: See `Use cases`_. In the initial use case, it's really trusted > + userspace code that authenticates the files; fs-verity is just a > + tool to do this job efficiently and securely. The trusted > + userspace code will consider non-verity files to be inauthentic. > + > +:Q: Why does the Merkle tree need to be stored on-disk? Couldn't you > + store just the root hash? > +:A: If the Merkle tree wasn't stored on-disk, then you'd have to > + compute the entire tree when the file is first accessed, even if > + just one byte is being read. This is a fundamental consequence of > + how Merkle tree hashing works. 
To verify a leaf node, you need to > + verify the whole path to the root hash, including the root node > + (the thing which the root hash is a hash of). But if the root > + node isn't stored on-disk, you have to compute it by hashing its > + children, and so on until you've actually hashed the entire file. > + > + That defeats most of the point of doing a Merkle tree-based hash, > + since if you have to hash the whole file ahead of time anyway, > + then you could simply do sha256(file) instead. That would be much > + simpler, and a bit faster too. > + > + It's true that an in-memory Merkle tree could still provide the > + advantage of verification on every read rather than just on the > + first read. However, it would be inefficient because every time a > + hash page gets evicted (you can't pin the entire Merkle tree into > + memory, since it may be very large), in order to restore it you > + again need to hash everything below it in the tree. This again > + defeats most of the point of doing a Merkle tree-based hash, since > + a single block read could trigger re-hashing gigabytes of data. > + > +:Q: But couldn't you store just the leaf nodes and compute the rest? > +:A: See previous answer; this really just moves up one level, since > + one could alternatively interpret the data blocks as being the > + leaf nodes of the Merkle tree. It's true that the tree can be > + computed much faster if the leaf level is stored rather than just > + the data, but that's only because each level is less than 1% the > + size of the level below (assuming the recommended settings of > + SHA-256 and 4K blocks). For the exact same reason, by storing > + "just the leaf nodes" you'd already be storing over 99% of the > + tree, so you might as well simply store the whole tree. > + > +:Q: Can the Merkle tree be built ahead of time, e.g. distributed as > + part of a package that is installed to many computers? > +:A: This isn't currently supported. 
It was part of the original > + design, but was removed to simplify the kernel UAPI and because it > + wasn't a critical use case. Files are usually installed once and > + used many times, and cryptographic hashing is somewhat fast on > + most modern processors. > + > +:Q: Why doesn't fs-verity support writes? > +:A: Write support would be very difficult and would require a > + completely different design, so it's well outside the scope of > + fs-verity. Write support would require: > + > + - A way to maintain consistency between the data and hashes, > + including all levels of hashes, since corruption after a crash > + (especially of potentially the entire file!) is unacceptable. > + The main options for solving this are data journalling, > + copy-on-write, and log-structured volume. But it's very hard to > + retrofit existing filesystems with new consistency mechanisms. > + Data journalling is available on ext4, but is very slow. > + > + - Rebuilding the Merkle tree after every write, which would be > + extremely inefficient. Alternatively, a different authenticated > + dictionary structure such as an "authenticated skiplist" could > + be used. However, this would be far more complex. > + > + Compare it to dm-verity vs. dm-integrity. dm-verity is very > + simple: the kernel just verifies read-only data against a > + read-only Merkle tree. In contrast, dm-integrity supports writes > + but is slow, is much more complex, and doesn't actually support > + full-device authentication since it authenticates each sector > + independently, i.e. there is no "root hash". It doesn't really > + make sense for the same device-mapper target to support these two > + very different cases; the same applies to fs-verity. > + > +:Q: Since verity files are immutable, why isn't the immutable bit set?
> +:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a > + specific set of semantics which not only make the file contents > + read-only, but also prevent the file from being deleted, renamed, > + linked to, or having its owner or mode changed. These extra > + properties are unwanted for fs-verity, so reusing the immutable > + bit isn't appropriate. > + > +:Q: Why does the API use ioctls instead of setxattr() and getxattr()? > +:A: Abusing the xattr interface for basically arbitrary syscalls is > + heavily frowned upon by most of the Linux filesystem developers. > + An xattr should really just be an xattr on-disk, not an API to > + e.g. magically trigger construction of a Merkle tree. > + > +:Q: Does fs-verity support remote filesystems? > +:A: Only ext4 and f2fs support is implemented currently, but in > + principle any filesystem that can store per-file verity metadata > + can support fs-verity, regardless of whether it's local or remote. > + Some filesystems may have fewer options of where to store the > + verity metadata; one possibility is to store it past the end of > + the file and "hide" it from userspace by manipulating i_size. The > + data verification functions provided by ``fs/verity/`` also assume > + that the filesystem uses the Linux pagecache, but both local and > + remote filesystems normally do so. > + > +:Q: Why is anything filesystem-specific at all? Shouldn't fs-verity > + be implemented entirely at the VFS level? > +:A: There are many reasons why this is not possible or would be very > + difficult, including the following: > + > + - To prevent bypassing verification, pages must not be marked > + Uptodate until they've been verified. Currently, each > + filesystem is responsible for marking pages Uptodate via > + ``->readpages()``. Therefore, currently it's not possible for > + the VFS to do the verification on its own. Changing this would > + require significant changes to the VFS and all filesystems. 
> + > + - It would require defining a filesystem-independent way to store > + the verity metadata. Extended attributes don't work for this > + because (a) the Merkle tree may be gigabytes, but many > + filesystems assume that all xattrs fit into a single 4K > + filesystem block, and (b) ext4 and f2fs encryption doesn't > + encrypt xattrs, yet the Merkle tree *must* be encrypted when the > + file contents are, because it stores hashes of the plaintext > + file contents. > + > + So the verity metadata would have to be stored in an actual > + file. Using a separate file would be very ugly, since the > + metadata is fundamentally part of the file to be protected, and > + it could cause problems where users could delete the real file > + but not the metadata file or vice versa. On the other hand, > + having it be in the same file would break applications unless > + filesystems' notion of i_size were divorced from the VFS's, > + which would be complex and require changes to all filesystems. > + > + - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's > + transaction mechanism so that either the file ends up with > + verity enabled, or no changes were made. Allowing intermediate > + states to occur after a crash may cause problems. > diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst > index 1131c34d77f6f1..416c7f0e123af7 100644 > --- a/Documentation/filesystems/index.rst > +++ b/Documentation/filesystems/index.rst > @@ -31,6 +31,7 @@ filesystem implementations. > > journalling > fscrypt > + fsverity > > Filesystem-specific documentation > ================================= > -- > 2.22.0.410.gd8fdbe21b5-goog
diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst new file mode 100644 index 00000000000000..49524d7ea190e5 --- /dev/null +++ b/Documentation/filesystems/fsverity.rst @@ -0,0 +1,710 @@ +======================================================= +fs-verity: read-only file-based authenticity protection +======================================================= + +Introduction +============ + +fs-verity (``fs/verity/``) is a support layer that filesystems can +hook into to support transparent integrity and authenticity protection +of read-only files. Currently, it is supported by the ext4 and f2fs +filesystems. Like fscrypt, not too much filesystem-specific code is +needed to support fs-verity. + +fs-verity is similar to `dm-verity +<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_ +but works on files rather than block devices. On regular files on +filesystems supporting fs-verity, userspace can execute an ioctl that +causes the filesystem to build a Merkle tree for the file and persist +it to a filesystem-specific location associated with the file. + +After this, the file is made readonly, and all reads from the file are +automatically verified against the file's Merkle tree. Reads of any +corrupted data, including mmap reads, will fail. + +Userspace can use another ioctl to retrieve the root hash (actually +the "file measurement", which is a hash that includes the root hash) +that fs-verity is enforcing for the file. This ioctl executes in +constant time, regardless of the file size. + +fs-verity is essentially a way to hash a file in constant time, +subject to the caveat that reads which would violate the hash will +fail at runtime. + +Use cases +========= + +By itself, the base fs-verity feature only provides integrity +protection, i.e. detection of accidental (non-malicious) corruption. 
+ +However, because fs-verity makes retrieving the file hash extremely +efficient, it's primarily meant to be used as a tool to support +authentication (detection of malicious modifications) or auditing +(logging file hashes before use). + +Trusted userspace code (e.g. operating system code running on a +read-only partition that is itself authenticated by dm-verity) can +authenticate the contents of an fs-verity file by using the +`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a +digital signature of it. + +A standard file hash could be used instead of fs-verity. However, +this is inefficient if the file is large and only a small portion may +be accessed. This is often the case for Android application package +(APK) files, for example. These typically contain many translations, +classes, and other resources that are infrequently or even never +accessed on a particular device. It would be slow and wasteful to +read and hash the entire file before starting the application. + +Unlike an ahead-of-time hash, fs-verity also re-verifies data each +time it's paged in. This ensures that malicious disk firmware can't +undetectably change the contents of the file at runtime. + +fs-verity does not replace or obsolete dm-verity. dm-verity should +still be used on read-only filesystems. fs-verity is for files that +must live on a read-write filesystem because they are independently +updated and potentially user-installed, so dm-verity cannot be used. + +The base fs-verity feature is a hashing mechanism only; actually +authenticating the files is up to userspace. However, to meet some +users' needs, fs-verity optionally supports a simple signature +verification mechanism where users can configure the kernel to require +that all fs-verity files be signed by a key loaded into a keyring; see +`Built-in signature verification`_. Support for fs-verity file hashes +in IMA (Integrity Measurement Architecture) policies is also planned. 
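As a rough illustration of the trusted-userspace flow described in this section, the sketch below retrieves a file's measurement with the FS_IOC_MEASURE_VERITY ioctl (documented under "User API"). It assumes the UAPI header ``linux/fsverity.h`` is installed. The helper name ``measure_verity()`` and the ``MAX_DIGEST_SIZE`` constant are hypothetical, chosen only for this example. Signature verification of the returned digest is deliberately left out; that policy belongs to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>	/* FS_IOC_MEASURE_VERITY, struct fsverity_digest */

/* Assumption for this sketch: 64 bytes covers the largest digest (SHA-512). */
#define MAX_DIGEST_SIZE 64

/*
 * Hypothetical helper: copy the fs-verity measurement of 'path' into 'out'.
 * Returns the digest length in bytes on success, or a negative errno value.
 * On success, '*alg' receives the hash algorithm identifier.
 */
static int measure_verity(const char *path, uint8_t *out, size_t out_len,
			  uint16_t *alg)
{
	struct fsverity_digest *d;
	int fd, ret;

	assert(path && out && alg);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The structure is variable-length: header plus room for the digest. */
	d = calloc(1, sizeof(*d) + MAX_DIGEST_SIZE);
	if (!d) {
		close(fd);
		return -ENOMEM;
	}
	d->digest_size = MAX_DIGEST_SIZE;	/* input: bytes available */

	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0) {
		/* e.g. ENODATA if the file is not a verity file */
		ret = -errno;
	} else if (d->digest_size > out_len) {
		ret = -EOVERFLOW;
	} else {
		memcpy(out, d->digest, d->digest_size);
		*alg = d->digest_algorithm;
		ret = d->digest_size;		/* output: actual digest size */
	}

	free(d);
	close(fd);
	return ret;
}
```

A caller would treat any negative return value (including ``-ENODATA`` for a non-verity file) as "not authentic", and otherwise check a signature over the returned digest before trusting the file.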
+ +User API +======== + +FS_IOC_ENABLE_VERITY +-------------------- + +The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes +in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as +follows:: + + struct fsverity_enable_arg { + __u32 version; + __u32 hash_algorithm; + __u32 block_size; + __u32 salt_size; + __u64 salt_ptr; + __u32 sig_size; + __u32 __reserved1; + __u64 sig_ptr; + __u64 __reserved2[11]; + }; + +This structure contains the parameters of the Merkle tree to build for +the file, and optionally contains a signature. It must be initialized +as follows: + +- ``version`` must be 1. +- ``hash_algorithm`` must be the identifier for the hash algorithm to + use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See + ``include/uapi/linux/fsverity.h`` for the list of possible values. +- ``block_size`` must be the Merkle tree block size. Currently, this + must be equal to the system page size, which is usually 4096 bytes. + Other sizes may be supported in the future. This value is not + necessarily the same as the filesystem block size. +- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is + provided. The salt is a value that is prepended to every hashed + block; it can be used to personalize the hashing for a particular + file or device. Currently the maximum salt size is 32 bytes. +- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is + provided. +- ``sig_size`` is the size of the signature in bytes, or 0 if no + signature is provided. Currently the signature is (somewhat + arbitrarily) limited to 16128 bytes. See `Built-in signature + verification`_ for more information. +- ``sig_ptr`` is the pointer to the signature, or NULL if no + signature is provided. +- All reserved fields must be zeroed. + +FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for +the file and persist it to a filesystem-specific location associated +with the file, then mark the file as a verity file. 
This ioctl may +take a long time to execute on large files, and it is interruptible by +fatal signals. + +FS_IOC_ENABLE_VERITY checks for write access to the inode. However, +it must be executed on an O_RDONLY file descriptor and no processes +can have the file open for writing. Attempts to open the file for +writing while this ioctl is executing will fail with ETXTBSY. (This +is necessary to guarantee that no writable file descriptors will exist +after verity is enabled, and to guarantee that the file's contents are +stable while the Merkle tree is being built over it.) + +On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a +verity file. On failure (including the case of interruption by a +fatal signal), no changes are made to the file. + +FS_IOC_ENABLE_VERITY can fail with the following errors: + +- ``EACCES``: the process does not have write access to the file +- ``EEXIST``: the file already has verity enabled +- ``EFAULT``: the caller provided inaccessible memory +- ``EINTR``: the operation was interrupted by a fatal signal +- ``EINVAL``: unsupported version, hash algorithm, or block size; or + reserved bits are set; or the file descriptor refers to neither a + regular file nor a directory. +- ``EISDIR``: the file descriptor refers to a directory +- ``EMSGSIZE``: the salt or signature is too long +- ``ENOENT``: fs-verity recognizes the hash algorithm, but it's not + available in the kernel's crypto API as currently configured (e.g. + for SHA-512, missing CONFIG_CRYPTO_SHA512). +- ``ENOTTY``: this type of filesystem does not implement fs-verity +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity + support; or the filesystem superblock has not had the 'verity' + feature enabled on it; or the filesystem does not support fs-verity + on this file. (See `Filesystem support`_.) +- ``EPERM``: the file is append-only +- ``EROFS``: the filesystem is read-only +- ``ETXTBSY``: someone has the file open for writing. 
This can be the + caller's file descriptor, another open file descriptor, or the file + reference held by a writable memory map. + +FS_IOC_MEASURE_VERITY +--------------------- + +The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity +file. The file measurement is a digest that cryptographically +identifies the file contents that are being enforced on reads. + +This ioctl takes in a pointer to a variable-length structure:: + + struct fsverity_digest { + __u16 digest_algorithm; + __u16 digest_size; /* input/output */ + __u8 digest[]; + }; + +``digest_size`` is an input/output field. On input, it must be +initialized to the number of bytes allocated for the variable-length +``digest`` field. + +On success, 0 is returned and the kernel fills in the structure as +follows: + +- ``digest_algorithm`` will be the hash algorithm used for the file + measurement. It will match ``fsverity_enable_arg::hash_algorithm``. +- ``digest_size`` will be the size of the digest in bytes, e.g. 32 + for SHA-256. (This can be redundant with ``digest_algorithm``.) +- ``digest`` will be the actual bytes of the digest. + +FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time, +regardless of the size of the file. + +FS_IOC_MEASURE_VERITY can fail with the following errors: + +- ``EFAULT``: the caller provided inaccessible memory +- ``ENODATA``: the file is not a verity file +- ``ENOTTY``: this type of filesystem does not implement fs-verity +- ``EOPNOTSUPP``: the kernel was not configured with fs-verity + support, or the filesystem superblock has not had the 'verity' + feature enabled on it. (See `Filesystem support`_.) +- ``EOVERFLOW``: the digest is longer than the specified + ``digest_size`` bytes. Try providing a larger buffer. + +FS_IOC_GETFLAGS +--------------- + +The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity) +can also be used to check whether a file has fs-verity enabled or not. 
+To do so, check for FS_VERITY_FL (0x00100000) in the returned flags. + +The verity flag is not settable via FS_IOC_SETFLAGS. You must use +FS_IOC_ENABLE_VERITY instead, since parameters must be provided. + +Accessing verity files +====================== + +Applications can transparently access a verity file just like a +non-verity one, with the following exceptions: + +- Verity files are readonly. They cannot be opened for writing or + truncate()d, even if the file mode bits allow it. Attempts to do + one of these things will fail with EPERM. However, changes to + metadata such as owner, mode, timestamps, and xattrs are still + allowed, since these are not measured by fs-verity. Verity files + can also still be renamed, deleted, and linked to. + +- Direct I/O is not supported on verity files. Attempts to use direct + I/O on such files will fall back to buffered I/O. + +- DAX (Direct Access) is not supported on verity files, because this + would circumvent the data verification. + +- Reads of data that doesn't match the verity Merkle tree will fail + with EIO (for read()) or SIGBUS (for mmap() reads). + +- If the sysctl "fs.verity.require_signatures" is set to 1 and the + file's verity measurement is not signed by a key in the fs-verity + keyring, then opening the file will fail. See `Built-in signature + verification`_. + +Direct access to the Merkle tree is not supported. Therefore, if a +verity file is copied, or is backed up and restored, then it will lose +its "verity"-ness. fs-verity is primarily meant for files like +executables that are managed by a package manager. + +File measurement computation +============================ + +This section describes how fs-verity hashes the file contents using a +Merkle tree to produce the "file measurement" which cryptographically +identifies the file contents. This algorithm is the same for all +filesystems that support fs-verity. 
+ +Userspace only needs to be aware of this algorithm if it needs to +compute the file measurement itself, e.g. in order to sign the file. + +Merkle tree +----------- + +The file contents are divided into blocks, where the block size is +configurable but is usually 4096 bytes. The end of the last block is +zero-padded if needed. Each block is then hashed, producing the first +level of hashes. Then, the hashes in this first level are grouped +into 'blocksize'-byte blocks (zero-padding the ends as needed) and +these blocks are hashed, producing the second level of hashes. This +proceeds up the tree until only a single block remains. The hash of +this block is the "Merkle tree root hash". + +If the file is nonempty and fits in one block, then the "Merkle tree +root hash" is simply the hash of the single data block. If the file +is empty, then the "Merkle tree root hash" is all zeroes. + +The "blocks" here are not necessarily the same as "filesystem blocks". + +If a salt was specified, then it's zero-padded to the closest multiple +of the input size of the hash algorithm's compression function, e.g. +64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is +prepended to every data or Merkle tree block that is hashed. + +The purpose of the block padding is to cause every hash to be taken +over the same amount of data, which simplifies the implementation and +keeps open more possibilities for hardware acceleration. The purpose +of the salt padding is to make the salting "free" when the salted hash +state is precomputed, then imported for each hash. + +Example: in the recommended configuration of SHA-256 and 4K blocks, +128 hash values fit in each block. Thus, each level of the Merkle +tree is approximately 128 times smaller than the previous, and for +large files the Merkle tree's size converges to approximately 1/127 of +the original file size. However, for small files, the padding is +significant, making the space overhead proportionally larger.
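The level structure described above can be checked with a little arithmetic. The following is a minimal sketch, not the kernel's implementation: it only counts hash blocks per level and does no hashing. The names ``struct merkle_geometry`` and ``compute_merkle_geometry()`` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of a Merkle tree's shape (illustration only). */
struct merkle_geometry {
	unsigned int num_levels;	/* levels of hash blocks above the data */
	uint64_t tree_blocks;		/* total hash blocks over all levels */
};

/* Integer division rounding up. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return n / d + (n % d != 0);
}

/*
 * Count the hash blocks needed for a file of 'data_size' bytes, following
 * the description above: hash each block, group the hashes into
 * (zero-padded) blocks, and repeat until a single block remains.
 */
static struct merkle_geometry
compute_merkle_geometry(uint64_t data_size, uint32_t block_size,
			uint32_t digest_size)
{
	struct merkle_geometry g = { 0, 0 };
	uint64_t hashes_per_block, blocks;

	/* Each hash block must hold at least two hashes to make progress. */
	assert(digest_size > 0 && block_size >= 2 * digest_size);

	hashes_per_block = block_size / digest_size;	/* 128 for SHA-256, 4K */
	blocks = div_round_up(data_size, block_size);	/* data blocks */

	/* An empty or single-block file needs no hash blocks at all. */
	while (blocks > 1) {
		blocks = div_round_up(blocks, hashes_per_block);
		g.tree_blocks += blocks;
		g.num_levels++;
	}
	return g;
}
```

For a 1 GiB file with SHA-256 and 4K blocks, this gives 262144 data blocks and 2048 + 16 + 1 = 2065 hash blocks in 3 levels, or about 8.07 MiB. That is close to 1/127 of the file size, since 1/128 + 1/128^2 + ... = 1/127. For a 4097-byte file, by contrast, the two data blocks need one full hash block, so the padded tree is about as large as the file itself.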
+ +fs-verity descriptor +-------------------- + +By itself, the Merkle tree root hash is ambiguous. For example, it +can't distinguish a large file from a small second file whose data +is exactly the top-level hash block of the first file. Ambiguities +also arise from the convention of padding to the next block boundary. + +To solve this problem, the verity file measurement is actually +computed as a hash of the following structure, which contains the +Merkle tree root hash as well as other fields such as the file size:: + + struct fsverity_descriptor { + __u8 version; /* must be 1 */ + __u8 hash_algorithm; /* Merkle tree hash algorithm */ + __u8 log_blocksize; /* log2 of size of data and tree blocks */ + __u8 salt_size; /* size of salt in bytes; 0 if none */ + __le32 sig_size; /* must be 0 */ + __le64 data_size; /* size of file the Merkle tree is built over */ + __u8 root_hash[64]; /* Merkle tree root hash */ + __u8 salt[32]; /* salt prepended to each hashed block */ + __u8 __reserved[144]; /* must be 0's */ + }; + +Note that the ``sig_size`` field must be set to 0 for the purpose of +computing the file measurement, even if a signature was provided (or +will be provided) to `FS_IOC_ENABLE_VERITY`_. + +Built-in signature verification +=============================== + +With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting +a portion of an authentication policy (see `Use cases`_) in the +kernel. Specifically, it adds support for: + +1. At fs-verity module initialization time, a keyring ".fs-verity" is + created. The root user can add trusted X.509 certificates to this + keyring using the add_key() system call, then (when done) + optionally use keyctl_restrict_keyring() to prevent additional + certificates from being added. + +2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted + signature in DER format of the file measurement. On success, this + signature is persisted alongside the Merkle tree.
Then, any time + the file is opened, the kernel will verify this signature against + the certificates in the ".fs-verity" keyring, and verify that it + matches the actual file measurement. + +3. A new sysctl "fs.verity.require_signatures" is made available. + When set to 1, the kernel requires that all verity files have a + correctly signed file measurement as described in (2). + +File measurements must be signed in the following format, which is +similar to the structure used by `FS_IOC_MEASURE_VERITY`_:: + + struct fsverity_signed_digest { + char magic[8]; /* must be "FSVerity" */ + __le16 digest_algorithm; + __le16 digest_size; + __u8 digest[]; + }; + +fs-verity's built-in signature verification support is meant as a +relatively simple mechanism that can be used to provide some level of +authenticity protection for verity files, as an alternative to doing +the signature verification in userspace or using IMA-appraisal. +However, with this mechanism, userspace programs still need to check +that the verity bit is set, and there is no protection against verity +files being swapped around. + +Filesystem support +================== + +fs-verity is currently supported by the ext4 and f2fs filesystems. +The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity +on either filesystem. + +``include/linux/fsverity.h`` declares the interface between the +``fs/verity/`` support layer and filesystems. Briefly, filesystems +must provide an ``fsverity_operations`` structure that provides +methods to read and write the verity metadata to a filesystem-specific +location, including the Merkle tree blocks and +``fsverity_descriptor``. Filesystems must also call functions in +``fs/verity/`` at certain times, such as when a file is opened or when +pages have been read into the pagecache. (See `Verifying data`_.) + +ext4 +---- + +ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2. 
+ +To create verity files on an ext4 filesystem, the filesystem must have +been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on +it. "verity" is an RO_COMPAT filesystem feature, so once set, old +kernels will only be able to mount the filesystem readonly, and old +versions of e2fsck will be unable to check the filesystem. Moreover, +currently ext4 only supports mounting a filesystem with the "verity" +feature when its block size is equal to PAGE_SIZE (often 4096 bytes). + +ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It +can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared. + +ext4 also supports encryption, which can be used simultaneously with +fs-verity. In this case, the plaintext data is verified rather than +the ciphertext. This is necessary in order to make the file +measurement meaningful, since every file is encrypted differently. + +ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) +past the end of the file, starting at the first 64K boundary beyond +i_size. This approach works because (a) verity files are readonly, +and (b) pages fully beyond i_size aren't visible to userspace but can +be read/written internally by ext4 with only some relatively small +changes to ext4. This approach avoids having to depend on the +EA_INODE feature and on rearchitecturing ext4's xattr support to +support paging multi-gigabyte xattrs into memory, and to support +encrypting xattrs. Note that the verity metadata *must* be encrypted +when the file is, since it contains hashes of the plaintext data. + +Currently, ext4 verity only supports the case where the Merkle tree +block size, filesystem block size, and page size are all the same. It +also only supports extent-based files. + +f2fs +---- + +f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0. + +To create verity files on an f2fs filesystem, the filesystem must have +been formatted with ``-O verity``. 
+ +f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files. +It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be +cleared. + +Like ext4, f2fs stores the verity metadata (Merkle tree and +fsverity_descriptor) past the end of the file, starting at the first +64K boundary beyond i_size. See explanation for ext4 above. +Moreover, f2fs supports at most 4096 bytes of xattr entries per inode +which wouldn't be enough for even a single Merkle tree block. + +Currently, f2fs verity only supports a Merkle tree block size of 4096. + +Implementation details +====================== + +Verifying data +-------------- + +fs-verity ensures that all reads of a verity file's data are verified, +regardless of which syscall is used to do the read (e.g. mmap(), +read(), pread()) and regardless of whether it's the first read or a +later read (unless the later read can return cached data that was +already verified). Below, we describe how filesystems implement this. + +Pagecache +~~~~~~~~~ + +For filesystems using Linux's pagecache, the ``->readpage()`` and +``->readpages()`` methods must be modified to verify pages before they +are marked Uptodate. Merely hooking ``->read_iter()`` would be +insufficient, since ``->read_iter()`` is not used for memory maps. + +Therefore, fs/verity/ provides a function fsverity_verify_page() which +verifies a page that has been read into the pagecache of a verity +inode, but is still locked and not Uptodate, so it's not yet readable +by userspace. As needed to do the verification, +fsverity_verify_page() will call back into the filesystem to read +Merkle tree pages via fsverity_operations::read_merkle_tree_page(). + +fsverity_verify_page() returns false if verification failed; in this +case, the filesystem must not set the page Uptodate. 
Following this, +as per the usual Linux pagecache behavior, attempts by userspace to +read() from the part of the file containing the page will fail with +EIO, and accesses to the page within a memory map will raise SIGBUS. + +fsverity_verify_page() currently only supports the case where the +Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes). + +In principle, fsverity_verify_page() verifies the entire path in the +Merkle tree from the data page to the root hash. However, for +efficiency the filesystem may cache the hash pages. Therefore, +fsverity_verify_page() only ascends the tree reading hash pages until +an already-verified hash page is seen, as indicated by the PageChecked +bit being set. It then verifies the path to that page. + +This optimization, which is also used by dm-verity, results in +excellent sequential read performance. This is because usually (e.g. +127 in 128 times for 4K blocks and SHA-256) the hash page from the +bottom level of the tree will already be cached and checked from +reading a previous data page. However, random reads perform worse. + +Block device based filesystems +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Block device based filesystems (e.g. ext4 and f2fs) in Linux also use +the pagecache, so the above subsection applies too. However, they +also usually read many pages from a file at once, grouped into a +structure called a "bio". To make it easier for these types of +filesystems to support fs-verity, fs/verity/ also provides a function +fsverity_verify_bio() which verifies all pages in a bio. + +ext4 and f2fs also support encryption. If a verity file is also +encrypted, the pages must be decrypted before being verified. 
To +support this, these filesystems allocate a "post-read context" for +each bio and store it in ``->bi_private``:: + + struct bio_post_read_ctx { + struct bio *bio; + struct work_struct work; + unsigned int cur_step; + unsigned int enabled_steps; + }; + +``enabled_steps`` is a bitmask that specifies whether decryption, +verity, or both is enabled. After the bio completes, for each needed +postprocessing step the filesystem enqueues the bio_post_read_ctx on a +workqueue, and then the workqueue work does the decryption or +verification. Finally, pages where no decryption or verity error +occurred are marked Uptodate, and the pages are unlocked. + +Files on ext4 and f2fs may contain holes. Normally, ``->readpages()`` +simply zeroes holes and sets the corresponding pages Uptodate; no bios +are issued. To prevent this case from bypassing fs-verity, these +filesystems use fsverity_verify_page() to verify hole pages. + +ext4 and f2fs disable direct I/O on verity files, since otherwise +direct I/O would bypass fs-verity. (They also do the same for +encrypted files.) + +Userspace utility +================= + +This document focuses on the kernel, but a userspace utility for +fs-verity can be found at: + + https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git + +See the README.md file in the fsverity-utils source tree for details, +including examples of setting up fs-verity protected files. + +Tests +===== + +To test fs-verity, use xfstests. For example, using `kvm-xfstests +<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_:: + + kvm-xfstests -c ext4,f2fs -g verity + +FAQ +=== + +This section answers frequently asked questions about fs-verity that +weren't already directly answered in other parts of this document. + +:Q: Why isn't fs-verity part of IMA? +:A: fs-verity and IMA (Integrity Measurement Architecture) have + different focuses. 
fs-verity is a filesystem-level mechanism for + hashing individual files using a Merkle tree. In contrast, IMA + specifies a system-wide policy that specifies which files are + hashed and what to do with those hashes, such as log them, + authenticate them, or add them to a measurement list. + + IMA is planned to support the fs-verity hashing mechanism as an + alternative to doing full file hashes, for people who want the + performance and security benefits of the Merkle tree based hash. + But it doesn't make sense to force all uses of fs-verity to be + through IMA. As a standalone filesystem feature, fs-verity + already meets many users' needs, and it's testable like other + filesystem features e.g. with xfstests. + +:Q: Isn't fs-verity useless because the attacker can just modify the + hashes in the Merkle tree, which is stored on-disk? +:A: To verify the authenticity of an fs-verity file you must verify + the authenticity of the "file measurement", which is basically the + root hash of the Merkle tree. See `Use cases`_. + +:Q: Isn't fs-verity useless because the attacker can just replace a + verity file with a non-verity one? +:A: See `Use cases`_. In the initial use case, it's really trusted + userspace code that authenticates the files; fs-verity is just a + tool to do this job efficiently and securely. The trusted + userspace code will consider non-verity files to be inauthentic. + +:Q: Why does the Merkle tree need to be stored on-disk? Couldn't you + store just the root hash? +:A: If the Merkle tree wasn't stored on-disk, then you'd have to + compute the entire tree when the file is first accessed, even if + just one byte is being read. This is a fundamental consequence of + how Merkle tree hashing works. To verify a leaf node, you need to + verify the whole path to the root hash, including the root node + (the thing which the root hash is a hash of). 
But if the root + node isn't stored on-disk, you have to compute it by hashing its + children, and so on until you've actually hashed the entire file. + + That defeats most of the point of doing a Merkle tree-based hash, + since if you have to hash the whole file ahead of time anyway, + then you could simply do sha256(file) instead. That would be much + simpler, and a bit faster too. + + It's true that an in-memory Merkle tree could still provide the + advantage of verification on every read rather than just on the + first read. However, it would be inefficient because every time a + hash page gets evicted (you can't pin the entire Merkle tree into + memory, since it may be very large), in order to restore it you + again need to hash everything below it in the tree. This again + defeats most of the point of doing a Merkle tree-based hash, since + a single block read could trigger re-hashing gigabytes of data. + +:Q: But couldn't you store just the leaf nodes and compute the rest? +:A: See previous answer; this really just moves up one level, since + one could alternatively interpret the data blocks as being the + leaf nodes of the Merkle tree. It's true that the tree can be + computed much faster if the leaf level is stored rather than just + the data, but that's only because each level is less than 1% the + size of the level below (assuming the recommended settings of + SHA-256 and 4K blocks). For the exact same reason, by storing + "just the leaf nodes" you'd already be storing over 99% of the + tree, so you might as well simply store the whole tree. + +:Q: Can the Merkle tree be built ahead of time, e.g. distributed as + part of a package that is installed to many computers? +:A: This isn't currently supported. It was part of the original + design, but was removed to simplify the kernel UAPI and because it + wasn't a critical use case. Files are usually installed once and + used many times, and cryptographic hashing is somewhat fast on + most modern processors. 
+ +:Q: Why doesn't fs-verity support writes? +:A: Write support would be very difficult and would require a + completely different design, so it's well outside the scope of + fs-verity. Write support would require: + + - A way to maintain consistency between the data and hashes, + including all levels of hashes, since corruption after a crash + (especially of potentially the entire file!) is unacceptable. + The main options for solving this are data journalling, + copy-on-write, and log-structured volume. But it's very hard to + retrofit existing filesystems with new consistency mechanisms. + Data journalling is available on ext4, but is very slow. + + - Rebuilding the Merkle tree after every write, which would be + extremely inefficient. Alternatively, a different authenticated + dictionary structure such as an "authenticated skiplist" could + be used. However, this would be far more complex. + + Compare it to dm-verity vs. dm-integrity. dm-verity is very + simple: the kernel just verifies read-only data against a + read-only Merkle tree. In contrast, dm-integrity supports writes + but is slow, is much more complex, and doesn't actually support + full-device authentication since it authenticates each sector + independently, i.e. there is no "root hash". It doesn't really + make sense for the same device-mapper target to support these two + very different cases; the same applies to fs-verity. + +:Q: Since verity files are immutable, why isn't the immutable bit set? +:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a + specific set of semantics which not only make the file contents + read-only, but also prevent the file from being deleted, renamed, + linked to, or having its owner or mode changed. These extra + properties are unwanted for fs-verity, so reusing the immutable + bit isn't appropriate. + +:Q: Why does the API use ioctls instead of setxattr() and getxattr()? 
+:A: Abusing the xattr interface for basically arbitrary syscalls is + heavily frowned upon by most of the Linux filesystem developers. + An xattr should really just be an xattr on-disk, not an API to + e.g. magically trigger construction of a Merkle tree. + +:Q: Does fs-verity support remote filesystems? +:A: Only ext4 and f2fs support is implemented currently, but in + principle any filesystem that can store per-file verity metadata + can support fs-verity, regardless of whether it's local or remote. + Some filesystems may have fewer options of where to store the + verity metadata; one possibility is to store it past the end of + the file and "hide" it from userspace by manipulating i_size. The + data verification functions provided by ``fs/verity/`` also assume + that the filesystem uses the Linux pagecache, but both local and + remote filesystems normally do so. + +:Q: Why is anything filesystem-specific at all? Shouldn't fs-verity + be implemented entirely at the VFS level? +:A: There are many reasons why this is not possible or would be very + difficult, including the following: + + - To prevent bypassing verification, pages must not be marked + Uptodate until they've been verified. Currently, each + filesystem is responsible for marking pages Uptodate via + ``->readpages()``. Therefore, currently it's not possible for + the VFS to do the verification on its own. Changing this would + require significant changes to the VFS and all filesystems. + + - It would require defining a filesystem-independent way to store + the verity metadata. Extended attributes don't work for this + because (a) the Merkle tree may be gigabytes, but many + filesystems assume that all xattrs fit into a single 4K + filesystem block, and (b) ext4 and f2fs encryption doesn't + encrypt xattrs, yet the Merkle tree *must* be encrypted when the + file contents are, because it stores hashes of the plaintext + file contents. 
+ + So the verity metadata would have to be stored in an actual + file. Using a separate file would be very ugly, since the + metadata is fundamentally part of the file to be protected, and + it could cause problems where users could delete the real file + but not the metadata file or vice versa. On the other hand, + having it be in the same file would break applications unless + filesystems' notion of i_size were divorced from the VFS's, + which would be complex and require changes to all filesystems. + + - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's + transaction mechanism so that either the file ends up with + verity enabled, or no changes were made. Allowing intermediate + states to occur after a crash may cause problems. diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 1131c34d77f6f1..416c7f0e123af7 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -31,6 +31,7 @@ filesystem implementations. journalling fscrypt + fsverity Filesystem-specific documentation =================================