[5/6] composefs: Add documentation

Message ID 8a9aefceebe42d36164f3516c173f18189f0d7e7.1669631086.git.alexl@redhat.com (mailing list archive)
State New, archived
Series Composefs: an opportunistically sharing verified image filesystem

Commit Message

Alexander Larsson Nov. 28, 2022, 11:17 a.m. UTC
This adds documentation about the composefs filesystem and
how to use it.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
---
 Documentation/filesystems/composefs.rst | 162 ++++++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100644 Documentation/filesystems/composefs.rst

Comments

Bagas Sanjaya Nov. 29, 2022, 2:08 p.m. UTC | #1
On Mon, Nov 28, 2022 at 12:17:26PM +0100, Alexander Larsson wrote:
> This adds documentation about the composefs filesystem and
> how to use it.
> 

Nit: s/This adds/Add/
Also, please Cc: linux-doc list for documentation patches.

> +Given such a descriptor called "image.cfs" and a directory with files
> +called "/dir" you can mount it like:
> +
> +  mount -t composefs image.cfs -o basedir=/dir /mnt

What about using a literal code block, like below?

---- >8 ----
diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
index a0d88cc9baf9fb..e1faaf0ca69181 100644
--- a/Documentation/filesystems/composefs.rst
+++ b/Documentation/filesystems/composefs.rst
@@ -22,7 +22,7 @@ file content from the filesystem by looking up that filename in a set
 of base directories.
 
 Given such a descriptor called "image.cfs" and a directory with files
-called "/dir" you can mount it like:
+called "/dir" you can mount it like::
 
   mount -t composefs image.cfs -o basedir=/dir /mnt
 

> +Composefs uses `fs-verity
> +<https://www.kernel.org/doc/Documentation/filesystems/fsverity.rst>`

Use :doc: for internal linking to other documentation:

---- >8 ----
diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
index 65a8c9889427b2..38dac5af117551 100644
--- a/Documentation/filesystems/composefs.rst
+++ b/Documentation/filesystems/composefs.rst
@@ -51,8 +51,7 @@ all mounts.
 Integrity validation
 ====================
 
-Composefs uses `fs-verity
-<https://www.kernel.org/doc/Documentation/filesystems/fsverity.rst>`
+Composefs uses :doc:`fs-verity <fsverity>`
 for integrity validation, and extends it by making the validation also
 apply to the directory metadata.  This happens on two levels,
 validation of the descriptor and validation of the backing files.

> +
> +Expected use-cases
> +=================
> +

Sphinx reports an "underline too short" warning here, so the underline
length needs to match the title text:

---- >8 ----
diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
index 75fbf14aeb3355..65a8c9889427b2 100644
--- a/Documentation/filesystems/composefs.rst
+++ b/Documentation/filesystems/composefs.rst
@@ -72,7 +72,7 @@ files. This means any (accidental or malicious) modification of the
 basedir will be detected at the time the file is used.
 
 Expected use-cases
-=================
+==================
 
 Container Image Storage
 ```````````````````````

> +Mount options
> +=============
> +
> +`basedir`: A colon separated list of directories to use as a base when resolving relative content paths.
> +`verity_check=[0,1,2]`: When to verify backing file fs-verity: 0 == never, 1 == if specified in image, 2 == always and require it in image.
> +`digest`: A fs-verity sha256 digest that the descriptor file must match. If set, `verity_check` defaults to 2.

Use a definition list for the options and their descriptions:

---- >8 ----
diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
index 38dac5af117551..a0d88cc9baf9fb 100644
--- a/Documentation/filesystems/composefs.rst
+++ b/Documentation/filesystems/composefs.rst
@@ -112,9 +112,17 @@ all directory metadata and file content is validated lazily at use.
 Mount options
 =============
 
-`basedir`: A colon separated list of directories to use as a base when resolving relative content paths.
-`verity_check=[0,1,2]`: When to verify backing file fs-verity: 0 == never, 1 == if specified in image, 2 == always and require it in image.
-`digest`: A fs-verity sha256 digest that the descriptor file must match. If set, `verity_check` defaults to 2.
+basedir
+    A colon separated list of directories to use as a base when resolving
+    relative content paths.
+
+verity_check=[0,1,2]
+    When to verify backing file fs-verity: 0 == never; 1 == if specified in
+    image; 2 == always and require it in image.
+
+digest
+    A fs-verity sha256 digest that the descriptor file must match. If set,
+    verity_check defaults to 2.
 
 
 Filesystem format

Finally, you need to add the documentation to the table of contents for
filesystem documentation:

---- >8 ----
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index bee63d42e5eca0..9b7cf136755dce 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -75,6 +75,7 @@ Documentation for filesystem implementations.
    cifs/index
    ceph
    coda
+   composefs
    configfs
    cramfs
    dax
 
Thanks.

Patch

diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
new file mode 100644
index 000000000000..75fbf14aeb33
--- /dev/null
+++ b/Documentation/filesystems/composefs.rst
@@ -0,0 +1,162 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Composefs Filesystem
+====================
+
+Introduction
+============
+
+Composefs is a read-only file system that is backed by regular files
+(rather than a block device). It is designed to help easily share
+content between different directory trees, such as container images in
+a local store or ostree checkouts. It also supports efficient
+integrity validation of file content and directory metadata, using
+fs-verity.
+
+The filesystem mount source is a binary blob called the descriptor. It
+contains all the inode and directory entry data for the entire
+filesystem. However, instead of storing the file content, each regular
+file inode stores a relative path name, and composefs gets the
+file content from the filesystem by looking up that filename in a set
+of base directories.
+
+Given such a descriptor called "image.cfs" and a directory with files
+called "/dir" you can mount it like:
+
+  mount -t composefs image.cfs -o basedir=/dir /mnt
+
+Content sharing
+===============
+
+Suppose you have a single basedir where the files are content
+addressed (i.e. named by content digest), and a set of composefs
+descriptors using this basedir. Any file that happens to be shared
+between two images (same content, so same digest) will now only be
+stored once on the disk.
+
+Such sharing is possible even if the metadata for the file in the
+image differs (common reasons for metadata difference are mtime,
+permissions, xattrs, etc). The sharing is also anonymous in the sense
+that you can't tell the difference on the mounted files from a
+non-shared file (for example by looking at the link count for a
+hardlinked file).
+
+In addition, any shared files that are actively in use will share
+page-cache, because the page cache for the file contents will be
+addressed by the backing file in the basedir. This means (for example)
+that shared libraries between images will only be mmapped once across
+all mounts.
+
+Integrity validation
+====================
+
+Composefs uses `fs-verity
+<https://www.kernel.org/doc/Documentation/filesystems/fsverity.rst>`
+for integrity validation, and extends it by making the validation also
+apply to the directory metadata.  This happens on two levels,
+validation of the descriptor and validation of the backing files.
+
+For descriptor validation, the idea is that you enable fs-verity on
+the descriptor file which seals it from changes that would affect the
+directory metadata. Additionally you can pass a `digest` mount option,
+which composefs verifies against the descriptor fs-verity
+measure. Such a mount option could be encoded in a trusted source
+(like a signed kernel command line) and be used as a root of trust if
+using composefs for the root filesystem.
+
+For file validation, the descriptor can contain a digest for each
+backing file, and you can enable fs-verity on the backing
+files. Composefs will validate the digest before using the backing
+files. This means any (accidental or malicious) modification of the
+basedir will be detected at the time the file is used.
+
+Expected use-cases
+=================
+
+Container Image Storage
+```````````````````````
+
+Typically a container image is stored as a set of "layer"
+directories, merged into one mount by using overlayfs.  The lower
+layers are read-only image content and the upper layer is the
+writable state of a running container. Multiple uses of the same
+layer can be shared this way, but it is hard to share individual
+files between unrelated layers.
+
+Using composefs, we can instead use a shared, content-addressed
+store for all the images in the system, and use a composefs image
+for the read-only image content of each image, pointing into the
+shared store. Then for a running container we use an overlayfs
+with the lower dir being the composefs and the upper dir being
+the writable state.
+
+
+Ostree root filesystem validation
+`````````````````````````````````
+
+Ostree uses a content-addressed on-disk store for file content,
+allowing efficient updates and sharing of content. However, to actually
+use this as a root filesystem it needs to create a real
+"chroot-style" directory, containing hard links into the store. The
+store itself is validated when created, but once the hard-link
+directory is created, nothing validates its directory
+structure.
+
+Instead of a chroot we can use composefs. We create a composefs
+image pointing into the object store, enable fs-verity for everything
+and encode the fs-verity digest of the descriptor in the
+kernel command line. This will allow booting a trusted system where
+all directory metadata and file content is validated lazily at use.
+
+
+Mount options
+=============
+
+`basedir`: A colon separated list of directories to use as a base when resolving relative content paths.
+`verity_check=[0,1,2]`: When to verify backing file fs-verity: 0 == never, 1 == if specified in image, 2 == always and require it in image.
+`digest`: A fs-verity sha256 digest that the descriptor file must match. If set, `verity_check` defaults to 2.
+
+
+Filesystem format
+=================
+
+The descriptor contains three sections: header,
+inodes and variable data. All data in the file is stored in
+little-endian form.
+
+The header starts at the beginning of the file and contains version,
+magic value, offsets to the variable data and the root inode number.
+
+The inode section starts at a fixed location right after the
+header. It is an array of inode data, where for each inode there is
+first a variable length chunk and then a fixed size chunk. An inode
+number is the offset in the inode data to the start of the fixed chunk.
+
+The fixed inode chunk starts with a flag that tells what parts of the
+inode are stored in the file (meaning it is only the maximal size that
+is fixed). After that the various inode attributes are serialized in
+order, such as mode, ownership, xattrs, and payload length. The
+payload length attribute gives the size of the variable chunk.
+
+The inode variable chunk contains different things depending on the
+file type.  For regular files it is the backing filename. For symlinks
+it is the symlink target. For directories it is a list of references to
+dentries, stored in chunks of at most 4k. The dentry chunks themselves
+are stored in the variable data section.
+
+The variable data section is stored after the inode section, and you
+can find it from the offset in the header. It contains dentries and
+xattr data. The xattrs are referred to by offset and size in the
+xattr attribute in the inode data. Each xattr data blob can be shared
+by many inodes in the filesystem. The variable data chunks are all
+smaller than a page (4K) and are padded to not span pages.
+
+Tools
+=====
+
+Tools for composefs can be found at https://github.com/containers/composefs
+
+There is a mkcomposefs tool which can be used to create images on the
+CLI, and a library that applications can use to create composefs
+images.
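
Illustrative note (not part of the patch): the "Content sharing" section and
the `basedir` mount option can be pictured with a small userspace sketch.
Everything here is an assumption made for illustration: the helper names
`store_object` and `resolve`, the two-level "ab/cdef..." object layout, and
the in-order search of the colon-separated basedir list are hypothetical and
not taken from composefs or mkcomposefs. The sketch also names objects by a
plain SHA-256 of the file content, which is not the same value as an
fs-verity digest.

```python
import hashlib
import os
import shutil
import tempfile


def store_object(basedir, src_path):
    """Copy src_path into basedir under a name derived from its SHA-256.

    Returns the path relative to basedir. The "ab/cdef..." layout is an
    assumption of this sketch, not something composefs mandates.
    """
    h = hashlib.sha256()
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    digest = h.hexdigest()
    rel = os.path.join(digest[:2], digest[2:])
    dst = os.path.join(basedir, rel)
    if not os.path.exists(dst):  # identical content is stored only once
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copyfile(src_path, dst)
    return rel


def resolve(basedir_option, rel):
    """Return the first existing <base>/<rel> for a colon-separated list.

    Searching the directories in listed order is an assumption.
    """
    for base in basedir_option.split(":"):
        candidate = os.path.join(base, rel)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        basedir = os.path.join(tmp, "objects")
        empty = os.path.join(tmp, "empty")
        os.makedirs(basedir)
        os.makedirs(empty)
        # Two "images" that happen to ship the same file content.
        a = os.path.join(tmp, "image-a-libfoo.so")
        b = os.path.join(tmp, "image-b-libfoo.so")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"same bytes in both images\n")
        rel_a = store_object(basedir, a)
        rel_b = store_object(basedir, b)
        count = sum(len(files) for _, _, files in os.walk(basedir))
        found = resolve(empty + ":" + basedir, rel_a) == os.path.join(basedir, rel_a)
        return rel_a, rel_b, count, found
```

Calling `demo()` stores both files under the same relative name, so the
object directory ends up with a single backing file that either image's
descriptor could reference.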