[04/18] fsverity: support block-based Merkle tree caching

Message ID 171444679658.955480.4637262867075831070.stgit@frogsfrogsfrogs
State New
Series [01/18] fs: add FS_XFLAG_VERITY for verity files

Commit Message

Darrick J. Wong April 30, 2024, 3:20 a.m. UTC
From: Andrey Albershteyn <aalbersh@redhat.com>

In the current implementation, fs-verity expects the filesystem to
provide pages filled with Merkle tree blocks.  When fs-verity is done
processing the blocks, it drops its reference to the page.  This doesn't
fit well with the way XFS manages its memory.

To allow XFS to integrate fs-verity, this patch lets the fs-verity
verification code take Merkle tree blocks instead of page references.
This way ext4, f2fs, and btrfs can still pass page references, and XFS
can pass references to Merkle tree blocks stored in XFS's extended
attribute infrastructure.

To achieve this, the XFS implementation will implement its own incore
Merkle tree block cache.  These blocks will be passed to fsverity when
it needs to read a Merkle tree block, and dropped by fsverity when
validation completes.  The cache will keep track of whether or not a
given block has already been verified, which will improve performance on
random reads.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: fix uninit err variable, remove dependency on bitmap, apply
 various suggestions from maintainer, tighten changelog]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/verity/open.c         |   22 +++++++++++++++-
 fs/verity/verify.c       |   41 +++++++++++++++++++++++++++--
 include/linux/fsverity.h |   64 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 7 deletions(-)
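
As a rough user-space sketch of the read/drop contract described in the
changelog above (this is not the XFS implementation; every demo_* name
and the tiny direct-mapped cache are made up for illustration): the
filesystem hands out a cached block on read, reports whether it was
already verified, and remembers a newly set verified flag on drop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE		64	/* hypothetical Merkle tree block size */
#define DEMO_CACHE_SLOTS	4	/* hypothetical, direct-mapped cache */

/* Hypothetical in-core cache entry owned by the filesystem. */
struct demo_cached_block {
	bool in_use;
	bool verified;
	uint64_t pos;
	unsigned char data[DEMO_BLOCK_SIZE];
};

/* Mirrors the shape of struct fsverity_blockbuf from this patch. */
struct demo_blockbuf {
	void *context;
	void *kaddr;
	uint64_t pos;
	unsigned int size;
	unsigned int verified:1;
};

struct demo_cached_block demo_cache[DEMO_CACHE_SLOTS];

/* Read side: on a cache miss, load the block and clear the verified state. */
int demo_read_block(struct demo_blockbuf *block)
{
	struct demo_cached_block *slot;

	assert(block->size == DEMO_BLOCK_SIZE);
	slot = &demo_cache[(block->pos / DEMO_BLOCK_SIZE) % DEMO_CACHE_SLOTS];

	if (!slot->in_use || slot->pos != block->pos) {
		/* Stand-in for reading from disk: fill with a pattern. */
		memset(slot->data, (int)(block->pos / DEMO_BLOCK_SIZE),
		       sizeof(slot->data));
		slot->pos = block->pos;
		slot->in_use = true;
		slot->verified = false;
	}

	block->context = slot;
	block->kaddr = slot->data;
	block->verified = slot->verified;
	return 0;
}

/* Drop side: remember that the caller verified the block, then let go. */
void demo_drop_block(struct demo_blockbuf *block)
{
	struct demo_cached_block *slot = block->context;

	if (block->verified)
		slot->verified = true;
	block->kaddr = NULL;
	block->context = NULL;
}
```

A second read of the same position then comes back with verified set,
so the caller can skip rehashing that block; a miss always comes back
with verified cleared.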

Comments

Christoph Hellwig May 1, 2024, 7:36 a.m. UTC | #1
> @@ -377,6 +391,19 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
>  
>  	block->pos = pos;
>  	block->size = params->block_size;
> +	block->verified = false;
> +
> +	if (vops->read_merkle_tree_block) {
> +		struct fsverity_readmerkle req = {
> +			.inode = inode,
> +			.ra_bytes = ra_bytes,
> +		};
> +
> +		err = vops->read_merkle_tree_block(&req, block);
> +		if (err)
> +			goto bad;
> +		return 0;

I still don't understand why we're keeping two interfaces instead of
providing a read through pagecache helper that implements the
->read_block interface.  That makes the interface really hard to follow
and feel rather ad-hoc.  I also have vague memories of providing such a
refactoring a long time ago.

Darrick J. Wong May 1, 2024, 10:35 p.m. UTC | #2
On Wed, May 01, 2024 at 12:36:11AM -0700, Christoph Hellwig wrote:
> > @@ -377,6 +391,19 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
> >  
> >  	block->pos = pos;
> >  	block->size = params->block_size;
> > +	block->verified = false;
> > +
> > +	if (vops->read_merkle_tree_block) {
> > +		struct fsverity_readmerkle req = {
> > +			.inode = inode,
> > +			.ra_bytes = ra_bytes,
> > +		};
> > +
> > +		err = vops->read_merkle_tree_block(&req, block);
> > +		if (err)
> > +			goto bad;
> > +		return 0;
> 
> I still don't understand why we're keeping two interfaces instead of
> providing a read through pagecache helper that implements the
> ->read_block interface.  That makes the interface really hard to follow
> and feel rather ad-hoc.  I also have vague memories of providing such a
> refactoring a long time ago.

Got a link?  This is the first I've heard of this, but TBH I've been
ignoring a /lot/ of things trying to get online repair merged (thank
you!) over the past months...

--D

Christoph Hellwig May 2, 2024, 4:42 a.m. UTC | #3
On Wed, May 01, 2024 at 03:35:19PM -0700, Darrick J. Wong wrote:
> Got a link?  This is the first I've heard of this, but TBH I've been
> ignoring a /lot/ of things trying to get online repair merged (thank
> you!) over the past months...

This was long before I got involved with repair :)

Below is what I found in my local tree.  It doesn't have a proper commit
log, so I probably only sent it out as a RFC in reply to a patch series
posting, most likely untested:

commit c11dcbe101a240c7a9e9bae7efaff2779d88b292
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Oct 16 14:14:11 2023 +0200

    fsverity block interface

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index af889512c6ac99..c616d530a89086 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -648,7 +648,7 @@ which verifies data that has been read into the pagecache of a verity
 inode.  The containing folio must still be locked and not Uptodate, so
 it's not yet readable by userspace.  As needed to do the verification,
 fsverity_verify_blocks() will call back into the filesystem to read
-hash blocks via fsverity_operations::read_merkle_tree_page().
+hash blocks via fsverity_operations::read_merkle_tree_block().
 
 fsverity_verify_blocks() returns false if verification failed; in this
 case, the filesystem must not set the folio Uptodate.  Following this,
diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index 2b34796f68d349..4b6134923232e7 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -713,20 +713,20 @@ int btrfs_get_verity_descriptor(struct inode *inode, void *buf, size_t buf_size)
  *
  * Returns the page we read, or an ERR_PTR on error.
  */
-static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
-						pgoff_t index,
-						unsigned long num_ra_pages,
-						u8 log_blocksize)
+static int btrfs_read_merkle_tree_block(struct inode *inode,
+		unsigned int offset, struct fsverity_block *block,
+		unsigned long num_ra_pages)
 {
 	struct folio *folio;
+	pgoff_t index = offset >> PAGE_SHIFT;
 	u64 off = (u64)index << PAGE_SHIFT;
 	loff_t merkle_pos = merkle_file_pos(inode);
 	int ret;
 
 	if (merkle_pos < 0)
-		return ERR_PTR(merkle_pos);
+		return merkle_pos;
 	if (merkle_pos > inode->i_sb->s_maxbytes - off - PAGE_SIZE)
-		return ERR_PTR(-EFBIG);
+		return -EFBIG;
 	index += merkle_pos >> PAGE_SHIFT;
 again:
 	folio = __filemap_get_folio(inode->i_mapping, index, FGP_ACCESSED, 0);
@@ -739,7 +739,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 		if (!folio_test_uptodate(folio)) {
 			folio_unlock(folio);
 			folio_put(folio);
-			return ERR_PTR(-EIO);
+			return -EIO;
 		}
 		folio_unlock(folio);
 		goto out;
@@ -748,7 +748,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 	folio = filemap_alloc_folio(mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS),
 				    0);
 	if (!folio)
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 
 	ret = filemap_add_folio(inode->i_mapping, folio, index, GFP_NOFS);
 	if (ret) {
@@ -756,7 +756,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 		/* Did someone else insert a folio here? */
 		if (ret == -EEXIST)
 			goto again;
-		return ERR_PTR(ret);
+		return ret;
 	}
 
 	/*
@@ -769,7 +769,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 			     folio_address(folio), PAGE_SIZE, &folio->page);
 	if (ret < 0) {
 		folio_put(folio);
-		return ERR_PTR(ret);
+		return ret;
 	}
 	if (ret < PAGE_SIZE)
 		folio_zero_segment(folio, ret, PAGE_SIZE);
@@ -778,7 +778,8 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 	folio_unlock(folio);
 
 out:
-	return folio_file_page(folio, index);
+	return fsverity_set_block_page(block, folio_file_page(folio, index),
+				       offset);
 }
 
 /*
@@ -809,6 +810,7 @@ const struct fsverity_operations btrfs_verityops = {
 	.begin_enable_verity     = btrfs_begin_enable_verity,
 	.end_enable_verity       = btrfs_end_enable_verity,
 	.get_verity_descriptor   = btrfs_get_verity_descriptor,
-	.read_merkle_tree_page   = btrfs_read_merkle_tree_page,
+	.read_merkle_tree_block  = btrfs_read_merkle_tree_block,
 	.write_merkle_tree_block = btrfs_write_merkle_tree_block,
+	.drop_merkle_tree_block	 = fsverity_drop_page_merke_tree_block,
 };
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index 4e2f01f048c09b..5623e2c1c302e8 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -358,15 +358,13 @@ static int ext4_get_verity_descriptor(struct inode *inode, void *buf,
 	return desc_size;
 }
 
-static struct page *ext4_read_merkle_tree_page(struct inode *inode,
-					       pgoff_t index,
-					       unsigned long num_ra_pages,
-					       u8 log_blocksize)
+static int ext4_read_merkle_tree_block(struct inode *inode, unsigned int offset,
+		struct fsverity_block *block, unsigned long num_ra_pages)
 {
 	struct folio *folio;
+	pgoff_t index;
 
-	index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
-
+	index = (ext4_verity_metadata_pos(inode) + offset) >> PAGE_SHIFT;
 	folio = __filemap_get_folio(inode->i_mapping, index, FGP_ACCESSED, 0);
 	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
 		DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index);
@@ -377,9 +375,10 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 			page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
 		folio = read_mapping_folio(inode->i_mapping, index, NULL);
 		if (IS_ERR(folio))
-			return ERR_CAST(folio);
+			return PTR_ERR(folio);
 	}
-	return folio_file_page(folio, index);
+	return fsverity_set_block_page(block, folio_file_page(folio, index),
+				       offset);
 }
 
 static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf,
@@ -394,6 +393,7 @@ const struct fsverity_operations ext4_verityops = {
 	.begin_enable_verity	= ext4_begin_enable_verity,
 	.end_enable_verity	= ext4_end_enable_verity,
 	.get_verity_descriptor	= ext4_get_verity_descriptor,
-	.read_merkle_tree_page	= ext4_read_merkle_tree_page,
+	.read_merkle_tree_block	= ext4_read_merkle_tree_block,
 	.write_merkle_tree_block = ext4_write_merkle_tree_block,
+	.drop_merkle_tree_block	= fsverity_drop_page_merke_tree_block,
 };
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 601ab9f0c02492..aac9281e9c4565 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -255,15 +255,13 @@ static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
 	return size;
 }
 
-static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
-					       pgoff_t index,
-					       unsigned long num_ra_pages,
-					       u8 log_blocksize)
+static int f2fs_read_merkle_tree_block(struct inode *inode, unsigned int offset,
+		struct fsverity_block *block, unsigned long num_ra_pages)
 {
 	struct page *page;
+	pgoff_t index;
 
-	index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
-
+	index = (f2fs_verity_metadata_pos(inode) + offset) >> PAGE_SHIFT;
 	page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED);
 	if (!page || !PageUptodate(page)) {
 		DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index);
@@ -274,7 +272,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 			page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
-	return page;
+	return fsverity_set_block_page(block, page, offset);
 }
 
 static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf,
@@ -289,6 +287,7 @@ const struct fsverity_operations f2fs_verityops = {
 	.begin_enable_verity	= f2fs_begin_enable_verity,
 	.end_enable_verity	= f2fs_end_enable_verity,
 	.get_verity_descriptor	= f2fs_get_verity_descriptor,
-	.read_merkle_tree_page	= f2fs_read_merkle_tree_page,
+	.read_merkle_tree_block	= f2fs_read_merkle_tree_block,
 	.write_merkle_tree_block = f2fs_write_merkle_tree_block,
+	.drop_merkle_tree_block	= fsverity_drop_page_merke_tree_block,
 };
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index 182bddf5dec54c..5e362f8562bd5d 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -12,10 +12,33 @@
 #include <linux/sched/signal.h>
 #include <linux/uaccess.h>
 
+int fsverity_set_block_page(struct fsverity_block *block,
+		struct page *page, unsigned int index)
+{
+	if (IS_ERR(page))
+		return PTR_ERR(page);
+	block->kaddr = page_address(page) + (index % PAGE_SIZE);
+	block->cached = PageChecked(page);
+	block->context = page;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_set_block_page);
+
+void fsverity_drop_page_merke_tree_block(struct fsverity_block *block)
+{
+	struct page *page = block->context;
+
+	if (block->verified)
+		SetPageChecked(page);
+	put_page(page);
+}
+EXPORT_SYMBOL_GPL(fsverity_drop_page_merke_tree_block);
+
 static int fsverity_read_merkle_tree(struct inode *inode,
 				     const struct fsverity_info *vi,
 				     void __user *buf, u64 offset, int length)
 {
+	const struct fsverity_operations *vop = inode->i_sb->s_vop;
 	u64 end_offset;
 	unsigned int offs_in_block;
 	unsigned int block_size = vi->tree_params.block_size;
@@ -45,20 +68,19 @@ static int fsverity_read_merkle_tree(struct inode *inode,
 		struct fsverity_block block;
 
 		block.len = block_size;
-		if (fsverity_read_merkle_tree_block(inode,
-					index << vi->tree_params.log_blocksize,
-					&block, num_ra_pages)) {
-			fsverity_drop_block(inode, &block);
+		if (vop->read_merkle_tree_block(inode,
+				index << vi->tree_params.log_blocksize,
+				&block, num_ra_pages)) {
 			err = -EFAULT;
 			break;
 		}
 
 		if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) {
-			fsverity_drop_block(inode, &block);
+			vop->drop_merkle_tree_block(&block);
 			err = -EFAULT;
 			break;
 		}
-		fsverity_drop_block(inode, &block);
+		vop->drop_merkle_tree_block(&block);
 		block.kaddr = NULL;
 
 		retval += bytes_to_copy;
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index dfe01f12184341..9b84262a6fa413 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -42,6 +42,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		  const void *data, u64 data_pos, unsigned long max_ra_pages)
 {
 	const struct merkle_tree_params *params = &vi->tree_params;
+	const struct fsverity_operations *vop = inode->i_sb->s_vop;
 	const unsigned int hsize = params->digest_size;
 	int level;
 	int err;
@@ -115,9 +116,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		block->len = params->block_size;
 		num_ra_pages = level == 0 ?
 			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
-		err = fsverity_read_merkle_tree_block(
-			inode, hblock_idx << params->log_blocksize, block,
-			num_ra_pages);
+		err = vop->read_merkle_tree_block(inode,
+				hblock_idx << params->log_blocksize, block,
+				num_ra_pages);
 		if (err) {
 			fsverity_err(inode,
 				     "Error %d reading Merkle tree block %lu",
@@ -127,7 +128,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		if (is_hash_block_verified(vi, hblock_idx, block->cached)) {
 			memcpy(_want_hash, block->kaddr + hoffset, hsize);
 			want_hash = _want_hash;
-			fsverity_drop_block(inode, block);
+			vop->drop_merkle_tree_block(block);
 			goto descend;
 		}
 		hblocks[level].index = hblock_idx;
@@ -157,7 +158,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		block->verified = true;
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
-		fsverity_drop_block(inode, block);
+		vop->drop_merkle_tree_block(block);
 	}
 
 	/* Finally, verify the data block. */
@@ -174,9 +175,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		     params->hash_alg->name, hsize, want_hash,
 		     params->hash_alg->name, hsize, real_hash);
 error:
-	for (; level > 0; level--) {
-		fsverity_drop_block(inode, &hblocks[level - 1].block);
-	}
+	for (; level > 0; level--)
+		vop->drop_merkle_tree_block(&hblocks[level - 1].block);
 	return false;
 }
 
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index ce37a430bc97f2..ae9ae7719af558 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -104,27 +104,6 @@ struct fsverity_operations {
 	int (*get_verity_descriptor)(struct inode *inode, void *buf,
 				     size_t bufsize);
 
-	/**
-	 * Read a Merkle tree page of the given inode.
-	 *
-	 * @inode: the inode
-	 * @index: 0-based index of the page within the Merkle tree
-	 * @num_ra_pages: The number of Merkle tree pages that should be
-	 *		  prefetched starting at @index if the page at @index
-	 *		  isn't already cached.  Implementations may ignore this
-	 *		  argument; it's only a performance optimization.
-	 *
-	 * This can be called at any time on an open verity file.  It may be
-	 * called by multiple processes concurrently, even with the same page.
-	 *
-	 * Note that this must retrieve a *page*, not necessarily a *block*.
-	 *
-	 * Return: the page on success, ERR_PTR() on failure
-	 */
-	struct page *(*read_merkle_tree_page)(struct inode *inode,
-					      pgoff_t index,
-					      unsigned long num_ra_pages,
-					      u8 log_blocksize);
 	/**
 	 * Read a Merkle tree block of the given inode.
 	 * @inode: the inode
@@ -162,13 +141,12 @@ struct fsverity_operations {
 
 	/**
 	 * Release the reference to a Merkle tree block
-	 *
-	 * @page: the block to release
+	 * @block: the block to release
 	 *
 	 * This is called when fs-verity is done with a block obtained with
 	 * ->read_merkle_tree_block().
 	 */
-	void (*drop_block)(struct fsverity_block *block);
+	void (*drop_merkle_tree_block)(struct fsverity_block *block);
 };
 
 #ifdef CONFIG_FS_VERITY
@@ -217,74 +195,16 @@ static inline void fsverity_cleanup_inode(struct inode *inode)
 
 int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);
 
+int fsverity_set_block_page(struct fsverity_block *block,
+		struct page *page, unsigned int index);
+void fsverity_drop_page_merke_tree_block(struct fsverity_block *block);
+
 /* verify.c */
 
 bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
 void fsverity_verify_bio(struct bio *bio);
 void fsverity_enqueue_verify_work(struct work_struct *work);
 
-/**
- * fsverity_drop_block() - drop block obtained with ->read_merkle_tree_block()
- * @inode: inode in use for verification or metadata reading
- * @block: block to be dropped
- *
- * Generic put_page() method. Calls out back to filesystem if ->drop_block() is
- * set, otherwise do nothing.
- *
- */
-static inline void fsverity_drop_block(struct inode *inode,
-		struct fsverity_block *block)
-{
-	if (inode->i_sb->s_vop->drop_block)
-		inode->i_sb->s_vop->drop_block(block);
-	else {
-		struct page *page = (struct page *)block->context;
-
-		if (block->verified)
-			SetPageChecked(page);
-
-		put_page(page);
-	}
-}
-
-/**
- * fsverity_read_block_from_page() - layer between fs using read page
- * and read block
- * @inode: inode in use for verification or metadata reading
- * @index: index of the block in the tree (offset into the tree)
- * @block: block to be read
- * @num_ra_pages: number of pages to readahead, may be ignored
- *
- * Depending on fs implementation use read_merkle_tree_block or
- * read_merkle_tree_page.
- */
-static inline int fsverity_read_merkle_tree_block(struct inode *inode,
-					unsigned int index,
-					struct fsverity_block *block,
-					unsigned long num_ra_pages)
-{
-	struct page *page;
-
-	if (inode->i_sb->s_vop->read_merkle_tree_block)
-		return inode->i_sb->s_vop->read_merkle_tree_block(
-			inode, index, block, num_ra_pages);
-
-	page = inode->i_sb->s_vop->read_merkle_tree_page(
-			inode, index >> PAGE_SHIFT, num_ra_pages,
-			block->len);
-
-	block->kaddr = page_address(page) + (index % PAGE_SIZE);
-	block->cached = PageChecked(page);
-	block->context = page;
-
-	if (IS_ERR(page))
-		return PTR_ERR(page);
-	else
-		return 0;
-}
-
-
-
 #else /* !CONFIG_FS_VERITY */
 
 static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
@@ -362,20 +282,6 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)
 	WARN_ON_ONCE(1);
 }
 
-static inline void fsverity_drop_page(struct inode *inode, struct page *page)
-{
-	WARN_ON_ONCE(1);
-}
-
-static inline int fsverity_read_merkle_tree_block(struct inode *inode,
-					unsigned int index,
-					struct fsverity_block *block,
-					unsigned long num_ra_pages)
-{
-	WARN_ON_ONCE(1);
-	return -EOPNOTSUPP;
-}
-
 #endif	/* !CONFIG_FS_VERITY */
 
 static inline bool fsverity_verify_folio(struct folio *folio)

Eric Biggers May 15, 2024, 2:16 a.m. UTC | #4
On Wed, May 01, 2024 at 09:42:07PM -0700, Christoph Hellwig wrote:
> On Wed, May 01, 2024 at 03:35:19PM -0700, Darrick J. Wong wrote:
> > Got a link?  This is the first I've heard of this, but TBH I've been
> > ignoring a /lot/ of things trying to get online repair merged (thank
> > you!) over the past months...
> 
> This was long before I got involved with repair :)
> 
> Below is what I found in my local tree.  It doesn't have a proper commit
> log, so I probably only sent it out as a RFC in reply to a patch series
> posting, most likely untested:
> 
> commit c11dcbe101a240c7a9e9bae7efaff2779d88b292
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Mon Oct 16 14:14:11 2023 +0200
> 
>     fsverity block interface

That RFC patch doesn't take into account the bitmap, but the overall idea does
seem to work.  I've had a go at the block-based Merkle tree caching support at
https://lore.kernel.org/fsverity/20240515015320.323443-1-ebiggers@kernel.org.
Let me know what you think.

(The one thing I'm not a huge fan of is the indirect call on the drop path.
Previously, it wasn't necessary for filesystems using page-based caching.  This
hopefully is a minor point, but I'm not sure, since unfortunately indirect calls
are atrociously expensive these days -- especially on x86.  Having the single
read_block / drop_block interface does seem like the right solution, though.  We
could always optimize the pagecache-based drop to a direct call later, while
conceptually still having it be an implementation of the same interface.)

- Eric
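
One way the "direct call later" idea above might look, as a user-space
sketch only (all demo_* names are hypothetical and nothing here is taken
from the posted series): the dispatcher compares the ops pointer against
the generic pagecache-backed drop helper and calls it directly when they
match, falling back to the indirect call otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	int refcount;
	bool checked;		/* models PG_checked */
};

/* Hypothetical stand-in for the block buffer passed to ->drop. */
struct demo_blockbuf {
	void *context;
	unsigned int verified:1;
};

/* Hypothetical ops table with a single drop hook. */
struct demo_vops {
	void (*drop_block)(struct demo_blockbuf *block);
};

/* Generic pagecache-backed drop: record verified state, release the page. */
void demo_drop_page_block(struct demo_blockbuf *block)
{
	struct demo_page *page = block->context;

	assert(page != NULL);
	if (block->verified)
		page->checked = true;
	page->refcount--;
	block->context = NULL;
}

/* A filesystem-specific drop, e.g. one backed by a private block cache. */
void demo_drop_custom_block(struct demo_blockbuf *block)
{
	block->context = NULL;
}

/* Counts indirect dispatches so a caller can observe which path ran. */
int demo_indirect_calls;

void demo_drop(const struct demo_vops *vops, struct demo_blockbuf *block)
{
	if (vops->drop_block == demo_drop_page_block) {
		/* Common case: a direct call that the compiler may inline. */
		demo_drop_page_block(block);
	} else {
		demo_indirect_calls++;
		vops->drop_block(block);
	}
}
```

Filesystems would still set the same hook in their ops table, so the
interface stays single; only the dispatch site special-cases the generic
helper.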

Patch

diff --git a/fs/verity/open.c b/fs/verity/open.c
index fdeb95eca3af3..4777130322866 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -180,9 +180,23 @@  static int compute_file_digest(const struct fsverity_hash_alg *hash_alg,
 struct fsverity_info *fsverity_create_info(const struct inode *inode,
 					   struct fsverity_descriptor *desc)
 {
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	struct fsverity_info *vi;
 	int err;
 
+	/*
+	 * If the filesystem implementation supplies Merkle tree content on a
+	 * per-block basis, it must implement both the read and drop functions.
+	 * If it supplies content on a per-page basis, neither should be
+	 * provided.
+	 */
+	if (vops->read_merkle_tree_page)
+		WARN_ON_ONCE(vops->read_merkle_tree_block != NULL ||
+			     vops->drop_merkle_tree_block != NULL);
+	else
+		WARN_ON_ONCE(vops->read_merkle_tree_block == NULL ||
+			     vops->drop_merkle_tree_block == NULL);
+
 	vi = kmem_cache_zalloc(fsverity_info_cachep, GFP_KERNEL);
 	if (!vi)
 		return ERR_PTR(-ENOMEM);
@@ -213,7 +227,13 @@  struct fsverity_info *fsverity_create_info(const struct inode *inode,
 	if (err)
 		goto fail;
 
-	if (vi->tree_params.block_size != PAGE_SIZE) {
+	/*
+	 * If the fs supplies Merkle tree content on a per-page basis and the
+	 * page size doesn't match the block size, fs-verity must use the
+	 * hash_block_verified bitmap instead of PG_checked.
+	 */
+	if (vops->read_merkle_tree_block == NULL &&
+	    vi->tree_params.block_size != PAGE_SIZE) {
 		/*
 		 * When the Merkle tree block size and page size differ, we use
 		 * a bitmap to keep track of which hash blocks have been
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 1c4a7c63c0a1c..55ada2af290ac 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -20,11 +20,22 @@  static bool is_hash_block_verified(struct inode *inode,
 				   struct fsverity_blockbuf *block,
 				   unsigned long hblock_idx)
 {
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	struct fsverity_info *vi = inode->i_verity_info;
-	struct page *hpage = (struct page *)block->context;
+	struct page *hpage;
 	unsigned int blocks_per_page;
 	unsigned int i;
 
+	/*
+	 * If the filesystem supplies Merkle tree content on a per-block basis,
+	 * rely on the implementation to retain verified status.
+	 */
+	if (vops->read_merkle_tree_block)
+		return block->verified;
+
+	/* Otherwise, the filesystem uses page-based caching. */
+	hpage = (struct page *)block->context;
+
 	/*
 	 * When the Merkle tree block size and page size are the same, then the
 	 * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
@@ -96,6 +107,7 @@  verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		  const void *data, u64 data_pos, unsigned long max_ra_bytes)
 {
 	const struct merkle_tree_params *params = &vi->tree_params;
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	const unsigned int hsize = params->digest_size;
 	int level;
 	unsigned long ra_bytes;
@@ -204,7 +216,9 @@  verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		 * idempotent, as the same hash block might be verified by
 		 * multiple threads concurrently.
 		 */
-		if (vi->hash_block_verified)
+		if (vops->read_merkle_tree_block)
+			block->verified = true;
+		else if (vi->hash_block_verified)
 			set_bit(hblock_idx, vi->hash_block_verified);
 		else
 			SetPageChecked((struct page *)block->context);
@@ -377,6 +391,19 @@  int fsverity_read_merkle_tree_block(struct inode *inode,
 
 	block->pos = pos;
 	block->size = params->block_size;
+	block->verified = false;
+
+	if (vops->read_merkle_tree_block) {
+		struct fsverity_readmerkle req = {
+			.inode = inode,
+			.ra_bytes = ra_bytes,
+		};
+
+		err = vops->read_merkle_tree_block(&req, block);
+		if (err)
+			goto bad;
+		return 0;
+	}
 
 	index = pos >> params->log_blocksize;
 	page_idx = round_down(index, params->blocks_per_page);
@@ -408,8 +435,14 @@  int fsverity_read_merkle_tree_block(struct inode *inode,
 void fsverity_drop_merkle_tree_block(struct inode *inode,
 				     struct fsverity_blockbuf *block)
 {
-	kunmap_local(block->kaddr);
-	put_page((struct page *)block->context);
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
+
+	if (vops->drop_merkle_tree_block) {
+		vops->drop_merkle_tree_block(block);
+	} else {
+		kunmap_local(block->kaddr);
+		put_page((struct page *)block->context);
+	}
 	block->kaddr = NULL;
 	block->context = NULL;
 }
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 05f8e89e0f470..ad17f8553f9cf 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -32,17 +32,38 @@ 
  * @kaddr: virtual address of the block's data
  * @pos: the position of the block in the Merkle tree (in bytes)
  * @size: the Merkle tree block size
+ * @verified: has this buffer been validated?
  *
  * Buffer containing a single Merkle Tree block.  When fs-verity wants to read
  * merkle data from disk, it passes the filesystem a buffer with the @pos,
- * @index, and @size fields filled out.  The filesystem sets @kaddr and
- * @context.
+ * @index, and @size fields filled out.  The filesystem sets @kaddr, @context,
+ * and @verified.
+ *
+ * While reading the tree, fs-verity calls ->read_merkle_tree_block followed by
+ * ->drop_merkle_tree_block to let the filesystem know that memory can be freed.
+ *
+ * The context is optional.  The filesystem can use this field to pass
+ * through state from ->read_merkle_tree_block to ->drop_merkle_tree_block.
  */
 struct fsverity_blockbuf {
 	void *context;
 	void *kaddr;
 	loff_t pos;
 	unsigned int size;
+	unsigned int verified:1;
+};
+
+/**
+ * struct fsverity_readmerkle - Request to read a Merkle Tree block buffer
+ * @inode: the inode to read
+ * @ra_bytes: The number of bytes that should be prefetched starting at
+ *		@block->pos if that block isn't already cached.
+ *		Implementations may ignore this argument; it's only a
+ *		performance optimization.
+ */
+struct fsverity_readmerkle {
+	struct inode *inode;
+	unsigned long ra_bytes;
 };
 
 /* Verity operations for filesystems */
@@ -120,12 +141,35 @@  struct fsverity_operations {
 	 *
 	 * Note that this must retrieve a *page*, not necessarily a *block*.
 	 *
+	 * If this function is implemented, do not implement
+	 * ->read_merkle_tree_block or ->drop_merkle_tree_block.
+	 *
 	 * Return: the page on success, ERR_PTR() on failure
 	 */
 	struct page *(*read_merkle_tree_page)(struct inode *inode,
 					      pgoff_t index,
 					      unsigned long num_ra_pages);
 
+	/**
+	 * Read a Merkle tree block of the given inode.
+	 * @req: read request; see struct fsverity_readmerkle
+	 * @block: block buffer that the filesystem fills in to point to the block
+	 *
+	 * This can be called at any time on an open verity file.  It may be
+	 * called by multiple processes concurrently.
+	 *
+	 * Implementations may cache the @block->verified state in
+	 * ->drop_merkle_tree_block.  They must clear the @block->verified
+	 * flag for a cache miss.
+	 *
+	 * If this function is implemented, ->drop_merkle_tree_block must also
+	 * be implemented.
+	 *
+	 * Return: 0 on success, -errno on failure
+	 */
+	int (*read_merkle_tree_block)(const struct fsverity_readmerkle *req,
+				      struct fsverity_blockbuf *block);
+
 	/**
 	 * Write a Merkle tree block to the given inode.
 	 *
@@ -141,6 +185,22 @@  struct fsverity_operations {
 	 */
 	int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
 				       u64 pos, unsigned int size);
+
+	/**
+	 * Release the reference to a Merkle tree block
+	 *
+	 * @block: the block to release
+	 *
+	 * This is called when fs-verity is done with a block obtained with
+	 * ->read_merkle_tree_block().
+	 *
+	 * Implementations should cache a @block->verified==1 state to avoid
+	 * unnecessary revalidations during later accesses.
+	 *
+	 * If this function is implemented, ->read_merkle_tree_block must also
+	 * be implemented.
+	 */
+	void (*drop_merkle_tree_block)(struct fsverity_blockbuf *block);
 };
 
 #ifdef CONFIG_FS_VERITY