Message ID | 20231006184922.252188-11-aalbersh@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs-verity support for XFS | expand |
Hi Andrey,
kernel test robot noticed the following build errors:
[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on kdave/for-next tytso-ext4/dev jaegeuk-f2fs/dev-test jaegeuk-f2fs/dev linus/master v6.6-rc4 next-20231006]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Andrey-Albershteyn/xfs-Add-new-name-to-attri-d/20231007-025742
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20231006184922.252188-11-aalbersh%40redhat.com
patch subject: [PATCH v3 10/28] fsverity: operate with Merkle tree blocks instead of pages
config: powerpc-randconfig-003-20231007 (https://download.01.org/0day-ci/archive/20231007/202310071152.hVd0qEeD-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231007/202310071152.hVd0qEeD-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310071152.hVd0qEeD-lkp@intel.com/
All errors (new ones prefixed by >>):
powerpc-linux-ld: fs/verity/read_metadata.o: in function `fsverity_read_merkle_tree':
>> read_metadata.c:(.text.fsverity_read_merkle_tree+0xa0): undefined reference to `__umoddi3'
On Fri, Oct 06, 2023 at 08:49:04PM +0200, Andrey Albershteyn wrote: > fsverity: operate with Merkle tree blocks instead of pages Well, it already does, just not for the Merkle tree caching. A better title might be something like "fsverity: support block-based Merkle tree caching". > fsverity expects filesystem to provide PAGEs with Merkle tree > blocks in it. Then, when fsverity is done with processing the > blocks, reference to PAGE is freed. This doesn't fit well with the > way XFS manages its memory. BTW, I encourage using "fs/verity/" when referring specifically to the fsverity support code in the kernel, which is located in that directory, as opposed to the whole fsverity feature. > This patch moves page reference management out of fsverity to > filesystem. This way fsverity expects a kaddr to the Merkle tree > block and filesystem can handle all caching and reference counting. That's not done for the existing filesystems, though. Which is probably the right choice, but it isn't what this commit message says. > diff --git a/fs/verity/open.c b/fs/verity/open.c > index dfb9fe6aaae9..8665d8b40081 100644 > --- a/fs/verity/open.c > +++ b/fs/verity/open.c > @@ -126,19 +126,16 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params, > } > > /* > - * With block_size != PAGE_SIZE, an in-memory bitmap will need to be > - * allocated to track the "verified" status of hash blocks. Don't allow > - * this bitmap to get too large. For now, limit it to 1 MiB, which > - * limits the file size to about 4.4 TB with SHA-256 and 4K blocks. > + * An in-memory bitmap will need to be allocated to track the "verified" > + * status of hash blocks. Don't allow this bitmap to get too large. > + * For now, limit it to 1 MiB, which limits the file size to > + * about 4.4 TB with SHA-256 and 4K blocks. > * > * Together with the fact that the data, and thus also the Merkle tree, > * cannot have more than ULONG_MAX pages, this implies that hash block > - * indices can always fit in an 'unsigned long'. But to be safe, we > - * explicitly check for that too. Note, this is only for hash block > - * indices; data block indices might not fit in an 'unsigned long'. > + * indices can always fit in an 'unsigned long'. > */ > - if ((params->block_size != PAGE_SIZE && offset > 1 << 23) || > - offset > ULONG_MAX) { > + if (offset > (1 << 23)) { > fsverity_err(inode, "Too many blocks in Merkle tree"); > err = -EFBIG; > goto out_err; This hunk should have been in the patch "fsverity: always use bitmap to track verified status". > diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c > index 197624cab43e..182bddf5dec5 100644 > --- a/fs/verity/read_metadata.c > +++ b/fs/verity/read_metadata.c > @@ -16,9 +16,9 @@ static int fsverity_read_merkle_tree(struct inode *inode, > const struct fsverity_info *vi, > void __user *buf, u64 offset, int length) > { > - const struct fsverity_operations *vops = inode->i_sb->s_vop; > u64 end_offset; > - unsigned int offs_in_page; > + unsigned int offs_in_block; > + unsigned int block_size = vi->tree_params.block_size; Maybe do here: const unsigned int block_size = vi->tree_params.block_size; const u8 log_blocksize = vi->tree_params.log_blocksize; Then both are easily accessible to the rest of the function. > pgoff_t index, last_index; > int retval = 0; > int err = 0; > @@ -26,8 +26,8 @@ static int fsverity_read_merkle_tree(struct inode *inode, > end_offset = min(offset + length, vi->tree_params.tree_size); > if (offset >= end_offset) > return 0; > - offs_in_page = offset_in_page(offset); > - last_index = (end_offset - 1) >> PAGE_SHIFT; > + offs_in_block = offset % block_size; No modulo by non-constant values, please. The block size always is a power of 2, so use 'offset & (block_size - 1)'. > + last_index = (end_offset - 1) >> vi->tree_params.log_blocksize; > > /* > * Iterate through each Merkle tree page in the requested range and copy > * the requested portion to userspace. Note that the Merkle tree block > * size isn't important here, as we are returning a byte stream; i.e., > * we can just work with pages even if the tree block size != PAGE_SIZE. > */ The above comment needs to be updated. > - for (index = offset >> PAGE_SHIFT; index <= last_index; index++) { > + for (index = offset >> vi->tree_params.log_blocksize; > + index <= last_index; index++) { > unsigned long num_ra_pages = > min_t(unsigned long, last_index - index + 1, > inode->i_sb->s_bdi->io_pages); > unsigned int bytes_to_copy = min_t(u64, end_offset - offset, > - PAGE_SIZE - offs_in_page); > - struct page *page; > - const void *virt; > + block_size - offs_in_block); > + struct fsverity_block block; > > - page = vops->read_merkle_tree_page(inode, index, num_ra_pages, > - vi->tree_params.log_blocksize); > - if (IS_ERR(page)) { > - err = PTR_ERR(page); > - fsverity_err(inode, > - "Error %d reading Merkle tree page %lu", > - err, index); > + block.len = block_size; > + if (fsverity_read_merkle_tree_block(inode, > + index << vi->tree_params.log_blocksize, > + &block, num_ra_pages)) { > + fsverity_drop_block(inode, &block); > + err = -EFAULT; > break; > } EFAULT is the wrong error code for an I/O error. > diff --git a/fs/verity/verify.c b/fs/verity/verify.c > index f556336ebd8d..dfe01f121843 100644 > --- a/fs/verity/verify.c > +++ b/fs/verity/verify.c > @@ -44,15 +44,15 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, > const struct merkle_tree_params *params = &vi->tree_params; > const unsigned int hsize = params->digest_size; > int level; > + int err; > + int num_ra_pages; > u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE]; > const u8 *want_hash; > u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE]; > /* The hash blocks that are traversed, indexed by level */ > struct { > - /* Page containing the hash block */ > - struct page *page; > - /* Mapped address of the hash block (will be within @page) */ > - const void *addr; > + /* Block containing the hash block */ > + struct fsverity_block block; "Block containing the hash block" is nonsensical. I think this indicates that fsverity_block is misnamed. It should be called something like fsverity_blockbuf. Then the above comment would be something like "Buffer containing the hash block". > @@ -93,10 +93,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, > unsigned long next_hidx; > unsigned long hblock_idx; > pgoff_t hpage_idx; > - unsigned int hblock_offset_in_page; > unsigned int hoffset; > - struct page *hpage; > - const void *haddr; > + struct fsverity_block *block = &hblocks[level].block; > > /* > * The index of the block in the current level; also the index > @@ -110,34 +108,28 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, > /* Index of the hash page in the tree overall */ > hpage_idx = hblock_idx >> params->log_blocks_per_page; > > - /* Byte offset of the hash block within the page */ > - hblock_offset_in_page = > - (hblock_idx << params->log_blocksize) & ~PAGE_MASK; > - > /* Byte offset of the hash within the block */ > hoffset = (hidx << params->log_digestsize) & > (params->block_size - 1); > > - hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode, > - hpage_idx, level == 0 ? min(max_ra_pages, > - params->tree_pages - hpage_idx) : 0, > - params->log_blocksize); > - if (IS_ERR(hpage)) { > + block->len = params->block_size; > + num_ra_pages = level == 0 ? > + min(max_ra_pages, params->tree_pages - hpage_idx) : 0; The fact that the readahead amount is still calculated in pages seems out of place. Maybe it should be done in bytes? > - if (vi->hash_block_verified) > - set_bit(hblock_idx, vi->hash_block_verified); > - else > - SetPageChecked(hpage); > + set_bit(hblock_idx, vi->hash_block_verified); This hunk also should have been in "fsverity: always use bitmap to track verified status". But as I mentioned on that patch, I'm not sure we should always use the bitmap. > diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h > index cac012d4c86a..ce37a430bc97 100644 > --- a/include/linux/fsverity.h > +++ b/include/linux/fsverity.h > @@ -26,6 +26,24 @@ > /* Arbitrary limit to bound the kmalloc() size. Can be changed. */ > #define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384 > > +/** > + * struct fsverity_block - Merkle Tree block > + * @kaddr: virtual address of the block's data > + * @len: length of the data Maybe use size instead of len? Currently it's pretty consistently called the "block size", not "block length". > + * @cached: true if block was already in cache, false otherwise cached => already_cached? > + * @verified: true if block is verified against Merkle tree > + * @context: filesystem private context > + * > + * Merkle Tree blocks passed and requested from filesystem > + */ It needs to be clearly documented how these fields flow between the filesystem and fs/verity/ when ->read_merkle_tree_block() and ->drop_block() are used. It seems to be different for different fields. Which are inputs and which are outputs to which functions? > + /** > + * Read a Merkle tree block of the given inode. > + * @inode: the inode > + * @index: 0-based index of the block within the Merkle tree > + * @num_ra_pages: The number of pages with blocks that should be > + * prefetched starting at @index if the page at @index > + * isn't already cached. Implementations may ignore this > + * argument; it's only a performance optimization. > + * > + * This can be called at any time on an open verity file. It may be > + * called by multiple processes concurrently. > + * > + * Return: 0 on success, -errno on failure > + */ > + int (*read_merkle_tree_block)(struct inode *inode, > + unsigned int index, > + struct fsverity_block *block, > + unsigned long num_ra_pages); For the second parameter your code actually passes the block position (in bytes), not the block index. Perhaps you meant for it to be 'u64 pos'? Either way, 'unsigned int' is wrong. Merkle tree block indices are unsigned long; positions are u64. > -static inline void fsverity_drop_page(struct inode *inode, struct page *page) > +static inline void fsverity_drop_block(struct inode *inode, > + struct fsverity_block *block) > { > - if (inode->i_sb->s_vop->drop_page) > - inode->i_sb->s_vop->drop_page(page); > - else > + if (inode->i_sb->s_vop->drop_block) > + inode->i_sb->s_vop->drop_block(block); > + else { > + struct page *page = (struct page *)block->context; > + > + if (block->verified) > + SetPageChecked(page); > + Why is PG_checked being set here? > +/** > + * fsverity_read_block_from_page() - layer between fs using read page > + * and read block > + * @inode: inode in use for verification or metadata reading > + * @index: index of the block in the tree (offset into the tree) > + * @block: block to be read > + * @num_ra_pages: number of pages to readahead, may be ignored > + * > + * Depending on fs implementation use read_merkle_tree_block or > + * read_merkle_tree_page. > + */ > +static inline int fsverity_read_merkle_tree_block(struct inode *inode, > + unsigned int index, > + struct fsverity_block *block, > + unsigned long num_ra_pages) > +{ > + struct page *page; > + > + if (inode->i_sb->s_vop->read_merkle_tree_block) > + return inode->i_sb->s_vop->read_merkle_tree_block( > + inode, index, block, num_ra_pages); > + > + page = inode->i_sb->s_vop->read_merkle_tree_page( > + inode, index >> PAGE_SHIFT, num_ra_pages, > + block->len); > + > + block->kaddr = page_address(page) + (index % PAGE_SIZE); > + block->cached = PageChecked(page); > + block->context = page; > + > + if (IS_ERR(page)) > + return PTR_ERR(page); > + else > + return 0; > +} This isn't handling errors from ->read_merkle_tree_page() correctly. Also, page_address() can't be used for pagecache pages unless the file has opted out of highmem. You need to use kmap_local_page(), as the code was doing before. Also, since fsverity_read_merkle_tree_block() is a private function for fs/verity/, not for filesystems to use, it should be put in fs/verity/fsverity_private.h, not in include/linux/fsverity.h. - Eric
Can we just switch everyone over to the block interface and just provide helpers for the existing users? The patch below (untested) should do that and the diffstate looks quite nice. Btw, why do we pass the block offset as unsigned int? Is there anything that guarantees it stays under 32-bits? I would have expected a loff_t there. --- diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst index af889512c6ac99..c616d530a89086 100644 --- a/Documentation/filesystems/fsverity.rst +++ b/Documentation/filesystems/fsverity.rst @@ -648,7 +648,7 @@ which verifies data that has been read into the pagecache of a verity inode. The containing folio must still be locked and not Uptodate, so it's not yet readable by userspace. As needed to do the verification, fsverity_verify_blocks() will call back into the filesystem to read -hash blocks via fsverity_operations::read_merkle_tree_page(). +hash blocks via fsverity_operations::read_merkle_tree_block(). fsverity_verify_blocks() returns false if verification failed; in this case, the filesystem must not set the folio Uptodate. Following this, diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c index 2b34796f68d349..4b6134923232e7 100644 --- a/fs/btrfs/verity.c +++ b/fs/btrfs/verity.c @@ -713,20 +713,20 @@ int btrfs_get_verity_descriptor(struct inode *inode, void *buf, size_t buf_size) * * Returns the page we read, or an ERR_PTR on error. */ -static struct page *btrfs_read_merkle_tree_page(struct inode *inode, - pgoff_t index, - unsigned long num_ra_pages, - u8 log_blocksize) +static int btrfs_read_merkle_tree_block(struct inode *inode, + unsigned int offset, struct fsverity_block *block, + unsigned long num_ra_pages) { struct folio *folio; + pgoff_t index = offset >> PAGE_SHIFT; u64 off = (u64)index << PAGE_SHIFT; loff_t merkle_pos = merkle_file_pos(inode); int ret; if (merkle_pos < 0) - return ERR_PTR(merkle_pos); + return merkle_pos; if (merkle_pos > inode->i_sb->s_maxbytes - off - PAGE_SIZE) - return ERR_PTR(-EFBIG); + return -EFBIG; index += merkle_pos >> PAGE_SHIFT; again: folio = __filemap_get_folio(inode->i_mapping, index, FGP_ACCESSED, 0); @@ -739,7 +739,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode, if (!folio_test_uptodate(folio)) { folio_unlock(folio); folio_put(folio); - return ERR_PTR(-EIO); + return -EIO; } folio_unlock(folio); goto out; @@ -748,7 +748,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode, folio = filemap_alloc_folio(mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS), 0); if (!folio) - return ERR_PTR(-ENOMEM); + return -ENOMEM; ret = filemap_add_folio(inode->i_mapping, folio, index, GFP_NOFS); if (ret) { @@ -756,7 +756,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode, /* Did someone else insert a folio here? */ if (ret == -EEXIST) goto again; - return ERR_PTR(ret); + return ret; } /* @@ -769,7 +769,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode, folio_address(folio), PAGE_SIZE, &folio->page); if (ret < 0) { folio_put(folio); - return ERR_PTR(ret); + return ret; } if (ret < PAGE_SIZE) folio_zero_segment(folio, ret, PAGE_SIZE); @@ -778,7 +778,8 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode, folio_unlock(folio); out: - return folio_file_page(folio, index); + return fsverity_set_block_page(block, folio_file_page(folio, index), + offset); } /* @@ -809,6 +810,7 @@ const struct fsverity_operations btrfs_verityops = { .begin_enable_verity = btrfs_begin_enable_verity, .end_enable_verity = btrfs_end_enable_verity, .get_verity_descriptor = btrfs_get_verity_descriptor, - .read_merkle_tree_page = btrfs_read_merkle_tree_page, + .read_merkle_tree_block = btrfs_read_merkle_tree_block, .write_merkle_tree_block = btrfs_write_merkle_tree_block, + .drop_merkle_tree_block = fsverity_drop_page_merke_tree_block, }; diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c index 4e2f01f048c09b..5623e2c1c302e8 100644 --- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -358,15 +358,13 @@ static int ext4_get_verity_descriptor(struct inode *inode, void *buf, return desc_size; } -static struct page *ext4_read_merkle_tree_page(struct inode *inode, - pgoff_t index, - unsigned long num_ra_pages, - u8 log_blocksize) +static int ext4_read_merkle_tree_block(struct inode *inode, unsigned int offset, + struct fsverity_block *block, unsigned long num_ra_pages) { struct folio *folio; + pgoff_t index; - index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT; - + index = (ext4_verity_metadata_pos(inode) + offset) >> PAGE_SHIFT; folio = __filemap_get_folio(inode->i_mapping, index, FGP_ACCESSED, 0); if (IS_ERR(folio) || !folio_test_uptodate(folio)) { DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); @@ -377,9 +375,10 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode, page_cache_ra_unbounded(&ractl, num_ra_pages, 0); folio = read_mapping_folio(inode->i_mapping, index, NULL); if (IS_ERR(folio)) - return ERR_CAST(folio); + return PTR_ERR(folio); } - return folio_file_page(folio, index); + return fsverity_set_block_page(block, folio_file_page(folio, index), + offset); } static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf, @@ -394,6 +393,7 @@ const struct fsverity_operations ext4_verityops = { .begin_enable_verity = ext4_begin_enable_verity, .end_enable_verity = ext4_end_enable_verity, .get_verity_descriptor = ext4_get_verity_descriptor, - .read_merkle_tree_page = ext4_read_merkle_tree_page, + .read_merkle_tree_block = ext4_read_merkle_tree_block, .write_merkle_tree_block = ext4_write_merkle_tree_block, + .drop_merkle_tree_block = fsverity_drop_page_merke_tree_block, }; diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c index 601ab9f0c02492..aac9281e9c4565 100644 --- a/fs/f2fs/verity.c +++ b/fs/f2fs/verity.c @@ -255,15 +255,13 @@ static int f2fs_get_verity_descriptor(struct inode *inode, void *buf, return size; } -static struct page *f2fs_read_merkle_tree_page(struct inode *inode, - pgoff_t index, - unsigned long num_ra_pages, - u8 log_blocksize) +static int f2fs_read_merkle_tree_block(struct inode *inode, unsigned int offset, + struct fsverity_block *block, unsigned long num_ra_pages) { struct page *page; + pgoff_t index; - index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT; - + index = (f2fs_verity_metadata_pos(inode) + offset) >> PAGE_SHIFT; page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED); if (!page || !PageUptodate(page)) { DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index); @@ -274,7 +272,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode, page_cache_ra_unbounded(&ractl, num_ra_pages, 0); page = read_mapping_page(inode->i_mapping, index, NULL); } - return page; + return fsverity_set_block_page(block, page, offset); } static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf, @@ -289,6 +287,7 @@ const struct fsverity_operations f2fs_verityops = { .begin_enable_verity = f2fs_begin_enable_verity, .end_enable_verity = f2fs_end_enable_verity, .get_verity_descriptor = f2fs_get_verity_descriptor, - .read_merkle_tree_page = f2fs_read_merkle_tree_page, + .read_merkle_tree_block = f2fs_read_merkle_tree_block, .write_merkle_tree_block = f2fs_write_merkle_tree_block, + .drop_merkle_tree_block = fsverity_drop_page_merke_tree_block, }; diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c index 182bddf5dec54c..5e362f8562bd5d 100644 --- a/fs/verity/read_metadata.c +++ b/fs/verity/read_metadata.c @@ -12,10 +12,33 @@ #include <linux/sched/signal.h> #include <linux/uaccess.h> +int fsverity_set_block_page(struct fsverity_block *block, + struct page *page, unsigned int index) +{ + if (IS_ERR(page)) + return PTR_ERR(page); + block->kaddr = page_address(page) + (index % PAGE_SIZE); + block->cached = PageChecked(page); + block->context = page; + return 0; +} +EXPORT_SYMBOL_GPL(fsverity_set_block_page); + +void fsverity_drop_page_merke_tree_block(struct fsverity_block *block) +{ + struct page *page = block->context; + + if (block->verified) + SetPageChecked(page); + put_page(page); +} +EXPORT_SYMBOL_GPL(fsverity_drop_page_merke_tree_block); + static int fsverity_read_merkle_tree(struct inode *inode, const struct fsverity_info *vi, void __user *buf, u64 offset, int length) { + const struct fsverity_operations *vop = inode->i_sb->s_vop; u64 end_offset; unsigned int offs_in_block; unsigned int block_size = vi->tree_params.block_size; @@ -45,20 +68,19 @@ static int fsverity_read_merkle_tree(struct inode *inode, struct fsverity_block block; block.len = block_size; - if (fsverity_read_merkle_tree_block(inode, - index << vi->tree_params.log_blocksize, - &block, num_ra_pages)) { - fsverity_drop_block(inode, &block); + if (vop->read_merkle_tree_block(inode, + index << vi->tree_params.log_blocksize, + &block, num_ra_pages)) { err = -EFAULT; break; } if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) { - fsverity_drop_block(inode, &block); + vop->drop_merkle_tree_block(&block); err = -EFAULT; break; } - fsverity_drop_block(inode, &block); + vop->drop_merkle_tree_block(&block); block.kaddr = NULL; retval += bytes_to_copy; diff --git a/fs/verity/verify.c b/fs/verity/verify.c index dfe01f12184341..9b84262a6fa413 100644 --- a/fs/verity/verify.c +++ b/fs/verity/verify.c @@ -42,6 +42,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, const void *data, u64 data_pos, unsigned long max_ra_pages) { const struct merkle_tree_params *params = &vi->tree_params; + const struct fsverity_operations *vop = inode->i_sb->s_vop; const unsigned int hsize = params->digest_size; int level; int err; @@ -115,9 +116,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, block->len = params->block_size; num_ra_pages = level == 0 ? min(max_ra_pages, params->tree_pages - hpage_idx) : 0; - err = fsverity_read_merkle_tree_block( - inode, hblock_idx << params->log_blocksize, block, - num_ra_pages); + err = vop->read_merkle_tree_block(inode, + hblock_idx << params->log_blocksize, block, + num_ra_pages); if (err) { fsverity_err(inode, "Error %d reading Merkle tree block %lu", @@ -127,7 +128,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, if (is_hash_block_verified(vi, hblock_idx, block->cached)) { memcpy(_want_hash, block->kaddr + hoffset, hsize); want_hash = _want_hash; - fsverity_drop_block(inode, block); + vop->drop_merkle_tree_block(block); goto descend; } hblocks[level].index = hblock_idx; @@ -157,7 +158,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, block->verified = true; memcpy(_want_hash, haddr + hoffset, hsize); want_hash = _want_hash; - fsverity_drop_block(inode, block); + vop->drop_merkle_tree_block(block); } /* Finally, verify the data block. */ @@ -174,9 +175,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, params->hash_alg->name, hsize, want_hash, params->hash_alg->name, hsize, real_hash); error: - for (; level > 0; level--) { - fsverity_drop_block(inode, &hblocks[level - 1].block); - } + for (; level > 0; level--) + vop->drop_merkle_tree_block(&hblocks[level - 1].block); return false; } diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h index ce37a430bc97f2..ae9ae7719af558 100644 --- a/include/linux/fsverity.h +++ b/include/linux/fsverity.h @@ -104,27 +104,6 @@ struct fsverity_operations { int (*get_verity_descriptor)(struct inode *inode, void *buf, size_t bufsize); - /** - * Read a Merkle tree page of the given inode. - * - * @inode: the inode - * @index: 0-based index of the page within the Merkle tree - * @num_ra_pages: The number of Merkle tree pages that should be - * prefetched starting at @index if the page at @index - * isn't already cached. Implementations may ignore this - * argument; it's only a performance optimization. - * - * This can be called at any time on an open verity file. It may be - * called by multiple processes concurrently, even with the same page. - * - * Note that this must retrieve a *page*, not necessarily a *block*. - * - * Return: the page on success, ERR_PTR() on failure - */ - struct page *(*read_merkle_tree_page)(struct inode *inode, - pgoff_t index, - unsigned long num_ra_pages, - u8 log_blocksize); /** * Read a Merkle tree block of the given inode. * @inode: the inode @@ -162,13 +141,12 @@ struct fsverity_operations { /** * Release the reference to a Merkle tree block - * - * @page: the block to release + * @block: the block to release * * This is called when fs-verity is done with a block obtained with * ->read_merkle_tree_block(). */ - void (*drop_block)(struct fsverity_block *block); + void (*drop_merkle_tree_block)(struct fsverity_block *block); }; #ifdef CONFIG_FS_VERITY @@ -217,74 +195,16 @@ static inline void fsverity_cleanup_inode(struct inode *inode) int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg); +int fsverity_set_block_page(struct fsverity_block *block, + struct page *page, unsigned int index); +void fsverity_drop_page_merke_tree_block(struct fsverity_block *block); + /* verify.c */ bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset); void fsverity_verify_bio(struct bio *bio); void fsverity_enqueue_verify_work(struct work_struct *work); -/** - * fsverity_drop_block() - drop block obtained with ->read_merkle_tree_block() - * @inode: inode in use for verification or metadata reading - * @block: block to be dropped - * - * Generic put_page() method. Calls out back to filesystem if ->drop_block() is - * set, otherwise do nothing. - * - */ -static inline void fsverity_drop_block(struct inode *inode, - struct fsverity_block *block) -{ - if (inode->i_sb->s_vop->drop_block) - inode->i_sb->s_vop->drop_block(block); - else { - struct page *page = (struct page *)block->context; - - if (block->verified) - SetPageChecked(page); - - put_page(page); - } -} - -/** - * fsverity_read_block_from_page() - layer between fs using read page - * and read block - * @inode: inode in use for verification or metadata reading - * @index: index of the block in the tree (offset into the tree) - * @block: block to be read - * @num_ra_pages: number of pages to readahead, may be ignored - * - * Depending on fs implementation use read_merkle_tree_block or - * read_merkle_tree_page. - */ -static inline int fsverity_read_merkle_tree_block(struct inode *inode, - unsigned int index, - struct fsverity_block *block, - unsigned long num_ra_pages) -{ - struct page *page; - - if (inode->i_sb->s_vop->read_merkle_tree_block) - return inode->i_sb->s_vop->read_merkle_tree_block( - inode, index, block, num_ra_pages); - - page = inode->i_sb->s_vop->read_merkle_tree_page( - inode, index >> PAGE_SHIFT, num_ra_pages, - block->len); - - block->kaddr = page_address(page) + (index % PAGE_SIZE); - block->cached = PageChecked(page); - block->context = page; - - if (IS_ERR(page)) - return PTR_ERR(page); - else - return 0; -} - - - #else /* !CONFIG_FS_VERITY */ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode) @@ -362,20 +282,6 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work) WARN_ON_ONCE(1); } -static inline void fsverity_drop_page(struct inode *inode, struct page *page) -{ - WARN_ON_ONCE(1); -} - -static inline int fsverity_read_merkle_tree_block(struct inode *inode, - unsigned int index, - struct fsverity_block *block, - unsigned long num_ra_pages) -{ - WARN_ON_ONCE(1); - return -EOPNOTSUPP; -} - #endif /* !CONFIG_FS_VERITY */ static inline bool fsverity_verify_folio(struct folio *folio)
diff --git a/fs/verity/open.c b/fs/verity/open.c index dfb9fe6aaae9..8665d8b40081 100644 --- a/fs/verity/open.c +++ b/fs/verity/open.c @@ -126,19 +126,16 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params, } /* - * With block_size != PAGE_SIZE, an in-memory bitmap will need to be - * allocated to track the "verified" status of hash blocks. Don't allow - * this bitmap to get too large. For now, limit it to 1 MiB, which - * limits the file size to about 4.4 TB with SHA-256 and 4K blocks. + * An in-memory bitmap will need to be allocated to track the "verified" + * status of hash blocks. Don't allow this bitmap to get too large. + * For now, limit it to 1 MiB, which limits the file size to + * about 4.4 TB with SHA-256 and 4K blocks. * * Together with the fact that the data, and thus also the Merkle tree, * cannot have more than ULONG_MAX pages, this implies that hash block - * indices can always fit in an 'unsigned long'. But to be safe, we - * explicitly check for that too. Note, this is only for hash block - * indices; data block indices might not fit in an 'unsigned long'. + * indices can always fit in an 'unsigned long'. */ - if ((params->block_size != PAGE_SIZE && offset > 1 << 23) || - offset > ULONG_MAX) { + if (offset > (1 << 23)) { fsverity_err(inode, "Too many blocks in Merkle tree"); err = -EFBIG; goto out_err; diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c index 197624cab43e..182bddf5dec5 100644 --- a/fs/verity/read_metadata.c +++ b/fs/verity/read_metadata.c @@ -16,9 +16,9 @@ static int fsverity_read_merkle_tree(struct inode *inode, const struct fsverity_info *vi, void __user *buf, u64 offset, int length) { - const struct fsverity_operations *vops = inode->i_sb->s_vop; u64 end_offset; - unsigned int offs_in_page; + unsigned int offs_in_block; + unsigned int block_size = vi->tree_params.block_size; pgoff_t index, last_index; int retval = 0; int err = 0; @@ -26,8 +26,8 @@ static int fsverity_read_merkle_tree(struct inode *inode, end_offset = min(offset + length, vi->tree_params.tree_size); if (offset >= end_offset) return 0; - offs_in_page = offset_in_page(offset); - last_index = (end_offset - 1) >> PAGE_SHIFT; + offs_in_block = offset % block_size; + last_index = (end_offset - 1) >> vi->tree_params.log_blocksize; /* * Iterate through each Merkle tree page in the requested range and copy @@ -35,34 +35,31 @@ static int fsverity_read_merkle_tree(struct inode *inode, * size isn't important here, as we are returning a byte stream; i.e., * we can just work with pages even if the tree block size != PAGE_SIZE. */ - for (index = offset >> PAGE_SHIFT; index <= last_index; index++) { + for (index = offset >> vi->tree_params.log_blocksize; + index <= last_index; index++) { unsigned long num_ra_pages = min_t(unsigned long, last_index - index + 1, inode->i_sb->s_bdi->io_pages); unsigned int bytes_to_copy = min_t(u64, end_offset - offset, - PAGE_SIZE - offs_in_page); - struct page *page; - const void *virt; + block_size - offs_in_block); + struct fsverity_block block; - page = vops->read_merkle_tree_page(inode, index, num_ra_pages, - vi->tree_params.log_blocksize); - if (IS_ERR(page)) { - err = PTR_ERR(page); - fsverity_err(inode, - "Error %d reading Merkle tree page %lu", - err, index); + block.len = block_size; + if (fsverity_read_merkle_tree_block(inode, + index << vi->tree_params.log_blocksize, + &block, num_ra_pages)) { + fsverity_drop_block(inode, &block); + err = -EFAULT; break; } - virt = kmap_local_page(page); - if (copy_to_user(buf, virt + offs_in_page, bytes_to_copy)) { - kunmap_local(virt); - fsverity_drop_page(inode, page); + if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) { + fsverity_drop_block(inode, &block); err = -EFAULT; break; } - kunmap_local(virt); - fsverity_drop_page(inode, page); + fsverity_drop_block(inode, &block); + block.kaddr = NULL; retval += bytes_to_copy; buf += bytes_to_copy; @@ -73,7 +70,7 @@ static int fsverity_read_merkle_tree(struct inode *inode, break; } cond_resched(); - offs_in_page = 0; + offs_in_block = 0; } return retval ? retval : err; } diff --git a/fs/verity/verify.c b/fs/verity/verify.c index f556336ebd8d..dfe01f121843 100644 --- a/fs/verity/verify.c +++ b/fs/verity/verify.c @@ -44,15 +44,15 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, const struct merkle_tree_params *params = &vi->tree_params; const unsigned int hsize = params->digest_size; int level; + int err; + int num_ra_pages; u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE]; const u8 *want_hash; u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE]; /* The hash blocks that are traversed, indexed by level */ struct { - /* Page containing the hash block */ - struct page *page; - /* Mapped address of the hash block (will be within @page) */ - const void *addr; + /* Block containing the hash block */ + struct fsverity_block block; /* Index of the hash block in the tree overall */ unsigned long index; /* Byte offset of the wanted hash relative to @addr */ @@ -93,10 +93,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, unsigned long next_hidx; unsigned long hblock_idx; pgoff_t hpage_idx; - unsigned int hblock_offset_in_page; unsigned int hoffset; - struct page *hpage; - const void *haddr; + struct fsverity_block *block = &hblocks[level].block; /* * The index of the block in the current level; also the index @@ -110,34 +108,28 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, /* Index of the hash page in the tree overall */ hpage_idx = hblock_idx >> params->log_blocks_per_page; - /* Byte offset of the hash block within the page */ - hblock_offset_in_page = - (hblock_idx << params->log_blocksize) & ~PAGE_MASK; - /* Byte offset of the hash within the block */ hoffset = (hidx << params->log_digestsize) & (params->block_size - 1); - hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode, - hpage_idx, level == 0 ? min(max_ra_pages, - params->tree_pages - hpage_idx) : 0, - params->log_blocksize); - if (IS_ERR(hpage)) { + block->len = params->block_size; + num_ra_pages = level == 0 ? + min(max_ra_pages, params->tree_pages - hpage_idx) : 0; + err = fsverity_read_merkle_tree_block( + inode, hblock_idx << params->log_blocksize, block, + num_ra_pages); + if (err) { fsverity_err(inode, - "Error %ld reading Merkle tree page %lu", - PTR_ERR(hpage), hpage_idx); + "Error %d reading Merkle tree block %lu", + err, hblock_idx); goto error; } - haddr = kmap_local_page(hpage) + hblock_offset_in_page; - if (is_hash_block_verified(vi, hblock_idx, PageChecked(hpage))) { - memcpy(_want_hash, haddr + hoffset, hsize); + if (is_hash_block_verified(vi, hblock_idx, block->cached)) { + memcpy(_want_hash, block->kaddr + hoffset, hsize); want_hash = _want_hash; - kunmap_local(haddr); - fsverity_drop_page(inode, hpage); + fsverity_drop_block(inode, block); goto descend; } - hblocks[level].page = hpage; - hblocks[level].addr = haddr; hblocks[level].index = hblock_idx; hblocks[level].hoffset = hoffset; hidx = next_hidx; @@ -147,8 +139,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, descend: /* Descend the tree verifying hash blocks. */ for (; level > 0; level--) { - struct page *hpage = hblocks[level - 1].page; - const void *haddr = hblocks[level - 1].addr; + struct fsverity_block *block = &hblocks[level - 1].block; + const void *haddr = block->kaddr; unsigned long hblock_idx = hblocks[level - 1].index; unsigned int hoffset = hblocks[level - 1].hoffset; @@ -161,14 +153,11 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, * idempotent, as the same hash block might be verified by * multiple threads concurrently. */ - if (vi->hash_block_verified) - set_bit(hblock_idx, vi->hash_block_verified); - else - SetPageChecked(hpage); + set_bit(hblock_idx, vi->hash_block_verified); + block->verified = true; memcpy(_want_hash, haddr + hoffset, hsize); want_hash = _want_hash; - kunmap_local(haddr); - fsverity_drop_page(inode, hpage); + fsverity_drop_block(inode, block); } /* Finally, verify the data block. */ @@ -186,8 +175,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi, params->hash_alg->name, hsize, real_hash); error: for (; level > 0; level--) { - kunmap_local(hblocks[level - 1].addr); - fsverity_drop_page(inode, hblocks[level - 1].page); + fsverity_drop_block(inode, &hblocks[level - 1].block); } return false; } diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h index cac012d4c86a..ce37a430bc97 100644 --- a/include/linux/fsverity.h +++ b/include/linux/fsverity.h @@ -26,6 +26,24 @@ /* Arbitrary limit to bound the kmalloc() size. Can be changed. */ #define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384 +/** + * struct fsverity_block - Merkle Tree block + * @kaddr: virtual address of the block's data + * @len: length of the data + * @cached: true if block was already in cache, false otherwise + * @verified: true if block is verified against Merkle tree + * @context: filesystem private context + * + * Merkle Tree blocks passed and requested from filesystem + */ +struct fsverity_block { + void *kaddr; + unsigned int len; + bool cached; + bool verified; + void *context; +}; + /* Verity operations for filesystems */ struct fsverity_operations { @@ -107,6 +125,24 @@ struct fsverity_operations { pgoff_t index, unsigned long num_ra_pages, u8 log_blocksize); + /** + * Read a Merkle tree block of the given inode. + * @inode: the inode + * @index: 0-based index of the block within the Merkle tree + * @num_ra_pages: The number of pages with blocks that should be + * prefetched starting at @index if the page at @index + * isn't already cached. Implementations may ignore this + * argument; it's only a performance optimization. + * + * This can be called at any time on an open verity file. It may be + * called by multiple processes concurrently. + * + * Return: 0 on success, -errno on failure + */ + int (*read_merkle_tree_block)(struct inode *inode, + unsigned int index, + struct fsverity_block *block, + unsigned long num_ra_pages); /** * Write a Merkle tree block to the given inode. @@ -125,14 +161,14 @@ struct fsverity_operations { u64 pos, unsigned int size); /** - * Release the reference to a Merkle tree page + * Release the reference to a Merkle tree block * - * @page: the page to release + * @page: the block to release * - * This is called when fs-verity is done with a page obtained with - * ->read_merkle_tree_page(). + * This is called when fs-verity is done with a block obtained with + * ->read_merkle_tree_block(). */ - void (*drop_page)(struct page *page); + void (*drop_block)(struct fsverity_block *block); }; #ifdef CONFIG_FS_VERITY @@ -188,22 +224,66 @@ void fsverity_verify_bio(struct bio *bio); void fsverity_enqueue_verify_work(struct work_struct *work); /** - * fsverity_drop_page() - drop page obtained with ->read_merkle_tree_page() + * fsverity_drop_block() - drop block obtained with ->read_merkle_tree_block() * @inode: inode in use for verification or metadata reading - * @page: page to be dropped + * @block: block to be dropped * - * Generic put_page() method. Calls out back to filesystem if ->drop_page() is - * set, otherwise just drops the reference to a page. + * Generic put_page() method. Calls out back to filesystem if ->drop_block() is + * set, otherwise do nothing. * */ -static inline void fsverity_drop_page(struct inode *inode, struct page *page) +static inline void fsverity_drop_block(struct inode *inode, + struct fsverity_block *block) { - if (inode->i_sb->s_vop->drop_page) - inode->i_sb->s_vop->drop_page(page); - else + if (inode->i_sb->s_vop->drop_block) + inode->i_sb->s_vop->drop_block(block); + else { + struct page *page = (struct page *)block->context; + + if (block->verified) + SetPageChecked(page); + put_page(page); + } } +/** + * fsverity_read_block_from_page() - layer between fs using read page + * and read block + * @inode: inode in use for verification or metadata reading + * @index: index of the block in the tree (offset into the tree) + * @block: block to be read + * @num_ra_pages: number of pages to readahead, may be ignored + * + * Depending on fs implementation use read_merkle_tree_block or + * read_merkle_tree_page. + */ +static inline int fsverity_read_merkle_tree_block(struct inode *inode, + unsigned int index, + struct fsverity_block *block, + unsigned long num_ra_pages) +{ + struct page *page; + + if (inode->i_sb->s_vop->read_merkle_tree_block) + return inode->i_sb->s_vop->read_merkle_tree_block( + inode, index, block, num_ra_pages); + + page = inode->i_sb->s_vop->read_merkle_tree_page( + inode, index >> PAGE_SHIFT, num_ra_pages, + block->len); + + block->kaddr = page_address(page) + (index % PAGE_SIZE); + block->cached = PageChecked(page); + block->context = page; + + if (IS_ERR(page)) + return PTR_ERR(page); + else + return 0; +} + + #else /* !CONFIG_FS_VERITY */ @@ -287,6 +367,15 @@ static inline void fsverity_drop_page(struct inode *inode, struct page *page) WARN_ON_ONCE(1); } +static inline int fsverity_read_merkle_tree_block(struct inode *inode, + unsigned int index, + struct fsverity_block *block, + unsigned long num_ra_pages) +{ + WARN_ON_ONCE(1); + return -EOPNOTSUPP; +} + #endif /* !CONFIG_FS_VERITY */ static inline bool fsverity_verify_folio(struct folio *folio)
fsverity expects filesystem to provide PAGEs with Merkle tree blocks in it. Then, when fsverity is done with processing the blocks, reference to PAGE is freed. This doesn't fit well with the way XFS manages its memory. This patch moves page reference management out of fsverity to filesystem. This way fsverity expects a kaddr to the Merkle tree block and filesystem can handle all caching and reference counting. As btrfs, ext4 and f2fs return page with Merkle tree blocks this patch also adds fsverity_read_merkle_tree_block() which wraps addressing blocks in the page (not to implement it in every fs). Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com> --- fs/verity/open.c | 15 ++--- fs/verity/read_metadata.c | 41 +++++++------- fs/verity/verify.c | 58 ++++++++----------- include/linux/fsverity.h | 115 +++++++++++++++++++++++++++++++++----- 4 files changed, 150 insertions(+), 79 deletions(-)