[1/3] VFS: Add a call to obtain a file's hash

Message ID	20181004203007.217320-2-mjg59@google.com (mailing list archive)
State	New, archived
Headers
	Return-Path: <linux-fsdevel-owner@kernel.org>
	Date: Thu, 4 Oct 2018 13:30:05 -0700
	In-Reply-To: <20181004203007.217320-1-mjg59@google.com>
	Message-Id: <20181004203007.217320-2-mjg59@google.com>
	Mime-Version: 1.0
	References: <20181004203007.217320-1-mjg59@google.com>
	Subject: [PATCH 1/3] VFS: Add a call to obtain a file's hash
	From: Matthew Garrett <mjg59@google.com>
	To: linux-integrity@vger.kernel.org
	Cc: zohar@linux.vnet.ibm.com, dmitry.kasatkin@gmail.com, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, Matthew Garrett <mjg59@google.com>
	Content-Type: text/plain; charset="UTF-8"
	Sender: linux-fsdevel-owner@vger.kernel.org
	Precedence: bulk
Series	[1/3] VFS: Add a call to obtain a file's hash
	[2/3] IMA: Make use of filesystem-provided hashes
	[3/3] FUSE: Allow filesystems to provide gethash methods

Matthew Garrett Oct. 4, 2018, 8:30 p.m. UTC

IMA wants to know what the hash of a file is, and currently does so by
reading the entire file and generating the hash. Some filesystems may
have the ability to store the hash in a secure manner resistant to
offline attacks (eg, filesystem-level file signing), and in that case
it's a performance win for IMA to be able to use that rather than having
to re-hash everything. This patch simply adds VFS-level support for
calling down to filesystems.

Signed-off-by: Matthew Garrett <mjg59@google.com>
---
 fs/read_write.c    | 24 ++++++++++++++++++++++++
 include/linux/fs.h |  6 +++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

Mimi Zohar Oct. 11, 2018, 3:22 p.m. UTC | #1

On Thu, 2018-10-04 at 13:30 -0700, Matthew Garrett wrote:
> IMA wants to know what the hash of a file is, and currently does so by
> reading the entire file and generating the hash. Some filesystems may
> have the ability to store the hash in a secure manner resistant to
> offline attacks (eg, filesystem-level file signing), and in that case
> it's a performance win for IMA to be able to use that rather than having
> to re-hash everything. This patch simply adds VFS-level support for
> calling down to filesystems.

This patch description starts out saying that IMA needs the file hash
without explaining why.  Without that explanation, simply extracting
the file hash included in the file signature might sound plausible,
but kind of defeats the purpose of IMA.

Mimi


> 
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
>  fs/read_write.c    | 24 ++++++++++++++++++++++++
>  include/linux/fs.h |  6 +++++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 39b4a21dd933..9ba3ce4bb838 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -2081,3 +2081,27 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
>  	return ret;
>  }
>  EXPORT_SYMBOL(vfs_dedupe_file_range);
> +
> +/**
> + * vfs_get_hash - obtain a file's hash
> + * @file:	file structure in question
> + * @hash:	the hash algorithm requested
> + * @buf:	buffer to return the hash in
> + * @size:	size allocated for the buffer by the caller
> + *
> + * This function allows filesystems that support securely storing the hash
> + * of a file to return it rather than forcing the kernel to recalculate it.
> + * Filesystems that cannot provide guarantees about the hash being resistant
> + * to offline attack should not implement this functionality.
> + *
> + * Returns 0 on success, -EOPNOTSUPP if the filesystem doesn't support it.
> + */
> +int vfs_get_hash(struct file *file, enum hash_algo hash, uint8_t *buf,
> +		 size_t size)
> +{
> +	if (!file->f_op->get_hash)
> +		return -EOPNOTSUPP;
> +
> +	return file->f_op->get_hash(file, hash, buf, size);
> +}
> +EXPORT_SYMBOL(vfs_get_hash);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6c0b4a1c22ff..540316cfd461 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -40,6 +40,7 @@
> 
>  #include <asm/byteorder.h>
>  #include <uapi/linux/fs.h>
> +#include <uapi/linux/hash_info.h>
> 
>  struct backing_dev_info;
>  struct bdi_writeback;
> @@ -1764,6 +1765,8 @@ struct file_operations {
>  	int (*dedupe_file_range)(struct file *, loff_t, struct file *, loff_t,
>  			u64);
>  	int (*fadvise)(struct file *, loff_t, loff_t, int);
> +	int (*get_hash)(struct file *, enum hash_algo hash, uint8_t *buf,
> +			size_t size);
>  } __randomize_layout;
> 
>  struct inode_operations {
> @@ -1838,7 +1841,8 @@ extern int vfs_dedupe_file_range(struct file *file,
>  extern int vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
>  				     struct file *dst_file, loff_t dst_pos,
>  				     u64 len);
> -
> +extern int vfs_get_hash(struct file *file, enum hash_algo hash, uint8_t *buf,
> +			size_t size);
> 
>  struct super_operations {
>     	struct inode *(*alloc_inode)(struct super_block *sb);
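
The dispatch pattern in the quoted patch can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code: the struct definitions are cut-down stand-ins for the real ones, the two-value enum only mimics the kernel's enum hash_algo, and demo_get_hash(), the stored_digest/stored_len fields, and the -ERANGE return for a short buffer are all assumptions made for illustration. Only the shape of vfs_get_hash() itself is taken from the patch.

```c
/* Userspace mock of the optional-method dispatch; NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's enum hash_algo (only two values mocked). */
enum hash_algo { HASH_ALGO_SHA1, HASH_ALGO_SHA256 };

struct file;

struct file_operations {
	/* Optional: NULL when the filesystem cannot vouch for a stored hash. */
	int (*get_hash)(struct file *file, enum hash_algo hash,
			uint8_t *buf, size_t size);
};

struct file {
	const struct file_operations *f_op;
	const uint8_t *stored_digest;	/* hypothetical per-file stored hash */
	size_t stored_len;
};

/* Same shape as vfs_get_hash() in the patch. */
int vfs_get_hash(struct file *file, enum hash_algo hash, uint8_t *buf,
		 size_t size)
{
	if (!file->f_op->get_hash)
		return -EOPNOTSUPP;

	return file->f_op->get_hash(file, hash, buf, size);
}

/* Hypothetical filesystem method: only SHA-256 digests are stored. */
static int demo_get_hash(struct file *file, enum hash_algo hash,
			 uint8_t *buf, size_t size)
{
	assert(buf != NULL);
	if (hash != HASH_ALGO_SHA256)
		return -EOPNOTSUPP;
	if (size < file->stored_len)
		return -ERANGE;	/* error code choice is an assumption */
	memcpy(buf, file->stored_digest, file->stored_len);
	return 0;
}

/* One filesystem that implements the hook, one that does not. */
static const struct file_operations demo_fops = { .get_hash = demo_get_hash };
static const struct file_operations plain_fops = { .get_hash = NULL };
```

A file whose f_op points at plain_fops yields -EOPNOTSUPP from vfs_get_hash(), which is the signal a caller would use to fall back to hashing the contents itself.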

Matthew Garrett Oct. 11, 2018, 6:21 p.m. UTC | #2

On Thu, Oct 11, 2018 at 8:22 AM Mimi Zohar <zohar@linux.ibm.com> wrote:
>
> On Thu, 2018-10-04 at 13:30 -0700, Matthew Garrett wrote:
> > IMA wants to know what the hash of a file is, and currently does so by
> > reading the entire file and generating the hash. Some filesystems may
> > have the ability to store the hash in a secure manner resistant to
> > offline attacks (eg, filesystem-level file signing), and in that case
> > it's a performance win for IMA to be able to use that rather than having
> > to re-hash everything. This patch simply adds VFS-level support for
> > calling down to filesystems.
>
> This patch description starts out saying that IMA needs the file hash
> without explaining why.  Without that explanation, simply extracting
> the file hash included in the file signature might sound plausible,
> but kind of defeats the purpose of IMA.

I'm not sure how it defeats the purpose - IMA wants to know the hash
of a file so it can either log it or compare it against a signature,
and it currently obtains this hash by reading the entire file at
measurement time. If the filesystem later returns different data then
IMA won't notice, which allows a malicious filesystem to bypass the
measurements - there's no guarantee that we won't evict large parts of
the copy of an executable that IMA read, and the filesystem can give
us back a modified page when we page it back in. So IMA fundamentally
relies on the filesystem to be trustworthy, and if we rely on the
filesystem to be trustworthy then we should be able to rely on it to
accurately store and provide the hash of a file.

Matthew Garrett Oct. 11, 2018, 6:24 p.m. UTC | #3

On Thu, Oct 11, 2018 at 11:21 AM Matthew Garrett <mjg59@google.com> wrote:
>
> On Thu, Oct 11, 2018 at 8:22 AM Mimi Zohar <zohar@linux.ibm.com> wrote:
> >
> > This patch description starts out saying that IMA needs the file hash
> > without explaining why.  Without that explanation, simply extracting
> > the file hash included in the file signature might sound plausible,
> > but kind of defeats the purpose of IMA.
>
> I'm not sure how it defeats the purpose - IMA wants to know the hash
> of a file so it can either log it or compare it against a signature,
> and it currently obtains this hash by reading the entire file at
> measurement time. If the filesystem later returns different data then
> IMA won't notice, which allows a malicious filesystem to bypass the
> measurements - there's no guarantee that we won't evict large parts of
> the copy of an executable that IMA read, and the filesystem can give
> us back a modified page when we page it back in. So IMA fundamentally
> relies on the filesystem to be trustworthy, and if we rely on the
> filesystem to be trustworthy then we should be able to rely on it to
> accurately store and provide the hash of a file.

Oh, to clarify on the signature part of things - it would obviously be
inappropriate to, say, just read the hash out of security.ima and hand
that back. But for a hypothetical case where the filesystem itself
verifies the signature, then the filesystem would abort the
transaction if the signature didn't match and it seems reasonable to
avoid doing the validation twice (once up front and then again on
every read).

Mimi Zohar Oct. 11, 2018, 6:37 p.m. UTC | #4

On Thu, 2018-10-11 at 11:24 -0700, Matthew Garrett wrote:
> On Thu, Oct 11, 2018 at 11:21 AM Matthew Garrett <mjg59@google.com> wrote:
> >
> > On Thu, Oct 11, 2018 at 8:22 AM Mimi Zohar <zohar@linux.ibm.com> wrote:
> > >
> > > This patch description starts out saying that IMA needs the file hash
> > > without explaining why.  Without that explanation, simply extracting
> > > the file hash included in the file signature might sound plausible,
> > > but kind of defeats the purpose of IMA.
> >
> > I'm not sure how it defeats the purpose - IMA wants to know the hash
> > of a file so it can either log it or compare it against a signature,
> > and it currently obtains this hash by reading the entire file at
> > measurement time. If the filesystem later returns different data then
> > IMA won't notice, which allows a malicious filesystem to bypass the
> > measurements - there's no guarantee that we won't evict large parts of
> > the copy of an executable that IMA read, and the filesystem can give
> > us back a modified page when we page it back in. So IMA fundamentally
> > relies on the filesystem to be trustworthy, and if we rely on the
> > filesystem to be trustworthy then we should be able to rely on it to
> > accurately store and provide the hash of a file.
> 
> Oh, to clarify on the signature part of things - it would obviously be
> inappropriate to, say, just read the hash out of security.ima and hand
> that back.

Right, reading it either directly or extracted from the file signature
stored in security.ima.

> But for a hypothetical case where the filesystem itself
> verifies the signature, then the filesystem would abort the
> transaction if the signature didn't match and it seems reasonable to
> avoid doing the validation twice (once up front and then again on
> every read)

Right, this is a hypothetical scenario as far as I'm aware, since none
of the filesystems are currently calculating and storing the file
hash.  The default should be for IMA to re-calculate the file hash.

Mimi
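
The position that re-calculating should stay the default, with the filesystem's hash used only where it is trusted, could look roughly like the userspace sketch below. Everything in it is hypothetical: measure_file(), the trust_fs_hash flag, struct mock_file, and stored_hash() are illustrative names that appear nowhere in this series, and the 8-byte FNV-1a toy digest merely stands in for a real cryptographic hash.

```c
/* Userspace sketch: re-hash by default, use the fs hash only on opt-in. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 8	/* toy digest size, not a real algorithm's */

struct mock_file {
	const uint8_t *data;		/* file contents */
	size_t len;
	uint8_t stored[DIGEST_LEN];	/* digest the "filesystem" keeps */
	/* NULL when the filesystem offers no trusted hash */
	int (*get_hash)(struct mock_file *f, uint8_t *buf, size_t size);
	unsigned int full_reads;	/* counts fallback re-hashes */
};

/* Toy FNV-1a digest standing in for a real cryptographic hash. */
static void toy_digest(const uint8_t *data, size_t len,
		       uint8_t out[DIGEST_LEN])
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	for (int i = 0; i < DIGEST_LEN; i++)
		out[i] = (uint8_t)(h >> (8 * i));
}

/* Fallback path: go over the whole contents and hash them. */
static int hash_by_reading(struct mock_file *f, uint8_t *buf, size_t size)
{
	assert(f != NULL && buf != NULL);
	if (size < DIGEST_LEN)
		return -ERANGE;
	f->full_reads++;
	toy_digest(f->data, f->len, buf);
	return 0;
}

/*
 * Hypothetical caller: consult the filesystem only when trust_fs_hash
 * is set, and fall back to re-hashing if it has no hash to offer.
 */
int measure_file(struct mock_file *f, bool trust_fs_hash,
		 uint8_t *buf, size_t size)
{
	if (trust_fs_hash && f->get_hash) {
		int ret = f->get_hash(f, buf, size);

		if (ret != -EOPNOTSUPP)
			return ret;
	}
	return hash_by_reading(f, buf, size);
}

/* Hypothetical filesystem method: hand back the stored digest. */
static int stored_hash(struct mock_file *f, uint8_t *buf, size_t size)
{
	if (size < DIGEST_LEN)
		return -ERANGE;
	memcpy(buf, f->stored, DIGEST_LEN);
	return 0;
}
```

With trust_fs_hash false, measure_file() always takes the full-read path, so a filesystem that implements the hook changes nothing unless the caller opts in.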

Matthew Garrett Oct. 11, 2018, 6:43 p.m. UTC | #5

On Thu, Oct 11, 2018 at 11:37 AM Mimi Zohar <zohar@linux.ibm.com> wrote:
> On Thu, 2018-10-11 at 11:24 -0700, Matthew Garrett wrote:
> > But for a hypothetical case where the filesystem itself
> > verifies the signature, then the filesystem would abort the
> > transaction if the signature didn't match and it seems reasonable to
> > avoid doing the validation twice (once up front and then again on
> > every read)
>
> Right, this is a hypothetical scenario as far as I'm aware, since none
> of the filesystems are currently calculating and storing the file
> hash.  The default should be for IMA to re-calculate the file hash.

There are FUSE filesystems that do.
