[v3,0/3] Add file-system authentication to BTRFS

Message ID 20200514092415.5389-1-jth@kernel.org (mailing list archive)

Message

Johannes Thumshirn May 14, 2020, 9:24 a.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

This series adds file-system authentication to BTRFS. 

Unlike other verified file-system techniques like fs-verity the
authenticated version of BTRFS does not need extra meta-data on disk.

This works because in BTRFS every on-disk block has a checksum, for meta-data
the checksum is in the header of each meta-data item. For data blocks, a
separate checksum tree exists, which holds the checksums for each block.

Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
does need an authentication key. When no key or an incorrect authentication key
is supplied, no valid checksum can be generated, and a read, fsck or scrub
operation would detect invalid or tampered blocks once the file-system is
mounted again with the correct key.
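
A minimal user-space sketch of the idea, not the kernel implementation: the
checksum of a block is its HMAC(SHA-256) under the authentication key, so
nobody without the key can produce a matching checksum for a modified block.
The 4096-byte block size and the helper names block_csum() and verify_block()
are assumptions made up for this illustration.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_csum(key: bytes, block: bytes) -> bytes:
    """Return the 32-byte HMAC(SHA-256) of one block (hypothetical helper)."""
    return hmac.new(key, block, hashlib.sha256).digest()


def verify_block(key: bytes, block: bytes, stored: bytes) -> bool:
    """Recompute the HMAC and compare it with the stored checksum."""
    return hmac.compare_digest(block_csum(key, block), stored)


key = b"0123456"            # same example key as in the usage section
block = bytes(BLOCK_SIZE)   # an all-zero block
stored = block_csum(key, block)

# Flip the first byte to simulate tampering with the on-disk block.
tampered = b"\x01" + block[1:]

print(verify_block(key, block, stored))        # correct key, intact block
print(verify_block(b"wrong", block, stored))   # incorrect key
print(verify_block(key, tampered, stored))     # tampered block
```

Only the first check is expected to succeed; a plain SHA-256 recomputed by an
attacker over the tampered block would not match the keyed checksum either.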

Getting the key inside the kernel is out of scope of this implementation, the
file-system driver assumes the key is already in the kernel's keyring at mount
time.

There was interest in also using keyed Blake2b from the community, but this
support is not yet included.

I have CCed Eric Biggers and Richard Weinberger in the submission, as they
previously have worked on filesystem authentication and I hope we can get
input from them as well.

Example usage:
Create a file-system with authentication key 0123456
mkfs.btrfs --csum "hmac(sha256)" --auth-key 0123456 /dev/disk

Add the key to the kernel's keyring as keyid 'btrfs:foo'
keyctl add logon btrfs:foo 0123456 @u

Mount the fs using the 'btrfs:foo' key
mount -t btrfs -o auth_key=btrfs:foo,auth_hash_name="hmac(sha256)" /dev/disk /mnt/point

Note, this is a re-base of the work I did when I was still at SUSE, hence the
S-o-b uses my SUSE address, while the Author field uses my WDC address (to
avoid bouncing mails).

Changes since v2:
- Select CONFIG_CRYPTO_HMAC and CONFIG_KEYS (kbuild robot)
- Fix double free in error path
- Fix memory leak in error path
- Disallow nodatasum and nodatacow when authentication is in use (Eric)
- Pass in authentication algorithm as mount option (Eric)
- Don't use the word "replay" in the documentation, as it is wrong and
  harmful in this context (Eric)
- Force key name to begin with 'btrfs:' (Eric)
- Use '4' as on-disk checksum type for HMAC(SHA256) to not have holes in the
  checksum types array.

Changes since v1:
- None, only rebased the series

Johannes Thumshirn (3):
  btrfs: rename btrfs_parse_device_options back to
    btrfs_parse_early_options
  btrfs: add authentication support
  btrfs: document btrfs authentication

 .../filesystems/btrfs-authentication.rst      | 168 ++++++++++++++++++
 fs/btrfs/Kconfig                              |   2 +
 fs/btrfs/ctree.c                              |  22 ++-
 fs/btrfs/ctree.h                              |   5 +-
 fs/btrfs/disk-io.c                            |  71 +++++++-
 fs/btrfs/ioctl.c                              |   7 +-
 fs/btrfs/super.c                              |  65 ++++++-
 include/uapi/linux/btrfs_tree.h               |   1 +
 8 files changed, 326 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/filesystems/btrfs-authentication.rst

Comments

David Sterba May 25, 2020, 1:10 p.m. UTC | #1
On Thu, May 14, 2020 at 11:24:12AM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> This series adds file-system authentication to BTRFS. 
> 
> Unlike other verified file-system techniques like fs-verity the
> authenticated version of BTRFS does not need extra meta-data on disk.
> 
> This works because in BTRFS every on-disk block has a checksum, for meta-data
> the checksum is in the header of each meta-data item. For data blocks, a
> separate checksum tree exists, which holds the checksums for each block.
> 
> Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
> these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
> does need an authentication key. When no, or an incorrect authentication key
> is supplied no valid checksum can be generated and a read, fsck or scrub
> operation would detect invalid or tampered blocks once the file-system is
> mounted again with the correct key. 

As mentioned in the discussion under the LWN article, https://lwn.net/Articles/818842/
ZFS implements a split hash where one half is a (partial) authenticated hash
and the other half is a checksum. This allows at least some sort
of verification when the auth key is not available. This applies to the
fixed size checksum area of metadata blocks; for data we can afford to
store both hashes in full.

I like this idea, however it brings interesting design decisions, "what
if" and corner cases:

- what hashes to use for the plain checksum, and thus what's the split
- what if one hash matches and the other not
- increased checksum calculation time due to doubled block read
- whether to store the same partial hash+checksum for data too

As the authenticated hash is the main usecase, I'd reserve most of the
32 byte buffer to it and use a weak hash for checksum: 24 bytes for HMAC
and 8 bytes for checksum. As an example: sha256+xxhash or
blake2b+xxhash.

I'd outright skip crc32c for the checksum so we have only small number
of authenticated checksums and avoid too many options, eg.
hmac-sha256-crc32c etc. The result will be still 2 authenticated hashes
with the added checksum hardcoded to xxhash.
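
A rough sketch of the proposed 24+8 split, under stated assumptions: the
authenticated part is taken as the first 24 bytes of the full HMAC(SHA-256),
and because xxhash is not in Python's standard library, an unkeyed 8-byte
BLAKE2b digest stands in for xxhash64 here. The function names are invented
for the illustration and do not come from the series.

```python
import hashlib
import hmac

CSUM_AREA = 32  # fixed-size checksum area discussed above
AUTH_LEN = 24   # bytes kept from the HMAC
PLAIN_LEN = 8   # bytes used by the unkeyed checksum


def plain_csum(block: bytes) -> bytes:
    # Stand-in for xxhash64 (assumption): unkeyed BLAKE2b truncated to 8 bytes.
    return hashlib.blake2b(block, digest_size=PLAIN_LEN).digest()


def split_csum(key: bytes, block: bytes) -> bytes:
    """Build the 32-byte area: 24 bytes of HMAC followed by 8 bytes of checksum."""
    full = hmac.new(key, block, hashlib.sha256).digest()
    return full[:AUTH_LEN] + plain_csum(block)


def verify_plain(block: bytes, stored: bytes) -> bool:
    """Integrity check that works without the key."""
    return hmac.compare_digest(stored[AUTH_LEN:], plain_csum(block))


def verify_auth(key: bytes, block: bytes, stored: bytes) -> bool:
    """Authenticity check; needs the key."""
    full = hmac.new(key, block, hashlib.sha256).digest()
    return hmac.compare_digest(stored[:AUTH_LEN], full[:AUTH_LEN])
```

This also illustrates the "one hash matches and the other not" case: anyone
can recompute the unkeyed part after modifying a block, so verify_plain()
passing says nothing about authenticity, while verify_auth() would still fail.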
Johannes Thumshirn May 26, 2020, 7:50 a.m. UTC | #2
On 25/05/2020 15:11, David Sterba wrote:
> On Thu, May 14, 2020 at 11:24:12AM +0200, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> This series adds file-system authentication to BTRFS. 
>>
>> Unlike other verified file-system techniques like fs-verity the
>> authenticated version of BTRFS does not need extra meta-data on disk.
>>
>> This works because in BTRFS every on-disk block has a checksum, for meta-data
>> the checksum is in the header of each meta-data item. For data blocks, a
>> separate checksum tree exists, which holds the checksums for each block.
>>
>> Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
>> these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
>> does need an authentication key. When no, or an incorrect authentication key
>> is supplied no valid checksum can be generated and a read, fsck or scrub
>> operation would detect invalid or tampered blocks once the file-system is
>> mounted again with the correct key. 
> 
> As mentioned in the discussion under LWN article, https://lwn.net/Articles/818842/
> ZFS implements split hash where one half is (partial) authenticated hash
> and the other half is a checksum. This allows to have at least some sort
> of verification when the auth key is not available. This applies to the
> fixed size checksum area of metadata blocks, for data we can afford to
> store both hashes in full.
> 
> I like this idea, however it brings interesting design decisions, "what
> if" and corner cases:
> 
> - what hashes to use for the plain checksum, and thus what's the split
> - what if one hash matches and the other not
> - increased checksum calculation time due to doubled block read
> - whether to store the same partial hash+checksum for data too
> 
> As the authenticated hash is the main usecase, I'd reserve most of the
> 32 byte buffer to it and use a weak hash for checksum: 24 bytes for HMAC
> and 8 bytes for checksum. As an example: sha256+xxhash or
> blake2b+xxhash.
> 
> I'd outright skip crc32c for the checksum so we have only small number
> of authenticated checksums and avoid too many options, eg.
> hmac-sha256-crc32c etc. The result will be still 2 authenticated hashes
> with the added checksum hardcoded to xxhash.
> 

Hmm I'm really not a fan of this. We would have to use something like 
sha2-224 to get the room for the 2nd checksum. So we're using a weaker
hash just so we can add a second checksum. On the other hand you've asked 
me to add the known pieces of information into the hashes as a salt to
"make attacks harder at a small cost".
David Sterba May 26, 2020, 11:53 a.m. UTC | #3
On Tue, May 26, 2020 at 07:50:53AM +0000, Johannes Thumshirn wrote:
> On 25/05/2020 15:11, David Sterba wrote:
> > On Thu, May 14, 2020 at 11:24:12AM +0200, Johannes Thumshirn wrote:
> > As mentioned in the discussion under LWN article, https://lwn.net/Articles/818842/
> > ZFS implements split hash where one half is (partial) authenticated hash
> > and the other half is a checksum. This allows to have at least some sort
> > of verification when the auth key is not available. This applies to the
> > fixed size checksum area of metadata blocks, for data we can afford to
> > store both hashes in full.
> > 
> > I like this idea, however it brings interesting design decisions, "what
> > if" and corner cases:
> > 
> > - what hashes to use for the plain checksum, and thus what's the split
> > - what if one hash matches and the other not
> > - increased checksum calculation time due to doubled block read
> > - whether to store the same partial hash+checksum for data too
> > 
> > As the authenticated hash is the main usecase, I'd reserve most of the
> > 32 byte buffer to it and use a weak hash for checksum: 24 bytes for HMAC
> > and 8 bytes for checksum. As an example: sha256+xxhash or
> > blake2b+xxhash.
> > 
> > I'd outright skip crc32c for the checksum so we have only small number
> > of authenticated checksums and avoid too many options, eg.
> > hmac-sha256-crc32c etc. The result will be still 2 authenticated hashes
> > with the added checksum hardcoded to xxhash.
> 
> Hmm I'm really not a fan of this. We would have to use something like 
> sha2-224 to get the room for the 2nd checksum. So we're using a weaker
> hash just so we can add a second checksum.

The idea is to calculate full hash (32 bytes) and store only the part
(24 bytes). Yes this means there's some information loss and weakening,
but enables a usecase.

> On the other hand you've asked 
> me to add the known pieces of information into the hashes as a salt to
> "make attacks harder at a small cost".

Yes and this makes it harder to attack the hash, it should be there
regardless of the additional checksums.
Johannes Thumshirn May 26, 2020, 12:44 p.m. UTC | #4
On 26/05/2020 13:54, David Sterba wrote:
> On Tue, May 26, 2020 at 07:50:53AM +0000, Johannes Thumshirn wrote:
>> On 25/05/2020 15:11, David Sterba wrote:
>>> On Thu, May 14, 2020 at 11:24:12AM +0200, Johannes Thumshirn wrote:
>>> As mentioned in the discussion under LWN article, https://lwn.net/Articles/818842/
>>> ZFS implements split hash where one half is (partial) authenticated hash
>>> and the other half is a checksum. This allows to have at least some sort
>>> of verification when the auth key is not available. This applies to the
>>> fixed size checksum area of metadata blocks, for data we can afford to
>>> store both hashes in full.
>>>
>>> I like this idea, however it brings interesting design decisions, "what
>>> if" and corner cases:
>>>
>>> - what hashes to use for the plain checksum, and thus what's the split
>>> - what if one hash matches and the other not
>>> - increased checksum calculation time due to doubled block read
>>> - whether to store the same partial hash+checksum for data too
>>>
>>> As the authenticated hash is the main usecase, I'd reserve most of the
>>> 32 byte buffer to it and use a weak hash for checksum: 24 bytes for HMAC
>>> and 8 bytes for checksum. As an example: sha256+xxhash or
>>> blake2b+xxhash.
>>>
>>> I'd outright skip crc32c for the checksum so we have only small number
>>> of authenticated checksums and avoid too many options, eg.
>>> hmac-sha256-crc32c etc. The result will be still 2 authenticated hashes
>>> with the added checksum hardcoded to xxhash.
>>
>> Hmm I'm really not a fan of this. We would have to use something like 
>> sha2-224 to get the room for the 2nd checksum. So we're using a weaker
>> hash just so we can add a second checksum.
> 
> The idea is to calculate full hash (32 bytes) and store only the part
> (24 bytes). Yes this means there's some information loss and weakening,
> but enables a usecase.

I'm not enough of a security expert to be able to judge this. Eric, can I hear
your opinion on this?

Thanks,
	Johannes
Qu Wenruo May 27, 2020, 2:08 a.m. UTC | #5
On 2020/5/14 下午5:24, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> This series adds file-system authentication to BTRFS. 
> 
> Unlike other verified file-system techniques like fs-verity the
> authenticated version of BTRFS does not need extra meta-data on disk.
> 
> This works because in BTRFS every on-disk block has a checksum, for meta-data
> the checksum is in the header of each meta-data item. For data blocks, a
> separate checksum tree exists, which holds the checksums for each block.
> 
> Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
> these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
> does need an authentication key. When no, or an incorrect authentication key
> is supplied no valid checksum can be generated and a read, fsck or scrub
> operation would detect invalid or tampered blocks once the file-system is
> mounted again with the correct key. 
> 
> Getting the key inside the kernel is out of scope of this implementation, the
> file-system driver assumes the key is already in the kernel's keyring at mount
> time.
> 
> There was interest in also using keyed Blake2b from the community, but this
> support is not yet included.
> 
> I have CCed Eric Biggers and Richard Weinberger in the submission, as they
> previously have worked on filesystem authentication and I hope we can get
> input from them as well.
> 
> Example usage:
> Create a file-system with authentication key 0123456
> mkfs.btrfs --csum "hmac(sha256)" --auth-key 0123456 /dev/disk
> 
> Add the key to the kernel's keyring as keyid 'btrfs:foo'
> keyctl add logon btrfs:foo 0123456 @u
> 
> Mount the fs using the 'btrfs:foo' key
> mount -t btrfs -o auth_key=btrfs:foo,auth_hash_name="hmac(sha256)" /dev/disk /mnt/point
> 
> Note, this is a re-base of the work I did when I was still at SUSE, hence the
> S-o-b being my SUSE address, while the Author being with my WDC address (to
> not generate bouncing mails).
> 
> Changes since v2:
> - Select CONFIG_CRYPTO_HMAC and CONFIG_KEYS (kbuild robot)
> - Fix double free in error path
> - Fix memory leak in error path
> - Disallow nodatasum and nodatacow when authentication is in use (Eric)

Since we're disabling NODATACOW usages, can we also disable the
following features?
- v1 space cache
  V1 space cache uses a NODATACOW file to store the space cache. Although it
  has an inline csum, that csum is fixed to crc32c, so an attacker can easily
  use this hole to corrupt the space cache and mount a DoS attack.

- fallocate
  I'm not 100% sure about this, but since nodatacow is already a second
  class citizen in btrfs, maybe not supporting fallocate is not a
  strange move.

Thanks,
Qu

> - Pass in authentication algorithm as mount option (Eric)
> - Don't use the word "replay" in the documentation, as it is wrong and
>   harmful in this context (Eric)
> - Force key name to begin with 'btrfs:' (Eric)
> - Use '4' as on-disk checksum type for HMAC(SHA256) to not have holes in the
>   checksum types array.
> 
> Changes since v1:
> - None, only rebased the series
> 
> Johannes Thumshirn (3):
>   btrfs: rename btrfs_parse_device_options back to
>     btrfs_parse_early_options
>   btrfs: add authentication support
>   btrfs: document btrfs authentication
> 
>  .../filesystems/btrfs-authentication.rst      | 168 ++++++++++++++++++
>  fs/btrfs/Kconfig                              |   2 +
>  fs/btrfs/ctree.c                              |  22 ++-
>  fs/btrfs/ctree.h                              |   5 +-
>  fs/btrfs/disk-io.c                            |  71 +++++++-
>  fs/btrfs/ioctl.c                              |   7 +-
>  fs/btrfs/super.c                              |  65 ++++++-
>  include/uapi/linux/btrfs_tree.h               |   1 +
>  8 files changed, 326 insertions(+), 15 deletions(-)
>  create mode 100644 Documentation/filesystems/btrfs-authentication.rst
>
David Sterba May 27, 2020, 11:27 a.m. UTC | #6
On Wed, May 27, 2020 at 10:08:06AM +0800, Qu Wenruo wrote:
> > Changes since v2:
> > - Select CONFIG_CRYPTO_HMAC and CONFIG_KEYS (kbuild robot)
> > - Fix double free in error path
> > - Fix memory leak in error path
> > - Disallow nodatasum and nodatacow when authentication is in use (Eric)
> 
> Since we're disabling NODATACOW usages, can we also disable the
> following features?
> - v1 space cache
> >   V1 space cache uses NODATACOW file to store space cache, although it
>   has inline csum, but it's fixed to crc32c. So attacker can easily
>   utilize this hole to mess space cache, and do some DoS attack.

That's a good point.

The v1 space cache will be phased out but it won't be in a timeframe
we'll get in the authentication. At this point we don't even have a way
to select v2 at mkfs time (it's work in progress though), so it would be
required to switch to v2 on the first mount.

> - fallocate
>   I'm not 100% sure about this, but since nodatacow is already a second
>   class citizen in btrfs, maybe not supporting fallocate is not a
>   strange move.

Fallocate is a standard file operation, not supporting would be quite
strange. What's the problem with fallocate and authentication?
Qu Wenruo May 27, 2020, 11:58 a.m. UTC | #7
On 2020/5/27 下午7:27, David Sterba wrote:
> On Wed, May 27, 2020 at 10:08:06AM +0800, Qu Wenruo wrote:
>>> Changes since v2:
>>> - Select CONFIG_CRYPTO_HMAC and CONFIG_KEYS (kbuild robot)
>>> - Fix double free in error path
>>> - Fix memory leak in error path
>>> - Disallow nodatasum and nodatacow when authentication is in use (Eric)
>>
>> Since we're disabling NODATACOW usages, can we also disable the
>> following features?
>> - v1 space cache
>>   V1 space cache uses NODATACOW file to store space cache, although it
>>   has inline csum, but it's fixed to crc32c. So attacker can easily
>>   utilize this hole to mess space cache, and do some DoS attack.
> 
> That's a good point.
> 
> The v1 space cache will be phased out but it won't be in a timeframe
> we'll get in the authentication. At this point we don't even have a way
> to select v2 at mkfs time (it's work in progress though), so it would be
> required to switch to v2 on the first mount.
> 
>> - fallocate
>>   I'm not 100% sure about this, but since nodatacow is already a second
>>   class citizen in btrfs, maybe not supporting fallocate is not a
>>   strange move.
> 
> Fallocate is a standard file operation, not supporting would be quite
> strange. What's the problem with fallocate and authentication?
> 
As said, I'm not that sure about preallocate, but that's the remaining
user of nodatacow.
Although it's a pretty common interface, in btrfs it doesn't really
make much sense.
In cases like fallocate followed by snapshot, there is really no
benefit from writing into the fallocated range.

Not to mention the extra cross-ref check involved when writing into
possible preallocated range.

Thanks,
Qu
David Sterba May 27, 2020, 1:11 p.m. UTC | #8
On Wed, May 27, 2020 at 10:08:06AM +0800, Qu Wenruo wrote:
> > Changes since v2:
> > - Select CONFIG_CRYPTO_HMAC and CONFIG_KEYS (kbuild robot)
> > - Fix double free in error path
> > - Fix memory leak in error path
> > - Disallow nodatasum and nodatacow when authentication is in use (Eric)
> 
> Since we're disabling NODATACOW usages, can we also disable the
> following features?
> - v1 space cache
> >   V1 space cache uses NODATACOW file to store space cache, although it
>   has inline csum, but it's fixed to crc32c. So attacker can easily
>   utilize this hole to mess space cache, and do some DoS attack.
> 
> - fallocate
>   I'm not 100% sure about this, but since nodatacow is already a second
>   class citizen in btrfs, maybe not supporting fallocate is not a
>   strange move.

- swapfile
  NODATACOW is required for swapfile, so authentication and swapfile are
  mutually exclusive.
David Sterba June 1, 2020, 2:59 p.m. UTC | #9
On Tue, May 26, 2020 at 12:44:28PM +0000, Johannes Thumshirn wrote:
> On 26/05/2020 13:54, David Sterba wrote:
> > On Tue, May 26, 2020 at 07:50:53AM +0000, Johannes Thumshirn wrote:
> >> On 25/05/2020 15:11, David Sterba wrote:
> >>> I'd outright skip crc32c for the checksum so we have only small number
> >>> of authenticated checksums and avoid too many options, eg.
> >>> hmac-sha256-crc32c etc. The result will be still 2 authenticated hashes
> >>> with the added checksum hardcoded to xxhash.
> >>
> >> Hmm I'm really not a fan of this. We would have to use something like 
> >> sha2-224 to get the room for the 2nd checksum. So we're using a weaker
> >> hash just so we can add a second checksum.
> > 
> > The idea is to calculate full hash (32 bytes) and store only the part
> > (24 bytes). Yes this means there's some information loss and weakening,
> > but enables a usecase.
> 
> I'm not enough a security expert to be able to judge this. Eric can I hear 
> your opinion on this?

Given that this has implications on strength and the usecases, I'd
rather let the filesystem provide the options and let the user choose
and not make the decision on their behalf.

This would increase the number of authenticated hashes to 4 in the end:

1. authenticated with 32byte/256bit hash (sha256, blake2b)
   + full strength
   - no way to verify checksums without the key

2. authenticated with 24bytes/192bit hash (sha256, blake2b)
   where the last 8 bytes are xxhash64
   ~ weakened strength but should be still sufficient
   + possibility to verify checksums without the key
   - slight perf cost for the 2nd hash

As option 2 needs some evaluation and reasoning whether it does not
compromise the security, I don't insist on having it implemented in the
first phase. I have a prototype code for that so it might live in
linux-next for some time before we'd merge it.

Regarding backward compatibility, the checksums are easy compared to
other features. The supported status can be determined directly from
the superblock, so adding new types of checksum does not require compat
bits and the code for that.
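
To illustrate why no compat bits are needed, here is a small sketch of a
lookup keyed by the checksum type number stored in the superblock. The value 4
for HMAC(SHA256) is the one proposed in this series; the values 0 to 3 and the
digest sizes reflect my understanding of the existing checksum types and
should be treated as assumptions. The table and function names are
hypothetical.

```python
from typing import NamedTuple


class CsumType(NamedTuple):
    name: str
    size: int  # checksum size in bytes


# Assumed numbering for the existing types; 4 is the value from this series.
CSUM_TYPES = {
    0: CsumType("crc32c", 4),
    1: CsumType("xxhash64", 8),
    2: CsumType("sha256", 32),
    3: CsumType("blake2b", 32),
    4: CsumType("hmac(sha256)", 32),
}


def lookup_csum_type(sb_csum_type: int) -> CsumType:
    """Map the superblock's checksum type to its description, or refuse."""
    try:
        return CSUM_TYPES[sb_csum_type]
    except KeyError:
        # An implementation that does not know the type simply rejects it.
        raise ValueError(f"unsupported checksum type {sb_csum_type}") from None
```

An implementation that predates a new type finds no entry for it and refuses
the file-system, which is the behaviour a compat bit would otherwise provide.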