[v3] btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl

Message ID	20200626150107.19666-1-johannes.thumshirn@wdc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=C+P9=AH=vger.kernel.org=linux-btrfs-owner@kernel.org> IronPort-SDR: S82CB/NPH2DDnOvxTvn83x4n9bxZqCjAo9rXnf2j63n7tMIWrGIjsX3WOOe9F3ElY8alnUHyAv ZdZ62LRFq5NTFmb8t3jZlBSSkKAmz7iN1woPh/1mBd7fLh+Kng+jtXDe3l0lB/b639I6WqW4jd eMBRvKpSohHTFMcjB1H7/0k6s7CM0/o71Pj7pEVaID8aC0mSHIinvpt5Fx2sWSqUJRej4DTyjs BbLmaybH6Pit2E610bp2mQrnPMsQmoebTJh1j9W3ng/URmnJg47OkZOH2c1tHF+4iPHrP++GNS RDw= IronPort-SDR: rAi5oGCXKRlh5RoR/JcnkIt9q4Al335dGpIYmP8aVgNo5jfMNShXhD1oEOV/jnEomTJGriAQts ghj2aMmOO8W13Xd0JfGnvp0a6/J5CLcgg= IronPort-SDR: tKUo55bvFSD+gGu6eu5/gmKH/7yYyn7sa6iqC18EbWVYC07paHgVocfTTd8Xf9lgA62+FXxWpP fStlgLXeEEuw== WDCIronportException: Internal From: Johannes Thumshirn <johannes.thumshirn@wdc.com> To: David Sterba <dsterba@suse.cz> Cc: linux-btrfs@vger.kernel.org, Johannes Thumshirn <johannes.thumshirn@wdc.com>, stable@vger.kernel.org Subject: [PATCH v3] btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl Date: Sat, 27 Jun 2020 00:01:07 +0900 Message-Id: <20200626150107.19666-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk
Series	[v3] btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl \| expand [v3] btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl

Johannes Thumshirn June 26, 2020, 3:01 p.m. UTC

With the recent addition of filesystem checksum types other than CRC32c,
it is not anymore hard-coded which checksum type a btrfs filesystem uses.

Up to now there is no good way to read the filesystem checksum, apart from
reading the filesystem UUID and then query sysfs for the checksum type.

Add a new csum_type field to the BTRFS_IOC_FS_INFO ioctl command which
usually is used to query filesystem features. Also add a flags member
indicating that the kernel responded with a set csum_type field.

To simplify further additions to the ioctl, also switch the padding to a
u8 array. Pahole was used to verify the result of this switch:

pahole -C btrfs_ioctl_fs_info_args fs/btrfs/btrfs.ko
struct btrfs_ioctl_fs_info_args {
        __u64                      max_id;               /*     0     8 */
        __u64                      num_devices;          /*     8     8 */
        __u8                       fsid[16];             /*    16    16 */
        __u32                      nodesize;             /*    32     4 */
        __u32                      sectorsize;           /*    36     4 */
        __u32                      clone_alignment;      /*    40     4 */
        __u32                      flags;                /*    44     4 */
        __u16                      csum_type;            /*    48     2 */
        __u16                      csum_size;            /*    50     2 */
        __u8                       reserved[972];        /*    52   972 */

        /* size: 1024, cachelines: 16, members: 10 */
};

Fixes: 3951e7f050ac ("btrfs: add xxhash64 to checksumming algorithms")
Fixes: 3831bf0094ab ("btrfs: add sha256 to checksumming algorithm")
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
Changes to v2:
* add additional csum_size (David)
* rename flag value to BTRFS_FS_INFO_FLAG_CSUM_TYPE_SIZE to reflect
  additional size

Changes to v1:
* add 'out' comment to be consistent (Hans)
* remove le16_to_cpu() (kbuild robot)
* switch padding to be all u8 (David)
---
 fs/btrfs/ioctl.c           |  3 +++
 include/uapi/linux/btrfs.h | 14 ++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

David Sterba June 26, 2020, 8:06 p.m. UTC | #1

On Sat, Jun 27, 2020 at 12:01:07AM +0900, Johannes Thumshirn wrote:
> With the recent addition of filesystem checksum types other than CRC32c,
> it is not anymore hard-coded which checksum type a btrfs filesystem uses.
> 
> Up to now there is no good way to read the filesystem checksum, apart from
> reading the filesystem UUID and then query sysfs for the checksum type.
> 
> Add a new csum_type field to the BTRFS_IOC_FS_INFO ioctl command which
> usually is used to query filesystem features. Also add a flags member
> indicating that the kernel responded with a set csum_type field.
> 
> To simplify further additions to the ioctl, also switch the padding to a
> u8 array. Pahole was used to verify the result of this switch:
> 
> pahole -C btrfs_ioctl_fs_info_args fs/btrfs/btrfs.ko
> struct btrfs_ioctl_fs_info_args {
>         __u64                      max_id;               /*     0     8 */
>         __u64                      num_devices;          /*     8     8 */
>         __u8                       fsid[16];             /*    16    16 */
g         __u32                      nodesize;             /*    32     4 */
>         __u32                      sectorsize;           /*    36     4 */
>         __u32                      clone_alignment;      /*    40     4 */
>         __u32                      flags;                /*    44     4 */
>         __u16                      csum_type;            /*    48     2 */
>         __u16                      csum_size;            /*    50     2 */
>         __u8                       reserved[972];        /*    52   972 */
> 
>         /* size: 1024, cachelines: 16, members: 10 */
> };
> 
> Fixes: 3951e7f050ac ("btrfs: add xxhash64 to checksumming algorithms")
> Fixes: 3831bf0094ab ("btrfs: add sha256 to checksumming algorithm")
> Cc: stable@vger.kernel.org

CC: stable@vger.kernel.org # 5.5+

it'll not compile otherwise.

> +++ b/fs/btrfs/ioctl.c
> @@ -3217,6 +3217,9 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
>  	fi_args->nodesize = fs_info->nodesize;
>  	fi_args->sectorsize = fs_info->sectorsize;
>  	fi_args->clone_alignment = fs_info->sectorsize;
> +	fi_args->csum_type = btrfs_super_csum_type(fs_info->super_copy);
> +	fi_args->csum_size = btrfs_super_csum_size(fs_info->super_copy);
> +	fi_args->flags |= BTRFS_FS_INFO_FLAG_CSUM_TYPE_SIZE;
>  
>  	if (copy_to_user(arg, fi_args, sizeof(*fi_args)))
>  		ret = -EFAULT;
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index e6b6cb0f8bc6..2de3ef3c5c71 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -250,10 +250,20 @@ struct btrfs_ioctl_fs_info_args {
>  	__u32 nodesize;				/* out */
>  	__u32 sectorsize;			/* out */
>  	__u32 clone_alignment;			/* out */
> -	__u32 reserved32;
> -	__u64 reserved[122];			/* pad to 1k */
> +	__u32 flags;				/* out */

After the discussion under v2 with Hans, I think he has a point that
future extension could be problematic as it was with the LOGICAL_INO.
It's similar, once we'd want to do the input flags, there's no way to
make the behaviour safe.

If all ioctl users would zero the buffer it's all fine, but I don't know
how to make that more than a convention and given that this is not well
documented we can't blame users/programs when this is not honored.

So, my suggestion is to make the flags also input, where the valid value
is 0, meaning 'return everything you have'. In this case it's a no-op,
but allows future extensions and fine grained data retrieval.

There's effectively no change in the implementation, other than
documenting the 'in' semantics.

Although this is basically the same situation as in the LOGICAL_INO v1
and v2, the number of users of FS_INFO ioctl is presumably not high and
the buffer has been write-only so far, there's no existing logic that
would had to be tweaked.

Once the flags are there, all new implementations should take the
semantics into account. Therefore I think this is a safe plan, but feel
free to poke more holes to that.

Hans van Kranenburg June 27, 2020, 9:01 p.m. UTC | #2

On 6/26/20 10:06 PM, David Sterba wrote:
> On Sat, Jun 27, 2020 at 12:01:07AM +0900, Johannes Thumshirn wrote:
>> With the recent addition of filesystem checksum types other than CRC32c,
>> it is not anymore hard-coded which checksum type a btrfs filesystem uses.
>>
>> Up to now there is no good way to read the filesystem checksum, apart from
>> reading the filesystem UUID and then query sysfs for the checksum type.
>>
>> Add a new csum_type field to the BTRFS_IOC_FS_INFO ioctl command which
>> usually is used to query filesystem features. Also add a flags member
>> indicating that the kernel responded with a set csum_type field.
>>
>> To simplify further additions to the ioctl, also switch the padding to a
>> u8 array. Pahole was used to verify the result of this switch:
>>
>> pahole -C btrfs_ioctl_fs_info_args fs/btrfs/btrfs.ko
>> struct btrfs_ioctl_fs_info_args {
>>         __u64                      max_id;               /*     0     8 */
>>         __u64                      num_devices;          /*     8     8 */
>>         __u8                       fsid[16];             /*    16    16 */
> g         __u32                      nodesize;             /*    32     4 */
>>         __u32                      sectorsize;           /*    36     4 */
>>         __u32                      clone_alignment;      /*    40     4 */
>>         __u32                      flags;                /*    44     4 */
>>         __u16                      csum_type;            /*    48     2 */
>>         __u16                      csum_size;            /*    50     2 */
>>         __u8                       reserved[972];        /*    52   972 */
>>
>>         /* size: 1024, cachelines: 16, members: 10 */
>> };
>>
>> Fixes: 3951e7f050ac ("btrfs: add xxhash64 to checksumming algorithms")
>> Fixes: 3831bf0094ab ("btrfs: add sha256 to checksumming algorithm")
>> Cc: stable@vger.kernel.org
> 
> CC: stable@vger.kernel.org # 5.5+
> 
> it'll not compile otherwise.
> 
>> +++ b/fs/btrfs/ioctl.c
>> @@ -3217,6 +3217,9 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
>>  	fi_args->nodesize = fs_info->nodesize;
>>  	fi_args->sectorsize = fs_info->sectorsize;
>>  	fi_args->clone_alignment = fs_info->sectorsize;
>> +	fi_args->csum_type = btrfs_super_csum_type(fs_info->super_copy);
>> +	fi_args->csum_size = btrfs_super_csum_size(fs_info->super_copy);
>> +	fi_args->flags |= BTRFS_FS_INFO_FLAG_CSUM_TYPE_SIZE;
>>  
>>  	if (copy_to_user(arg, fi_args, sizeof(*fi_args)))
>>  		ret = -EFAULT;
>> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
>> index e6b6cb0f8bc6..2de3ef3c5c71 100644
>> --- a/include/uapi/linux/btrfs.h
>> +++ b/include/uapi/linux/btrfs.h
>> @@ -250,10 +250,20 @@ struct btrfs_ioctl_fs_info_args {
>>  	__u32 nodesize;				/* out */
>>  	__u32 sectorsize;			/* out */
>>  	__u32 clone_alignment;			/* out */
>> -	__u32 reserved32;
>> -	__u64 reserved[122];			/* pad to 1k */
>> +	__u32 flags;				/* out */
> 
> After the discussion under v2 with Hans, I think he has a point that
> future extension could be problematic as it was with the LOGICAL_INO.
> It's similar, once we'd want to do the input flags, there's no way to
> make the behaviour safe.
> 
> If all ioctl users would zero the buffer it's all fine, but I don't know
> how to make that more than a convention

At least for all new code, check and demand that the buffer is zeroed
(except for input fields, of course). This is typically something from
the 'been-there-done-that-oops' category. It's not realized that it's
necessary until running into these issues when it's too late. :)

> and given that this is not well
> documented we can't blame users/programs when this is not honored.

The fun of maintaining stable APIs :)

> So, my suggestion is to make the flags also input,

If we think that with whatever being added in the future the output will
still only contain a bunch of values that are very cheap to collect when
filling the response buffer, then why would you still want to have this?

> where the valid value
> is 0, meaning 'return everything you have'. In this case it's a no-op,
> but allows future extensions and fine grained data retrieval.

This will work right now when 0 means return everything and when
uninitialized data in the field (with 1s somewhere) returns or does not
return extra stuff. The old calling code will not look at those parts of
the response anyway.

But this does not allow you to signal that requested data that is not
implemented is not available with ENOENT (e.g. with the unavailable
flags set in the response) or anything. Ever. Or am I missing something?

> There's effectively no change in the implementation, other than
> documenting the 'in' semantics.
> 
> Although this is basically the same situation as in the LOGICAL_INO v1
> and v2, the number of users of FS_INFO ioctl is presumably not high and
> the buffer has been write-only so far, there's no existing logic that
> would had to be tweaked.
> 
> Once the flags are there, all new implementations should take the
> semantics into account. Therefore I think this is a safe plan, but feel
> free to poke more holes to that.

In the V2 thread you mentioned generation, metadata_uuid, total_bytes as
interesting missing ones. What about adding them just right now directly?

K

Hans van Kranenburg June 27, 2020, 10:35 p.m. UTC | #3

Ah,

On 6/27/20 11:01 PM, Hans van Kranenburg wrote:
> On 6/26/20 10:06 PM, David Sterba wrote:
>> On Sat, Jun 27, 2020 at 12:01:07AM +0900, Johannes Thumshirn wrote:
>>> With the recent addition of filesystem checksum types other than CRC32c,
>>> it is not anymore hard-coded which checksum type a btrfs filesystem uses.
>>>
>>> Up to now there is no good way to read the filesystem checksum, apart from
>>> reading the filesystem UUID and then query sysfs for the checksum type.
>>>
>>> Add a new csum_type field to the BTRFS_IOC_FS_INFO ioctl command which
>>> usually is used to query filesystem features. Also add a flags member
>>> indicating that the kernel responded with a set csum_type field.
>>>
>>> To simplify further additions to the ioctl, also switch the padding to a
>>> u8 array. Pahole was used to verify the result of this switch:
>>>
>>> pahole -C btrfs_ioctl_fs_info_args fs/btrfs/btrfs.ko
>>> struct btrfs_ioctl_fs_info_args {
>>>         __u64                      max_id;               /*     0     8 */
>>>         __u64                      num_devices;          /*     8     8 */
>>>         __u8                       fsid[16];             /*    16    16 */
>> g         __u32                      nodesize;             /*    32     4 */
>>>         __u32                      sectorsize;           /*    36     4 */
>>>         __u32                      clone_alignment;      /*    40     4 */
>>>         __u32                      flags;                /*    44     4 */
>>>         __u16                      csum_type;            /*    48     2 */
>>>         __u16                      csum_size;            /*    50     2 */
>>>         __u8                       reserved[972];        /*    52   972 */
>>>
>>>         /* size: 1024, cachelines: 16, members: 10 */
>>> };
>>>
>>> Fixes: 3951e7f050ac ("btrfs: add xxhash64 to checksumming algorithms")
>>> Fixes: 3831bf0094ab ("btrfs: add sha256 to checksumming algorithm")
>>> Cc: stable@vger.kernel.org
>>
>> CC: stable@vger.kernel.org # 5.5+
>>
>> it'll not compile otherwise.
>>
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -3217,6 +3217,9 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
>>>  	fi_args->nodesize = fs_info->nodesize;
>>>  	fi_args->sectorsize = fs_info->sectorsize;
>>>  	fi_args->clone_alignment = fs_info->sectorsize;
>>> +	fi_args->csum_type = btrfs_super_csum_type(fs_info->super_copy);
>>> +	fi_args->csum_size = btrfs_super_csum_size(fs_info->super_copy);
>>> +	fi_args->flags |= BTRFS_FS_INFO_FLAG_CSUM_TYPE_SIZE;
>>>  
>>>  	if (copy_to_user(arg, fi_args, sizeof(*fi_args)))
>>>  		ret = -EFAULT;
>>> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
>>> index e6b6cb0f8bc6..2de3ef3c5c71 100644
>>> --- a/include/uapi/linux/btrfs.h
>>> +++ b/include/uapi/linux/btrfs.h
>>> @@ -250,10 +250,20 @@ struct btrfs_ioctl_fs_info_args {
>>>  	__u32 nodesize;				/* out */
>>>  	__u32 sectorsize;			/* out */
>>>  	__u32 clone_alignment;			/* out */
>>> -	__u32 reserved32;
>>> -	__u64 reserved[122];			/* pad to 1k */
>>> +	__u32 flags;				/* out */
>>
>> After the discussion under v2 with Hans, I think he has a point that
>> future extension could be problematic as it was with the LOGICAL_INO.
>> It's similar, once we'd want to do the input flags, there's no way to
>> make the behaviour safe.
>>
>> If all ioctl users would zero the buffer it's all fine, but I don't know
>> how to make that more than a convention
> 
> At least for all new code, check and demand that the buffer is zeroed
> (except for input fields, of course). This is typically something from
> the 'been-there-done-that-oops' category. It's not realized that it's
> necessary until running into these issues when it's too late. :)
> 
>> and given that this is not well
>> documented we can't blame users/programs when this is not honored.
> 
> The fun of maintaining stable APIs :)
> 
>> So, my suggestion is to make the flags also input,
> 
> If we think that with whatever being added in the future the output will
> still only contain a bunch of values that are very cheap to collect when
> filling the response buffer, then why would you still want to have this?
> 
>> where the valid value
>> is 0, meaning 'return everything you have'. In this case it's a no-op,
>> but allows future extensions and fine grained data retrieval.
> 
> This will work right now when 0 means return everything and when
> uninitialized data in the field (with 1s somewhere) returns or does not
> return extra stuff. The old calling code will not look at those parts of
> the response anyway.
> 
> But this does not allow you to signal that requested data that is not
> implemented is not available with ENOENT (e.g. with the unavailable
> flags set in the response) or anything. Ever. Or am I missing something?

Ah, I think what I'm missing is that this can work without ever sending
an error code if something is requested that the currently running
kernel does not support, since the flag can be reset in the response, so
the caller can (must) look at that and see it's not available.

Clever :)

However, then I would propose to implement that right now. So, tell the
users that "hey! there are new fields, if you want to see them just flip
all bits to 1 in the flags input". I mean, the calling code has to be
touched anyway to do something with it, so adding some ones in the
buffer is a small extra change.

Current calling code following best practices will send a zeroed buffer,
and does not get any new goodies, but that old code doesn't do anything
with it anyway. (So, that means *not* mapping the all zeroes to all
ones. All zero means no new features!)

Caller with uninitialized memory will get what they accidentally ask
for, with unavailable bits cleared in the output, but noone cares, since
the calling code doesn't look at that part of the buffer anyway.

:O

The BIG difference for this case vs. LOGICAL_INO_V2 is that for
LOGICAL_INO it would change the behavior of the called function, which
is a totally different cup of tea than just messing around in a part of
the buffer that's ignored anyway.

So, the same thing as done here could be done when extending
GET_DEV_STATS because I see users are asking about counters for repaired
errors to be added, so they can be used for early warning alerts of
failing disks.

>> There's effectively no change in the implementation, other than
>> documenting the 'in' semantics.
>>
>> Although this is basically the same situation as in the LOGICAL_INO v1
>> and v2, the number of users of FS_INFO ioctl is presumably not high and
>> the buffer has been write-only so far, there's no existing logic that
>> would had to be tweaked.
>>
>> Once the flags are there, all new implementations should take the
>> semantics into account. Therefore I think this is a safe plan, but feel
>> free to poke more holes to that.
> 
> In the V2 thread you mentioned generation, metadata_uuid, total_bytes as
> interesting missing ones. What about adding them just right now directly?
> 
> K

Johannes Thumshirn June 29, 2020, 6:45 a.m. UTC | #4

On 28/06/2020 00:35, Hans van Kranenburg wrote:
> However, then I would propose to implement that right now. So, tell the
> users that "hey! there are new fields, if you want to see them just flip
> all bits to 1 in the flags input". I mean, the calling code has to be
> touched anyway to do something with it, so adding some ones in the
> buffer is a small extra change.

Agreed, let's make the new features opt-in.

David Sterba June 30, 2020, 8:48 a.m. UTC | #5

On Sun, Jun 28, 2020 at 12:35:27AM +0200, Hans van Kranenburg wrote:
> So, the same thing as done here could be done when extending
> GET_DEV_STATS because I see users are asking about counters for repaired
> errors to be added, so they can be used for early warning alerts of
> failing disks.

Yes the dev stats are extensible following the same scheme and we'll add
new counters eventually.

David Sterba June 30, 2020, 8:50 a.m. UTC | #6

On Sat, Jun 27, 2020 at 11:01:36PM +0200, Hans van Kranenburg wrote:
> > There's effectively no change in the implementation, other than
> > documenting the 'in' semantics.
> > 
> > Although this is basically the same situation as in the LOGICAL_INO v1
> > and v2, the number of users of FS_INFO ioctl is presumably not high and
> > the buffer has been write-only so far, there's no existing logic that
> > would had to be tweaked.
> > 
> > Once the flags are there, all new implementations should take the
> > semantics into account. Therefore I think this is a safe plan, but feel
> > free to poke more holes to that.
> 
> In the V2 thread you mentioned generation, metadata_uuid, total_bytes as
> interesting missing ones. What about adding them just right now directly?

If you mean in the same patch, that would be mixing interface fix with
interface extension, so no, unless there's a really good reason to do
that in one patch.

[v3] btrfs: pass checksum type via BTRFS_IOC_FS_INFO ioctl

Commit Message

Comments

Patch