[RFC,V9,0/6] btrfs: allocation_hint mode

Message ID cover.1639766364.git.kreijack@inwind.it (mailing list archive)

Goffredo Baroncelli Dec. 17, 2021, 6:47 p.m. UTC
From: Goffredo Baroncelli <kreijack@inwind.it>

Hi all,

This patch set was born after some discussion between me, Zygo and Josef.
Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.

Some further information about a real use case can be found in
https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/

Recently Shafeeq told me that he is interested too, due to the performance gain.

In the V8 revision I switched away from an ioctl API in favor of a sysfs API
(see patches #2 and #3).

In V9 I renamed the sysfs interface from devinfo/type to devinfo/allocation_hint.
Moreover, I renamed dev_item->type to dev_item->flags.

The idea behind this patch set is to dedicate some disks (the fastest ones)
to the metadata chunks. My initial idea was a "soft" hint. However, Zygo
asked for an option for a "strong" hint (== mandatory). The result is that
each disk can be "tagged" with one of the following flags:
- BTRFS_DEV_ALLOCATION_METADATA_ONLY
- BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
- BTRFS_DEV_ALLOCATION_PREFERRED_DATA
- BTRFS_DEV_ALLOCATION_DATA_ONLY

When the chunk allocator searches for disks on which to allocate a chunk, it
scans the disks in an order decided by these tags. For metadata, the order is:
*_METADATA_ONLY
*_PREFERRED_METADATA
*_PREFERRED_DATA

The *_DATA_ONLY disks are not eligible for metadata chunk allocation.

For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
excluded.

The exact sort logic is to sort first by the "tag", and then by the space
available. If there is no space available, the next "tag" disk set is
selected.
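
The ordering described above can be modelled in a few lines of Python. This is only a sketch of one reading of the description, not the kernel code from the patches: the Disk class, the candidates() helper, the numeric tag values and the needed_mib/ndevs parameters are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the BTRFS_DEV_ALLOCATION_* flags; the numeric
# values are illustrative only and are not taken from the patches.
METADATA_ONLY, PREFERRED_METADATA, PREFERRED_DATA, DATA_ONLY = range(4)


@dataclass
class Disk:
    name: str
    hint: int
    free_mib: int


def candidates(disks, chunk_type, needed_mib, ndevs=1):
    """Return the disks a chunk of chunk_type ('metadata' or 'data') would use."""
    if chunk_type == "metadata":
        order = [METADATA_ONLY, PREFERRED_METADATA, PREFERRED_DATA]
    else:
        order = [DATA_ONLY, PREFERRED_DATA, PREFERRED_METADATA]
    rank = {hint: pos for pos, hint in enumerate(order)}
    # Disks whose tag is not in `order` are excluded outright; full disks are
    # skipped, so the next tag is used once a tag has no space left.
    eligible = [d for d in disks if d.hint in rank and d.free_mib >= needed_mib]
    # Sort by tag first, then by available space (largest first).
    eligible.sort(key=lambda d: (rank[d.hint], -d.free_mib))
    return eligible[:ndevs]


disks = [
    Disk("loop0", DATA_ONLY, 736),
    Disk("loop2", PREFERRED_DATA, 128),
    Disk("loop5", METADATA_ONLY, 1),
    Disk("loop6", METADATA_ONLY, 128),
]
print([d.name for d in candidates(disks, "metadata", needed_mib=64)])  # ['loop6']
print([d.name for d in candidates(disks, "data", needed_mib=64)])      # ['loop0']
```

In this model loop5 is skipped for metadata because it has no room left, and loop0 is never considered for metadata because it is tagged DATA_ONLY.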

To set these tags, a new property called "allocation_hint" was created.
There is a dedicated btrfs-progs patch set [[PATCH V9] btrfs-progs:
allocation_hint disk property].

$ sudo mount /dev/loop0 /mnt/test-btrfs/
$ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY

$ sudo ./btrfs fi us /mnt/test-btrfs/
Overall:
    Device size:           2.75GiB
    Device allocated:           1.34GiB
    Device unallocated:           1.41GiB
    Device missing:             0.00B
    Used:             400.89MiB
    Free (estimated):           1.04GiB    (min: 1.04GiB)
    Data ratio:                  2.00
    Metadata ratio:              1.00
    Global reserve:           3.25MiB    (used: 0.00B)
    Multiple profiles:                no

Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
   /dev/loop0     288.00MiB
   /dev/loop1     288.00MiB
   /dev/loop2     127.00MiB
   /dev/loop3     127.00MiB
   /dev/loop4     127.00MiB
   /dev/loop5     127.00MiB

Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
   /dev/loop1     256.00MiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/loop0      32.00MiB

Unallocated:
   /dev/loop0     704.00MiB
   /dev/loop1     480.00MiB
   /dev/loop2       1.00MiB
   /dev/loop3       1.00MiB
   /dev/loop4       1.00MiB
   /dev/loop5       1.00MiB
   /dev/loop6     128.00MiB
   /dev/loop7     128.00MiB

# change the tag of some disks

$ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
$ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
$ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY

$ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY

$ sudo btrfs bal start --full-balance /mnt/test-btrfs/
$ sudo ./btrfs fi us /mnt/test-btrfs/
Overall:
    Device size:           2.75GiB
    Device allocated:         735.00MiB
    Device unallocated:           2.03GiB
    Device missing:             0.00B
    Used:             400.72MiB
    Free (estimated):           1.10GiB    (min: 1.10GiB)
    Data ratio:                  2.00
    Metadata ratio:              1.00
    Global reserve:           3.25MiB    (used: 0.00B)
    Multiple profiles:                no

Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
   /dev/loop0     288.00MiB
   /dev/loop1     288.00MiB

Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
   /dev/loop5     127.00MiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/loop7      32.00MiB

Unallocated:
   /dev/loop0     736.00MiB
   /dev/loop1     736.00MiB
   /dev/loop2     128.00MiB
   /dev/loop3     128.00MiB
   /dev/loop4     128.00MiB
   /dev/loop5       1.00MiB
   /dev/loop6     128.00MiB
   /dev/loop7      96.00MiB


#As you can see, all the metadata was placed on the loop5/loop7 disks even though
#the emptiest ones are loop0 and loop1.
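
The "Device allocated" totals in the two usage dumps can be cross-checked against the per-device lines. The snippet below is just that arithmetic; the allocated_mib() helper and the dictionary layout are hypothetical, while the numbers are copied from the output above.

```python
def allocated_mib(per_device):
    """Sum the raw per-device allocations (MiB) across all chunk types."""
    return sum(sum(devs.values()) for devs in per_device.values())


before = {
    "data": {"loop0": 288, "loop1": 288, "loop2": 127,
             "loop3": 127, "loop4": 127, "loop5": 127},
    "metadata": {"loop1": 256},
    "system": {"loop0": 32},
}
after = {
    "data": {"loop0": 288, "loop1": 288},
    "metadata": {"loop5": 127},
    "system": {"loop7": 32},
}

print(allocated_mib(before))  # 1372 MiB, reported as 1.34GiB
print(allocated_mib(after))   # 735 MiB, reported as 735.00MiB
```

After the balance, data sits only on the two DATA_ONLY disks, and metadata and system chunks sit only on METADATA_ONLY disks.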



TODO:
- more tests
- the tool which shows the space available should consider the tagging (e.g.
  the disks tagged with _METADATA_ONLY should be excluded from the data
  availability)
- allow btrfs-progs to change the allocation_hint even when the filesystem
  is not mounted.


Comments are welcome
BR
G.Baroncelli

Revision:
V9:
- rename dev_item->type to dev_item->flags
- rename /sys/fs/btrfs/$UUID/devinfo/type -> allocation_hint

V8:
- drop the ioctl API, instead use a sysfs one

V7:
- make more room in the struct btrfs_ioctl_dev_properties up to 1K
- leave in btrfs_tree.h only the constants
- removed the mount option (sic)
- correct a 'use before check' in the while loop (reported
  by Zygo)
- add a 2nd sort to be sure that the device_info array is in the
  expected order

V6:
- add further values to the hints: add the possibility to
  exclude a disk for a chunk type 


Goffredo Baroncelli (6):
  btrfs: add flags to give an hint to the chunk allocator
  btrfs: export the device allocation_hint property in sysfs
  btrfs: change the device allocation_hint property via sysfs
  btrfs: add allocation_hint mode
  btrfs: rename dev_item->type to dev_item->flags
  btrfs: add allocation_hint option.

 fs/btrfs/ctree.h                |  18 +++++-
 fs/btrfs/disk-io.c              |   4 +-
 fs/btrfs/super.c                |  17 ++++++
 fs/btrfs/sysfs.c                |  73 ++++++++++++++++++++++
 fs/btrfs/volumes.c              | 105 ++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h              |   7 ++-
 include/uapi/linux/btrfs_tree.h |  20 +++++-
 7 files changed, 232 insertions(+), 12 deletions(-)

Comments

Boris Burkov Jan. 5, 2022, 2:44 a.m. UTC | #1
On Fri, Dec 17, 2021 at 07:47:16PM +0100, Goffredo Baroncelli wrote:
> [...]
> Comments are welcome

This is cool, thanks for building it!

I'm playing with setting this up for a test I'm working on where I want
to send data to a dm-zero device. To that end, I applied this patchset
on top of misc-next and ran:

$ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
$ mount /dev/vg0/lv0 /mnt/lol
$ btrfs device add /dev/mapper/zero-data /mnt/lol
$ btrfs fi usage /mnt/lol
Overall:
    Device size:                  50.01TiB
    Device allocated:             20.00MiB
    Device unallocated:           50.01TiB
    Device missing:                  0.00B
    Used:                        128.00KiB
    Free (estimated):             50.01TiB      (min: 50.01TiB)
    Free (statfs, df):            50.01TiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:8.00MiB, Used:0.00B (0.00%)
   /dev/mapper/vg0-lv0     8.00MiB

Metadata,single: Size:8.00MiB, Used:112.00KiB (1.37%)
   /dev/mapper/vg0-lv0     8.00MiB

System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
   /dev/mapper/vg0-lv0     4.00MiB

Unallocated:
   /dev/mapper/vg0-lv0     9.98GiB
   /dev/mapper/zero-data          50.00TiB

$ ./btrfs property set -t device /dev/mapper/zero-data allocation_hint DATA_ONLY
$ ./btrfs property set -t device /dev/vg0/lv0 allocation_hint METADATA_ONLY

$ btrfs balance start --full-balance /mnt/lol
Done, had to relocate 3 out of 3 chunks

$ btrfs fi usage /mnt/lol
Overall:
    Device size:                  50.01TiB
    Device allocated:              2.03GiB
    Device unallocated:           50.01TiB
    Device missing:                  0.00B
    Used:                        640.00KiB
    Free (estimated):             50.01TiB      (min: 50.01TiB)
    Free (statfs, df):            50.01TiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
   /dev/mapper/zero-data           1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
   /dev/mapper/zero-data           1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/mapper/zero-data          32.00MiB

Unallocated:
   /dev/mapper/vg0-lv0    10.00GiB
   /dev/mapper/zero-data          50.00TiB


I expected that I would have data on /dev/mapper/zero-data and metadata
on /dev/mapper/vg0-lv0, but it seems both of them were written to the zero
device. Attempting to actually use the file system eventually fails, since
the metadata is black-holed :)

Did I make some mistake in how I used it, or is this a bug?

Thanks,
Boris

Goffredo Baroncelli Jan. 5, 2022, 9:16 a.m. UTC | #2
Hi Boris,



On 1/5/22 03:44, Boris Burkov wrote:
[...]
> 
> This is cool, thanks for building it!
> 
> I'm playing with setting this up for a test I'm working on where I want
> to send data to a dm-zero device. To that end, I applied this patchset
> on top of misc-next and ran:
> 
> $ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
> $ mount /dev/vg0/lv0 /mnt/lol

You should mount the filesystem with

$ mount -o allocation_hint=1 /dev/vg0/lv0 /mnt/lol


In the previous iteration I missed patch #6, which activates this option. Alternatively, you can drop patch #6 and avoid passing this option.

Please let me know if this resolves the problem.

BR
G.Baroncelli

Boris Burkov Jan. 5, 2022, 5:55 p.m. UTC | #3
On Wed, Jan 05, 2022 at 10:16:08AM +0100, Goffredo Baroncelli wrote:
> Hi Boris,
> 
> 
> 
> On 1/5/22 03:44, Boris Burkov wrote:
> [...]
> > 
> > This is cool, thanks for building it!
> > 
> > I'm playing with setting this up for a test I'm working on where I want
> > to send data to a dm-zero device. To that end, I applied this patchset
> > on top of misc-next and ran:
> > 
> > $ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
> > $ mount /dev/vg0/lv0 /mnt/lol
> 
> You should mount the filesystem with
> 
> $ mount -o allocation_hint=1 /dev/vg0/lv0 /mnt/lol
> 

With this option, I got the expected usage output:

Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
   /dev/mapper/zero-data           1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
   /dev/mapper/vg0-lv0     1.00GiB

Sorry I missed that, and thanks for the quick reply.
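
The difference between the two runs can be modelled as below. This is a hypothetical sketch, not the patch logic: pick_device() and its hint_enabled flag are made-up names, and the fallback of "take the device with the most free space" is an assumption about how the allocator behaves when the tags are ignored.

```python
def pick_device(devices, chunk_type, hint_enabled):
    """devices: list of (name, hint, free_gib); return the device for one chunk."""
    excluded = {"metadata": "DATA_ONLY", "data": "METADATA_ONLY"}[chunk_type]
    if hint_enabled:
        # With the hint active, *_ONLY disks of the other kind are not eligible.
        devices = [d for d in devices if d[1] != excluded]
    # Ranking between PREFERRED_* tags is left out: only *_ONLY tags appear here.
    return max(devices, key=lambda d: d[2])[0]


devices = [
    ("vg0-lv0", "METADATA_ONLY", 10),
    ("zero-data", "DATA_ONLY", 50 * 1024),
]
print(pick_device(devices, "metadata", hint_enabled=False))  # zero-data
print(pick_device(devices, "metadata", hint_enabled=True))   # vg0-lv0
print(pick_device(devices, "data", hint_enabled=True))       # zero-data
```

Without the mount option the tags are stored but ignored, so the much larger zero device wins for every chunk type, which matches the first usage output.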

Zygo Blaxell Jan. 5, 2022, 6:07 p.m. UTC | #4
On Wed, Jan 05, 2022 at 10:16:08AM +0100, Goffredo Baroncelli wrote:
> Hi Boris,
> 
> 
> 
> On 1/5/22 03:44, Boris Burkov wrote:
> [...]
> > 
> > This is cool, thanks for building it!
> > 
> > I'm playing with setting this up for a test I'm working on where I want
> > to send data to a dm-zero device. To that end, I applied this patchset
> > on top of misc-next and ran:
> > 
> > $ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
> > $ mount /dev/vg0/lv0 /mnt/lol
> 
> You should mount the filesystem with
> 
> $ mount -o allocation_hint=1 /dev/vg0/lv0 /mnt/lol
> 
> In the previous iteration I missed patch #6, which activates this option. Alternatively, you can drop patch #6 and avoid passing this option.

Can we drop the mount option from the series?  It isn't needed.

Or, if we must have it (and I am in no way conceding that we do),
at least make it default to enabled.  Or turn the mount option into a
disable flag under the 'rescue=' option to make it clear this option is
not intended to be used in normal operation.

Goffredo Baroncelli Jan. 5, 2022, 6:16 p.m. UTC | #5
On 05/01/2022 19.07, Zygo Blaxell wrote:
> On Wed, Jan 05, 2022 at 10:16:08AM +0100, Goffredo Baroncelli wrote:
>> Hi Boris,
>>
>>
>>
>> On 1/5/22 03:44, Boris Burkov wrote:
>> [...]
>>>
>>> This is cool, thanks for building it!
>>>
>>> I'm playing with setting this up for a test I'm working on where I want
>>> to send data to a dm-zero device. To that end, I applied this patchset
>>> on top of misc-next and ran:
>>>
>>> $ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
>>> $ mount /dev/vg0/lv0 /mnt/lol
>>
>> You should mount the filesystem with
>>
>> $ mount -o allocation_hint=1 /dev/vg0/lv0 /mnt/lol
>>
>> In the previous iteration I missed the patch #6, which activates this option. You can drop patch #6 and avoid to pass this option.
> 
> Can we drop the mount option from the series?  It isn't needed.
> 
> Or, if we must have it (and I am in no way conceding that we do),
> at least make it default to enabled.  Or turn the mount option into a
> disable flag under the 'rescue=' option to make it clear this option is
> not intended to be used in normal operation.

Frankly speaking, it was my mistake to include this patch. It was in the
queue, but I dropped it from the previous patch sets. In the latest one
I forgot to drop it manually, so it reappeared :-)

However, I like your suggestion to add it as a 'rescue=' option, where
the default is "enabled".

@Josef,
have you started to play with this patch set? If not, can I send an
update where the main change is the renaming of the properties from

PREFERRED_<X> to <X>_PREFERRED

(e.g. PREFERRED_DATA to DATA_PREFERRED), which is more correct?
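
For reference, the device ordering described in the cover letter can be sketched roughly as below. This is only an illustration in Python, not the kernel code: the `Device` class, `candidate_devices()` and the `free_bytes` field are hypothetical names, the tag strings use the current (pre-rename) spelling, and "no space available" is simplified to "free space is zero".

```python
from dataclasses import dataclass

# Tag names as listed in the cover letter (without the BTRFS_DEV_ALLOCATION_ prefix).
METADATA_ONLY = "METADATA_ONLY"
PREFERRED_METADATA = "PREFERRED_METADATA"
PREFERRED_DATA = "PREFERRED_DATA"
DATA_ONLY = "DATA_ONLY"

# Scan order per chunk type; a tag missing from a list is not eligible at all.
METADATA_ORDER = [METADATA_ONLY, PREFERRED_METADATA, PREFERRED_DATA]
DATA_ORDER = [DATA_ONLY, PREFERRED_DATA, PREFERRED_METADATA]


@dataclass
class Device:
    name: str
    hint: str
    free_bytes: int


def candidate_devices(devices, chunk_type):
    """Return the devices eligible for chunk_type ("data" or "metadata"), best first."""
    order = METADATA_ORDER if chunk_type == "metadata" else DATA_ORDER
    rank = {hint: i for i, hint in enumerate(order)}
    # Drop excluded tags and (simplification) devices with no free space.
    eligible = [d for d in devices if d.hint in rank and d.free_bytes > 0]
    # Sort by tag first, then by free space (largest first) within the same tag.
    return sorted(eligible, key=lambda d: (rank[d.hint], -d.free_bytes))
```

With one METADATA_ONLY device and one DATA_ONLY device, as in Boris's test, this sketch yields only the first device for metadata chunks and only the second for data chunks, which is the behaviour he expected to see.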

BR
G.Baroncelli
Boris Burkov Jan. 5, 2022, 6:29 p.m. UTC | #6
On Wed, Jan 05, 2022 at 07:16:04PM +0100, Goffredo Baroncelli wrote:
> On 05/01/2022 19.07, Zygo Blaxell wrote:
> > On Wed, Jan 05, 2022 at 10:16:08AM +0100, Goffredo Baroncelli wrote:
> > > Hi Boris,
> > > 
> > > 
> > > 
> > > On 1/5/22 03:44, Boris Burkov wrote:
> > > [...]
> > > > 
> > > > This is cool, thanks for building it!
> > > > 
> > > > I'm playing with setting this up for a test I'm working on where I want
> > > > to send data to a dm-zero device. To that end, I applied this patchset
> > > > on top of misc-next and ran:
> > > > 
> > > > $ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
> > > > $ mount /dev/vg0/lv0 /mnt/lol
> > > 
> > > You should mount the filesystem with
> > > 
> > > $ mount -o allocation_hint=1 /dev/vg0/lv0 /mnt/lol
> > > 
> > > In the previous iteration I missed the patch #6, which activates this option. You can drop patch #6 and avoid to pass this option.
> > 
> > Can we drop the mount option from the series?  It isn't needed.
> > 
> > Or, if we must have it (and I am in no way conceding that we do),
> > at least make it default to enabled.  Or turn the mount option into a
> > disable flag under the 'rescue=' option to make it clear this option is
> > not intended to be used in normal operation.
> 
> Frankly speaking it was a my mistake to add this patch. It was in the
> queue, but in the last patches sets I dropped it. However in the last
> one I forgot to drop it manually so it reappeared :-)
> 
> However I like your suggestion to add as 'rescue' option, where the
> default is "enabled".

A mount option adds a lot of testing burden:
- enabling it where it was disabled
- disabling it where it was enabled
- does the above work on remount
- does it always print what's expected in /proc/mounts
- etc.

So I think it should have a strong justification for adding it, and the
xfstests will need to reflect the above.

Unless it's the best way to support some otherwise impossible recovery
for a realistic failure mode, I would personally suggest just skipping
it. However, I only skimmed through the discussion about recovery in the
older thread, FWIW.
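
As a rough illustration of the /proc/mounts item above, a test helper could parse the mount table and check whether an option is reported for a given mount point. This is a minimal Python sketch: `mount_options()` and `has_option()` are hypothetical helpers, and the assumption that the option would appear as `allocation_hint=1` in /proc/mounts is a guess, not something the series specifies.

```python
def mount_options(proc_mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    options = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # the last matching entry wins
    return options


def has_option(proc_mounts_text, mount_point, option):
    """True if option (e.g. "allocation_hint=1") is listed for mount_point."""
    options = mount_options(proc_mounts_text, mount_point)
    return options is not None and option in options
```

A real test would read the text from /proc/mounts after each mount and remount and compare the result with the expected state. Mount points containing spaces are escaped in that file and are not handled here.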

Boris Burkov Jan. 5, 2022, 10:21 p.m. UTC | #7
I noticed this patchset still suffers from some checkpatch failures with
stuff like spaces around parens and minus signs, tabs vs spaces, etc. We
generally try to keep checkpatch happy in btrfs, though of course reason
should always prevail.

FWIW, I have it rebased on kdave/misc-next and ran:

$ checkpatch.pl -g kdave/misc-next..