[0/3] btrfs: reduce the memory usage for replace in btrfs_io_context.

Message ID cover.1674893735.git.wqu@suse.com (mailing list archive)

Qu Wenruo Jan. 28, 2023, 8:23 a.m. UTC
In btrfs_io_context, we have two members dedicated to dev-replace:

- num_tgtdevs
  This is straightforward, just the number of extra stripes for replace
  usage.

- tgtdev_map[]
  This is a little more complex: it represents the mapping between the
  original stripes and the dev-replace stripes.

  This is mostly for RAID56, as only in RAID56 the stripes contain
  different contents, thus it's important to know the mapping.

  It goes like this:

    num_stripes = 4 (3 + 1 for replace)
    stripes[0]:		dev = devid 1, physical = X
    stripes[1]:		dev = devid 2, physical = Y
    stripes[2]:		dev = devid 3, physical = Z
    stripes[3]:		dev = devid 0, physical = Y

    num_tgtdevs = 1
    tgtdev_map[0] = 0	<- Means stripes[0] is not involved in replace.
    tgtdev_map[1] = 3	<- Means stripes[1] is involved in replace,
			   and it's duplicated to stripes[3].
    tgtdev_map[2] = 0	<- Means stripes[2] is not involved in replace.

  Thus most of the slots are wasted, and the more devices in the array,
  the more space is wasted.


The current tgtdev_map[] design wastes quite some space.
E.g. in the above case, we only need one slot to record the source stripe
number, and the other two slots are just a waste of space.

The existing tgtdev_map[] would make more sense if we supported multiple
running dev-replaces, but that's not the case.

So this series switches to a new, more space-efficient representation,
which looks like this for the same example:

  replace_nr_stripes = 1
  tgtdev_map[0] = 1	<- Means stripes[1] is involved in replace.
  tgtdev_map[1] = -1	<- Means the second slot is not used.
		 	   (Only DUP can use this slot, but they
			    don't really care)
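
To make the difference concrete, here is a minimal userspace sketch of
the two layouts using the example above. It is not the kernel code: the
helper names, the SLOT_UNUSED marker and the assumption that replace
stripes are appended right after the source stripes (in slot order) are
all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REPLACE_STRIPES 2	/* assumption: only DUP can need two slots */
#define SLOT_UNUSED (-1)	/* hypothetical marker for an unused slot */

/*
 * Old layout: one slot per source stripe.  A non-zero value is the index
 * of the duplicated (replace) stripe, 0 means "not involved in replace".
 */
static int old_lookup_target(const int *tgtdev_map, int nr_src_stripes,
			     int src)
{
	assert(src >= 0 && src < nr_src_stripes);
	return tgtdev_map[src] ? tgtdev_map[src] : -1;
}

/*
 * Compact layout: slot i records which source stripe the i-th replace
 * stripe duplicates.  Returns the number of slots in use.
 */
static int compact_from_old(const int *tgtdev_map, int nr_src_stripes,
			    int16_t src_of[MAX_REPLACE_STRIPES])
{
	int nr = 0;

	for (int i = 0; i < MAX_REPLACE_STRIPES; i++)
		src_of[i] = SLOT_UNUSED;

	for (int i = 0; i < nr_src_stripes; i++) {
		if (!tgtdev_map[i])
			continue;
		assert(nr < MAX_REPLACE_STRIPES);
		src_of[nr++] = (int16_t)i;
	}
	return nr;
}

/*
 * Lookup in the compact layout.  Assumes replace stripes are appended
 * right after the source stripes, in slot order.
 */
static int compact_lookup_target(const int16_t src_of[MAX_REPLACE_STRIPES],
				 int nr_src_stripes, int src)
{
	assert(src >= 0 && src < nr_src_stripes);
	for (int i = 0; i < MAX_REPLACE_STRIPES; i++)
		if (src_of[i] == src)
			return nr_src_stripes + i;
	return -1;
}
```

With the old map {0, 3, 0} and 3 source stripes, compact_from_old()
fills the two slots with {1, -1}, and both lookups still resolve source
stripe 1 to stripes[3]. The compact map stays at two slots no matter how
many devices are in the array.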

Furthermore, we reduce the width of the nr_stripes related members to u16,
the same width as the on-disk format.

This not only saves some space in the btrfs_io_context structure, but also
allows the following cleanups:

- Streamline handle_ops_on_dev_replace()
  We now take a common path for both WRITE and GET_READ_MIRRORS; only
  for DUP with GET_READ_MIRRORS do we shrink the bioc, to keep the
  old behavior.

- Remove some unnecessary variables

Although the series still increases the number of lines, the net
increase mostly comes from comments; in fact, around 70 lines of comments
are added around the replace related members.


Qu Wenruo (3):
  btrfs: simplify the @bioc argument for handle_ops_on_dev_replace()
  btrfs: small improvement for btrfs_io_context structure
  btrfs: use a more space efficient way to represent the source of
    duplicated stripes

 fs/btrfs/raid56.c  |  44 +++++++++--
 fs/btrfs/scrub.c   |   4 +-
 fs/btrfs/volumes.c | 187 +++++++++++++++++++++------------------------
 fs/btrfs/volumes.h |  52 +++++++++++--
 4 files changed, 174 insertions(+), 113 deletions(-)