
[v7,0/3] btrfs: try to allocate larger folios for metadata

Message ID cover.1721384771.git.wqu@suse.com (mailing list archive)

Qu Wenruo July 19, 2024, 10:28 a.m. UTC
[CHANGELOG]
v7:
- Add back a line that was accidentally removed by a previous
  modification attempt
  Previously I moved that line to the common branch to
  unconditionally define the root_mem_cgroup pointer.
  That approach was later discarded in favor of a macro definition,
  but I forgot to add back the original line.

v6:
- Add a new root_mem_cgroup definition for CONFIG_MEMCG=n cases
  So that users of root_mem_cgroup no longer need to check
  CONFIG_MEMCG.
  This fixes the compile error for CONFIG_MEMCG=n cases.

- Slight rewording of the 2nd patch

v5:
- Use the root memcgroup to attach folios to the btree inode filemap
- Only try the higher order folio once, without NOFAIL or extra retries

v4:
- Hide the feature behind CONFIG_BTRFS_DEBUG
  So that end users won't be affected (i.e. still per-page based
  allocation), while we can do more testing on this new behavior.

v3:
- Rebased to the latest for-next branch
- Use PAGE_ALLOC_COSTLY_ORDER to determine whether to use __GFP_NOFAIL
- Add a dependency MM patch "mm/page_alloc: unify the warning on NOFAIL
  and high order allocation"
  This allows us to use NOFAIL up to 32K nodesize, and makes sure that
  for the default 16K nodesize, all metadata would use 16K folios

v2:
- Rebased to handle the change in "btrfs: cache folio size and shift in extent_buffer"

This is the latest update on the attempt to utilize larger folios for
btrfs metadata.

The previous version exposed a reproducible hang at btrfs/187, where we
hang at filemap_add_folio() around its memcgroup charge code.

Even without that problem, I still believe the btree inode does not
really need the memcgroup charge at all, nor __GFP_NOFAIL to work
around the possible memcgroup limits.

So in this update, as suggested by the memcgroup people from SUSE, there
is a new patch that makes the btree inode attach its filemap folios
using the root memcgroup, so that we won't be limited by the memcgroup.
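
To illustrate why a root_mem_cgroup definition for CONFIG_MEMCG=n helps
callers, here is a minimal userspace sketch of the pattern. It is not
the kernel code: the struct layout, the NULL expansion of the fallback
macro, and the helper names pick_charge_target() and
charge_target_is_root() are all assumptions made for illustration.

```c
#include <stddef.h>

/* Stand-in for the kernel structure; the real layout is not shown here. */
struct mem_cgroup {
	int id;
};

#ifdef CONFIG_MEMCG
/* With memcg enabled, a real root object exists. */
static struct mem_cgroup sketch_root = { .id = 0 };
static struct mem_cgroup *root_mem_cgroup = &sketch_root;
#else
/*
 * Assumed fallback: without memcg the name still resolves, so callers
 * need no #ifdef of their own.
 */
#define root_mem_cgroup ((struct mem_cgroup *)NULL)
#endif

/* Hypothetical caller: compiles unchanged in both configurations. */
static const struct mem_cgroup *pick_charge_target(void)
{
	return root_mem_cgroup;
}

/* Hypothetical helper: checks whether a target is the root memcgroup. */
static int charge_target_is_root(const struct mem_cgroup *target)
{
	return target == root_mem_cgroup;
}
```

The point of the sketch is only that the caller contains no
CONFIG_MEMCG check; the conditional lives in one place.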

Then for the patch enabling larger folios, I went back to the old
behavior where we only try the larger folio once, without extra
retries, just to be extra safe.
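
The "try once, then fall back" policy can be sketched in userspace as
follows. This is only an illustration under stated assumptions: malloc()
stands in for the folio allocator, a 4K page size is assumed, and every
sketch_* name is hypothetical rather than taken from the patches.

```c
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */
#define SKETCH_MAX_PAGES 16u	/* enough for a 64K nodesize */

struct sketch_buffer {
	void *chunks[SKETCH_MAX_PAGES];
	unsigned int nr_chunks;
	size_t chunk_size;
};

typedef void *(*sketch_alloc_fn)(size_t size);

static void sketch_free(struct sketch_buffer *buf)
{
	for (unsigned int i = 0; i < buf->nr_chunks; i++)
		free(buf->chunks[i]);
	buf->nr_chunks = 0;
}

/* Returns 0 on success, -1 if even the per-page fallback fails. */
static int sketch_alloc_buffer(struct sketch_buffer *buf, size_t nodesize,
			       sketch_alloc_fn large_alloc)
{
	unsigned int nr_pages = nodesize / SKETCH_PAGE_SIZE;

	buf->nr_chunks = 0;
	if (nr_pages == 0)
		nr_pages = 1;
	if (nr_pages > SKETCH_MAX_PAGES)
		return -1;

	/* One opportunistic attempt at a single large chunk, no retry. */
	if (nr_pages > 1) {
		void *large = large_alloc(nodesize);

		if (large) {
			buf->chunks[0] = large;
			buf->nr_chunks = 1;
			buf->chunk_size = nodesize;
			return 0;
		}
	}

	/* Fallback: the old per-page behavior. */
	buf->chunk_size = SKETCH_PAGE_SIZE;
	for (unsigned int i = 0; i < nr_pages; i++) {
		void *page = malloc(SKETCH_PAGE_SIZE);

		if (!page) {
			sketch_free(buf);
			return -1;
		}
		buf->chunks[buf->nr_chunks++] = page;
	}
	return 0;
}

/* Simulates a large allocation that cannot be satisfied. */
static void *sketch_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

With a 16K nodesize, a successful large allocation yields one 16K chunk,
while a failed one silently degrades to four 4K chunks instead of
retrying or blocking.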


Qu Wenruo (3):
  memcontrol: define root_mem_cgroup for CONFIG_MEMCG=n cases
  btrfs: always uses root memcgroup for filemap_add_folio()
  btrfs: prefer to allocate larger folio for metadata

 fs/btrfs/extent_io.c       | 112 ++++++++++++++++++++++++++-----------
 include/linux/memcontrol.h |   6 ++
 2 files changed, 84 insertions(+), 34 deletions(-)