
[GIT,PULL] Btrfs updates for 6.10

Message ID: cover.1715616501.git.dsterba@suse.com

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag

Message

David Sterba May 13, 2024, 4:20 p.m. UTC
Hi,

this update brings a few minor performance improvements; otherwise
there's a lot of refactoring, cleanups and other sorts of changes that
are not user-visible.

Please pull, thanks.


Performance improvements

- inline b-tree locking functions, an improvement for metadata-heavy operations

- relax locking on a range that's being reflinked, allowing read operations
  to run in parallel

- speed up NOCOW write checks (throughput +9% on a sample test)

- extent locking ranges have been reduced in several places, namely
  around delayed ref processing
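The NOCOW item most likely corresponds to the shortlog entry "make NOCOW checks for existence of checksums in a range more efficient" further down. The sketch below only illustrates the general early-exit idea behind such a change and is not the btrfs code: it assumes checksum items can be modelled as a sorted array of non-overlapping byte ranges, and the struct and function names are hypothetical. A yes/no question only needs the first overlapping item, so there is no need to collect every match into a list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one checksum item covering [start, end). */
struct csum_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if any checksum item overlaps [start, end).
 * Items are assumed sorted by start and non-overlapping, so a binary
 * search finds the first candidate and that single item decides the
 * answer: no list of all matching items is built.
 */
static bool csum_exists_in_range(const struct csum_range *items, size_t nr,
				 uint64_t start, uint64_t end)
{
	size_t lo = 0, hi = nr;

	assert(start <= end);

	/* Find the first item whose end lies past 'start'. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (items[mid].end <= start)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Early exit: overlap iff that item starts before 'end'. */
	return lo < nr && items[lo].start < end;
}
```

With items covering [0, 4096) and [8192, 16384), a query for [4096, 8192) returns false after looking at one item, while [4096, 8193) returns true.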

Core

- more page to folio conversions
  - relocation
  - send
  - compression
  - inline extent handling
  - super block write and wait

- extent_map structure optimizations
  - reduced structure size
  - code simplifications
  - add a shrinker for extent maps; their number can grow high enough to
    exhaust memory on smaller systems (this has been reported), as they
    may not get an opportunity to be freed fast enough

- extent locking optimizations
  - reduce locked ranges where locking does not seem to be necessary and
    it is safe due to other means of synchronization
  - potential improvements from lower contention and from fewer
    allocation/freeing and state management operations on the extent
    state tracking structures

- delayed ref cleanups and simplifications

- updated trace points

- improved error handling, warnings and assertions

- cleanups and refactoring, unification of error handling paths
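The extent map shrinker pairs a global object counter (per-CPU in the real code, per the shortlog) with a reclaim pass. Kernel shrinkers provide `count_objects` and `scan_objects` callbacks; the user-space sketch below only mimics that split. It assumes a plain linked list of maps and a hypothetical `pinned` flag standing in for maps that must not be dropped yet; all names are illustrative, and the real btrfs shrinker walks roots and inodes with its own rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent map. */
struct em {
	struct em *next;
	bool pinned;		/* must not be reclaimed yet */
};

/* Global count of live maps; a single atomic instead of per-CPU counters. */
static atomic_long nr_extent_maps;
static struct em *em_list;

static struct em *em_alloc(bool pinned)
{
	struct em *em = calloc(1, sizeof(*em));

	if (!em)
		return NULL;
	em->pinned = pinned;
	em->next = em_list;
	em_list = em;
	atomic_fetch_add(&nr_extent_maps, 1);
	return em;
}

/* "count" step: cheap, it only reads the counter. */
static long em_count_objects(void)
{
	return atomic_load(&nr_extent_maps);
}

/* "scan" step: free up to nr_to_scan unpinned maps, return how many. */
static long em_scan_objects(long nr_to_scan)
{
	struct em **link = &em_list;
	long freed = 0;

	while (*link && freed < nr_to_scan) {
		struct em *em = *link;

		if (em->pinned) {
			link = &em->next;	/* still needed, skip it */
			continue;
		}
		*link = em->next;		/* unlink, then free */
		free(em);
		atomic_fetch_sub(&nr_extent_maps, 1);
		freed++;
	}
	assert(em_count_objects() >= 0);
	return freed;
}
```

Keeping the count separate from the scan lets memory pressure ask "how much could be freed?" cheaply, and only pay for the list walk when reclaim is actually requested.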

----------------------------------------------------------------
The following changes since commit dccb07f2914cdab2ac3a5b6c98406f765acab803:

  Merge tag 'for-6.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux (2024-05-06 13:43:13 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag

for you to fetch changes up to 0e39c9e524479b85c1b83134df0cfc6e3cb5353a:

  btrfs: qgroup: fix initialization of auto inherit array (2024-05-07 21:31:11 +0200)

----------------------------------------------------------------
Anand Jain (20):
      btrfs: rename err to ret in btrfs_initxattrs()
      btrfs: rename err to ret in btrfs_rmdir()
      btrfs: rename err to ret in btrfs_cont_expand()
      btrfs: rename err to ret in btrfs_ioctl_snap_destroy()
      btrfs: rename err to ret in __set_extent_bit()
      btrfs: rename err to ret in convert_extent_bit()
      btrfs: rename err to ret in __btrfs_end_transaction()
      btrfs: rename err to ret in create_reloc_inode()
      btrfs: rename err to ret in btrfs_dirty_pages()
      btrfs: rename err to ret in prepare_pages()
      btrfs: rename err to ret in btrfs_direct_write()
      btrfs: report filemap_fdata<write|wait>_range() error
      btrfs: rename werr and err to ret in btrfs_write_marked_extents()
      btrfs: rename werr and err to ret in __btrfs_wait_marked_extents()
      btrfs: rename err and ret to ret in build_backref_tree()
      btrfs: reuse ret instead of err in relocate_tree_blocks()
      btrfs: drop variable err in quick_update_accounting()
      btrfs: rename return variables in btrfs_qgroup_rescan_worker()
      btrfs: simplify return variables in lookup_extent_data_ref()
      btrfs: simplify return variables in btrfs_drop_subtree()

Boris Burkov (1):
      btrfs: free PERTRANS at the end of cleanup_transaction()

Dan Carpenter (2):
      btrfs: qgroup: delete unnecessary check in btrfs_qgroup_check_inherit()
      btrfs: qgroup: fix initialization of auto inherit array

David Sterba (1):
      btrfs: use btrfs_is_testing() everywhere

Filipe Manana (39):
      btrfs: remove pointless BUG_ON() when creating snapshot
      btrfs: locking: inline btrfs_tree_lock() and btrfs_tree_read_lock()
      btrfs: locking: rename __btrfs_tree_lock() and __btrfs_tree_read_lock()
      btrfs: remove pointless readahead callback wrapper
      btrfs: remove pointless writepages callback wrapper
      btrfs: avoid pointless wake ups of drew lock readers
      btrfs: stop locking the source extent range during reflink
      btrfs: remove not needed mod_start and mod_len from struct extent_map
      btrfs: remove pointless return value assignment at btrfs_finish_one_ordered()
      btrfs: remove list_empty() check at warn_about_uncommitted_trans()
      btrfs: remove no longer used btrfs_clone_chunk_map()
      btrfs: move btrfs_page_mkwrite() from inode.c into file.c
      btrfs: add function comment to btrfs_lookup_csums_list()
      btrfs: remove search_commit parameter from btrfs_lookup_csums_list()
      btrfs: remove use of a temporary list at btrfs_lookup_csums_list()
      btrfs: simplify error path for btrfs_lookup_csums_list()
      btrfs: make NOCOW checks for existence of checksums in a range more efficient
      btrfs: open code csum_exist_in_range()
      btrfs: pass an inode to btrfs_add_extent_mapping()
      btrfs: tests: error out on unexpected extent map reference count
      btrfs: simplify add_extent_mapping() by removing pointless label
      btrfs: export find_next_inode() as btrfs_find_first_inode()
      btrfs: use btrfs_find_first_inode() at btrfs_prune_dentries()
      btrfs: pass the extent map tree's inode to add_extent_mapping()
      btrfs: pass the extent map tree's inode to clear_em_logging()
      btrfs: pass the extent map tree's inode to remove_extent_mapping()
      btrfs: pass the extent map tree's inode to replace_extent_mapping()
      btrfs: pass the extent map tree's inode to setup_extent_mapping()
      btrfs: pass the extent map tree's inode to try_merge_map()
      btrfs: add a global per cpu counter to track number of used extent maps
      btrfs: add a shrinker for extent maps
      btrfs: update comment for btrfs_set_inode_full_sync() about locking
      btrfs: add tracepoints for extent map shrinker events
      btrfs: rename some variables at try_release_extent_mapping()
      btrfs: use btrfs_get_fs_generation() at try_release_extent_mapping()
      btrfs: remove i_size restriction at try_release_extent_mapping()
      btrfs: be better releasing extent maps at try_release_extent_mapping()
      btrfs: make try_release_extent_mapping() return a bool
      btrfs: initialize delayed inodes xarray without GFP_ATOMIC

Goldwyn Rodrigues (3):
      btrfs: page to folio conversion: prealloc_file_extent_cluster()
      btrfs: convert relocate_one_page() to folios and rename
      btrfs: convert put_file_data() to folios

Josef Bacik (38):
      btrfs: add a helper to get the delayed ref node from the data/tree ref
      btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node
      btrfs: do not use a function to initialize btrfs_ref
      btrfs: move ref_root into btrfs_ref
      btrfs: pass btrfs_ref to init_delayed_ref_common
      btrfs: initialize btrfs_delayed_ref_head with btrfs_ref
      btrfs: move ref specific initialization into init_delayed_ref_common
      btrfs: simplify delayed ref tracepoints
      btrfs: unify the btrfs_add_delayed_*_ref helpers into one helper
      btrfs: rename ->len to ->num_bytes in btrfs_ref
      btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node
      btrfs: rename btrfs_data_ref->ino to ->objectid
      btrfs: make __btrfs_inc_extent_ref take a btrfs_delayed_ref_node
      btrfs: drop unnecessary arguments from __btrfs_free_extent
      btrfs: make the insert backref helpers take a btrfs_delayed_ref_node
      btrfs: stop referencing btrfs_delayed_data_ref directly
      btrfs: stop referencing btrfs_delayed_tree_ref directly
      btrfs: remove the btrfs_delayed_ref_node container helpers
      btrfs: replace btrfs_delayed_*_ref with btrfs_*_ref
      btrfs: set start on clone before calling copy_extent_buffer_full
      btrfs: change root->root_key.objectid to btrfs_root_id()
      btrfs: handle errors in btrfs_reloc_clone_csums properly
      btrfs: push all inline logic into cow_file_range
      btrfs: unlock all the pages with successful inline extent creation
      btrfs: move extent bit and page cleanup into cow_file_range_inline
      btrfs: lock extent when doing inline extent in compression
      btrfs: push the extent lock into btrfs_run_delalloc_range
      btrfs: push extent lock into run_delalloc_nocow
      btrfs: adjust while loop condition in run_delalloc_nocow
      btrfs: push extent lock down in run_delalloc_nocow
      btrfs: remove unlock_extent from run_delalloc_compressed
      btrfs: push extent lock into run_delalloc_cow
      btrfs: push extent lock into cow_file_range
      btrfs: push lock_extent into cow_file_range_inline
      btrfs: move can_cow_file_range_inline() outside of the extent lock
      btrfs: push lock_extent down in cow_file_range()
      btrfs: push extent lock down in submit_one_async_extent
      btrfs: add a cached state to extent_clear_unlock_delalloc

Matthew Wilcox (Oracle) (5):
      bio: Export bio_add_folio_nofail to modules
      btrfs: convert super block writes to folio in wait_dev_supers()
      btrfs: convert super block writes to folio in write_dev_supers()
      btrfs: use the folio iterator in btrfs_end_super_write()
      btrfs: count super block write errors in device instead of tracking folio error state

Naohiro Aota (1):
      btrfs: drop unused argument of calcu_metadata_size()

Qu Wenruo (9):
      btrfs: compression: add error handling for missed page cache
      btrfs: compression: convert page allocation to folio interfaces
      btrfs: make insert_inline_extent() accept one page directly
      btrfs: migrate insert_inline_extent() to folio interfaces
      btrfs: introduce btrfs_alloc_folio_array()
      btrfs: compression: migrate compression/decompression paths to folios
      btrfs: add extra comments on extent_map members
      btrfs: simplify the inline extent map creation
      btrfs: add extra sanity checks for create_io_em()

Tavian Barnes (2):
      btrfs: add helper to clear EXTENT_BUFFER_READING
      btrfs: warn if EXTENT_BUFFER_UPTODATE is set while reading

Thorsten Blum (1):
      btrfs: remove duplicate included header from fs.h

 block/bio.c                       |   1 +
 fs/btrfs/backref.c                |  48 +-
 fs/btrfs/block-rsv.c              |  11 +-
 fs/btrfs/btrfs_inode.h            |  10 +-
 fs/btrfs/compression.c            | 119 +++--
 fs/btrfs/compression.h            |  42 +-
 fs/btrfs/ctree.c                  |  51 +--
 fs/btrfs/defrag.c                 |   2 +-
 fs/btrfs/delayed-inode.c          |   2 +-
 fs/btrfs/delayed-ref.c            | 365 +++++----------
 fs/btrfs/delayed-ref.h            | 148 +++---
 fs/btrfs/disk-io.c                | 157 +++----
 fs/btrfs/export.c                 |   8 +-
 fs/btrfs/extent-io-tree.c         |  58 +--
 fs/btrfs/extent-tree.c            | 366 +++++++--------
 fs/btrfs/extent_io.c              | 223 +++++----
 fs/btrfs/extent_io.h              |  11 +-
 fs/btrfs/extent_map.c             | 316 ++++++++++---
 fs/btrfs/extent_map.h             |  67 ++-
 fs/btrfs/file-item.c              |  90 ++--
 fs/btrfs/file-item.h              |   3 +-
 fs/btrfs/file.c                   | 327 ++++++++++----
 fs/btrfs/fs.h                     |   5 +-
 fs/btrfs/inode-item.c             |  16 +-
 fs/btrfs/inode.c                  | 923 +++++++++++++++++---------------------
 fs/btrfs/ioctl.c                  |  86 ++--
 fs/btrfs/locking.c                |  26 +-
 fs/btrfs/locking.h                |  18 +-
 fs/btrfs/lzo.c                    |  89 ++--
 fs/btrfs/ordered-data.c           |   8 +-
 fs/btrfs/ordered-data.h           |   1 +
 fs/btrfs/props.c                  |   2 +-
 fs/btrfs/qgroup.c                 |  79 ++--
 fs/btrfs/ref-verify.c             |   8 +-
 fs/btrfs/reflink.c                |  56 +--
 fs/btrfs/relocation.c             | 417 ++++++++---------
 fs/btrfs/root-tree.c              |   3 +-
 fs/btrfs/send.c                   |  74 +--
 fs/btrfs/super.c                  |  33 +-
 fs/btrfs/sysfs.c                  |   8 +-
 fs/btrfs/tests/btrfs-tests.c      |   3 +-
 fs/btrfs/tests/extent-map-tests.c | 216 +++++----
 fs/btrfs/transaction.c            |  76 ++--
 fs/btrfs/tree-checker.c           |   2 +-
 fs/btrfs/tree-log.c               |  46 +-
 fs/btrfs/tree-mod-log.c           |   2 +-
 fs/btrfs/volumes.c                |  15 -
 fs/btrfs/volumes.h                |  10 +-
 fs/btrfs/xattr.c                  |  10 +-
 fs/btrfs/zlib.c                   | 112 ++---
 fs/btrfs/zstd.c                   |  80 ++--
 include/trace/events/btrfs.h      | 158 +++++--
 52 files changed, 2650 insertions(+), 2357 deletions(-)

Comments

pr-tracker-bot@kernel.org May 15, 2024, 12:53 a.m. UTC | #1
The pull request you sent on Mon, 13 May 2024 18:20:55 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a3d1f54d7aa4c3be2c6a10768d4ffa1dcb620da9

Thank you!
Linus Torvalds May 16, 2024, 12:31 a.m. UTC | #2
On Mon, 13 May 2024 at 09:28, David Sterba <dsterba@suse.com> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag

So I initially blamed a GPU driver for the following problem, but Dave
Airlie seems to think it's unlikely that problem would cause this kind
of corruption, so now it looks like it might just be btrfs itself:

  BUG: Bad page state in process kworker/u261:13  pfn:31fb9a
  page: refcount:0 mapcount:0 mapping:00000000ff0b239e index:0x37ce8 pfn:0x31fb9a
  aops:btree_aops ino:1
  flags: 0x2fffc600000020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff)
  page_type: 0xffffffff()
  raw: 02fffc600000020c dead000000000100 dead000000000122 ffff9b191efb0338
  raw: 0000000000037ce8 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: non-NULL mapping
  CPU: 18 PID: 141351 Comm: kworker/u261:13 Tainted: G        W 6.9.0-07381-g3860ca371740 #60
  Workqueue: btrfs-delayed-meta btrfs_work_helper
  Call Trace:
   bad_page+0xe0/0xf0
   free_unref_page_prepare+0x363/0x380
   ? __count_memcg_events+0x63/0xd0
   free_unref_page+0x33/0x1f0
   ? __mem_cgroup_uncharge+0x80/0xb0
   __folio_put+0x62/0x80
   release_extent_buffer+0xad/0x110
   btrfs_force_cow_block+0x68f/0x890
   btrfs_cow_block+0xe5/0x240
   btrfs_search_slot+0x30e/0x9f0
   btrfs_lookup_inode+0x31/0xb0
   __btrfs_update_delayed_inode+0x5c/0x350
   ? kfree+0x80/0x250
   __btrfs_commit_inode_delayed_items+0x7a1/0x7d0
   btrfs_async_run_delayed_root+0xf7/0x1b0
   btrfs_work_helper+0xc0/0x320
   process_scheduled_works+0x196/0x360
   worker_thread+0x2b8/0x370
   ? pr_cont_work+0x190/0x190
   kthread+0x111/0x120
   ? kthread_blkcg+0x30/0x30
   ret_from_fork+0x30/0x40
   ? kthread_blkcg+0x30/0x30
   ret_from_fork_asm+0x11/0x20

Note the line

    page dumped because: non-NULL mapping

but the actual mapping pointer isn't a valid kernel pointer. I suspect
that may be due to pointer hashing, though. I'm not convinced that's a
great idea for this case, but hey, here we are. Sometimes those "don't
leak kernel pointers" things cause problems for debugging.
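The dump itself bears this out. On 64-bit kernels `%p` prints a hash masked to 32 bits, so `mapping:00000000ff0b239e` cannot be a kernel address, while the `raw:` lines print the unhashed words of `struct page`. The sketch below decodes those eight words, assuming the x86-64 `struct page` layout of this era (flags, `lru.next`, `lru.prev`, `mapping`, `index`, `private`, then `page_type`/`_mapcount` and `_refcount` sharing one word, then `memcg_data`). The layout is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the raw struct page words (assumed x86-64 layout). */
struct page_words {
	uint64_t flags;		/* raw word 0 */
	uint64_t mapping;	/* raw word 3 */
	uint64_t index;		/* raw word 4 */
	uint32_t page_type;	/* low half of word 6 (little-endian) */
	uint32_t refcount;	/* high half of word 6 */
};

static struct page_words decode_raw(const uint64_t raw[8])
{
	struct page_words p = {
		.flags = raw[0],
		.mapping = raw[3],
		.index = raw[4],
		.page_type = (uint32_t)(raw[6] & 0xffffffffu),
		.refcount = (uint32_t)(raw[6] >> 32),
	};
	return p;
}

/* Heuristic: a 64-bit %p hash is masked to 32 bits, so the top half is 0. */
static bool looks_hashed(uint64_t value)
{
	return (value >> 32) == 0;
}

/* Heuristic: x86-64 kernel addresses (4-level paging) start with 0xffff. */
static bool looks_like_kernel_pointer(uint64_t value)
{
	return (value >> 48) == 0xffff;
}
```

Fed the two `raw:` lines above, the decoded `index` (0x37ce8), `page_type` (0xffffffff) and `refcount` (0) agree with the values printed in the header. That consistency supports the assumed layout, under which the unhashed mapping word is ffff9b191efb0338, a plausible kernel pointer.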

Anyway, it looks like the btrfs_cow_block -> btrfs_force_cow_block ->
release_extent_buffer -> __folio_put path might be releasing a page
that is still attached to a mapping. Perhaps some page counting
imbalance?
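A page counting imbalance of the kind suspected here can be shown with a deliberately simplified model. It assumes that the page cache holds one reference while a page is attached to a mapping, and that it detaches the page before dropping that reference. One extra put from elsewhere then takes the count to zero while `mapping` is still set, which is the condition reported as "non-NULL mapping" in the trace above. All names are illustrative and are not the mm or btrfs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified page: a reference count, a mapping pointer, a "bad" marker. */
struct model_page {
	int refcount;
	const void *mapping;
	bool bad;		/* set when freed in an invalid state */
};

/* Free-time check: a page must be detached before its last ref goes. */
static void model_free(struct model_page *page)
{
	if (page->mapping != NULL)
		page->bad = true;	/* "page dumped because: non-NULL mapping" */
}

static void model_get(struct model_page *page)
{
	page->refcount++;
}

static void model_put(struct model_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		model_free(page);
}

/* Page cache insert: attach, then take the cache's own reference. */
static void model_add_to_cache(struct model_page *page, const void *mapping)
{
	page->mapping = mapping;
	model_get(page);
}

/* Page cache removal: detach first, then drop the cache's reference. */
static void model_remove_from_cache(struct model_page *page)
{
	page->mapping = NULL;
	model_put(page);
}
```

With balanced get/put pairs the last reference is dropped by the cache after detaching, so the check stays quiet; a single surplus put frees the page while it is still attached.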

This all happened under fairly normal - for me - workstation loads. I
was (of course) doing an allmodconfig kernel build after a pull, and I
had a handful of terminals and the web browser open. Nothing
particularly interesting or odd.

Does the above make any btrfs people go "Ahh, I see how that would be
a problem"?

            Linus
Qu Wenruo May 16, 2024, 9:01 a.m. UTC | #3
On 2024/5/16 10:01, Linus Torvalds wrote:
> On Mon, 13 May 2024 at 09:28, David Sterba <dsterba@suse.com> wrote:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.10-tag
>
> This all happened under fairly normal - for me - workstation loads. I
> was (of course) doing an allmodconfig kernel build after a pull, and I
> had a handful of terminals and the web browser open. Nothing
> particularly interesting or odd.

Considering aarch64 is becoming more and more common, is the workstation
also an aarch64 platform (the Ampere one)?
If so, would you mind sharing the page size and the fs sectorsize?
That would at least help us tell whether it's the subpage routine or the
regular routine.
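The distinction here is between the filesystem sector size and the CPU page size: btrfs uses its subpage paths when a sector is smaller than a page (for example 4K sectors on an aarch64 kernel built with 64K pages) and its regular paths otherwise. A minimal sketch of that decision, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when one page holds several filesystem
 * sectors, i.e. the "subpage" case (e.g. 64K pages, 4K sectors).
 */
static bool uses_subpage(uint32_t page_size, uint32_t sectorsize)
{
	/* Both sizes are assumed to be non-zero powers of two. */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	assert(sectorsize && (sectorsize & (sectorsize - 1)) == 0);
	return sectorsize < page_size;
}
```

An x86-64 machine with 4K pages and a 4K sectorsize, for example, takes the regular path.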

Thanks,
Qu

Linus Torvalds May 16, 2024, 3:47 p.m. UTC | #4
On Thu, 16 May 2024 at 02:02, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> Considering aarch64 is becoming more and more common, is the workstation
> also an aarch64 platform (the Ampere one)?

No, this happened on my regular old AMD Threadripper.

                   Linus