[GIT,PULL] Btrfs updates for 5.13

Message ID: cover.1619466460.git.dsterba@suse.com
State New, archived

Message

David Sterba April 26, 2021, 7:59 p.m. UTC
From: David Sterba <dsterba@suse.cz>

Hi,

the updates this time are mostly stabilization, preparation and minor
improvements.

Please pull, thanks.

User visible improvements:

- readahead for send, reducing the run time of a full send by 10% and
  of an incremental send by 25%

- make reflinks respect O_SYNC, O_DSYNC and S_SYNC flags

- export supported sectorsize values in sysfs (currently only page size,
  more once full subpage support lands)

- more graceful errors and warnings on 32bit systems when logical
  addresses for metadata reach the limit imposed by unsigned long in
  page::index
  - error: fail mount if there's a metadata block beyond the limit
  - error: new metadata block would be at unreachable address
  - warn when 5/8 of the limit is reached; for 4K page systems that's
    10T, for 64K pages it's 160T

- zoned mode
  - relocated zones get reset at the end instead of being discarded
  - automatic background reclaim of zones with 75% or more unusable
    space; the threshold is tunable in sysfs

Fixes:

- fsync and tree mod log fixes

- fix inefficient preemptive reclaim calculations

- fix exhaustion of the system chunk array due to concurrent allocations

- fix fallback to no compression when racing with remount

- preemptive fix for dm-crypt on zoned device that does not properly
  advertise zoned support

Core changes:

- add inode lock to synchronize mmap and other block updates (e.g.
  deduplication, fallocate, fsync)

- kmap conversions to new kmap_local API

- subpage support (continued)
  - new helpers for page state/extent buffer tracking
  - metadata changes now support read and write

- error handling throughout relocation call paths

- many other cleanups and code simplifications

----------------------------------------------------------------
The following changes since commit bf05bf16c76bb44ab5156223e1e58e26dfe30a88:

  Linux 5.12-rc8 (2021-04-18 14:45:32 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.13-tag

for you to fetch changes up to 18bb8bbf13c1839b43c9e09e76d397b753989af2:

  btrfs: zoned: automatically reclaim zones (2021-04-20 20:46:31 +0200)

----------------------------------------------------------------
Anand Jain (3):
      btrfs: unexport btrfs_extent_readonly() and make it static
      btrfs: change return type to bool in btrfs_extent_readonly
      btrfs: scrub: drop a few function declarations

Arnd Bergmann (1):
      btrfs: zoned: bail out in btrfs_alloc_chunk for bad input

BingJing Chang (1):
      btrfs: fix a potential hole punching failure

Filipe Manana (21):
      btrfs: add btree read ahead for full send operations
      btrfs: add btree read ahead for incremental send operations
      btrfs: fix race between memory mapped writes and fsync
      btrfs: fix race between marking inode needs to be logged and log syncing
      btrfs: remove stale comment and logic from btrfs_inode_in_log()
      btrfs: move the tree mod log code into its own file
      btrfs: use booleans where appropriate for the tree mod log functions
      btrfs: use a bit to track the existence of tree mod log users
      btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()
      btrfs: remove unnecessary leaf check at btrfs_tree_mod_log_free_eb()
      btrfs: add and use helper to get lowest sequence number for the tree mod log
      btrfs: update debug message when checking seq number of a delayed ref
      btrfs: update outdated comment at btrfs_orphan_cleanup()
      btrfs: update outdated comment at btrfs_replace_file_extents()
      btrfs: make reflinks respect O_SYNC O_DSYNC and S_SYNC flags
      btrfs: fix exhaustion of the system chunk array due to concurrent allocations
      btrfs: improve btree readahead for full send operations
      btrfs: fix race between transaction aborts and fsyncs leading to use-after-free
      btrfs: fix metadata extent leak after failure to create subvolume
      btrfs: fix race when picking most recent mod log operation for an old root
      btrfs: zoned: fix unpaired block group unfreeze during device replace

Goldwyn Rodrigues (2):
      btrfs: remove force argument from run_delalloc_nocow()
      btrfs: remove mirror argument from btrfs_csum_verify_data()

Ira Weiny (4):
      btrfs: convert kmap to kmap_local_page, simple cases
      btrfs: raid56: convert kmaps to kmap_local_page
      btrfs: integrity-checker: use kmap_local_page in __btrfsic_submit_bio
      btrfs: integrity-checker: convert block context kmap's to kmap_local_page

Jiapeng Chong (1):
      btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned

Johannes Thumshirn (5):
      btrfs: remove duplicated in_range() macro
      btrfs: zoned: fail mount if the device does not support zone append
      btrfs: zoned: reset zones of relocated block groups
      btrfs: rename delete_unused_bgs_mutex to reclaim_bgs_lock
      btrfs: zoned: automatically reclaim zones

Josef Bacik (44):
      btrfs: add a i_mmap_lock to our inode
      btrfs: use btrfs_inode_lock/btrfs_inode_unlock inode lock helpers
      btrfs: exclude mmaps while doing remap
      btrfs: exclude mmap from happening during all fallocate operations
      btrfs: use percpu_read_positive instead of sum_positive for need_preempt
      btrfs: convert some BUG_ON()'s to ASSERT()'s in do_relocation
      btrfs: convert BUG_ON()'s in relocate_tree_block
      btrfs: handle errors from select_reloc_root()
      btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors
      btrfs: check record_root_in_trans related failures in select_reloc_root
      btrfs: do proper error handling in record_reloc_root_in_trans
      btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange
      btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename
      btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume
      btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees
      btrfs: handle btrfs_record_root_in_trans failure in create_subvol
      btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block
      btrfs: handle btrfs_record_root_in_trans failure in start_transaction
      btrfs: handle record_root_in_trans failure in qgroup_account_snapshot
      btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans
      btrfs: handle record_root_in_trans failure in create_pending_snapshot
      btrfs: return an error from btrfs_record_root_in_trans
      btrfs: have proper error handling in btrfs_init_reloc_root
      btrfs: do proper error handling in create_reloc_root
      btrfs: validate root::reloc_root after recording root in trans
      btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots
      btrfs: change insert_dirty_subvol to return errors
      btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol
      btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge
      btrfs: do proper error handling in btrfs_update_reloc_root
      btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
      btrfs: handle btrfs_cow_block errors in replace_path
      btrfs: handle btrfs_search_slot failure in replace_path
      btrfs: handle errors in reference count manipulation in replace_path
      btrfs: handle extent reference errors in do_relocation
      btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly
      btrfs: remove the extent item sanity checks in relocate_block_group
      btrfs: do proper error handling in create_reloc_inode
      btrfs: handle __add_reloc_root failures in btrfs_recover_relocation
      btrfs: do not panic in __add_reloc_root
      btrfs: cleanup error handling in prepare_to_merge
      btrfs: handle extent corruption with select_one_root properly
      btrfs: do proper error handling in merge_reloc_roots
      btrfs: check return value of btrfs_commit_transaction in relocation

Matthew Wilcox (Oracle) (1):
      btrfs: add and use readahead_batch_length

Naohiro Aota (1):
      btrfs: zoned: move log tree node allocation out of log_root_tree->log_mutex

Nikolay Borisov (8):
      btrfs: make btrfs_replace_file_extents take btrfs_inode
      btrfs: make find_desired_extent take btrfs_inode
      btrfs: replace offset_in_entry with in_range
      btrfs: replace open coded while loop with proper construct
      btrfs: simplify commit logic in try_flush_qgroup
      btrfs: remove btrfs_inode parameter from btrfs_delayed_inode_reserve_metadata
      btrfs: simplify code flow in btrfs_delayed_inode_reserve_metadata
      btrfs: don't opencode extent_changeset_free

Qu Wenruo (19):
      btrfs: fix comment for btrfs ordered extent flag bits
      btrfs: add sysfs interface for supported sectorsize
      btrfs: use min() to replace open-code in btrfs_invalidatepage()
      btrfs: remove unnecessary variable shadowing in btrfs_invalidatepage()
      btrfs: subpage: introduce helpers for dirty status
      btrfs: subpage: introduce helpers for writeback status
      btrfs: subpage: do more sanity checks on metadata page dirtying
      btrfs: subpage: support metadata checksum calculation at write time
      btrfs: make alloc_extent_buffer() check subpage dirty bitmap
      btrfs: support page uptodate assertions in subpage mode
      btrfs: make set/clear_extent_buffer_dirty() subpage compatible
      btrfs: make set_btree_ioerr accept extent buffer and be subpage compatible
      btrfs: subpage: add overview comments
      btrfs: introduce end_bio_subpage_eb_writepage() function
      btrfs: introduce write_one_subpage_eb() function
      btrfs: make lock_extent_buffer_for_io() to be subpage compatible
      btrfs: introduce submit_eb_subpage() to submit a subpage metadata page
      btrfs: handle remount to no compress during compression
      btrfs: more graceful errors/warnings on 32bit systems when reaching limits

Wan Jiabing (1):
      btrfs: move forward declarations to the beginning of extent_io.h

 fs/btrfs/Makefile            |   2 +-
 fs/btrfs/backref.c           |  33 +-
 fs/btrfs/block-group.c       | 207 ++++++++-
 fs/btrfs/block-group.h       |   3 +
 fs/btrfs/btrfs_inode.h       |  33 +-
 fs/btrfs/check-integrity.c   |  14 +-
 fs/btrfs/compression.c       |  15 +-
 fs/btrfs/ctree.c             | 984 +++----------------------------------------
 fs/btrfs/ctree.h             |  80 ++--
 fs/btrfs/delayed-inode.c     |  35 +-
 fs/btrfs/delayed-ref.c       |  31 +-
 fs/btrfs/disk-io.c           | 162 +++++--
 fs/btrfs/extent-tree.c       |  21 +-
 fs/btrfs/extent_io.c         | 439 ++++++++++++++++---
 fs/btrfs/extent_io.h         |   4 +-
 fs/btrfs/file-item.c         |   1 +
 fs/btrfs/file.c              | 118 +++---
 fs/btrfs/free-space-cache.c  |   9 +-
 fs/btrfs/inode.c             | 125 +++---
 fs/btrfs/ioctl.c             |  51 ++-
 fs/btrfs/lzo.c               |   9 +-
 fs/btrfs/ordered-data.c      |  19 +-
 fs/btrfs/ordered-data.h      |   4 +-
 fs/btrfs/qgroup.c            |  47 +--
 fs/btrfs/raid56.c            |  70 +--
 fs/btrfs/reflink.c           |  65 ++-
 fs/btrfs/relocation.c        | 448 +++++++++++++++-----
 fs/btrfs/scrub.c             |  13 +-
 fs/btrfs/send.c              |  43 +-
 fs/btrfs/space-info.c        |   4 +-
 fs/btrfs/subpage.c           | 140 ++++++
 fs/btrfs/subpage.h           |   7 +
 fs/btrfs/super.c             |  26 ++
 fs/btrfs/sysfs.c             |  50 +++
 fs/btrfs/transaction.c       |  59 ++-
 fs/btrfs/transaction.h       |   9 +-
 fs/btrfs/tree-checker.c      |   5 +
 fs/btrfs/tree-log.c          |  21 +-
 fs/btrfs/tree-mod-log.c      | 929 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/tree-mod-log.h      |  53 +++
 fs/btrfs/volumes.c           | 123 ++++--
 fs/btrfs/volumes.h           |   1 +
 fs/btrfs/zoned.c             |   7 +
 fs/btrfs/zoned.h             |   6 +
 include/linux/pagemap.h      |   9 +
 include/trace/events/btrfs.h |  12 +
 46 files changed, 2964 insertions(+), 1582 deletions(-)
 create mode 100644 fs/btrfs/tree-mod-log.c
 create mode 100644 fs/btrfs/tree-mod-log.h

Comments

Linus Torvalds April 26, 2021, 8:55 p.m. UTC | #1
I've pulled this, but:

On Mon, Apr 26, 2021 at 1:01 PM David Sterba <dsterba@suse.com> wrote:
>
> Matthew Wilcox (Oracle) (1):
>       btrfs: add and use readahead_batch_length

This one is buggy, or at least questionable.

Yes, yes, the function looks trivial. That doesn't make it right:

  static inline loff_t readahead_batch_length(struct readahead_control *rac)
  {
          return rac->_batch_count * PAGE_SIZE;
  }

the above does not get the types right, and silently does different
typecasting than the code clearly intends from the return type of the
function.

It may not matter much in practice, but it's still wrong.

               Linus
pr-tracker-bot@kernel.org April 26, 2021, 8:57 p.m. UTC | #2
The pull request you sent on Mon, 26 Apr 2021 21:59:06 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.13-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/55ba0fe059a577fa08f23223991b24564962620f

Thank you!
Matthew Wilcox April 26, 2021, 9:07 p.m. UTC | #3
On Mon, Apr 26, 2021 at 01:55:03PM -0700, Linus Torvalds wrote:
> I've pulled this, but:
> 
> On Mon, Apr 26, 2021 at 1:01 PM David Sterba <dsterba@suse.com> wrote:
> >
> > Matthew Wilcox (Oracle) (1):
> >       btrfs: add and use readahead_batch_length
> 
> This one is buggy, or at least questionable.
> 
> Yes, yes, the function looks trivial. That doesn't make it right:
> 
>   static inline loff_t readahead_batch_length(struct readahead_control *rac)
>   {
>           return rac->_batch_count * PAGE_SIZE;
>   }
> 
> the above does not get the types right, and silently does different
> typecasting than the code clearly intends from the return type of the
> function.
> 
> It may not matter much in practice, but it's still wrong.

Thanks.  You're right that it doesn't matter in practice (because a
batch length is always much, much less than 4GB), but I'll fix it to
return a size_t (which is just the obvious s/loff_t/size_t/, because
PAGE_SIZE is an unsigned long).