Message ID | 20220708200829.3682503-2-iangelak@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: Annotate transaction wait events with | expand |
On 8.07.22 г. 23:08 ч., Ioannis Angelakopoulos wrote: > Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep > in order to catch deadlocks involving this wait event. > > Use a read/write lockdep map for the annotation. A thread starting/joining > the transaction acquires the map as a reader when it increments > cur_trans->num_writers and it acquires the map as a writer before it > blocks on the wait event. > > > Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com> > --- > fs/btrfs/ctree.h | 14 ++++++++++++++ > fs/btrfs/disk-io.c | 6 ++++++ > fs/btrfs/transaction.c | 39 +++++++++++++++++++++++++++++++++++---- > 3 files changed, 55 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 4e2569f84aab..160c22391cb5 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1091,6 +1091,8 @@ struct btrfs_fs_info { > /* Updates are not protected by any lock */ > struct btrfs_commit_stats commit_stats; > > + struct lockdep_map btrfs_trans_num_writers_map; > + > #ifdef CONFIG_BTRFS_FS_REF_VERIFY > spinlock_t ref_verify_lock; > struct rb_root block_tree; > @@ -1172,6 +1174,18 @@ enum { > BTRFS_ROOT_UNFINISHED_DROP, > }; > > +#define btrfs_might_wait_for_event(b, lock) \ > + do { \ > + rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \ > + rwsem_release(&b->lock##_map, _THIS_IP_); \ > + } while (0) > + > +#define btrfs_lockdep_acquire(b, lock) \ > + rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_) > + > +#define btrfs_lockdep_release(b, lock) \ > + rwsem_release(&b->lock##_map, _THIS_IP_) > + > static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info) > { > clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags); > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 76835394a61b..4061886024de 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info) > > void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) > { > + static struct lock_class_key btrfs_trans_num_writers_key; Shouldn't this variable and its initialization be defined in one of the__init functions (i.e init_btrfs_fs() )for the btrfs' kernel module? As it stands this would be called on every mount of any instance of a btrfs filesystem? <snip>
On 08.07.22 22:10, Ioannis Angelakopoulos wrote: > +#define btrfs_might_wait_for_event(b, lock) \ > + do { \ > + rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \ > + rwsem_release(&b->lock##_map, _THIS_IP_); \ > + } while (0) > + > +#define btrfs_lockdep_acquire(b, lock) \ > + rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_) > + > +#define btrfs_lockdep_release(b, lock) \ > + rwsem_release(&b->lock##_map, _THIS_IP_) Shouldn't this be only defined for CONFIG_LOCKDEP=y and be stubbed out for CONFIG_LOCKDEP=n?
On Mon, Jul 11, 2022 at 04:04:01PM +0300, Nikolay Borisov wrote: > > + rwsem_release(&b->lock##_map, _THIS_IP_) > > + > > static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info) > > { > > clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags); > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index 76835394a61b..4061886024de 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info) > > > > void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) > > { > > + static struct lock_class_key btrfs_trans_num_writers_key; > > Shouldn't this variable and its initialization be defined in one of > the__init functions (i.e init_btrfs_fs() )for the btrfs' kernel module? > As it stands this would be called on every mount of any instance of a > btrfs filesystem? Yeah I think it should be initialized once for the whole module, however a static variable in __init function is not safe as the whole section gets removed from memory once all functions get called (module vs built-in), either an __initdata annotation or a static variable outside of a function (eg. like fs_uuids in super.c).
On Wed, Jul 13, 2022 at 03:43:14PM +0000, Johannes Thumshirn wrote: > On 08.07.22 22:10, Ioannis Angelakopoulos wrote: > > +#define btrfs_might_wait_for_event(b, lock) \ > > + do { \ > > + rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \ > > + rwsem_release(&b->lock##_map, _THIS_IP_); \ > > + } while (0) > > + > > +#define btrfs_lockdep_acquire(b, lock) \ > > + rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_) > > + > > +#define btrfs_lockdep_release(b, lock) \ > > + rwsem_release(&b->lock##_map, _THIS_IP_) > > Shouldn't this be only defined for CONFIG_LOCKDEP=y and be > stubbed out for CONFIG_LOCKDEP=n? Yes, all related code should be under the lockdep ifdef, struct lockdep_map btrfs_trans_num_writers_map too.
On Fri, Jul 08, 2022 at 01:08:30PM -0700, Ioannis Angelakopoulos wrote: > @@ -2207,8 +2221,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) > ret = READ_ONCE(prev_trans->aborted); > > btrfs_put_transaction(prev_trans); > - if (ret) > + if (ret) { > + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); > goto cleanup_transaction; Please move the call btrfs_lockdep_release to the cleanup block before the cleanup_transaction label so it's not repeated everywhere. > + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); > goto cleanup_transaction; > + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); > goto cleanup_transaction; > + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); > goto cleanup_transaction; > + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); > goto cleanup_transaction;
On 13.07.22 г. 18:43 ч., Johannes Thumshirn wrote: > On 08.07.22 22:10, Ioannis Angelakopoulos wrote: >> +#define btrfs_might_wait_for_event(b, lock) \ >> + do { \ >> + rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \ >> + rwsem_release(&b->lock##_map, _THIS_IP_); \ >> + } while (0) >> + >> +#define btrfs_lockdep_acquire(b, lock) \ >> + rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_) >> + >> +#define btrfs_lockdep_release(b, lock) \ >> + rwsem_release(&b->lock##_map, _THIS_IP_) > > Shouldn't this be only defined for CONFIG_LOCKDEP=y and be > stubbed out for CONFIG_LOCKDEP=n? rwsem_acquire/rwsem_release is actually stubbed when lockdep is disabled i.e check lock_acquire/lock_release. So in effect this is not a problem.
On 14.07.22 15:28, Nikolay Borisov wrote: > > > On 13.07.22 г. 18:43 ч., Johannes Thumshirn wrote: >> On 08.07.22 22:10, Ioannis Angelakopoulos wrote: >>> +#define btrfs_might_wait_for_event(b, lock) \ >>> + do { \ >>> + rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \ >>> + rwsem_release(&b->lock##_map, _THIS_IP_); \ >>> + } while (0) >>> + >>> +#define btrfs_lockdep_acquire(b, lock) \ >>> + rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_) >>> + >>> +#define btrfs_lockdep_release(b, lock) \ >>> + rwsem_release(&b->lock##_map, _THIS_IP_) >> >> Shouldn't this be only defined for CONFIG_LOCKDEP=y and be >> stubbed out for CONFIG_LOCKDEP=n? > > > rwsem_acquire/rwsem_release is actually stubbed when lockdep is disabled > i.e check lock_acquire/lock_release. So in effect this is not a problem. > Ah yeah its rwsem_acquire() `-> lock_acquire_exclusive() `-> lock_acquire() With lock_acquire() being stubbed out.
On 14.07.22 г. 16:13 ч., David Sterba wrote: > On Mon, Jul 11, 2022 at 04:04:01PM +0300, Nikolay Borisov wrote: >>> + rwsem_release(&b->lock##_map, _THIS_IP_) >>> + >>> static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info) >>> { >>> clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags); >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index 76835394a61b..4061886024de 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info) >>> >>> void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) >>> { >>> + static struct lock_class_key btrfs_trans_num_writers_key; >> >> Shouldn't this variable and its initialization be defined in one of >> the__init functions (i.e init_btrfs_fs() )for the btrfs' kernel module? >> As it stands this would be called on every mount of any instance of a >> btrfs filesystem? > > Yeah I think it should be initialized once for the whole module, however > a static variable in __init function is not safe as the whole section > gets removed from memory once all functions get called (module vs > built-in), either an __initdata annotation or a static variable outside > of a function (eg. like fs_uuids in super.c). Actually the code as is makes sense, since the initialization is about the lockdep map and not the lock_class_key, as such it needs to be called in a place where we have a reference to btrfs_fs_info and since this is per-fs, it makes sense to be in btrfs_init_fs_info.
On 7/14/22 6:21 AM, David Sterba wrote: > On Fri, Jul 08, 2022 at 01:08:30PM -0700, Ioannis Angelakopoulos wrote: >> @@ -2207,8 +2221,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) >> ret = READ_ONCE(prev_trans->aborted); >> >> btrfs_put_transaction(prev_trans); >> - if (ret) >> + if (ret) { >> + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); >> goto cleanup_transaction; > > Please move the call btrfs_lockdep_release to the cleanup block before > the cleanup_transaction label so it's not repeated everywhere. > Unfortunately, I cannot do this with the code as it is. If the commit thread releases the btrfs_num_writers lock as reader before the wait_event and then jumps to either unlock_reloc or scrub_continue labels later in the execution, it will try to release the lock again in cleanup_transaction resulting in a double release. >> + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); >> goto cleanup_transaction; > >> + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); >> goto cleanup_transaction; > >> + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); >> goto cleanup_transaction; > >> + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); >> goto cleanup_transaction;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4e2569f84aab..160c22391cb5 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1091,6 +1091,8 @@ struct btrfs_fs_info { /* Updates are not protected by any lock */ struct btrfs_commit_stats commit_stats; + struct lockdep_map btrfs_trans_num_writers_map; + #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; @@ -1172,6 +1174,18 @@ enum { BTRFS_ROOT_UNFINISHED_DROP, }; +#define btrfs_might_wait_for_event(b, lock) \ + do { \ + rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \ + rwsem_release(&b->lock##_map, _THIS_IP_); \ + } while (0) + +#define btrfs_lockdep_acquire(b, lock) \ + rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_) + +#define btrfs_lockdep_release(b, lock) \ + rwsem_release(&b->lock##_map, _THIS_IP_) + static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info) { clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 76835394a61b..4061886024de 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info) void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) { + static struct lock_class_key btrfs_trans_num_writers_key; + xa_init_flags(&fs_info->fs_roots, GFP_ATOMIC); xa_init_flags(&fs_info->extent_buffers, GFP_ATOMIC); INIT_LIST_HEAD(&fs_info->trans_list); @@ -3057,6 +3059,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) mutex_init(&fs_info->zoned_data_reloc_io_lock); seqlock_init(&fs_info->profiles_lock); + lockdep_init_map(&fs_info->btrfs_trans_num_writers_map, + "btrfs_trans_num_writers", + &btrfs_trans_num_writers_key, 0); + INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); INIT_LIST_HEAD(&fs_info->space_info); INIT_LIST_HEAD(&fs_info->tree_mod_seq_list); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 0a50d5746f6f..7ef24e1f1446 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -313,6 +313,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, atomic_inc(&cur_trans->num_writers); extwriter_counter_inc(cur_trans, type); spin_unlock(&fs_info->trans_lock); + btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers); return 0; } spin_unlock(&fs_info->trans_lock); @@ -334,16 +335,20 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, if (!cur_trans) return -ENOMEM; + btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers); + spin_lock(&fs_info->trans_lock); if (fs_info->running_transaction) { /* * someone started a transaction after we unlocked. Make sure * to redo the checks above */ + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); kfree(cur_trans); goto loop; } else if (BTRFS_FS_ERROR(fs_info)) { spin_unlock(&fs_info->trans_lock); + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); kfree(cur_trans); return -EROFS; } @@ -1020,6 +1025,9 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans, extwriter_counter_dec(cur_trans, trans->type); cond_wake_up(&cur_trans->writer_wait); + + btrfs_lockdep_release(info, btrfs_trans_num_writers); + btrfs_put_transaction(cur_trans); if (current->journal_info == trans) @@ -1980,6 +1988,12 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, int err) if (cur_trans == fs_info->running_transaction) { cur_trans->state = TRANS_STATE_COMMIT_DOING; spin_unlock(&fs_info->trans_lock); + + /* + * The thread has already released the lockdep map as reader already in + * btrfs_commit_transaction(). + */ + btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers); wait_event(cur_trans->writer_wait, atomic_read(&cur_trans->num_writers) == 1); @@ -2207,8 +2221,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) ret = READ_ONCE(prev_trans->aborted); btrfs_put_transaction(prev_trans); - if (ret) + if (ret) { + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; + } } else { spin_unlock(&fs_info->trans_lock); } @@ -2222,6 +2238,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) */ if (BTRFS_FS_ERROR(fs_info)) { ret = -EROFS; + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; } } @@ -2235,20 +2252,26 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) extwriter_counter_dec(cur_trans, trans->type); ret = btrfs_start_delalloc_flush(fs_info); - if (ret) + if (ret) { + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; + } ret = btrfs_run_delayed_items(trans); - if (ret) + if (ret) { + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; + } wait_event(cur_trans->writer_wait, extwriter_counter_read(cur_trans) == 0); /* some pending stuffs might be added after the previous flush. */ ret = btrfs_run_delayed_items(trans); - if (ret) + if (ret) { + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; + } btrfs_wait_delalloc_flush(fs_info); @@ -2270,6 +2293,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) add_pending_snapshot(trans); cur_trans->state = TRANS_STATE_COMMIT_DOING; spin_unlock(&fs_info->trans_lock); + + /* + * The thread has started/joined the transaction thus it holds the lockdep + * map as a reader. It has to release it before acquiring the lockdep map + * as a writer. + */ + btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); + btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers); wait_event(cur_trans->writer_wait, atomic_read(&cur_trans->num_writers) == 1);
Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep in order to catch deadlocks involving this wait event. Use a read/write lockdep map for the annotation. A thread starting/joining the transaction acquires the map as a reader when it increments cur_trans->num_writers and it acquires the map as a writer before it blocks on the wait event. Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com> --- fs/btrfs/ctree.h | 14 ++++++++++++++ fs/btrfs/disk-io.c | 6 ++++++ fs/btrfs/transaction.c | 39 +++++++++++++++++++++++++++++++++++---- 3 files changed, 55 insertions(+), 4 deletions(-)