diff mbox series

[1/2] btrfs: Add a lockdep model for the num_writers wait event

Message ID 20220708200829.3682503-2-iangelak@fb.com (mailing list archive)
State New, archived
Series btrfs: Annotate transaction wait events with

Commit Message

Ioannis Angelakopoulos July 8, 2022, 8:08 p.m. UTC
Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep
in order to catch deadlocks involving this wait event.

Use a read/write lockdep map for the annotation. A thread starting/joining
the transaction acquires the map as a reader when it increments
cur_trans->num_writers and it acquires the map as a writer before it
blocks on the wait event.


Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/ctree.h       | 14 ++++++++++++++
 fs/btrfs/disk-io.c     |  6 ++++++
 fs/btrfs/transaction.c | 39 +++++++++++++++++++++++++++++++++++----
 3 files changed, 55 insertions(+), 4 deletions(-)

Comments

Nikolay Borisov July 11, 2022, 1:04 p.m. UTC | #1
On 8.07.22 г. 23:08 ч., Ioannis Angelakopoulos wrote:
> Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep
> in order to catch deadlocks involving this wait event.
> 
> Use a read/write lockdep map for the annotation. A thread starting/joining
> the transaction acquires the map as a reader when it increments
> cur_trans->num_writers and it acquires the map as a writer before it
> blocks on the wait event.
> 
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
> ---
>   fs/btrfs/ctree.h       | 14 ++++++++++++++
>   fs/btrfs/disk-io.c     |  6 ++++++
>   fs/btrfs/transaction.c | 39 +++++++++++++++++++++++++++++++++++----
>   3 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4e2569f84aab..160c22391cb5 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1091,6 +1091,8 @@ struct btrfs_fs_info {
>   	/* Updates are not protected by any lock */
>   	struct btrfs_commit_stats commit_stats;
>   
> +	struct lockdep_map btrfs_trans_num_writers_map;
> +
>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>   	spinlock_t ref_verify_lock;
>   	struct rb_root block_tree;
> @@ -1172,6 +1174,18 @@ enum {
>   	BTRFS_ROOT_UNFINISHED_DROP,
>   };
>   
> +#define btrfs_might_wait_for_event(b, lock) \
> +	do { \
> +		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
> +		rwsem_release(&b->lock##_map, _THIS_IP_); \
> +	} while (0)
> +
> +#define btrfs_lockdep_acquire(b, lock) \
> +	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
> +
> +#define btrfs_lockdep_release(b, lock) \
> +	rwsem_release(&b->lock##_map, _THIS_IP_)
> +
>   static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
>   {
>   	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 76835394a61b..4061886024de 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
>   
>   void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>   {
> +	static struct lock_class_key btrfs_trans_num_writers_key;

Shouldn't this variable and its initialization be defined in one of
the __init functions (i.e. init_btrfs_fs()) for the btrfs kernel module?
As it stands this would be called on every mount of any instance of a
btrfs filesystem?

<snip>
Johannes Thumshirn July 13, 2022, 3:43 p.m. UTC | #2
On 08.07.22 22:10, Ioannis Angelakopoulos wrote:
> +#define btrfs_might_wait_for_event(b, lock) \
> +	do { \
> +		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
> +		rwsem_release(&b->lock##_map, _THIS_IP_); \
> +	} while (0)
> +
> +#define btrfs_lockdep_acquire(b, lock) \
> +	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
> +
> +#define btrfs_lockdep_release(b, lock) \
> +	rwsem_release(&b->lock##_map, _THIS_IP_)

Shouldn't this be only defined for CONFIG_LOCKDEP=y and be
stubbed out for CONFIG_LOCKDEP=n?
David Sterba July 14, 2022, 1:13 p.m. UTC | #3
On Mon, Jul 11, 2022 at 04:04:01PM +0300, Nikolay Borisov wrote:
> > +	rwsem_release(&b->lock##_map, _THIS_IP_)
> > +
> >   static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
> >   {
> >   	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 76835394a61b..4061886024de 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
> >   
> >   void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
> >   {
> > +	static struct lock_class_key btrfs_trans_num_writers_key;
> 
> Shouldn't this variable and its initialization be defined in one of
> the __init functions (i.e. init_btrfs_fs()) for the btrfs kernel module?
> As it stands this would be called on every mount of any instance of a
> btrfs filesystem?

Yeah, I think it should be initialized once for the whole module. However,
a static variable in an __init function is not safe, as the whole section
gets removed from memory once all the __init functions have been called
(module vs built-in). Use either an __initdata annotation or a static
variable outside of a function (e.g. like fs_uuids in super.c).
David Sterba July 14, 2022, 1:20 p.m. UTC | #4
On Wed, Jul 13, 2022 at 03:43:14PM +0000, Johannes Thumshirn wrote:
> On 08.07.22 22:10, Ioannis Angelakopoulos wrote:
> > +#define btrfs_might_wait_for_event(b, lock) \
> > +	do { \
> > +		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
> > +		rwsem_release(&b->lock##_map, _THIS_IP_); \
> > +	} while (0)
> > +
> > +#define btrfs_lockdep_acquire(b, lock) \
> > +	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
> > +
> > +#define btrfs_lockdep_release(b, lock) \
> > +	rwsem_release(&b->lock##_map, _THIS_IP_)
> 
> Shouldn't this be only defined for CONFIG_LOCKDEP=y and be
> stubbed out for CONFIG_LOCKDEP=n?

Yes, all related code should be under the lockdep ifdef; the struct
lockdep_map btrfs_trans_num_writers_map member too.
David Sterba July 14, 2022, 1:21 p.m. UTC | #5
On Fri, Jul 08, 2022 at 01:08:30PM -0700, Ioannis Angelakopoulos wrote:
> @@ -2207,8 +2221,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  			ret = READ_ONCE(prev_trans->aborted);
>  
>  			btrfs_put_transaction(prev_trans);
> -			if (ret)
> +			if (ret) {
> +				btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>  				goto cleanup_transaction;

Please move the call btrfs_lockdep_release to the cleanup block before
the cleanup_transaction label so it's not repeated everywhere.

> +			btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>  			goto cleanup_transaction;

> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>  		goto cleanup_transaction;

> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>  		goto cleanup_transaction;

> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>  		goto cleanup_transaction;
Nikolay Borisov July 14, 2022, 1:28 p.m. UTC | #6
On 13.07.22 г. 18:43 ч., Johannes Thumshirn wrote:
> On 08.07.22 22:10, Ioannis Angelakopoulos wrote:
>> +#define btrfs_might_wait_for_event(b, lock) \
>> +	do { \
>> +		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
>> +		rwsem_release(&b->lock##_map, _THIS_IP_); \
>> +	} while (0)
>> +
>> +#define btrfs_lockdep_acquire(b, lock) \
>> +	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
>> +
>> +#define btrfs_lockdep_release(b, lock) \
>> +	rwsem_release(&b->lock##_map, _THIS_IP_)
> 
> Shouldn't this be only defined for CONFIG_LOCKDEP=y and be
> stubbed out for CONFIG_LOCKDEP=n?


rwsem_acquire()/rwsem_release() are actually stubbed out when lockdep is
disabled, i.e. check lock_acquire/lock_release. So in effect this is not
a problem.
Johannes Thumshirn July 14, 2022, 1:35 p.m. UTC | #7
On 14.07.22 15:28, Nikolay Borisov wrote:
> 
> 
> On 13.07.22 г. 18:43 ч., Johannes Thumshirn wrote:
>> On 08.07.22 22:10, Ioannis Angelakopoulos wrote:
>>> +#define btrfs_might_wait_for_event(b, lock) \
>>> +	do { \
>>> +		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
>>> +		rwsem_release(&b->lock##_map, _THIS_IP_); \
>>> +	} while (0)
>>> +
>>> +#define btrfs_lockdep_acquire(b, lock) \
>>> +	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
>>> +
>>> +#define btrfs_lockdep_release(b, lock) \
>>> +	rwsem_release(&b->lock##_map, _THIS_IP_)
>>
>> Shouldn't this be only defined for CONFIG_LOCKDEP=y and be
>> stubbed out for CONFIG_LOCKDEP=n?
> 
> 
> rwsem_acquire()/rwsem_release() are actually stubbed out when lockdep is
> disabled, i.e. check lock_acquire/lock_release. So in effect this is not
> a problem.
> 

Ah yeah, it's

rwsem_acquire()
`-> lock_acquire_exclusive()
    `-> lock_acquire()

With lock_acquire() being stubbed out.
Nikolay Borisov July 14, 2022, 2:24 p.m. UTC | #8
On 14.07.22 г. 16:13 ч., David Sterba wrote:
> On Mon, Jul 11, 2022 at 04:04:01PM +0300, Nikolay Borisov wrote:
>>> +	rwsem_release(&b->lock##_map, _THIS_IP_)
>>> +
>>>    static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
>>>    {
>>>    	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 76835394a61b..4061886024de 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3029,6 +3029,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
>>>    
>>>    void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>>>    {
>>> +	static struct lock_class_key btrfs_trans_num_writers_key;
>>
>> Shouldn't this variable and its initialization be defined in one of
>> the __init functions (i.e. init_btrfs_fs()) for the btrfs kernel module?
>> As it stands this would be called on every mount of any instance of a
>> btrfs filesystem?
> 
> Yeah I think it should be initialized once for the whole module, however
> a static variable in __init function is not safe as the whole section
> gets removed from memory once all functions get called (module vs
> built-in), either an __initdata annotation or a static variable outside
> of a function (eg. like fs_uuids in super.c).


Actually the code as-is makes sense, since the initialization is about
the lockdep map and not the lock_class_key. As such it needs to be called
in a place where we have a reference to the btrfs_fs_info, and since the
map is per-fs, it makes sense for it to be in btrfs_init_fs_info().
Ioannis Angelakopoulos July 14, 2022, 4:29 p.m. UTC | #9
On 7/14/22 6:21 AM, David Sterba wrote:
> On Fri, Jul 08, 2022 at 01:08:30PM -0700, Ioannis Angelakopoulos wrote:
>> @@ -2207,8 +2221,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   			ret = READ_ONCE(prev_trans->aborted);
>>   
>>   			btrfs_put_transaction(prev_trans);
>> -			if (ret)
>> +			if (ret) {
>> +				btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>   				goto cleanup_transaction;
> 
> Please move the call btrfs_lockdep_release to the cleanup block before
> the cleanup_transaction label so it's not repeated everywhere.
>
Unfortunately, I cannot do this with the code as it is. If the commit
thread releases the btrfs_trans_num_writers lockdep map as a reader before
the wait_event and then jumps to either the unlock_reloc or scrub_continue
label later in the execution, it will try to release the map again in
cleanup_transaction, resulting in a double release.
>> +			btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>   			goto cleanup_transaction;
> 
>> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>   		goto cleanup_transaction;
> 
>> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>   		goto cleanup_transaction;
> 
>> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>   		goto cleanup_transaction;

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4e2569f84aab..160c22391cb5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1091,6 +1091,8 @@  struct btrfs_fs_info {
 	/* Updates are not protected by any lock */
 	struct btrfs_commit_stats commit_stats;
 
+	struct lockdep_map btrfs_trans_num_writers_map;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
@@ -1172,6 +1174,18 @@  enum {
 	BTRFS_ROOT_UNFINISHED_DROP,
 };
 
+#define btrfs_might_wait_for_event(b, lock) \
+	do { \
+		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
+		rwsem_release(&b->lock##_map, _THIS_IP_); \
+	} while (0)
+
+#define btrfs_lockdep_acquire(b, lock) \
+	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
+
+#define btrfs_lockdep_release(b, lock) \
+	rwsem_release(&b->lock##_map, _THIS_IP_)
+
 static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
 {
 	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 76835394a61b..4061886024de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3029,6 +3029,8 @@  static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 
 void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 {
+	static struct lock_class_key btrfs_trans_num_writers_key;
+
 	xa_init_flags(&fs_info->fs_roots, GFP_ATOMIC);
 	xa_init_flags(&fs_info->extent_buffers, GFP_ATOMIC);
 	INIT_LIST_HEAD(&fs_info->trans_list);
@@ -3057,6 +3059,10 @@  void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->zoned_data_reloc_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
 
+	lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
+					 "btrfs_trans_num_writers",
+					 &btrfs_trans_num_writers_key, 0);
+
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
 	INIT_LIST_HEAD(&fs_info->space_info);
 	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0a50d5746f6f..7ef24e1f1446 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -313,6 +313,7 @@  static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 		atomic_inc(&cur_trans->num_writers);
 		extwriter_counter_inc(cur_trans, type);
 		spin_unlock(&fs_info->trans_lock);
+		btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
 		return 0;
 	}
 	spin_unlock(&fs_info->trans_lock);
@@ -334,16 +335,20 @@  static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	if (!cur_trans)
 		return -ENOMEM;
 
+	btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
+
 	spin_lock(&fs_info->trans_lock);
 	if (fs_info->running_transaction) {
 		/*
 		 * someone started a transaction after we unlocked.  Make sure
 		 * to redo the checks above
 		 */
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		kfree(cur_trans);
 		goto loop;
 	} else if (BTRFS_FS_ERROR(fs_info)) {
 		spin_unlock(&fs_info->trans_lock);
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		kfree(cur_trans);
 		return -EROFS;
 	}
@@ -1020,6 +1025,9 @@  static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	extwriter_counter_dec(cur_trans, trans->type);
 
 	cond_wake_up(&cur_trans->writer_wait);
+
+	btrfs_lockdep_release(info, btrfs_trans_num_writers);
+
 	btrfs_put_transaction(cur_trans);
 
 	if (current->journal_info == trans)
@@ -1980,6 +1988,12 @@  static void cleanup_transaction(struct btrfs_trans_handle *trans, int err)
 	if (cur_trans == fs_info->running_transaction) {
 		cur_trans->state = TRANS_STATE_COMMIT_DOING;
 		spin_unlock(&fs_info->trans_lock);
+
+		/*
+		 * The thread has already released the lockdep map as reader already in
+		 * btrfs_commit_transaction().
+		 */
+		btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
 		wait_event(cur_trans->writer_wait,
 			   atomic_read(&cur_trans->num_writers) == 1);
 
@@ -2207,8 +2221,10 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 			ret = READ_ONCE(prev_trans->aborted);
 
 			btrfs_put_transaction(prev_trans);
-			if (ret)
+			if (ret) {
+				btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 				goto cleanup_transaction;
+			}
 		} else {
 			spin_unlock(&fs_info->trans_lock);
 		}
@@ -2222,6 +2238,7 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		 */
 		if (BTRFS_FS_ERROR(fs_info)) {
 			ret = -EROFS;
+			btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 			goto cleanup_transaction;
 		}
 	}
@@ -2235,20 +2252,26 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	extwriter_counter_dec(cur_trans, trans->type);
 
 	ret = btrfs_start_delalloc_flush(fs_info);
-	if (ret)
+	if (ret) {
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		goto cleanup_transaction;
+	}
 
 	ret = btrfs_run_delayed_items(trans);
-	if (ret)
+	if (ret) {
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		goto cleanup_transaction;
+	}
 
 	wait_event(cur_trans->writer_wait,
 		   extwriter_counter_read(cur_trans) == 0);
 
 	/* some pending stuffs might be added after the previous flush. */
 	ret = btrfs_run_delayed_items(trans);
-	if (ret)
+	if (ret) {
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		goto cleanup_transaction;
+	}
 
 	btrfs_wait_delalloc_flush(fs_info);
 
@@ -2270,6 +2293,14 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	add_pending_snapshot(trans);
 	cur_trans->state = TRANS_STATE_COMMIT_DOING;
 	spin_unlock(&fs_info->trans_lock);
+
+	/*
+	 * The thread has started/joined the transaction thus it holds the lockdep
+	 * map as a reader. It has to release it before acquiring the lockdep map
+	 * as a writer.
+	 */
+	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
+	btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
 	wait_event(cur_trans->writer_wait,
 		   atomic_read(&cur_trans->num_writers) == 1);