
[V2,7/7] fs/jbd2: Free journal head outside of locked region

Message ID 20190801010944.549462805@linutronix.de (mailing list archive)
State New, archived
Series fs: Substitute bit-spinlocks for PREEMPT_RT and debugging

Commit Message

Thomas Gleixner Aug. 1, 2019, 1:01 a.m. UTC
On PREEMPT_RT, bit-spinlocks have the same semantics as on PREEMPT_RT=n,
i.e. they disable preemption. That means functions which are not safe to
call in preempt-disabled context trigger a might_sleep() assert on RT.

The journal head bit-spinlock is mostly held for short code sequences with
trivial, RT-safe functionality, except for one place:

jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
with the journal head bit spinlock held. __journal_remove_journal_head()
invokes kmem_cache_free() which must not be called with preemption disabled
on RT.

Jan suggested reworking the removal function so that the actual free
happens outside the bit-spinlocked region.

Split it into two parts:

  - Do the sanity checks and the buffer head detach under the lock

  - Do the actual free after dropping the lock

There is error-case handling in the free part which needs to read the
b_size field of the now-detached buffer head. Due to paranoia (caused
by ignorance) the size is retrieved in the detach function and handed to
the free function. Might be over-engineered, but better safe than sorry.

This makes the journal head bit-spinlock usage RT-compliant and also avoids
nested locking which is not covered by lockdep.

Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.com>
---
V2: New patch
---
 fs/jbd2/journal.c |   28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

Comments

Jan Kara Aug. 1, 2019, 9:22 a.m. UTC | #1
On Thu 01-08-19 03:01:33, Thomas Gleixner wrote:
> On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
> i.e. they disable preemption. That means functions which are not safe to be
> called in preempt disabled context on RT trigger a might_sleep() assert.
> 
> The journal head bit spinlock is mostly held for short code sequences with
> trivial RT safe functionality, except for one place:
> 
> jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
> with the journal head bit spinlock held. __journal_remove_journal_head()
> invokes kmem_cache_free() which must not be called with preemption disabled
> on RT.
> 
> Jan suggested to rework the removal function so the actual free happens
> outside the bit-spinlocked region.
> 
> Split it into two parts:
> 
>   - Do the sanity checks and the buffer head detach under the lock
> 
>   - Do the actual free after dropping the lock
> 
> There is error case handling in the free part which needs to dereference
> the b_size field of the now detached buffer head. Due to paranoia (caused
> by ignorance) the size is retrieved in the detach function and handed into
> the free function. Might be over-engineered, but better safe than sorry.
> 
> This makes the journal head bit-spinlock usage RT compliant and also avoids
> nested locking which is not covered by lockdep.
> 
> Suggested-by: Jan Kara <jack@suse.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-ext4@vger.kernel.org
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.com>

Looks mostly good. Just a small suggestion for simplification below:

> @@ -2559,11 +2568,14 @@ void jbd2_journal_put_journal_head(struc
>  	J_ASSERT_JH(jh, jh->b_jcount > 0);
>  	--jh->b_jcount;
>  	if (!jh->b_jcount) {
> -		__journal_remove_journal_head(bh);
> +		size_t b_size = __journal_remove_journal_head(bh);
> +
>  		jbd_unlock_bh_journal_head(bh);
> +		journal_release_journal_head(jh, b_size);
>  		__brelse(bh);

The bh is pinned until you call __brelse(bh) above, and bh->b_size doesn't
change during the lifetime of the buffer. So there's no need to fetch
bh->b_size in __journal_remove_journal_head() and pass it back.
You can just:

		journal_release_journal_head(jh, bh->b_size);

> -	} else
> +	} else {
>  		jbd_unlock_bh_journal_head(bh);
> +	}
>  }
>  

								Honza
Thomas Gleixner Aug. 1, 2019, 6:08 p.m. UTC | #2
On Thu, 1 Aug 2019, Jan Kara wrote:
> On Thu 01-08-19 03:01:33, Thomas Gleixner wrote:
> > On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
> > i.e. they disable preemption. That means functions which are not safe to be
> > called in preempt disabled context on RT trigger a might_sleep() assert.
> Looks mostly good. Just a small suggestion for simplification below:
> 
> > @@ -2559,11 +2568,14 @@ void jbd2_journal_put_journal_head(struc
> >  	J_ASSERT_JH(jh, jh->b_jcount > 0);
> >  	--jh->b_jcount;
> >  	if (!jh->b_jcount) {
> > -		__journal_remove_journal_head(bh);
> > +		size_t b_size = __journal_remove_journal_head(bh);
> > +
> >  		jbd_unlock_bh_journal_head(bh);
> > +		journal_release_journal_head(jh, b_size);
> >  		__brelse(bh);
> 
> The bh is pinned until you call __brelse(bh) above and bh->b_size doesn't
> change during the lifetime of the buffer. So there's no need of
> fetching bh->b_size in __journal_remove_journal_head() and passing it back.
> You can just:
> 
> 		journal_release_journal_head(jh, bh->b_size);

Ah. Nice. I assumed that this would be possible, but then my
ignorance-induced paranoia won.

Thanks,

	tglx
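With Jan's simplification folded in, the put path in a follow-up revision would presumably collapse to the following. This is a hedged sketch of the expected next version, not a hunk from this archive; __journal_remove_journal_head() would go back to returning void:

```c
	/* Sketch only: bh remains pinned until __brelse(), so bh->b_size
	 * can be read directly after dropping the bit-spinlock. */
	J_ASSERT_JH(jh, jh->b_jcount > 0);
	--jh->b_jcount;
	if (!jh->b_jcount) {
		__journal_remove_journal_head(bh);
		jbd_unlock_bh_journal_head(bh);
		journal_release_journal_head(jh, bh->b_size);
		__brelse(bh);
	} else {
		jbd_unlock_bh_journal_head(bh);
	}
```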

Patch

--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2521,9 +2521,10 @@  struct journal_head *jbd2_journal_grab_j
 	return jh;
 }
 
-static void __journal_remove_journal_head(struct buffer_head *bh)
+static size_t __journal_remove_journal_head(struct buffer_head *bh)
 {
 	struct journal_head *jh = bh2jh(bh);
+	size_t b_size = READ_ONCE(bh->b_size);
 
 	J_ASSERT_JH(jh, jh->b_jcount >= 0);
 	J_ASSERT_JH(jh, jh->b_transaction == NULL);
@@ -2533,17 +2534,25 @@  static void __journal_remove_journal_hea
 	J_ASSERT_BH(bh, buffer_jbd(bh));
 	J_ASSERT_BH(bh, jh2bh(jh) == bh);
 	BUFFER_TRACE(bh, "remove journal_head");
+
+	/* Unlink before dropping the lock */
+	bh->b_private = NULL;
+	jh->b_bh = NULL;	/* debug, really */
+	clear_buffer_jbd(bh);
+
+	return b_size;
+}
+
+static void journal_release_journal_head(struct journal_head *jh, size_t b_size)
+{
 	if (jh->b_frozen_data) {
 		printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__);
-		jbd2_free(jh->b_frozen_data, bh->b_size);
+		jbd2_free(jh->b_frozen_data, b_size);
 	}
 	if (jh->b_committed_data) {
 		printk(KERN_WARNING "%s: freeing b_committed_data\n", __func__);
-		jbd2_free(jh->b_committed_data, bh->b_size);
+		jbd2_free(jh->b_committed_data, b_size);
 	}
-	bh->b_private = NULL;
-	jh->b_bh = NULL;	/* debug, really */
-	clear_buffer_jbd(bh);
 	journal_free_journal_head(jh);
 }
 
@@ -2559,11 +2568,14 @@  void jbd2_journal_put_journal_head(struc
 	J_ASSERT_JH(jh, jh->b_jcount > 0);
 	--jh->b_jcount;
 	if (!jh->b_jcount) {
-		__journal_remove_journal_head(bh);
+		size_t b_size = __journal_remove_journal_head(bh);
+
 		jbd_unlock_bh_journal_head(bh);
+		journal_release_journal_head(jh, b_size);
 		__brelse(bh);
-	} else
+	} else {
 		jbd_unlock_bh_journal_head(bh);
+	}
 }
 
 /*