diff mbox

[v2,2/8] fix the broken lockdep logic in __sb_start_write()

Message ID 20150811170401.GA26904@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Oleg Nesterov Aug. 11, 2015, 5:04 p.m. UTC
1. wait_event(frozen < level) without rwsem_acquire_read() is just
   wrong from lockdep perspective. If we are going to deadlock
   because the caller is buggy, lockdep detect this problem.

2. __sb_start_write() can race with thaw_super() + freeze_super(),
   and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

3. The "tell lockdep we are doing trylock" hack doesn't look nice.

   I think this is correct, but this logic should be more explicit.
   Yes, the recursive read_lock() is fine if we hold the lock on a
   higher level. But we do not need to fool lockdep. If we can not
   deadlock in this case then try-lock must not fail and we can use
   use wait == F throughout this code.

Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/super.c |   73 ++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 40 insertions(+), 33 deletions(-)

Comments

Jan Kara Aug. 13, 2015, 10:02 a.m. UTC | #1
On Tue 11-08-15 19:04:01, Oleg Nesterov wrote:
> 1. wait_event(frozen < level) without rwsem_acquire_read() is just
>    wrong from lockdep perspective. If we are going to deadlock
>    because the caller is buggy, lockdep detect this problem.
> 
> 2. __sb_start_write() can race with thaw_super() + freeze_super(),
>    and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.
> 
> 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
> 
>    I think this is correct, but this logic should be more explicit.
>    Yes, the recursive read_lock() is fine if we hold the lock on a
>    higher level. But we do not need to fool lockdep. If we can not
>    deadlock in this case then try-lock must not fail and we can use
>    use wait == F throughout this code.
> 
> Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> can be probably removed. But this needs a separate change and it will
> be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> back to __sb_start_write().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Just a nit below...

> +	if (wait)
> +		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);

If we provided also __sb_writers_acquire() helper (in addition to _nowait)
variant, we could use these helpers in __sb_start_write() /
__sb_end_write() as well which would look better to me when we already have
them.

Once this is updated, feel free to add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza
Oleg Nesterov Aug. 13, 2015, 1:22 p.m. UTC | #2
On 08/13, Jan Kara wrote:
>
> On Tue 11-08-15 19:04:01, Oleg Nesterov wrote:
> > 1. wait_event(frozen < level) without rwsem_acquire_read() is just
> >    wrong from lockdep perspective. If we are going to deadlock
> >    because the caller is buggy, lockdep detect this problem.
> >
> > 2. __sb_start_write() can race with thaw_super() + freeze_super(),
> >    and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.
> >
> > 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
> >
> >    I think this is correct, but this logic should be more explicit.
> >    Yes, the recursive read_lock() is fine if we hold the lock on a
> >    higher level. But we do not need to fool lockdep. If we can not
> >    deadlock in this case then try-lock must not fail and we can use
> >    use wait == F throughout this code.
> >
> > Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> > can be probably removed. But this needs a separate change and it will
> > be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> > back to __sb_start_write().
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Just a nit below...
>
> > +	if (wait)
> > +		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
>
> If we provided also __sb_writers_acquire() helper (in addition to _nowait)
> variant, we could use these helpers in __sb_start_write() /
> __sb_end_write() as well which would look better to me when we already have
> them.

Why? This code goes away after 8/8.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Aug. 13, 2015, 1:29 p.m. UTC | #3
On Thu 13-08-15 15:22:23, Oleg Nesterov wrote:
> On 08/13, Jan Kara wrote:
> >
> > On Tue 11-08-15 19:04:01, Oleg Nesterov wrote:
> > > 1. wait_event(frozen < level) without rwsem_acquire_read() is just
> > >    wrong from lockdep perspective. If we are going to deadlock
> > >    because the caller is buggy, lockdep detect this problem.
> > >
> > > 2. __sb_start_write() can race with thaw_super() + freeze_super(),
> > >    and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.
> > >
> > > 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
> > >
> > >    I think this is correct, but this logic should be more explicit.
> > >    Yes, the recursive read_lock() is fine if we hold the lock on a
> > >    higher level. But we do not need to fool lockdep. If we can not
> > >    deadlock in this case then try-lock must not fail and we can use
> > >    use wait == F throughout this code.
> > >
> > > Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> > > can be probably removed. But this needs a separate change and it will
> > > be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> > > back to __sb_start_write().
> > >
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> >
> > Just a nit below...
> >
> > > +	if (wait)
> > > +		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
> >
> > If we provided also __sb_writers_acquire() helper (in addition to _nowait)
> > variant, we could use these helpers in __sb_start_write() /
> > __sb_end_write() as well which would look better to me when we already have
> > them.
> 
> Why? This code goes away after 8/8.

Right, OK. Objection retracted :).

								Honza
diff mbox

Patch

diff --git a/fs/super.c b/fs/super.c
index 928c20f..d0fdd49 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1158,38 +1158,11 @@  void __sb_end_write(struct super_block *sb, int level)
 }
 EXPORT_SYMBOL(__sb_end_write);
 
-#ifdef CONFIG_LOCKDEP
-/*
- * We want lockdep to tell us about possible deadlocks with freezing but
- * it's it bit tricky to properly instrument it. Getting a freeze protection
- * works as getting a read lock but there are subtle problems. XFS for example
- * gets freeze protection on internal level twice in some cases, which is OK
- * only because we already hold a freeze protection also on higher level. Due
- * to these cases we have to tell lockdep we are doing trylock when we
- * already hold a freeze protection for a higher freeze level.
- */
-static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
+static int do_sb_start_write(struct super_block *sb, int level, bool wait,
 				unsigned long ip)
 {
-	int i;
-
-	if (!trylock) {
-		for (i = 0; i < level - 1; i++)
-			if (lock_is_held(&sb->s_writers.lock_map[i])) {
-				trylock = true;
-				break;
-			}
-	}
-	rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
-}
-#endif
-
-/*
- * This is an internal function, please use sb_start_{write,pagefault,intwrite}
- * instead.
- */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
-{
+	if (wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
 retry:
 	if (unlikely(sb->s_writers.frozen >= level)) {
 		if (!wait)
@@ -1198,9 +1171,6 @@  retry:
 			   sb->s_writers.frozen < level);
 	}
 
-#ifdef CONFIG_LOCKDEP
-	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
-#endif
 	percpu_counter_inc(&sb->s_writers.counter[level-1]);
 	/*
 	 * Make sure counter is updated before we check for frozen.
@@ -1211,8 +1181,45 @@  retry:
 		__sb_end_write(sb, level);
 		goto retry;
 	}
+
+	if (!wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
 	return 1;
 }
+
+/*
+ * This is an internal function, please use sb_start_{write,pagefault,intwrite}
+ * instead.
+ */
+int __sb_start_write(struct super_block *sb, int level, bool wait)
+{
+	bool force_trylock = false;
+	int ret;
+
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * We want lockdep to tell us about possible deadlocks with freezing
+	 * but it's it bit tricky to properly instrument it. Getting a freeze
+	 * protection works as getting a read lock but there are subtle
+	 * problems. XFS for example gets freeze protection on internal level
+	 * twice in some cases, which is OK only because we already hold a
+	 * freeze protection also on higher level. Due to these cases we have
+	 * to use wait == F (trylock mode) which must not fail.
+	 */
+	if (wait) {
+		int i;
+
+		for (i = 0; i < level - 1; i++)
+			if (lock_is_held(&sb->s_writers.lock_map[i])) {
+				force_trylock = true;
+				break;
+			}
+	}
+#endif
+	ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
+	WARN_ON(force_trylock & !ret);
+	return ret;
+}
 EXPORT_SYMBOL(__sb_start_write);
 
 /**