diff mbox

orangefs: move features validation to fix filesystem hang

Message ID 1491516695-15133-1-git-send-email-martin@omnibond.com (mailing list archive)
State New, archived
Headers show

Commit Message

Martin Brandenburg April 6, 2017, 10:11 p.m. UTC
Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).

The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked.  When the userspace component restarts while the filesystem
is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two.  While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.

This is only half of the bugfix.  The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns.  Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores.  The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
---
 fs/orangefs/super.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Mike Marshall April 7, 2017, 8:01 p.m. UTC | #1
I've tested this patch against the fixed userspace part. This
patch is real important, I hope it can make it into 4.11...

Here's what happens when the userspace daemon is restarted,
without the patch:

[  264.138953] =============================================
[  264.139278] [ INFO: possible recursive locking detected ]
[  264.139607] 4.10.0-00007-ge98bdb3 #1 Not tainted
[  264.139883] ---------------------------------------------
[  264.140205] pvfs2-client-co/29032 is trying to acquire lock:
[  264.140565]  (orangefs_request_mutex){+.+.+.}, at:
[<ffffffffc060eed7>] service_operation+0x3c7/0x7b0 [orangefs]
[  264.141169]
               but task is already holding lock:
[  264.141518]  (orangefs_request_mutex){+.+.+.}, at:
[<ffffffffc060997f>] dispatch_ioctl_command+0x1bf/0x330 [orangefs]
[  264.142145]
               other info that might help us debug this:
[  264.142535]  Possible unsafe locking scenario:

[  264.142888]        CPU0
[  264.143038]        ----
[  264.143187]   lock(orangefs_request_mutex);
[  264.143441]   lock(orangefs_request_mutex);
[  264.143692]
                *** DEADLOCK ***

[  264.144043]  May be due to missing lock nesting notation

[  264.144447] 1 lock held by pvfs2-client-co/29032:
[  264.144728]  #0:  (orangefs_request_mutex){+.+.+.}, at:
[<ffffffffc060997f>] dispatch_ioctl_command+0x1bf/0x330 [orangefs]
[  264.145380]
               stack backtrace:
[  264.145645] CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted
4.10.0-00007-ge98bdb3 #1
[  264.146124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.9.3-1.fc25 04/01/2014
[  264.146639] Call Trace:
[  264.146792]  dump_stack+0x86/0xc3
[  264.146995]  __lock_acquire+0x7eb/0x1290
[  264.147232]  ? add_lock_to_list.isra.28.constprop.43+0x8a/0x100
[  264.147587]  lock_acquire+0xe8/0x1d0
[  264.147805]  ? service_operation+0x3c7/0x7b0 [orangefs]
[  264.148119]  mutex_lock_killable_nested+0x6f/0x6e0
[  264.148407]  ? service_operation+0x3c7/0x7b0 [orangefs]
[  264.148721]  ? service_operation+0x3c7/0x7b0 [orangefs]
[  264.149034]  ? debug_lockdep_rcu_enabled+0x1d/0x20
[  264.149323]  service_operation+0x3c7/0x7b0 [orangefs]
[  264.149628]  ? remove_wait_queue+0x70/0x70
[  264.149877]  orangefs_remount+0xea/0x150 [orangefs]
[  264.150191]  dispatch_ioctl_command+0x227/0x330 [orangefs]
[  264.150523]  orangefs_devreq_ioctl+0x29/0x70 [orangefs]
[  264.150837]  do_vfs_ioctl+0xa3/0x6e0
[  264.151054]  ? __fget+0x5/0x1f0
[  264.151246]  SyS_ioctl+0x79/0x90
[  264.151445]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  264.151722] RIP: 0033:0x7f8ac4279ce7
[  264.151939] RSP: 002b:00007f8ac0338e38 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  264.152389] RAX: ffffffffffffffda RBX: 00007f8ac0339700 RCX: 00007f8ac4279ce7
[  264.152814] RDX: 0000000000000000 RSI: 0000000000006b05 RDI: 0000000000000005
[  264.153239] RBP: 00007ffd6b889a10 R08: 0000000000611c00 R09: 00007f8ac0339700
[  264.153664] R10: 00000000000000e8 R11: 0000000000000246 R12: 00007ffd6b889a0f
[  264.154089] R13: 0000000000001000 R14: 00007f8ac0339700 R15: 00007f8ac03399c0


Signed-off-by: Mike Marshall <hubcap@omnibond.com>

On Thu, Apr 6, 2017 at 6:11 PM, Martin Brandenburg <martin@omnibond.com> wrote:
> Without this fix (and another to the userspace component itself
> described later), the kernel will be unable to process any OrangeFS
> requests after the userspace component is restarted (due to a crash or
> at the administrator's behest).
>
> The bug here is that inside orangefs_remount, the orangefs_request_mutex
> is locked.  When the userspace component restarts while the filesystem
> is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
> which causes the kernel to send it a few requests aimed at synchronizing
> the state between the two.  While this is happening the
> orangefs_request_mutex is locked to prevent any other requests going
> through.
>
> This is only half of the bugfix.  The other half is in the userspace
> component which outright ignores(!) requests made before it considers
> the filesystem remounted, which is after the ioctl returns.  Of course
> the ioctl doesn't return until after the userspace component responds to
> the request it ignores.  The userspace component has been changed to
> allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.
>
> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> Cc: stable@vger.kernel.org
> ---
>  fs/orangefs/super.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
> index 67c2435..cd261c8 100644
> --- a/fs/orangefs/super.c
> +++ b/fs/orangefs/super.c
> @@ -263,8 +263,13 @@ int orangefs_remount(struct orangefs_sb_info_s *orangefs_sb)
>                 if (!new_op)
>                         return -ENOMEM;
>                 new_op->upcall.req.features.features = 0;
> -               ret = service_operation(new_op, "orangefs_features", 0);
> -               orangefs_features = new_op->downcall.resp.features.features;
> +               ret = service_operation(new_op, "orangefs_features",
> +                   ORANGEFS_OP_PRIORITY | ORANGEFS_OP_NO_MUTEX);
> +               if (!ret)
> +                       orangefs_features =
> +                           new_op->downcall.resp.features.features;
> +               else
> +                       orangefs_features = 0;
>                 op_release(new_op);
>         } else {
>                 orangefs_features = 0;
> --
> 2.1.4
>
diff mbox

Patch

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 67c2435..cd261c8 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -263,8 +263,13 @@  int orangefs_remount(struct orangefs_sb_info_s *orangefs_sb)
 		if (!new_op)
 			return -ENOMEM;
 		new_op->upcall.req.features.features = 0;
-		ret = service_operation(new_op, "orangefs_features", 0);
-		orangefs_features = new_op->downcall.resp.features.features;
+		ret = service_operation(new_op, "orangefs_features",
+		    ORANGEFS_OP_PRIORITY | ORANGEFS_OP_NO_MUTEX);
+		if (!ret)
+			orangefs_features =
+			    new_op->downcall.resp.features.features;
+		else
+			orangefs_features = 0;
 		op_release(new_op);
 	} else {
 		orangefs_features = 0;