Message ID | 20160214234312.GX17997@ZenIV.linux.org.uk (mailing list archive)
---|---
State | New, archived
I pushed the list_del up to the kernel.org for-next branch...

And I've been running tests with the CRUDE bandaid... weird
results...

No oopses, no WARN_ONs... I was running dbench and ls -R
or find and kill-minus-nining different ones of them with no
perceived resulting problems, so I moved on to signalling
the client-core to abort... it restarted numerous times,
and then stuff wedged up differently than I've seen before.

Usually I kill the client-core and it comes back (gets restarted),
as seen by the different PID:

# ps -ef | grep pvfs
root      1292  1185  7 11:39 ?        00:00:01 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log
# kill -6 1292
# ps -ef | grep pvfs
root      1299  1185  8 11:40 ?        00:00:00 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log

Until once, it didn't die, and the gorked-up, unkillable left-over
thing's argv[0] (or wherever this string gets scraped from) was goofy:

# ps -ef | grep pvfs
root      1324  1185  1 11:41 ?        00:00:02 pvfs2-client-core --child -a 60000 -n 60000 --logtype file -L /var/log/client.log
[root@be1 hubcap]# kill -6 1324
[root@be1 hubcap]# ps -ef | grep pvfs
root      1324  1185  2 11:41 ?        00:00:05 [pvfs2-client-co]

The virtual host was pretty wedged up after that; I couldn't look at
anything interesting, and got a bunch of terminal windows hung trying:

# strace -f -p 1324
Process 1324 attached
^C
^C^C

. ls -R's output was flowing out here
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name52
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name53
^C^C^C

[root@logtruck hubcap]# ssh be1
root@be1's password:
Last login: Mon Feb 15 11:33:42 2016 from logtruck.clemson.edu
[root@be1 ~]# df

I still had one functioning window, and looked at dmesg from there;
nothing interesting there... a couple of expected tag WARNINGs while
I was killing finds and dbenches... ioctls that happened during the
successful restarts of the client-core...

[  809.520966] client-core: opening device
[  809.521031] pvfs2-client-core: open device complete (ret = 0)
[  809.521050] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[  809.521068] dispatch_ioctl_command: client debug array string has been received.
[  809.521070] orangefs_prepare_debugfs_help_string: start
[  809.521071] orangefs_prepare_cdm_array: start
[  809.521104] orangefs_prepare_cdm_array: rc:50:
[  809.521106] orangefs_prepare_debugfs_help_string: cdm_element_count:50:
[  809.521239] debug_mask_to_string: start
[  809.521242] debug_mask_to_string: string:none:
[  809.521243] orangefs_client_debug_init: start
[  809.521249] orangefs_client_debug_init: rc:0:
[  809.566652] dispatch_ioctl_command: got ORANGEFS_DEV_REMOUNT_ALL
[  809.566667] dispatch_ioctl_command: priority remount in progress
[  809.566668] dispatch_ioctl_command: priority remount complete
[  812.454255] orangefs_debug_open: orangefs_debug_disabled: 0
[  812.454294] orangefs_debug_open: rc: 0
[  812.454320] orangefs_debug_write: kernel-debug
[  812.454323] debug_string_to_mask: start
[  896.410522] WARNING: No one's waiting for tag 15612
[ 1085.339948] WARNING: No one's waiting for tag 127943
[ 1146.820485] orangefs: please confirm that pvfs2-client daemon is running.
[ 1146.820488] fs/orangefs/dir.c line 264: orangefs_readdir: orangefs_readdir_index_get() failure (-5)
[ 1146.866812] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1146.866834] dispatch_ioctl_command: client debug array string has been received.
[ 1175.906800] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1175.906817] dispatch_ioctl_command: client debug array string has been received.
[ 1223.915862] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1223.915880] dispatch_ioctl_command: client debug array string has been received.
[ 1274.458852] dispatch_ioctl_command: client debug mask has been been received :0: :0:
[ 1274.458870] dispatch_ioctl_command: client debug array string has been received.

ps aux shows every process' state as S, except for 1324, which is
racking up time:

[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:29 [pvfs2-client-co]
[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:30 [pvfs2-client-co]

I'll virsh destroy this thing now <g>...

-Mike

On Sun, Feb 14, 2016 at 6:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote:
>> I added the list_del...
>>
>> Everything is very resilient, I killed
>> the client-core over and over while dbench
>> was running at the same time as ls -R
>> was running, and the client-core always
>> restarted... until finally, it didn't. I guess
>> related to the state of just what was going on
>> at the time... Hit the WARN_ON in service_operation,
>> and then oopsed on the orangefs_bufmap_put
>> down at the end of wait_for_direct_io...
>
> Bloody hell... I think I see what's going on, and presumably the newer
> slot allocator would fix that. Look: closing the control device (== daemon
> death) checks if we have a bufmap installed and drops a reference to
> it in that case. The reason why it's conditional is that we might not have
> gotten around to installing one (it's done via ioctl on the control
> device). But ->release() does *NOT* wait for all references to go away!
> In other words, it's possible to restart the daemon while the old bufmap
> is still there, then have it killed after it has opened the control device
> and before the old bufmap has run down. To ->release() it looks like
> we *have* gotten around to installing a bufmap and need the reference
> dropped. In reality, the reference acquired when we were installing that
> one has already been dropped, so we get a double put. With expected
> results...
>
> If the below ends up fixing the symptoms, the analysis above has a good
> chance of being correct. This is no way to wait for rundown, of course -
> I'm not suggesting it as the solution, just as a way to narrow down
> what's going on.
>
> Incidentally, could you fold the list_del() part into the offending commit
> (orangefs: delay freeing slot until cancel completes) and repush your
> for-next?
>
> diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
> index 6a7df12..630246d 100644
> --- a/fs/orangefs/devorangefs-req.c
> +++ b/fs/orangefs/devorangefs-req.c
> @@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
>  	purge_inprogress_ops();
>  	gossip_debug(GOSSIP_DEV_DEBUG,
>  		     "pvfs2-client-core: device close complete\n");
> +	/* VERY CRUDE, NOT FOR MERGE */
> +	while (orangefs_get_bufmap_init())
> +		schedule_timeout(HZ);
>  	open_access_count = 0;
>  	mutex_unlock(&devreq_mutex);
>  	return 0;
> diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
> index 41f8bb1f..1e28555 100644
> --- a/fs/orangefs/orangefs-kernel.h
> +++ b/fs/orangefs/orangefs-kernel.h
> @@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
>  {
>  	spin_lock(&op->lock);
>  	if (unlikely(op_is_cancel(op))) {
> +		list_del(&op->list);
>  		spin_unlock(&op->lock);
>  		put_cancel(op);
>  	} else {
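The double put Al describes is easy to model outside the kernel. Below is a
minimal userspace C sketch of the same refcount mistake; the names
(bufmap_ref, bufmap_put, release_control_device) are invented for
illustration and are not the orangefs code:

/* Minimal userspace model of the double put: ->release() drops the
 * install reference whenever a bufmap looks installed, without waiting
 * for (or coordinating with) the other reference holders. */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static atomic_int bufmap_ref;        /* references to the bufmap */
static atomic_int bufmap_installed;  /* "do we have a bufmap?" flag */

static void bufmap_put(void)
{
	if (atomic_fetch_sub(&bufmap_ref, 1) == 1) {
		printf("bufmap freed\n");
		atomic_store(&bufmap_installed, 0);
	} else if (atomic_load(&bufmap_ref) < 0) {
		printf("double put -> use-after-free in the real code\n");
		abort();
	}
}

/* the conditional drop in ->release() */
static void release_control_device(void)
{
	if (atomic_load(&bufmap_installed))
		bufmap_put();
}

int main(void)
{
	/* daemon #1 installs the bufmap; in-flight I/O holds a second ref */
	atomic_store(&bufmap_installed, 1);
	atomic_store(&bufmap_ref, 2);

	release_control_device();  /* daemon #1 dies: install ref dropped */
	release_control_device();  /* restarted daemon dies before installing
	                            * its own bufmap: the old one still looks
	                            * installed, so a ref it never took gets
	                            * dropped - premature free */
	bufmap_put();              /* the I/O ref comes home: underflow */
	return 0;
}

The second release_control_device() stands in for the restarted daemon being
killed after opening the control device but before installing its own bufmap:
it frees the map out from under the in-flight I/O, whose own put then
underflows the count.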
On Mon, Feb 15, 2016 at 12:46:51PM -0500, Mike Marshall wrote:
> I pushed the list_del up to the kernel.org for-next branch...
>
> And I've been running tests with the CRUDE bandaid... weird
> results...
>
> No oopses, no WARN_ONs... I was running dbench and ls -R
> or find and kill-minus-nining different ones of them with no
> perceived resulting problems, so I moved on to signalling
> the client-core to abort... it restarted numerous times,
> and then stuff wedged up differently than I've seen before.

There are other problems with that thing (starting with the fact that
retrying readdir/wait_for_direct_io can try to grab a slot despite the
bufmap winding down). OK, at that point I think we should try to see
if the bufmap rewrite works - I've rebased on top of your branch and pushed
(head at 8c3bc9a). Bufmap rewrite is really completely untested -
it's done pretty much blindly and I'd be surprised as hell if it has no
brainos at the first try.
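The shape of that rewrite shows up in Mike's reply below: a slot_map with a
count c and a waitqueue q, plus mark_killed/run_down for teardown. As a rough
guide to the idea - not Al's actual code; this is a userspace rendering with
pthreads standing in for the kernel waitqueue, and the semantics are partly
guessed - a negative count makes allocation fail instead of letting a retried
readdir or wait_for_direct_io grab a slot from a map that is winding down:

#include <pthread.h>

struct slot_map {
	int c;                    /* free slots; < 0 means unusable
	                           * (not installed, or winding down) */
	pthread_mutex_t lock;
	pthread_cond_t q;         /* waiters for a free slot */
};

/* take a slot, or fail outright once the map is dying */
static int slot_get(struct slot_map *m)
{
	int ret = -1;

	pthread_mutex_lock(&m->lock);
	while (m->c == 0)                 /* full but alive: wait */
		pthread_cond_wait(&m->q, &m->lock);
	if (m->c > 0) {                   /* alive: hand a slot out */
		m->c--;
		ret = 0;
	}                                 /* c < 0: winding down, fail */
	pthread_mutex_unlock(&m->lock);
	return ret;
}

/* begin rundown: no new slots, and wake sleepers so they see the state */
static void mark_killed(struct slot_map *m)
{
	pthread_mutex_lock(&m->lock);
	m->c = -1;
	pthread_cond_broadcast(&m->q);
	pthread_mutex_unlock(&m->lock);
}

The piece the thread goes on to chase is run_down(): after mark_killed(),
something still has to wait for the slots already handed out to come back,
without deadlocking against the ops that hold them.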
> Bufmap rewrite is really completely untested -
> it's done pretty much blindly and I'd be surprised as hell if it has no
> brainos at the first try.

You did pretty good; it takes me two tries to get hello world right...

Right off the bat, the kernel crashed, because:

static struct slot_map rw_map = {
        .c = -1,
        .q = __WAIT_QUEUE_HEAD_INITIALIZER(rw_map.q)
};
static struct slot_map readdir_map = {
        .c = -1,
        .q = __WAIT_QUEUE_HEAD_INITIALIZER(rw_map.q)
};                                         ^
                                           |
                                         D'OH!

But after that, stuff almost worked... It can still "sort of" wedge up.
We think that when dbench is running and the client-core is killed, you
can hit orangefs_bufmap_finalize -> mark_killed -> run_down/schedule()
while those wait_for_completion_* schedules of extant ops in
wait_for_matching_downcall have also given up the processor... Then...
when you interrupt dbench, stuff starts flowing again...

I added a couple of gossip statements inside of mark_killed and run_down...

Feb 15 16:40:15 be1 kernel: [  349.981597] orangefs_bufmap_finalize: called
Feb 15 16:40:15 be1 kernel: [  349.981600] mark_killed enter
Feb 15 16:40:15 be1 kernel: [  349.981602] mark_killed: leave
Feb 15 16:40:15 be1 kernel: [  349.981603] mark_killed enter
Feb 15 16:40:15 be1 kernel: [  349.981605] mark_killed: leave
Feb 15 16:40:15 be1 kernel: [  349.981606] run_down: enter:-1:
Feb 15 16:40:15 be1 kernel: [  349.981608] run_down: leave
Feb 15 16:40:15 be1 kernel: [  349.981609] run_down: enter:-2:
Feb 15 16:40:15 be1 kernel: [  349.981610] run_down: before schedule:-2:

Stuff just sits here while dbench is still running. Then Ctrl-C on
dbench, and off to the races again:

Feb 15 16:42:28 be1 kernel: [  483.049927] *** wait_for_matching_downcall: operation interrupted by a signal (tag 16523, op ffff880013418000)
Feb 15 16:42:28 be1 kernel: [  483.049930] Interrupted: Removed op ffff880013418000 from htable_ops_in_progress
Feb 15 16:42:28 be1 kernel: [  483.049932] orangefs: service_operation orangefs_inode_getattr returning: -4 for ffff880013418000.
Feb 15 16:42:28 be1 kernel: [  483.050116] *** wait_for_matching_downcall: operation interrupted by a signal (tag 16518, op ffff8800001a8000)
Feb 15 16:42:28 be1 kernel: [  483.050118] Interrupted: Removed op ffff8800001a8000 from htable_ops_in_progress
Feb 15 16:42:28 be1 kernel: [  483.050120] orangefs: service_operation orangefs_inode_getattr returning: -4 for ffff8800001a8000.

Martin already has a patch... What do you think?

I'm headed home for supper...

-Mike

On Mon, Feb 15, 2016 at 1:45 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Feb 15, 2016 at 12:46:51PM -0500, Mike Marshall wrote:
>> I pushed the list_del up to the kernel.org for-next branch...
>>
>> And I've been running tests with the CRUDE bandaid... weird
>> results...
>>
>> No oopses, no WARN_ONs... I was running dbench and ls -R
>> or find and kill-minus-nining different ones of them with no
>> perceived resulting problems, so I moved on to signalling
>> the client-core to abort... it restarted numerous times,
>> and then stuff wedged up differently than I've seen before.
>
> There are other problems with that thing (starting with the fact that
> retrying readdir/wait_for_direct_io can try to grab a slot despite the
> bufmap winding down). OK, at that point I think we should try to see
> if the bufmap rewrite works - I've rebased on top of your branch and pushed
> (head at 8c3bc9a). Bufmap rewrite is really completely untested -
> it's done pretty much blindly and I'd be surprised as hell if it has no
> brainos at the first try.
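The braino is the copy-pasted rw_map.q inside readdir_map's initializer,
which points readdir waiters at rw_map's waitqueue. Whether or not that is
what Martin's patch addresses (it isn't shown here), the obvious correction
is one identifier:

static struct slot_map readdir_map = {
        .c = -1,
        .q = __WAIT_QUEUE_HEAD_INITIALIZER(readdir_map.q)
};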
diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index 6a7df12..630246d 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
 	purge_inprogress_ops();
 	gossip_debug(GOSSIP_DEV_DEBUG,
 		     "pvfs2-client-core: device close complete\n");
+	/* VERY CRUDE, NOT FOR MERGE */
+	while (orangefs_get_bufmap_init())
+		schedule_timeout(HZ);
 	open_access_count = 0;
 	mutex_unlock(&devreq_mutex);
 	return 0;
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 41f8bb1f..1e28555 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
 {
 	spin_lock(&op->lock);
 	if (unlikely(op_is_cancel(op))) {
+		list_del(&op->list);
 		spin_unlock(&op->lock);
 		put_cancel(op);
 	} else {
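Al flags that while/schedule_timeout() loop as NOT FOR MERGE for good reason:
it polls forever, and since nothing sets the task state before
schedule_timeout(), the task stays TASK_RUNNING and the "sleep" degenerates
into a busy yield loop. The conventional kernel shape for "drop my reference,
then wait until the last one is gone" is a kref paired with a completion.
A sketch under invented names - orangefs proper may structure this
differently:

/* Hypothetical sketch: let the final put signal rundown instead of
 * polling orangefs_get_bufmap_init() from ->release(). */
#include <linux/completion.h>
#include <linux/kernel.h>
#include <linux/kref.h>

struct bufmap_stub {
	struct kref ref;              /* kref_init()ed at install time */
	struct completion rundown;    /* init_completion()ed likewise */
};

static void bufmap_stub_release(struct kref *kref)
{
	struct bufmap_stub *m = container_of(kref, struct bufmap_stub, ref);

	complete(&m->rundown);        /* last reference is gone */
}

/* called from ->release(): drop the install reference, then sleep until
 * every other holder (in-flight I/O, readdir) has dropped theirs; the
 * object itself is torn down only after this returns */
static void bufmap_stub_run_down(struct bufmap_stub *m)
{
	kref_put(&m->ref, bufmap_stub_release);
	wait_for_completion(&m->rundown);
}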