Message ID | 1421249572-12038-1-git-send-email-jlayton@primarydata.com (mailing list archive)
---|---
State | New, archived
On Wed, Jan 14, 2015 at 10:32 AM, Jeff Layton <jlayton@primarydata.com> wrote:
> Bruce reported seeing this warning pop when mounting using v4.1:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
> do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
> Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
> CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
>  0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
>  0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
>  ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
> Call Trace:
>  [<ffffffff8186ac78>] dump_stack+0x4c/0x65
>  [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
>  [<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
>  [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
>  [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
>  [<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
>  [<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
>  [<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
>  [<ffffffff810d941e>] groups_alloc+0x3e/0x130
>  [<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
>  [<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
>  [<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
>  [<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
>  [<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
>  [<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
>  [<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
>  [<ffffffff810d45bf>] kthread+0x11f/0x140
>  [<ffffffff810ea815>] ? local_clock+0x15/0x30
>  [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
>  [<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
>  [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
> ---[ end trace 675220a11e30f4f2 ]---
>
> nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
> which is just wrong. Fix that by finishing the wait immediately if we've
> found that the list has something on it.

ACK.

> Also, we don't expect this kthread to accept signals, so we should be
> using a TASK_UNINTERRUPTIBLE sleep instead.

Umm... Won't that end up triggering the hung task watchdog every
120 seconds with no callback activity?

> Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> ---
>  fs/nfs/callback.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index b8fb3a4ef649..ebba958e084c 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -128,13 +128,14 @@ nfs41_callback_svc(void *vrqstp)
>  		if (try_to_freeze())
>  			continue;
>
> -		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
> +		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_UNINTERRUPTIBLE);
>  		spin_lock_bh(&serv->sv_cb_lock);
>  		if (!list_empty(&serv->sv_cb_list)) {
>  			req = list_first_entry(&serv->sv_cb_list,
>  					struct rpc_rqst, rq_bc_list);
>  			list_del(&req->rq_bc_list);
>  			spin_unlock_bh(&serv->sv_cb_lock);
> +			finish_wait(&serv->sv_cb_waitq, &wq);
>  			dprintk("Invoking bc_svc_process()\n");
>  			error = bc_svc_process(serv, req, rqstp);
>  			dprintk("bc_svc_process() returned w/ error code= %d\n",
> @@ -142,8 +143,8 @@ nfs41_callback_svc(void *vrqstp)
>  		} else {
>  			spin_unlock_bh(&serv->sv_cb_lock);
>  			schedule();
> +			finish_wait(&serv->sv_cb_waitq, &wq);
>  		}
> -		finish_wait(&serv->sv_cb_waitq, &wq);
>  	}
>  	return 0;
>  }
> --
> 2.1.0
On Wed, 14 Jan 2015 12:13:43 -0500 Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> On Wed, Jan 14, 2015 at 10:32 AM, Jeff Layton <jlayton@primarydata.com> wrote:
> > Bruce reported seeing this warning pop when mounting using v4.1:
> >
> > [...]
> >
> > nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
> > which is just wrong. Fix that by finishing the wait immediately if we've
> > found that the list has something on it.
>
> ACK.
>
> > Also, we don't expect this kthread to accept signals, so we should be
> > using a TASK_UNINTERRUPTIBLE sleep instead.
>
> Umm... Won't that end up triggering the hung task watchdog every
> 120 seconds with no callback activity?

Doh! You're correct.

What's the right way to do this then? Do we need to use
schedule_timeout and wake up every 100s or so? Using TASK_INTERRUPTIBLE
to work around the watchdog seems wrong, but I guess we can live with
that in the short term if it's the only way.

Long-term, it would probably make sense to convert this (and other
service threads like this) over to workqueue-based services. I still
need to chase down where the latency comes from before that patchset is
ready for merge, though.
On Wed, Jan 14, 2015 at 12:27 PM, Jeff Layton <jeff.layton@primarydata.com> wrote:
> On Wed, 14 Jan 2015 12:13:43 -0500
> Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>
>> [...]
>>
>> Umm... Won't that end up triggering the hung task watchdog every
>> 120 seconds with no callback activity?
>
> Doh! You're correct.
>
> What's the right way to do this then? Do we need to use
> schedule_timeout and wake up every 100s or so? Using TASK_INTERRUPTIBLE
> to work around the watchdog seems wrong, but I guess we can live with
> that in the short term if it's the only way.

The alternative is to use TASK_KILLABLE. That's a little more
restrictive, but still a PITA. Note that both TASK_INTERRUPTIBLE and
TASK_KILLABLE will need something to handle the signalled() case,
otherwise we end up with a permanent busy-waiting loop.
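[Editorial note: the busy-wait Trond describes arises because schedule() returns immediately whenever a signal is already pending against an interruptible or killable sleep. A rough kernel-style sketch of the kind of handling he is asking for is below; it is illustrative only, not code from this thread, and reuses the names from the quoted patch.]

```c
/* Sketch only: the signal handling shown is illustrative. */
for (;;) {
	prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
	spin_lock_bh(&serv->sv_cb_lock);
	if (!list_empty(&serv->sv_cb_list)) {
		/* ... dequeue and process, as in the patch ... */
	} else {
		spin_unlock_bh(&serv->sv_cb_lock);
		/* Returns at once if a signal is already pending... */
		schedule();
		finish_wait(&serv->sv_cb_waitq, &wq);
		/* ...so the signal must be drained, or this loop spins. */
		if (signal_pending(current))
			flush_signals(current);
	}
}
```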
On Wed, 14 Jan 2015 13:06:17 -0500 Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> On Wed, Jan 14, 2015 at 12:27 PM, Jeff Layton
> <jeff.layton@primarydata.com> wrote:
> > [...]
> >
> > What's the right way to do this then? Do we need to use
> > schedule_timeout and wake up every 100s or so? Using TASK_INTERRUPTIBLE
> > to work around the watchdog seems wrong, but I guess we can live with
> > that in the short term if it's the only way.
>
> The alternative is to use TASK_KILLABLE. That's a little more
> restrictive, but still a PITA. Note that both TASK_INTERRUPTIBLE and
> TASK_KILLABLE will need something to handle the signalled() case,
> otherwise we end up with a permanent busy-waiting loop.

Heh, our emails appear to have crossed. I went with a schedule_timeout
for v2 so we can sidestep the whole issue of signals.

It's simple enough to add a flush_signals() call at the end of the
loop, but I'm a little leery of what effects we might see if the thread
catches a signal while in the middle of doing real work.

If you really think that's the better approach, I can do a v3 patch
that does that, however.
On Jan 14, 2015, at 10:32 AM, Jeff Layton <jeff.layton@primarydata.com> wrote:

> On Wed, 14 Jan 2015 13:06:17 -0500
> Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>
>> [...]
>
> Heh, our emails appear to have crossed. I went with a schedule_timeout
> for v2 so we can sidestep the whole issue of signals.
>
> It's simple enough to add a flush_signals() call at the end of the
> loop, but I'm a little leery of what effects we might see if the thread
> catches a signal while in the middle of doing real work.
>
> If you really think that's the better approach, I can do a v3 patch
> that does that, however.

After applying this patch to a 3.19.0 test system, the WARNINGs are
gone, but the system load average is pinned at 1.00.

root     12105  0.1  0.0      0     0 ?        D    Mar18   1:19 [nfsv4.1-svc]

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Thu, 19 Mar 2015 17:59:15 -0700 Chuck Lever <chuck.lever@oracle.com> wrote:

> On Jan 14, 2015, at 10:32 AM, Jeff Layton <jeff.layton@primarydata.com> wrote:
>
>> [...]
>
> After applying this patch to a 3.19.0 test system, the WARNINGs are
> gone, but the system load average is pinned at 1.00.
>
> root     12105  0.1  0.0      0     0 ?        D    Mar18   1:19 [nfsv4.1-svc]

Darn, good point... I forgot about Linux counting D-state tasks toward
the load average (why, oh why...).

I guess we will have to use a different task state after all --
probably TASK_KILLABLE, and just try to ignore fatal signals as best we
can. I'll plan to do that when I get a minute. Suggestions of better
ways to do this would be appreciated, though.

Cheers,
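[Editorial note: the TASK_KILLABLE variant Jeff is describing might look roughly like the sketch below; this is an assumption on my part, not code from this thread.]

```c
/* Sketch only: whether flushing a fatal signal is acceptable here is
 * exactly the open question in the thread. */
prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_KILLABLE);
spin_lock_bh(&serv->sv_cb_lock);
if (list_empty(&serv->sv_cb_list)) {
	spin_unlock_bh(&serv->sv_cb_lock);
	schedule();	/* woken only by a wakeup or a fatal signal */
	finish_wait(&serv->sv_cb_waitq, &wq);
	if (fatal_signal_pending(current))
		flush_signals(current);
} else {
	/* ... dequeue and process as in the patch above ... */
}
```

One caveat: TASK_KILLABLE is TASK_WAKEKILL | TASK_UNINTERRUPTIBLE, so the thread would still show up in D state and, on kernels of this vintage, would still count toward the load average, which may not fully address Chuck's observation on its own.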
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index b8fb3a4ef649..ebba958e084c 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -128,13 +128,14 @@ nfs41_callback_svc(void *vrqstp)
 		if (try_to_freeze())
 			continue;
 
-		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
+		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_UNINTERRUPTIBLE);
 		spin_lock_bh(&serv->sv_cb_lock);
 		if (!list_empty(&serv->sv_cb_list)) {
 			req = list_first_entry(&serv->sv_cb_list,
 					struct rpc_rqst, rq_bc_list);
 			list_del(&req->rq_bc_list);
 			spin_unlock_bh(&serv->sv_cb_lock);
+			finish_wait(&serv->sv_cb_waitq, &wq);
 			dprintk("Invoking bc_svc_process()\n");
 			error = bc_svc_process(serv, req, rqstp);
 			dprintk("bc_svc_process() returned w/ error code= %d\n",
@@ -142,8 +143,8 @@ nfs41_callback_svc(void *vrqstp)
 		} else {
 			spin_unlock_bh(&serv->sv_cb_lock);
 			schedule();
+			finish_wait(&serv->sv_cb_waitq, &wq);
 		}
-		finish_wait(&serv->sv_cb_waitq, &wq);
 	}
 	return 0;
 }
Bruce reported seeing this warning pop when mounting using v4.1:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
 0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
 0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
 ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
Call Trace:
 [<ffffffff8186ac78>] dump_stack+0x4c/0x65
 [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
 [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
 [<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
 [<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
 [<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
 [<ffffffff810d941e>] groups_alloc+0x3e/0x130
 [<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
 [<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
 [<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
 [<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
 [<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
 [<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
 [<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
 [<ffffffff810d45bf>] kthread+0x11f/0x140
 [<ffffffff810ea815>] ? local_clock+0x15/0x30
 [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
 [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
---[ end trace 675220a11e30f4f2 ]---

nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
which is just wrong. Fix that by finishing the wait immediately if we've
found that the list has something on it.

Also, we don't expect this kthread to accept signals, so we should be
using a TASK_UNINTERRUPTIBLE sleep instead.

Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/nfs/callback.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)