Message ID | 20161129114921.62964-1-zyan@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, Nov 29, 2016 at 12:49 PM, Yan, Zheng <zyan@redhat.com> wrote:
> Current code does not wake up sync waiter if osd replies error
> or the request does not want ondisk ack.

Hi Zheng,

Did you mean to write "and" instead of "or" - "if osd replies error
*and* the request does not want ondisk ack"?

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> On 29 Nov 2016, at 21:54, Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Tue, Nov 29, 2016 at 12:49 PM, Yan, Zheng <zyan@redhat.com> wrote:
>> Current code does not wake up sync waiter if osd replies error
>> or the request does not want ondisk ack.
>
> Hi Zheng,
>
> Did you mean to write "and" instead of "or" - "if osd replies error
> *and* the request does not want ondisk ack"?

I mean 'or'. The code does not wake up sync waiter in either of these
two cases.
On Tue, Nov 29, 2016 at 2:59 PM, Yan, Zheng <zyan@redhat.com> wrote:
>> Did you mean to write "and" instead of "or" - "if osd replies error
>> *and* the request does not want ondisk ack"?
>
> I mean 'or'. The code does not wake up sync waiter in either of these
> two cases.

It does wake the sync waiter (r_safe_completion) on error, but only
if ondisk ack was requested. If ondisk ack wasn't requested, it's not
woken up, success or error.

By not setting CEPH_OSD_FLAG_ONDISK, the submitter is indicating that
*any* reply is fine. Blocking on the safe reply instead of any reply
in the !CEPH_OSD_FLAG_ONDISK case sounds wrong.

What is the motivation for this patch?

Thanks,

                Ilya
> On 29 Nov 2016, at 22:58, Ilya Dryomov <idryomov@gmail.com> wrote:
>
> It does wake the sync waiter (r_safe_completion) on error, but only
> if ondisk ack was requested. If ondisk ack wasn't requested, it's not
> woken up, success or error.
>
> By not setting CEPH_OSD_FLAG_ONDISK, the submitter is indicating that
> *any* reply is fine. Blocking on the safe reply instead of any reply
> in the !CEPH_OSD_FLAG_ONDISK case sounds wrong.
>
> What is the motivation for this patch?

I saw fsstress hang at ceph_osdc_sync several times.

[52310.533483] sysrq: SysRq : Show Blocked State
[52310.534057]   task                        PC stack   pid father
[52310.534653] fsstress        D14656  2836   2834 0x00000000
[52310.535568]  ffff880036cff200 ffff88003e2dac80 0000000000002fa3 ffff88003e3142c0
[52310.536529]  ffff88003fd18318 ffffc900038b7d48 ffffffff819354cc ffffffff00000000
[52310.537464]  000000013e314ae0 ffff88003e314890 ffff88003fd18318 ffff88003fd18300
[52310.538396] Call Trace:
[52310.538762]  [<ffffffff819354cc>] ? __schedule+0x6ac/0x880
[52310.539326]  [<ffffffff81935713>] schedule+0x73/0x90
[52310.539840]  [<ffffffff81939d21>] schedule_timeout+0x31/0x5d0
[52310.540425]  [<ffffffff810ac4a6>] ? mark_held_locks+0x76/0x90
[52310.540980]  [<ffffffff8193b447>] ? _raw_spin_unlock_irq+0x27/0x50
[52310.541582]  [<ffffffff810ac64d>] ? trace_hardirqs_on_caller+0x18d/0x1c0
[52310.542217]  [<ffffffff81936a1b>] wait_for_completion+0x8b/0x100
[52310.542788]  [<ffffffff8108dc10>] ? wake_up_q+0x70/0x70
[52310.543380]  [<ffffffffa00116b8>] ceph_osdc_sync+0xc8/0x150 [libceph]
[52310.543990]  [<ffffffff811eaed0>] ? SyS_tee+0x3b0/0x3b0
[52310.544569]  [<ffffffffa005c3ea>] ceph_sync_fs+0x3a/0xb0 [ceph]
[52310.545158]  [<ffffffff811eaeeb>] sync_fs_one_sb+0x1b/0x20
[52310.545699]  [<ffffffff811b931e>] iterate_supers+0x7e/0xe0
[52310.546258]  [<ffffffff811eb1e0>] sys_sync+0x50/0x80
[52310.546849]  [<ffffffff8193bb6a>] entry_SYSCALL_64_fastpath+0x18/0xad
On Tue, Nov 29, 2016 at 4:03 PM, Yan, Zheng <zyan@redhat.com> wrote:
>> What is the motivation for this patch?
>
> I saw fsstress hang at ceph_osdc_sync several times.
>
> [52310.533483] sysrq: SysRq : Show Blocked State
> [...]
> [52310.542217]  [<ffffffff81936a1b>] wait_for_completion+0x8b/0x100
> [52310.543380]  [<ffffffffa00116b8>] ceph_osdc_sync+0xc8/0x150 [libceph]
> [...]

Did you happen to capture /osdc? It's probably the pool perm check
thing

    wr_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ACK;
    osd_req_op_init(wr_req, 0, CEPH_OSD_OP_CREATE, CEPH_OSD_OP_FLAG_EXCL);

but it would be good to confirm.

Thanks,

                Ilya
On Tue, Nov 29, 2016 at 11:29 PM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Did you happen to capture /osdc? It's probably the pool perm check
> thing

osdc is completely empty.

>     wr_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ACK;
>     osd_req_op_init(wr_req, 0, CEPH_OSD_OP_CREATE, CEPH_OSD_OP_FLAG_EXCL);
>
> but it would be good to confirm.
On Wed, Nov 30, 2016 at 1:30 AM, Yan, Zheng <ukernel@gmail.com> wrote:
>> Did you happen to capture /osdc? It's probably the pool perm check
>> thing
>
> osdc is completely empty.

If you haven't already done so while investigating, could you please
dump the offending @req fields? I want to make sure it's the pool perm
check request...

Thanks,

                Ilya
> On 30 Nov 2016, at 17:06, Ilya Dryomov <idryomov@gmail.com> wrote:
>
> If you haven't already done so while investigating, could you please
> dump the offending @req fields? I want to make sure it's the pool perm
> check request...

Sorry, I have already rebooted that machine.

Yan, Zheng
On Wed, Nov 30, 2016 at 11:02 AM, Yan, Zheng <zyan@redhat.com> wrote:
>> If you haven't already done so while investigating, could you please
>> dump the offending @req fields? I want to make sure it's the pool perm
>> check request...
>
> Sorry, I have already rebooted that machine.

Hi Zheng,

I took it a step further and also fixed libceph cancellation code:

https://github.com/ceph/ceph-client/commit/c297eb42690b904fb5b78dd9ad001bafe25f49ec

I squashed your patch to reduce rename churn. Let me know if that's OK
with you.

Thanks,

                Ilya
> On 15 Dec 2016, at 05:58, Ilya Dryomov <idryomov@gmail.com> wrote:
>
> I took it a step further and also fixed libceph cancellation code:
>
> https://github.com/ceph/ceph-client/commit/c297eb42690b904fb5b78dd9ad001bafe25f49ec

Looks good.

Yan, Zheng

> I squashed your patch to reduce rename churn. Let me know if that's OK
> with you.
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index e6ae15b..54ffe8b 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2934,8 +2934,7 @@ static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg)
 			dout("req %p tid %llu cb\n", req, req->r_tid);
 			__complete_request(req);
 		}
-		if (m.flags & CEPH_OSD_FLAG_ONDISK)
-			complete_all(&req->r_safe_completion);
+		complete_all(&req->r_safe_completion);
 		ceph_osdc_put_request(req);
 	} else {
 		if (req->r_unsafe_callback) {
Current code does not wake up sync waiter if osd replies error
or the request does not want ondisk ack.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
---
 net/ceph/osd_client.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)