libceph: wake up sync waiter when unregistering request

Message ID 20161129114921.62964-1-zyan@redhat.com (mailing list archive)
State New, archived

Commit Message

Yan, Zheng Nov. 29, 2016, 11:49 a.m. UTC
Current code does not wake up sync waiter if osd replies error
or the request does not want ondisk ack.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
---
 net/ceph/osd_client.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
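
For context, the sync waiter in question is ceph_osdc_sync(), which blocks
on r_safe_completion for every in-flight write request. A minimal sketch of
the wait loop, simplified from net/ceph/osd_client.c of this era, with
next_write_request() standing in for the real rb-tree walk and locking:

    void ceph_osdc_sync(struct ceph_osd_client *osdc)
    {
            struct ceph_osd_request *req;
            u64 last_tid = atomic64_read(&osdc->last_tid);

            /* wait for every write request submitted before the sync */
            while ((req = next_write_request(osdc, last_tid)) != NULL) {
                    /* blocks until something calls
                     * complete_all(&req->r_safe_completion);
                     * if no path ever does, sync hangs */
                    wait_for_completion(&req->r_safe_completion);
                    ceph_osdc_put_request(req);
            }
    }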

Comments

Ilya Dryomov Nov. 29, 2016, 1:54 p.m. UTC | #1
On Tue, Nov 29, 2016 at 12:49 PM, Yan, Zheng <zyan@redhat.com> wrote:
> Current code does not wake up sync waiter if osd replies error
> or the request does not want ondisk ack.

Hi Zheng,

Did you mean to write "and" instead of "or" - "if osd replies error
*and* the request does not want ondisk ack"?

Thanks,

                Ilya
Yan, Zheng Nov. 29, 2016, 1:59 p.m. UTC | #2
> On 29 Nov 2016, at 21:54, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> On Tue, Nov 29, 2016 at 12:49 PM, Yan, Zheng <zyan@redhat.com> wrote:
>> Current code does not wake up sync waiter if osd replies error
>> or the request does not want ondisk ack.
> 
> Hi Zheng,
> 
> Did you mean to write "and" instead of "or" - "if osd replies error
> *and* the request does not want ondisk ack"?

I mean ‘or'.  The code does not wake up the sync waiter in either of these two cases.

Ilya Dryomov Nov. 29, 2016, 2:58 p.m. UTC | #3
On Tue, Nov 29, 2016 at 2:59 PM, Yan, Zheng <zyan@redhat.com> wrote:
> [...]
> I mean ‘or'.  The code does not wake up the sync waiter in either of these two cases.

It does wake the sync waiter (r_safe_completion) on error, but only
if ondisk ack was requested.  If ondisk ack wasn't requested, it's not
woken up, success or error.

By not setting CEPH_OSD_FLAG_ONDISK, the submitter is indicating that
*any* reply is fine.  Blocking on the safe reply instead of any reply
in the !CEPH_OSD_FLAG_ONDISK case sounds wrong.
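
In code terms, the pre-patch tail of handle_reply() is roughly this
(condensed from the hunk at the bottom of this page; "done" stands for
"this was the final expected reply"):

    if (done) {
            __complete_request(req);              /* r_callback etc. */
            if (m.flags & CEPH_OSD_FLAG_ONDISK)   /* the gate your patch drops */
                    complete_all(&req->r_safe_completion);
            ceph_osdc_put_request(req);
    } else if (req->r_unsafe_callback) {
            /* unsafe ack; the safe (ondisk) reply is still expected */
    }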

What is the motivation for this patch?

Thanks,

                Ilya
Yan, Zheng Nov. 29, 2016, 3:03 p.m. UTC | #4
> On 29 Nov 2016, at 22:58, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> On Tue, Nov 29, 2016 at 2:59 PM, Yan, Zheng <zyan@redhat.com> wrote:
>> [...]
>> I mean ‘or'.  The code does not wake up the sync waiter in either of these two cases.
> 
> It does wake the sync waiter (r_safe_completion) on error, but only
> if ondisk ack was requested.  If ondisk ack wasn't requested, it's not
> woken up, success or error.
> 
> By not setting CEPH_OSD_FLAG_ONDISK, the submitter is indicating that
> *any* reply is fine.  Blocking on the safe reply instead of any reply
> in the !CEPH_OSD_FLAG_ONDISK case sounds wrong.
> 
> What is the motivation for this patch?

I saw fsstress hang at ceph_osdc_sync several times.

[52310.533483] sysrq: SysRq : Show Blocked State
[52310.534057]   task                        PC stack   pid father
[52310.534653] fsstress        D14656  2836   2834 0x00000000
[52310.535568]  ffff880036cff200 ffff88003e2dac80 0000000000002fa3 ffff88003e3142c0
[52310.536529]  ffff88003fd18318 ffffc900038b7d48 ffffffff819354cc ffffffff00000000
[52310.537464]  000000013e314ae0 ffff88003e314890 ffff88003fd18318 ffff88003fd18300
[52310.538396] Call Trace:
[52310.538762]  [<ffffffff819354cc>] ? __schedule+0x6ac/0x880
[52310.539326]  [<ffffffff81935713>] schedule+0x73/0x90
[52310.539840]  [<ffffffff81939d21>] schedule_timeout+0x31/0x5d0
[52310.540425]  [<ffffffff810ac4a6>] ? mark_held_locks+0x76/0x90
[52310.540980]  [<ffffffff8193b447>] ? _raw_spin_unlock_irq+0x27/0x50
[52310.541582]  [<ffffffff810ac64d>] ? trace_hardirqs_on_caller+0x18d/0x1c0
[52310.542217]  [<ffffffff81936a1b>] wait_for_completion+0x8b/0x100
[52310.542788]  [<ffffffff8108dc10>] ? wake_up_q+0x70/0x70
[52310.543380]  [<ffffffffa00116b8>] ceph_osdc_sync+0xc8/0x150 [libceph]
[52310.543990]  [<ffffffff811eaed0>] ? SyS_tee+0x3b0/0x3b0
[52310.544569]  [<ffffffffa005c3ea>] ceph_sync_fs+0x3a/0xb0 [ceph]
[52310.545158]  [<ffffffff811eaeeb>] sync_fs_one_sb+0x1b/0x20
[52310.545699]  [<ffffffff811b931e>] iterate_supers+0x7e/0xe0
[52310.546258]  [<ffffffff811eb1e0>] sys_sync+0x50/0x80
[52310.546849]  [<ffffffff8193bb6a>] entry_SYSCALL_64_fastpath+0x18/0xad

Ilya Dryomov Nov. 29, 2016, 3:29 p.m. UTC | #5
On Tue, Nov 29, 2016 at 4:03 PM, Yan, Zheng <zyan@redhat.com> wrote:
> [...]
> I saw fsstress hang at ceph_osdc_sync several times.
> [stack trace snipped]

Did you happen to capture /osdc?  It's probably the pool perm check
thing

    wr_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ACK;
    osd_req_op_init(wr_req, 0, CEPH_OSD_OP_CREATE, CEPH_OSD_OP_FLAG_EXCL);

but it would be good to confirm.

Thanks,

                Ilya
Yan, Zheng Nov. 30, 2016, 12:30 a.m. UTC | #6
On Tue, Nov 29, 2016 at 11:29 PM, Ilya Dryomov <idryomov@gmail.com> wrote:
> [...]
>
> Did you happen to capture /osdc?  It's probably the pool perm check
> thing

osdc is completely empty.

Ilya Dryomov Nov. 30, 2016, 9:06 a.m. UTC | #7
On Wed, Nov 30, 2016 at 1:30 AM, Yan, Zheng <ukernel@gmail.com> wrote:
> [...]
>
> osdc is completely empty.

If you haven't already done so while investigating, could you please
dump the offending @req fields?  I want to make sure it's the pool perm
check request...

Thanks,

                Ilya
Yan, Zheng Nov. 30, 2016, 10:02 a.m. UTC | #8
> On 30 Nov 2016, at 17:06, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> [...]
>>> Did you happen to capture /osdc?  It's probably the pool perm check
>>> thing
>> 
>> osdc is completely empty.
> 
> If you haven't already done so while investigating, could you please
> dump the offending @req fields?  I want to make sure it's the pool perm
> check request…

Sorry, I have already rebooted that machine. 

Yan, Zheng
Ilya Dryomov Dec. 14, 2016, 9:58 p.m. UTC | #9
On Wed, Nov 30, 2016 at 11:02 AM, Yan, Zheng <zyan@redhat.com> wrote:
> [...]
> Sorry, I have already rebooted that machine.

Hi Zheng,

I took it a step further and also fixed libceph cancellation code:

https://github.com/ceph/ceph-client/commit/c297eb42690b904fb5b78dd9ad001bafe25f49ec
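
The gist of that commit, as a rough sketch rather than the verbatim change
(helper names per the 4.9-era osd_client; see the commit for the real diff):
every path that retires a request, whether a reply, an error, or
a cancellation, funnels through one place that signals the sync waiter.

    static void complete_request(struct ceph_osd_request *req, int err)
    {
            req->r_result = err;
            finish_request(req);                    /* unlink from trees */
            __complete_request(req);                /* r_callback */
            complete_all(&req->r_safe_completion);  /* wake ceph_osdc_sync() */
            ceph_osdc_put_request(req);
    }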

I squashed your patch to reduce rename churn.  Let me know if that's OK
with you.

Thanks,

                Ilya
Yan, Zheng Dec. 15, 2016, 7:58 a.m. UTC | #10
> On 15 Dec 2016, at 05:58, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> [...]
> 
> Hi Zheng,
> 
> I took it a step further and also fixed libceph cancellation code:
> 
> https://github.com/ceph/ceph-client/commit/c297eb42690b904fb5b78dd9ad001bafe25f49ec
> 

looks good

Yan, Zheng

Patch

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index e6ae15b..54ffe8b 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2934,8 +2934,7 @@ static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg)
 			dout("req %p tid %llu cb\n", req, req->r_tid);
 			__complete_request(req);
 		}
-		if (m.flags & CEPH_OSD_FLAG_ONDISK)
-			complete_all(&req->r_safe_completion);
+		complete_all(&req->r_safe_completion);
 		ceph_osdc_put_request(req);
 	} else {
 		if (req->r_unsafe_callback) {