diff mbox

[1/2] rbd: don't queue watch delayed work when we are removing device

Message ID 1527564161-17328-1-git-send-email-dongsheng.yang@easystack.cn (mailing list archive)
State New, archived
Headers show

Commit Message

Dongsheng Yang May 29, 2018, 3:22 a.m. UTC
We will cancel all watch delayed work in cancel_delayed_work_sync(&rbd_dev->watch_dwork);
If we queue delayed work after this, there will be a use-after-free problem:

[  549.932085] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  549.934134] PGD 0 P4D 0
[  549.935145] Oops: 0000 [#1] SMP PTI
[  549.936283] Modules linked in: rbd(OE) libceph(OE) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg cfg80211 rfkill snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core pcbc snd_hwdep snd_seq mbcache aesni_intel snd_seq_device jbd2 crypto_simd nfsd cryptd glue_helper snd_pcm snd_timer auth_rpcgss pcspkr snd virtio_balloon nfs_acl soundcore i2c_piix4 lockd grace sunrpc ip_tables xfs libcrc32c virtio_console virtio_blk ata_generic pata_acpi 8139too qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata crc32c_intel virtio_pci 8139cp virtio_ring i2c_core mii virtio floppy serio_raw dm_mirror dm_region_hash
[  549.951835]  dm_log dm_mod dax [last unloaded: libceph]
[  549.953490] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
[  549.955502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  549.957246] RIP: 0010:__queue_work+0x6a/0x3b0
[  549.958744] RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
[  549.960374] RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
[  549.962297] RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
[  549.964216] RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
[  549.966136] R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
[  549.968070] R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
[  549.969999] FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
[  549.972069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  549.973775] CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
[  549.975695] Call Trace:
[  549.976900]  <IRQ>
[  549.978033]  ? __queue_work+0x3b0/0x3b0
[  549.979442]  call_timer_fn+0x2d/0x130
[  549.980824]  run_timer_softirq+0x16e/0x430
[  549.982263]  ? tick_sched_timer+0x37/0x70
[  549.983691]  __do_softirq+0xd2/0x280
[  549.985035]  irq_exit+0xd5/0xe0
[  549.986316]  smp_apic_timer_interrupt+0x6c/0x130
[  549.987835]  apic_timer_interrupt+0xf/0x20

This patch forbid to queue watch_dwork when we are removing device.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
---
 drivers/block/rbd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Ilya Dryomov May 30, 2018, 7:11 p.m. UTC | #1
On Tue, May 29, 2018 at 5:22 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> We will cancel all watch delayed work in cancel_delayed_work_sync(&rbd_dev->watch_dwork);
> If we queue delayed work after this, there will be a use-after-free problem:
>
> [  549.932085] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [  549.934134] PGD 0 P4D 0
> [  549.935145] Oops: 0000 [#1] SMP PTI
> [  549.936283] Modules linked in: rbd(OE) libceph(OE) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg cfg80211 rfkill snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core pcbc snd_hwdep snd_seq mbcache aesni_intel snd_seq_device jbd2 crypto_simd nfsd cryptd glue_helper snd_pcm snd_timer auth_rpcgss pcspkr snd virtio_balloon nfs_acl soundcore i2c_piix4 lockd grace sunrpc ip_tables xfs libcrc32c virtio_console virtio_blk ata_generic pata_acpi 8139too qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata crc32c_intel virtio_pci 8139cp virtio_ring i2c_core mii virtio floppy serio_raw dm_mirror dm_region_hash
> [  549.951835]  dm_log dm_mod dax [last unloaded: libceph]
> [  549.953490] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
> [  549.955502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [  549.957246] RIP: 0010:__queue_work+0x6a/0x3b0
> [  549.958744] RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
> [  549.960374] RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
> [  549.962297] RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
> [  549.964216] RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
> [  549.966136] R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
> [  549.968070] R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
> [  549.969999] FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
> [  549.972069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  549.973775] CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
> [  549.975695] Call Trace:
> [  549.976900]  <IRQ>
> [  549.978033]  ? __queue_work+0x3b0/0x3b0
> [  549.979442]  call_timer_fn+0x2d/0x130
> [  549.980824]  run_timer_softirq+0x16e/0x430
> [  549.982263]  ? tick_sched_timer+0x37/0x70
> [  549.983691]  __do_softirq+0xd2/0x280
> [  549.985035]  irq_exit+0xd5/0xe0
> [  549.986316]  smp_apic_timer_interrupt+0x6c/0x130
> [  549.987835]  apic_timer_interrupt+0xf/0x20
>
> This patch forbid to queue watch_dwork when we are removing device.
>
> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
> ---
>  drivers/block/rbd.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 2b4e90d..d1d8f46 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -3475,9 +3475,13 @@ static void rbd_reregister_watch(struct work_struct *work)
>                         set_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags);
>                         wake_requests(rbd_dev, true);
>                 } else {
> -                       queue_delayed_work(rbd_dev->task_wq,
> -                                          &rbd_dev->watch_dwork,
> -                                          RBD_RETRY_DELAY);
> +                       spin_lock_irq(&rbd_dev->lock);
> +                       if (!test_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags)) {
> +                               queue_delayed_work(rbd_dev->task_wq,
> +                                                  &rbd_dev->watch_dwork,
> +                                                  RBD_RETRY_DELAY);
> +                       }
> +                       spin_unlock_irq(&rbd_dev->lock);
>                 }
>                 mutex_unlock(&rbd_dev->watch_mutex);
>                 return;

Hi Dongsheng,

What made you think it is rbd (or ceph) related?  Do you know what gets
dereferenced?  Is it reproducible?

While watch-related code could have been better, rbd_dev doesn't get
destroyed until rbd_dev->task_wq is flushed in rbd_dev_release(), so at
first sight I don't see a problem.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dongsheng Yang May 31, 2018, 7:18 a.m. UTC | #2
On 05/31/2018 03:11 AM, Ilya Dryomov wrote:
> On Tue, May 29, 2018 at 5:22 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>> We will cancel all watch delayed work in cancel_delayed_work_sync(&rbd_dev->watch_dwork);
>> If we queue delayed work after this, there will be a use-after-free problem:
>>
>> [  549.932085] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>> [  549.934134] PGD 0 P4D 0
>> [  549.935145] Oops: 0000 [#1] SMP PTI
>> [  549.936283] Modules linked in: rbd(OE) libceph(OE) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg cfg80211 rfkill snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core pcbc snd_hwdep snd_seq mbcache aesni_intel snd_seq_device jbd2 crypto_simd nfsd cryptd glue_helper snd_pcm snd_timer auth_rpcgss pcspkr snd virtio_balloon nfs_acl soundcore i2c_piix4 lockd grace sunrpc ip_tables xfs libcrc32c virtio_console virtio_blk ata_generic pata_acpi 8139too qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata crc32c_intel virtio_pci 8139cp virtio_ring i2c_core mii virtio floppy serio_raw dm_mirror dm_region_hash
>> [  549.951835]  dm_log dm_mod dax [last unloaded: libceph]
>> [  549.953490] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
>> [  549.955502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [  549.957246] RIP: 0010:__queue_work+0x6a/0x3b0
>> [  549.958744] RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
>> [  549.960374] RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
>> [  549.962297] RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
>> [  549.964216] RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
>> [  549.966136] R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
>> [  549.968070] R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
>> [  549.969999] FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
>> [  549.972069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  549.973775] CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
>> [  549.975695] Call Trace:
>> [  549.976900]  <IRQ>
>> [  549.978033]  ? __queue_work+0x3b0/0x3b0
>> [  549.979442]  call_timer_fn+0x2d/0x130
>> [  549.980824]  run_timer_softirq+0x16e/0x430
>> [  549.982263]  ? tick_sched_timer+0x37/0x70
>> [  549.983691]  __do_softirq+0xd2/0x280
>> [  549.985035]  irq_exit+0xd5/0xe0
>> [  549.986316]  smp_apic_timer_interrupt+0x6c/0x130
>> [  549.987835]  apic_timer_interrupt+0xf/0x20
>>
>> This patch forbid to queue watch_dwork when we are removing device.
>>
>> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
>> ---
>>   drivers/block/rbd.c | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 2b4e90d..d1d8f46 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -3475,9 +3475,13 @@ static void rbd_reregister_watch(struct work_struct *work)
>>                          set_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags);
>>                          wake_requests(rbd_dev, true);
>>                  } else {
>> -                       queue_delayed_work(rbd_dev->task_wq,
>> -                                          &rbd_dev->watch_dwork,
>> -                                          RBD_RETRY_DELAY);
>> +                       spin_lock_irq(&rbd_dev->lock);
>> +                       if (!test_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags)) {
>> +                               queue_delayed_work(rbd_dev->task_wq,
>> +                                                  &rbd_dev->watch_dwork,
>> +                                                  RBD_RETRY_DELAY);
>> +                       }
>> +                       spin_unlock_irq(&rbd_dev->lock);
>>                  }
>>                  mutex_unlock(&rbd_dev->watch_mutex);
>>                  return;
> Hi Dongsheng,
>
> What made you think it is rbd (or ceph) related?  Do you know what gets
> dereferenced?  Is it reproducible?

Hi Ilya,
     There is a simple reproduce script:

./vstart.sh  -k -l --bluestore
rbd map -o osd_request_timeout=10 test1
time dd if=/dev/zero of=/dev/rbd0 bs=64K count=1000 oflag=direct &
sleep 1
ps -ef|grep -E "ceph-mon|ceph-osd"|gawk '{print "kill -9 "$2}'|bash
rbd unmap -o force /dev/rbd0

But maybe you need to run this script in a loop, because that's not 100% 
happen.
>
> While watch-related code could have been better, rbd_dev doesn't get
> destroyed until rbd_dev->task_wq is flushed in rbd_dev_release(), so at
> first sight I don't see a problem.

unmap Thread watch Thread                                               
timer
do_rbd_remove
| |                                                       |
   |-- test_and_set_bit(RBD_DEV_FLAG_REMOVING) 
|                                                       |
| |                                                       |
   |-- cancel_tasks_sync(rbd_dev) 
|                                                       |
| |-- queue_delayed_work for watch                        |
| |                                                       |
   |-- destroy_workqueue(rbd_dev->task_wq); 
|                                                       |
| |                                                       |
                 |-- drain_workqueue(wq); 
|                                                       |
                 |-- destroy other resources in wq 
|                                                       |
| |                                                       |
| |                                                       |-- call_timer_fn
| |
|-- __queue_work()



I draw a sequential for this case,

When we are going to destroy workqueue, we need to cancel_tasks_sync to 
cancel all
delayed work in watch_dwork timer. but this is not protected by lock, 
thus after this work, we can
queue_delayed_work for watch too. In this case, the timer maybe expire 
after drain_workqueue(), then
this work Escape the cancel in timer by cancel_tasks_sync(rbd_dev) , and 
escape drain_workqueu()
by destroy_workqueue().

When the timer expire and call timer fn, we will get problem if we 
already call destroy_workqueue().

So we have to ensure we can't queue delayed work after cancel_tasks_sync().

Thanx
Dongsheng
>
> Thanks,
>
>                  Ilya
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dongsheng Yang May 31, 2018, 7:23 a.m. UTC | #3
On 05/31/2018 03:18 PM, Dongsheng Yang wrote:
>
> On 05/31/2018 03:11 AM, Ilya Dryomov wrote:
>> On Tue, May 29, 2018 at 5:22 AM, Dongsheng Yang
>> <dongsheng.yang@easystack.cn> wrote:
>>> We will cancel all watch delayed work in 
>>> cancel_delayed_work_sync(&rbd_dev->watch_dwork);
>>> If we queue delayed work after this, there will be a use-after-free 
>>> problem:
>>>
>>> [  549.932085] BUG: unable to handle kernel NULL pointer dereference 
>>> at 0000000000000000
>>> [  549.934134] PGD 0 P4D 0
>>> [  549.935145] Oops: 0000 [#1] SMP PTI
>>> [  549.936283] Modules linked in: rbd(OE) libceph(OE) tcp_diag 
>>> udp_diag inet_diag unix_diag af_packet_diag netlink_diag 
>>> dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables 
>>> iptable_filter sg cfg80211 rfkill snd_hda_codec_generic ext4 
>>> snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul 
>>> ghash_clmulni_intel snd_hda_core pcbc snd_hwdep snd_seq mbcache 
>>> aesni_intel snd_seq_device jbd2 crypto_simd nfsd cryptd glue_helper 
>>> snd_pcm snd_timer auth_rpcgss pcspkr snd virtio_balloon nfs_acl 
>>> soundcore i2c_piix4 lockd grace sunrpc ip_tables xfs libcrc32c 
>>> virtio_console virtio_blk ata_generic pata_acpi 8139too qxl 
>>> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
>>> ata_piix libata crc32c_intel virtio_pci 8139cp virtio_ring i2c_core 
>>> mii virtio floppy serio_raw dm_mirror dm_region_hash
>>> [  549.951835]  dm_log dm_mod dax [last unloaded: libceph]
>>> [  549.953490] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           
>>> OE     4.17.0-rc6+ #13
>>> [  549.955502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>> [  549.957246] RIP: 0010:__queue_work+0x6a/0x3b0
>>> [  549.958744] RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
>>> [  549.960374] RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 
>>> 0000000000000000
>>> [  549.962297] RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 
>>> 0000000000000000
>>> [  549.964216] RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: 
>>> ffff9427df1c3f00
>>> [  549.966136] R10: 0000000000000004 R11: 0000000000000005 R12: 
>>> ffff9427cfb85970
>>> [  549.968070] R13: 0000000000002000 R14: 000000000001eca0 R15: 
>>> 0000000000000007
>>> [  549.969999] FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) 
>>> knlGS:0000000000000000
>>> [  549.972069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  549.973775] CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 
>>> 00000000000206e0
>>> [  549.975695] Call Trace:
>>> [  549.976900]  <IRQ>
>>> [  549.978033]  ? __queue_work+0x3b0/0x3b0
>>> [  549.979442]  call_timer_fn+0x2d/0x130
>>> [  549.980824]  run_timer_softirq+0x16e/0x430
>>> [  549.982263]  ? tick_sched_timer+0x37/0x70
>>> [  549.983691]  __do_softirq+0xd2/0x280
>>> [  549.985035]  irq_exit+0xd5/0xe0
>>> [  549.986316]  smp_apic_timer_interrupt+0x6c/0x130
>>> [  549.987835]  apic_timer_interrupt+0xf/0x20
>>>
>>> This patch forbid to queue watch_dwork when we are removing device.
>>>
>>> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
>>> ---
>>>   drivers/block/rbd.c | 10 +++++++---
>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 2b4e90d..d1d8f46 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -3475,9 +3475,13 @@ static void rbd_reregister_watch(struct 
>>> work_struct *work)
>>>                          set_bit(RBD_DEV_FLAG_BLACKLISTED, 
>>> &rbd_dev->flags);
>>>                          wake_requests(rbd_dev, true);
>>>                  } else {
>>> - queue_delayed_work(rbd_dev->task_wq,
>>> - &rbd_dev->watch_dwork,
>>> -                                          RBD_RETRY_DELAY);
>>> +                       spin_lock_irq(&rbd_dev->lock);
>>> +                       if (!test_bit(RBD_DEV_FLAG_REMOVING, 
>>> &rbd_dev->flags)) {
>>> + queue_delayed_work(rbd_dev->task_wq,
>>> + &rbd_dev->watch_dwork,
>>> + RBD_RETRY_DELAY);
>>> +                       }
>>> + spin_unlock_irq(&rbd_dev->lock);
>>>                  }
>>>                  mutex_unlock(&rbd_dev->watch_mutex);
>>>                  return;
>> Hi Dongsheng,
>>
>> What made you think it is rbd (or ceph) related?  Do you know what gets
>> dereferenced?  Is it reproducible?
>
> Hi Ilya,
>     There is a simple reproduce script:
>
> ./vstart.sh  -k -l --bluestore
> rbd map -o osd_request_timeout=10 test1
> time dd if=/dev/zero of=/dev/rbd0 bs=64K count=1000 oflag=direct &
> sleep 1
> ps -ef|grep -E "ceph-mon|ceph-osd"|gawk '{print "kill -9 "$2}'|bash
> rbd unmap -o force /dev/rbd0
>
> But maybe you need to run this script in a loop, because that's not 
> 100% happen.
>>
>> While watch-related code could have been better, rbd_dev doesn't get
>> destroyed until rbd_dev->task_wq is flushed in rbd_dev_release(), so at
>> first sight I don't see a problem.
>
> unmap Thread watch 
> Thread                                               timer
> do_rbd_remove
> | |                                                       |
>   |-- test_and_set_bit(RBD_DEV_FLAG_REMOVING) 
> |                                                       |
> | |                                                       |
>   |-- cancel_tasks_sync(rbd_dev) 
> |                                                       |
> | |-- queue_delayed_work for watch                        |
> | |                                                       |
>   |-- destroy_workqueue(rbd_dev->task_wq); 
> |                                                       |
> | |                                                       |
>                 |-- drain_workqueue(wq); 
> |                                                       |
>                 |-- destroy other resources in wq 
> |                                                       |
> | |                                                       |
> | |                                                       |-- 
> call_timer_fn
> | |
> |-- __queue_work()

Sorry for the unreadable format of it. Please read the attachment for it.
>
>
>
> I draw a sequential for this case,
>
> When we are going to destroy workqueue, we need to cancel_tasks_sync 
> to cancel all
> delayed work in watch_dwork timer. but this is not protected by lock, 
> thus after this work, we can
> queue_delayed_work for watch too. In this case, the timer maybe expire 
> after drain_workqueue(), then
> this work Escape the cancel in timer by cancel_tasks_sync(rbd_dev) , 
> and escape drain_workqueu()
> by destroy_workqueue().
>
> When the timer expire and call timer fn, we will get problem if we 
> already call destroy_workqueue().
>
> So we have to ensure we can't queue delayed work after 
> cancel_tasks_sync().
>
> Thanx
> Dongsheng
>>
>> Thanks,
>>
>>                  Ilya
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
i



unmap Thread							watch Thread						   timer
do_rbd_remove
  |									|							|
  |-- test_and_set_bit(RBD_DEV_FLAG_REMOVING)				|							|
  |									|							|
  |-- cancel_tasks_sync(rbd_dev)					|							|
  |									|-- queue_delayed_work for watch			|
  |									|							|
  |-- destroy_workqueue(rbd_dev->task_wq);				|							|
           	|							|							|
           	|-- drain_workqueue(wq);				|							|
		|-- destroy other resources in wq			|							|
		|							|							|
		|							|							|-- call_timer_fn
		|														|
																	|-- __queue_work()
diff mbox

Patch

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 2b4e90d..d1d8f46 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3475,9 +3475,13 @@  static void rbd_reregister_watch(struct work_struct *work)
 			set_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags);
 			wake_requests(rbd_dev, true);
 		} else {
-			queue_delayed_work(rbd_dev->task_wq,
-					   &rbd_dev->watch_dwork,
-					   RBD_RETRY_DELAY);
+			spin_lock_irq(&rbd_dev->lock);
+			if (!test_bit(RBD_DEV_FLAG_REMOVING, &rbd_dev->flags)) {
+				queue_delayed_work(rbd_dev->task_wq,
+						   &rbd_dev->watch_dwork,
+						   RBD_RETRY_DELAY);
+			}
+			spin_unlock_irq(&rbd_dev->lock);
 		}
 		mutex_unlock(&rbd_dev->watch_mutex);
 		return;