diff mbox series

scsi: scsi_dh_rdac: Avoid crash when a disk attach failed

Message ID 20230803112841.588822-1-huangcun@sangfor.com.cn (mailing list archive)
State New, archived
Headers show
Series scsi: scsi_dh_rdac: Avoid crash when a disk attach failed | expand

Commit Message

Huang Cun Aug. 3, 2023, 11:28 a.m. UTC
When a disk fails to attach, the struct rdac_dh_data is released,
but it is not removed from the ctlr->dh_list. When attaching another
disk, the released rdac_dh_data will be accessed and the following
BUG_ON() may be observed:

[  414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
...
[  423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
[  423.615731] invalid opcode: 0000 [#1] SMP NOPTI
...
[  423.623247] Call Trace:
[  423.623598]  rdac_bus_attach+0x203/0x4c0
[  423.623949]  ? scsi_dh_handler_attach+0x2d/0x90
[  423.624300]  scsi_dh_handler_attach+0x2d/0x90
[  423.624652]  scsi_sysfs_add_sdev+0x88/0x270
[  423.625004]  scsi_probe_and_add_lun+0xc47/0xd50
[  423.625354]  scsi_report_lun_scan+0x339/0x3b0
[  423.625705]  __scsi_scan_target+0xe9/0x220
[  423.626056]  scsi_scan_target+0xf6/0x100
[  423.626404]  fc_scsi_scan_rport+0xa5/0xb0
[  423.626757]  process_one_work+0x15e/0x3f0
[  423.627106]  worker_thread+0x4c/0x440
[  423.627453]  ? rescuer_thread+0x350/0x350
[  423.627804]  kthread+0xf8/0x130
[  423.628153]  ? kthread_destroy_worker+0x40/0x40
[  423.628509]  ret_from_fork+0x1f/0x40

Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
Signed-off-by: Huang Cun <huangcun@sangfor.com.cn>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Ding Hui Sept. 6, 2023, 8:37 a.m. UTC | #1
Friendly ping.

On 2023/8/3 19:28, Huang Cun wrote:
> When a disk fails to attach, the struct rdac_dh_data is released,
> but it is not removed from the ctlr->dh_list. When attaching another
> disk, the released rdac_dh_data will be accessed and the following
> BUG_ON() may be observed:
> 
> [  414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
> ...
> [  423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
> [  423.615731] invalid opcode: 0000 [#1] SMP NOPTI
> ...
> [  423.623247] Call Trace:
> [  423.623598]  rdac_bus_attach+0x203/0x4c0
> [  423.623949]  ? scsi_dh_handler_attach+0x2d/0x90
> [  423.624300]  scsi_dh_handler_attach+0x2d/0x90
> [  423.624652]  scsi_sysfs_add_sdev+0x88/0x270
> [  423.625004]  scsi_probe_and_add_lun+0xc47/0xd50
> [  423.625354]  scsi_report_lun_scan+0x339/0x3b0
> [  423.625705]  __scsi_scan_target+0xe9/0x220
> [  423.626056]  scsi_scan_target+0xf6/0x100
> [  423.626404]  fc_scsi_scan_rport+0xa5/0xb0
> [  423.626757]  process_one_work+0x15e/0x3f0
> [  423.627106]  worker_thread+0x4c/0x440
> [  423.627453]  ? rescuer_thread+0x350/0x350
> [  423.627804]  kthread+0xf8/0x130
> [  423.628153]  ? kthread_destroy_worker+0x40/0x40
> [  423.628509]  ret_from_fork+0x1f/0x40
> 
> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
> Signed-off-by: Huang Cun <huangcun@sangfor.com.cn>
> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
> Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
> ---
>   drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
> index c5538645057a..9d487c2b7708 100644
> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>   
>   clean_ctlr:
>   	spin_lock(&list_lock);
> +	list_del_rcu(&h->node);
>   	kref_put(&h->ctlr->kref, release_controller);
>   	spin_unlock(&list_lock);
> +	synchronize_rcu();
>   
>   failed:
>   	kfree(h);
Mike Christie Sept. 6, 2023, 3:51 p.m. UTC | #2
On 8/3/23 6:28 AM, Huang Cun wrote:
> When a disk fails to attach, the struct rdac_dh_data is released,
> but it is not removed from the ctlr->dh_list. When attaching another
> disk, the released rdac_dh_data will be accessed and the following
> BUG_ON() may be observed:
> 
> [  414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
> ...
> [  423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
> [  423.615731] invalid opcode: 0000 [#1] SMP NOPTI
> ...
> [  423.623247] Call Trace:
> [  423.623598]  rdac_bus_attach+0x203/0x4c0
> [  423.623949]  ? scsi_dh_handler_attach+0x2d/0x90
> [  423.624300]  scsi_dh_handler_attach+0x2d/0x90
> [  423.624652]  scsi_sysfs_add_sdev+0x88/0x270
> [  423.625004]  scsi_probe_and_add_lun+0xc47/0xd50
> [  423.625354]  scsi_report_lun_scan+0x339/0x3b0
> [  423.625705]  __scsi_scan_target+0xe9/0x220
> [  423.626056]  scsi_scan_target+0xf6/0x100
> [  423.626404]  fc_scsi_scan_rport+0xa5/0xb0
> [  423.626757]  process_one_work+0x15e/0x3f0
> [  423.627106]  worker_thread+0x4c/0x440
> [  423.627453]  ? rescuer_thread+0x350/0x350
> [  423.627804]  kthread+0xf8/0x130
> [  423.628153]  ? kthread_destroy_worker+0x40/0x40
> [  423.628509]  ret_from_fork+0x1f/0x40
> 
> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
> Signed-off-by: Huang Cun <huangcun@sangfor.com.cn>
> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
> Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
> ---
>  drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
> index c5538645057a..9d487c2b7708 100644
> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>  
>  clean_ctlr:
>  	spin_lock(&list_lock);
> +	list_del_rcu(&h->node);
>  	kref_put(&h->ctlr->kref, release_controller);
>  	spin_unlock(&list_lock);
> +	synchronize_rcu();
>  

Should this be:

spin_lock(&list_lock);
list_del_rcu(&h->node);
spin_unlock(&list_lock);

synchronize_rcu();

kref_put(&h->ctlr->kref, release_controller);


?

If you do the synchronize_rcu after the kref_put, then the kref_put
could free the rdac_dh_data, while check_ownership is still
accessing the rdac_dh_data, right?
Ding Hui Sept. 7, 2023, 1:45 a.m. UTC | #3
On 2023/9/6 23:51, Mike Christie wrote:
> On 8/3/23 6:28 AM, Huang Cun wrote:
>> When a disk fails to attach, the struct rdac_dh_data is released,
>> but it is not removed from the ctlr->dh_list. When attaching another
>> disk, the released rdac_dh_data will be accessed and the following
>> BUG_ON() may be observed:
>>
>> [  414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
>> ...
>> [  423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
>> [  423.615731] invalid opcode: 0000 [#1] SMP NOPTI
>> ...
>> [  423.623247] Call Trace:
>> [  423.623598]  rdac_bus_attach+0x203/0x4c0
>> [  423.623949]  ? scsi_dh_handler_attach+0x2d/0x90
>> [  423.624300]  scsi_dh_handler_attach+0x2d/0x90
>> [  423.624652]  scsi_sysfs_add_sdev+0x88/0x270
>> [  423.625004]  scsi_probe_and_add_lun+0xc47/0xd50
>> [  423.625354]  scsi_report_lun_scan+0x339/0x3b0
>> [  423.625705]  __scsi_scan_target+0xe9/0x220
>> [  423.626056]  scsi_scan_target+0xf6/0x100
>> [  423.626404]  fc_scsi_scan_rport+0xa5/0xb0
>> [  423.626757]  process_one_work+0x15e/0x3f0
>> [  423.627106]  worker_thread+0x4c/0x440
>> [  423.627453]  ? rescuer_thread+0x350/0x350
>> [  423.627804]  kthread+0xf8/0x130
>> [  423.628153]  ? kthread_destroy_worker+0x40/0x40
>> [  423.628509]  ret_from_fork+0x1f/0x40
>>
>> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
>> Signed-off-by: Huang Cun <huangcun@sangfor.com.cn>
>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
>> Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
>> ---
>>   drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
>> index c5538645057a..9d487c2b7708 100644
>> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
>> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
>> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>>   
>>   clean_ctlr:
>>   	spin_lock(&list_lock);
>> +	list_del_rcu(&h->node);
>>   	kref_put(&h->ctlr->kref, release_controller);
>>   	spin_unlock(&list_lock);
>> +	synchronize_rcu();
>>   
> 
> Should this be:
> 
> spin_lock(&list_lock);
> list_del_rcu(&h->node);
> spin_unlock(&list_lock);
> 
> synchronize_rcu();
> 
> kref_put(&h->ctlr->kref, release_controller);
> 
> 
> ?
> 
> If you do the synchronize_rcu after the kref_put, then the kref_put
> could free the rdac_dh_data, while check_ownership is still
> accessing the rdac_dh_data, right?
> 

You are right.

But I think we should keep the kref_put() and release callback be protected by list_lock, and only free
the ctlr after synchronize_rcu().

So how about the additional modify (not yet tested):

--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -166,6 +166,7 @@ struct rdac_controller {
  	struct scsi_device	*ms_sdev;
  	struct list_head	ms_head;
  	struct list_head	dh_list;
+	struct rcu_head		rcu;
  };
  
  struct c2_inquiry {
@@ -320,7 +321,7 @@ static void release_controller(struct kref *kref)
  	ctlr = container_of(kref, struct rdac_controller, kref);
  
  	list_del(&ctlr->node);
-	kfree(ctlr);
+	kfree_rcu(ctlr, rcu);
  }
  
  static struct rdac_controller *get_controller(int index, char *array_name,
Ding Hui Sept. 8, 2023, 1:03 a.m. UTC | #4
On 2023/9/7 9:45, Ding Hui wrote:
> On 2023/9/6 23:51, Mike Christie wrote:
>> On 8/3/23 6:28 AM, Huang Cun wrote:
>>> When a disk fails to attach, the struct rdac_dh_data is released,
>>> but it is not removed from the ctlr->dh_list. When attaching another
>>> disk, the released rdac_dh_data will be accessed and the following
>>> BUG_ON() may be observed:
>>>
>>> [  414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
>>> ...
>>> [  423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
>>> [  423.615731] invalid opcode: 0000 [#1] SMP NOPTI
>>> ...
>>> [  423.623247] Call Trace:
>>> [  423.623598]  rdac_bus_attach+0x203/0x4c0
>>> [  423.623949]  ? scsi_dh_handler_attach+0x2d/0x90
>>> [  423.624300]  scsi_dh_handler_attach+0x2d/0x90
>>> [  423.624652]  scsi_sysfs_add_sdev+0x88/0x270
>>> [  423.625004]  scsi_probe_and_add_lun+0xc47/0xd50
>>> [  423.625354]  scsi_report_lun_scan+0x339/0x3b0
>>> [  423.625705]  __scsi_scan_target+0xe9/0x220
>>> [  423.626056]  scsi_scan_target+0xf6/0x100
>>> [  423.626404]  fc_scsi_scan_rport+0xa5/0xb0
>>> [  423.626757]  process_one_work+0x15e/0x3f0
>>> [  423.627106]  worker_thread+0x4c/0x440
>>> [  423.627453]  ? rescuer_thread+0x350/0x350
>>> [  423.627804]  kthread+0xf8/0x130
>>> [  423.628153]  ? kthread_destroy_worker+0x40/0x40
>>> [  423.628509]  ret_from_fork+0x1f/0x40
>>>
>>> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
>>> Signed-off-by: Huang Cun <huangcun@sangfor.com.cn>
>>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
>>> Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
>>> ---
>>>   drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
>>> index c5538645057a..9d487c2b7708 100644
>>> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
>>> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
>>> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>>>   clean_ctlr:
>>>       spin_lock(&list_lock);
>>> +    list_del_rcu(&h->node);
>>>       kref_put(&h->ctlr->kref, release_controller);
>>>       spin_unlock(&list_lock);
>>> +    synchronize_rcu();
>>
>> Should this be:
>>
>> spin_lock(&list_lock);
>> list_del_rcu(&h->node);
>> spin_unlock(&list_lock);
>>
>> synchronize_rcu();
>>
>> kref_put(&h->ctlr->kref, release_controller);
>>
>>
>> ?
>>
>> If you do the synchronize_rcu after the kref_put, then the kref_put
>> could free the rdac_dh_data, while check_ownership is still
>> accessing the rdac_dh_data, right?
>>
> 
> You are right.
> 
> But I think we should keep the kref_put() and release callback be protected by list_lock, and only free
> the ctlr after synchronize_rcu().
> 

Sorry, I thought again, maybe we don't need to worry about it.

The ctlr->kref is protected by list_lock, and release_controller() DO free the ctlr, NOT rdac_dh_data,
kfree(rdac_dh_data) is already after synchronize_rcu().

If check_ownership() shared the same ctlr, the kref must be greater than or equal to 2, so the ctlr will
not be released by kref_put(). On the other hand, if the ctlr is different, releasing it will not affect
others.
diff mbox series

Patch

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index c5538645057a..9d487c2b7708 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -762,8 +762,10 @@  static int rdac_bus_attach(struct scsi_device *sdev)
 
 clean_ctlr:
 	spin_lock(&list_lock);
+	list_del_rcu(&h->node);
 	kref_put(&h->ctlr->kref, release_controller);
 	spin_unlock(&list_lock);
+	synchronize_rcu();
 
 failed:
 	kfree(h);