diff mbox

[14/14] target: Fix handling of removed LUNs

Message ID 8998eb5e118201e7922cc70296ddc7ed749ab5bb.camel@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche June 21, 2018, 5:53 p.m. UTC
On Wed, 2018-06-20 at 20:32 -0500, Mike Christie wrote:
> I was in the middle of fixing up the sense key value issue in the patch
> like we discussed before (use illegal request instead of unit
> attention), but it looks like there was another bug. If we have 2
> commands going to the same device and they run target_scsi3_ua_check and
> see ua_count=1 TCM_CHECK_CONDITION_UNIT_ATTENTION is returned for both.
> They then call core_scsi3_ua_for_check_condition and one of them
> deallocates a UA dropping ua_count=0. The second command then runs the
> !atomic_read(&deve->ua_count) check and sees the other command has
> decremented it. Or the second one might have passed the ua_count check
> in core_scsi3_ua_for_check_condition and it runs the loop but nothing is
> found since the first command has already removed it. We then return
> bogus asc/ascq values.
> 
> In the attached patch I just return busy status for this race case. It
> seemed easier than trying to add more locking and error handling.
> 
> Some notes/questions on some of the code the patch touched though.
> 
> If translate_sense_reason failed due to a short buffer it returned
> -EINVAL. transport_send_check_condition_and_sense would then end up
> calling transport_handle_queue_full/transport_complete_qf and that would
> just end up failing the command with
> TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE. Do you know if that long journey
> intended or just an accident?
> 
> In this patch I fixed that so it just translated the error to
> TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE immediately. If the patch is ok
> and we meant to return that error code for the short buffer then I can
> break that out into another patch.

Hello Mike,

That's an interesting observation. However, I think that the race described
above can be fixed with fewer code changes. Can you have a look at the three
attached (untested) patches?

Regarding translate_sense_reason() failing due to a short buffer: I don't
think that the current behavior in case of a short sense buffer is correct.
It's probably better to return a sense buffer that is incomplete and with
correct KEY / ASC / ASCQ values instead of modifying these values. How
about changing the code at the end of translate_sense_reason() as follows?

	if (si->add_sector_info)
		WARN_ON_ONCE(scsi_set_sense_information(buffer,
						  cmd->scsi_sense_length,
						  cmd->bad_sector) < 0);

	return 0;

Thanks,

Bart.

Comments

Mike Christie June 21, 2018, 7:10 p.m. UTC | #1
On 06/21/2018 12:53 PM, Bart Van Assche wrote:
> On Wed, 2018-06-20 at 20:32 -0500, Mike Christie wrote:
>> > I was in the middle of fixing up the sense key value issue in the patch
>> > like we discussed before (use illegal request instead of unit
>> > attention), but it looks like there was another bug. If we have 2
>> > commands going to the same device and they run target_scsi3_ua_check and
>> > see ua_count=1 TCM_CHECK_CONDITION_UNIT_ATTENTION is returned for both.
>> > They then call core_scsi3_ua_for_check_condition and one of them
>> > deallocates a UA dropping ua_count=0. The second command then runs the
>> > !atomic_read(&deve->ua_count) check and sees the other command has
>> > decremented it. Or the second one might have passed the ua_count check
>> > in core_scsi3_ua_for_check_condition and it runs the loop but nothing is
>> > found since the first command has already removed it. We then return
>> > bogus asc/ascq values.
>> > 
>> > In the attached patch I just return busy status for this race case. It
>> > seemed easier than trying to add more locking and error handling.
>> > 
>> > Some notes/questions on some of the code the patch touched though.
>> > 
>> > If translate_sense_reason failed due to a short buffer it returned
>> > -EINVAL. transport_send_check_condition_and_sense would then end up
>> > calling transport_handle_queue_full/transport_complete_qf and that would
>> > just end up failing the command with
>> > TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE. Do you know if that long journey
>> > intended or just an accident?
>> > 
>> > In this patch I fixed that so it just translated the error to
>> > TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE immediately. If the patch is ok
>> > and we meant to return that error code for the short buffer then I can
>> > break that out into another patch.
> Hello Mike,
> 
> That's an interesting observation. However, I think that the race described
> above can be fixed with fewer code changes. Can you have a look at the three
> attached (untested) patches?
> 
> Regarding translate_sense_reason() failing due to a short buffer: I don't
> think that the current behavior in case of a short sense buffer is correct.
> It's probably better to return a sense buffer that is incomplete and with
> correct KEY / ASC / ASCQ values instead of modifying these values. How
> about changing the code at the end of translate_sense_reason() as follows?
> 
> 	if (si->add_sector_info)
> 		WARN_ON_ONCE(scsi_set_sense_information(buffer,
> 						  cmd->scsi_sense_length,
> 						  cmd->bad_sector) < 0);
> 
> 	return 0;

Agree.

> 
> @@ -3156,9 +3167,13 @@ static int translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason)
>  		si = &sense_info_table[(__force int)
>  				       TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE];
>  
> +	key = si->key;
>  	if (reason == TCM_CHECK_CONDITION_UNIT_ATTENTION) {
> -		core_scsi3_ua_for_check_condition(cmd, &asc, &ascq);
> -		WARN_ON_ONCE(asc == 0);
> +		if (!core_scsi3_ua_for_check_condition(cmd, &key, &asc,
> +						       &ascq)) {
> +			cmd->scsi_status = SAM_STAT_BUSY;

Here is another question I had about this code path.

If you hit this case do you need to undo the setting of the
SCF_SENT_CHECK_CONDITION bit done in
transport_send_check_condition_and_sense or does the check for that bit
need to be fixed up.

With the code addition above we have this path setting
SCF_SENT_CHECK_CONDITION so
iscsit_check_task_reassign_expdatasn/iscsit_task_reassign_complete_scsi_cmnd/
transport_send_task_abort would see the CC bit set and return.

If transport_generic_request_failure is passed TCM_LUN_BUSY which gets
translate to SAM_STAT_BUSY then
iscsit_check_task_reassign_expdatasn/iscsit_task_reassign_complete_scsi_cmnd/transport_send_task_abort
would not see the CC bit is set and those functions would not return
right away.

It seems like
iscsit_check_task_reassign_expdatasn/iscsit_task_reassign_complete_scsi_cmnd/
transport_send_task_abort want the same behavior for both check
conditions and commands that completed with non good status, so it
should be checking for the CC bit set and also if non good status has
been returned.


>  	 * The highest priority Unit Attentions are placed at the head of the
> @@ -244,6 +246,7 @@ void core_scsi3_ua_for_check_condition(
>  		 * clearing it.
>  		 */
>  		if (dev->dev_attrib.emulate_ua_intlck_ctrl != 0) {
> +			*key = UNIT_ATTENTION;


Is the key already set to UNIT_ATTENTION here?


Seems nice!
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche June 21, 2018, 8:25 p.m. UTC | #2
On Thu, 2018-06-21 at 14:10 -0500, Mike Christie wrote:
> > 

> > @@ -3156,9 +3167,13 @@ static int translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason)

> >  		si = &sense_info_table[(__force int)

> >  				       TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE];

> >  

> > +	key = si->key;

> >  	if (reason == TCM_CHECK_CONDITION_UNIT_ATTENTION) {

> > -		core_scsi3_ua_for_check_condition(cmd, &asc, &ascq);

> > -		WARN_ON_ONCE(asc == 0);

> > +		if (!core_scsi3_ua_for_check_condition(cmd, &key, &asc,

> > +						       &ascq)) {

> > +			cmd->scsi_status = SAM_STAT_BUSY;

> 

> Here is another question I had about this code path.

> 

> If you hit this case do you need to undo the setting of the

> SCF_SENT_CHECK_CONDITION bit done in

> transport_send_check_condition_and_sense or does the check for that bit

> need to be fixed up.

> 

> With the code addition above we have this path setting

> SCF_SENT_CHECK_CONDITION so

> iscsit_check_task_reassign_expdatasn/iscsit_task_reassign_complete_scsi_cmnd/

> transport_send_task_abort would see the CC bit set and return.


Hello Mike,

Have you noticed that patch 1/3 that was attached to my previous e-mail
moves the code that sets the SCF_SENT_CHECK_CONDITION flag inside
translate_sense_reason() and past the point where SAM_STAT_BUSY is returned?

> >  	 * The highest priority Unit Attentions are placed at the head of the

> > @@ -244,6 +246,7 @@ void core_scsi3_ua_for_check_condition(

> >  		 * clearing it.

> >  		 */

> >  		if (dev->dev_attrib.emulate_ua_intlck_ctrl != 0) {

> > +			*key = UNIT_ATTENTION;

> 

> 

> Is the key already set to UNIT_ATTENTION here?


Good catch - that assignment is superfluous. But I would like to keep it
because I think it makes the core_scsi3_ua_for_check_condition() easy to
read. However, that assignment makes the following entry in sense_info_table[]
superfluous:

	[TCM_CHECK_CONDITION_UNIT_ATTENTION] = {
		.key = UNIT_ATTENTION,
	},

Bart.
Mike Christie June 21, 2018, 8:32 p.m. UTC | #3
On 06/21/2018 03:25 PM, Bart Van Assche wrote:
> On Thu, 2018-06-21 at 14:10 -0500, Mike Christie wrote:
>>>
>>> @@ -3156,9 +3167,13 @@ static int translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason)
>>>  		si = &sense_info_table[(__force int)
>>>  				       TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE];
>>>  
>>> +	key = si->key;
>>>  	if (reason == TCM_CHECK_CONDITION_UNIT_ATTENTION) {
>>> -		core_scsi3_ua_for_check_condition(cmd, &asc, &ascq);
>>> -		WARN_ON_ONCE(asc == 0);
>>> +		if (!core_scsi3_ua_for_check_condition(cmd, &key, &asc,
>>> +						       &ascq)) {
>>> +			cmd->scsi_status = SAM_STAT_BUSY;
>>
>> Here is another question I had about this code path.
>>
>> If you hit this case do you need to undo the setting of the
>> SCF_SENT_CHECK_CONDITION bit done in
>> transport_send_check_condition_and_sense or does the check for that bit
>> need to be fixed up.
>>
>> With the code addition above we have this path setting
>> SCF_SENT_CHECK_CONDITION so
>> iscsit_check_task_reassign_expdatasn/iscsit_task_reassign_complete_scsi_cmnd/
>> transport_send_task_abort would see the CC bit set and return.
> 
> Hello Mike,
> 
> Have you noticed that patch 1/3 that was attached to my previous e-mail
> moves the code that sets the SCF_SENT_CHECK_CONDITION flag inside

I am not seeing that. I saw SCF_EMULATED_TASK_SENSE was moved but not
SCF_SENT_CHECK_CONDITION. The latter is set a little earlier in
transport_send_check_condition_and_sense.

> translate_sense_reason() and past the point where SAM_STAT_BUSY is returned?
> 
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Christie June 21, 2018, 8:38 p.m. UTC | #4
On 06/21/2018 03:25 PM, Bart Van Assche wrote:
>>>  		 * clearing it.
>>> > >  		 */
>>> > >  		if (dev->dev_attrib.emulate_ua_intlck_ctrl != 0) {
>>> > > +			*key = UNIT_ATTENTION;
>> > 
>> > 
>> > Is the key already set to UNIT_ATTENTION here?
> Good catch - that assignment is superfluous. But I would like to keep it
> because I think it makes the core_scsi3_ua_for_check_condition() easy to
> read. However, that assignment makes the following entry in sense_info_table[]
> superfluous:

If you keep it above then I think it would be good to add it below in
the emulate_ua_intlck_ctrl=0 case:

                if (head) {
                        *asc = ua->ua_asc;
                        *ascq = ua->ua_ascq;
                        head = 0;
                }

so it is consistent.
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche June 21, 2018, 8:53 p.m. UTC | #5
T24gVGh1LCAyMDE4LTA2LTIxIGF0IDE1OjM4IC0wNTAwLCBNaWtlIENocmlzdGllIHdyb3RlOg0K
PiBPbiAwNi8yMS8yMDE4IDAzOjI1IFBNLCBCYXJ0IFZhbiBBc3NjaGUgd3JvdGU6DQo+ID4gPiA+
ICAJCSAqIGNsZWFyaW5nIGl0Lg0KPiA+ID4gPiA+ID4gIAkJICovDQo+ID4gPiA+ID4gPiAgCQlp
ZiAoZGV2LT5kZXZfYXR0cmliLmVtdWxhdGVfdWFfaW50bGNrX2N0cmwgIT0gMCkgew0KPiA+ID4g
PiA+ID4gKwkJCSprZXkgPSBVTklUX0FUVEVOVElPTjsNCj4gPiA+ID4gDQo+ID4gPiA+IA0KPiA+
ID4gPiBJcyB0aGUga2V5IGFscmVhZHkgc2V0IHRvIFVOSVRfQVRURU5USU9OIGhlcmU/DQo+ID4g
DQo+ID4gR29vZCBjYXRjaCAtIHRoYXQgYXNzaWdubWVudCBpcyBzdXBlcmZsdW91cy4gQnV0IEkg
d291bGQgbGlrZSB0byBrZWVwIGl0DQo+ID4gYmVjYXVzZSBJIHRoaW5rIGl0IG1ha2VzIHRoZSBj
b3JlX3Njc2kzX3VhX2Zvcl9jaGVja19jb25kaXRpb24oKSBlYXN5IHRvDQo+ID4gcmVhZC4gSG93
ZXZlciwgdGhhdCBhc3NpZ25tZW50IG1ha2VzIHRoZSBmb2xsb3dpbmcgZW50cnkgaW4gc2Vuc2Vf
aW5mb190YWJsZVtdDQo+ID4gc3VwZXJmbHVvdXM6DQo+IA0KPiBJZiB5b3Uga2VlcCBpdCBhYm92
ZSB0aGVuIEkgdGhpbmsgaXQgd291bGQgYmUgZ29vZCB0byBhZGQgaXQgYmVsb3cgaW4NCj4gdGhl
IGVtdWxhdGVfdWFfaW50bGNrX2N0cmw9MCBjYXNlOg0KPiANCj4gICAgICAgICAgICAgICAgIGlm
IChoZWFkKSB7DQo+ICAgICAgICAgICAgICAgICAgICAgICAgICphc2MgPSB1YS0+dWFfYXNjOw0K
PiAgICAgICAgICAgICAgICAgICAgICAgICAqYXNjcSA9IHVhLT51YV9hc2NxOw0KPiAgICAgICAg
ICAgICAgICAgICAgICAgICBoZWFkID0gMDsNCj4gICAgICAgICAgICAgICAgIH0NCj4gDQo+IHNv
IGl0IGlzIGNvbnNpc3RlbnQuDQoNCk9LLCB3aWxsIGRvLg0KDQpCYXJ0Lg0KDQoNCg0K
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche June 21, 2018, 8:58 p.m. UTC | #6
On Thu, 2018-06-21 at 15:32 -0500, Mike Christie wrote:
> I am not seeing that. I saw SCF_EMULATED_TASK_SENSE was moved but not

> SCF_SENT_CHECK_CONDITION. The latter is set a little earlier in

> transport_send_check_condition_and_sense.


Are you sure we need to change the code that sets SCF_SENT_CHECK_CONDITION?
It seems to me like the name of that flag is misleading. I think that flag
is used to keep track of whether or not a response has been sent to the
initiator. So keeping the current code for the manipulation of that flag
should be fine.

Bart.
Mike Christie June 22, 2018, 3:45 a.m. UTC | #7
On 06/21/2018 03:58 PM, Bart Van Assche wrote:
> On Thu, 2018-06-21 at 15:32 -0500, Mike Christie wrote:
>> I am not seeing that. I saw SCF_EMULATED_TASK_SENSE was moved but not
>> SCF_SENT_CHECK_CONDITION. The latter is set a little earlier in
>> transport_send_check_condition_and_sense.
> 
> Are you sure we need to change the code that sets SCF_SENT_CHECK_CONDITION?
> It seems to me like the name of that flag is misleading. I think that flag
> is used to keep track of whether or not a response has been sent to the
> initiator. So keeping the current code for the manipulation of that flag
> should be fine.
> 

I think with your patch the behavior will be the same as before, so it
is safe.

I was more asking if in another future patch we need to set that bit for
the case where TCM_LUN_BUSY/TCM_OUT_OF_RESOURCES is passed to
transport_generic_request_failure so the behavior for that code path is
the same as when you now set cmd->scsi_status = TCM_LUN_BUSY in your patch.



--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche June 22, 2018, 3:38 p.m. UTC | #8
On 06/21/18 20:45, Mike Christie wrote:
> On 06/21/2018 03:58 PM, Bart Van Assche wrote:
>> On Thu, 2018-06-21 at 15:32 -0500, Mike Christie wrote:
>>> I am not seeing that. I saw SCF_EMULATED_TASK_SENSE was moved but not
>>> SCF_SENT_CHECK_CONDITION. The latter is set a little earlier in
>>> transport_send_check_condition_and_sense.
>>
>> Are you sure we need to change the code that sets SCF_SENT_CHECK_CONDITION?
>> It seems to me like the name of that flag is misleading. I think that flag
>> is used to keep track of whether or not a response has been sent to the
>> initiator. So keeping the current code for the manipulation of that flag
>> should be fine.
> 
> I think with your patch the behavior will be the same as before, so it
> is safe.
> 
> I was more asking if in another future patch we need to set that bit for
> the case where TCM_LUN_BUSY/TCM_OUT_OF_RESOURCES is passed to
> transport_generic_request_failure so the behavior for that code path is
> the same as when you now set cmd->scsi_status = TCM_LUN_BUSY in your patch.

Hello Mike,

As you know today the code that processes the abort task management 
function is executed concurrently with the regular command processing 
code. I think if we would process task aborts from inside the regular 
command processing path that not only that code would become simpler but 
also that we wouldn't need the SCF_SENT_CHECK_CONDITION flag anymore for 
most target drivers. I'm not sure though about the iSCSI target driver. 
It's possible that we still would need that flag for the iSCSI target 
driver. How about me resurrecting the patch that moves abort processing
into the same context as the regular SCSI command execution?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From fac94854486b98dc04c89c21a4b2f24f895bbafe Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@wdc.com>
Date: Thu, 21 Jun 2018 10:19:05 -0700
Subject: [PATCH 3/3] target: Remove se_dev_entry.ua_count

se_dev_entry.ua_count is only used to check whether or not
se_dev_entry.ua_list is empty. Use list_empty_careful() instead.
Checking whether or not ua_list is empty without holding the
lock that protects that list is fine because the code that
dequeues from that list will check again whether or not that
list is empty.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
---
 drivers/target/target_core_device.c |  1 -
 drivers/target/target_core_ua.c     | 12 ++----------
 include/target/target_core_base.h   |  1 -
 3 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index e8d2a7f22382..d91cbff57680 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -336,7 +336,6 @@  int core_enable_device_list_for_node(
 		return -ENOMEM;
 	}
 
-	atomic_set(&new->ua_count, 0);
 	spin_lock_init(&new->ua_lock);
 	INIT_LIST_HEAD(&new->ua_list);
 	INIT_LIST_HEAD(&new->lun_link);
diff --git a/drivers/target/target_core_ua.c b/drivers/target/target_core_ua.c
index d2fe3d98c335..49ccac620fcf 100644
--- a/drivers/target/target_core_ua.c
+++ b/drivers/target/target_core_ua.c
@@ -55,7 +55,7 @@  target_scsi3_ua_check(struct se_cmd *cmd)
 		rcu_read_unlock();
 		return 0;
 	}
-	if (!atomic_read(&deve->ua_count)) {
+	if (list_empty_careful(&deve->ua_list)) {
 		rcu_read_unlock();
 		return 0;
 	}
@@ -154,7 +154,6 @@  int core_scsi3_ua_allocate(
 				&deve->ua_list);
 		spin_unlock(&deve->ua_lock);
 
-		atomic_inc_mb(&deve->ua_count);
 		return 0;
 	}
 	list_add_tail(&ua->ua_nacl_list, &deve->ua_list);
@@ -164,7 +163,6 @@  int core_scsi3_ua_allocate(
 		" 0x%02x, ASCQ: 0x%02x\n", deve->mapped_lun,
 		asc, ascq);
 
-	atomic_inc_mb(&deve->ua_count);
 	return 0;
 }
 
@@ -196,8 +194,6 @@  void core_scsi3_ua_release_all(
 	list_for_each_entry_safe(ua, ua_p, &deve->ua_list, ua_nacl_list) {
 		list_del(&ua->ua_nacl_list);
 		kmem_cache_free(se_ua_cache, ua);
-
-		atomic_dec_mb(&deve->ua_count);
 	}
 	spin_unlock(&deve->ua_lock);
 }
@@ -263,8 +259,6 @@  bool core_scsi3_ua_for_check_condition(struct se_cmd *cmd, u8 *key, u8 *asc,
 		}
 		list_del(&ua->ua_nacl_list);
 		kmem_cache_free(se_ua_cache, ua);
-
-		atomic_dec_mb(&deve->ua_count);
 	}
 	spin_unlock(&deve->ua_lock);
 	rcu_read_unlock();
@@ -304,7 +298,7 @@  int core_scsi3_ua_clear_for_request_sense(
 		rcu_read_unlock();
 		return -EINVAL;
 	}
-	if (!atomic_read(&deve->ua_count)) {
+	if (list_empty_careful(&deve->ua_list)) {
 		rcu_read_unlock();
 		return -EPERM;
 	}
@@ -327,8 +321,6 @@  int core_scsi3_ua_clear_for_request_sense(
 		}
 		list_del(&ua->ua_nacl_list);
 		kmem_cache_free(se_ua_cache, ua);
-
-		atomic_dec_mb(&deve->ua_count);
 	}
 	spin_unlock(&deve->ua_lock);
 	rcu_read_unlock();
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index ca59e065c1fd..7a4ee7852ca4 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -638,7 +638,6 @@  struct se_dev_entry {
 	atomic_long_t		total_cmds;
 	atomic_long_t		read_bytes;
 	atomic_long_t		write_bytes;
-	atomic_t		ua_count;
 	/* Used for PR SPEC_I_PT=1 and REGISTER_AND_MOVE */
 	struct kref		pr_kref;
 	struct completion	pr_comp;
-- 
2.17.1