[V3,11/17] SCSI: track pending admin commands

Message ID	20180913121546.5710-12-ming.lei@redhat.com (mailing list archive)
State	Changes Requested
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Ming Lei <ming.lei@redhat.com> To: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>, Alan Stern <stern@rowland.harvard.edu>, Christoph Hellwig <hch@lst.de>, Bart Van Assche <bart.vanassche@wdc.com>, Jianchao Wang <jianchao.w.wang@oracle.com>, Hannes Reinecke <hare@suse.de>, Johannes Thumshirn <jthumshirn@suse.de>, Adrian Hunter <adrian.hunter@intel.com>, "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>, "Martin K. Petersen" <martin.petersen@oracle.com>, linux-scsi@vger.kernel.org Subject: [PATCH V3 11/17] SCSI: track pending admin commands Date: Thu, 13 Sep 2018 20:15:40 +0800 Message-Id: <20180913121546.5710-12-ming.lei@redhat.com> In-Reply-To: <20180913121546.5710-1-ming.lei@redhat.com> References: <20180913121546.5710-1-ming.lei@redhat.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk
Series	[V3,01/17] blk-mq: allow to pass default queue flags for creating & initializing queue \| expand [V3,01/17] blk-mq: allow to pass default queue flags for creating & initializing queue [V3,02/17] blk-mq: convert BLK_MQ_F_NO_SCHED into per-queue flag [V3,03/17] block: rename QUEUE_FLAG_NO_SCHED as QUEUE_FLAG_ADMIN [V3,04/17] blk-mq: don't reserve tags for admin queue [V3,05/17] SCSI: try to retrieve request_queue via 'scsi_cmnd' if possible [V3,06/17] SCSI: pass 'scsi_device' instance from 'scsi_request' [V3,07/17] SCSI: prepare for introducing admin queue for legacy path [V3,08/17] SCSI: pass scsi_device to scsi_mq_prep_fn [V3,09/17] SCSI: don't set .queuedata in scsi_mq_alloc_queue() [V3,10/17] SCSI: deal with admin queue busy [V3,11/17] SCSI: track pending admin commands [V3,12/17] SCSI: create admin queue for each host [V3,13/17] SCSI: use the dedicated admin queue to send admin commands [V3,14/17] SCSI: transport_spi: resume a quiesced device [V3,15/17] SCSI: use admin queue to implement queue QUIESCE [V3,16/17] block: simplify runtime PM support [V3,17/17] block: enable runtime PM for blk-mq

Ming Lei Sept. 13, 2018, 12:15 p.m. UTC

Firstly we have to make sure that all pending admin commands to
one same scsi_device are completed before removing the scsi_device.

Secondly scsi_internal_device_block() needs this too.

So introduce one waitqueue and atomic counter for this purpose.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c    | 6 ++++++
 drivers/scsi/scsi_scan.c   | 1 +
 drivers/scsi/scsi_sysfs.c  | 1 +
 include/scsi/scsi_device.h | 4 ++++
 4 files changed, 12 insertions(+)

jianchao.wang Sept. 14, 2018, 3:33 a.m. UTC | #1

Hi Ming

On 09/13/2018 08:15 PM, Ming Lei wrote:
>  EXPORT_SYMBOL(__scsi_execute);
> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
>  		else
>  			scsi_wait_for_queuecommand(sdev);
>  	}
> +	wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
>  	mutex_unlock(&sdev->state_mutex);
>  
>  	return err;
...
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 3aee9464a7bf..8bcb7ecc0c06 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  
>  	blk_cleanup_queue(sdev->request_queue);
>  	cancel_work_sync(&sdev->requeue_work);
> +	wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))

This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).



Thanks
Jianchao

Ming Lei Sept. 14, 2018, 11:33 a.m. UTC | #2

On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
<jianchao.w.wang@oracle.com> wrote:
>
> Hi Ming
>
> On 09/13/2018 08:15 PM, Ming Lei wrote:
> >  EXPORT_SYMBOL(__scsi_execute);
> > @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
> >               else
> >                       scsi_wait_for_queuecommand(sdev);
> >       }
> > +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
> >       mutex_unlock(&sdev->state_mutex);
> >
> >       return err;
> ...
> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 3aee9464a7bf..8bcb7ecc0c06 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >
> >       blk_cleanup_queue(sdev->request_queue);
> >       cancel_work_sync(&sdev->requeue_work);
> > +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>
> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
>

The counter of .nr_admin_pending is introduced for draining queued
admin requests to this scsi device.

Actually new requests have been prevented from entering scsi_queue_rq(),
please see the two callers of wait_event(sdev->admin_wq,
!atomic_read(&sdev->nr_admin_pending)).

Thanks,
Ming Lei

jianchao.wang Sept. 17, 2018, 2:46 a.m. UTC | #3

Hi Ming

On 09/14/2018 07:33 PM, Ming Lei wrote:
> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
> <jianchao.w.wang@oracle.com> wrote:
>>
>> Hi Ming
>>
>> On 09/13/2018 08:15 PM, Ming Lei wrote:
>>>  EXPORT_SYMBOL(__scsi_execute);
>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
>>>               else
>>>                       scsi_wait_for_queuecommand(sdev);
>>>       }
>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
>>>       mutex_unlock(&sdev->state_mutex);
>>>
>>>       return err;
>> ...
>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
>>> --- a/drivers/scsi/scsi_sysfs.c
>>> +++ b/drivers/scsi/scsi_sysfs.c
>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>>
>>>       blk_cleanup_queue(sdev->request_queue);
>>>       cancel_work_sync(&sdev->requeue_work);
>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>
>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
>>
> 
> The counter of .nr_admin_pending is introduced for draining queued
> admin requests to this scsi device.
> 
> Actually new requests have been prevented from entering scsi_queue_rq(),
> please see the two callers of wait_event(sdev->admin_wq,
> !atomic_read(&sdev->nr_admin_pending)).
> 
For example

_scsi_execute
...
                                          scsi_internal_device_block
                                            scsi_internal_device_block_nowait
                                            blk_mq_quiesce_queue
                                            wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
  atomic_inc(&sdev->nr_admin_pending);

  blk_execute_rq（...）

  atomic_dec(&sdev->nr_admin_pending);
  wake_up_all(&sdev->admin_wq);

Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?

Thanks
Jianchao
> Thanks,
> Ming Lei
>

Ming Lei Sept. 17, 2018, 11:35 a.m. UTC | #4

On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> On 09/14/2018 07:33 PM, Ming Lei wrote:
> > On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
> > <jianchao.w.wang@oracle.com> wrote:
> >>
> >> Hi Ming
> >>
> >> On 09/13/2018 08:15 PM, Ming Lei wrote:
> >>>  EXPORT_SYMBOL(__scsi_execute);
> >>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
> >>>               else
> >>>                       scsi_wait_for_queuecommand(sdev);
> >>>       }
> >>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
> >>>       mutex_unlock(&sdev->state_mutex);
> >>>
> >>>       return err;
> >> ...
> >>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> >>> index 3aee9464a7bf..8bcb7ecc0c06 100644
> >>> --- a/drivers/scsi/scsi_sysfs.c
> >>> +++ b/drivers/scsi/scsi_sysfs.c
> >>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >>>
> >>>       blk_cleanup_queue(sdev->request_queue);
> >>>       cancel_work_sync(&sdev->requeue_work);
> >>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>
> >> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
> >> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
> >> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
> >>
> > 
> > The counter of .nr_admin_pending is introduced for draining queued
> > admin requests to this scsi device.
> > 
> > Actually new requests have been prevented from entering scsi_queue_rq(),
> > please see the two callers of wait_event(sdev->admin_wq,
> > !atomic_read(&sdev->nr_admin_pending)).
> > 
> For example
> 
> _scsi_execute
> ...
>                                           scsi_internal_device_block
>                                             scsi_internal_device_block_nowait
>                                             blk_mq_quiesce_queue
>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>   atomic_inc(&sdev->nr_admin_pending);
> 
>   blk_execute_rq（...）
> 
>   atomic_dec(&sdev->nr_admin_pending);
>   wake_up_all(&sdev->admin_wq);
> 
> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?

I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().

Thanks,
Ming

jianchao.wang Sept. 18, 2018, 1:22 a.m. UTC | #5

Hi Ming

On 09/17/2018 07:35 PM, Ming Lei wrote:
> On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> On 09/14/2018 07:33 PM, Ming Lei wrote:
>>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
>>> <jianchao.w.wang@oracle.com> wrote:
>>>>
>>>> Hi Ming
>>>>
>>>> On 09/13/2018 08:15 PM, Ming Lei wrote:
>>>>>  EXPORT_SYMBOL(__scsi_execute);
>>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
>>>>>               else
>>>>>                       scsi_wait_for_queuecommand(sdev);
>>>>>       }
>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
>>>>>       mutex_unlock(&sdev->state_mutex);
>>>>>
>>>>>       return err;
>>>> ...
>>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
>>>>> --- a/drivers/scsi/scsi_sysfs.c
>>>>> +++ b/drivers/scsi/scsi_sysfs.c
>>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>>>>
>>>>>       blk_cleanup_queue(sdev->request_queue);
>>>>>       cancel_work_sync(&sdev->requeue_work);
>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>>>
>>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
>>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
>>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
>>>>
>>>
>>> The counter of .nr_admin_pending is introduced for draining queued
>>> admin requests to this scsi device.
>>>
>>> Actually new requests have been prevented from entering scsi_queue_rq(),
>>> please see the two callers of wait_event(sdev->admin_wq,
>>> !atomic_read(&sdev->nr_admin_pending)).
>>>
>> For example
>>
>> _scsi_execute
>> ...
>>                                           scsi_internal_device_block
>>                                             scsi_internal_device_block_nowait
>>                                             blk_mq_quiesce_queue
>>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>   &sdev->nr_admin_pending;
>>
>>   blk_execute_rq（...）
>>
>>   atomic_dec(&sdev->nr_admin_pending);
>>   wake_up_all(&sdev->admin_wq);
>>
>> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?
> 
> I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
> and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().
> 

I don't think so. It is a similar scenario.

I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like:

 _scsi_execute
 ...
                                           scsi_internal_device_block
                                             scsi_internal_device_block_nowait
                                             blk_mq_quiesce_queue
                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
   atomic_inc(&sdev->nr_admin_pending);
   if state checking fails
     goto done

   blk_execute_rq（...）

   atomic_dec(&sdev->nr_admin_pending);
   wake_up_all(&sdev->admin_wq);

Thanks
Jianchao

Ming Lei Sept. 18, 2018, 7:39 a.m. UTC | #6

On Tue, Sep 18, 2018 at 09:22:32AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> On 09/17/2018 07:35 PM, Ming Lei wrote:
> > On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
> >> Hi Ming
> >>
> >> On 09/14/2018 07:33 PM, Ming Lei wrote:
> >>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
> >>> <jianchao.w.wang@oracle.com> wrote:
> >>>>
> >>>> Hi Ming
> >>>>
> >>>> On 09/13/2018 08:15 PM, Ming Lei wrote:
> >>>>>  EXPORT_SYMBOL(__scsi_execute);
> >>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
> >>>>>               else
> >>>>>                       scsi_wait_for_queuecommand(sdev);
> >>>>>       }
> >>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
> >>>>>       mutex_unlock(&sdev->state_mutex);
> >>>>>
> >>>>>       return err;
> >>>> ...
> >>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> >>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
> >>>>> --- a/drivers/scsi/scsi_sysfs.c
> >>>>> +++ b/drivers/scsi/scsi_sysfs.c
> >>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >>>>>
> >>>>>       blk_cleanup_queue(sdev->request_queue);
> >>>>>       cancel_work_sync(&sdev->requeue_work);
> >>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>>>
> >>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
> >>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
> >>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
> >>>>
> >>>
> >>> The counter of .nr_admin_pending is introduced for draining queued
> >>> admin requests to this scsi device.
> >>>
> >>> Actually new requests have been prevented from entering scsi_queue_rq(),
> >>> please see the two callers of wait_event(sdev->admin_wq,
> >>> !atomic_read(&sdev->nr_admin_pending)).
> >>>
> >> For example
> >>
> >> _scsi_execute
> >> ...
> >>                                           scsi_internal_device_block
> >>                                             scsi_internal_device_block_nowait
> >>                                             blk_mq_quiesce_queue
> >>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>   &sdev->nr_admin_pending;
> >>
> >>   blk_execute_rq（...）
> >>
> >>   atomic_dec(&sdev->nr_admin_pending);
> >>   wake_up_all(&sdev->admin_wq);
> >>
> >> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?
> > 
> > I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
> > and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().
> > 
> 
> I don't think so. It is a similar scenario.
> 
> I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like:
> 
>  _scsi_execute
>  ...
>                                            scsi_internal_device_block
>                                              scsi_internal_device_block_nowait
>                                              blk_mq_quiesce_queue
>                                              wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>    atomic_inc(&sdev->nr_admin_pending);
>    if state checking fails
>      goto done

The check will be done in scsi_admin_queue_rq().

> 
>    blk_execute_rq（...）
> 
>    atomic_dec(&sdev->nr_admin_pending);
>    wake_up_all(&sdev->admin_wq);

I guess you may misunderstand the purpose of .nr_admin_pending, which is
for draining requests to .queue_rq(). So it is enough to just move the
inc/dec of .nr_admin_pending into scsi_admin_queue_rq(), right?

Thanks,
Ming

jianchao.wang Sept. 18, 2018, 7:51 a.m. UTC | #7

On 09/18/2018 03:39 PM, Ming Lei wrote:
> On Tue, Sep 18, 2018 at 09:22:32AM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> On 09/17/2018 07:35 PM, Ming Lei wrote:
>>> On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
>>>> Hi Ming
>>>>
>>>> On 09/14/2018 07:33 PM, Ming Lei wrote:
>>>>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
>>>>> <jianchao.w.wang@oracle.com> wrote:
>>>>>>
>>>>>> Hi Ming
>>>>>>
>>>>>> On 09/13/2018 08:15 PM, Ming Lei wrote:
>>>>>>>  EXPORT_SYMBOL(__scsi_execute);
>>>>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
>>>>>>>               else
>>>>>>>                       scsi_wait_for_queuecommand(sdev);
>>>>>>>       }
>>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
>>>>>>>       mutex_unlock(&sdev->state_mutex);
>>>>>>>
>>>>>>>       return err;
>>>>>> ...
>>>>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>>>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
>>>>>>> --- a/drivers/scsi/scsi_sysfs.c
>>>>>>> +++ b/drivers/scsi/scsi_sysfs.c
>>>>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>>>>>>
>>>>>>>       blk_cleanup_queue(sdev->request_queue);
>>>>>>>       cancel_work_sync(&sdev->requeue_work);
>>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>>>>>
>>>>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
>>>>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
>>>>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
>>>>>>
>>>>>
>>>>> The counter of .nr_admin_pending is introduced for draining queued
>>>>> admin requests to this scsi device.
>>>>>
>>>>> Actually new requests have been prevented from entering scsi_queue_rq(),
>>>>> please see the two callers of wait_event(sdev->admin_wq,
>>>>> !atomic_read(&sdev->nr_admin_pending)).
>>>>>
>>>> For example
>>>>
>>>> _scsi_execute
>>>> ...
>>>>                                           scsi_internal_device_block
>>>>                                             scsi_internal_device_block_nowait
>>>>                                             blk_mq_quiesce_queue
>>>>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>>>   &sdev->nr_admin_pending;
>>>>
>>>>   blk_execute_rq（...）
>>>>
>>>>   atomic_dec(&sdev->nr_admin_pending);
>>>>   wake_up_all(&sdev->admin_wq);
>>>>
>>>> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?
>>>
>>> I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
>>> and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().
>>>
>>
>> I don't think so. It is a similar scenario.
>>
>> I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like:
>>
>>  _scsi_execute
>>  ...
>>                                            scsi_internal_device_block
>>                                              scsi_internal_device_block_nowait
>>                                              blk_mq_quiesce_queue
>>                                              wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>    atomic_inc(&sdev->nr_admin_pending);
>>    if state checking fails
>>      goto done
> 
> The check will be done in scsi_admin_queue_rq().
> 
>>
>>    blk_execute_rq（...）
>>
>>    atomic_dec(&sdev->nr_admin_pending);
>>    wake_up_all(&sdev->admin_wq);
> 
> I guess you may misunderstand the purpose of .nr_admin_pending, which is
> for draining requests to .queue_rq(). So it is enough to just move the
> inc/dec of .nr_admin_pending into scsi_admin_queue_rq(), right?
Yes.

But I just think of how to assign with queue quiesce.
The existence of per-host adminq seems to break it.

Thanks
Jianchao

Ming Lei Sept. 18, 2018, 7:55 a.m. UTC | #8

On Tue, Sep 18, 2018 at 03:51:10PM +0800, jianchao.wang wrote:
> 
> 
> On 09/18/2018 03:39 PM, Ming Lei wrote:
> > On Tue, Sep 18, 2018 at 09:22:32AM +0800, jianchao.wang wrote:
> >> Hi Ming
> >>
> >> On 09/17/2018 07:35 PM, Ming Lei wrote:
> >>> On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
> >>>> Hi Ming
> >>>>
> >>>> On 09/14/2018 07:33 PM, Ming Lei wrote:
> >>>>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
> >>>>> <jianchao.w.wang@oracle.com> wrote:
> >>>>>>
> >>>>>> Hi Ming
> >>>>>>
> >>>>>> On 09/13/2018 08:15 PM, Ming Lei wrote:
> >>>>>>>  EXPORT_SYMBOL(__scsi_execute);
> >>>>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
> >>>>>>>               else
> >>>>>>>                       scsi_wait_for_queuecommand(sdev);
> >>>>>>>       }
> >>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
> >>>>>>>       mutex_unlock(&sdev->state_mutex);
> >>>>>>>
> >>>>>>>       return err;
> >>>>>> ...
> >>>>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> >>>>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
> >>>>>>> --- a/drivers/scsi/scsi_sysfs.c
> >>>>>>> +++ b/drivers/scsi/scsi_sysfs.c
> >>>>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >>>>>>>
> >>>>>>>       blk_cleanup_queue(sdev->request_queue);
> >>>>>>>       cancel_work_sync(&sdev->requeue_work);
> >>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>>>>>
> >>>>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
> >>>>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
> >>>>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
> >>>>>>
> >>>>>
> >>>>> The counter of .nr_admin_pending is introduced for draining queued
> >>>>> admin requests to this scsi device.
> >>>>>
> >>>>> Actually new requests have been prevented from entering scsi_queue_rq(),
> >>>>> please see the two callers of wait_event(sdev->admin_wq,
> >>>>> !atomic_read(&sdev->nr_admin_pending)).
> >>>>>
> >>>> For example
> >>>>
> >>>> _scsi_execute
> >>>> ...
> >>>>                                           scsi_internal_device_block
> >>>>                                             scsi_internal_device_block_nowait
> >>>>                                             blk_mq_quiesce_queue
> >>>>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>>>   &sdev->nr_admin_pending;
> >>>>
> >>>>   blk_execute_rq（...）
> >>>>
> >>>>   atomic_dec(&sdev->nr_admin_pending);
> >>>>   wake_up_all(&sdev->admin_wq);
> >>>>
> >>>> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?
> >>>
> >>> I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
> >>> and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().
> >>>
> >>
> >> I don't think so. It is a similar scenario.
> >>
> >> I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like:
> >>
> >>  _scsi_execute
> >>  ...
> >>                                            scsi_internal_device_block
> >>                                              scsi_internal_device_block_nowait
> >>                                              blk_mq_quiesce_queue
> >>                                              wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>    atomic_inc(&sdev->nr_admin_pending);
> >>    if state checking fails
> >>      goto done
> > 
> > The check will be done in scsi_admin_queue_rq().
> > 
> >>
> >>    blk_execute_rq（...）
> >>
> >>    atomic_dec(&sdev->nr_admin_pending);
> >>    wake_up_all(&sdev->admin_wq);
> > 
> > I guess you may misunderstand the purpose of .nr_admin_pending, which is
> > for draining requests to .queue_rq(). So it is enough to just move the
> > inc/dec of .nr_admin_pending into scsi_admin_queue_rq(), right?
> Yes.

Thanks for your confirmation.

> 
> But I just think of how to assign with queue quiesce.
> The existence of per-host adminq seems to break it.

The per-host adminq won't be frozen or quiesced at all, could you explain
your concern in a bit detail about 'assign with queue quiesce'? 

Thanks,
Ming

jianchao.wang Sept. 18, 2018, 8:25 a.m. UTC | #9

On 09/18/2018 03:55 PM, Ming Lei wrote:
> On Tue, Sep 18, 2018 at 03:51:10PM +0800, jianchao.wang wrote:
>>
>>
>> On 09/18/2018 03:39 PM, Ming Lei wrote:
>>> On Tue, Sep 18, 2018 at 09:22:32AM +0800, jianchao.wang wrote:
>>>> Hi Ming
>>>>
>>>> On 09/17/2018 07:35 PM, Ming Lei wrote:
>>>>> On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
>>>>>> Hi Ming
>>>>>>
>>>>>> On 09/14/2018 07:33 PM, Ming Lei wrote:
>>>>>>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
>>>>>>> <jianchao.w.wang@oracle.com> wrote:
>>>>>>>>
>>>>>>>> Hi Ming
>>>>>>>>
>>>>>>>> On 09/13/2018 08:15 PM, Ming Lei wrote:
>>>>>>>>>  EXPORT_SYMBOL(__scsi_execute);
>>>>>>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
>>>>>>>>>               else
>>>>>>>>>                       scsi_wait_for_queuecommand(sdev);
>>>>>>>>>       }
>>>>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
>>>>>>>>>       mutex_unlock(&sdev->state_mutex);
>>>>>>>>>
>>>>>>>>>       return err;
>>>>>>>> ...
>>>>>>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>>>>>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
>>>>>>>>> --- a/drivers/scsi/scsi_sysfs.c
>>>>>>>>> +++ b/drivers/scsi/scsi_sysfs.c
>>>>>>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>>>>>>>>
>>>>>>>>>       blk_cleanup_queue(sdev->request_queue);
>>>>>>>>>       cancel_work_sync(&sdev->requeue_work);
>>>>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>>>>>>>
>>>>>>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
>>>>>>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
>>>>>>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
>>>>>>>>
>>>>>>>
>>>>>>> The counter of .nr_admin_pending is introduced for draining queued
>>>>>>> admin requests to this scsi device.
>>>>>>>
>>>>>>> Actually new requests have been prevented from entering scsi_queue_rq(),
>>>>>>> please see the two callers of wait_event(sdev->admin_wq,
>>>>>>> !atomic_read(&sdev->nr_admin_pending)).
>>>>>>>
>>>>>> For example
>>>>>>
>>>>>> _scsi_execute
>>>>>> ...
>>>>>>                                           scsi_internal_device_block
>>>>>>                                             scsi_internal_device_block_nowait
>>>>>>                                             blk_mq_quiesce_queue
>>>>>>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>>>>>   &sdev->nr_admin_pending;
>>>>>>
>>>>>>   blk_execute_rq（...）
>>>>>>
>>>>>>   atomic_dec(&sdev->nr_admin_pending);
>>>>>>   wake_up_all(&sdev->admin_wq);
>>>>>>
>>>>>> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?
>>>>>
>>>>> I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
>>>>> and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().
>>>>>
>>>>
>>>> I don't think so. It is a similar scenario.
>>>>
>>>> I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like:
>>>>
>>>>  _scsi_execute
>>>>  ...
>>>>                                            scsi_internal_device_block
>>>>                                              scsi_internal_device_block_nowait
>>>>                                              blk_mq_quiesce_queue
>>>>                                              wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
>>>>    atomic_inc(&sdev->nr_admin_pending);
>>>>    if state checking fails
>>>>      goto done
>>>
>>> The check will be done in scsi_admin_queue_rq().
>>>
>>>>
>>>>    blk_execute_rq（...）
>>>>
>>>>    atomic_dec(&sdev->nr_admin_pending);
>>>>    wake_up_all(&sdev->admin_wq);
>>>
>>> I guess you may misunderstand the purpose of .nr_admin_pending, which is
>>> for draining requests to .queue_rq(). So it is enough to just move the
>>> inc/dec of .nr_admin_pending into scsi_admin_queue_rq(), right?
>> Yes.
> 
> Thanks for your confirmation.
> 
>>
>> But I just think of how to assign with queue quiesce.
>> The existence of per-host adminq seems to break it.
> 
> The per-host adminq won't be frozen or quiesced at all,

This is the place which makes me confused.

Before this patchset, when scsi_internal_device_block quiesces the request_queue,
Both normal IO or admin requests cannot be processed any more.

Given the per-adminq, we could say every scsi device has two request_queues, or
there could be two request_queues sending requests to the a scsi_device:
 - normal IO request_queue
 - shared per-host adminq

We could quiesce the normal IO request_queue, but what's about the per-host adminq ?

> could you explain
> your concern in a bit detail about 'assign with queue quiesce'? 
>

The scsi_queue_rq->scsi_prep_state_check could stop requests entering further, but
the scsi_queue_rq is still invoked.


Thanks
Jianchao

Ming Lei Sept. 18, 2018, 12:15 p.m. UTC | #10

On Tue, Sep 18, 2018 at 04:25:33PM +0800, jianchao.wang wrote:
> 
> 
> On 09/18/2018 03:55 PM, Ming Lei wrote:
> > On Tue, Sep 18, 2018 at 03:51:10PM +0800, jianchao.wang wrote:
> >>
> >>
> >> On 09/18/2018 03:39 PM, Ming Lei wrote:
> >>> On Tue, Sep 18, 2018 at 09:22:32AM +0800, jianchao.wang wrote:
> >>>> Hi Ming
> >>>>
> >>>> On 09/17/2018 07:35 PM, Ming Lei wrote:
> >>>>> On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote:
> >>>>>> Hi Ming
> >>>>>>
> >>>>>> On 09/14/2018 07:33 PM, Ming Lei wrote:
> >>>>>>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang
> >>>>>>> <jianchao.w.wang@oracle.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Ming
> >>>>>>>>
> >>>>>>>> On 09/13/2018 08:15 PM, Ming Lei wrote:
> >>>>>>>>>  EXPORT_SYMBOL(__scsi_execute);
> >>>>>>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev)
> >>>>>>>>>               else
> >>>>>>>>>                       scsi_wait_for_queuecommand(sdev);
> >>>>>>>>>       }
> >>>>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending));
> >>>>>>>>>       mutex_unlock(&sdev->state_mutex);
> >>>>>>>>>
> >>>>>>>>>       return err;
> >>>>>>>> ...
> >>>>>>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> >>>>>>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644
> >>>>>>>>> --- a/drivers/scsi/scsi_sysfs.c
> >>>>>>>>> +++ b/drivers/scsi/scsi_sysfs.c
> >>>>>>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >>>>>>>>>
> >>>>>>>>>       blk_cleanup_queue(sdev->request_queue);
> >>>>>>>>>       cancel_work_sync(&sdev->requeue_work);
> >>>>>>>>> +     wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>>>>>>>
> >>>>>>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq,
> >>>>>>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed
> >>>>>>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending).
> >>>>>>>>
> >>>>>>>
> >>>>>>> The counter of .nr_admin_pending is introduced for draining queued
> >>>>>>> admin requests to this scsi device.
> >>>>>>>
> >>>>>>> Actually new requests have been prevented from entering scsi_queue_rq(),
> >>>>>>> please see the two callers of wait_event(sdev->admin_wq,
> >>>>>>> !atomic_read(&sdev->nr_admin_pending)).
> >>>>>>>
> >>>>>> For example
> >>>>>>
> >>>>>> _scsi_execute
> >>>>>> ...
> >>>>>>                                           scsi_internal_device_block
> >>>>>>                                             scsi_internal_device_block_nowait
> >>>>>>                                             blk_mq_quiesce_queue
> >>>>>>                                             wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>>>>>   &sdev->nr_admin_pending;
> >>>>>>
> >>>>>>   blk_execute_rq（...）
> >>>>>>
> >>>>>>   atomic_dec(&sdev->nr_admin_pending);
> >>>>>>   wake_up_all(&sdev->admin_wq);
> >>>>>>
> >>>>>> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ?
> >>>>>
> >>>>> I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending)
> >>>>> and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq().
> >>>>>
> >>>>
> >>>> I don't think so. It is a similar scenario.
> >>>>
> >>>> I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like:
> >>>>
> >>>>  _scsi_execute
> >>>>  ...
> >>>>                                            scsi_internal_device_block
> >>>>                                              scsi_internal_device_block_nowait
> >>>>                                              blk_mq_quiesce_queue
> >>>>                                              wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending))
> >>>>    atomic_inc(&sdev->nr_admin_pending);
> >>>>    if state checking fails
> >>>>      goto done
> >>>
> >>> The check will be done in scsi_admin_queue_rq().
> >>>
> >>>>
> >>>>    blk_execute_rq（...）
> >>>>
> >>>>    atomic_dec(&sdev->nr_admin_pending);
> >>>>    wake_up_all(&sdev->admin_wq);
> >>>
> >>> I guess you may misunderstand the purpose of .nr_admin_pending, which is
> >>> for draining requests to .queue_rq(). So it is enough to just move the
> >>> inc/dec of .nr_admin_pending into scsi_admin_queue_rq(), right?
> >> Yes.
> > 
> > Thanks for your confirmation.
> > 
> >>
> >> But I just think of how to assign with queue quiesce.
> >> The existence of per-host adminq seems to break it.
> > 
> > The per-host adminq won't be frozen or quiesced at all,
> 
> This is the place which makes me confused.
> 
> Before this patchset, when scsi_internal_device_block quiesces the request_queue,
> Both normal IO or admin requests cannot be processed any more.
> 
> Given the per-adminq, we could say every scsi device has two request_queues, or
> there could be two request_queues sending requests to the a scsi_device:
>  - normal IO request_queue
>  - shared per-host adminq
> 
> We could quiesce the normal IO request_queue, but what's about the per-host adminq ?

The adminq request can be quisced in the scsi_device level, just as normal IO
request, because the scsi_device instance is same with normal IO.

But in blk-mq level, this admin queue won't be quiesced or frozen at all.

> 
> > could you explain
> > your concern in a bit detail about 'assign with queue quiesce'? 
> >
> 
> The scsi_queue_rq->scsi_prep_state_check could stop requests entering further, but
> the scsi_queue_rq is still invoked.

As I mentioned, there isn't any change in scsi level about handling the
admin request because the scsi_device is shared, and scsi_host too.

Thanks,
Ming

jianchao.wang Sept. 19, 2018, 3:52 a.m. UTC | #11

Hi Ming

On 09/18/2018 08:15 PM, Ming Lei wrote:
> The adminq request can be quisced in the scsi_device level, just as normal IO
> request, because the scsi_device instance is same with normal IO.

Yes, the scsi_queue_rq->scsi_prep_state_check could gate things out of.

> 
> But in blk-mq level, this admin queue won't be quiesced or frozen at all.

But the scsi_queue_rq/scsi_request_fn will be invoked, right ?

This is different from the original implementation.

Thanks
Jianchao

Ming Lei Sept. 19, 2018, 8:07 a.m. UTC | #12

On Wed, Sep 19, 2018 at 11:52:21AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> On 09/18/2018 08:15 PM, Ming Lei wrote:
> > The adminq request can be quisced in the scsi_device level, just as normal IO
> > request, because the scsi_device instance is same with normal IO.
> 
> Yes, the scsi_queue_rq->scsi_prep_state_check could gate things out of.
> 
> > 
> > But in blk-mq level, this admin queue won't be quiesced or frozen at all.
> 
> But the scsi_queue_rq/scsi_request_fn will be invoked, right ?
> 
> This is different from the original implementation.

For legacy path, as you mentioned, scsi_prep_fn() will work, then
scsi_request_fn() won't be called.

For blk-mq, scsi_queue_rq() is always called, and that is a bit
different, but the result is fine given admin request is submitted
much less frequently.

Thanks,
Ming

[V3,11/17] SCSI: track pending admin commands

Commit Message

Comments

Patch