[9/9,RFC] nvme: Fix a race condition

Message ID: 9c372b04-a194-58c4-a64f-b155b52a5244@sandisk.com (mailing list archive)
State: Superseded

Commit Message

Bart Van Assche Sept. 26, 2016, 6:28 p.m. UTC
Ensure that nvme_queue_rq() is no longer running when nvme_stop_queues()
returns. Untested.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

Steve Wise Sept. 27, 2016, 4:31 p.m. UTC | #1
> @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  {
>  	struct nvme_ns *ns;
> +	struct request_queue *q;
> 
>  	mutex_lock(&ctrl->namespaces_mutex);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> -		blk_mq_cancel_requeue_work(ns->queue);
> -		blk_mq_stop_hw_queues(ns->queue);
> +		q = ns->queue;
> +		blk_quiesce_queue(q);
> +		blk_mq_cancel_requeue_work(q);
> +		blk_mq_stop_hw_queues(q);
> +		blk_resume_queue(q);
>  	}
>  	mutex_unlock(&ctrl->namespaces_mutex);

Hey Bart, should nvme_stop_queues() really be resuming the blk queue?



Bart Van Assche Sept. 27, 2016, 4:43 p.m. UTC | #2
On 09/27/2016 09:31 AM, Steve Wise wrote:
>> @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
>>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>  {
>>  	struct nvme_ns *ns;
>> +	struct request_queue *q;
>>
>>  	mutex_lock(&ctrl->namespaces_mutex);
>>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>> -		blk_mq_cancel_requeue_work(ns->queue);
>> -		blk_mq_stop_hw_queues(ns->queue);
>> +		q = ns->queue;
>> +		blk_quiesce_queue(q);
>> +		blk_mq_cancel_requeue_work(q);
>> +		blk_mq_stop_hw_queues(q);
>> +		blk_resume_queue(q);
>>  	}
>>  	mutex_unlock(&ctrl->namespaces_mutex);
>
> Hey Bart, should nvme_stop_queues() really be resuming the blk queue?

Hello Steve,

Would you perhaps prefer that blk_resume_queue(q) be called from 
nvme_start_queues()? I think that would make the NVMe code harder to 
review. The above code won't cause any unexpected side effects if an 
NVMe namespace is removed after nvme_stop_queues() has been called and 
before nvme_start_queues() is called. Moving the blk_resume_queue(q) 
call into nvme_start_queues() would only work as expected if no 
namespaces are added or removed between the nvme_stop_queues() and 
nvme_start_queues() calls. I'm not familiar enough with the NVMe code to 
know whether or not that change would be safe ...
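
For concreteness, the alternative I understand you to be suggesting 
would look roughly like the sketch below. This is untested, and the 
exact calls inside nvme_start_queues() are from memory, so treat it as 
illustrative only:

void nvme_start_queues(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;

	mutex_lock(&ctrl->namespaces_mutex);
	list_for_each_entry(ns, &ctrl->namespaces, list) {
		/*
		 * Hypothetical placement: pairs with the
		 * blk_quiesce_queue() call in nvme_stop_queues(). This
		 * pairing is only correct if the namespace list has not
		 * changed since nvme_stop_queues() ran.
		 */
		blk_resume_queue(ns->queue);
		blk_mq_start_stopped_hw_queues(ns->queue, true);
		blk_mq_kick_requeue_list(ns->queue);
	}
	mutex_unlock(&ctrl->namespaces_mutex);
}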

Bart.
James Bottomley Sept. 27, 2016, 4:56 p.m. UTC | #3
On Tue, 2016-09-27 at 09:43 -0700, Bart Van Assche wrote:
> On 09/27/2016 09:31 AM, Steve Wise wrote:
> > > @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
> > >  void nvme_stop_queues(struct nvme_ctrl *ctrl)
> > >  {
> > >  	struct nvme_ns *ns;
> > > +	struct request_queue *q;
> > > 
> > >  	mutex_lock(&ctrl->namespaces_mutex);
> > >  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> > > -		blk_mq_cancel_requeue_work(ns->queue);
> > > -		blk_mq_stop_hw_queues(ns->queue);
> > > +		q = ns->queue;
> > > +		blk_quiesce_queue(q);
> > > +		blk_mq_cancel_requeue_work(q);
> > > +		blk_mq_stop_hw_queues(q);
> > > +		blk_resume_queue(q);
> > >  	}
> > >  	mutex_unlock(&ctrl->namespaces_mutex);
> > 
> > Hey Bart, should nvme_stop_queues() really be resuming the blk
> > queue?
> 
> Hello Steve,
> 
> Would you perhaps prefer that blk_resume_queue(q) be called from
> nvme_start_queues()? I think that would make the NVMe code harder to
> review. The above code won't cause any unexpected side effects if an
> NVMe namespace is removed after nvme_stop_queues() has been called
> and before nvme_start_queues() is called. Moving the
> blk_resume_queue(q) call into nvme_start_queues() would only work as
> expected if no namespaces are added or removed between the
> nvme_stop_queues() and nvme_start_queues() calls. I'm not familiar
> enough with the NVMe code to know whether or not that change would
> be safe ...

It's something that looks obviously wrong, so explain why you need to
do it, preferably in a comment above the function.

James


Steve Wise Sept. 27, 2016, 4:56 p.m. UTC | #4
> On 09/27/2016 09:31 AM, Steve Wise wrote:
> >> @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
> >>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
> >>  {
> >>  	struct nvme_ns *ns;
> >> +	struct request_queue *q;
> >>
> >>  	mutex_lock(&ctrl->namespaces_mutex);
> >>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> >> -		blk_mq_cancel_requeue_work(ns->queue);
> >> -		blk_mq_stop_hw_queues(ns->queue);
> >> +		q = ns->queue;
> >> +		blk_quiesce_queue(q);
> >> +		blk_mq_cancel_requeue_work(q);
> >> +		blk_mq_stop_hw_queues(q);
> >> +		blk_resume_queue(q);
> >>  	}
> >>  	mutex_unlock(&ctrl->namespaces_mutex);
> >
> > Hey Bart, should nvme_stop_queues() really be resuming the blk queue?
> 
> Hello Steve,
> 
> Would you perhaps prefer that blk_resume_queue(q) be called from
> nvme_start_queues()? I think that would make the NVMe code harder to
> review. 

I'm still learning the blk code (and nvme code :)), but I would think
blk_resume_queue() would cause requests to start being submitted on the NVMe
queues, which I believe shouldn't happen while they are stopped.  I'm currently
debugging a problem where requests are submitted to the nvme-rdma driver while
it has supposedly stopped all the nvme and blk-mq queues.  I tried your series
at Christoph's request to see if it resolved my problem, but it didn't.

> The above code won't cause any unexpected side effects if an
> NVMe namespace is removed after nvme_stop_queues() has been called and
> before nvme_start_queues() is called. Moving the blk_resume_queue(q)
> call into nvme_start_queues() would only work as expected if no
> namespaces are added or removed between the nvme_stop_queues() and
> nvme_start_queues() calls. I'm not familiar enough with the NVMe code to
> know whether or not that change would be safe ...
> 

I'll have to look and see whether new namespaces can be added or deleted while
an NVMe controller is in the RECONNECTING state.  In the meantime, I'm going to
move the blk_resume_queue() call into nvme_start_queues() and see if it helps
my problem.

Christoph:  Thoughts?

Steve.

Bart Van Assche Sept. 27, 2016, 5:09 p.m. UTC | #5
On 09/27/2016 09:56 AM, James Bottomley wrote:
> On Tue, 2016-09-27 at 09:43 -0700, Bart Van Assche wrote:
>> On 09/27/2016 09:31 AM, Steve Wise wrote:
>>>> @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
>>>>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>>>  {
>>>>  	struct nvme_ns *ns;
>>>> +	struct request_queue *q;
>>>>
>>>>  	mutex_lock(&ctrl->namespaces_mutex);
>>>>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>>>> -		blk_mq_cancel_requeue_work(ns->queue);
>>>> -		blk_mq_stop_hw_queues(ns->queue);
>>>> +		q = ns->queue;
>>>> +		blk_quiesce_queue(q);
>>>> +		blk_mq_cancel_requeue_work(q);
>>>> +		blk_mq_stop_hw_queues(q);
>>>> +		blk_resume_queue(q);
>>>>  	}
>>>>  	mutex_unlock(&ctrl->namespaces_mutex);
>>>
>>> Hey Bart, should nvme_stop_queues() really be resuming the blk
>>> queue?
>>
>> Hello Steve,
>>
>> Would you perhaps prefer that blk_resume_queue(q) be called from
>> nvme_start_queues()? I think that would make the NVMe code harder to
>> review. The above code won't cause any unexpected side effects if an
>> NVMe namespace is removed after nvme_stop_queues() has been called
>> and before nvme_start_queues() is called. Moving the
>> blk_resume_queue(q) call into nvme_start_queues() would only work as
>> expected if no namespaces are added or removed between the
>> nvme_stop_queues() and nvme_start_queues() calls. I'm not familiar
>> enough with the NVMe code to know whether or not that change would
>> be safe ...
>
> It's something that looks obviously wrong, so explain why you need to
> do it, preferably in a comment above the function.

Hello James and Steve,

I will add a comment.

Please note that the above patch does not change the behavior of 
nvme_stop_queues() except that it causes nvme_stop_queues() to wait 
until any ongoing nvme_queue_rq() calls have finished. 
blk_resume_queue() does not affect the value of the BLK_MQ_S_STOPPED bit 
that has been set by blk_mq_stop_hw_queues(). All it does is resume 
pending blk_queue_enter() calls and ensure that future 
blk_queue_enter() calls do not block. Even after blk_resume_queue() has 
been called, queue_rq() won't be invoked for newly queued requests 
because the BLK_MQ_S_STOPPED bit is still set. The patch "dm: Fix a race 
condition related to stopping and starting queues" makes a similar 
change in the dm driver, and that change has been tested extensively.
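
In case a concrete model helps, below is a simplified sketch of what 
the two helpers introduced earlier in this series are meant to do. This 
is an illustration of the idea, not the actual implementation; the 
sketch reuses the existing q_usage_counter/mq_freeze_wq freeze 
machinery to express the semantics:

/* Sketch only: block new submitters, wait for ongoing ones to finish. */
void blk_quiesce_queue(struct request_queue *q)
{
	/* Make new blk_queue_enter() calls block ... */
	percpu_ref_kill(&q->q_usage_counter);
	/*
	 * ... and wait until every caller that entered the queue,
	 * including any ongoing .queue_rq() invocation, has exited.
	 */
	wait_event(q->mq_freeze_wq,
		   percpu_ref_is_zero(&q->q_usage_counter));
}

/*
 * Sketch only: let blk_queue_enter() succeed again. Requests still are
 * not dispatched afterwards because BLK_MQ_S_STOPPED remains set.
 */
void blk_resume_queue(struct request_queue *q)
{
	percpu_ref_reinit(&q->q_usage_counter);
	wake_up_all(&q->mq_freeze_wq);
}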

Bart.
Steve Wise Sept. 28, 2016, 2:23 p.m. UTC | #6
> 
> Hello James and Steve,
> 
> I will add a comment.
> 
> Please note that the above patch does not change the behavior of
> nvme_stop_queues() except that it causes nvme_stop_queues() to wait
> until any ongoing nvme_queue_rq() calls have finished.
> blk_resume_queue() does not affect the value of the BLK_MQ_S_STOPPED bit
> that has been set by blk_mq_stop_hw_queues(). All it does is resume
> pending blk_queue_enter() calls and ensure that future
> blk_queue_enter() calls do not block. Even after blk_resume_queue() has
> been called, queue_rq() won't be invoked for newly queued requests
> because the BLK_MQ_S_STOPPED bit is still set. The patch "dm: Fix a race
> condition related to stopping and starting queues" makes a similar
> change in the dm driver, and that change has been tested extensively.
> 
> 

Thanks for the detailed explanation!  I think your code, then, is correct as-is.   And this series doesn't fix the issue I'm hitting, so I'll keep digging. :)

Steve.  


Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 057f1fa..6e2bf6a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,9 @@  fail:
 
 void nvme_requeue_req(struct request *req)
 {
-	unsigned long flags;
-
 	blk_mq_requeue_request(req);
-	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_mq_queue_stopped(req->q))
-		blk_mq_kick_requeue_list(req->q);
-	spin_unlock_irqrestore(req->q->queue_lock, flags);
+	WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
+	blk_mq_kick_requeue_list(req->q);
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);
 
@@ -2079,11 +2075,15 @@  EXPORT_SYMBOL_GPL(nvme_kill_queues);
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	struct request_queue *q;
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_cancel_requeue_work(ns->queue);
-		blk_mq_stop_hw_queues(ns->queue);
+		q = ns->queue;
+		blk_quiesce_queue(q);
+		blk_mq_cancel_requeue_work(q);
+		blk_mq_stop_hw_queues(q);
+		blk_resume_queue(q);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }