[2/8] blk-mq: protect completion path with RCU

Message ID	20180108191542.379478-3-tj@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-btrfs-owner@kernel.org> From: Tejun Heo <tj@kernel.org> To: jack@suse.cz, axboe@kernel.dk, clm@fb.com, jbacik@fb.com Cc: kernel-team@fb.com, linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org, peterz@infradead.org, jianchao.w.wang@oracle.com, Bart.VanAssche@wdc.com, linux-block@vger.kernel.org, Tejun Heo <tj@kernel.org> Subject: [PATCH 2/8] blk-mq: protect completion path with RCU Date: Mon, 8 Jan 2018 11:15:36 -0800 Message-Id: <20180108191542.379478-3-tj@kernel.org> In-Reply-To: <20180108191542.379478-1-tj@kernel.org> References: <20180108191542.379478-1-tj@kernel.org> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk

Tejun Heo Jan. 8, 2018, 7:15 p.m. UTC

Currently, blk-mq protects only the issue path with RCU.  This patch
puts the completion path under the same RCU protection.  This will be
used to synchronize issue/completion against timeout by later patches,
which will also add the comments.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-mq.c | 5 +++++
 1 file changed, 5 insertions(+)

Holger Hoffstätte Jan. 8, 2018, 7:57 p.m. UTC | #1

On 01/08/18 20:15, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU.  This patch
> puts the completion path under the same RCU protection.  This will be
> used to synchronize issue/completion against timeout by later patches,
> which will also add the comments.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  block/blk-mq.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc9261..6741c3e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>  void blk_mq_complete_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
> +	int srcu_idx;
>  
>  	if (unlikely(blk_should_fake_timeout(q)))
>  		return;
> +
> +	hctx_lock(hctx, &srcu_idx);
>  	if (!blk_mark_rq_complete(rq))
>  		__blk_mq_complete_request(rq);
> +	hctx_unlock(hctx, srcu_idx);
>  }
>  EXPORT_SYMBOL(blk_mq_complete_request);

So I've had v3 running fine with 4.14++ and when I first tried Jens'
additional helpers on top, I got a bunch of warnings which I didn't
investigate further at the time. Now they are back since the helpers
moved into patch #1 and #2 correctly says:

..
block/blk-mq.c: In function ‘blk_mq_complete_request’:
./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  __srcu_read_unlock(sp, idx);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
  int srcu_idx;
      ^~~~~~~~
..etc.

This is with gcc 7.2.0.

I understand that this is a somewhat-false positive since the lock always
precedes the unlock & writes to the value, but can we properly initialize
or annotate this?

cheers
Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jens Axboe Jan. 8, 2018, 8:15 p.m. UTC | #2

On 1/8/18 12:57 PM, Holger Hoffstätte wrote:
> On 01/08/18 20:15, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU.  This patch
>> puts the completion path under the same RCU protection.  This will be
>> used to synchronize issue/completion against timeout by later patches,
>> which will also add the comments.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>  block/blk-mq.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ddc9261..6741c3e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>  void blk_mq_complete_request(struct request *rq)
>>  {
>>  	struct request_queue *q = rq->q;
>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>> +	int srcu_idx;
>>  
>>  	if (unlikely(blk_should_fake_timeout(q)))
>>  		return;
>> +
>> +	hctx_lock(hctx, &srcu_idx);
>>  	if (!blk_mark_rq_complete(rq))
>>  		__blk_mq_complete_request(rq);
>> +	hctx_unlock(hctx, srcu_idx);
>>  }
>>  EXPORT_SYMBOL(blk_mq_complete_request);
> 
> So I've had v3 running fine with 4.14++ and when I first tried Jens'
> additional helpers on top, I got a bunch of warnings which I didn't
> investigate further at the time. Now they are back since the helpers
> moved into patch #1 and #2 correctly says:
> 
> ..
> block/blk-mq.c: In function ‘blk_mq_complete_request’:
> ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   __srcu_read_unlock(sp, idx);
>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
>   int srcu_idx;
>       ^~~~~~~~
> ..etc.
> 
> This is with gcc 7.2.0.
> 
> I understand that this is a somewhat-false positive since the lock always
> precedes the unlock & writes to the value, but can we properly initialize
> or annotate this?

It's not a somewhat false positive, it's a false positive. I haven't seen
that bogus warning with the compiler I'm running:

gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0

and

gcc (GCC) 7.2.0

Neither of them throw the warning.

Jens Axboe Jan. 8, 2018, 10:55 p.m. UTC | #3

On 1/8/18 1:15 PM, Jens Axboe wrote:
> On 1/8/18 12:57 PM, Holger Hoffstätte wrote:
>> On 01/08/18 20:15, Tejun Heo wrote:
>>> Currently, blk-mq protects only the issue path with RCU.  This patch
>>> puts the completion path under the same RCU protection.  This will be
>>> used to synchronize issue/completion against timeout by later patches,
>>> which will also add the comments.
>>>
>>> Signed-off-by: Tejun Heo <tj@kernel.org>
>>> ---
>>>  block/blk-mq.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index ddc9261..6741c3e 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>>  void blk_mq_complete_request(struct request *rq)
>>>  {
>>>  	struct request_queue *q = rq->q;
>>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>>> +	int srcu_idx;
>>>  
>>>  	if (unlikely(blk_should_fake_timeout(q)))
>>>  		return;
>>> +
>>> +	hctx_lock(hctx, &srcu_idx);
>>>  	if (!blk_mark_rq_complete(rq))
>>>  		__blk_mq_complete_request(rq);
>>> +	hctx_unlock(hctx, srcu_idx);
>>>  }
>>>  EXPORT_SYMBOL(blk_mq_complete_request);
>>
>> So I've had v3 running fine with 4.14++ and when I first tried Jens'
>> additional helpers on top, I got a bunch of warnings which I didn't
>> investigate further at the time. Now they are back since the helpers
>> moved into patch #1 and #2 correctly says:
>>
>> ..
>> block/blk-mq.c: In function ‘blk_mq_complete_request’:
>> ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>   __srcu_read_unlock(sp, idx);
>>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
>>   int srcu_idx;
>>       ^~~~~~~~
>> ..etc.
>>
>> This is with gcc 7.2.0.
>>
>> I understand that this is a somewhat-false positive since the lock always
>> precedes the unlock & writes to the value, but can we properly initialize
>> or annotate this?
> 
> It's not a somewhat false positive, it's a false positive. I haven't seen
> that bogus warning with the compiler I'm running:
> 
> gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
> 
> and
> 
> gcc (GCC) 7.2.0
> 
> Neither of them throw the warning.

Are you on non-x86? Really bothers me to have to add a work-around
for something that's obviously a false positive.

I forget if we have some gcc/compiler annotation for this, otherwise
the good old

int srcu_idx = srcu_idx;

should get the job done.

Holger Hoffstätte Jan. 8, 2018, 11:27 p.m. UTC | #4

On 01/08/18 23:55, Jens Axboe wrote:
> On 1/8/18 1:15 PM, Jens Axboe wrote:
>> On 1/8/18 12:57 PM, Holger Hoffstätte wrote:
>>> On 01/08/18 20:15, Tejun Heo wrote:
>>>> Currently, blk-mq protects only the issue path with RCU.  This patch
>>>> puts the completion path under the same RCU protection.  This will be
>>>> used to synchronize issue/completion against timeout by later patches,
>>>> which will also add the comments.
>>>>
>>>> Signed-off-by: Tejun Heo <tj@kernel.org>
>>>> ---
>>>>  block/blk-mq.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index ddc9261..6741c3e 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>>>  void blk_mq_complete_request(struct request *rq)
>>>>  {
>>>>  	struct request_queue *q = rq->q;
>>>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>>>> +	int srcu_idx;
>>>>  
>>>>  	if (unlikely(blk_should_fake_timeout(q)))
>>>>  		return;
>>>> +
>>>> +	hctx_lock(hctx, &srcu_idx);
>>>>  	if (!blk_mark_rq_complete(rq))
>>>>  		__blk_mq_complete_request(rq);
>>>> +	hctx_unlock(hctx, srcu_idx);
>>>>  }
>>>>  EXPORT_SYMBOL(blk_mq_complete_request);
>>>
>>> So I've had v3 running fine with 4.14++ and when I first tried Jens'
>>> additional helpers on top, I got a bunch of warnings which I didn't
>>> investigate further at the time. Now they are back since the helpers
>>> moved into patch #1 and #2 correctly says:
>>>
>>> ..
>>> block/blk-mq.c: In function ‘blk_mq_complete_request’:
>>> ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>   __srcu_read_unlock(sp, idx);
>>>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
>>>   int srcu_idx;
>>>       ^~~~~~~~
>>> ..etc.
>>>
>>> This is with gcc 7.2.0.
>>>
>>> I understand that this is a somewhat-false positive since the lock always
>>> precedes the unlock & writes to the value, but can we properly initialize
>>> or annotate this?
>>
>> It's not a somewhat false positive, it's a false positive. I haven't seen
>> that bogus warning with the compiler I'm running:
>>
>> gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
>>
>> and
>>
>> gcc (GCC) 7.2.0
>>
>> Neither of them throw the warning.
> 
> Are you on non-x86? Really bothers me to have to add a work-around
> for something that's obviously a false positive.

No, plain old x86-64 on Gentoo (*without* crazy hardening or extras).
But don't bother spending too much time on this, I can live with the
warning even though I found it curious. Not even sure why you don't
see this; according to git -Wmaybe-uninitialized is part of the
"extrawarn" make flags since ~2016.

Apparently gcc can't see that the first branch in the lock-helper
implies that the first branch in the unlock helper will also be taken
unconditionally (and how could it? nested branch condition elision?),
so it concludes srcu_idx "may" be used uninitialized in unlock, and
that's not generally wrong. It just happens to be in this case.

> I forget if we have some gcc/compiler annotation for this, otherwise

I actually went looking for gcc bugs, pragmas, annotations and whatnot;
there are many bugs depending on optimizer, the #pragmas don't work
and a potential __attribute__(initialized) was only discussed, but
apparently never implemented. \o/

> the good old
> 
> int srcu_idx = srcu_idx;
> 
> should get the job done.

(Narrator: It didn't.)

Isn't there some magic value that will never be used by regular
operations? 0 or -1? It should not matter much since it will be
consistently overwritten or left alone.

cheers
Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Holger Hoffstätte Jan. 8, 2018, 11:33 p.m. UTC | #5

On 01/09/18 00:27, Holger Hoffstätte wrote:
> On 01/08/18 23:55, Jens Axboe wrote:
>> the good old
>>
>> int srcu_idx = srcu_idx;
>>
>> should get the job done.
> 
> (Narrator: It didn't.)

Narrator: we retract our previous statement and apologize for the
confusion. It works fine when you correctly replace all occurrences,
not just one. <:)

Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hannes Reinecke Jan. 9, 2018, 7:08 a.m. UTC | #6

On 01/08/2018 08:15 PM, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU.  This patch
> puts the completion path under the same RCU protection.  This will be
> used to synchronize issue/completion against timeout by later patches,
> which will also add the comments.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  block/blk-mq.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc9261..6741c3e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>  void blk_mq_complete_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
> +	int srcu_idx;
>  
>  	if (unlikely(blk_should_fake_timeout(q)))
>  		return;
> +
> +	hctx_lock(hctx, &srcu_idx);
>  	if (!blk_mark_rq_complete(rq))
>  		__blk_mq_complete_request(rq);
> +	hctx_unlock(hctx, srcu_idx);
>  }
>  EXPORT_SYMBOL(blk_mq_complete_request);
>  
> 
Hmm. Why do we need to call blk_mq_map_queue() here?
Is there a chance that we end up with a _different_ hctx on completion
than that one used for submission?
If not, why can't we just keep a pointer to the hctx in struct request?

Cheers,

Hannes

Jens Axboe Jan. 9, 2018, 3:22 p.m. UTC | #7

On 1/9/18 12:08 AM, Hannes Reinecke wrote:
> On 01/08/2018 08:15 PM, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU.  This patch
>> puts the completion path under the same RCU protection.  This will be
>> used to synchronize issue/completion against timeout by later patches,
>> which will also add the comments.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>  block/blk-mq.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ddc9261..6741c3e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>  void blk_mq_complete_request(struct request *rq)
>>  {
>>  	struct request_queue *q = rq->q;
>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>> +	int srcu_idx;
>>  
>>  	if (unlikely(blk_should_fake_timeout(q)))
>>  		return;
>> +
>> +	hctx_lock(hctx, &srcu_idx);
>>  	if (!blk_mark_rq_complete(rq))
>>  		__blk_mq_complete_request(rq);
>> +	hctx_unlock(hctx, srcu_idx);
>>  }
>>  EXPORT_SYMBOL(blk_mq_complete_request);
>>  
>>
> Hmm. Why do we need to call blk_mq_map_queue() here?
> Is there a chance that we end up with a _different_ hctx on completion
> than that one used for submission?
> If not, why can't we just keep a pointer to the hctx in struct request?

Mapping is the right thing to do. We cache the software queue, which
allows us to map back to the same hardware queue. There would be no
point in storing both, the mapping is very cheap. No point in bloating
the request with redundant information.

Jens Axboe Jan. 9, 2018, 4:17 p.m. UTC | #8

On 1/9/18 9:12 AM, Bart Van Assche wrote:
> On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU.  This patch
>> puts the completion path under the same RCU protection.  This will be
>> used to synchronize issue/completion against timeout by later patches,
>> which will also add the comments.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>  block/blk-mq.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ddc9261..6741c3e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>  void blk_mq_complete_request(struct request *rq)
>>  {
>>  	struct request_queue *q = rq->q;
>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>> +	int srcu_idx;
>>  
>>  	if (unlikely(blk_should_fake_timeout(q)))
>>  		return;
>> +
>> +	hctx_lock(hctx, &srcu_idx);
>>  	if (!blk_mark_rq_complete(rq))
>>  		__blk_mq_complete_request(rq);
>> +	hctx_unlock(hctx, srcu_idx);
>>  }
>>  EXPORT_SYMBOL(blk_mq_complete_request);
> 
> Hello Tejun,
> 
> I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
> call, although I know this call is cheap. Would the timeout code really get that
> much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
> into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
> "if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
> blk_mq_timeout_work()?

It's a non concern, imho. We do queue mapping all over the place for a variety
of reasons, it's total noise, especially since we're calling [s]rcu anyway
after that.

Tejun Heo Jan. 9, 2018, 4:19 p.m. UTC | #9

Hello, Bart.

On Tue, Jan 09, 2018 at 04:12:40PM +0000, Bart Van Assche wrote:
> I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
> call, although I know this call is cheap. Would the timeout code really get that

So, if that is really a concern, let's cache that mapping instead of
changing synchronization rules for that.

> much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
> into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
> "if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
> blk_mq_timeout_work()?

Code-wise, it won't be too much extra code but I think diverging the
sync methods between issue and completion paths is more fragile and
likely to invite confusions and mistakes in the future.  We have the
normal path (issue&completion) synchronizing against the exception
path (timeout).  I think it's best to keep the sync constructs aligned
with that conceptual picture.

Thanks.

Jens Axboe Jan. 9, 2018, 4:22 p.m. UTC | #10

On 1/9/18 9:19 AM, tj@kernel.org wrote:
> Hello, Bart.
> 
> On Tue, Jan 09, 2018 at 04:12:40PM +0000, Bart Van Assche wrote:
>> I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
>> call, although I know this call is cheap. Would the timeout code really get that
> 
> So, if that is really a concern, let's cache that mapping instead of
> changing synchronization rules for that.
> 
>> much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
>> into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
>> "if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
>> blk_mq_timeout_work()?
> 
> Code-wise, it won't be too much extra code but I think diverging the
> sync methods between issue and completion paths is more fragile and
> likely to invite confusions and mistakes in the future.  We have the
> normal path (issue&completion) synchronizing against the exception
> path (timeout).  I think it's best to keep the sync constructs aligned
> with that conceptual picture.

Guys, the map call is FINE. We do it at least once per request anyway,
usually multiple times. If it's too expensive, then THAT is what needs
fixing, not adding an arbitrary caching of that mapping. Look at the
actual code:

	return q->queue_hw_ctx[q->mq_map[cpu]];

that's it. It's just a double index.

So let's put this thing to rest right now - the map call is perfectly
fine, especially since the alternatives are either bloating struct
request or making the code less maintainable.

Foot -> down.

[2/8] blk-mq: protect completion path with RCU

Commit Message

Comments

Patch