blktrace: Avoid sparse warnings when assigning q->blk_trace

Message ID 20200528092910.11118-1-jack@suse.cz (mailing list archive)
State New, archived
Series blktrace: Avoid sparse warnings when assigning q->blk_trace

Commit Message

Jan Kara May 28, 2020, 9:29 a.m. UTC
Mostly for historical reasons, q->blk_trace is assigned through xchg()
and cmpxchg() atomic operations. Although this is correct, sparse
complains about this because it violates rcu annotations. Furthermore
there's no real need for atomic operations anymore since all changes to
q->blk_trace happen under q->blk_trace_mutex. So let's just replace
xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
rcu_assign_pointer(). This makes the code more efficient and sparse
happy.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/trace/blktrace.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
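
The conversion boils down to the following pattern, sketched here for a
hypothetical RCU-annotated pointer q->foo that is only ever updated
under q->foo_mutex (the names are illustrative, not taken from this
patch):

	struct foo *old, *newp;

	/* detach; was: old = xchg(&q->foo, NULL); */
	old = rcu_replace_pointer(q->foo, NULL,
				  lockdep_is_held(&q->foo_mutex));

	/* attach; was: if (cmpxchg(&q->foo, NULL, newp)) goto busy; */
	if (rcu_dereference_protected(q->foo,
				      lockdep_is_held(&q->foo_mutex)))
		return -EBUSY;
	rcu_assign_pointer(q->foo, newp);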

Comments

Bart Van Assche May 28, 2020, 2:44 p.m. UTC | #1
(+Luis)

On 2020-05-28 02:29, Jan Kara wrote:
> Mostly for historical reasons, q->blk_trace is assigned through xchg()
> and cmpxchg() atomic operations. Although this is correct, sparse
> complains about this because it violates rcu annotations. Furthermore
> there's no real need for atomic operations anymore since all changes to
> q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> rcu_assign_pointer(). This makes the code more efficient and sparse
> happy.
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

How about adding a reference to commit c780e86dd48e ("blktrace: Protect
q->blk_trace with RCU") in the description of this patch?

> @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
>  
>  	blk_trace_setup_lba(bt, bdev);
>  
> -	ret = -EBUSY;
> -	if (cmpxchg(&q->blk_trace, NULL, bt))
> -		goto free_bt;
> -
> +	rcu_assign_pointer(q->blk_trace, bt);
>  	get_probe_ref();
>  	return 0;

This changes a conditional assignment of q->blk_trace into an
unconditional assignment. Shouldn't q->blk_trace only be assigned if
q->blk_trace == NULL?

Thanks,

Bart.
Luis Chamberlain May 28, 2020, 2:55 p.m. UTC | #2
On Thu, May 28, 2020 at 07:44:38AM -0700, Bart Van Assche wrote:
> > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> >  
> >  	blk_trace_setup_lba(bt, bdev);
> >  
> > -	ret = -EBUSY;
> > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > -		goto free_bt;
> > -
> > +	rcu_assign_pointer(q->blk_trace, bt);
> >  	get_probe_ref();
> >  	return 0;
> 
> Shouldn't q->blk_trace only be assigned if
> q->blk_trace == NULL?

Yes, but the old cmpxchg() call checked for that and left q->blk_trace
untouched if it was already set.

I have a patch in my series which checks q->blk_trace *early*, before
continuing, to proactively catch concurrent setups; otherwise this
only fails at the very end.

  Luis
Jan Kara May 28, 2020, 6:31 p.m. UTC | #3
On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> (+Luis)
> 
> On 2020-05-28 02:29, Jan Kara wrote:
> > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > and cmpxchg() atomic operations. Although this is correct, sparse
> > complains about this because it violates rcu annotations. Furthermore
> > there's no real need for atomic operations anymore since all changes to
> > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > rcu_assign_pointer(). This makes the code more efficient and sparse
> > happy.
> > 
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> q->blk_trace with RCU") in the description of this patch?

Yes, that's probably a good idea.

> > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> >  
> >  	blk_trace_setup_lba(bt, bdev);
> >  
> > -	ret = -EBUSY;
> > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > -		goto free_bt;
> > -
> > +	rcu_assign_pointer(q->blk_trace, bt);
> >  	get_probe_ref();
> >  	return 0;
> 
> This changes a conditional assignment of q->blk_trace into an
> unconditional assignment. Shouldn't q->blk_trace only be assigned if
> q->blk_trace == NULL?

Yes but both callers of blk_trace_setup_queue() actually check that
q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
So the conditional assignment was just bogus.

								Honza
Luis Chamberlain May 28, 2020, 6:43 p.m. UTC | #4
On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > (+Luis)
> > 
> > On 2020-05-28 02:29, Jan Kara wrote:
> > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > complains about this because it violates rcu annotations. Furthermore
> > > there's no real need for atomic operations anymore since all changes to
> > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > happy.
> > > 
> > > Reported-by: kbuild test robot <lkp@intel.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > q->blk_trace with RCU") in the description of this patch?
> 
> Yes, that's probably a good idea.
> 
> > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > >  
> > >  	blk_trace_setup_lba(bt, bdev);
> > >  
> > > -	ret = -EBUSY;
> > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > -		goto free_bt;
> > > -
> > > +	rcu_assign_pointer(q->blk_trace, bt);
> > >  	get_probe_ref();
> > >  	return 0;
> > 
> > This changes a conditional assignment of q->blk_trace into an
> > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > q->blk_trace == NULL?
> 
> Yes but both callers of blk_trace_setup_queue() actually check that
> q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> So the conditional assignment was just bogus.

If you run blktrace against a different partition, the check does have
an effect today. This is because the request_queue is implicitly shared
between partitions, even though they end up using a different struct
dentry. So the check is actually still needed; my change just adds it
earlier as well, so we don't do a memory allocation only to throw it
away.

Even then, the final check is still needed.

Try these:

https://github.com/mcgrof/break-blktrace

  Luis
Jan Kara May 28, 2020, 6:55 p.m. UTC | #5
On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > (+Luis)
> > > 
> > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > complains about this because it violates rcu annotations. Furthermore
> > > > there's no real need for atomic operations anymore since all changes to
> > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > happy.
> > > > 
> > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > 
> > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > q->blk_trace with RCU") in the description of this patch?
> > 
> > Yes, that's probably a good idea.
> > 
> > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > >  
> > > >  	blk_trace_setup_lba(bt, bdev);
> > > >  
> > > > -	ret = -EBUSY;
> > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > -		goto free_bt;
> > > > -
> > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > >  	get_probe_ref();
> > > >  	return 0;
> > > 
> > > This changes a conditional assignment of q->blk_trace into an
> > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > q->blk_trace == NULL?
> > 
> > Yes but both callers of blk_trace_setup_queue() actually check that
> > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > So the conditional assignment was just bogus.
> 
> If you run a blktrace against a different partition the check does have
> an effect today. This is because the request_queue is shared between
> partitions implicitly, even though they end up using a different struct
> dentry. So the check is actually still needed, however my change adds
> this check early as well so we don't do a memory allocation just to
> throw it away.

I'm not sure we are speaking about the same check but I might be missing
something. blk_trace_setup_queue() is only called from
sysfs_blk_trace_attr_store(). That does:

        mutex_lock(&q->blk_trace_mutex);

        bt = rcu_dereference_protected(q->blk_trace,
                                       lockdep_is_held(&q->blk_trace_mutex));
        if (attr == &dev_attr_enable) {
                if (!!value == !!bt) {
                        ret = 0;
                        goto out_unlock_bdev;
                }

		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
instead of calling blk_trace_setup_queue().

Similarly later:

        if (bt == NULL) {
                ret = blk_trace_setup_queue(q, bdev);
	...
so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
Am I missing something?

								Honza
Luis Chamberlain May 29, 2020, 8 a.m. UTC | #6
On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > (+Luis)
> > > > 
> > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > there's no real need for atomic operations anymore since all changes to
> > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > happy.
> > > > > 
> > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > 
> > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > q->blk_trace with RCU") in the description of this patch?
> > > 
> > > Yes, that's probably a good idea.
> > > 
> > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > >  
> > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > >  
> > > > > -	ret = -EBUSY;
> > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > -		goto free_bt;
> > > > > -
> > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > >  	get_probe_ref();
> > > > >  	return 0;
> > > > 
> > > > This changes a conditional assignment of q->blk_trace into an
> > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > q->blk_trace == NULL?
> > > 
> > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > So the conditional assignment was just bogus.
> > 
> > If you run a blktrace against a different partition the check does have
> > an effect today. This is because the request_queue is shared between
> > partitions implicitly, even though they end up using a different struct
> > dentry. So the check is actually still needed, however my change adds
> > this check early as well so we don't do a memory allocation just to
> > throw it away.
> 
> I'm not sure we are speaking about the same check but I might be missing
> something. blk_trace_setup_queue() is only called from
> sysfs_blk_trace_attr_store(). That does:
> 
>         mutex_lock(&q->blk_trace_mutex);
> 
>         bt = rcu_dereference_protected(q->blk_trace,
>                                        lockdep_is_held(&q->blk_trace_mutex));
>         if (attr == &dev_attr_enable) {
>                 if (!!value == !!bt) {
>                         ret = 0;
>                         goto out_unlock_bdev;
>                 }
> 
> 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> instead of calling blk_trace_setup_queue().
> 
> Similarly later:
> 
>         if (bt == NULL) {
>                 ret = blk_trace_setup_queue(q, bdev);
> 	...
> so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> Am I missing something?

I believe we are talking about the same check indeed. Consider the
situation not as a race, but as the state machine of the ioctls.
BLKTRACESETUP goes first, and when that is done we have not yet run
BLKTRACESTART. So, prior to BLKTRACESTART, another BLKTRACESETUP can
run, but against another partition. At that point we have two callers
trying to use the same request_queue and the same q->blk_trace, even
though everything is protected by the mutex.

And so the final check is needed to ensure we only give one of the
users the right to blktrace.
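
A minimal userspace sketch of that sequence, hitting the same
request_queue through two partitions (device paths, buffer parameters
and the helper name are illustrative assumptions, in the spirit of the
break-blktrace scripts, not code taken from this thread):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>           /* BLKTRACESETUP */
	#include <linux/blktrace_api.h> /* struct blk_user_trace_setup */

	static int setup_trace(const char *dev)
	{
		struct blk_user_trace_setup buts;
		int fd = open(dev, O_RDONLY | O_NONBLOCK);

		if (fd < 0)
			return -1;
		memset(&buts, 0, sizeof(buts));
		buts.buf_size = 512 * 1024;	/* relay sub-buffer size */
		buts.buf_nr = 4;		/* number of sub-buffers */
		buts.act_mask = 0xffff;		/* trace all actions */
		/* Sets q->blk_trace; a second setup on the same
		 * request_queue is expected to fail with EBUSY. */
		return ioctl(fd, BLKTRACESETUP, &buts);
	}

	/* setup_trace("/dev/sda1") succeeds; setup_trace("/dev/sda2")
	 * then hits the same request_queue and must fail. */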

Did I misunderstand the check though?

  Luis
Jan Kara May 29, 2020, 9:04 a.m. UTC | #7
On Fri 29-05-20 08:00:56, Luis Chamberlain wrote:
> On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> > On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > > (+Luis)
> > > > > 
> > > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > > there's no real need for atomic operations anymore since all changes to
> > > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > > happy.
> > > > > > 
> > > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > 
> > > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > > q->blk_trace with RCU") in the description of this patch?
> > > > 
> > > > Yes, that's probably a good idea.
> > > > 
> > > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > > >  
> > > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > > >  
> > > > > > -	ret = -EBUSY;
> > > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > > -		goto free_bt;
> > > > > > -
> > > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > > >  	get_probe_ref();
> > > > > >  	return 0;
> > > > > 
> > > > > This changes a conditional assignment of q->blk_trace into an
> > > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > > q->blk_trace == NULL?
> > > > 
> > > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > > So the conditional assignment was just bogus.
> > > 
> > > If you run a blktrace against a different partition the check does have
> > > an effect today. This is because the request_queue is shared between
> > > partitions implicitly, even though they end up using a different struct
> > > dentry. So the check is actually still needed, however my change adds
> > > this check early as well so we don't do a memory allocation just to
> > > throw it away.
> > 
> > I'm not sure we are speaking about the same check but I might be missing
> > something. blk_trace_setup_queue() is only called from
> > sysfs_blk_trace_attr_store(). That does:
> > 
> >         mutex_lock(&q->blk_trace_mutex);
> > 
> >         bt = rcu_dereference_protected(q->blk_trace,
> >                                        lockdep_is_held(&q->blk_trace_mutex));
> >         if (attr == &dev_attr_enable) {
> >                 if (!!value == !!bt) {
> >                         ret = 0;
> >                         goto out_unlock_bdev;
> >                 }
> > 
> > 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> > instead of calling blk_trace_setup_queue().
> > 
> > Similarly later:
> > 
> >         if (bt == NULL) {
> >                 ret = blk_trace_setup_queue(q, bdev);
> > 	...
> > so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> > cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> > Am I missing something?
> 
> I believe we are talking about the same check indeed. Consider the
> situation not as a race, but instead consider the state machine of
> the ioctl. The BLKTRACESETUP goes first, and when that is over we
> have not ran BLKTRACESTART. So, prior to BLKTRACESTART we can have
> another BLKTRACESETUP run but against another partition.

So first note that BLKTRACESETUP goes through do_blk_trace_setup() while
'echo 1 >/sys/block/../trace/enable' goes through blk_trace_setup_queue().
Although these operations achieve very similar things, they are completely
separate code paths. I was speaking about the second case while you are now
speaking about the first one.

WRT your BLKTRACESETUP example, the first BLKTRACESETUP will end up
setting q->blk_trace to 'bt' so the second BLKTRACESETUP will see
q->blk_trace is not NULL (my patch adds this check to do_blk_trace_setup()
so we bail out earlier than during cmpxchg()) and fails. Again I don't see
any problem here...

> At that
> point we have two cases trying to use the same request_queue and
> the same q->blk_trace, even though this was well protected with
> the mutex.
> 
> And so the final check is needed to ensure we only give one of
> the users the right to blktrace.

Well, the check to make sure there's only one trace setup is the one
checking q->blk_trace is NULL while holding q->blk_trace_mutex...

								Honza
Luis Chamberlain May 29, 2020, 11:43 a.m. UTC | #8
On Fri, May 29, 2020 at 11:04:48AM +0200, Jan Kara wrote:
> On Fri 29-05-20 08:00:56, Luis Chamberlain wrote:
> > On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> > > On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > > > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > > > (+Luis)
> > > > > > 
> > > > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > > > there's no real need for atomic operations anymore since all changes to
> > > > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > > > happy.
> > > > > > > 
> > > > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > > 
> > > > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > > > q->blk_trace with RCU") in the description of this patch?
> > > > > 
> > > > > Yes, that's probably a good idea.
> > > > > 
> > > > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > > > >  
> > > > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > > > >  
> > > > > > > -	ret = -EBUSY;
> > > > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > > > -		goto free_bt;
> > > > > > > -
> > > > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > > > >  	get_probe_ref();
> > > > > > >  	return 0;
> > > > > > 
> > > > > > This changes a conditional assignment of q->blk_trace into an
> > > > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > > > q->blk_trace == NULL?
> > > > > 
> > > > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > > > So the conditional assignment was just bogus.
> > > > 
> > > > If you run a blktrace against a different partition the check does have
> > > > an effect today. This is because the request_queue is shared between
> > > > partitions implicitly, even though they end up using a different struct
> > > > dentry. So the check is actually still needed, however my change adds
> > > > this check early as well so we don't do a memory allocation just to
> > > > throw it away.
> > > 
> > > I'm not sure we are speaking about the same check but I might be missing
> > > something. blk_trace_setup_queue() is only called from
> > > sysfs_blk_trace_attr_store(). That does:
> > > 
> > >         mutex_lock(&q->blk_trace_mutex);
> > > 
> > >         bt = rcu_dereference_protected(q->blk_trace,
> > >                                        lockdep_is_held(&q->blk_trace_mutex));
> > >         if (attr == &dev_attr_enable) {
> > >                 if (!!value == !!bt) {
> > >                         ret = 0;
> > >                         goto out_unlock_bdev;
> > >                 }
> > > 
> > > 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> > > instead of calling blk_trace_setup_queue().
> > > 
> > > Similarly later:
> > > 
> > >         if (bt == NULL) {
> > >                 ret = blk_trace_setup_queue(q, bdev);
> > > 	...
> > > so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> > > cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> > > Am I missing something?
> > 
> > I believe we are talking about the same check indeed. Consider the
> > situation not as a race, but instead consider the state machine of
> > the ioctl. The BLKTRACESETUP goes first, and when that is over we
> > have not ran BLKTRACESTART. So, prior to BLKTRACESTART we can have
> > another BLKTRACESETUP run but against another partition.
> 
> So first note that BLKTRACESETUP goes through do_blk_trace_setup() while
> 'echo 1 >/sys/block/../trace/enable' goes through blk_trace_setup_queue().
> Although these operations achieve a very similar things, they are completely
> separate code paths. I was speaking about the second case while you are now
> speaking about the first one.
> 
> WRT to your BLKTRACESETUP example, the first BLKTRACESETUP will end up
> setting q->blk_trace to 'bt' so the second BLKTRACESETUP will see
> q->blk_trace is not NULL (my patch adds this check to do_blk_trace_setup()
> so we bail out earlier than during cmpxchg()) and fails. Again I don't see
> any problem here...

Ah, the patch I was CC'd on didn't contain this hunk! It only had the
change from cmpxchg() to rcu_assign_pointer(), so I misunderstood
your intention, sorry!

In that case, I already proposed a patch to do that, and it also adds
a tiny bit of verbiage given we currently don't inform the user about
why this fails [0].

Let me know how you folks would like to proceed.

[0] https://lkml.kernel.org/r/20200516031956.2605-7-mcgrof@kernel.org

  Luis
Jan Kara May 29, 2020, 12:11 p.m. UTC | #9
On Fri 29-05-20 11:43:00, Luis Chamberlain wrote:
> On Fri, May 29, 2020 at 11:04:48AM +0200, Jan Kara wrote:
> > On Fri 29-05-20 08:00:56, Luis Chamberlain wrote:
> > > On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> > > > On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > > > > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > > > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > > > > (+Luis)
> > > > > > > 
> > > > > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > > > > there's no real need for atomic operations anymore since all changes to
> > > > > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > > > > happy.
> > > > > > > > 
> > > > > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > > > 
> > > > > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > > > > q->blk_trace with RCU") in the description of this patch?
> > > > > > 
> > > > > > Yes, that's probably a good idea.
> > > > > > 
> > > > > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > > > > >  
> > > > > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > > > > >  
> > > > > > > > -	ret = -EBUSY;
> > > > > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > > > > -		goto free_bt;
> > > > > > > > -
> > > > > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > > > > >  	get_probe_ref();
> > > > > > > >  	return 0;
> > > > > > > 
> > > > > > > This changes a conditional assignment of q->blk_trace into an
> > > > > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > > > > q->blk_trace == NULL?
> > > > > > 
> > > > > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > > > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > > > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > > > > So the conditional assignment was just bogus.
> > > > > 
> > > > > If you run a blktrace against a different partition the check does have
> > > > > an effect today. This is because the request_queue is shared between
> > > > > partitions implicitly, even though they end up using a different struct
> > > > > dentry. So the check is actually still needed, however my change adds
> > > > > this check early as well so we don't do a memory allocation just to
> > > > > throw it away.
> > > > 
> > > > I'm not sure we are speaking about the same check but I might be missing
> > > > something. blk_trace_setup_queue() is only called from
> > > > sysfs_blk_trace_attr_store(). That does:
> > > > 
> > > >         mutex_lock(&q->blk_trace_mutex);
> > > > 
> > > >         bt = rcu_dereference_protected(q->blk_trace,
> > > >                                        lockdep_is_held(&q->blk_trace_mutex));
> > > >         if (attr == &dev_attr_enable) {
> > > >                 if (!!value == !!bt) {
> > > >                         ret = 0;
> > > >                         goto out_unlock_bdev;
> > > >                 }
> > > > 
> > > > 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> > > > instead of calling blk_trace_setup_queue().
> > > > 
> > > > Similarly later:
> > > > 
> > > >         if (bt == NULL) {
> > > >                 ret = blk_trace_setup_queue(q, bdev);
> > > > 	...
> > > > so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> > > > cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> > > > Am I missing something?
> > > 
> > > I believe we are talking about the same check indeed. Consider the
> > > situation not as a race, but instead consider the state machine of
> > > the ioctl. The BLKTRACESETUP goes first, and when that is over we
> > > have not ran BLKTRACESTART. So, prior to BLKTRACESTART we can have
> > > another BLKTRACESETUP run but against another partition.
> > 
> > So first note that BLKTRACESETUP goes through do_blk_trace_setup() while
> > 'echo 1 >/sys/block/../trace/enable' goes through blk_trace_setup_queue().
> > Although these operations achieve a very similar things, they are completely
> > separate code paths. I was speaking about the second case while you are now
> > speaking about the first one.
> > 
> > WRT to your BLKTRACESETUP example, the first BLKTRACESETUP will end up
> > setting q->blk_trace to 'bt' so the second BLKTRACESETUP will see
> > q->blk_trace is not NULL (my patch adds this check to do_blk_trace_setup()
> > so we bail out earlier than during cmpxchg()) and fails. Again I don't see
> > any problem here...
> 
> Ah, the patch I was CC'd on didn't contain this hunk! It only had the
> change from cmpxchg() to the rcu_assign_pointer(), so I misunderstood
> your intention, sorry!

Good that we are on the same page now :)

> In that case, I already proposed a patch to do that, and it also adds
> a tiny bit of verbiage given we currently don't inform the user about
> why this fails [0].

Honestly, I'm not sure the pr_warn() you've added is that useful. We
usually don't spam the logs just because someone tried to use an already
used resource. But anyway, I can see other people are fine with it, so I
don't insist.

> Let me know how you folks would like to proceed.

I guess I can rebase my patch on top of your series since that seems pretty
much done. I was aware of it but didn't realize there's a conflict...

								Honza
Luis Chamberlain May 29, 2020, 12:22 p.m. UTC | #10
On Fri, May 29, 2020 at 02:11:14PM +0200, Jan Kara wrote:
> On Fri 29-05-20 11:43:00, Luis Chamberlain wrote:
> > On Fri, May 29, 2020 at 11:04:48AM +0200, Jan Kara wrote:
> > > On Fri 29-05-20 08:00:56, Luis Chamberlain wrote:
> > > > On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> > > > > On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > > > > > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > > > > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > > > > > (+Luis)
> > > > > > > > 
> > > > > > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > > > > > there's no real need for atomic operations anymore since all changes to
> > > > > > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > > > > > happy.
> > > > > > > > > 
> > > > > > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > > > > 
> > > > > > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > > > > > q->blk_trace with RCU") in the description of this patch?
> > > > > > > 
> > > > > > > Yes, that's probably a good idea.
> > > > > > > 
> > > > > > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > > > > > >  
> > > > > > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > > > > > >  
> > > > > > > > > -	ret = -EBUSY;
> > > > > > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > > > > > -		goto free_bt;
> > > > > > > > > -
> > > > > > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > > > > > >  	get_probe_ref();
> > > > > > > > >  	return 0;
> > > > > > > > 
> > > > > > > > This changes a conditional assignment of q->blk_trace into an
> > > > > > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > > > > > q->blk_trace == NULL?
> > > > > > > 
> > > > > > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > > > > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > > > > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > > > > > So the conditional assignment was just bogus.
> > > > > > 
> > > > > > If you run a blktrace against a different partition the check does have
> > > > > > an effect today. This is because the request_queue is shared between
> > > > > > partitions implicitly, even though they end up using a different struct
> > > > > > dentry. So the check is actually still needed, however my change adds
> > > > > > this check early as well so we don't do a memory allocation just to
> > > > > > throw it away.
> > > > > 
> > > > > I'm not sure we are speaking about the same check but I might be missing
> > > > > something. blk_trace_setup_queue() is only called from
> > > > > sysfs_blk_trace_attr_store(). That does:
> > > > > 
> > > > >         mutex_lock(&q->blk_trace_mutex);
> > > > > 
> > > > >         bt = rcu_dereference_protected(q->blk_trace,
> > > > >                                        lockdep_is_held(&q->blk_trace_mutex));
> > > > >         if (attr == &dev_attr_enable) {
> > > > >                 if (!!value == !!bt) {
> > > > >                         ret = 0;
> > > > >                         goto out_unlock_bdev;
> > > > >                 }
> > > > > 
> > > > > 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> > > > > instead of calling blk_trace_setup_queue().
> > > > > 
> > > > > Similarly later:
> > > > > 
> > > > >         if (bt == NULL) {
> > > > >                 ret = blk_trace_setup_queue(q, bdev);
> > > > > 	...
> > > > > so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> > > > > cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> > > > > Am I missing something?
> > > > 
> > > > I believe we are talking about the same check indeed. Consider the
> > > > situation not as a race, but instead consider the state machine of
> > > > the ioctl. The BLKTRACESETUP goes first, and when that is over we
> > > > have not ran BLKTRACESTART. So, prior to BLKTRACESTART we can have
> > > > another BLKTRACESETUP run but against another partition.
> > > 
> > > So first note that BLKTRACESETUP goes through do_blk_trace_setup() while
> > > 'echo 1 >/sys/block/../trace/enable' goes through blk_trace_setup_queue().
> > > Although these operations achieve a very similar things, they are completely
> > > separate code paths. I was speaking about the second case while you are now
> > > speaking about the first one.
> > > 
> > > WRT to your BLKTRACESETUP example, the first BLKTRACESETUP will end up
> > > setting q->blk_trace to 'bt' so the second BLKTRACESETUP will see
> > > q->blk_trace is not NULL (my patch adds this check to do_blk_trace_setup()
> > > so we bail out earlier than during cmpxchg()) and fails. Again I don't see
> > > any problem here...
> > 
> > Ah, the patch I was CC'd on didn't contain this hunk! It only had the
> > change from cmpxchg() to the rcu_assign_pointer(), so I misunderstood
> > your intention, sorry!
> 
> Good that we are on the same page now :)

Yay!

> > In that case, I already proposed a patch to do that, and it also adds
> > a tiny bit of verbiage given we currently don't inform the user about
> > why this fails [0].
> 
> Honestly, I'm not sure pr_warn() you've added is that useful. We usually
> don't spam logs due to someone trying to use already used resource. But
> anyway, I can see other people are fine with that so I don't insist.

Well, I would typically agree... however...

It is documented nowhere, not even in the blktrace documentation,
that the request_queue, and therefore blktrace, is shared between
partitions. Likewise for scsi-generic and, say, its respective block
device for TYPE_BLOCK.

If it is not obvious to developers, it won't be obvious to users.
So *why* this fails today is really a mystery to users.

These limitations of the blktrace design are not well documented
at all.

> > Let me know how you folks would like to proceed.
> 
> I guess I can rebase my patch on top of your series since that seems pretty
> much done.

I think so as well.

> I was aware of it but didn't realize there's a conflict...

Thanks to Bart for pointing it out!

  Luis

Patch

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index ca39dc3230cb..e4a9ba85b76f 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -344,7 +344,8 @@  static int __blk_trace_remove(struct request_queue *q)
 {
 	struct blk_trace *bt;
 
-	bt = xchg(&q->blk_trace, NULL);
+	bt = rcu_replace_pointer(q->blk_trace, NULL,
+				 lockdep_is_held(&q->blk_trace_mutex));
 	if (!bt)
 		return -EINVAL;
 
@@ -485,6 +486,10 @@  static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	if (!blk_debugfs_root)
 		return -ENOENT;
 
+	if (rcu_dereference_protected(q->blk_trace,
+				      lockdep_is_held(&q->blk_trace_mutex)))
+		return -EBUSY;
+
 	strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
 	buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
 
@@ -543,10 +548,7 @@  static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	bt->pid = buts->pid;
 	bt->trace_state = Blktrace_setup;
 
-	ret = -EBUSY;
-	if (cmpxchg(&q->blk_trace, NULL, bt))
-		goto err;
-
+	rcu_assign_pointer(q->blk_trace, bt);
 	get_probe_ref();
 
 	ret = 0;
@@ -1637,7 +1639,8 @@  static int blk_trace_remove_queue(struct request_queue *q)
 {
 	struct blk_trace *bt;
 
-	bt = xchg(&q->blk_trace, NULL);
+	bt = rcu_replace_pointer(q->blk_trace, NULL,
+				 lockdep_is_held(&q->blk_trace_mutex));
 	if (bt == NULL)
 		return -EINVAL;
 
@@ -1669,10 +1672,7 @@  static int blk_trace_setup_queue(struct request_queue *q,
 
 	blk_trace_setup_lba(bt, bdev);
 
-	ret = -EBUSY;
-	if (cmpxchg(&q->blk_trace, NULL, bt))
-		goto free_bt;
-
+	rcu_assign_pointer(q->blk_trace, bt);
 	get_probe_ref();
 	return 0;