
rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()

Message ID 20231027095842.GA30868@redhat.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Series rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1365 this patch: 1364
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1388 this patch: 1388
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1390 this patch: 1389
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Oleg Nesterov Oct. 27, 2023, 9:58 a.m. UTC
read_seqbegin_or_lock() makes no sense unless you make "seq" odd
after the lockless access failed. See thread_group_cputime() as
an example, note that it does nextseq = 1 for the 2nd round.

So this code can use read_seqbegin() without changing the current
behaviour.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 net/rxrpc/conn_service.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Oleg Nesterov Oct. 27, 2023, 10 a.m. UTC | #1
On 10/27, Oleg Nesterov wrote:
>
> read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> after the lockless access failed. See thread_group_cputime() as
> an example, note that it does nextseq = 1 for the 2nd round.

See also

	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/

> So this code can use read_seqbegin() without changing the current
> behaviour.

I am trying to remove the misuse of read_seqbegin_or_lock(),
then I am going to turn need_seqretry() into

	static inline int need_seqretry(seqlock_t *lock, int *seq)
	{
		int ret = !(*seq & 1) && read_seqretry(lock, *seq);

		if (ret)
			*seq = 1; /* make this counter odd */

		return ret;
	}
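
With that change a typical caller would look roughly like this (just a sketch
of the intended usage with the changed prototype, not code from this series):

	int seq = 0;
	do {
		read_seqbegin_or_lock(lock, &seq);	/* lockless while seq is even, exclusive once odd */
		do_something();
	} while (need_seqretry(lock, &seq));		/* on retry, sets *seq = 1 to force locking */
	done_seqretry(lock, seq);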

Oleg.
David Howells Nov. 1, 2023, 3:45 p.m. UTC | #2
Oleg Nesterov <oleg@redhat.com> wrote:

> read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> after the lockless access failed.

I think you're wrong.

write_seqlock() turns it odd.  For instance, if the read lock is taken first:

	sequence seq	CPU 1				CPU 2
	======= =======	===============================	===============
	0
	0	0	seq = 0 // MUST BE EVEN ACCORDING TO DOC
	0	0	read_seqbegin_or_lock() [lockless]
			...
	1	0					write_seqlock()
	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
	1	1	read_seqbegin_or_lock() [exclusive]
			-->spin_lock(lock);
	2	1					write_sequnlock()
			<--locked
			...
	2	1	need_seqretry()

However, if the write lock is taken first:

	sequence seq	CPU 1				CPU 2
	======= =======	===============================	===============
	0
	1						write_seqlock()
	1	0	seq = 0 // MUST BE EVEN ACCORDING TO DOC
	1	0	read_seqbegin_or_lock() [lockless]
	1	0	    __read_seqcount_begin()
				while (lock.sequence is odd)
				    cpu_relax();
	2	0					write_sequnlock()
	2	2		[loop end]
			...
	2	2	need_seqretry() [seq=even; sequence==seq; done]

Note that it spins in __read_seqcount_begin() until we get an even seq,
indicating that no write is currently in progress - at which point we can
perform a lockless pass.

> See thread_group_cputime() as an example, note that it does nextseq = 1 for
> the 2nd round.

That's not especially convincing.

David
Oleg Nesterov Nov. 1, 2023, 8:23 p.m. UTC | #3
On 11/01, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > after the lockless access failed.
>
> I think you're wrong.

I think you missed the point ;)

> write_seqlock() turns it odd.

It changes seqcount_t->sequence but not "seq" so this doesn't matter.

> For instance, if the read lock is taken first:
>
> 	sequence seq	CPU 1				CPU 2
> 	======= =======	===============================	===============
> 	0
> 	0	0	seq = 0  MUST BE EVEN

This is correct,

> ACCORDING TO DOC

documentation is wrong, please see

	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/

> 	0	0	read_seqbegin_or_lock() [lockless]
> 			...
> 	1	0					write_seqlock()
> 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]

Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,

> 	1	1	read_seqbegin_or_lock() [exclusive]

No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
it will do

	seq = read_seqbegin(lock);

again.

> Note that it spins in __read_seqcount_begin() until we get an even seq,
> indicating that no write is currently in progress - at which point we can
> perform a lockless pass.

Exactly. And this means that "seq" is always even.

> > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > the 2nd round.
>
> That's not especially convincing.

See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
All other users are wrong.

Let's start from the very beginning. This code does

        int seq = 0;
        do {
                read_seqbegin_or_lock(service_conn_lock, &seq);

                do_something();

        } while (need_seqretry(service_conn_lock, seq));

        done_seqretry(service_conn_lock, seq);

Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does

	*seq = read_seqbegin(lock);

and returns. Note that "seq" is still even.
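
For reference, the helpers currently look roughly like this (simplified from
include/linux/seqlock.h):

	static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
	{
		if (!(*seq & 1))	/* even: lockless pass */
			*seq = read_seqbegin(lock);
		else			/* odd: exclusive (locked) pass */
			read_seqlock_excl(lock);
	}

	static inline int need_seqretry(seqlock_t *lock, int seq)
	{
		return !(seq & 1) && read_seqretry(lock, seq);
	}

	static inline void done_seqretry(seqlock_t *lock, int seq)
	{
		if (seq & 1)
			read_sequnlock_excl(lock);
	}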

Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
true but it does NOT change this "seq", it is still even. So on the next
iteration read_seqbegin_or_lock() will do

	*seq = read_seqbegin(lock);

again, it won't take this lock for writing. And again, seq will be even.
And so on.

And this means that the code above is equivalent to

	do {
		seq = read_seqbegin(service_conn_lock);

		do_something();

	} while (read_seqretry(service_conn_lock, seq));

and this is what this patch does.

Yes this is confusing. Again, even the documentation is wrong! That is why
I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
to change the semantics of need_seqretry() to enforce the locking on the 2nd
pass.

Oleg.
Oleg Nesterov Nov. 1, 2023, 8:40 p.m. UTC | #4
In case I was not clear, I am not saying this code is buggy.

Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
helpers make any sense in this code. It can use read_seqbegin/
read_seqretry and this won't change the current behaviour.

On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > > after the lockless access failed.
> >
> > I think you're wrong.
>
> I think you missed the point ;)
>
> > write_seqlock() turns it odd.
>
> It changes seqcount_t->sequence but not "seq" so this doesn't matter.
>
> > For instance, if the read lock is taken first:
> >
> > 	sequence seq	CPU 1				CPU 2
> > 	======= =======	===============================	===============
> > 	0
> > 	0	0	seq = 0  MUST BE EVEN
>
> This is correct,
>
> > ACCORDING TO DOC
>
> documentation is wrong, please see
>
> 	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
> 	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/
>
> > 	0	0	read_seqbegin_or_lock() [lockless]
> > 			...
> > 	1	0					write_seqlock()
> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do
>
> 	seq = read_seqbegin(lock);
>
> again.
>
> > Note that it spins in __read_seqcount_begin() until we get an even seq,
> > indicating that no write is currently in progress - at which point we can
> > perform a lockless pass.
>
> Exactly. And this means that "seq" is always even.
>
> > > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > > the 2nd round.
> >
> > That's not especially convincing.
>
> See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
> All other users are wrong.
>
> Let's start from the very beginning. This code does
>
>         int seq = 0;
>         do {
>                 read_seqbegin_or_lock(service_conn_lock, &seq);
>
>                 do_something();
>
>         } while (need_seqretry(service_conn_lock, seq));
>
>         done_seqretry(service_conn_lock, seq);
>
> Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does
>
> 	*seq = read_seqbegin(lock);
>
> and returns. Note that "seq" is still even.
>
> Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
> true but it does NOT change this "seq", it is still even. So on the next
> iteration read_seqbegin_or_lock() will do
>
> 	*seq = read_seqbegin(lock);
>
> again, it won't take this lock for writing. And again, seq will be even.
> And so on.
>
> And this means that the code above is equivalent to
>
> 	do {
> 		seq = read_seqbegin(service_conn_lock);
>
> 		do_something();
>
> 	} while (read_seqretry(service_conn_lock, seq));
>
> and this is what this patch does.
>
> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.
>
> Oleg.
Al Viro Nov. 1, 2023, 8:52 p.m. UTC | #5
On Wed, Nov 01, 2023 at 09:23:03PM +0100, Oleg Nesterov wrote:

> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.

What for?  Sure, the documentation needs to be fixed, but *not* in the direction
you suggested in that patch.

Why would you want to force that "switch to locked on the second pass" policy
on every possible caller?
David Howells Nov. 1, 2023, 9:20 p.m. UTC | #6
Oleg Nesterov <oleg@redhat.com> wrote:

> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do

Yeah, you're right.  I missed the fact that, as in my second example,
read_seqbegin_or_lock() spins until it sees an even seq.

However, I think just changing all of these to always-lockless isn't
necessarily the most optimal way.  Yes, it will work... eventually.  But the
point is to limit the number of iterations.

So I have the following:

 (1) rxrpc_find_service_conn_rcu().

     I want to move the first part of the reaper to the I/O thread at some
     point, then the locking here can go away entirely.  However, this is
     drivable by external events, so I would prefer to limit the number of
     passes to just two and take a lock on the second pass.  Holding up the
     reaper thread for the moment is fine; holding up the I/O thread is not.

 (2) afs_lookup_volume_rcu().

     There can be a lot of volumes known by a system.  A thousand would
     require a 10-step walk and this is drivable by remote operation, so I
     think this should probably take a lock on the second pass too.

 (3) afs_check_validity().
 (4) afs_get_attr().

     These are both pretty short, so your solution is probably good for them.
     That said, afs_vnode_commit_status() can spend a long time under the
     write lock - and pretty much every file RPC op returns a status update.

 (5) afs_find_server().

     There could be a lot of servers in the list and each server can have
     multiple addresses, so I think this would be better with an exclusive
     second pass.

     The server list isn't likely to change all that often, but when it does
     change, there's a good chance several servers are going to be
     added/removed one after the other.  Further, this is only going to be
     used for incoming cache management/callback requests from the server,
     which hopefully aren't going to happen too often - but it is remotely
     drivable.

 (6) afs_find_server_by_uuid().

     Similarly to (5), there could be a lot of servers to search through, but
     they are in a tree not a flat list, so it should be faster to process.
     Again, it's not likely to change that often and, again, when it does
     change it's likely to involve multiple changes.  This can be driven
     remotely by an incoming cache management request but is mostly going to
     be driven by setting up or reconfiguring a volume's server list -
     something that also isn't likely to happen often.

I wonder if struct seqlock would make more sense with an rwlock rather than a
spinlock.  As it is, it does an exclusive spinlock for the readpath which is
kind of overkill.

David
David Howells Nov. 1, 2023, 9:22 p.m. UTC | #7
Oleg Nesterov <oleg@redhat.com> wrote:

> Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
> helpers make any sense in this code.

I disagree.  I think in at least a couple of cases I do want a locked second
path - ideally locked shared if seqlock can be made to use an rwlock instead
of a spinlock.

David
Oleg Nesterov Nov. 1, 2023, 9:52 p.m. UTC | #8
On 11/01, Al Viro wrote:
>
> On Wed, Nov 01, 2023 at 09:23:03PM +0100, Oleg Nesterov wrote:
>
> > Yes this is confusing. Again, even the documentation is wrong! That is why
> > I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> > to change the semantics of need_seqretry() to enforce the locking on the 2nd
> > pass.
>
> What for?  Sure, documentation needs to be fixed,

So do you agree that the current usage of read_seqbegin_or_lock() in
rxrpc_find_service_conn_rcu() is misleading? Do you agree it can use
read_seqbegin/read_seqretry without changing the current behaviour?

> but *not* in direction you
> suggested in that patch.

Hmm, then how do you think the doc should be changed? To describe the
current behaviour?

> Why would you want to force that "switch to locked on the second pass" policy
> on every possible caller?

Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
It should take the lock for writing if the lockless access failed. At least
according to the documentation.

This needs another discussion and perhaps this makes no sense. But I'd
like to turn need_seqretry(seq) into something like

	static inline int need_seqretry(seqlock_t *lock, int *seq)
	{
		int ret = !(*seq & 1) && read_seqretry(lock, *seq);

		if (ret)
			*seq = 1; /* make this counter odd */

		return ret;
	}

and update the users which actually want read_seqlock_excl() on the 2nd pass:
thread_group_cputime(), fs/d_path.c and fs/dcache.c.

For example, __dentry_path()

	--- a/fs/d_path.c
	+++ b/fs/d_path.c
	@@ -349,10 +349,9 @@ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p)
		}
		if (!(seq & 1))
			rcu_read_unlock();
	-	if (need_seqretry(&rename_lock, seq)) {
	-		seq = 1;
	+	if (need_seqretry(&rename_lock, &seq))
			goto restart;
	-	}
	+
		done_seqretry(&rename_lock, seq);
		if (b.len == p->len)
			prepend_char(&b, '/');


but again, this needs another discussion.

Oleg.
Oleg Nesterov Nov. 1, 2023, 10:15 p.m. UTC | #9
On 11/01, David Howells wrote:
>
> However, I think just changing all of these to always-lockless isn't
> necessarily the most optimal way.

Yes, but so far I am trying to change the users which never take the
lock for writing, so this patch doesn't change the current behaviour.

> I wonder if struct seqlock would make more sense with an rwlock rather than a
> spinlock.  As it is, it does an exclusive spinlock for the readpath which is
> kind of overkill.

Heh. Please see

	[PATCH 4/5] seqlock: introduce read_seqcount_begin_or_lock() and friends
	https://lore.kernel.org/all/20230913155005.GA26252@redhat.com/

I am going to return to this later.

Oleg.
Oleg Nesterov Nov. 1, 2023, 10:29 p.m. UTC | #10
Sorry for the noise, but in case I wasn't clear...

On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > However, I think just changing all of these to always-lockless isn't
> > necessarily the most optimal way.
>
> Yes, but so far I am trying to change the users which never take the
> lock for writing, so this patch doesn't change the current behaviour.
>
> > I wonder if struct seqlock would make more sense with an rwlock rather than a
> > spinlock.  As it is, it does an exclusive spinlock for the readpath which is
> > kind of overkill.
>
> Heh. Please see
>
> 	[PATCH 4/5] seqlock: introduce read_seqcount_begin_or_lock() and friends
> 	https://lore.kernel.org/all/20230913155005.GA26252@redhat.com/
>

I meant, we already have seqcount_rwlock_t, but currently you can't do
something like read_seqbegin_or_lock(&seqcount_rwlock_t).
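
I.e. (sketch, names invented) the seqcount side of this already exists:

	static DEFINE_RWLOCK(foo_rwlock);
	static seqcount_rwlock_t foo_seq = SEQCNT_RWLOCK_ZERO(foo_seq, &foo_rwlock);

	/* readers: read_seqcount_begin()/read_seqcount_retry() on &foo_seq,
	 * or take foo_rwlock shared; what is missing is a read_seqbegin_or_lock()
	 * style helper that accepts a seqcount_rwlock_t.
	 */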

> I am going to return to this later.

Yes.

Oleg.
Oleg Nesterov Nov. 1, 2023, 10:38 p.m. UTC | #11
On 11/01, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
> > helpers make any sense in this code.
>
> I disagree.  I think in at least a couple of cases I do want a locked second
> path

Sorry for the confusion. I never said that the 2nd locked pass makes no sense.

My only point is that rxrpc_find_service_conn_rcu() (and more) use
read_seqbegin_or_lock() incorrectly. They can use read_seqbegin() and this
won't change the current behaviour.

So let's change these users first? Then we can discuss the possible changes
in include/linux/seqlock.h and (perhaps) update the users which actually
want the locking on the 2nd pass.

Oleg.
Al Viro Nov. 1, 2023, 10:48 p.m. UTC | #12
On Wed, Nov 01, 2023 at 10:52:15PM +0100, Oleg Nesterov wrote:

> > Why would you want to force that "switch to locked on the second pass" policy
> > on every possible caller?
> 
> Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
> It should take the lock for writing if the lockless access failed. At least
> according to the documentation.

Not really - it's literally seqbegin or lock, depending upon what the caller
tells it...  IMO the mistake in the docs is the insistence on using a do-while
loop for its users.

Take a look at d_walk() and try to shoehorn that into your variant.  Especially
the D_WALK_NORETRY handling...
Oleg Nesterov Nov. 1, 2023, 11:17 p.m. UTC | #13
On 11/01, Al Viro wrote:
>
> On Wed, Nov 01, 2023 at 10:52:15PM +0100, Oleg Nesterov wrote:
>
> > > Why would you want to force that "switch to locked on the second pass" policy
> > > on every possible caller?
> >
> > Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
> > It should take the lock for writing if the lockless access failed. At least
> > according to the documentation.
>
> Not really - it's literally seqbegin or lock, depending upon what the caller
> tells it...

OK, I won't argue right now. But again, this patch doesn't change the current
behaviour. Exactly because the caller does NOT tell read_seqbegin_or_lock() that
it wants "or lock" on the 2nd pass.

> Take a look at d_walk() and try to shoehorn that into your variant.  Especially
> the D_WALK_NORETRY handling...

I am already sleeping, quite possibly I am wrong. But it seems that if we change
done_seqretry() then d_walk() needs something like

	--- a/fs/dcache.c
	+++ b/fs/dcache.c
	@@ -1420,7 +1420,7 @@ static void d_walk(struct dentry *parent, void *data,
			spin_lock(&this_parent->d_lock);
	 
			/* might go back up the wrong parent if we have had a rename. */
	-		if (need_seqretry(&rename_lock, seq))
	+		if (need_seqretry(&rename_lock, &seq))
				goto rename_retry;
			/* go into the first sibling still alive */
			do {
	@@ -1432,22 +1432,20 @@ static void d_walk(struct dentry *parent, void *data,
			rcu_read_unlock();
			goto resume;
		}
	-	if (need_seqretry(&rename_lock, seq))
	+	if (need_seqretry(&rename_lock, &seq))
			goto rename_retry;
		rcu_read_unlock();
	 
	 out_unlock:
		spin_unlock(&this_parent->d_lock);
	-	done_seqretry(&rename_lock, seq);
	+	done_seqretry(&rename_lock, &seq);
		return;
	 
	 rename_retry:
		spin_unlock(&this_parent->d_lock);
		rcu_read_unlock();
	-	BUG_ON(seq & 1);
		if (!retry)
			return;
	-	seq = 1;
		goto again;
	 }


But again, again, this is off-topic and needs another discussion. Right now I am just
trying to audit the users of read_seqbegin_or_lock/need_seqretry and change those who
use them incorrectly.

Oleg.
Oleg Nesterov Nov. 16, 2023, 1:18 p.m. UTC | #14
David, Al,

So do you agree that

	- the usage of read_seqbegin_or_lock/need_seqretry in
	  this code makes no sense because read_seqlock_excl()
	  is not possible

	- this patch doesn't change the current behaviour but
	  simplifies the code and makes it more clear

?

On 10/27, Oleg Nesterov wrote:
>
> read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> after the lockless access failed. See thread_group_cputime() as
> an example, note that it does nextseq = 1 for the 2nd round.
> 
> So this code can use read_seqbegin() without changing the current
> behaviour.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  net/rxrpc/conn_service.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
> index 89ac05a711a4..bfafe58681d9 100644
> --- a/net/rxrpc/conn_service.c
> +++ b/net/rxrpc/conn_service.c
> @@ -25,7 +25,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>  	struct rxrpc_conn_proto k;
>  	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
>  	struct rb_node *p;
> -	unsigned int seq = 0;
> +	unsigned int seq;
>  
>  	k.epoch	= sp->hdr.epoch;
>  	k.cid	= sp->hdr.cid & RXRPC_CIDMASK;
> @@ -35,7 +35,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>  		 * under just the RCU read lock, so we have to check for
>  		 * changes.
>  		 */
> -		read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
> +		seq = read_seqbegin(&peer->service_conn_lock);
>  
>  		p = rcu_dereference_raw(peer->service_conns.rb_node);
>  		while (p) {
> @@ -49,9 +49,8 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>  				break;
>  			conn = NULL;
>  		}
> -	} while (need_seqretry(&peer->service_conn_lock, seq));
> +	} while (read_seqretry(&peer->service_conn_lock, seq));
>  
> -	done_seqretry(&peer->service_conn_lock, seq);
>  	_leave(" = %d", conn ? conn->debug_id : -1);
>  	return conn;
>  }
> -- 
> 2.25.1.362.g51ebf55
>
David Howells Nov. 16, 2023, 1:41 p.m. UTC | #15
Oleg Nesterov <oleg@redhat.com> wrote:

> So do you agree that
> 
> 	- the usage of read_seqbegin_or_lock/need_seqretry in
> 	  this code makes no sense because read_seqlock_excl()
> 	  is not possible

Not exactly.  I think it should take a lock on the second pass.

> 	- this patch doesn't change the current behaviour but
> 	  simplifies the code and makes it more clear

That is true.

David
Oleg Nesterov Nov. 16, 2023, 2:19 p.m. UTC | #16
On 11/16, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > So do you agree that
> >
> > 	- the usage of read_seqbegin_or_lock/need_seqretry in
> > 	  this code makes no sense because read_seqlock_excl()
> > 	  is not possible
>
> Not exactly.  I think it should take a lock on the second pass.

OK, then how about the patch below?

Again, I'd prefer to change the semantics/prototype of need_seqretry()
to enforce the locking on the 2nd pass "automatically", but a) this
needs more discussion and b) I can't do this before I update the users
which use read_seqbegin_or_lock/need_seqretry incorrectly. So let's
discuss this later.

Oleg.

--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -25,7 +25,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 	struct rxrpc_conn_proto k;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rb_node *p;
-	unsigned int seq = 0;
+	unsigned int seq = 1;
 
 	k.epoch	= sp->hdr.epoch;
 	k.cid	= sp->hdr.cid & RXRPC_CIDMASK;
@@ -35,6 +35,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 		 * under just the RCU read lock, so we have to check for
 		 * changes.
 		 */
+		seq++; /* 2 on the 1st/lockless path, otherwise odd */
 		read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
 
 		p = rcu_dereference_raw(peer->service_conns.rb_node);
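
With "seq" starting at 1 and incremented at the top of the loop, the effect is
(informal trace of the intended behaviour, not part of the posted hunk):

	1st pass: seq = 2 (even) -> read_seqbegin(), lockless
	2nd pass: seq = 3 (odd)  -> read_seqlock_excl(), exclusive
	need_seqretry() returns false for an odd seq, so the locked pass is the
	last one, and done_seqretry() then drops the exclusive lock.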
David Howells Nov. 16, 2023, 3:02 p.m. UTC | #17
Oleg Nesterov <oleg@redhat.com> wrote:

> > > 	- the usage of read_seqbegin_or_lock/need_seqretry in
> > > 	  this code makes no sense because read_seqlock_excl()
> > > 	  is not possible
> >
> > Not exactly.  I think it should take a lock on the second pass.
> 
> OK, then how about the patch below?

That seems to work.

David
Oleg Nesterov Nov. 16, 2023, 3:06 p.m. UTC | #18
On 11/16, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > > 	- the usage of read_seqbegin_or_lock/need_seqretry in
> > > > 	  this code makes no sense because read_seqlock_excl()
> > > > 	  is not possible
> > >
> > > Not exactly.  I think it should take a lock on the second pass.
> >
> > OK, then how about the patch below?
>
> That seems to work.

OK, I'll send V2 tomorrow.

Should I change fs/afs the same way?

Oleg.

Patch

diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
index 89ac05a711a4..bfafe58681d9 100644
--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -25,7 +25,7 @@  struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 	struct rxrpc_conn_proto k;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rb_node *p;
-	unsigned int seq = 0;
+	unsigned int seq;
 
 	k.epoch	= sp->hdr.epoch;
 	k.cid	= sp->hdr.cid & RXRPC_CIDMASK;
@@ -35,7 +35,7 @@  struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 		 * under just the RCU read lock, so we have to check for
 		 * changes.
 		 */
-		read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
+		seq = read_seqbegin(&peer->service_conn_lock);
 
 		p = rcu_dereference_raw(peer->service_conns.rb_node);
 		while (p) {
@@ -49,9 +49,8 @@  struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 				break;
 			conn = NULL;
 		}
-	} while (need_seqretry(&peer->service_conn_lock, seq));
+	} while (read_seqretry(&peer->service_conn_lock, seq));
 
-	done_seqretry(&peer->service_conn_lock, seq);
 	_leave(" = %d", conn ? conn->debug_id : -1);
 	return conn;
 }