
[0/5,-,V2] locks: avoid thundering-herd wake-ups

Message ID 153378012255.1220.6754153662007899557.stgit@noble (mailing list archive)

Message

NeilBrown Aug. 9, 2018, 2:04 a.m. UTC
This series adds "wake_non_conflicts()" to wake up
any waiters that are being added beneath a lock that they
don't actually conflict with, even though they conflict with a parent
which conflicts with the top-level blocker.
Thanks to Bruce for highlighting this issue.

This series hasn't been tested beyond compile-test.

Original description:
If you have a many-core machine, and have many threads all wanting to
briefly lock a given file (udev is known to do this), you can get quite
poor performance.

When one thread releases a lock, it wakes up all other threads that
are waiting (classic thundering-herd) - one will get the lock and the
others go to sleep.
When you have few cores, this is not very noticeable: by the time the
4th or 5th thread gets enough CPU time to try to claim the lock, the
earlier threads have claimed it, done what was needed, and released.
With 50+ cores, the contention can easily be measured.

This patchset creates a tree of pending lock requests in which siblings
don't conflict and each lock request does conflict with its parent.
When a lock is released, only requests which don't conflict with each
other are woken.
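In sketch form, the tree works like this (a userspace model for illustration only; the field and function names here are made up and are not the actual fs/locks.c structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the blocked-request tree described above: siblings
 * never conflict with each other, and every request conflicts with its
 * parent.  Names are illustrative, not the kernel's. */
struct lock_req {
	long start, end;          /* byte range, inclusive */
	int owner;                /* lock-owner id */
	struct lock_req *child;   /* first request blocked on this one */
	struct lock_req *sibling; /* next request blocked on our blocker */
};

static bool conflicts(const struct lock_req *a, const struct lock_req *b)
{
	if (a->owner == b->owner)		/* same owner: free pass */
		return false;
	return a->start <= b->end && b->start <= a->end;	/* overlap */
}

/* Queue `req` beneath `blocker`: if it conflicts with an existing
 * child, push it further down; otherwise it becomes a new sibling. */
static void queue_request(struct lock_req *blocker, struct lock_req *req)
{
	struct lock_req *c;

	for (c = blocker->child; c; c = c->sibling) {
		if (conflicts(req, c)) {
			queue_request(c, req);
			return;
		}
	}
	req->sibling = blocker->child;
	blocker->child = req;
}

/* When `released` is unlocked, only its direct children are woken; each
 * child's subtree moves with it, so deeper waiters stay asleep. */
static int count_woken(const struct lock_req *released)
{
	int n = 0;
	const struct lock_req *c;

	for (c = released->child; c; c = c->sibling)
		n++;
	return n;
}
```

With one whole-file lock held and three waiters (two small non-overlapping ranges plus one whole-file range), the two small requests sit side by side and the whole-file request is pushed beneath one of them, so a release wakes only two tasks instead of all three.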

Testing shows that lock-acquisitions-per-second is now fairly stable even
as the number of contending processes grows to 1000.  Without this patch,
locks-per-second drops off steeply after a few tens of processes.

There is a small cost to this extra complexity.
With 20 processes running a particular test on 72 cores, the lock
acquisition rate drops from 1.8 million to 1.4 million per second with
this patch.  With 100 processes, this patch still provides 1.4 million,
while without it the rate is about 700,000.

NeilBrown


---

NeilBrown (5):
      fs/locks: rename some lists and pointers.
      fs/locks: allow a lock request to block other requests.
      fs/locks: change all *_conflict() functions to return a new enum.
      fs/locks: split out __locks_wake_one()
      fs/locks: create a tree of dependent requests.


 fs/cifs/file.c                  |    2 
 fs/locks.c                      |  228 ++++++++++++++++++++++++++++++---------
 include/linux/fs.h              |    5 +
 include/trace/events/filelock.h |   16 +--
 4 files changed, 186 insertions(+), 65 deletions(-)


Comments

J. Bruce Fields Aug. 9, 2018, 5:32 p.m. UTC | #1
I think there's also a problem with multiple tasks sharing the same
lock owner.

So, all locks are exclusive locks for the same range.  We have four
tasks.  Tasks 1 and 4 share the same owner, the others' owners are
distinct.

	- Task 1 gets a lock.
	- Task 2 gets a conflicting lock.
	- Task 3 gets another conflicting lock.  So now the tree is
		3->2->1.
	- Task 1's lock is released.
	- Before task 2 is scheduled, task 4 acquires a new lock.
	- Task 2 waits on task 4's lock, we now have
		3->2->4.

Task 3 shouldn't be waiting--the lock it's requesting has the same owner
as the lock task 4 holds--but we fail to wake up task 3.

--b.
NeilBrown Aug. 9, 2018, 10:12 p.m. UTC | #2
On Thu, Aug 09 2018, J. Bruce Fields wrote:

> I think there's also a problem with multiple tasks sharing the same
> lock owner.
>
> So, all locks are exclusive locks for the same range.  We have four
> tasks.  Tasks 1 and 4 share the same owner, the others' owners are
> distinct.
>
> 	- Task 1 gets a lock.
> 	- Task 2 gets a conflicting lock.
> 	- Task 3 gets another conflicting lock.  So now we the tree is
> 		3->2->1.
> 	- Task 1's lock is released.
> 	- Before task 2 is scheduled, task 4 acquires a new lock.
> 	- Task 2 waits on task 4's lock, we now have
> 		3->2->4.
>
> Task 3 shouldn't be waiting--the lock it's requesting has the same owner
> as the lock task 4 holds--but we fail to wake up task 3.

So task 1 and task 4 are threads in the one process - OK.
Tasks 2 and 3 are threads in two other processes.

So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be
woken?

I suspect you got the numbers bit mixed up, but in any case, the
"conflict()" function that is passed around takes ownership into account
when assessing if one lock conflicts with another.
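For reference, the ownership short-circuit looks roughly like this (a simplified userspace sketch of the posix_locks_conflict() logic in fs/locks.c; the types and names here are stand-ins, not the kernel's struct file_lock):

```c
#include <stdbool.h>
#include <stddef.h>

enum lock_type { LOCK_READ, LOCK_WRITE };

struct lock_desc {
	enum lock_type type;
	long start, end;	/* byte range, inclusive */
	void *owner;		/* e.g. the owning files_struct for POSIX locks */
};

static bool ranges_overlap(const struct lock_desc *a,
			   const struct lock_desc *b)
{
	return a->start <= b->end && b->start <= a->end;
}

static bool lock_conflicts(const struct lock_desc *req,
			   const struct lock_desc *held)
{
	if (req->owner == held->owner)
		return false;	/* same owner never conflicts with itself */
	if (req->type == LOCK_READ && held->type == LOCK_READ)
		return false;	/* shared locks coexist */
	return ranges_overlap(req, held);
}
```

The same-owner early return is the "free pass" that becomes important later in this thread: conflict is not transitive, because a request can conflict with a waiter yet not with a lock that waiter is blocked on.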

NeilBrown
J. Bruce Fields Aug. 10, 2018, 12:29 a.m. UTC | #3
On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote:
> On Thu, Aug 09 2018, J. Bruce Fields wrote:
> 
> > I think there's also a problem with multiple tasks sharing the same
> > lock owner.
> >
> > So, all locks are exclusive locks for the same range.  We have four
> > tasks.  Tasks 1 and 4 share the same owner, the others' owners are
> > distinct.
> >
> > 	- Task 1 gets a lock.
> > 	- Task 2 gets a conflicting lock.
> > 	- Task 3 gets another conflicting lock.  So now we the tree is
> > 		3->2->1.
> > 	- Task 1's lock is released.
> > 	- Before task 2 is scheduled, task 4 acquires a new lock.
> > 	- Task 2 waits on task 4's lock, we now have
> > 		3->2->4.
> >
> > Task 3 shouldn't be waiting--the lock it's requesting has the same owner
> > as the lock task 4 holds--but we fail to wake up task 3.
> 
> So task 1 and task 4 are threads in the one process - OK.
> Tasks 2 and 3 are threads in two other processes.
> 
> So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be
> woken?
> 
> I suspect you got the numbers bit mixed up,

Whoops.

> but in any case, the "conflict()" function that is passed around takes
> ownership into account when assessing if one lock conflicts with
> another.

Right, I know, but, let me try again:

All locks are exclusive locks for the same range.  Only tasks 3 and 4
share the same owner.

	- Task 1 gets a lock.
	- Task 2 requests a conflicting lock, so we have    2->1.
	- Task 3 requests a conflicting lock, so we have 3->2->1.
	- Task 1 unlocks.  We wake up task 2, but it isn't scheduled yet.
	- Task 4 gets a new lock.
	- Task 2 runs, discovers the conflict, and waits.  Now we have:
		3->2->4.

There is no conflict between the lock 3 requested and the lock 4 holds,
but 3 is not woken up.

This is another version of the first problem: there's information we
need (the owners of the waiting locks in the tree) that we can't
determine just from looking at the root of the tree.

I'm not sure what to do about that.

--b.
NeilBrown Aug. 10, 2018, 1:50 a.m. UTC | #4
On Thu, Aug 09 2018, J. Bruce Fields wrote:

> On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote:
>> On Thu, Aug 09 2018, J. Bruce Fields wrote:
>> 
>> > I think there's also a problem with multiple tasks sharing the same
>> > lock owner.
>> >
>> > So, all locks are exclusive locks for the same range.  We have four
>> > tasks.  Tasks 1 and 4 share the same owner, the others' owners are
>> > distinct.
>> >
>> > 	- Task 1 gets a lock.
>> > 	- Task 2 gets a conflicting lock.
>> > 	- Task 3 gets another conflicting lock.  So now we the tree is
>> > 		3->2->1.
>> > 	- Task 1's lock is released.
>> > 	- Before task 2 is scheduled, task 4 acquires a new lock.
>> > 	- Task 2 waits on task 4's lock, we now have
>> > 		3->2->4.
>> >
>> > Task 3 shouldn't be waiting--the lock it's requesting has the same owner
>> > as the lock task 4 holds--but we fail to wake up task 3.
>> 
>> So task 1 and task 4 are threads in the one process - OK.
>> Tasks 2 and 3 are threads in two other processes.
>> 
>> So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be
>> woken?
>> 
>> I suspect you got the numbers bit mixed up,
>
> Whoops.
>
>> but in any case, the "conflict()" function that is passed around takes
>> ownership into account when assessing if one lock conflicts with
>> another.
>
> Right, I know, but, let me try again:
>
> All locks are exclusive locks for the same range.  Only tasks 3 and 4
> share the the same owner.
>
> 	- Task 1 gets a lock.
> 	- Task 2 requests a conflicting lock, so we have    2->1.
> 	- Task 3 requests a conflicting lock, so we have 3->2->1.
> 	- Task 1 unlocks.  We wake up task 2, but it isn't scheduled yet.
> 	- Task 4 gets a new lock.
> 	- Task 2 runs, discovers the conflict, and waits.  Now we have:
> 		3->2->4.
>
> There is no conflict between the lock 3 requested and the lock 4 holds,
> but 3 is not woken up.
>
> This is another version of the first problem: there's information we
> need (the owners of the waiting locks in the tree) that we can't
> determine just from looking at the root of the tree.
>
> I'm not sure what to do about that.

You're good at this game!

So, because a locker with the same "owner" gets a free pass, you can
*never* say that any lock which conflicts with A also conflicts with B,
as a lock with the same owner as B will never conflict with B, even
though it conflicts with A.

I think there is still value in having the tree, but when a waiter is
attached under a new blocker, we need to walk the whole tree beneath the
waiter and detach/wake anything that is not blocked by the new blocker.
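A sketch of that walk (again a userspace model with illustrative names; real code would also have to re-queue what it wakes and hold the appropriate lock while unlinking):

```c
#include <stdbool.h>
#include <stddef.h>

struct waiter {
	int owner;
	long start, end;	/* byte range, inclusive */
	struct waiter *child, *sibling;
};

static bool blocks(const struct waiter *blocker, const struct waiter *w)
{
	if (blocker->owner == w->owner)
		return false;	/* same-owner free pass */
	return blocker->start <= w->end && w->start <= blocker->end;
}

/* When `w` is re-queued under `new_blocker`, walk w's whole subtree and
 * detach (wake) everything the new blocker does not actually block.
 * Returns the number of waiters woken. */
static int wake_non_blocked(struct waiter *new_blocker, struct waiter *w)
{
	int woken = 0;
	struct waiter **link = &w->child;

	while (*link) {
		struct waiter *c = *link;

		woken += wake_non_blocked(new_blocker, c); /* depth first */
		if (!blocks(new_blocker, c)) {
			*link = c->sibling;	/* unlink from the tree... */
			c->sibling = NULL;
			woken++;		/* ...and wake it up */
		} else {
			link = &c->sibling;
		}
	}
	return woken;
}
```

In Bruce's scenario this wakes task 3 (same owner as the new blocker, task 4) while leaving task 2 blocked, which is the behaviour we want.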

Thanks,
NeilBrown
J. Bruce Fields Aug. 10, 2018, 2:52 a.m. UTC | #5
On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote:
> You're good at this game!

Everybody's got to have a hobby, mine is pathological posix locking
cases....

> So, because a locker with the same "owner" gets a free pass, you can
> *never* say that any lock which conflicts with A also conflicts with B,
> as a lock with the same owner as B will never conflict with B, even
> though it conflicts with A.
> 
> I think there is still value in having the tree, but when a waiter is
> attached under a new blocker, we need to walk the whole tree beneath the
> waiter and detach/wake anything that is not blocked by the new blocker.

If you're walking the whole tree every time then it might as well be a
flat list, I think?

--b.
NeilBrown Aug. 10, 2018, 3:17 a.m. UTC | #6
On Thu, Aug 09 2018, J. Bruce Fields wrote:

> On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote:
>> You're good at this game!
>
> Everybody's got to have a hobby, mine is pathological posix locking
> cases....
>
>> So, because a locker with the same "owner" gets a free pass, you can
>> *never* say that any lock which conflicts with A also conflicts with B,
>> as a lock with the same owner as B will never conflict with B, even
>> though it conflicts with A.
>> 
>> I think there is still value in having the tree, but when a waiter is
>> attached under a new blocker, we need to walk the whole tree beneath the
>> waiter and detach/wake anything that is not blocked by the new blocker.
>
> If you're walking the whole tree every time then it might as well be a
> flat list, I think?

The advantage of a tree is that it keeps overlapping locks closer
together.
For it to make a difference you would need a load where lots of threads
were locking several different small ranges, and other threads were
locking large ranges that cover all the small ranges.
I doubt this is common, but it doesn't seem as strange as other things
I've seen in the wild.
The other advantage, of course, is that I've already written the code,
and I like it.

Maybe I'll do a simple-list version, then a patch to convert that to the
clever-tree version, and we can then have something concrete to assess.

Thanks,
NeilBrown
J. Bruce Fields Aug. 10, 2018, 3:47 p.m. UTC | #7
On Fri, Aug 10, 2018 at 01:17:14PM +1000, NeilBrown wrote:
> On Thu, Aug 09 2018, J. Bruce Fields wrote:
> 
> > On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote:
> >> You're good at this game!
> >
> > Everybody's got to have a hobby, mine is pathological posix locking
> > cases....
> >
> >> So, because a locker with the same "owner" gets a free pass, you can
> >> *never* say that any lock which conflicts with A also conflicts with B,
> >> as a lock with the same owner as B will never conflict with B, even
> >> though it conflicts with A.
> >> 
> >> I think there is still value in having the tree, but when a waiter is
> >> attached under a new blocker, we need to walk the whole tree beneath the
> >> waiter and detach/wake anything that is not blocked by the new blocker.
> >
> > If you're walking the whole tree every time then it might as well be a
> > flat list, I think?
> 
> The advantage of a tree is that it keeps over-lapping locks closer
> together.
> For it to make a difference you would need a load where lots of threads
> were locking several different small ranges, and other threads were
> locking large ranges that cover all the small ranges.

OK, I'm not sure I understand, but I'll give another look at the next
version....

> I doubt this is common, but it doesn't seem as strange as other things
> I've seen in the wild.
> The other advantage, of course, is that I've already written the code,
> and I like it.
> 
> Maybe I'll do a simple-list version, then a patch to convert that to the
> clever-tree version, and we can then have something concrete to assess.

That might help, thanks.

--b.
Jeff Layton Aug. 11, 2018, 11:51 a.m. UTC | #8
On Thu, 2018-08-09 at 20:29 -0400, J. Bruce Fields wrote:
> On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote:
> > On Thu, Aug 09 2018, J. Bruce Fields wrote:
> > 
> > > I think there's also a problem with multiple tasks sharing the same
> > > lock owner.
> > > 
> > > So, all locks are exclusive locks for the same range.  We have four
> > > tasks.  Tasks 1 and 4 share the same owner, the others' owners are
> > > distinct.
> > > 
> > > 	- Task 1 gets a lock.
> > > 	- Task 2 gets a conflicting lock.
> > > 	- Task 3 gets another conflicting lock.  So now we the tree is
> > > 		3->2->1.
> > > 	- Task 1's lock is released.
> > > 	- Before task 2 is scheduled, task 4 acquires a new lock.
> > > 	- Task 2 waits on task 4's lock, we now have
> > > 		3->2->4.
> > > 
> > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner
> > > as the lock task 4 holds--but we fail to wake up task 3.
> > 
> > So task 1 and task 4 are threads in the one process - OK.
> > Tasks 2 and 3 are threads in two other processes.
> > 
> > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be
> > woken?
> > 
> > I suspect you got the numbers bit mixed up,
> 
> Whoops.
> 
> > but in any case, the "conflict()" function that is passed around takes
> > ownership into account when assessing if one lock conflicts with
> > another.
> 
> Right, I know, but, let me try again:
> 
> All locks are exclusive locks for the same range.  Only tasks 3 and 4
> share the the same owner.
> 
> 	- Task 1 gets a lock.
> 	- Task 2 requests a conflicting lock, so we have    2->1.
> 	- Task 3 requests a conflicting lock, so we have 3->2->1.
> 	- Task 1 unlocks.  We wake up task 2, but it isn't scheduled yet.
> 	- Task 4 gets a new lock.
> 	- Task 2 runs, discovers the conflict, and waits.  Now we have:
> 		3->2->4.
> 
> There is no conflict between the lock 3 requested and the lock 4 holds,
> but 3 is not woken up.
> 
> This is another version of the first problem: there's information we
> need (the owners of the waiting locks in the tree) that we can't
> determine just from looking at the root of the tree.
> 
> I'm not sure what to do about that.
> 

Is this still a problem in the v2 set?

wake_non_conflicts walks the whole tree of requests that were blocked on
it. After task 2 discovers the conflict, it should wake any of its
children that don't conflict. So in that last step, task 3 would be
awoken before task 2 goes back to sleep.
Jeff Layton Aug. 11, 2018, 11:56 a.m. UTC | #9
On Fri, 2018-08-10 at 11:47 -0400, J. Bruce Fields wrote:
> On Fri, Aug 10, 2018 at 01:17:14PM +1000, NeilBrown wrote:
> > On Thu, Aug 09 2018, J. Bruce Fields wrote:
> > 
> > > On Fri, Aug 10, 2018 at 11:50:58AM +1000, NeilBrown wrote:
> > > > You're good at this game!
> > > 
> > > Everybody's got to have a hobby, mine is pathological posix locking
> > > cases....
> > > 
> > > > So, because a locker with the same "owner" gets a free pass, you can
> > > > *never* say that any lock which conflicts with A also conflicts with B,
> > > > as a lock with the same owner as B will never conflict with B, even
> > > > though it conflicts with A.
> > > > 
> > > > I think there is still value in having the tree, but when a waiter is
> > > > attached under a new blocker, we need to walk the whole tree beneath the
> > > > waiter and detach/wake anything that is not blocked by the new blocker.
> > > 
> > > If you're walking the whole tree every time then it might as well be a
> > > flat list, I think?
> > 
> > The advantage of a tree is that it keeps over-lapping locks closer
> > together.
> > For it to make a difference you would need a load where lots of threads
> > were locking several different small ranges, and other threads were
> > locking large ranges that cover all the small ranges.
> 
> OK, I'm not sure I understand, but I'll give another look at the next
> version....
> 
> > I doubt this is common, but it doesn't seem as strange as other things
> > I've seen in the wild.
> > The other advantage, of course, is that I've already written the code,
> > and I like it.
> > 
> > Maybe I'll do a simple-list version, then a patch to convert that to the
> > clever-tree version, and we can then have something concrete to assess.
> 
> That might help, thanks.
> 

FWIW, I did a bit of testing with lockperf tests that I had written on
an earlier rework of this code:

    https://git.samba.org/jlayton/linux.git/?p=jlayton/lockperf.git;a=summary


The posix01 and flock01 tests in there show about a 10x speedup with
this set in place.

I think something closer to Neil's design will end up being what we want
here. Consider the relatively common case where you have a whole-file
POSIX write lock held with a bunch of different waiters blocked on it
(all whole file write locks with different owners):

With Neil's patches, you will just wake up a single waiter when the
blocked lock is released, as they would all be in a long chain of
waiters.

If you keep all the locks in a single list, you'll either have to:

a) wake up all the waiters on the list when the lock comes free: no lock
is held at that point so none of them will conflict.

...or...

b) keep track of what waiters have already been awoken, and compare any
further candidate for waking against the current set of held locks and
any lock requests by waiters that you just woke.

b seems more expensive as you have to walk over a larger set of locks
on every change.
J. Bruce Fields Aug. 11, 2018, 12:21 p.m. UTC | #10
On Sat, Aug 11, 2018 at 07:51:13AM -0400, Jeff Layton wrote:
> On Thu, 2018-08-09 at 20:29 -0400, J. Bruce Fields wrote:
> > On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote:
> > > On Thu, Aug 09 2018, J. Bruce Fields wrote:
> > > 
> > > > I think there's also a problem with multiple tasks sharing the same
> > > > lock owner.
> > > > 
> > > > So, all locks are exclusive locks for the same range.  We have four
> > > > tasks.  Tasks 1 and 4 share the same owner, the others' owners are
> > > > distinct.
> > > > 
> > > > 	- Task 1 gets a lock.
> > > > 	- Task 2 gets a conflicting lock.
> > > > 	- Task 3 gets another conflicting lock.  So now we the tree is
> > > > 		3->2->1.
> > > > 	- Task 1's lock is released.
> > > > 	- Before task 2 is scheduled, task 4 acquires a new lock.
> > > > 	- Task 2 waits on task 4's lock, we now have
> > > > 		3->2->4.
> > > > 
> > > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner
> > > > as the lock task 4 holds--but we fail to wake up task 3.
> > > 
> > > So task 1 and task 4 are threads in the one process - OK.
> > > Tasks 2 and 3 are threads in two other processes.
> > > 
> > > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be
> > > woken?
> > > 
> > > I suspect you got the numbers bit mixed up,
> > 
> > Whoops.
> > 
> > > but in any case, the "conflict()" function that is passed around takes
> > > ownership into account when assessing if one lock conflicts with
> > > another.
> > 
> > Right, I know, but, let me try again:
> > 
> > All locks are exclusive locks for the same range.  Only tasks 3 and 4
> > share the the same owner.
> > 
> > 	- Task 1 gets a lock.
> > 	- Task 2 requests a conflicting lock, so we have    2->1.
> > 	- Task 3 requests a conflicting lock, so we have 3->2->1.
> > 	- Task 1 unlocks.  We wake up task 2, but it isn't scheduled yet.
> > 	- Task 4 gets a new lock.
> > 	- Task 2 runs, discovers the conflict, and waits.  Now we have:
> > 		3->2->4.
> > 
> > There is no conflict between the lock 3 requested and the lock 4 holds,
> > but 3 is not woken up.
> > 
> > This is another version of the first problem: there's information we
> > need (the owners of the waiting locks in the tree) that we can't
> > determine just from looking at the root of the tree.
> > 
> > I'm not sure what to do about that.
> > 
> 
> Is this still a problem in the v2 set?
> 
> wake_non_conflicts walks the whole tree of requests that were blocked on
> it,

Not in the FL_TRANSITIVE_CONFLICT case, which is the case here.

--b.

> so a. After task 2 discovers the conflict, it should wake any of its
> children that don't conflict. So in that last step, task 3 would be
> awoken before task 2 goes back to sleep.
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
J. Bruce Fields Aug. 11, 2018, 12:35 p.m. UTC | #11
On Sat, Aug 11, 2018 at 07:56:25AM -0400, Jeff Layton wrote:
> FWIW, I did a bit of testing with lockperf tests that I had written on
> an earlier rework of this code:
> 
>     https://git.samba.org/jlayton/linux.git/?p=jlayton/lockperf.git;a=summary
> 
> 
> The posix01 and flock01 tests in there show about a 10x speedup with
> this set in place.
> 
> I think something closer to Neil's design will end up being what we want
> here. Consider the relatively common case where you have a whole-file
> POSIX write lock held with a bunch of different waiters blocked on it
> (all whole file write locks with different owners):
> 
> With Neil's patches, you will just wake up a single waiter when the
> blocked lock is released, as they would all be in a long chain of
> waiters.

Right, but you still need to walk the whole tree to make sure that it's
the only one you need to wake.  The tree structure means that you know
all the other locks have non-overlapping ranges, but it doesn't tell you
the lock owners.

Maybe there's some reasonable way to rule out the shared-lockowner case
more quickly too.  I haven't thought about that much.

> If you keep all the locks in a single list, you'll either have to:
> 
> a) wake up all the waiters on the list when the lock comes free: no lock
> is held at that point so none of them will conflict.
> 
> ...or...
> 
> b) keep track of what waiters have already been awoken, and compare any
> further candidate for waking against the current set of held locks and
> any lock requests by waiters that you just woke.

Instead of keeping track of *every* waiter that you've woken, you could
keep track of some subset.  Worst case that just means waking more
processes than you need to, which is wasteful but correct.

In the common case that you give, that subset could just be "the first
waiter you wake".  You'd get the same result.

The every-waiter-a-whole-file-write-lock case is pretty easy.  To
benefit from the tree you need a case where some of the waiters overlap
and some don't.

Might be worth it, sure.

--b.

> b seems more expensive as you have to walk over a larger set of locks
> on every change.
> -- 
> Jeff Layton <jlayton@kernel.org>
Jeff Layton Aug. 11, 2018, 1:15 p.m. UTC | #12
On Sat, 2018-08-11 at 08:21 -0400, J. Bruce Fields wrote:
> On Sat, Aug 11, 2018 at 07:51:13AM -0400, Jeff Layton wrote:
> > On Thu, 2018-08-09 at 20:29 -0400, J. Bruce Fields wrote:
> > > On Fri, Aug 10, 2018 at 08:12:43AM +1000, NeilBrown wrote:
> > > > On Thu, Aug 09 2018, J. Bruce Fields wrote:
> > > > 
> > > > > I think there's also a problem with multiple tasks sharing the same
> > > > > lock owner.
> > > > > 
> > > > > So, all locks are exclusive locks for the same range.  We have four
> > > > > tasks.  Tasks 1 and 4 share the same owner, the others' owners are
> > > > > distinct.
> > > > > 
> > > > > 	- Task 1 gets a lock.
> > > > > 	- Task 2 gets a conflicting lock.
> > > > > 	- Task 3 gets another conflicting lock.  So now we the tree is
> > > > > 		3->2->1.
> > > > > 	- Task 1's lock is released.
> > > > > 	- Before task 2 is scheduled, task 4 acquires a new lock.
> > > > > 	- Task 2 waits on task 4's lock, we now have
> > > > > 		3->2->4.
> > > > > 
> > > > > Task 3 shouldn't be waiting--the lock it's requesting has the same owner
> > > > > as the lock task 4 holds--but we fail to wake up task 3.
> > > > 
> > > > So task 1 and task 4 are threads in the one process - OK.
> > > > Tasks 2 and 3 are threads in two other processes.
> > > > 
> > > > So 2 and 3 conflict with either 1 or 4 equally - why should task 3 be
> > > > woken?
> > > > 
> > > > I suspect you got the numbers bit mixed up,
> > > 
> > > Whoops.
> > > 
> > > > but in any case, the "conflict()" function that is passed around takes
> > > > ownership into account when assessing if one lock conflicts with
> > > > another.
> > > 
> > > Right, I know, but, let me try again:
> > > 
> > > All locks are exclusive locks for the same range.  Only tasks 3 and 4
> > > share the the same owner.
> > > 
> > > 	- Task 1 gets a lock.
> > > 	- Task 2 requests a conflicting lock, so we have    2->1.
> > > 	- Task 3 requests a conflicting lock, so we have 3->2->1.
> > > 	- Task 1 unlocks.  We wake up task 2, but it isn't scheduled yet.
> > > 	- Task 4 gets a new lock.
> > > 	- Task 2 runs, discovers the conflict, and waits.  Now we have:
> > > 		3->2->4.
> > > 
> > > There is no conflict between the lock 3 requested and the lock 4 holds,
> > > but 3 is not woken up.
> > > 
> > > This is another version of the first problem: there's information we
> > > need (the owners of the waiting locks in the tree) that we can't
> > > determine just from looking at the root of the tree.
> > > 
> > > I'm not sure what to do about that.
> > > 
> > 
> > Is this still a problem in the v2 set?
> > 
> > wake_non_conflicts walks the whole tree of requests that were blocked on
> > it,
> 
> Not in the FL_TRANSITIVE_CONFLICT case, which is the case here.
> 

Got it. Yeah, I'm not sure that the idea of FL_TRANSITIVE_CONFLICT
really works. I think you could fix this by just getting rid of that and
folding it into the FL_CONFLICT case.

In more complex situations (like the one you describe), you may end up
cycling through several blocked requests before you hit one that can get
the lock. It may slow things down some for those cases. In the more
common locking scenarios (whole file locks, different owners), I think
this will be much more efficient by avoiding so many wakeups.