[RFC,0/2] rwsem: introduce upgrade_read interface

Message ID 20241016043600.35139-1-lizhe.67@bytedance.com (mailing list archive)

Message

lizhe.67@bytedance.com Oct. 16, 2024, 4:35 a.m. UTC
From: Li Zhe <lizhe.67@bytedance.com>

In the current kernel rwsem implementation, there is an interface to
downgrade a write lock to a read lock, but no interface to upgrade a
read lock to a write lock. This means that in order to acquire the write
lock while holding the read lock, we have to release the read lock first
and then acquire the write lock, which causes problems in concurrent
programming. This patch set provides the 'upgrade_read' interface to
solve this problem. This interface can change a read lock into a write
lock.

Li Zhe (2):
  rwsem: introduce upgrade_read interface
  khugepaged: use upgrade_read() to optimize collapse_huge_page

 include/linux/rwsem.h  |  1 +
 kernel/locking/rwsem.c | 87 ++++++++++++++++++++++++++++++++++++++++--
 mm/khugepaged.c        | 36 ++++++++---------
 3 files changed, 104 insertions(+), 20 deletions(-)

Comments

Peter Zijlstra Oct. 16, 2024, 8:09 a.m. UTC | #1
On Wed, Oct 16, 2024 at 12:35:58PM +0800, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
> 
> In the current kernel rwsem implementation, there is an interface to
> downgrade a write lock to a read lock, but no interface to upgrade a
> read lock to a write lock. This means that in order to acquire the write
> lock while holding the read lock, we have to release the read lock first
> and then acquire the write lock, which causes problems in concurrent
> programming. This patch set provides the 'upgrade_read' interface to
> solve this problem. This interface can change a read lock into a write
> lock.

upgrade-read is fundamentally prone to deadlocks. Imagine two concurrent
invocations, each waiting for all readers to go away before proceeding
to upgrade to a writer.

Any solution to fixing that will end up being semantically similar to
dropping the read lock and acquiring a write lock -- there will not be a
single continuous critical section.

As such, this interface makes no sense.
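
The deadlock described above can be illustrated with a toy userspace
model. This is only a sketch under stated assumptions: struct toy_rwsem
and the toy_* helpers are hypothetical names, not the kernel's rwsem,
and the "upgrade" rule assumed here (a reader may become the writer only
once it is the sole remaining reader) is the naive blocking semantic the
reply argues against, not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT the kernel's struct rw_semaphore: it only counts holders. */
struct toy_rwsem {
	int readers;	/* tasks currently holding the lock for read */
	bool writer;	/* true once a task holds the lock for write */
};

static void toy_down_read(struct toy_rwsem *sem)
{
	assert(!sem->writer);
	sem->readers++;
}

/*
 * Hypothetical in-place upgrade rule: the caller, which still holds its
 * read lock, may become the writer only when it is the sole reader.
 * Returns false where a blocking implementation would keep waiting.
 */
static bool toy_upgrade_would_succeed(const struct toy_rwsem *sem)
{
	return !sem->writer && sem->readers == 1;
}

/* Tasks A and B both hold the read lock and both decide to upgrade. */
static bool toy_two_upgraders_deadlock(void)
{
	struct toy_rwsem sem = { 0, false };

	toy_down_read(&sem);	/* task A */
	toy_down_read(&sem);	/* task B */

	/* A waits for B to leave, B waits for A to leave. */
	bool a_can_proceed = toy_upgrade_would_succeed(&sem);
	bool b_can_proceed = toy_upgrade_would_succeed(&sem);

	/* Neither condition can change, because neither task lets go. */
	return !a_can_proceed && !b_can_proceed;
}
```

In this model the only way out is for one task to give up its read
lock, at which point its critical section is no longer continuous.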

lizhe.67@bytedance.com Oct. 16, 2024, 8:53 a.m. UTC | #2
On Wed, 16 Oct 2024 10:09:55 +0200, peterz@infradead.org wrote:

> On Wed, Oct 16, 2024 at 12:35:58PM +0800, lizhe.67@bytedance.com wrote:
> > From: Li Zhe <lizhe.67@bytedance.com>
> > 
> > In the current kernel rwsem implementation, there is an interface to
> > downgrade a write lock to a read lock, but no interface to upgrade a
> > read lock to a write lock. This means that in order to acquire the write
> > lock while holding the read lock, we have to release the read lock first
> > and then acquire the write lock, which causes problems in concurrent
> > programming. This patch set provides the 'upgrade_read' interface to
> > solve this problem. This interface can change a read lock into a write
> > lock.
> 
> upgrade-read is fundamentally prone to deadlocks. Imagine two concurrent
> invocations, each waiting for all readers to go away before proceeding
> to upgrade to a writer.
>
> Any solution to fixing that will end up being semantically similar to
> dropping the read lock and acquiring a write lock -- there will not be a
> single continuous critical section.

According to the implementation of this patch, one of the invocations will
get '-EBUSY' in this case. If -EBUSY is obtained and the invocation thread
continues to retry instead of dropping the read lock and acquiring a write lock,
it may cause problems. Of course, this patchset only tries its best to achieve a
single continuous critical section, and there is no guarantee.

> As such, this interface makes no sense.

This interface only tries to reduce, as far as possible, the overhead of the
additional checks that non-continuous critical sections require, rather than
eliminating it in all scenarios. So would it be better to change the error code
to something else, so that the caller will not retry this interface?
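
The -EBUSY behaviour described in this reply can be sketched with a toy
userspace model. Everything here is an assumption made for illustration:
struct ebusy_rwsem, the ebusy_* helpers and the upgrade_pending flag are
hypothetical and are not taken from the patch, whose implementation is
not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT the kernel's struct rw_semaphore. */
struct ebusy_rwsem {
	int readers;		/* tasks holding the lock for read */
	bool writer;		/* true once a task holds it for write */
	bool upgrade_pending;	/* a reader has claimed the upgrade slot */
};

static void ebusy_down_read(struct ebusy_rwsem *sem)
{
	assert(!sem->writer);
	sem->readers++;
}

static void ebusy_up_read(struct ebusy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* Hypothetical first step: only one reader may be upgrading at a time. */
static int ebusy_upgrade_begin(struct ebusy_rwsem *sem)
{
	assert(sem->readers > 0 && !sem->writer);
	if (sem->upgrade_pending)
		return -EBUSY;	/* another reader is already upgrading */
	sem->upgrade_pending = true;
	return 0;
}

/*
 * Hypothetical second step for the winner: it becomes the writer once it
 * is the sole reader. Returns false while other readers still hold the lock.
 */
static bool ebusy_upgrade_finish(struct ebusy_rwsem *sem)
{
	if (sem->readers != 1)
		return false;
	sem->readers = 0;
	sem->writer = true;
	sem->upgrade_pending = false;
	return true;
}
```

In this model, if the task that received -EBUSY keeps its read lock and
retries, ebusy_upgrade_finish() can never return true for the winner; it
only succeeds after the loser calls ebusy_up_read().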

Peter Zijlstra Oct. 16, 2024, 12:10 p.m. UTC | #3
On Wed, Oct 16, 2024 at 04:53:45PM +0800, lizhe.67@bytedance.com wrote:
> On Wed, 16 Oct 2024 10:09:55 +0200, peterz@infradead.org wrote:
> 
> > On Wed, Oct 16, 2024 at 12:35:58PM +0800, lizhe.67@bytedance.com wrote:
> > > From: Li Zhe <lizhe.67@bytedance.com>
> > > 
> > > In the current kernel rwsem implementation, there is an interface to
> > > downgrade a write lock to a read lock, but no interface to upgrade a
> > > read lock to a write lock. This means that in order to acquire the write
> > > lock while holding the read lock, we have to release the read lock first
> > > and then acquire the write lock, which causes problems in concurrent
> > > programming. This patch set provides the 'upgrade_read' interface to
> > > solve this problem. This interface can change a read lock into a write
> > > lock.
> > 
> > upgrade-read is fundamentally prone to deadlocks. Imagine two concurrent
> > invocations, each waiting for all readers to go away before proceeding
> > to upgrade to a writer.
> >
> > Any solution to fixing that will end up being semantically similar to
> > dropping the read lock and acquiring a write lock -- there will not be a
> > single continuous critical section.
> 
> According to the implementation of this patch, one of the invocations will

Since the premise as described here is utter nonsense, I didn't get to
actually reading the implementation -- why continue to waste time etc.

> get '-EBUSY' in this case. If -EBUSY is obtained and the invocation thread
> continues to retry instead of dropping the read lock and acquiring a write lock,
> it may cause problems.

Failure should drop the read lock, otherwise it is too easy to mess
things up.
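
One way to read "failure should drop the read lock" is a trylock-style
upgrade that never returns with the read lock still held. The sketch
below is a toy userspace model under that assumption; struct drop_rwsem
and the drop_* helpers are hypothetical names, and this is neither the
patch's code nor a proposed kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT the kernel's struct rw_semaphore. */
struct drop_rwsem {
	int readers;	/* tasks holding the lock for read */
	bool writer;	/* true once a task holds it for write */
};

static void drop_down_read(struct drop_rwsem *sem)
{
	assert(!sem->writer);
	sem->readers++;
}

/*
 * Hypothetical upgrade: the caller's read lock is released on both paths.
 * Returns 0 with the write lock held, or -EBUSY with no lock held at all,
 * so a failed caller cannot spin on the upgrade while blocking others.
 */
static int drop_upgrade_read(struct drop_rwsem *sem)
{
	assert(sem->readers > 0 && !sem->writer);

	sem->readers--;		/* give up the read lock unconditionally */
	if (sem->readers == 0) {
		sem->writer = true;	/* sole reader: upgraded in place */
		return 0;
	}
	return -EBUSY;		/* other readers remain: caller holds nothing */
}
```

After -EBUSY the caller would have to take the write lock the ordinary
way and revalidate whatever it checked under the read lock, which is the
same work as today's drop-and-reacquire path.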

> Of course, this patchset only tries its best to achieve a
> single continuous critical section, and there is no guarantee.

As already stated, nothing like that was mentioned.

> > As such, this interface makes no sense.
> 
> This interface only tries to reduce, as far as possible, the overhead
> of the additional checks that non-continuous critical sections
> require, rather than eliminating it in all scenarios. So would it be
> better to change the error code to something else, so that the caller
> will not retry this interface?

You fail to quantify the gains. How am I supposed to know if the
(significant?) increase in complexity is worth it?

Why should I accept this increase in complexity for the sake of
khugepaged, something which I care very little about?