
[RFC,v2,0/2] pidfs: use maple tree

Message ID: 20241209-work-pidfs-maple_tree-v2-0-003dbf3bd96b@kernel.org

Message

Christian Brauner Dec. 9, 2024, 1:46 p.m. UTC
Hey,

Ok, I wanted to give this another try as I'd really like to rely on the
maple tree supporting ULONG_MAX when BITS_PER_LONG is 64 as it makes
things a lot simpler overall.

As Willy didn't want additional users relying on an external lock I made
it so that we don't have to and can just use the mtree lock.

However, I need an irq safe variant which is why I added support for
this into the maple tree.

This is pullable from
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git work.pidfs.maple_tree

Thanks!
Christian

---
Christian Brauner (2):
      maple_tree: make MT_FLAGS_LOCK_IRQ do something
      pidfs: use maple tree

 fs/pidfs.c                 | 52 +++++++++++++++++++++++++++-------------------
 include/linux/maple_tree.h | 16 ++++++++++++--
 kernel/pid.c               | 34 +++++++++++++++---------------
 3 files changed, 62 insertions(+), 40 deletions(-)
---
base-commit: 963c8e506c6d4769d04fcb64d4bf783e4ef6093e
change-id: 20241206-work-pidfs-maple_tree-b322ff5399b7

Comments

Liam R. Howlett Dec. 13, 2024, 2:40 p.m. UTC | #1
* Christian Brauner <brauner@kernel.org> [241209 08:47]:
> Hey,
> 
> Ok, I wanted to give this another try as I'd really like to rely on the
> maple tree supporting ULONG_MAX when BITS_PER_LONG is 64 as it makes
> things a lot simpler overall.
> 
> As Willy didn't want additional users relying on an external lock I made
> it so that we don't have to and can just use the mtree lock.
> 
> However, I need an irq safe variant which is why I added support for
> this into the maple tree.
> 
> This is pullable from
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git work.pidfs.maple_tree


I've been meaning to respond to this thread.

I believe the flag is to tell the internal code what lock to use.  If
you look at mas_nomem(), there is a retry loop that will drop the lock
to allocate and retry the operation.  That function needs to support the
flag and use the correct lock/unlock.

The mas_lock()/mas_unlock() needs a mas_lock_irq()/mas_unlock_irq()
variant, which would be used in the correct context.
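
Roughly what I mean, as a sketch (not actual code; the flag test and
the mas_lock_irq()/mas_unlock_irq() helpers are the pieces that would
need to be added):

bool mas_nomem(struct ma_state *mas, gfp_t gfp)
{
	if (likely(mas->node != MA_ERROR(-ENOMEM)))
		return false;

	if (gfpflags_allow_blocking(gfp)) {
		if (mt_lock_irq(mas->tree)) {	/* hypothetical flag test */
			/* drop/retake must match the outer context */
			mas_unlock_irq(mas);
			mas_alloc_nodes(mas, gfp);
			mas_lock_irq(mas);
		} else {
			mas_unlock(mas);
			mas_alloc_nodes(mas, gfp);
			mas_lock(mas);
		}
	} else {
		mas_alloc_nodes(mas, gfp);
	}

	return true;
}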

Does that make sense?

Thanks,
Liam

> 
> Thanks!
> Christian
> 
> ---
> Christian Brauner (2):
>       maple_tree: make MT_FLAGS_LOCK_IRQ do something
>       pidfs: use maple tree
> 
>  fs/pidfs.c                 | 52 +++++++++++++++++++++++++++-------------------
>  include/linux/maple_tree.h | 16 ++++++++++++--
>  kernel/pid.c               | 34 +++++++++++++++---------------
>  3 files changed, 62 insertions(+), 40 deletions(-)
> ---
> base-commit: 963c8e506c6d4769d04fcb64d4bf783e4ef6093e
> change-id: 20241206-work-pidfs-maple_tree-b322ff5399b7
>
Christian Brauner Dec. 13, 2024, 6:51 p.m. UTC | #2
On Fri, Dec 13, 2024 at 09:40:49AM -0500, Liam R. Howlett wrote:
> * Christian Brauner <brauner@kernel.org> [241209 08:47]:
> > Hey,
> > 
> > Ok, I wanted to give this another try as I'd really like to rely on the
> > maple tree supporting ULONG_MAX when BITS_PER_LONG is 64 as it makes
> > things a lot simpler overall.
> > 
> > As Willy didn't want additional users relying on an external lock I made
> > it so that we don't have to and can just use the mtree lock.
> > 
> > However, I need an irq safe variant which is why I added support for
> > this into the maple tree.
> > 
> > This is pullable from
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git work.pidfs.maple_tree
> 
> 
> I've been meaning to respond to this thread.
> 
> I believe the flag is to tell the internal code what lock to use.  If
> you look at mas_nomem(), there is a retry loop that will drop the lock
> to allocate and retry the operation.  That function needs to support the
> flag and use the correct lock/unlock.
> 
> The mas_lock()/mas_unlock() needs a mas_lock_irq()/mas_unlock_irq()
> variant, which would be used in the correct context.

Yeah, it does. Did you see the patch that is included in the series?
I've replaced the macro with always inline functions that select the
lock based on the flag:

static __always_inline void mtree_lock(struct maple_tree *mt)
{
        if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
                spin_lock_irq(&mt->ma_lock);
        else
                spin_lock(&mt->ma_lock);
}
static __always_inline void mtree_unlock(struct maple_tree *mt)
{
        if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
                spin_unlock_irq(&mt->ma_lock);
        else
                spin_unlock(&mt->ma_lock);
}

Does that work for you?
Matthew Wilcox (Oracle) Dec. 13, 2024, 6:53 p.m. UTC | #3
On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> Yeah, it does. Did you see the patch that is included in the series?
> I've replaced the macro with always inline functions that select the
> lock based on the flag:
> 
> static __always_inline void mtree_lock(struct maple_tree *mt)
> {
>         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
>                 spin_lock_irq(&mt->ma_lock);
>         else
>                 spin_lock(&mt->ma_lock);
> }
> static __always_inline void mtree_unlock(struct maple_tree *mt)
> {
>         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
>                 spin_unlock_irq(&mt->ma_lock);
>         else
>                 spin_unlock(&mt->ma_lock);
> }
> 
> Does that work for you?

See the way the XArray works; we're trying to keep the two APIs as
close as possible.

The caller should use mtree_lock_irq() or mtree_lock_irqsave()
as appropriate.
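
For reference, the caller-side XArray pattern is roughly this, using
the existing xa_lock_irq()/xa_lock_irqsave() helpers together with the
__xa_* operations that expect the lock to be held:

	xa_lock_irq(&xa);
	__xa_erase(&xa, index);
	xa_unlock_irq(&xa);

	/* or, where interrupts may already be disabled */
	xa_lock_irqsave(&xa, flags);
	__xa_erase(&xa, index);
	xa_unlock_irqrestore(&xa, flags);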
Christian Brauner Dec. 13, 2024, 7:01 p.m. UTC | #4
On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > Yeah, it does. Did you see the patch that is included in the series?
> > I've replaced the macro with always inline functions that select the
> > lock based on the flag:
> > 
> > static __always_inline void mtree_lock(struct maple_tree *mt)
> > {
> >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> >                 spin_lock_irq(&mt->ma_lock);
> >         else
> >                 spin_lock(&mt->ma_lock);
> > }
> > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > {
> >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> >                 spin_unlock_irq(&mt->ma_lock);
> >         else
> >                 spin_unlock(&mt->ma_lock);
> > }
> > 
> > Does that work for you?
> 
> See the way the XArray works; we're trying to keep the two APIs as
> close as possible.
> 
> The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> as appropriate.

Say I need:

spin_lock_irqsave(&mt->ma_lock, flags);
mas_erase(...);
-> mas_nomem()
   -> mtree_unlock() // uses spin_unlock();
      // allocate
   -> mtree_lock() // uses spin_lock();
spin_unlock_irqrestore(&mt->ma_lock, flags);

So that doesn't work, right? IOW, the maple tree internally drops and
retakes locks, and those need to match the locks of the outer context.

So, I think I need a way to communicate to mas_*() what type of lock to
take, no? Any idea how you would like me to do this in case I'm not
wrong?
Christian Brauner Dec. 13, 2024, 7:25 p.m. UTC | #5
On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > Yeah, it does. Did you see the patch that is included in the series?
> > > I've replaced the macro with always inline functions that select the
> > > lock based on the flag:
> > > 
> > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > {
> > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > >                 spin_lock_irq(&mt->ma_lock);
> > >         else
> > >                 spin_lock(&mt->ma_lock);
> > > }
> > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > {
> > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > >                 spin_unlock_irq(&mt->ma_lock);
> > >         else
> > >                 spin_unlock(&mt->ma_lock);
> > > }
> > > 
> > > Does that work for you?
> > 
> > See the way the XArray works; we're trying to keep the two APIs as
> > close as possible.
> > 
> > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > as appropriate.
> 
> Say I need:
> 
> spin_lock_irqsave(&mt->ma_lock, flags);
> mas_erase(...);
> -> mas_nomem()
>    -> mtree_unlock() // uses spin_unlock();
>       // allocate
>    -> mtree_lock() // uses spin_lock();
> spin_unlock_irqrestore(&mt->ma_lock, flags);
> 
> So that doesn't work, right? IOW, the maple tree internally drops and
> retakes locks, and those need to match the locks of the outer context.
> 
> So, I think I need a way to communicate to mas_*() what type of lock to
> take, no? Any idea how you would like me to do this in case I'm not
> wrong?

My first inclination has been to do it via MA_STATE() and the mas_flag
value but I'm open to any other ideas.
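
To sketch what I mean (purely illustrative -- the MA_STATE_LOCK_IRQSAVE
flag and the pid_tree name are made up):

	MA_STATE(mas, &pid_tree, pid, pid);

	mas.mas_flags |= MA_STATE_LOCK_IRQSAVE;	/* hypothetical flag */
	mas_erase(&mas);	/* mas_nomem() would check it on retry */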
Christian Brauner Dec. 13, 2024, 8:11 p.m. UTC | #6
On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > I've replaced the macro with always inline functions that select the
> > > > lock based on the flag:
> > > > 
> > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > {
> > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > >                 spin_lock_irq(&mt->ma_lock);
> > > >         else
> > > >                 spin_lock(&mt->ma_lock);
> > > > }
> > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > {
> > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > >                 spin_unlock_irq(&mt->ma_lock);
> > > >         else
> > > >                 spin_unlock(&mt->ma_lock);
> > > > }
> > > > 
> > > > Does that work for you?
> > > 
> > > See the way the XArray works; we're trying to keep the two APIs as
> > > close as possible.
> > > 
> > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > as appropriate.
> > 
> > Say I need:
> > 
> > spin_lock_irqsave(&mt->ma_lock, flags);
> > mas_erase(...);
> > -> mas_nomem()
> >    -> mtree_unlock() // uses spin_unlock();
> >       // allocate
> >    -> mtree_lock() // uses spin_lock();
> > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > 
> > So that doesn't work, right? IOW, the maple tree internally drops and
> > retakes locks, and those need to match the locks of the outer context.
> > 
> > So, I think I need a way to communicate to mas_*() what type of lock to
> > take, no? Any idea how you would like me to do this in case I'm not
> > wrong?
> 
> My first inclination has been to do it via MA_STATE() and the mas_flag
> value but I'm open to any other ideas.

Braino on my part as free_pid() can be called with write_lock_irq() held.
Christian Brauner Dec. 13, 2024, 8:50 p.m. UTC | #7
On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > I've replaced the macro with always inline functions that select the
> > > > > lock based on the flag:
> > > > > 
> > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > {
> > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > >         else
> > > > >                 spin_lock(&mt->ma_lock);
> > > > > }
> > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > {
> > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > >         else
> > > > >                 spin_unlock(&mt->ma_lock);
> > > > > }
> > > > > 
> > > > > Does that work for you?
> > > > 
> > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > close as possible.
> > > > 
> > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > as appropriate.
> > > 
> > > Say I need:
> > > 
> > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > mas_erase(...);
> > > -> mas_nomem()
> > >    -> mtree_unlock() // uses spin_unlock();
> > >       // allocate
> > >    -> mtree_lock() // uses spin_lock();
> > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > 
> > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > retakes locks, and those need to match the locks of the outer context.
> > > 
> > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > take, no? Any idea how you would like me to do this in case I'm not
> > > wrong?
> > 
> > My first inclination has been to do it via MA_STATE() and the mas_flag
> > value but I'm open to any other ideas.
> 
> Braino on my part as free_pid() can be called with write_lock_irq() held.

I don't think I can use the maple tree because even an mas_erase()
operation may allocate memory and that just makes it rather unpleasant
to use in e.g., free_pid(). So I think I'm going to explore using the
xarray to get the benefits of ULONG_MAX indices and I see that btrfs is
using it already for similar purposes.
Liam R. Howlett Dec. 13, 2024, 9:04 p.m. UTC | #8
* Christian Brauner <brauner@kernel.org> [241213 15:11]:
> On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > I've replaced the macro with always inline functions that select the
> > > > > lock based on the flag:
> > > > > 
> > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > {
> > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > >         else
> > > > >                 spin_lock(&mt->ma_lock);
> > > > > }
> > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > {
> > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > >         else
> > > > >                 spin_unlock(&mt->ma_lock);
> > > > > }
> > > > > 
> > > > > Does that work for you?
> > > > 
> > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > close as possible.
> > > > 
> > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > as appropriate.
> > > 
> > > Say I need:
> > > 
> > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > mas_erase(...);
> > > -> mas_nomem()
> > >    -> mtree_unlock() // uses spin_unlock();
> > >       // allocate
> > >    -> mtree_lock() // uses spin_lock();
> > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > 
> > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > retakes locks, and those need to match the locks of the outer context.
> > > 
> > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > take, no? Any idea how you would like me to do this in case I'm not
> > > wrong?
> > 
> > My first inclination has been to do it via MA_STATE() and the mas_flag
> > value but I'm open to any other ideas.
> 
> Braino on my part as free_pid() can be called with write_lock_irq() held.

Instead of checking the flag inside mas_lock()/mas_unlock(), the flag is
checked in mas_nomem(), and the correct mas_lock_irq() pair would be
called there.  External callers would use the mas_lock_irq() pair
directly instead of checking the flag.

To keep the API as close as possible, we'd keep the mas_lock() the same
and add the mas_lock_irq() as well as mas_lock_type(mas, lock_type).
__xas_nomem() uses the (static) xas_lock_type() to lock/unlock for
internal translations.
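
For reference, the XArray mechanism in lib/xarray.c looks roughly like
this, and a maple tree version would mirror it:

static void xas_lock_type(struct xa_state *xas, unsigned int lock_type)
{
	if (lock_type == XA_LOCK_IRQ)
		xas_lock_irq(xas);
	else if (lock_type == XA_LOCK_BH)
		xas_lock_bh(xas);
	else
		xas_lock(xas);
}

__xas_nomem() then brackets its allocation with
xas_unlock_type()/xas_lock_type() rather than plain xas_unlock()/xas_lock().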

Thanks,
Liam
Christian Brauner Dec. 13, 2024, 9:07 p.m. UTC | #9
On Fri, Dec 13, 2024 at 09:50:56PM +0100, Christian Brauner wrote:
> On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > > I've replaced the macro with always inline functions that select the
> > > > > > lock based on the flag:
> > > > > > 
> > > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > > {
> > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > > >         else
> > > > > >                 spin_lock(&mt->ma_lock);
> > > > > > }
> > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > > {
> > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > > >         else
> > > > > >                 spin_unlock(&mt->ma_lock);
> > > > > > }
> > > > > > 
> > > > > > Does that work for you?
> > > > > 
> > > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > > close as possible.
> > > > > 
> > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > > as appropriate.
> > > > 
> > > > Say I need:
> > > > 
> > > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > > mas_erase(...);
> > > > -> mas_nomem()
> > > >    -> mtree_unlock() // uses spin_unlock();
> > > >       // allocate
> > > >    -> mtree_lock() // uses spin_lock();
> > > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > > 
> > > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > > retakes locks, and those need to match the locks of the outer context.
> > > > 
> > > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > > take, no? Any idea how you would like me to do this in case I'm not
> > > > wrong?
> > > 
> > > My first inclination has been to do it via MA_STATE() and the mas_flag
> > > value but I'm open to any other ideas.
> > 
> > Braino on my part as free_pid() can be called with write_lock_irq() held.
> 
> I don't think I can use the maple tree because even an mas_erase()
> operation may allocate memory and that just makes it rather unpleasant
> to use in e.g., free_pid(). So I think I'm going to explore using the
> xarray to get the benefits of ULONG_MAX indices and I see that btrfs is
> using it already for similar purposes.

Hm, __xa_alloc_cyclic() doesn't support ULONG_MAX indices. So any ideas
how I can proceed? Can I use the maple tree to wipe an entry at a given
index and have the guarantee it won't have to allocate memory?
Liam R. Howlett Dec. 13, 2024, 9:13 p.m. UTC | #10
* Christian Brauner <brauner@kernel.org> [241213 15:51]:
> On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > > I've replaced the macro with always inline functions that select the
> > > > > > lock based on the flag:
> > > > > > 
> > > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > > {
> > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > > >         else
> > > > > >                 spin_lock(&mt->ma_lock);
> > > > > > }
> > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > > {
> > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > > >         else
> > > > > >                 spin_unlock(&mt->ma_lock);
> > > > > > }
> > > > > > 
> > > > > > Does that work for you?
> > > > > 
> > > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > > close as possible.
> > > > > 
> > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > > as appropriate.
> > > > 
> > > > Say I need:
> > > > 
> > > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > > mas_erase(...);
> > > > -> mas_nomem()
> > > >    -> mtree_unlock() // uses spin_unlock();
> > > >       // allocate
> > > >    -> mtree_lock() // uses spin_lock();
> > > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > > 
> > > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > > retakes locks, and those need to match the locks of the outer context.
> > > > 
> > > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > > take, no? Any idea how you would like me to do this in case I'm not
> > > > wrong?
> > > 
> > > My first inclination has been to do it via MA_STATE() and the mas_flag
> > > value but I'm open to any other ideas.
> > 
> > Braino on my part as free_pid() can be called with write_lock_irq() held.
> 
> I don't think I can use the maple tree because even an mas_erase()
> operation may allocate memory and that just makes it rather unpleasant
> to use in e.g., free_pid(). So I think I'm going to explore using the
> xarray to get the benefits of ULONG_MAX indices and I see that btrfs is
> using it already for similar purposes.

Can you point to the code that concerns you?  I'd like to understand the
problem better and see if there's a way around it.

By the way, I rarely use erase, as that's for when people don't know the
range of the store.  I instead store (which overwrites) NULL over the
range.  This won't solve your allocation concerns, though.

We do have the preallocation support for a known range and value.
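
Roughly, with the state pointed at the target range (a sketch; error
handling trimmed):

	MA_STATE(mas, mt, index, index);

	mas_lock(&mas);
	if (mas_preallocate(&mas, entry, gfp)) {
		mas_unlock(&mas);
		return -ENOMEM;
	}
	mas_store_prealloc(&mas, entry);	/* consumes the preallocation */
	mas_unlock(&mas);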

Thanks,
Liam
Liam R. Howlett Dec. 13, 2024, 9:16 p.m. UTC | #11
* Christian Brauner <brauner@kernel.org> [241213 16:07]:
> On Fri, Dec 13, 2024 at 09:50:56PM +0100, Christian Brauner wrote:
> > On Fri, Dec 13, 2024 at 09:11:04PM +0100, Christian Brauner wrote:
> > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > > > I've replaced the macro with always inline functions that select the
> > > > > > > lock based on the flag:
> > > > > > > 
> > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > > > {
> > > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > > > >         else
> > > > > > >                 spin_lock(&mt->ma_lock);
> > > > > > > }
> > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > > > {
> > > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > > > >         else
> > > > > > >                 spin_unlock(&mt->ma_lock);
> > > > > > > }
> > > > > > > 
> > > > > > > Does that work for you?
> > > > > > 
> > > > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > > > close as possible.
> > > > > > 
> > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > > > as appropriate.
> > > > > 
> > > > > Say I need:
> > > > > 
> > > > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > > > mas_erase(...);
> > > > > -> mas_nomem()
> > > > >    -> mtree_unlock() // uses spin_unlock();
> > > > >       // allocate
> > > > >    -> mtree_lock() // uses spin_lock();
> > > > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > > > 
> > > > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > > > retakes locks, and those need to match the locks of the outer context.
> > > > > 
> > > > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > > > take, no? Any idea how you would like me to do this in case I'm not
> > > > > wrong?
> > > > 
> > > > My first inclination has been to do it via MA_STATE() and the mas_flag
> > > > value but I'm open to any other ideas.
> > > 
> > > Braino on my part as free_pid() can be called with write_lock_irq() held.
> > 
> > I don't think I can use the maple tree because even an mas_erase()
> > operation may allocate memory and that just makes it rather unpleasant
> > to use in e.g., free_pid(). So I think I'm going to explore using the
> > xarray to get the benefits of ULONG_MAX indices and I see that btrfs is
> > using it already for similar purposes.
> 
> Hm, __xa_alloc_cyclic() doesn't support ULONG_MAX indices. So any ideas
> how I can proceed? Can I use the maple tree to wipe an entry at a given
> index and have the guarantee it won't have to allocate memory?

There are two ways you can do this:
1. preallocate, then store in the locked area.
2. store XA_ZERO_ENTRY and translate that to NULL on reading.
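
For 2, roughly -- this leans on the assumption that overwriting an
existing occupied slot with a non-NULL entry leaves the node layout
unchanged, so no allocation should be needed:

	/* freeing side; mtree_store() takes the tree lock itself */
	mtree_store(mt, index, XA_ZERO_ENTRY, GFP_ATOMIC);

	/* lookup side translates the placeholder back to NULL */
	entry = mtree_load(mt, index);
	if (xa_is_zero(entry))
		entry = NULL;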
Christian Brauner Dec. 14, 2024, 11:48 a.m. UTC | #12
On Fri, Dec 13, 2024 at 04:04:40PM -0500, Liam R. Howlett wrote:
> * Christian Brauner <brauner@kernel.org> [241213 15:11]:
> > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > > I've replaced the macro with always inline functions that select the
> > > > > > lock based on the flag:
> > > > > > 
> > > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > > {
> > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > > >         else
> > > > > >                 spin_lock(&mt->ma_lock);
> > > > > > }
> > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > > {
> > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > > >         else
> > > > > >                 spin_unlock(&mt->ma_lock);
> > > > > > }
> > > > > > 
> > > > > > Does that work for you?
> > > > > 
> > > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > > close as possible.
> > > > > 
> > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > > as appropriate.
> > > > 
> > > > Say I need:
> > > > 
> > > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > > mas_erase(...);
> > > > -> mas_nomem()
> > > >    -> mtree_unlock() // uses spin_unlock();
> > > >       // allocate
> > > >    -> mtree_lock() // uses spin_lock();
> > > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > > 
> > > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > > retakes locks, and those need to match the locks of the outer context.
> > > > 
> > > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > > take, no? Any idea how you would like me to do this in case I'm not
> > > > wrong?
> > > 
> > > My first inclination has been to do it via MA_STATE() and the mas_flag
> > > value but I'm open to any other ideas.
> > 
> > Braino on my part as free_pid() can be called with write_lock_irq() held.
> 
> Instead of checking the flag inside mas_lock()/mas_unlock(), the flag is
> checked in mas_nomem(), and the correct mas_lock_irq() pair would be
> called there.  External callers would use the mas_lock_irq() pair
> directly instead of checking the flag.

I'm probably being dense but say I have two different locations with two
different locking requirements - as is the case with alloc_pid() and
free_pid(). alloc_pid() just uses spin_lock_irq() and spin_unlock_irq()
but free_pid() requires spin_lock_irqsave() and
spin_unlock_irqrestore(). If the whole mtree is marked with a flag that
tells the mtree_lock() to use spin_lock_irq() then it will use that also
in free_pid() where it should use spin_lock_irqsave().

So if I'm not completely off-track then we'd need a way to tell the
mtree to use different locks for different callsites. Or at least
override the locking requirements for specific calls by e.g., allowing a
flag to be specified in the MA_STATE() struct that's checked by e.g.,
mas_nomem().

> To keep the API as close as possible, we'd keep the mas_lock() the same
> and add the mas_lock_irq() as well as mas_lock_type(mas, lock_type).
> __xas_nomem() uses the (static) xas_lock_type() to lock/unlock for
> internal translations.
> 
> Thanks,
> Liam
Liam R. Howlett Dec. 17, 2024, 5:41 p.m. UTC | #13
* Christian Brauner <brauner@kernel.org> [241214 06:48]:
> On Fri, Dec 13, 2024 at 04:04:40PM -0500, Liam R. Howlett wrote:
> > * Christian Brauner <brauner@kernel.org> [241213 15:11]:
> > > On Fri, Dec 13, 2024 at 08:25:21PM +0100, Christian Brauner wrote:
> > > > On Fri, Dec 13, 2024 at 08:01:30PM +0100, Christian Brauner wrote:
> > > > > On Fri, Dec 13, 2024 at 06:53:55PM +0000, Matthew Wilcox wrote:
> > > > > > On Fri, Dec 13, 2024 at 07:51:50PM +0100, Christian Brauner wrote:
> > > > > > > Yeah, it does. Did you see the patch that is included in the series?
> > > > > > > I've replaced the macro with always inline functions that select the
> > > > > > > lock based on the flag:
> > > > > > > 
> > > > > > > static __always_inline void mtree_lock(struct maple_tree *mt)
> > > > > > > {
> > > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > > >                 spin_lock_irq(&mt->ma_lock);
> > > > > > >         else
> > > > > > >                 spin_lock(&mt->ma_lock);
> > > > > > > }
> > > > > > > static __always_inline void mtree_unlock(struct maple_tree *mt)
> > > > > > > {
> > > > > > >         if (mt->ma_flags & MT_FLAGS_LOCK_IRQ)
> > > > > > >                 spin_unlock_irq(&mt->ma_lock);
> > > > > > >         else
> > > > > > >                 spin_unlock(&mt->ma_lock);
> > > > > > > }
> > > > > > > 
> > > > > > > Does that work for you?
> > > > > > 
> > > > > > See the way the XArray works; we're trying to keep the two APIs as
> > > > > > close as possible.
> > > > > > 
> > > > > > The caller should use mtree_lock_irq() or mtree_lock_irqsave()
> > > > > > as appropriate.
> > > > > 
> > > > > Say I need:
> > > > > 
> > > > > spin_lock_irqsave(&mt->ma_lock, flags);
> > > > > mas_erase(...);
> > > > > -> mas_nomem()
> > > > >    -> mtree_unlock() // uses spin_unlock();
> > > > >       // allocate
> > > > >    -> mtree_lock() // uses spin_lock();
> > > > > spin_unlock_irqrestore(&mt->ma_lock, flags);
> > > > > 
> > > > > So that doesn't work, right? IOW, the maple tree internally drops and
> > > > > retakes locks, and those need to match the locks of the outer context.
> > > > > 
> > > > > So, I think I need a way to communicate to mas_*() what type of lock to
> > > > > take, no? Any idea how you would like me to do this in case I'm not
> > > > > wrong?
> > > > 
> > > > My first inclination has been to do it via MA_STATE() and the mas_flag
> > > > value but I'm open to any other ideas.
> > > 
> > > Braino on my part as free_pid() can be called with write_lock_irq() held.
> > 
> > Instead of checking the flag inside mas_lock()/mas_unlock(), the flag is
> > checked in mas_nomem(), and the correct mas_lock_irq() pair would be
> > called there.  External callers would use the mas_lock_irq() pair
> > directly instead of checking the flag.
> 
> I'm probably being dense but say I have two different locations with two
> different locking requirements - as is the case with alloc_pid() and
> free_pid(). alloc_pid() just uses spin_lock_irq() and spin_unlock_irq()
> but free_pid() requires spin_lock_irqsave() and
> spin_unlock_irqrestore(). If the whole mtree is marked with a flag that
> tells the mtree_lock() to use spin_lock_irq() then it will use that also
> in free_pid() where it should use spin_lock_irqsave().
> 
> So if I'm not completely off-track then we'd need a way to tell the
> mtree to use different locks for different callsites. Or at least
> override the locking requirements for specific calls by e.g., allowing a
> flag to be specified in the MA_STATE() struct that's checked by e.g.,
> mas_nomem().

The xarray seems to not support different lock types like this.

It seems to support IRQ, BH, and just the spinlock.

xa_lock_irqsave() exists and is used.  I think it disables interrupts
without restoring in between, then restores at the end.  Either that, or
there aren't writes during the irqsave/irqrestore locking sections that
need to allocate.

You could preallocate for the free_pid() side and handle it there?

ie:

retry:
mas_set(mas, pid);
mas_lock_irqsave(mas, saved);
if (mas_preallocate(mas, NULL, gfp)) {
    mas_unlock_irqrestore(mas, saved);
    /* something about reclaim */
    goto retry;
}
mas_erase(mas); /* preallocated, nomem won't happen */
mas_unlock_irqrestore(mas, saved);



> 
> > To keep the API as close as possible, we'd keep the mas_lock() the same
> > and add the mas_lock_irq() as well as mas_lock_type(mas, lock_type).
> > __xas_nomem() uses the (static) xas_lock_type() to lock/unlock for
> > internal translations.
> > 
> > Thanks,
> > Liam