
[drm-next,v2,04/16] maple_tree: add flag MT_FLAGS_LOCK_NONE

Message ID 20230217134422.14116-5-dakr@redhat.com (mailing list archive)
State New, archived
Series DRM GPUVA Manager & Nouveau VM_BIND UAPI

Commit Message

Danilo Krummrich Feb. 17, 2023, 1:44 p.m. UTC
Generic components making use of the maple tree (such as the
DRM GPUVA Manager) delegate the responsibility of ensuring mutual
exclusion to their users.

While such components could inherit the concept of an external lock,
some users might just serialize the access to the component and hence to
the internal maple tree.

In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
indicate not to do any internal lockdep checks.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 include/linux/maple_tree.h | 20 +++++++++++++++-----
 lib/maple_tree.c           |  7 ++++---
 2 files changed, 19 insertions(+), 8 deletions(-)
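A minimal sketch (not part of the patch) of how a user that serializes
all accesses by itself would initialize such a tree:

	struct maple_tree mt;

	/* The caller guarantees serialized access by design, so no internal
	 * lock is initialized and no lockdep checks are performed. */
	mt_init_flags(&mt, MT_FLAGS_LOCK_NONE);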

Comments

Liam R. Howlett Feb. 17, 2023, 6:18 p.m. UTC | #1
* Danilo Krummrich <dakr@redhat.com> [230217 08:44]:
> Generic components making use of the maple tree (such as the
> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> exclusion to their users.
> 
> While such components could inherit the concept of an external lock,
> some users might just serialize the access to the component and hence to
> the internal maple tree.
> 
> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> indicate not to do any internal lockdep checks.
> 
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  include/linux/maple_tree.h | 20 +++++++++++++++-----
>  lib/maple_tree.c           |  7 ++++---
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index ca04c900e51a..f795e5def8d0 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -170,10 +170,11 @@ enum maple_type {
>  #define MT_FLAGS_USE_RCU	0x02
>  #define MT_FLAGS_HEIGHT_OFFSET	0x02
>  #define MT_FLAGS_HEIGHT_MASK	0x7C
> -#define MT_FLAGS_LOCK_MASK	0x300
> +#define MT_FLAGS_LOCK_MASK	0x700
>  #define MT_FLAGS_LOCK_IRQ	0x100
>  #define MT_FLAGS_LOCK_BH	0x200
>  #define MT_FLAGS_LOCK_EXTERN	0x300
> +#define MT_FLAGS_LOCK_NONE	0x400

Please add this to the documentation above the flags as well.  We should
probably add enough context so that users don't just set this and then
use multiple writers.
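A possible wording for that documentation (a sketch only, not necessarily
what should land):

 * MT_FLAGS_LOCK_NONE - The user guarantees that all accesses to the tree
 *   are serialized externally (e.g. single-threaded by design).  No
 *   internal lock is taken and no lockdep checking is performed; using
 *   this flag with concurrent writers is a bug.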

>  
>  #define MAPLE_HEIGHT_MAX	31
>  
> @@ -559,11 +560,16 @@ static inline void mas_set(struct ma_state *mas, unsigned long index)
>  	mas_set_range(mas, index, index);
>  }
>  
> -static inline bool mt_external_lock(const struct maple_tree *mt)
> +static inline bool mt_lock_external(const struct maple_tree *mt)
>  {
>  	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_EXTERN;
>  }
>  
> +static inline bool mt_lock_none(const struct maple_tree *mt)
> +{
> +	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_NONE;
> +}
> +
>  /**
>   * mt_init_flags() - Initialise an empty maple tree with flags.
>   * @mt: Maple Tree
> @@ -577,7 +583,7 @@ static inline bool mt_external_lock(const struct maple_tree *mt)
>  static inline void mt_init_flags(struct maple_tree *mt, unsigned int flags)
>  {
>  	mt->ma_flags = flags;
> -	if (!mt_external_lock(mt))
> +	if (!mt_lock_external(mt) && !mt_lock_none(mt))
>  		spin_lock_init(&mt->ma_lock);
>  	rcu_assign_pointer(mt->ma_root, NULL);
>  }
> @@ -612,9 +618,11 @@ static inline void mt_clear_in_rcu(struct maple_tree *mt)
>  	if (!mt_in_rcu(mt))
>  		return;
>  
> -	if (mt_external_lock(mt)) {
> +	if (mt_lock_external(mt)) {
>  		BUG_ON(!mt_lock_is_held(mt));
>  		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
> +	} else if (mt_lock_none(mt)) {
> +		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
>  	} else {
>  		mtree_lock(mt);
>  		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
> @@ -631,9 +639,11 @@ static inline void mt_set_in_rcu(struct maple_tree *mt)
>  	if (mt_in_rcu(mt))
>  		return;
>  
> -	if (mt_external_lock(mt)) {
> +	if (mt_lock_external(mt)) {
>  		BUG_ON(!mt_lock_is_held(mt));
>  		mt->ma_flags |= MT_FLAGS_USE_RCU;
> +	} else if (mt_lock_none(mt)) {
> +		mt->ma_flags |= MT_FLAGS_USE_RCU;
>  	} else {
>  		mtree_lock(mt);
>  		mt->ma_flags |= MT_FLAGS_USE_RCU;
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 26e2045d3cda..f51c0fd4eaad 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -802,8 +802,8 @@ static inline void __rcu **ma_slots(struct maple_node *mn, enum maple_type mt)
>  
>  static inline bool mt_locked(const struct maple_tree *mt)
>  {
> -	return mt_external_lock(mt) ? mt_lock_is_held(mt) :
> -		lockdep_is_held(&mt->ma_lock);
> +	return mt_lock_external(mt) ? mt_lock_is_held(mt) :
> +		mt_lock_none(mt) ? true : lockdep_is_held(&mt->ma_lock);

It might be better to just make this two return statements for clarity.
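Something along these lines, perhaps (a sketch equivalent to the
expression above):

	static inline bool mt_locked(const struct maple_tree *mt)
	{
		if (mt_lock_external(mt))
			return mt_lock_is_held(mt);
		if (mt_lock_none(mt))
			return true;

		return lockdep_is_held(&mt->ma_lock);
	}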

>  }
>  
>  static inline void *mt_slot(const struct maple_tree *mt,
> @@ -6120,7 +6120,8 @@ bool mas_nomem(struct ma_state *mas, gfp_t gfp)
>  		return false;
>  	}
>  
> -	if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
> +	if (gfpflags_allow_blocking(gfp) &&
> +	    !mt_lock_external(mas->tree) && !mt_lock_none(mas->tree)) {
>  		mtree_unlock(mas->tree);
>  		mas_alloc_nodes(mas, gfp);
>  		mtree_lock(mas->tree);
> -- 
> 2.39.1
>
Matthew Wilcox Feb. 17, 2023, 7:38 p.m. UTC | #2
On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
> Generic components making use of the maple tree (such as the
> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> exclusion to their users.
> 
> While such components could inherit the concept of an external lock,
> some users might just serialize the access to the component and hence to
> the internal maple tree.
> 
> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> indicate not to do any internal lockdep checks.

I'm really against this change.

First, we really should check that users have their locking right.
It's bitten us so many times when they get it wrong.

Second, having a lock allows us to defragment the slab cache.  The
patches to do that haven't gone anywhere recently, but if we drop the
requirement now, we'll never be able to compact ranges of memory that
have slabs allocated to them.
Danilo Krummrich Feb. 20, 2023, 2 p.m. UTC | #3
On 2/17/23 20:38, Matthew Wilcox wrote:
> On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
>> Generic components making use of the maple tree (such as the
>> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
>> exclusion to their users.
>>
>> While such components could inherit the concept of an external lock,
>> some users might just serialize the access to the component and hence to
>> the internal maple tree.
>>
>> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
>> indicate not to do any internal lockdep checks.
> 
> I'm really against this change.
> 
> First, we really should check that users have their locking right.
> It's bitten us so many times when they get it wrong.

In case of the DRM GPUVA manager, some users might serialize the access 
to the GPUVA manager and hence to its maple tree instances, e.g. 
through the drm_gpu_scheduler. In such a case, requiring a lock to be 
held would be a bit pointless and I wouldn't really know how to "sell" this 
to potential users of the GPUVA manager.

> 
> Second, having a lock allows us to defragment the slab cache.  The
> patches to do that haven't gone anywhere recently, but if we drop the
> requirement now, we'll never be able to compact ranges of memory that
> have slabs allocated to them.
> 

Not sure if I get that - do you mind explaining a bit how this would 
affect other users of the maple tree, such as my use case, the GPUVA 
manager?
Matthew Wilcox Feb. 20, 2023, 3:10 p.m. UTC | #4
On Mon, Feb 20, 2023 at 03:00:59PM +0100, Danilo Krummrich wrote:
> On 2/17/23 20:38, Matthew Wilcox wrote:
> > On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
> > > Generic components making use of the maple tree (such as the
> > > DRM GPUVA Manager) delegate the responsibility of ensuring mutual
> > > exclusion to their users.
> > > 
> > > While such components could inherit the concept of an external lock,
> > > some users might just serialize the access to the component and hence to
> > > the internal maple tree.
> > > 
> > > In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
> > > indicate not to do any internal lockdep checks.
> > 
> > I'm really against this change.
> > 
> > First, we really should check that users have their locking right.
> > It's bitten us so many times when they get it wrong.
> 
> In case of the DRM GPUVA manager, some users might serialize the access to
> the GPUVA manager and hence to it's maple tree instances, e.g. through the
> drm_gpu_scheduler. In such a case ensuring to hold a lock would be a bit
> pointless and I wouldn't really know how to "sell" this to potential users
> of the GPUVA manager.

This is why we like people to use the spinlock embedded in the tree.
There's nothing for the user to care about.  If the access really is
serialised, acquiring/releasing the uncontended spinlock is a minimal
cost compared to all the other things that will happen while modifying
the tree.
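For reference, with the normal API the embedded lock never shows up in
the caller's code at all; a minimal sketch (first/last/entry are
placeholders):

	DEFINE_MTREE(mt);	/* default flags: embedded spinlock */
	int err;

	/* mtree_store_range() takes and drops mt.ma_lock internally. */
	err = mtree_store_range(&mt, first, last, entry, GFP_KERNEL);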

> > Second, having a lock allows us to defragment the slab cache.  The
> > patches to do that haven't gone anywhere recently, but if we drop the
> > requirement now, we'll never be able to compact ranges of memory that
> > have slabs allocated to them.
> > 
> 
> Not sure if I get that, do you mind explaining a bit how this would affect
> other users of the maple tree, such as my use case, the GPUVA manager?

When we want to free a slab in order to defragment memory, we need
to relocate all the objects allocated within that slab.  To do that
for the maple tree node cache, for each node in this particular slab,
we'll need to walk up to the top of the tree and lock it.  We can then
allocate a new node from a different slab, change the parent to point
to the new node and drop the lock.  After an RCU delay, we can free the
slab and create a larger contiguous block of memory.

As I said, this is somewhat hypothetical in that there's no current
code in the tree to reclaim slabs when we're trying to defragment
memory.  And that's because it's hard to do.  The XArray and maple
tree were designed to make it possible for their slabs.
Danilo Krummrich Feb. 20, 2023, 5:06 p.m. UTC | #5
On 2/20/23 16:10, Matthew Wilcox wrote:
> On Mon, Feb 20, 2023 at 03:00:59PM +0100, Danilo Krummrich wrote:
>> On 2/17/23 20:38, Matthew Wilcox wrote:
>>> On Fri, Feb 17, 2023 at 02:44:10PM +0100, Danilo Krummrich wrote:
>>>> Generic components making use of the maple tree (such as the
>>>> DRM GPUVA Manager) delegate the responsibility of ensuring mutual
>>>> exclusion to their users.
>>>>
>>>> While such components could inherit the concept of an external lock,
>>>> some users might just serialize the access to the component and hence to
>>>> the internal maple tree.
>>>>
>>>> In order to allow such use cases, add a new flag MT_FLAGS_LOCK_NONE to
>>>> indicate not to do any internal lockdep checks.
>>>
>>> I'm really against this change.
>>>
>>> First, we really should check that users have their locking right.
>>> It's bitten us so many times when they get it wrong.
>>
>> In case of the DRM GPUVA manager, some users might serialize the access to
>> the GPUVA manager and hence to it's maple tree instances, e.g. through the
>> drm_gpu_scheduler. In such a case ensuring to hold a lock would be a bit
>> pointless and I wouldn't really know how to "sell" this to potential users
>> of the GPUVA manager.
> 
> This is why we like people to use the spinlock embedded in the tree.
> There's nothing for the user to care about.  If the access really is
> serialised, acquiring/releasing the uncontended spinlock is a minimal
> cost compared to all the other things that will happen while modifying
> the tree.

I think as for the users of the GPUVA manager we'd have two cases:

1) Accesses to the manager (and hence the tree) are serialized, no lock 
needed.

2) Multiple operations on the tree must be locked in order to make them 
appear atomic.

In either case the embedded spinlock wouldn't be useful, we'd either 
need an external lock or no lock at all.

If there are any internal reasons why specific tree operations must be 
mutually excluded (such as those you explain below), wouldn't it make 
more sense to always have the internal lock and, optionally, allow users 
to specify an external lock additionally?

> 
>>> Second, having a lock allows us to defragment the slab cache.  The
>>> patches to do that haven't gone anywhere recently, but if we drop the
>>> requirement now, we'll never be able to compact ranges of memory that
>>> have slabs allocated to them.
>>>
>>
>> Not sure if I get that, do you mind explaining a bit how this would affect
>> other users of the maple tree, such as my use case, the GPUVA manager?
> 
> When we want to free a slab in order to defragment memory, we need
> to relocate all the objects allocated within that slab.  To do that
> for the maple tree node cache, for each node in this particular slab,
> we'll need to walk up to the top of the tree and lock it.  We can then
> allocate a new node from a different slab, change the parent to point
> to the new node and drop the lock.  After an RCU delay, we can free the
> slab and create a larger contiguous block of memory.
> 
> As I said, this is somewhat hypothetical in that there's no current
> code in the tree to reclaim slabs when we're trying to defragment
> memory.  And that's because it's hard to do.  The XArray and maple
> tree were designed to make it possible for their slabs.
>
Matthew Wilcox Feb. 20, 2023, 8:33 p.m. UTC | #6
On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
> On 2/20/23 16:10, Matthew Wilcox wrote:
> > This is why we like people to use the spinlock embedded in the tree.
> > There's nothing for the user to care about.  If the access really is
> > serialised, acquiring/releasing the uncontended spinlock is a minimal
> > cost compared to all the other things that will happen while modifying
> > the tree.
> 
> I think as for the users of the GPUVA manager we'd have two cases:
> 
> 1) Accesses to the manager (and hence the tree) are serialized, no lock
> needed.
> 
> 2) Multiple operations on the tree must be locked in order to make them
> appear atomic.

Could you give an example here of what you'd like to do?  Ideally
something complicated so I don't say "Oh, you can just do this" when
there's a more complex example for which "this" won't work.  I'm sure
that's embedded somewhere in the next 20-odd patches, but it's probably
quicker for you to describe in terms of tree operations that have to
appear atomic than for me to try to figure it out.

> In either case the embedded spinlock wouldn't be useful, we'd either need an
> external lock or no lock at all.
> 
> If there are any internal reasons why specific tree operations must be
> mutually excluded (such as those you explain below), wouldn't it make more
> sense to always have the internal lock and, optionally, allow users to
> specify an external lock additionally?

So the way this works for the XArray, which is a little older than the
Maple tree, is that we always use the internal spinlock for
modifications (possibly BH or IRQ safe), and if someone wants to
use an external mutex to make some callers atomic with respect to each
other, they're free to do so.  In that case, the XArray doesn't check
the user's external locking at all, because it really can't know.

I'd advise taking that approach; if there's really no way to use the
internal spinlock to make your complicated updates appear atomic
then just let the maple tree use its internal spinlock, and you can
also use your external mutex however you like.
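A rough sketch of that combination (mgr->lock, mgr->mt and the index
names are illustrative, not an actual API):

	mutex_lock(&mgr->lock);			/* caller-level atomicity */
	mtree_erase(&mgr->mt, old_index);	/* takes the internal spinlock */
	err = mtree_store_range(&mgr->mt, first, last, entry, GFP_KERNEL);
	mutex_unlock(&mgr->lock);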
Danilo Krummrich Feb. 21, 2023, 2:37 p.m. UTC | #7
On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
> > On 2/20/23 16:10, Matthew Wilcox wrote:
> > > This is why we like people to use the spinlock embedded in the tree.
> > > There's nothing for the user to care about.  If the access really is
> > > serialised, acquiring/releasing the uncontended spinlock is a minimal
> > > cost compared to all the other things that will happen while modifying
> > > the tree.
> > 
> > I think as for the users of the GPUVA manager we'd have two cases:
> > 
> > 1) Accesses to the manager (and hence the tree) are serialized, no lock
> > needed.
> > 
> > 2) Multiple operations on the tree must be locked in order to make them
> > appear atomic.
> 
> Could you give an example here of what you'd like to do?  Ideally
> something complicated so I don't say "Oh, you can just do this" when
> there's a more complex example for which "this" won't work.  I'm sure
> that's embedded somewhere in the next 20-odd patches, but it's probably
> quicker for you to describe in terms of tree operations that have to
> appear atomic than for me to try to figure it out.
> 

Absolutely, not gonna ask you to read all of that. :-)

One thing the GPUVA manager does is to provide drivers the (sub-)operations
that need to be processed in order to fulfill a map or unmap request from
userspace. For instance, when userspace asks the driver to map some memory
the GPUVA manager calculates which existing mappings must be removed, split up
or can be merged with the newly requested mapping.

A driver has two ways to fetch those operations from the GPUVA manager. It can
either obtain a list of operations or receive a callback for each operation
generated by the GPUVA manager.

In both cases the GPUVA manager walks the maple tree, which keeps track of
existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
the map case, since the unmap case is basically a subset). For each mapping
found in the given range the driver, as mentioned, either receives a callback or
a list entry is added to the list of operations.

Typically, for each operation / callback one entry within the maple tree is
removed and, optionally at the beginning and end of a new mapping's range, a
new entry is inserted. And of course, as the last operation, there is the new
mapping itself to insert.

The GPUVA manager delegates locking responsibility to the drivers. Typically,
a driver either serializes access to the VA space managed by the GPUVA manager
(no lock needed) or needs to lock the processing of a full set of operations
generated by the GPUVA manager.
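Roughly, the latter case looks like this on the driver side (a sketch
with illustrative names - drv_gpuva_collect_ops() and drv_process_op()
are hypothetical helpers, not the GPUVA manager's real API):

	struct drv_op *op;
	LIST_HEAD(ops);

	mutex_lock(&drv->va_lock);		/* driver's external lock */
	drv_gpuva_collect_ops(drv, req, &ops);	/* fetch the list of operations */
	list_for_each_entry(op, &ops, entry)
		drv_process_op(drv, op);	/* unmap / remap / map steps */
	mutex_unlock(&drv->va_lock);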

> > In either case the embedded spinlock wouldn't be useful, we'd either need an
> > external lock or no lock at all.
> > 
> > If there are any internal reasons why specific tree operations must be
> > mutually excluded (such as those you explain below), wouldn't it make more
> > sense to always have the internal lock and, optionally, allow users to
> > specify an external lock additionally?
> 
> So the way this works for the XArray, which is a little older than the
> Maple tree, is that we always use the internal spinlock for
> modifications (possibly BH or IRQ safe), and if someone wants to
> use an external mutex to make some callers atomic with respect to each
> other, they're free to do so.  In that case, the XArray doesn't check
> the user's external locking at all, because it really can't know.
> 
> I'd advise taking that approach; if there's really no way to use the
> internal spinlock to make your complicated updates appear atomic
> then just let the maple tree use its internal spinlock, and you can
> also use your external mutex however you like.
> 

That sounds like the right thing to do.

However, I'm using the advanced API of the maple tree (and that's the reason
why the above example appears a little more detailed than needed) because I
think with the normal API I can't insert / remove tree entries while walking
the tree, right?

As per the documentation the advanced API, however, doesn't take care of locking
itself, hence just letting the maple tree use its internal spinlock doesn't
really work - I need to take care of that myself, right?

It feels a bit weird that I, as a user of the API, would need to lock certain
(or all?) mas_*() functions with the internal spinlock in order to protect
(future) internal features of the tree, such as the slab cache defragmentation
you mentioned. Because from my perspective, as the generic component that tells
its users (the drivers) to take care of locking VA space operations (and hence
tree operations) I don't have any purpose of my own for this internal spinlock, right?

Also I'm a little confused about where I'd need to take the spinlock. E.g. for
inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.
Matthew Wilcox Feb. 21, 2023, 6:31 p.m. UTC | #8
On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote:
> On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
> > > On 2/20/23 16:10, Matthew Wilcox wrote:
> > > > This is why we like people to use the spinlock embedded in the tree.
> > > > There's nothing for the user to care about.  If the access really is
> > > > serialised, acquiring/releasing the uncontended spinlock is a minimal
> > > > cost compared to all the other things that will happen while modifying
> > > > the tree.
> > > 
> > > I think as for the users of the GPUVA manager we'd have two cases:
> > > 
> > > 1) Accesses to the manager (and hence the tree) are serialized, no lock
> > > needed.
> > > 
> > > 2) Multiple operations on the tree must be locked in order to make them
> > > appear atomic.
> > 
> > Could you give an example here of what you'd like to do?  Ideally
> > something complicated so I don't say "Oh, you can just do this" when
> > there's a more complex example for which "this" won't work.  I'm sure
> > that's embedded somewhere in the next 20-odd patches, but it's probably
> > quicker for you to describe in terms of tree operations that have to
> > appear atomic than for me to try to figure it out.
> > 
> 
> Absolutely, not gonna ask you to read all of that. :-)
> 
> One thing the GPUVA manager does is to provide drivers the (sub-)operations
> that need to be processed in order to fulfill a map or unmap request from
> userspace. For instance, when userspace asks the driver to map some memory
> the GPUVA manager calculates which existing mappings must be removed, split up
> or can be merged with the newly requested mapping.
> 
> A driver has two ways to fetch those operations from the GPUVA manager. It can
> either obtain a list of operations or receive a callback for each operation
> generated by the GPUVA manager.
> 
> In both cases the GPUVA manager walks the maple tree, which keeps track of
> existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
> the map case, since the unmap case is a subset basically). For each mapping
> found in the given range the driver, as mentioned, either receives a callback or
> a list entry is added to the list of operations.
> 
> Typically, for each operation / callback one entry within the maple tree is
> removed and, optionally at the beginning and end of a new mapping's range, a
> new entry is inserted. An of course, as the last operation, there is the new
> mapping itself to insert.
> 
> The GPUVA manager delegates locking responsibility to the drivers. Typically,
> a driver either serializes access to the VA space managed by the GPUVA manager
> (no lock needed) or need to lock the processing of a full set of operations
> generated by the GPUVA manager.

OK, that all makes sense.  It does make sense to have the driver use its
own mutex and then take the spinlock inside the maple tree code.  It
shouldn't ever be contended.

> > > In either case the embedded spinlock wouldn't be useful, we'd either need an
> > > external lock or no lock at all.
> > > 
> > > If there are any internal reasons why specific tree operations must be
> > > mutually excluded (such as those you explain below), wouldn't it make more
> > > sense to always have the internal lock and, optionally, allow users to
> > > specify an external lock additionally?
> > 
> > So the way this works for the XArray, which is a little older than the
> > Maple tree, is that we always use the internal spinlock for
> > modifications (possibly BH or IRQ safe), and if someone wants to
> > use an external mutex to make some callers atomic with respect to each
> > other, they're free to do so.  In that case, the XArray doesn't check
> > the user's external locking at all, because it really can't know.
> > 
> > I'd advise taking that approach; if there's really no way to use the
> > internal spinlock to make your complicated updates appear atomic
> > then just let the maple tree use its internal spinlock, and you can
> > also use your external mutex however you like.
> > 
> 
> That sounds like the right thing to do.
> 
> However, I'm using the advanced API of the maple tree (and that's the reason
> why the above example appears a little more detailed than needed) because I
> think with the normal API I can't insert / remove tree entries while walking
> the tree, right?

Right.  The normal API is for simple operations while the advanced API
is for doing compound operations.

> As by the documentation the advanced API, however, doesn't take care of locking
> itself, hence just letting the maple tree use its internal spinlock doesn't
> really work - I need to take care of that myself, right?

Yes; once you're using the advanced API, you get to compose the entire
operation yourself.

> It feels a bit weird that I, as a user of the API, would need to lock certain
> (or all?) mas_*() functions with the internal spinlock in order to protect
> (future) internal features of the tree, such as the slab cache defragmentation
> you mentioned. Because from my perspective, as the generic component that tells
> it's users (the drivers) to take care of locking VA space operations (and hence
> tree operations) I don't have an own purpose of this internal spinlock, right?

You don't ... but we can't know that.

> Also I'm a little confused how I'd know where to take the spinlock? E.g. for
> inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.

Lockdep will shout at you if you get it wrong ;-)  But you can safely
take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
mas_nomem() knows to drop the lock before doing a sleeping allocation.
Essentially you're open-coding mtree_store_range() but doing your own
thing in addition to the store.
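I.e. something like this (sketch; mgr->mt and the index names are
placeholders):

	MA_STATE(mas, &mgr->mt, first, last);
	int err;

	mtree_lock(&mgr->mt);		/* the tree's internal spinlock */
	/* mas_nomem() may drop and retake the lock for a sleeping
	 * allocation before the store is retried. */
	err = mas_store_gfp(&mas, entry, GFP_KERNEL);
	mtree_unlock(&mgr->mt);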
Danilo Krummrich Feb. 22, 2023, 4:11 p.m. UTC | #9
On 2/21/23 19:31, Matthew Wilcox wrote:
> On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote:
>> On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
>>> On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
>>>> On 2/20/23 16:10, Matthew Wilcox wrote:
>>>>> This is why we like people to use the spinlock embedded in the tree.
>>>>> There's nothing for the user to care about.  If the access really is
>>>>> serialised, acquiring/releasing the uncontended spinlock is a minimal
>>>>> cost compared to all the other things that will happen while modifying
>>>>> the tree.
>>>>
>>>> I think as for the users of the GPUVA manager we'd have two cases:
>>>>
>>>> 1) Accesses to the manager (and hence the tree) are serialized, no lock
>>>> needed.
>>>>
>>>> 2) Multiple operations on the tree must be locked in order to make them
>>>> appear atomic.
>>>
>>> Could you give an example here of what you'd like to do?  Ideally
>>> something complicated so I don't say "Oh, you can just do this" when
>>> there's a more complex example for which "this" won't work.  I'm sure
>>> that's embedded somewhere in the next 20-odd patches, but it's probably
>>> quicker for you to describe in terms of tree operations that have to
>>> appear atomic than for me to try to figure it out.
>>>
>>
>> Absolutely, not gonna ask you to read all of that. :-)
>>
>> One thing the GPUVA manager does is to provide drivers the (sub-)operations
>> that need to be processed in order to fulfill a map or unmap request from
>> userspace. For instance, when userspace asks the driver to map some memory
>> the GPUVA manager calculates which existing mappings must be removed, split up
>> or can be merged with the newly requested mapping.
>>
>> A driver has two ways to fetch those operations from the GPUVA manager. It can
>> either obtain a list of operations or receive a callback for each operation
>> generated by the GPUVA manager.
>>
>> In both cases the GPUVA manager walks the maple tree, which keeps track of
>> existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
>> the map case, since the unmap case is a subset basically). For each mapping
>> found in the given range the driver, as mentioned, either receives a callback or
>> a list entry is added to the list of operations.
>>
>> Typically, for each operation / callback one entry within the maple tree is
>> removed and, optionally at the beginning and end of a new mapping's range, a
>> new entry is inserted. An of course, as the last operation, there is the new
>> mapping itself to insert.
>>
>> The GPUVA manager delegates locking responsibility to the drivers. Typically,
>> a driver either serializes access to the VA space managed by the GPUVA manager
>> (no lock needed) or need to lock the processing of a full set of operations
>> generated by the GPUVA manager.
> 
> OK, that all makes sense.  It does make sense to have the driver use its
> own mutex and then take the spinlock inside the maple tree code.  It
> shouldn't ever be contended.
> 
>>>> In either case the embedded spinlock wouldn't be useful, we'd either need an
>>>> external lock or no lock at all.
>>>>
>>>> If there are any internal reasons why specific tree operations must be
>>>> mutually excluded (such as those you explain below), wouldn't it make more
>>>> sense to always have the internal lock and, optionally, allow users to
>>>> specify an external lock additionally?
>>>
>>> So the way this works for the XArray, which is a little older than the
>>> Maple tree, is that we always use the internal spinlock for
>>> modifications (possibly BH or IRQ safe), and if someone wants to
>>> use an external mutex to make some callers atomic with respect to each
>>> other, they're free to do so.  In that case, the XArray doesn't check
>>> the user's external locking at all, because it really can't know.
>>>
>>> I'd advise taking that approach; if there's really no way to use the
>>> internal spinlock to make your complicated updates appear atomic
>>> then just let the maple tree use its internal spinlock, and you can
>>> also use your external mutex however you like.
>>>
>>
>> That sounds like the right thing to do.
>>
>> However, I'm using the advanced API of the maple tree (and that's the reason
>> why the above example appears a little more detailed than needed) because I
>> think with the normal API I can't insert / remove tree entries while walking
>> the tree, right?
> 
> Right.  The normal API is for simple operations while the advanced API
> is for doing compound operations.
> 
>> As by the documentation the advanced API, however, doesn't take care of locking
>> itself, hence just letting the maple tree use its internal spinlock doesn't
>> really work - I need to take care of that myself, right?
> 
> Yes; once you're using the advanced API, you get to compose the entire
> operation yourself.
> 
>> It feels a bit weird that I, as a user of the API, would need to lock certain
>> (or all?) mas_*() functions with the internal spinlock in order to protect
>> (future) internal features of the tree, such as the slab cache defragmentation
>> you mentioned. Because from my perspective, as the generic component that tells
>> it's users (the drivers) to take care of locking VA space operations (and hence
>> tree operations) I don't have an own purpose of this internal spinlock, right?
> 
> You don't ... but we can't know that.

Thanks for the clarification. I think I should now know what to do for 
the GPUVA manager in terms of locking the maple tree in general.

Though I still have very limited insights on the maple tree I want to 
share some further thoughts.

From what I got so far it really seems to me that it would be better to 
just take the internal spinlock for both APIs (normal and advanced) 
whenever it is needed internally.

This way users would not need to take care of locking maple tree 
internals, which I still think is a little odd.

Another plus would probably be maintainability. Once you have quite a few 
maple tree users using external locks (either in the sense of calling 
mt_set_external_lock() or in the way I'll potentially do it by using the 
internal lock with the advanced API and an additional external lock) it 
might be hard to apply any changes to the locking requirements, because 
you would either need to check every user's implementation by hand or be 
able to run it in order to check it with lockdep.

If I got this correctly (please tell me if I don't) the only reason the 
internal lock is not managed by the advanced API internally is to let 
users do more complex transactions, without the need of having a 
separate external lock, as long as they fulfill the locking requirements 
of the maple tree, which are enforced by lockdep.

However, you already mentioned that "acquiring/releasing the uncontended 
spinlock is a minimal cost compared to all the other things that will 
happen while modifying the tree".

Do I miss something?

> 
>> Also I'm a little confused how I'd know where to take the spinlock? E.g. for
>> inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.
> 
> Lockdep will shout at you if you get it wrong ;-)  But you can safely
> take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
> mas_nomem() knows to drop the lock before doing a sleeping allocation.
> Essentially you're open-coding mtree_store_range() but doing your own
> thing in addition to the store.
> 
Just asking lockdep was my plan already, however I thought I'd still 
better ask. :D

If you keep the current approach of handling the internal lock, I think 
it's necessary to document somewhere where users need to take the lock 
and, even more importantly, where the maple tree implementation will 
drop the lock.

For instance, if I relied on the internal spinlock for locking sets of 
transactions to the maple tree, functions like mas_store_gfp() dropping 
the lock would cause nasty bugs. Though I must admit that this is not a 
great example, since it should raise some red flags if a user expected a 
spinlock to be held across a sleeping allocation without questioning it. 
However, you get the point, I guess.
Matthew Wilcox Feb. 22, 2023, 4:32 p.m. UTC | #10
On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote:
> On 2/21/23 19:31, Matthew Wilcox wrote:
> > on tue, feb 21, 2023 at 03:37:49pm +0100, danilo krummrich wrote:
> > > It feels a bit weird that I, as a user of the API, would need to lock certain
> > > (or all?) mas_*() functions with the internal spinlock in order to protect
> > > (future) internal features of the tree, such as the slab cache defragmentation
> > > you mentioned. Because from my perspective, as the generic component that tells
> > > it's users (the drivers) to take care of locking VA space operations (and hence
> > > tree operations) I don't have an own purpose of this internal spinlock, right?
> > 
> > You don't ... but we can't know that.
> 
> Thanks for the clarification. I think I should now know what to for the
> GPUVA manager in terms of locking the maple tree in general.
> 
> Though I still have very limited insights on the maple tree I want to share
> some further thoughts.
> 
> From what I got so far it really seems to me that it would be better to just
> take the internal spinlock for both APIs (normal and advanced) whenever you
> need to internally.

No.  Really, no.  The point of the advanced API is that it's a toolbox
for doing the operation you want, but isn't a generic enough operation
to be part of the normal API.  To take an example from the radix
tree days, in the page cache, we need to walk a range of the tree,
looking for any entries that are marked as DIRTY, clear the DIRTY
mark and set the TOWRITE mark.  There was a horrendous function called
radix_tree_range_tag_if_tagged() which did exactly this.  Now look at
the implementation of tag_pages_for_writeback(); it's a simple loop over
a range with an occasional pause to check whether we need to reschedule.

But that means you need to know how to use the toolbox.  Some of the
tools are dangerous and you can cut yourself on them.

> Another plus would probably be maintainability. Once you got quite a few
> maple tree users using external locks (either in the sense of calling

I don't want maple tree users using external locks.  That exists
because it was the only reasonable way of converting the VMA tree
from the rbtree to the maple tree.  I intend to get rid of
mt_set_external_lock().  The VMAs are eventually going to be protected
by the internal spinlock.
Danilo Krummrich Feb. 22, 2023, 5:28 p.m. UTC | #11
On 2/22/23 17:32, Matthew Wilcox wrote:
> On Wed, Feb 22, 2023 at 05:11:34PM +0100, Danilo Krummrich wrote:
>> On 2/21/23 19:31, Matthew Wilcox wrote:
>>> on tue, feb 21, 2023 at 03:37:49pm +0100, danilo krummrich wrote:
>>>> It feels a bit weird that I, as a user of the API, would need to lock certain
>>>> (or all?) mas_*() functions with the internal spinlock in order to protect
>>>> (future) internal features of the tree, such as the slab cache defragmentation
>>>> you mentioned. Because from my perspective, as the generic component that tells
>>>> it's users (the drivers) to take care of locking VA space operations (and hence
>>>> tree operations) I don't have an own purpose of this internal spinlock, right?
>>>
>>> You don't ... but we can't know that.
>>
>> Thanks for the clarification. I think I should now know what to for the
>> GPUVA manager in terms of locking the maple tree in general.
>>
>> Though I still have very limited insights on the maple tree I want to share
>> some further thoughts.
>>
>>  From what I got so far it really seems to me that it would be better to just
>> take the internal spinlock for both APIs (normal and advanced) whenever you
>> need to internally.
> 
> No.  Really, no.  The point of the advanced API is that it's a toolbox
> for doing the operation you want, but isn't a generic enough operation
> to be part of the normal API.

Again the disclaimer, I'm just sharing my thoughts from the perspective 
of a user from a generic tree API.

For me it feels like - and this is purely an assumption, so please 
correct me if I'm wrong on that - you consider the advanced API to be 
more of a collection of internal functions not *really* meant to be 
used by arbitrary users, and maybe even slightly tied to mm since it 
originated there?

However, from my external perspective I see it the following way.

Even if an operation is not part of the 'normal API' but of an API called 
'advanced API', it is still a generic API operation exposed to arbitrary 
users. However, my point is not (at least not exclusively) that I 
consider this to be unsafe or anything like that.

It's just that I think that when the API *forces* the user to take an 
internal lock at certain places, it can also just take the lock itself, 
no matter what the API is called. Especially since one can't rely on 
this lock at all for other (external) purposes anyway, because the 
implementation behind the API is free to drop the lock whenever it needs to.

> To take an example from the radix
> tree days, in the page cache, we need to walk a range of the tree,
> looking for any entries that are marked as DIRTY, clear the DIRTY
> mark and set the TOWRITE mark.  There was a horrendous function called
> radix_tree_range_tag_if_tagged() which did exactly this.  Now look at
> the implementation of tag_pages_for_writeback(); it's a simple loop over
> a range with an occasional pause to check whether we need to reschedule.
> 
> But that means you need to know how to use the toolbox.  Some of the
> tools are dangerous and you can cut yourself on them.
> 
>> Another plus would probably be maintainability. Once you got quite a few
>> maple tree users using external locks (either in the sense of calling
> 
> I don't want maple tree users using external locks.  That exists
> because it was the only reasonable way of converting the VMA tree
> from the rbtree to the maple tree.  I intend to get rid of
> mt_set_external_lock().  The VMAs are eventually going to be protected
> by the internal spinlock.
> 

But the argument also holds for the case of using the advanced API 
together with the internal spinlock. If your locking requirements change 
in the future, every user's implementation must be re-validated.
Danilo Krummrich Feb. 27, 2023, 5:39 p.m. UTC | #12
On 2/21/23 19:31, Matthew Wilcox wrote:
> On Tue, Feb 21, 2023 at 03:37:49PM +0100, Danilo Krummrich wrote:
>> On Mon, Feb 20, 2023 at 08:33:35PM +0000, Matthew Wilcox wrote:
>>> On Mon, Feb 20, 2023 at 06:06:03PM +0100, Danilo Krummrich wrote:
>>>> On 2/20/23 16:10, Matthew Wilcox wrote:
>>>>> This is why we like people to use the spinlock embedded in the tree.
>>>>> There's nothing for the user to care about.  If the access really is
>>>>> serialised, acquiring/releasing the uncontended spinlock is a minimal
>>>>> cost compared to all the other things that will happen while modifying
>>>>> the tree.
>>>>
>>>> I think as for the users of the GPUVA manager we'd have two cases:
>>>>
>>>> 1) Accesses to the manager (and hence the tree) are serialized, no lock
>>>> needed.
>>>>
>>>> 2) Multiple operations on the tree must be locked in order to make them
>>>> appear atomic.
>>>
>>> Could you give an example here of what you'd like to do?  Ideally
>>> something complicated so I don't say "Oh, you can just do this" when
>>> there's a more complex example for which "this" won't work.  I'm sure
>>> that's embedded somewhere in the next 20-odd patches, but it's probably
>>> quicker for you to describe in terms of tree operations that have to
>>> appear atomic than for me to try to figure it out.
>>>
>>
>> Absolutely, not gonna ask you to read all of that. :-)
>>
>> One thing the GPUVA manager does is to provide drivers the (sub-)operations
>> that need to be processed in order to fulfill a map or unmap request from
>> userspace. For instance, when userspace asks the driver to map some memory
>> the GPUVA manager calculates which existing mappings must be removed, split up
>> or can be merged with the newly requested mapping.
>>
>> A driver has two ways to fetch those operations from the GPUVA manager. It can
>> either obtain a list of operations or receive a callback for each operation
>> generated by the GPUVA manager.
>>
>> In both cases the GPUVA manager walks the maple tree, which keeps track of
>> existing mappings, for the given range in __drm_gpuva_sm_map() (only considering
>> the map case, since the unmap case is a subset basically). For each mapping
>> found in the given range the driver, as mentioned, either receives a callback or
>> a list entry is added to the list of operations.
>>
>> Typically, for each operation / callback one entry within the maple tree is
>> removed and, optionally at the beginning and end of a new mapping's range, a
>> new entry is inserted. An of course, as the last operation, there is the new
>> mapping itself to insert.
>>
>> The GPUVA manager delegates locking responsibility to the drivers. Typically,
>> a driver either serializes access to the VA space managed by the GPUVA manager
>> (no lock needed) or need to lock the processing of a full set of operations
>> generated by the GPUVA manager.
> 
> OK, that all makes sense.  It does make sense to have the driver use its
> own mutex and then take the spinlock inside the maple tree code.  It
> shouldn't ever be contended.
> 
>>>> In either case the embedded spinlock wouldn't be useful, we'd either need an
>>>> external lock or no lock at all.
>>>>
>>>> If there are any internal reasons why specific tree operations must be
>>>> mutually excluded (such as those you explain below), wouldn't it make more
>>>> sense to always have the internal lock and, optionally, allow users to
>>>> specify an external lock additionally?
>>>
>>> So the way this works for the XArray, which is a little older than the
>>> Maple tree, is that we always use the internal spinlock for
>>> modifications (possibly BH or IRQ safe), and if someone wants to
>>> use an external mutex to make some callers atomic with respect to each
>>> other, they're free to do so.  In that case, the XArray doesn't check
>>> the user's external locking at all, because it really can't know.
>>>
>>> I'd advise taking that approach; if there's really no way to use the
>>> internal spinlock to make your complicated updates appear atomic
>>> then just let the maple tree use its internal spinlock, and you can
>>> also use your external mutex however you like.
>>>
>>
>> That sounds like the right thing to do.
>>
>> However, I'm using the advanced API of the maple tree (and that's the reason
>> why the above example appears a little more detailed than needed) because I
>> think with the normal API I can't insert / remove tree entries while walking
>> the tree, right?
> 
> Right.  The normal API is for simple operations while the advanced API
> is for doing compound operations.
> 
>> As by the documentation the advanced API, however, doesn't take care of locking
>> itself, hence just letting the maple tree use its internal spinlock doesn't
>> really work - I need to take care of that myself, right?
> 
> Yes; once you're using the advanced API, you get to compose the entire
> operation yourself.
> 
>> It feels a bit weird that I, as a user of the API, would need to lock certain
>> (or all?) mas_*() functions with the internal spinlock in order to protect
>> (future) internal features of the tree, such as the slab cache defragmentation
>> you mentioned. Because from my perspective, as the generic component that tells
>> it's users (the drivers) to take care of locking VA space operations (and hence
>> tree operations) I don't have an own purpose of this internal spinlock, right?
> 
> You don't ... but we can't know that.
> 
>> Also I'm a little confused how I'd know where to take the spinlock? E.g. for
>> inserting entries in the tree I use mas_store_gfp() with GFP_KERNEL.
> 
> Lockdep will shout at you if you get it wrong ;-)  But you can safely
> take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
> mas_nomem() knows to drop the lock before doing a sleeping allocation.
> Essentially you're open-coding mtree_store_range() but doing your own
> thing in addition to the store.
> 

As already mentioned, I went with your advice to just take the maple 
tree's internal spinlock within the GPUVA manager and leave all the 
other locking to the drivers as intended.

However, I ran into the case that lockdep shouts at me for not taking 
the spinlock before calling mas_find() in the iterator macros.

Now, I definitely don't want to let the drivers take the maple tree's 
spinlock before they use the iterator macro. Of course, drivers 
shouldn't even know about the underlying maple tree of the GPUVA manager.

One way to make lockdep happy in this case seems to be taking the 
spinlock right before mas_find() and dropping it right after for each iteration.

What do you advise doing in this case?

Thanks,
Danilo
Matthew Wilcox Feb. 27, 2023, 6:36 p.m. UTC | #13
On Mon, Feb 27, 2023 at 06:39:33PM +0100, Danilo Krummrich wrote:
> On 2/21/23 19:31, Matthew Wilcox wrote:
> > Lockdep will shout at you if you get it wrong ;-)  But you can safely
> > take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
> > mas_nomem() knows to drop the lock before doing a sleeping allocation.
> > Essentially you're open-coding mtree_store_range() but doing your own
> > thing in addition to the store.
> 
> As already mentioned, I went with your advice to just take the maple tree's
> internal spinlock within the GPUVA manager and leave all the other locking
> to the drivers as intended.
> 
> However, I run into the case that lockdep shouts at me for not taking the
> spinlock before calling mas_find() in the iterator macros.
> 
> Now, I definitely don't want to let the drivers take the maple tree's
> spinlock before they use the iterator macro. Of course, drivers shouldn't
> even know about the underlying maple tree of the GPUVA manager.
> 
> One way to make lockdep happy in this case seems to be taking the spinlock
> right before mas_find() and drop it right after for each iteration.

While we don't have any lockdep checking of this, you really shouldn't be
using an iterator if you're going to drop the lock between invocations.
The iterator points into the tree, so you need to invalidate the iterator
any time you drop the lock.

You don't have to use a spinlock to do a read iteration.  You can just
take the rcu_read_lock() around your iteration, as long as you can
tolerate the mild inconsistencies that RCU permits.
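A read-only walk would then look roughly like this (sketch, illustrative
names):

	MA_STATE(mas, &mgr->mt, 0, 0);
	struct drm_gpuva *va;

	rcu_read_lock();
	mas_for_each(&mas, va, ULONG_MAX) {
		/* read-only use of va; tolerates RCU-level inconsistency */
	}
	rcu_read_unlock();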
Danilo Krummrich Feb. 27, 2023, 6:59 p.m. UTC | #14
On 2/27/23 19:36, Matthew Wilcox wrote:
> On Mon, Feb 27, 2023 at 06:39:33PM +0100, Danilo Krummrich wrote:
>> On 2/21/23 19:31, Matthew Wilcox wrote:
>>> Lockdep will shout at you if you get it wrong ;-)  But you can safely
>>> take the spinlock before calling mas_store_gfp(GFP_KERNEL) because
>>> mas_nomem() knows to drop the lock before doing a sleeping allocation.
>>> Essentially you're open-coding mtree_store_range() but doing your own
>>> thing in addition to the store.
>>
>> As already mentioned, I went with your advice to just take the maple tree's
>> internal spinlock within the GPUVA manager and leave all the other locking
>> to the drivers as intended.
>>
>> However, I run into the case that lockdep shouts at me for not taking the
>> spinlock before calling mas_find() in the iterator macros.
>>
>> Now, I definitely don't want to let the drivers take the maple tree's
>> spinlock before they use the iterator macro. Of course, drivers shouldn't
>> even know about the underlying maple tree of the GPUVA manager.
>>
>> One way to make lockdep happy in this case seems to be taking the spinlock
>> right before mas_find() and drop it right after for each iteration.
> 
> While we don't have any lockdep checking of this, you really shouldn't be
> using an iterator if you're going to drop the lock between invocations.
> The iterator points into the tree, so you need to invalidate the iterator
> any time you drop the lock.

The tree can't change either way in my case. Changes to the DRM GPUVA 
manager (and hence the tree) are protected by drivers, either by 
serializing tree accesses or by having another external lock ensuring 
mutual exclusion. Just as a reminder, in the latter case drivers usually 
lock multiple transactions to the manager (and hence the tree) to ensure 
they appear atomic.

So, really the only purpose for me taking the internal lock is to ensure 
I satisfy lockdep and the maple tree's internal requirements on locking 
for future use cases you mentioned (e.g. slab cache defragmentation).

It's the rcu_dereference_check() in mas_root() that triggers in my case:

[   28.745706] lib/maple_tree.c:851 suspicious rcu_dereference_check() usage!

                stack backtrace:
[   28.746057] CPU: 8 PID: 1518 Comm: nouveau_dma_cop Not tainted 6.2.0-rc6-vmbind-0.2+ #104
[   28.746061] Hardware name: ASUS System Product Name/PRIME Z690-A, BIOS 2103 09/30/2022
[   28.746064] Call Trace:
[   28.746067]  <TASK>
[   28.746070]  dump_stack_lvl+0x5b/0x77
[   28.746077]  mas_walk+0x16d/0x1b0
[   28.746082]  mas_find+0xf7/0x300
[   28.746088]  drm_gpuva_in_region+0x63/0xa0
[   28.746099]  __drm_gpuva_sm_map.isra.0+0x465/0x9f0
[   28.746103]  ? lock_acquire+0xbf/0x2b0
[   28.746111]  ? __pfx_drm_gpuva_sm_step+0x10/0x10
[   28.746114]  ? lock_is_held_type+0xe3/0x140
[   28.746121]  ? mark_held_locks+0x49/0x80
[   28.746125]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[   28.746138]  drm_gpuva_sm_map_ops_create+0x80/0xc0
[   28.746145]  uvmm_bind_job_submit+0x3c2/0x470 [nouveau]
[   28.746272]  nouveau_job_submit+0x60/0x450 [nouveau]
[   28.746393]  nouveau_uvmm_ioctl_vm_bind+0x179/0x1e0 [nouveau]
[   28.746510]  ? __pfx_nouveau_uvmm_ioctl_vm_bind+0x10/0x10 [nouveau]
[   28.746622]  drm_ioctl_kernel+0xa9/0x160
[   28.746629]  drm_ioctl+0x1f7/0x4b0

> 
> You don't have to use a spinlock to do a read iteration.  You can just
> take the rcu_read_lock() around your iteration, as long as you can
> tolerate the mild inconsistencies that RCU permits.
>

Doing that would mean that the driver needs to do it. However, the 
driver either needs to serialize accesses or use its own mutex for 
protection for the above reasons. Hence, that should not be needed.

Patch

diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index ca04c900e51a..f795e5def8d0 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -170,10 +170,11 @@  enum maple_type {
 #define MT_FLAGS_USE_RCU	0x02
 #define MT_FLAGS_HEIGHT_OFFSET	0x02
 #define MT_FLAGS_HEIGHT_MASK	0x7C
-#define MT_FLAGS_LOCK_MASK	0x300
+#define MT_FLAGS_LOCK_MASK	0x700
 #define MT_FLAGS_LOCK_IRQ	0x100
 #define MT_FLAGS_LOCK_BH	0x200
 #define MT_FLAGS_LOCK_EXTERN	0x300
+#define MT_FLAGS_LOCK_NONE	0x400
 
 #define MAPLE_HEIGHT_MAX	31
 
@@ -559,11 +560,16 @@  static inline void mas_set(struct ma_state *mas, unsigned long index)
 	mas_set_range(mas, index, index);
 }
 
-static inline bool mt_external_lock(const struct maple_tree *mt)
+static inline bool mt_lock_external(const struct maple_tree *mt)
 {
 	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_EXTERN;
 }
 
+static inline bool mt_lock_none(const struct maple_tree *mt)
+{
+	return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_NONE;
+}
+
 /**
  * mt_init_flags() - Initialise an empty maple tree with flags.
  * @mt: Maple Tree
@@ -577,7 +583,7 @@  static inline bool mt_external_lock(const struct maple_tree *mt)
 static inline void mt_init_flags(struct maple_tree *mt, unsigned int flags)
 {
 	mt->ma_flags = flags;
-	if (!mt_external_lock(mt))
+	if (!mt_lock_external(mt) && !mt_lock_none(mt))
 		spin_lock_init(&mt->ma_lock);
 	rcu_assign_pointer(mt->ma_root, NULL);
 }
@@ -612,9 +618,11 @@  static inline void mt_clear_in_rcu(struct maple_tree *mt)
 	if (!mt_in_rcu(mt))
 		return;
 
-	if (mt_external_lock(mt)) {
+	if (mt_lock_external(mt)) {
 		BUG_ON(!mt_lock_is_held(mt));
 		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
+	} else if (mt_lock_none(mt)) {
+		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
 	} else {
 		mtree_lock(mt);
 		mt->ma_flags &= ~MT_FLAGS_USE_RCU;
@@ -631,9 +639,11 @@  static inline void mt_set_in_rcu(struct maple_tree *mt)
 	if (mt_in_rcu(mt))
 		return;
 
-	if (mt_external_lock(mt)) {
+	if (mt_lock_external(mt)) {
 		BUG_ON(!mt_lock_is_held(mt));
 		mt->ma_flags |= MT_FLAGS_USE_RCU;
+	} else if (mt_lock_none(mt)) {
+		mt->ma_flags |= MT_FLAGS_USE_RCU;
 	} else {
 		mtree_lock(mt);
 		mt->ma_flags |= MT_FLAGS_USE_RCU;
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 26e2045d3cda..f51c0fd4eaad 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -802,8 +802,8 @@  static inline void __rcu **ma_slots(struct maple_node *mn, enum maple_type mt)
 
 static inline bool mt_locked(const struct maple_tree *mt)
 {
-	return mt_external_lock(mt) ? mt_lock_is_held(mt) :
-		lockdep_is_held(&mt->ma_lock);
+	return mt_lock_external(mt) ? mt_lock_is_held(mt) :
+		mt_lock_none(mt) ? true : lockdep_is_held(&mt->ma_lock);
 }
 
 static inline void *mt_slot(const struct maple_tree *mt,
@@ -6120,7 +6120,8 @@  bool mas_nomem(struct ma_state *mas, gfp_t gfp)
 		return false;
 	}
 
-	if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
+	if (gfpflags_allow_blocking(gfp) &&
+	    !mt_lock_external(mas->tree) && !mt_lock_none(mas->tree)) {
 		mtree_unlock(mas->tree);
 		mas_alloc_nodes(mas, gfp);
 		mtree_lock(mas->tree);