
[11/21] drm/i915: Stop manually RCU banging in reset_stats_ioctl

Message ID: 20210423223131.879208-12-jason@jlekstrand.net
State: New, archived
Series: drm/i915/gem: ioctl clean-ups

Commit Message

Jason Ekstrand April 23, 2021, 10:31 p.m. UTC
As far as I can tell, the only real reason for this is to avoid taking a
reference to the i915_gem_context.  The cost of those two atomics
probably pales in comparison to the cost of the ioctl itself so we're
really not buying ourselves anything here.  We're about to make context
lookup a tiny bit more complicated, so let's get rid of the one hand-
rolled case.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 ++++---------
 drivers/gpu/drm/i915/i915_drv.h             |  8 +-------
 2 files changed, 5 insertions(+), 16 deletions(-)
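
For reference, reset_stats is the ioctl userspace polls to learn about GPU
resets affecting a context. A minimal sketch of a caller, assuming libdrm's
drmIoctl() and the i915 uapi header (not taken from any real driver):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <xf86drm.h>
#include <i915_drm.h>	/* or <drm/i915_drm.h>, depending on the install */

/* Hypothetical helper: query reset stats for one GEM context. */
static int query_reset_stats(int fd, uint32_t ctx_id)
{
	struct drm_i915_reset_stats stats = {
		.ctx_id = ctx_id,
		/* flags and pad must be zero, or the ioctl returns -EINVAL */
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats))
		return -errno;

	printf("resets=%u guilty=%u pending=%u\n",
	       stats.reset_count, stats.batch_active, stats.batch_pending);
	return 0;
}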

Comments

Daniel Vetter April 28, 2021, 10:27 a.m. UTC | #1
On Fri, Apr 23, 2021 at 05:31:21PM -0500, Jason Ekstrand wrote:
> As far as I can tell, the only real reason for this is to avoid taking a
> reference to the i915_gem_context.  The cost of those two atomics
> probably pales in comparison to the cost of the ioctl itself so we're
> really not buying ourselves anything here.  We're about to make context
> lookup a tiny bit more complicated, so let's get rid of the one hand-
> rolled case.

I think the historical reason here is that i965_brw checks this before
every execbuf call, at least for arb_robustness contexts with the right
flag. But we've fixed that hotpath problem by adding non-recoverable
contexts. The kernel will tell you now automatically, for proper userspace
at least (I checked iris and anv, assuming I got it correct), and
reset_stats ioctl isn't a hot path worth micro-optimizing anymore.
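
For reference, a minimal sketch of that opt-out from the userspace side, using
the I915_CONTEXT_PARAM_RECOVERABLE uapi (this is not the actual iris/anv code,
just the idea):

#include <errno.h>
#include <stdint.h>
#include <xf86drm.h>
#include <i915_drm.h>

/* Sketch only: mark a context non-recoverable so a hang it is involved in
 * surfaces as an execbuf error, instead of userspace having to poll the
 * reset_stats ioctl. */
static int make_ctx_non_recoverable(int fd, uint32_t ctx_id)
{
	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_RECOVERABLE,
		.value = 0,
	};

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p) ? -errno : 0;
}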

With that bit more context added to the commit message:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Jason Ekstrand April 28, 2021, 6:22 p.m. UTC | #2
On Wed, Apr 28, 2021 at 5:27 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Fri, Apr 23, 2021 at 05:31:21PM -0500, Jason Ekstrand wrote:
> > As far as I can tell, the only real reason for this is to avoid taking a
> > reference to the i915_gem_context.  The cost of those two atomics
> > probably pales in comparison to the cost of the ioctl itself so we're
> > really not buying ourselves anything here.  We're about to make context
> > lookup a tiny bit more complicated, so let's get rid of the one hand-
> > rolled case.
>
> I think the historical reason here is that i965_brw checks this before
> every execbuf call, at least for arb_robustness contexts with the right
> flag. But we've fixed that hotpath problem by adding non-recoverable
> contexts. The kernel will tell you now automatically, for proper userspace
> at least (I checked iris and anv, assuming I got it correct), and
> reset_stats ioctl isn't a hot path worth micro-optimizing anymore.

I'm not sure I agree with that bit.  I don't think it was ever worth
micro-optimizing like this.  What does it gain us?  Two fewer atomics?
 It's not like the bad old days when it took a lock.

ANV still calls reset_stats before every set of execbuf calls (sometimes
more than one), but I've never once seen it show up on a perf trace.
execbuf, on the other hand, does show up, and pretty heavily sometimes.

> With that bit more context added to the commit message:

I'd like us to agree on what to add before adding something.

--Jason

Daniel Vetter April 29, 2021, 12:22 p.m. UTC | #3
On Wed, Apr 28, 2021 at 01:22:14PM -0500, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 5:27 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Fri, Apr 23, 2021 at 05:31:21PM -0500, Jason Ekstrand wrote:
> > > As far as I can tell, the only real reason for this is to avoid taking a
> > > reference to the i915_gem_context.  The cost of those two atomics
> > > probably pales in comparison to the cost of the ioctl itself so we're
> > > really not buying ourselves anything here.  We're about to make context
> > > lookup a tiny bit more complicated, so let's get rid of the one hand-
> > > rolled case.
> >
> > I think the historical reason here is that i965_brw checks this before
> > every execbuf call, at least for arb_robustness contexts with the right
> > flag. But we've fixed that hotpath problem by adding non-recoverable
> > contexts. The kernel will tell you now automatically, for proper userspace
> > at least (I checked iris and anv, assuming I got it correct), and
> > reset_stats ioctl isn't a hot path worth micro-optimizing anymore.
> 
> I'm not sure I agree with that bit.  I don't think it was ever worth
> micro-optimizing like this.  What does it gain us?  Two fewer atomics?
>  It's not like the bad old days when it took a lock.
> 
> ANV still calls reset_stats before every set of execbuf calls (sometimes
> more than one), but I've never once seen it show up on a perf trace.
> execbuf, on the other hand, does show up, and pretty heavily sometimes.

Huh, I thought I checked, but I guess it got lost.

> > With that bit more context added to the commit message:
> 
> I'd like us to agree on what to add before adding something.

Yeah, in this case maybe just mention that with non-recoverable contexts
there's no need for userspace to check before every execbuf, so if this
ever shows up there's a proper fix that avoids the ioctl entirely, like
iris does.

Or something like that. I just want to make it clear that if this ever
does show up (once we've made execbuf faster with vm_bind and all that),
the correct fix isn't to make this ioctl faster, but to just not call
it :-)
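
Roughly, a sketch of what "just not calling it" looks like (hypothetical
helper name, and the -EIO convention is from memory, so double-check):

#include <errno.h>
#include <xf86drm.h>
#include <i915_drm.h>

/* On a non-recoverable context a hang shows up as an execbuf failure
 * (-EIO, as far as I remember), so there is no reset_stats poll on the
 * submit path at all. */
static int submit_and_check(int fd, struct drm_i915_gem_execbuffer2 *eb)
{
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb) == 0)
		return 0;

	if (errno == EIO)
		return -EIO;	/* context lost: report it and recreate the context */

	return -errno;
}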

Cheers, Daniel


Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ecb3bf5369857..941fbf78267b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -2090,16 +2090,13 @@  int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_reset_stats *args = data;
 	struct i915_gem_context *ctx;
-	int ret;
 
 	if (args->flags || args->pad)
 		return -EINVAL;
 
-	ret = -ENOENT;
-	rcu_read_lock();
-	ctx = __i915_gem_context_lookup_rcu(file->driver_priv, args->ctx_id);
+	ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id);
 	if (!ctx)
-		goto out;
+		return -ENOENT;
 
 	/*
 	 * We opt for unserialised reads here. This may result in tearing
@@ -2116,10 +2113,8 @@  int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
 	args->batch_active = atomic_read(&ctx->guilty_count);
 	args->batch_pending = atomic_read(&ctx->active_count);
 
-	ret = 0;
-out:
-	rcu_read_unlock();
-	return ret;
+	i915_gem_context_put(ctx);
+	return 0;
 }
 
 /* GEM context-engines iterator: for_each_gem_engine() */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0b44333eb7033..8571c5c1509a7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1840,19 +1840,13 @@  struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 
 struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int flags);
 
-static inline struct i915_gem_context *
-__i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id)
-{
-	return xa_load(&file_priv->context_xa, id);
-}
-
 static inline struct i915_gem_context *
 i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
 {
 	struct i915_gem_context *ctx;
 
 	rcu_read_lock();
-	ctx = __i915_gem_context_lookup_rcu(file_priv, id);
+	ctx = xa_load(&file_priv->context_xa, id);
 	if (ctx && !kref_get_unless_zero(&ctx->ref))
 		ctx = NULL;
 	rcu_read_unlock();