diff mbox series

[v5,04/21] gpu: host1x: Remove cancelled waiters immediately

Message ID 20210111130019.3515669-5-mperttunen@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Host1x/TegraDRM UAPI | expand

Commit Message

Mikko Perttunen Jan. 11, 2021, 1 p.m. UTC
Before this patch, cancelled waiters would only be cleaned up
once their threshold value was reached. Make host1x_intr_put_ref
process the cancellation immediately to fix this.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Add parameter to flush, i.e. wait for all pending waiters to
  complete before returning. The reason this is not always true
  is that the pending waiter might be the place that is calling
  the put_ref.
---
 drivers/gpu/host1x/intr.c   | 23 +++++++++++++++++------
 drivers/gpu/host1x/intr.h   |  4 +++-
 drivers/gpu/host1x/syncpt.c |  2 +-
 3 files changed, 21 insertions(+), 8 deletions(-)

Comments

Dmitry Osipenko Jan. 12, 2021, 10:07 p.m. UTC | #1
11.01.2021 16:00, Mikko Perttunen пишет:
> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
> +			 bool flush)
>  {
>  	struct host1x_waitlist *waiter = ref;
>  	struct host1x_syncpt *syncpt;
>  
> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
> -	       WLS_REMOVED)
> -		schedule();
> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>  
>  	syncpt = host->syncpt + id;
> -	(void)process_wait_list(host, syncpt,
> -				host1x_syncpt_load(host->syncpt + id));
> +
> +	spin_lock(&syncpt->intr.lock);
> +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
> +	    WLS_CANCELLED) {
> +		list_del(&waiter->list);
> +		kref_put(&waiter->refcount, waiter_release);
> +	}
> +	spin_unlock(&syncpt->intr.lock);
> +
> +	if (flush) {
> +		/* Wait until any concurrently executing handler has finished. */
> +		while (atomic_read(&waiter->state) != WLS_HANDLED)
> +			cpu_relax();
> +	}

A busy-loop shouldn't be used in kernel unless there is a very good
reason. The wait_event() should be used instead.

But please don't hurry to update this patch, we may need or want to
retire the host1x-waiter and then these all waiter-related patches won't
be needed.
Mikko Perttunen Jan. 12, 2021, 10:20 p.m. UTC | #2
On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
> 11.01.2021 16:00, Mikko Perttunen пишет:
>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
>> +			 bool flush)
>>   {
>>   	struct host1x_waitlist *waiter = ref;
>>   	struct host1x_syncpt *syncpt;
>>   
>> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
>> -	       WLS_REMOVED)
>> -		schedule();
>> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>   
>>   	syncpt = host->syncpt + id;
>> -	(void)process_wait_list(host, syncpt,
>> -				host1x_syncpt_load(host->syncpt + id));
>> +
>> +	spin_lock(&syncpt->intr.lock);
>> +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>> +	    WLS_CANCELLED) {
>> +		list_del(&waiter->list);
>> +		kref_put(&waiter->refcount, waiter_release);
>> +	}
>> +	spin_unlock(&syncpt->intr.lock);
>> +
>> +	if (flush) {
>> +		/* Wait until any concurrently executing handler has finished. */
>> +		while (atomic_read(&waiter->state) != WLS_HANDLED)
>> +			cpu_relax();
>> +	}
> 
> A busy-loop shouldn't be used in kernel unless there is a very good
> reason. The wait_event() should be used instead.
> 
> But please don't hurry to update this patch, we may need or want to
> retire the host1x-waiter and then these all waiter-related patches won't
> be needed.
> 

Yes, we should improve the intr code to remove all this complexity. But 
let's merge this first to get a functional baseline and do larger design 
changes in follow-up patches.

It is cumbersome for me to develop further series (of which I have 
several under work and planning) with this baseline series not being 
merged. The uncertainty on the approval of the UAPI design also makes it 
hard to know whether it makes sense for me to work on top of this code 
or not, so I'd like to focus on what's needed to get this merged instead 
of further redesign of the driver at this time.

Mikko
Dmitry Osipenko Jan. 13, 2021, 4:29 p.m. UTC | #3
13.01.2021 01:20, Mikko Perttunen пишет:
> On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
>> 11.01.2021 16:00, Mikko Perttunen пишет:
>>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>> *ref)
>>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>> *ref,
>>> +             bool flush)
>>>   {
>>>       struct host1x_waitlist *waiter = ref;
>>>       struct host1x_syncpt *syncpt;
>>>   -    while (atomic_cmpxchg(&waiter->state, WLS_PENDING,
>>> WLS_CANCELLED) ==
>>> -           WLS_REMOVED)
>>> -        schedule();
>>> +    atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>>         syncpt = host->syncpt + id;
>>> -    (void)process_wait_list(host, syncpt,
>>> -                host1x_syncpt_load(host->syncpt + id));
>>> +
>>> +    spin_lock(&syncpt->intr.lock);
>>> +    if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>>> +        WLS_CANCELLED) {
>>> +        list_del(&waiter->list);
>>> +        kref_put(&waiter->refcount, waiter_release);
>>> +    }
>>> +    spin_unlock(&syncpt->intr.lock);
>>> +
>>> +    if (flush) {
>>> +        /* Wait until any concurrently executing handler has
>>> finished. */
>>> +        while (atomic_read(&waiter->state) != WLS_HANDLED)
>>> +            cpu_relax();
>>> +    }
>>
>> A busy-loop shouldn't be used in kernel unless there is a very good
>> reason. The wait_event() should be used instead.
>>
>> But please don't hurry to update this patch, we may need or want to
>> retire the host1x-waiter and then these all waiter-related patches won't
>> be needed.
>>
> 
> Yes, we should improve the intr code to remove all this complexity. But
> let's merge this first to get a functional baseline and do larger design
> changes in follow-up patches.
> 
> It is cumbersome for me to develop further series (of which I have
> several under work and planning) with this baseline series not being
> merged. The uncertainty on the approval of the UAPI design also makes it
> hard to know whether it makes sense for me to work on top of this code
> or not, so I'd like to focus on what's needed to get this merged instead
> of further redesign of the driver at this time.

Is this patch (and some others) necessary for the new UAPI? If not,
could we please narrow down the patches to the minimum that is needed
for trying out the new UAPI?
Mikko Perttunen Jan. 13, 2021, 6:16 p.m. UTC | #4
On 1/13/21 6:29 PM, Dmitry Osipenko wrote:
> 13.01.2021 01:20, Mikko Perttunen пишет:
>> On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
>>> 11.01.2021 16:00, Mikko Perttunen пишет:
>>>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>>> *ref)
>>>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>>> *ref,
>>>> +             bool flush)
>>>>    {
>>>>        struct host1x_waitlist *waiter = ref;
>>>>        struct host1x_syncpt *syncpt;
>>>>    -    while (atomic_cmpxchg(&waiter->state, WLS_PENDING,
>>>> WLS_CANCELLED) ==
>>>> -           WLS_REMOVED)
>>>> -        schedule();
>>>> +    atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>>>          syncpt = host->syncpt + id;
>>>> -    (void)process_wait_list(host, syncpt,
>>>> -                host1x_syncpt_load(host->syncpt + id));
>>>> +
>>>> +    spin_lock(&syncpt->intr.lock);
>>>> +    if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>>>> +        WLS_CANCELLED) {
>>>> +        list_del(&waiter->list);
>>>> +        kref_put(&waiter->refcount, waiter_release);
>>>> +    }
>>>> +    spin_unlock(&syncpt->intr.lock);
>>>> +
>>>> +    if (flush) {
>>>> +        /* Wait until any concurrently executing handler has
>>>> finished. */
>>>> +        while (atomic_read(&waiter->state) != WLS_HANDLED)
>>>> +            cpu_relax();
>>>> +    }
>>>
>>> A busy-loop shouldn't be used in kernel unless there is a very good
>>> reason. The wait_event() should be used instead.
>>>
>>> But please don't hurry to update this patch, we may need or want to
>>> retire the host1x-waiter and then these all waiter-related patches won't
>>> be needed.
>>>
>>
>> Yes, we should improve the intr code to remove all this complexity. But
>> let's merge this first to get a functional baseline and do larger design
>> changes in follow-up patches.
>>
>> It is cumbersome for me to develop further series (of which I have
>> several under work and planning) with this baseline series not being
>> merged. The uncertainty on the approval of the UAPI design also makes it
>> hard to know whether it makes sense for me to work on top of this code
>> or not, so I'd like to focus on what's needed to get this merged instead
>> of further redesign of the driver at this time.
> 
> Is this patch (and some others) necessary for the new UAPI? If not,
> could we please narrow down the patches to the minimum that is needed
> for trying out the new UAPI?
> 

Yes, it is necessary. I tried to revert it and half the tests in the 
test suite start failing.

I think patches 01, 03, 14 and 17 are not strictly required, though 
reverting 03 will cause one of the syncpoint tests to fail.

Mikko
Thierry Reding March 23, 2021, 10:23 a.m. UTC | #5
On Wed, Jan 13, 2021 at 12:20:38AM +0200, Mikko Perttunen wrote:
> On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
> > 11.01.2021 16:00, Mikko Perttunen пишет:
> > > -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
> > > +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
> > > +			 bool flush)
> > >   {
> > >   	struct host1x_waitlist *waiter = ref;
> > >   	struct host1x_syncpt *syncpt;
> > > -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
> > > -	       WLS_REMOVED)
> > > -		schedule();
> > > +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
> > >   	syncpt = host->syncpt + id;
> > > -	(void)process_wait_list(host, syncpt,
> > > -				host1x_syncpt_load(host->syncpt + id));
> > > +
> > > +	spin_lock(&syncpt->intr.lock);
> > > +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
> > > +	    WLS_CANCELLED) {
> > > +		list_del(&waiter->list);
> > > +		kref_put(&waiter->refcount, waiter_release);
> > > +	}
> > > +	spin_unlock(&syncpt->intr.lock);
> > > +
> > > +	if (flush) {
> > > +		/* Wait until any concurrently executing handler has finished. */
> > > +		while (atomic_read(&waiter->state) != WLS_HANDLED)
> > > +			cpu_relax();
> > > +	}
> > 
> > A busy-loop shouldn't be used in kernel unless there is a very good
> > reason. The wait_event() should be used instead.
> > 
> > But please don't hurry to update this patch, we may need or want to
> > retire the host1x-waiter and then these all waiter-related patches won't
> > be needed.
> > 
> 
> Yes, we should improve the intr code to remove all this complexity. But
> let's merge this first to get a functional baseline and do larger design
> changes in follow-up patches.

I agree, there's no reason to hold back any interim improvements. But I
do agree with Dmitry's argument about the busy loop. Prior to this, the
code used schedule() to let the CPU run other code while waiting for the
waiter to change state. Is there any reason why we can't keep schedule()
here?

Thierry
diff mbox series

Patch

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 9245add23b5d..70e1096a4fe9 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -242,18 +242,29 @@  int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
 	return 0;
 }
 
-void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
+void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
+			 bool flush)
 {
 	struct host1x_waitlist *waiter = ref;
 	struct host1x_syncpt *syncpt;
 
-	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
-	       WLS_REMOVED)
-		schedule();
+	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
 
 	syncpt = host->syncpt + id;
-	(void)process_wait_list(host, syncpt,
-				host1x_syncpt_load(host->syncpt + id));
+
+	spin_lock(&syncpt->intr.lock);
+	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
+	    WLS_CANCELLED) {
+		list_del(&waiter->list);
+		kref_put(&waiter->refcount, waiter_release);
+	}
+	spin_unlock(&syncpt->intr.lock);
+
+	if (flush) {
+		/* Wait until any concurrently executing handler has finished. */
+		while (atomic_read(&waiter->state) != WLS_HANDLED)
+			cpu_relax();
+	}
 
 	kref_put(&waiter->refcount, waiter_release);
 }
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index aac38194398f..6ea55e615e3a 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -74,8 +74,10 @@  int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
  * Unreference an action submitted to host1x_intr_add_action().
  * You must call this if you passed non-NULL as ref.
  * @ref the ref returned from host1x_intr_add_action()
+ * @flush wait until any pending handlers have completed before returning.
  */
-void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref);
+void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
+			 bool flush);
 
 /* Initialize host1x sync point interrupt */
 int host1x_intr_init(struct host1x *host, unsigned int irq_sync);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5982fdf64e1c..e48b4595cf53 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -293,7 +293,7 @@  int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		}
 	}
 
-	host1x_intr_put_ref(sp->host, sp->id, ref);
+	host1x_intr_put_ref(sp->host, sp->id, ref, true);
 
 done:
 	return err;