[2/2] drm/i915: Use atomic waits for short non-atomic ones

Message ID	1467114710-29989-2-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> To: Intel-gfx@lists.freedesktop.org Date: Tue, 28 Jun 2016 12:51:50 +0100 Message-Id: <1467114710-29989-2-git-send-email-tvrtko.ursulin@linux.intel.com> In-Reply-To: <1467114710-29989-1-git-send-email-tvrtko.ursulin@linux.intel.com> References: <1467114710-29989-1-git-send-email-tvrtko.ursulin@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Use atomic waits for short non-atomic ones Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Tvrtko Ursulin June 28, 2016, 11:51 a.m. UTC

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

usleep_range is not recommended for waits shorten than 10us.

Make the wait_for_us use the atomic variant for such waits.

To do so we need to disable the !in_atomic warning for such uses
and also disable preemption since the macro is written in a way
to only be safe to be used in atomic context (local_clock() and
no second COND check after the timeout).

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

Imre Deak June 28, 2016, 12:19 p.m. UTC | #1

On ti, 2016-06-28 at 12:51 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> usleep_range is not recommended for waits shorten than 10us.
> 
> Make the wait_for_us use the atomic variant for such waits.
> 
> To do so we need to disable the !in_atomic warning for such uses
> and also disable preemption since the macro is written in a way
> to only be safe to be used in atomic context (local_clock() and
> no second COND check after the timeout).
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 3156d8df7921..e21bf6e6f119 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -69,20 +69,21 @@
>  })
>  
>  #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
> -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
>  
>  /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
>  #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
>  #else
> -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
>  #endif
>  
> -#define _wait_for_atomic(COND, US) ({ \
> +#define _wait_for_atomic(COND, US, ATOMIC) ({ \
>  	unsigned long end__; \
>  	int ret__ = 0; \
> -	_WAIT_FOR_ATOMIC_CHECK; \
> -	BUILD_BUG_ON((US) > 50000); \
> +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
> +	if (!(ATOMIC)) \
> +		preempt_disable(); \

Disabling preemption for this purpose (scheduling a timeout) could be
frowned upon, although for 10us may be not an issue. Another
possibility would be to use cpu_clock() instead which would have some
overhead in case of scheduling away from the initial CPU, but we'd only
incur it for the non-atomic <10us case, so would be negligible imo.
You'd also have to re-check the condition with that solution.

Also could you explain how can we ignore hard IRQs as hinted by the
comment in _wait_for_atomic()?

>  	end__ = (local_clock() >> 10) + (US) + 1; \
>  	while (!(COND)) { \
>  		if (time_after((unsigned long)(local_clock() >> 10), end__)) { \
> @@ -97,11 +98,23 @@
>  		} \
>  		cpu_relax(); \
>  	} \
> +	if (!(ATOMIC)) \
> +		preempt_enable(); \
>  	ret__; \
>  })
>  
> -#define wait_for_atomic(COND, MS)	_wait_for_atomic((COND), (MS) * 1000)
> -#define wait_for_atomic_us(COND, US)	_wait_for_atomic((COND), (US))
> +#define wait_for_us(COND, US) \
> +({ \
> +	int ret__; \
> +	if ((US) > 10) \
> +		ret__ = _wait_for((COND), (US), 10); \
> +	else \
> +		ret__ = _wait_for_atomic((COND), (US), 0); \
> +	ret__; \
> +})
> +
> +#define wait_for_atomic(COND, MS)	_wait_for_atomic((COND), (MS) * 1000, 1)
> +#define wait_for_atomic_us(COND, US)	_wait_for_atomic((COND), (US), 1)
>  
>  #define KHz(x) (1000 * (x))
>  #define MHz(x) KHz(1000 * (x))

Chris Wilson June 28, 2016, 12:19 p.m. UTC | #2

On Tue, Jun 28, 2016 at 12:51:50PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> usleep_range is not recommended for waits shorten than 10us.
> 
> Make the wait_for_us use the atomic variant for such waits.
> 
> To do so we need to disable the !in_atomic warning for such uses
> and also disable preemption since the macro is written in a way
> to only be safe to be used in atomic context (local_clock() and
> no second COND check after the timeout).
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 3156d8df7921..e21bf6e6f119 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -69,20 +69,21 @@
>  })
>  
>  #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
> -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
>  
>  /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
>  #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
>  #else
> -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
>  #endif
>  
> -#define _wait_for_atomic(COND, US) ({ \
> +#define _wait_for_atomic(COND, US, ATOMIC) ({ \

__busywait_for
__busyspin_for

(Thinking __usleep_until and maybe __busyspin_until)

>  	unsigned long end__; \
>  	int ret__ = 0; \
> -	_WAIT_FOR_ATOMIC_CHECK; \
> -	BUILD_BUG_ON((US) > 50000); \
> +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
> +	if (!(ATOMIC)) \
> +		preempt_disable(); \

>  	end__ = (local_clock() >> 10) + (US) + 1; \
>  	while (!(COND)) { \
>  		if (time_after((unsigned long)(local_clock() >> 10), end__)) { \
> @@ -97,11 +98,23 @@
>  		} \

if (!ATOMIC && need_resched())
	break;

?
-Chris

Tvrtko Ursulin June 28, 2016, 1:29 p.m. UTC | #3

On 28/06/16 13:19, Imre Deak wrote:
> On ti, 2016-06-28 at 12:51 +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> usleep_range is not recommended for waits shorten than 10us.
>>
>> Make the wait_for_us use the atomic variant for such waits.
>>
>> To do so we need to disable the !in_atomic warning for such uses
>> and also disable preemption since the macro is written in a way
>> to only be safe to be used in atomic context (local_clock() and
>> no second COND check after the timeout).
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Imre Deak <imre.deak@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>> index 3156d8df7921..e21bf6e6f119 100644
>> --- a/drivers/gpu/drm/i915/intel_drv.h
>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>> @@ -69,20 +69,21 @@
>>   })
>>
>>   #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
>> -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
>>
>>   /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
>>   #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
>> -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
>> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
>>   #else
>> -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
>> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
>>   #endif
>>
>> -#define _wait_for_atomic(COND, US) ({ \
>> +#define _wait_for_atomic(COND, US, ATOMIC) ({ \
>>   	unsigned long end__; \
>>   	int ret__ = 0; \
>> -	_WAIT_FOR_ATOMIC_CHECK; \
>> -	BUILD_BUG_ON((US) > 50000); \
>> +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
>> +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
>> +	if (!(ATOMIC)) \
>> +		preempt_disable(); \
>
> Disabling preemption for this purpose (scheduling a timeout) could be
> frowned upon, although for 10us may be not an issue. Another

Possibly, but I don't see how to otherwise do it.

And about the number itself - I chose 10us just because usleep_range is 
not recommended for <10us due setup overhead.

> possibility would be to use cpu_clock() instead which would have some
> overhead in case of scheduling away from the initial CPU, but we'd only
> incur it for the non-atomic <10us case, so would be negligible imo.
> You'd also have to re-check the condition with that solution.

How would you implement it with cpu_clock? What would you do when 
re-scheduled?

> Also could you explain how can we ignore hard IRQs as hinted by the
> comment in _wait_for_atomic()?

Hm, in retrospect it does not look safe. Upside that after your fixes 
from today it will be, since all remaining callers are with interrupts 
disabled. And downside that the patch from this thread is not safe then 
and would need the condition put back in. Possibly only in the !ATOMIC 
case but that might be too fragile for the future.

Regards,

Tvrtko

Imre Deak June 28, 2016, 1:53 p.m. UTC | #4

On ti, 2016-06-28 at 14:29 +0100, Tvrtko Ursulin wrote:
> On 28/06/16 13:19, Imre Deak wrote:
> > On ti, 2016-06-28 at 12:51 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > usleep_range is not recommended for waits shorten than 10us.
> > > 
> > > Make the wait_for_us use the atomic variant for such waits.
> > > 
> > > To do so we need to disable the !in_atomic warning for such uses
> > > and also disable preemption since the macro is written in a way
> > > to only be safe to be used in atomic context (local_clock() and
> > > no second COND check after the timeout).
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Imre Deak <imre.deak@intel.com>
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
> > >   1 file changed, 21 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > index 3156d8df7921..e21bf6e6f119 100644
> > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > @@ -69,20 +69,21 @@
> > >   })
> > > 
> > >   #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
> > > -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
> > > 
> > >   /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
> > >   #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> > > -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
> > > +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
> > >   #else
> > > -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
> > > +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
> > >   #endif
> > > 
> > > -#define _wait_for_atomic(COND, US) ({ \
> > > +#define _wait_for_atomic(COND, US, ATOMIC) ({ \
> > >   	unsigned long end__; \
> > >   	int ret__ = 0; \
> > > -	_WAIT_FOR_ATOMIC_CHECK; \
> > > -	BUILD_BUG_ON((US) > 50000); \
> > > +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> > > +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
> > > +	if (!(ATOMIC)) \
> > > +		preempt_disable(); \
> > 
> > Disabling preemption for this purpose (scheduling a timeout) could be
> > frowned upon, although for 10us may be not an issue. Another
> 
> Possibly, but I don't see how to otherwise do it.
> 
> And about the number itself - I chose 10us just because usleep_range is 
> not recommended for <10us due setup overhead.
> 
> > possibility would be to use cpu_clock() instead which would have some
> > overhead in case of scheduling away from the initial CPU, but we'd only
> > incur it for the non-atomic <10us case, so would be negligible imo.
> > You'd also have to re-check the condition with that solution.
> 
> How would you implement it with cpu_clock? What would you do when 
> re-scheduled?

By calculating the expiry in the beginning with cpu_clock()
using raw_smp_processor_id() and then calling cpu_clock() in
time_after() with the same CPU id. cpu_clock() would then internally
handle the scheduling away scenario.

> > Also could you explain how can we ignore hard IRQs as hinted by the
> > comment in _wait_for_atomic()?
> 
> Hm, in retrospect it does not look safe. Upside that after your fixes 
> from today it will be, since all remaining callers are with interrupts 
> disabled.

Well, except for the GuC path, but that's for a 10ms timeout, so
probably doesn't matter (or else we have a bigger problem).

> And downside that the patch from this thread is not safe then 
> and would need the condition put back in. Possibly only in the !ATOMIC 
> case but that might be too fragile for the future.

I'd say we'd need the extra check at least whenever hard IRQs are not
disabled. Even then there could be NMIs or some other background stuff
(ME) that could be a problem. OTOH we'd incur the overhead from the
extra check only in the exceptional timeout case, so I think doing it
in all cases wouldn't be a big problem.

--Imre

Chris Wilson June 28, 2016, 1:55 p.m. UTC | #5

On Tue, Jun 28, 2016 at 02:29:33PM +0100, Tvrtko Ursulin wrote:
> How would you implement it with cpu_clock? What would you do when
> re-scheduled?

unsigned long base;
int cpu;
int ret;

preempt_disable();
cpu = smp_processor_id();
base = local_clock() >> 10;
for (;;) {
	u64 now = local_clock() >> 10;
	preempt_enable();

	if (COND) {
		ret = 0;
		break;
	}

	if (now - base >= timeout) {
		ret = -ETIMEOUT;
		break;
	}
	
	cpu_relax();

	preempt_disable();
	if (unlikely(cpu != smp_processor_id()) {
		timeout -= now - base;
		cpu = smp_processor_id();
		base = local_clock() >> 10;
	}
}
ret;

Borrowed from udelay()
-Chris

Chris Wilson June 28, 2016, 2:14 p.m. UTC | #6

On Tue, Jun 28, 2016 at 02:55:28PM +0100, Chris Wilson wrote:
> On Tue, Jun 28, 2016 at 02:29:33PM +0100, Tvrtko Ursulin wrote:
> > How would you implement it with cpu_clock? What would you do when
> > re-scheduled?
> 
> unsigned long base;
> int cpu;
> int ret;
> 
> preempt_disable();
> cpu = smp_processor_id();
> base = local_clock() >> 10;
> for (;;) {
> 	u64 now = local_clock() >> 10;
> 	preempt_enable();
> 
> 	if (COND) {
> 		ret = 0;
> 		break;
> 	}
> 
> 	if (now - base >= timeout) {
> 		ret = -ETIMEOUT;
> 		break;
> 	}
> 	
> 	cpu_relax();
> 
> 	preempt_disable();
> 	if (unlikely(cpu != smp_processor_id()) {
> 		timeout -= now - base;

For this, we should scale everything to ns (u64).
-Chris

Tvrtko Ursulin June 28, 2016, 2:38 p.m. UTC | #7

On 28/06/16 14:53, Imre Deak wrote:
> On ti, 2016-06-28 at 14:29 +0100, Tvrtko Ursulin wrote:
>> On 28/06/16 13:19, Imre Deak wrote:
>>> On ti, 2016-06-28 at 12:51 +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> usleep_range is not recommended for waits shorten than 10us.
>>>>
>>>> Make the wait_for_us use the atomic variant for such waits.
>>>>
>>>> To do so we need to disable the !in_atomic warning for such uses
>>>> and also disable preemption since the macro is written in a way
>>>> to only be safe to be used in atomic context (local_clock() and
>>>> no second COND check after the timeout).
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Imre Deak <imre.deak@intel.com>
>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
>>>>    1 file changed, 21 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>>>> index 3156d8df7921..e21bf6e6f119 100644
>>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>>> @@ -69,20 +69,21 @@
>>>>    })
>>>>
>>>>    #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
>>>> -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
>>>>
>>>>    /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
>>>>    #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
>>>> -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
>>>> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
>>>>    #else
>>>> -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
>>>> +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
>>>>    #endif
>>>>
>>>> -#define _wait_for_atomic(COND, US) ({ \
>>>> +#define _wait_for_atomic(COND, US, ATOMIC) ({ \
>>>>    	unsigned long end__; \
>>>>    	int ret__ = 0; \
>>>> -	_WAIT_FOR_ATOMIC_CHECK; \
>>>> -	BUILD_BUG_ON((US) > 50000); \
>>>> +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
>>>> +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
>>>> +	if (!(ATOMIC)) \
>>>> +		preempt_disable(); \
>>>
>>> Disabling preemption for this purpose (scheduling a timeout) could be
>>> frowned upon, although for 10us may be not an issue. Another
>>
>> Possibly, but I don't see how to otherwise do it.
>>
>> And about the number itself - I chose 10us just because usleep_range is
>> not recommended for <10us due setup overhead.
>>
>>> possibility would be to use cpu_clock() instead which would have some
>>> overhead in case of scheduling away from the initial CPU, but we'd only
>>> incur it for the non-atomic <10us case, so would be negligible imo.
>>> You'd also have to re-check the condition with that solution.
>>
>> How would you implement it with cpu_clock? What would you do when
>> re-scheduled?
>
> By calculating the expiry in the beginning with cpu_clock()
> using raw_smp_processor_id() and then calling cpu_clock() in
> time_after() with the same CPU id. cpu_clock() would then internally
> handle the scheduling away scenario.

Right, but that is also not ideal since if the two cpu_clocks differ the 
running time domain is not identical to the timeout one. Probably would 
not matter but feels hacky.

>>> Also could you explain how can we ignore hard IRQs as hinted by the
>>> comment in _wait_for_atomic()?
>>
>> Hm, in retrospect it does not look safe. Upside that after your fixes
>> from today it will be, since all remaining callers are with interrupts
>> disabled.
>
> Well, except for the GuC path, but that's for a 10ms timeout, so
> probably doesn't matter (or else we have a bigger problem).

I've just sent a patch for that.

>> And downside that the patch from this thread is not safe then
>> and would need the condition put back in. Possibly only in the !ATOMIC
>> case but that might be too fragile for the future.
>
> I'd say we'd need the extra check at least whenever hard IRQs are not
> disabled. Even then there could be NMIs or some other background stuff
> (ME) that could be a problem. OTOH we'd incur the overhead from the
> extra check only in the exceptional timeout case, so I think doing it
> in all cases wouldn't be a big problem.

Yeah I'll put it in.

Regards,

Tvrtko

Tvrtko Ursulin June 28, 2016, 2:40 p.m. UTC | #8

On 28/06/16 15:14, Chris Wilson wrote:
> On Tue, Jun 28, 2016 at 02:55:28PM +0100, Chris Wilson wrote:
>> On Tue, Jun 28, 2016 at 02:29:33PM +0100, Tvrtko Ursulin wrote:
>>> How would you implement it with cpu_clock? What would you do when
>>> re-scheduled?
>>
>> unsigned long base;
>> int cpu;
>> int ret;
>>
>> preempt_disable();
>> cpu = smp_processor_id();
>> base = local_clock() >> 10;
>> for (;;) {
>> 	u64 now = local_clock() >> 10;
>> 	preempt_enable();
>>
>> 	if (COND) {
>> 		ret = 0;
>> 		break;
>> 	}
>>
>> 	if (now - base >= timeout) {
>> 		ret = -ETIMEOUT;
>> 		break;
>> 	}
>> 	
>> 	cpu_relax();
>>
>> 	preempt_disable();
>> 	if (unlikely(cpu != smp_processor_id()) {
>> 		timeout -= now - base;
>
> For this, we should scale everything to ns (u64).

In other words not scale. Is this type of loop more preferable to you 
guys vs how it looked in this original patch?

Only difference is the preempt off section is shorter here, but I don't 
think that is interesting for the atomic waits case.


Regards,

Tvrtko

Chris Wilson June 28, 2016, 3:39 p.m. UTC | #9

On Tue, Jun 28, 2016 at 03:40:24PM +0100, Tvrtko Ursulin wrote:
> 
> On 28/06/16 15:14, Chris Wilson wrote:
> >On Tue, Jun 28, 2016 at 02:55:28PM +0100, Chris Wilson wrote:
> >>On Tue, Jun 28, 2016 at 02:29:33PM +0100, Tvrtko Ursulin wrote:
> >>>How would you implement it with cpu_clock? What would you do when
> >>>re-scheduled?
> >>
> >>unsigned long base;
> >>int cpu;
> >>int ret;
> >>
> >>preempt_disable();
> >>cpu = smp_processor_id();
> >>base = local_clock() >> 10;
> >>for (;;) {
> >>	u64 now = local_clock() >> 10;
> >>	preempt_enable();
> >>
> >>	if (COND) {
> >>		ret = 0;
> >>		break;
> >>	}
> >>
> >>	if (now - base >= timeout) {
> >>		ret = -ETIMEOUT;
> >>		break;
> >>	}
> >>	
> >>	cpu_relax();
> >>
> >>	preempt_disable();
> >>	if (unlikely(cpu != smp_processor_id()) {
> >>		timeout -= now - base;
> >
> >For this, we should scale everything to ns (u64).
> 
> In other words not scale. Is this type of loop more preferable to
> you guys vs how it looked in this original patch?
> 
> Only difference is the preempt off section is shorter here, but I
> don't think that is interesting for the atomic waits case.

Right, for the current (correct) atomic users we simply don't care. It's
the sleepers where adding a 10us unpreemptible section is likely to be
frowned upon. I wonder if we should even go as far as adding a
cond_resched().
-Chris

Imre Deak June 28, 2016, 5:45 p.m. UTC | #10

On Tue, 2016-06-28 at 15:38 +0100, Tvrtko Ursulin wrote:
> On 28/06/16 14:53, Imre Deak wrote:
> > On ti, 2016-06-28 at 14:29 +0100, Tvrtko Ursulin wrote:
> > > On 28/06/16 13:19, Imre Deak wrote:
> > > > On ti, 2016-06-28 at 12:51 +0100, Tvrtko Ursulin wrote:
> > > > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > > 
> > > > > usleep_range is not recommended for waits shorten than 10us.
> > > > > 
> > > > > Make the wait_for_us use the atomic variant for such waits.
> > > > > 
> > > > > To do so we need to disable the !in_atomic warning for such uses
> > > > > and also disable preemption since the macro is written in a way
> > > > > to only be safe to be used in atomic context (local_clock() and
> > > > > no second COND check after the timeout).
> > > > > 
> > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
> > > > >    1 file changed, 21 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > > > index 3156d8df7921..e21bf6e6f119 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > > > @@ -69,20 +69,21 @@
> > > > >    })
> > > > > 
> > > > >    #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
> > > > > -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
> > > > > 
> > > > >    /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
> > > > >    #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> > > > > -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
> > > > > +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
> > > > >    #else
> > > > > -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
> > > > > +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
> > > > >    #endif
> > > > > 
> > > > > -#define _wait_for_atomic(COND, US) ({ \
> > > > > +#define _wait_for_atomic(COND, US, ATOMIC) ({ \
> > > > >    	unsigned long end__; \
> > > > >    	int ret__ = 0; \
> > > > > -	_WAIT_FOR_ATOMIC_CHECK; \
> > > > > -	BUILD_BUG_ON((US) > 50000); \
> > > > > +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> > > > > +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
> > > > > +	if (!(ATOMIC)) \
> > > > > +		preempt_disable(); \
> > > > 
> > > > Disabling preemption for this purpose (scheduling a timeout) could be
> > > > frowned upon, although for 10us may be not an issue. Another
> > > 
> > > Possibly, but I don't see how to otherwise do it.
> > > 
> > > And about the number itself - I chose 10us just because usleep_range is
> > > not recommended for <10us due setup overhead.
> > > 
> > > > possibility would be to use cpu_clock() instead which would have some
> > > > overhead in case of scheduling away from the initial CPU, but we'd only
> > > > incur it for the non-atomic <10us case, so would be negligible imo.
> > > > You'd also have to re-check the condition with that solution.
> > > 
> > > How would you implement it with cpu_clock? What would you do when
> > > re-scheduled?
> > 
> > By calculating the expiry in the beginning with cpu_clock()
> > using raw_smp_processor_id() and then calling cpu_clock() in
> > time_after() with the same CPU id. cpu_clock() would then internally
> > handle the scheduling away scenario.
> 
> Right, but that is also not ideal since if the two cpu_clocks differ the 
> running time domain is not identical to the timeout one.

Hm, this is what I meant:

int cpu = raw_smp_processor_id();
end = cpu_clock(cpu) + timeout;
while (!time_after(cpu_clock(cpu), end))
   check condition...

So cpu_clock() would be always called with the same CPU id and hence it
would return timestamps from the same time domain. In case of
scheduling away at any point cpu_clock() would internally detect this
and return the timestamp from the original CPU time domain.

This also has the advantage that in case of a stable, synchronized TSC
- which should be the generic case - cpu_clock() simply translates to
reading the TSC, without disabling pre-emption.

I don't see any problem with Chris' approach either though.

> Probably would not matter but feels hacky.
> 
> > > > Also could you explain how can we ignore hard IRQs as hinted by the
> > > > comment in _wait_for_atomic()?
> > > 
> > > Hm, in retrospect it does not look safe. Upside that after your fixes
> > > from today it will be, since all remaining callers are with interrupts
> > > disabled.
> > 
> > Well, except for the GuC path, but that's for a 10ms timeout, so
> > probably doesn't matter (or else we have a bigger problem).
> 
> I've just sent a patch for that.
> 
> > > And downside that the patch from this thread is not safe then
> > > and would need the condition put back in. Possibly only in the !ATOMIC
> > > case but that might be too fragile for the future.
> > 
> > I'd say we'd need the extra check at least whenever hard IRQs are not
> > disabled. Even then there could be NMIs or some other background stuff
> > (ME) that could be a problem. OTOH we'd incur the overhead from the
> > extra check only in the exceptional timeout case, so I think doing it
> > in all cases wouldn't be a big problem.
> 
> Yeah I'll put it in.
> 
> Regards,
> 
> Tvrtko
> 
> 
>

[2/2] drm/i915: Use atomic waits for short non-atomic ones

Commit Message

Comments

Patch