diff mbox

drm/i915: Use ktime on wait_for

Message ID 20180420095426.1105-1-mika.kuoppala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mika Kuoppala April 20, 2018, 9:54 a.m. UTC
We use jiffies to determine when wait expires. However
Imre did find out that jiffies can and will do a >1
increments on certain situations [1]. When this happens
in a wait_for loop, we return timeout errorneously
much earlier than what the real wallclock would say.

We can't afford our waits to timeout prematurely.
Discard jiffies and change to ktime to detect timeouts.

Reported-by: Imre Deak <imre.deak@intel.com>
References: https://lkml.org/lkml/2018/4/18/798 [1]
Cc: Imre Deak <imre.deak@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_drv.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

sagar.a.kamble@intel.com April 20, 2018, 10:23 a.m. UTC | #1
On 4/20/2018 3:24 PM, Mika Kuoppala wrote:
> We use jiffies to determine when wait expires. However
> Imre did find out that jiffies can and will do a >1
> increments on certain situations [1]. When this happens
> in a wait_for loop, we return timeout errorneously
> much earlier than what the real wallclock would say.
>
> We can't afford our waits to timeout prematurely.
> Discard jiffies and change to ktime to detect timeouts.
>
> Reported-by: Imre Deak <imre.deak@intel.com>
> References: https://lkml.org/lkml/2018/4/18/798 [1]
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_drv.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 8b20824e806e..ac7565220aa3 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -49,12 +49,12 @@
>    * check the condition before the timeout.
>    */
>   #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
> -	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;	\
> +	const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
Is ktime_get_raw() monotonic? Thomas suggested ktime_get()
>   	long wait__ = (Wmin); /* recommended min for usleep is 10 us */	\
>   	int ret__;							\
>   	might_sleep();							\
>   	for (;;) {							\
> -		bool expired__ = time_after(jiffies, timeout__);	\
> +		const bool expired__ = ktime_after(ktime_get_raw(), end__); \
>   		OP;							\
>   		if (COND) {						\
>   			ret__ = 0;					\
Chris Wilson April 20, 2018, 10:27 a.m. UTC | #2
Quoting Mika Kuoppala (2018-04-20 10:54:26)
> We use jiffies to determine when wait expires. However
> Imre did find out that jiffies can and will do a >1
> increments on certain situations [1]. When this happens
> in a wait_for loop, we return timeout errorneously
> much earlier than what the real wallclock would say.
> 
> We can't afford our waits to timeout prematurely.
> Discard jiffies and change to ktime to detect timeouts.
> 
> Reported-by: Imre Deak <imre.deak@intel.com>
> References: https://lkml.org/lkml/2018/4/18/798 [1]
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

The atomic variant is already jiffie-less (since it has to work in
irq-off contexts). Maybe a bit tricky to suggest that the callers know
if jiffie incremens are accurate or not.

What is not clear from the link is whether our wait_for() is running
across suspend, or whether it is just jiffie recalibration some time
during resume that breaks.

>  drivers/gpu/drm/i915/intel_drv.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 8b20824e806e..ac7565220aa3 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -49,12 +49,12 @@
>   * check the condition before the timeout.
>   */
>  #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
> -       unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> +       const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
>         long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \
>         int ret__;                                                      \
>         might_sleep();                                                  \
>         for (;;) {                                                      \
> -               bool expired__ = time_after(jiffies, timeout__);        \
> +               const bool expired__ = ktime_after(ktime_get_raw(), end__); \
>                 OP;                                                     \
>                 if (COND) {                                             \
>                         ret__ = 0;                                      \

Nevertheless, the patch is ok and I don't have too much objection to
adding another tsc (at best, hpet at worst!) read around every mmio+sleep,
plus expanding the code for the function calls. Out of curiosity what is
the size delta? How many wait_for() do we have left that we need to
convert to a function call rather than macro expansion?

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Cc stable?
-Chris
Chris Wilson April 20, 2018, 10:39 a.m. UTC | #3
Quoting Sagar Arun Kamble (2018-04-20 11:23:50)
> 
> 
> On 4/20/2018 3:24 PM, Mika Kuoppala wrote:
> > We use jiffies to determine when wait expires. However
> > Imre did find out that jiffies can and will do a >1
> > increments on certain situations [1]. When this happens
> > in a wait_for loop, we return timeout errorneously
> > much earlier than what the real wallclock would say.
> >
> > We can't afford our waits to timeout prematurely.
> > Discard jiffies and change to ktime to detect timeouts.
> >
> > Reported-by: Imre Deak <imre.deak@intel.com>
> > References: https://lkml.org/lkml/2018/4/18/798 [1]
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_drv.h | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 8b20824e806e..ac7565220aa3 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -49,12 +49,12 @@
> >    * check the condition before the timeout.
> >    */
> >   #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
> > -     unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> > +     const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
> Is ktime_get_raw() monotonic? Thomas suggested ktime_get()

It proclaims to be monotonic, without the clock drift calibration. For
the milliseconds we should be sleeping at most, I hope that is
immaterial.
-Chris
sagar.a.kamble@intel.com April 20, 2018, 11:55 a.m. UTC | #4
On 4/20/2018 4:09 PM, Chris Wilson wrote:
> Quoting Sagar Arun Kamble (2018-04-20 11:23:50)
>>
>> On 4/20/2018 3:24 PM, Mika Kuoppala wrote:
>>> We use jiffies to determine when wait expires. However
>>> Imre did find out that jiffies can and will do a >1
>>> increments on certain situations [1]. When this happens
>>> in a wait_for loop, we return timeout errorneously
>>> much earlier than what the real wallclock would say.
>>>
>>> We can't afford our waits to timeout prematurely.
>>> Discard jiffies and change to ktime to detect timeouts.
>>>
>>> Reported-by: Imre Deak <imre.deak@intel.com>
>>> References: https://lkml.org/lkml/2018/4/18/798 [1]
>>> Cc: Imre Deak <imre.deak@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_drv.h | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>>> index 8b20824e806e..ac7565220aa3 100644
>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>> @@ -49,12 +49,12 @@
>>>     * check the condition before the timeout.
>>>     */
>>>    #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
>>> -     unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
>>> +     const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
>> Is ktime_get_raw() monotonic? Thomas suggested ktime_get()
> It proclaims to be monotonic, without the clock drift calibration. For
> the milliseconds we should be sleeping at most, I hope that is
> immaterial.
Yes. I remembered from Imre's comment[1] that raw clock can jump and 
will not be calibrated. If jumps are are not > jiffies we should be good 
get_raw then.

[1] 
https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html
> -Chris
Mika Kuoppala April 20, 2018, 12:25 p.m. UTC | #5
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Sagar Arun Kamble (2018-04-20 11:23:50)
>> 
>> 
>> On 4/20/2018 3:24 PM, Mika Kuoppala wrote:
>> > We use jiffies to determine when wait expires. However
>> > Imre did find out that jiffies can and will do a >1
>> > increments on certain situations [1]. When this happens
>> > in a wait_for loop, we return timeout errorneously
>> > much earlier than what the real wallclock would say.
>> >
>> > We can't afford our waits to timeout prematurely.
>> > Discard jiffies and change to ktime to detect timeouts.
>> >
>> > Reported-by: Imre Deak <imre.deak@intel.com>
>> > References: https://lkml.org/lkml/2018/4/18/798 [1]
>> > Cc: Imre Deak <imre.deak@intel.com>
>> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > ---
>> >   drivers/gpu/drm/i915/intel_drv.h | 4 ++--
>> >   1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>> > index 8b20824e806e..ac7565220aa3 100644
>> > --- a/drivers/gpu/drm/i915/intel_drv.h
>> > +++ b/drivers/gpu/drm/i915/intel_drv.h
>> > @@ -49,12 +49,12 @@
>> >    * check the condition before the timeout.
>> >    */
>> >   #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
>> > -     unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
>> > +     const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
>> Is ktime_get_raw() monotonic? Thomas suggested ktime_get()
>
> It proclaims to be monotonic, without the clock drift calibration. For
> the milliseconds we should be sleeping at most, I hope that is
> immaterial.

Agreed. And even if we would be affected by drift, we would want
to extend the end a little instead of accepting a drift calibration
while we are waiting.

Here is how the code size is affected:

add/remove: 0/0 grow/shrink: 5/29 up/down: 77/-1099 (-1022)
Function                                     old     new   delta
__intel_wait_for_register_fw                 462     496     +34
swsci                                        576     596     +20
skl_pcode_request                            488     507     +19
intel_hdcp_auth                             3806    3809      +3
__intel_wait_for_register                    436     437      +1
hsw_power_well_disable                       548     539      -9
i915_gem_idle_work_handler                   608     593     -15
guc_fw_xfer                                  875     854     -21
bxt_ddi_pll_disable                          324     303     -21
chv_set_cdclk                                357     334     -23
vlv_wm_get_hw_state                         2474    2450     -24
gmbus_wait                                   564     538     -26
cnl_cdclk_pll_enable                         286     259     -27
cnl_cdclk_pll_disable                        247     220     -27
hsw_enable_pc8                              1699    1671     -28
bdw_set_cdclk                               1006     977     -29
wait_for_pipe_scanline_moving                388     358     -30
intel_enable_dsi_pll                        1060    1028     -32
i915_gem_wait_for_idle                       348     314     -34
chv_set_memory_dvfs                          258     223     -35
vlv_set_power_well                           321     282     -39
intel_hdmi_hdcp_check_link                   485     446     -39
chv_set_pipe_power_well.constprop            359     320     -39
g33_do_reset                                 216     175     -41
vlv_wait_for_pw_status                       210     168     -42
intel_engines_park                           348     306     -42
vlv_set_cdclk                                711     662     -49
lspcon_wait_mode                             265     211     -54
lpt_init_pch_refclk                         1542    1488     -54
g4x_do_reset                                 595     541     -54
bxt_ddi_pll_enable                          2407    2353     -54
ironlake_crtc_enable                        3279    3211     -68
i915_do_reset                                443     375     -68
intel_guc_send_ct                           1691    1616     -75
Total: Before=1232834, After=1231812, chg -0.08%

-Mika
Imre Deak April 20, 2018, 12:28 p.m. UTC | #6
On Fri, Apr 20, 2018 at 11:27:55AM +0100, Chris Wilson wrote:
> Quoting Mika Kuoppala (2018-04-20 10:54:26)
> > We use jiffies to determine when wait expires. However
> > Imre did find out that jiffies can and will do a >1
> > increments on certain situations [1]. When this happens
> > in a wait_for loop, we return timeout errorneously
> > much earlier than what the real wallclock would say.
> > 
> > We can't afford our waits to timeout prematurely.
> > Discard jiffies and change to ktime to detect timeouts.
> > 
> > Reported-by: Imre Deak <imre.deak@intel.com>
> > References: https://lkml.org/lkml/2018/4/18/798 [1]
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> The atomic variant is already jiffie-less (since it has to work in
> irq-off contexts). Maybe a bit tricky to suggest that the callers know
> if jiffie incremens are accurate or not.
> 
> What is not clear from the link is whether our wait_for() is running
> across suspend, or whether it is just jiffie recalibration some time
> during resume that breaks.

The wait_for starts on the resume path, so the jump shouldn't be related
to any of the timekeeping adjustments across suspend/resume (happening
already during syscore resume). It looks like a delayed LAPIC timer
interrupt on that GLK system, trying to get more details on that with
irqsoff ftracing.

> 
> >  drivers/gpu/drm/i915/intel_drv.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 8b20824e806e..ac7565220aa3 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -49,12 +49,12 @@
> >   * check the condition before the timeout.
> >   */
> >  #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
> > -       unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> > +       const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
> >         long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \
> >         int ret__;                                                      \
> >         might_sleep();                                                  \
> >         for (;;) {                                                      \
> > -               bool expired__ = time_after(jiffies, timeout__);        \
> > +               const bool expired__ = ktime_after(ktime_get_raw(), end__); \
> >                 OP;                                                     \
> >                 if (COND) {                                             \
> >                         ret__ = 0;                                      \
> 
> Nevertheless, the patch is ok and I don't have too much objection to
> adding another tsc (at best, hpet at worst!) read around every mmio+sleep,
> plus expanding the code for the function calls. Out of curiosity what is
> the size delta? How many wait_for() do we have left that we need to
> convert to a function call rather than macro expansion?
> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Cc stable?
> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Mika Kuoppala April 24, 2018, 2:11 p.m. UTC | #7
Imre Deak <imre.deak@intel.com> writes:

> On Fri, Apr 20, 2018 at 11:27:55AM +0100, Chris Wilson wrote:
>> Quoting Mika Kuoppala (2018-04-20 10:54:26)
>> > We use jiffies to determine when wait expires. However
>> > Imre did find out that jiffies can and will do a >1
>> > increments on certain situations [1]. When this happens
>> > in a wait_for loop, we return timeout errorneously
>> > much earlier than what the real wallclock would say.
>> > 
>> > We can't afford our waits to timeout prematurely.
>> > Discard jiffies and change to ktime to detect timeouts.
>> > 
>> > Reported-by: Imre Deak <imre.deak@intel.com>
>> > References: https://lkml.org/lkml/2018/4/18/798 [1]
>> > Cc: Imre Deak <imre.deak@intel.com>
>> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> 
>> The atomic variant is already jiffie-less (since it has to work in
>> irq-off contexts). Maybe a bit tricky to suggest that the callers know
>> if jiffie incremens are accurate or not.
>> 
>> What is not clear from the link is whether our wait_for() is running
>> across suspend, or whether it is just jiffie recalibration some time
>> during resume that breaks.
>
> The wait_for starts on the resume path, so the jump shouldn't be related
> to any of the timekeeping adjustments across suspend/resume (happening
> already during syscore resume). It looks like a delayed LAPIC timer
> interrupt on that GLK system, trying to get more details on that with
> irqsoff ftracing.

Both patches pushed. Thanks for the report and reviews.

-Mika

>
>> 
>> >  drivers/gpu/drm/i915/intel_drv.h | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>> > index 8b20824e806e..ac7565220aa3 100644
>> > --- a/drivers/gpu/drm/i915/intel_drv.h
>> > +++ b/drivers/gpu/drm/i915/intel_drv.h
>> > @@ -49,12 +49,12 @@
>> >   * check the condition before the timeout.
>> >   */
>> >  #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
>> > -       unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
>> > +       const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
>> >         long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \
>> >         int ret__;                                                      \
>> >         might_sleep();                                                  \
>> >         for (;;) {                                                      \
>> > -               bool expired__ = time_after(jiffies, timeout__);        \
>> > +               const bool expired__ = ktime_after(ktime_get_raw(), end__); \
>> >                 OP;                                                     \
>> >                 if (COND) {                                             \
>> >                         ret__ = 0;                                      \
>> 
>> Nevertheless, the patch is ok and I don't have too much objection to
>> adding another tsc (at best, hpet at worst!) read around every mmio+sleep,
>> plus expanding the code for the function calls. Out of curiosity what is
>> the size delta? How many wait_for() do we have left that we need to
>> convert to a function call rather than macro expansion?
>> 
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>> 
>> Cc stable?
>> -Chris
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 8b20824e806e..ac7565220aa3 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -49,12 +49,12 @@ 
  * check the condition before the timeout.
  */
 #define __wait_for(OP, COND, US, Wmin, Wmax) ({ \
-	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;	\
+	const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \
 	long wait__ = (Wmin); /* recommended min for usleep is 10 us */	\
 	int ret__;							\
 	might_sleep();							\
 	for (;;) {							\
-		bool expired__ = time_after(jiffies, timeout__);	\
+		const bool expired__ = ktime_after(ktime_get_raw(), end__); \
 		OP;							\
 		if (COND) {						\
 			ret__ = 0;					\