diff mbox

drm/i915: use udelay for very short delays

Message ID 1481776147-23041-1-git-send-email-hofrat@osadl.org (mailing list archive)
State New, archived
Headers show

Commit Message

Nicholas Mc Guire Dec. 15, 2016, 4:29 a.m. UTC
Even on fast systems a 2 microsecond delay is most likely more efficient
as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
change this to a udelay(2).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
---

Problem found by coccinelle:

Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)

Patch is against 4.9.0 (localversion-next is next-20161214)

 drivers/gpu/drm/i915/intel_dsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jani Nikula Dec. 15, 2016, 9:08 a.m. UTC | #1
On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> Even on fast systems a 2 microsecond delay is most likely more efficient
> as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> change this to a udelay(2).

Similar concerns as in [1]. We don't need the accuracy of udelay() here,
so this boils down to which is the better use of CPU. We could probably
relax the max delay more if that was helpful. But I'm not immediately
sold on "is most likely more efficient" which sounds like a gut feeling.

I'm sorry it's not clear in my other reply that I do appreciate
addressing incorrect/silly use of usleep_range(); I'm just not (yet)
convinced udelay() is the answer.

BR,
Jani.


[1] http://lkml.kernel.org/r/8737hpr32a.fsf@intel.com


>
> Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
> ---
>
> Problem found by coccinelle:
>
> Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)
>
> Patch is against 4.9.0 (localversion-next is next-20161214)
>
>  drivers/gpu/drm/i915/intel_dsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> index 5b72c50..19fe86b 100644
> --- a/drivers/gpu/drm/i915/intel_dsi.c
> +++ b/drivers/gpu/drm/i915/intel_dsi.c
> @@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
>  		val &= ~ULPS_STATE_MASK;
>  		val |= (ULPS_STATE_ENTER | DEVICE_READY);
>  		I915_WRITE(MIPI_DEVICE_READY(port), val);
> -		usleep_range(2, 3);
> +		udelay(2);
>  
>  		/* 3. Exit ULPS */
>  		val = I915_READ(MIPI_DEVICE_READY(port));
Daniel Vetter Dec. 15, 2016, 9:25 a.m. UTC | #2
On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
> 
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
> 
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

So one reason why we unconditionally use *sleep variants is the
might_sleep check. Because in the past people have used udelay and mdelay,
those delays had to be increased a lot because hw, and at the same time
someone added users of these functions to our irq helper, resulting in irq
handling times measures in multiple ms. That's not good.

So until someone can demonstrate that there's a real benefit (which let's
be honest, for modeset code, will never be the case) I very highly prefer
to use *sleep* functions. They prevent some silly things from happening by
accident. Micro-optimizing modeset code and hampering maitainability in
the process is not good.
-Daniel

> 
> BR,
> Jani.
> 
> 
> [1] http://lkml.kernel.org/r/8737hpr32a.fsf@intel.com
> 
> 
> >
> > Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
> > ---
> >
> > Problem found by coccinelle:
> >
> > Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)
> >
> > Patch is against 4.9.0 (localversion-next is next-20161214)
> >
> >  drivers/gpu/drm/i915/intel_dsi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> > index 5b72c50..19fe86b 100644
> > --- a/drivers/gpu/drm/i915/intel_dsi.c
> > +++ b/drivers/gpu/drm/i915/intel_dsi.c
> > @@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
> >  		val &= ~ULPS_STATE_MASK;
> >  		val |= (ULPS_STATE_ENTER | DEVICE_READY);
> >  		I915_WRITE(MIPI_DEVICE_READY(port), val);
> > -		usleep_range(2, 3);
> > +		udelay(2);
> >  
> >  		/* 3. Exit ULPS */
> >  		val = I915_READ(MIPI_DEVICE_READY(port));
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Dec. 15, 2016, 9:27 a.m. UTC | #3
On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > change this to a udelay(2).
> > 
> > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > so this boils down to which is the better use of CPU. We could probably
> > relax the max delay more if that was helpful. But I'm not immediately
> > sold on "is most likely more efficient" which sounds like a gut feeling.
> > 
> > I'm sorry it's not clear in my other reply that I do appreciate
> > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > convinced udelay() is the answer.
> 
> So one reason why we unconditionally use *sleep variants is the
> might_sleep check. Because in the past people have used udelay and mdelay,
> those delays had to be increased a lot because hw, and at the same time
> someone added users of these functions to our irq helper, resulting in irq
> handling times measures in multiple ms. That's not good.
> 
> So until someone can demonstrate that there's a real benefit (which let's
> be honest, for modeset code, will never be the case) I very highly prefer
> to use *sleep* functions. They prevent some silly things from happening by
> accident. Micro-optimizing modeset code and hampering maitainability in
> the process is not good.

Also, the entire premise seems backwards: usleep_range is inefficient for
certain parameter ranges and better replaced with udelay. That makes
sense.

But why exactly do we not fix udelay_range then, but instead do a cocci
job crawling through all the thousands of callers? Just fix usleep(_range)
to use udelay for very small values (and still keep the might_sleep check
ofc) if that's more efficient!
-Daniel
Nicholas Mc Guire Dec. 15, 2016, 9:28 a.m. UTC | #4
On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
> 
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
> 
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

if the delay is not critical and all that is needed 
is an assurance that it is greater than X us then 
usleep_range is fine with a relaxed limit. 
So from what you wrote my patch proposal is wrong - the
udelay() is not the way to got.
My intent is to get rid of very small usleep_range() cases
so if usleep_range(20,50) causes no issues with this driver
and does not induce any performance penalty then that would 
be the way to go I think.

thx!
hofrat
Jani Nikula Dec. 15, 2016, 9:52 a.m. UTC | #5
On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > change this to a udelay(2).
>> 
>> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> so this boils down to which is the better use of CPU. We could probably
>> relax the max delay more if that was helpful. But I'm not immediately
>> sold on "is most likely more efficient" which sounds like a gut feeling.
>> 
>> I'm sorry it's not clear in my other reply that I do appreciate
>> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> convinced udelay() is the answer.
>
> if the delay is not critical and all that is needed 
> is an assurance that it is greater than X us then 
> usleep_range is fine with a relaxed limit. 
> So from what you wrote my patch proposal is wrong - the
> udelay() is not the way to got.
> My intent is to get rid of very small usleep_range() cases
> so if usleep_range(20,50) causes no issues with this driver
> and does not induce any performance penalty then that would 
> be the way to go I think.

Okay, so I looked at the code, and I looked at our spec, and I looked at
the MIPI D-PHY spec, and I cried a little.

Long story short, I think usleep_range(10, 50) will be fine.


BR,
Jani.
Ville Syrjälä Dec. 15, 2016, 10:10 a.m. UTC | #6
On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> > change this to a udelay(2).
> >> 
> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> so this boils down to which is the better use of CPU. We could probably
> >> relax the max delay more if that was helpful. But I'm not immediately
> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >> 
> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> convinced udelay() is the answer.
> >
> > if the delay is not critical and all that is needed 
> > is an assurance that it is greater than X us then 
> > usleep_range is fine with a relaxed limit. 
> > So from what you wrote my patch proposal is wrong - the
> > udelay() is not the way to got.
> > My intent is to get rid of very small usleep_range() cases
> > so if usleep_range(20,50) causes no issues with this driver
> > and does not induce any performance penalty then that would 
> > be the way to go I think.
> 
> Okay, so I looked at the code, and I looked at our spec, and I looked at
> the MIPI D-PHY spec, and I cried a little.
> 
> Long story short, I think usleep_range(10, 50) will be fine.

Note that I really want to see a comment next to every delay like this
documenting the actual hardware requirement, if the delay used by the
code doesn't 100% match it.
Jani Nikula Dec. 15, 2016, 10:20 a.m. UTC | #7
On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> >> > change this to a udelay(2).
>> >> 
>> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> >> so this boils down to which is the better use of CPU. We could probably
>> >> relax the max delay more if that was helpful. But I'm not immediately
>> >> sold on "is most likely more efficient" which sounds like a gut feeling.
>> >> 
>> >> I'm sorry it's not clear in my other reply that I do appreciate
>> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> >> convinced udelay() is the answer.
>> >
>> > if the delay is not critical and all that is needed 
>> > is an assurance that it is greater than X us then 
>> > usleep_range is fine with a relaxed limit. 
>> > So from what you wrote my patch proposal is wrong - the
>> > udelay() is not the way to got.
>> > My intent is to get rid of very small usleep_range() cases
>> > so if usleep_range(20,50) causes no issues with this driver
>> > and does not induce any performance penalty then that would 
>> > be the way to go I think.
>> 
>> Okay, so I looked at the code, and I looked at our spec, and I looked at
>> the MIPI D-PHY spec, and I cried a little.
>> 
>> Long story short, I think usleep_range(10, 50) will be fine.
>
> Note that I really want to see a comment next to every delay like this
> documenting the actual hardware requirement, if the delay used by the
> code doesn't 100% match it.

Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
view wrt D-PHY, and our code doesn't even match the spec. Hence the
tears. Want to propose a wording for the comment so we can apply this
change, without going for a full rewrite of the sequence?


BR,
Jani.
Nicholas Mc Guire Dec. 15, 2016, 10:34 a.m. UTC | #8
On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> >> > change this to a udelay(2).
> >> >> 
> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> >> so this boils down to which is the better use of CPU. We could probably
> >> >> relax the max delay more if that was helpful. But I'm not immediately
> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >> >> 
> >> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> >> convinced udelay() is the answer.
> >> >
> >> > if the delay is not critical and all that is needed 
> >> > is an assurance that it is greater than X us then 
> >> > usleep_range is fine with a relaxed limit. 
> >> > So from what you wrote my patch proposal is wrong - the
> >> > udelay() is not the way to got.
> >> > My intent is to get rid of very small usleep_range() cases
> >> > so if usleep_range(20,50) causes no issues with this driver
> >> > and does not induce any performance penalty then that would 
> >> > be the way to go I think.
> >> 
> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> >> the MIPI D-PHY spec, and I cried a little.
> >> 
> >> Long story short, I think usleep_range(10, 50) will be fine.
> >
> > Note that I really want to see a comment next to every delay like this
> > documenting the actual hardware requirement, if the delay used by the
> > code doesn't 100% match it.
> 
> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> view wrt D-PHY, and our code doesn't even match the spec. Hence the
> tears. Want to propose a wording for the comment so we can apply this
> change, without going for a full rewrite of the sequence?
>
is that suitable or am I overdoing it ?

-               usleep_range(2, 3);
+               /* delay for at least 2us - relaxed to 10-50 to allow
+                * hrtimer subsystem to optimize uncritical timer handling
+                */
+               usleep_range(10, 50);

thx!
hofrat
Nicholas Mc Guire Dec. 15, 2016, 10:51 a.m. UTC | #9
On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > > change this to a udelay(2).
> > > 
> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > > so this boils down to which is the better use of CPU. We could probably
> > > relax the max delay more if that was helpful. But I'm not immediately
> > > sold on "is most likely more efficient" which sounds like a gut feeling.
> > > 
> > > I'm sorry it's not clear in my other reply that I do appreciate
> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > > convinced udelay() is the answer.
> > 
> > So one reason why we unconditionally use *sleep variants is the
> > might_sleep check. Because in the past people have used udelay and mdelay,
> > those delays had to be increased a lot because hw, and at the same time
> > someone added users of these functions to our irq helper, resulting in irq
> > handling times measures in multiple ms. That's not good.
> > 
> > So until someone can demonstrate that there's a real benefit (which let's
> > be honest, for modeset code, will never be the case) I very highly prefer
> > to use *sleep* functions. They prevent some silly things from happening by
> > accident. Micro-optimizing modeset code and hampering maitainability in
> > the process is not good.
> 
> Also, the entire premise seems backwards: usleep_range is inefficient for
> certain parameter ranges and better replaced with udelay. That makes
> sense.
> 
> But why exactly do we not fix udelay_range then, but instead do a cocci
> job crawling through all the thousands of callers? Just fix usleep(_range)
> to use udelay for very small values (and still keep the might_sleep check
> ofc) if that's more efficient!

its actually not thousands more like a few dozen:

usleep_range(min,max) in linux-stable 4.9.0

1648 calls total
1488 pass numeric values only (90.29%)
  27 min below 10us (1.81%)
  40 min above 10ms (2.68%)
     min out of spec 4.50%
  76 preprocessor constants (4.61%)
   1 min below 10us (1.31%)
   8 min above 10ms (10.52%)
     min out of spec 11.84%
  85 expressions (5.15%)
1(0) min below 10us (1.50%)*
6(2) min above 10ms (7.50%)*
     min out of spec 9.0%
Errors:
  23 where min==max  (1.39%)
   0 where max < min (0.00%)

Total:
  Bugs: 6.48%-10.70%*
  Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
  Detectable by coccinelle:
  Bugs: 74/103 (71.8%)
  Crit: 50/52 (96.1%)

*based on estimats as runtime values not known.


there is no usleep() as noted in Documentation/timers/timers-howto.txt
                - Why not usleep?
                        On slower systems, (embedded, OR perhaps a speed-
                        stepped PC!) the overhead of setting up the hrtimers
                        for usleep *may* not be worth it. Such an evaluation
                        will obviously depend on your specific situation, but
                        it is something to be aware of.

and it suggests to use different API for different ranges - sounds sane
to me and seems to cover the needs of drivers.

thx!
hofrat
Daniel Vetter Dec. 15, 2016, 11:39 a.m. UTC | #10
On Thu, Dec 15, 2016 at 11:51 AM, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
>> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > > > change this to a udelay(2).
>> > >
>> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> > > so this boils down to which is the better use of CPU. We could probably
>> > > relax the max delay more if that was helpful. But I'm not immediately
>> > > sold on "is most likely more efficient" which sounds like a gut feeling.
>> > >
>> > > I'm sorry it's not clear in my other reply that I do appreciate
>> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> > > convinced udelay() is the answer.
>> >
>> > So one reason why we unconditionally use *sleep variants is the
>> > might_sleep check. Because in the past people have used udelay and mdelay,
>> > those delays had to be increased a lot because hw, and at the same time
>> > someone added users of these functions to our irq helper, resulting in irq
>> > handling times measures in multiple ms. That's not good.
>> >
>> > So until someone can demonstrate that there's a real benefit (which let's
>> > be honest, for modeset code, will never be the case) I very highly prefer
>> > to use *sleep* functions. They prevent some silly things from happening by
>> > accident. Micro-optimizing modeset code and hampering maitainability in
>> > the process is not good.
>>
>> Also, the entire premise seems backwards: usleep_range is inefficient for
>> certain parameter ranges and better replaced with udelay. That makes
>> sense.
>>
>> But why exactly do we not fix udelay_range then, but instead do a cocci
>> job crawling through all the thousands of callers? Just fix usleep(_range)
>> to use udelay for very small values (and still keep the might_sleep check
>> ofc) if that's more efficient!
>
> its actually not thousands more like a few dozen:

There's 1k + usleep* calls you need to audit and teach people how to
exactly use it.

> usleep_range(min,max) in linux-stable 4.9.0
>
> 1648 calls total
> 1488 pass numeric values only (90.29%)
>   27 min below 10us (1.81%)
>   40 min above 10ms (2.68%)
>      min out of spec 4.50%
>   76 preprocessor constants (4.61%)
>    1 min below 10us (1.31%)
>    8 min above 10ms (10.52%)
>      min out of spec 11.84%
>   85 expressions (5.15%)
> 1(0) min below 10us (1.50%)*
> 6(2) min above 10ms (7.50%)*
>      min out of spec 9.0%
> Errors:
>   23 where min==max  (1.39%)
>    0 where max < min (0.00%)
>
> Total:
>   Bugs: 6.48%-10.70%*
>   Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
>   Detectable by coccinelle:
>   Bugs: 74/103 (71.8%)
>   Crit: 50/52 (96.1%)
>
> *based on estimats as runtime values not known.
>
>
> there is no usleep() as noted in Documentation/timers/timers-howto.txt
>                 - Why not usleep?
>                         On slower systems, (embedded, OR perhaps a speed-
>                         stepped PC!) the overhead of setting up the hrtimers
>                         for usleep *may* not be worth it. Such an evaluation
>                         will obviously depend on your specific situation, but
>                         it is something to be aware of.
>
> and it suggests to use different API for different ranges - sounds sane
> to me and seems to cover the needs of drivers.

git grep usleep seems to disagree, at least I see a bunch of usleep()
calls. And per Rusty's api usability scale the ultimate fucntion is
dwim(). It just feels to me like pushing such trade-off decisions to
drivers is bad design because a) the tradeoffs depend upon the cpu
you're running on b) driver writers are pretty good at getting such
tradeoffs wrong in general. Aiming for a more dwim()-like api for this
here makes sense, instead of an eternal fight to correct drivers that
get it wrong all the time.
-Daniel
Jani Nikula Dec. 15, 2016, 11:48 a.m. UTC | #11
On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
>> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
>> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
>> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> >> >> > change this to a udelay(2).
>> >> >> 
>> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> >> >> so this boils down to which is the better use of CPU. We could probably
>> >> >> relax the max delay more if that was helpful. But I'm not immediately
>> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
>> >> >> 
>> >> >> I'm sorry it's not clear in my other reply that I do appreciate
>> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> >> >> convinced udelay() is the answer.
>> >> >
>> >> > if the delay is not critical and all that is needed 
>> >> > is an assurance that it is greater than X us then 
>> >> > usleep_range is fine with a relaxed limit. 
>> >> > So from what you wrote my patch proposal is wrong - the
>> >> > udelay() is not the way to got.
>> >> > My intent is to get rid of very small usleep_range() cases
>> >> > so if usleep_range(20,50) causes no issues with this driver
>> >> > and does not induce any performance penalty then that would 
>> >> > be the way to go I think.
>> >> 
>> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
>> >> the MIPI D-PHY spec, and I cried a little.
>> >> 
>> >> Long story short, I think usleep_range(10, 50) will be fine.
>> >
>> > Note that I really want to see a comment next to every delay like this
>> > documenting the actual hardware requirement, if the delay used by the
>> > code doesn't 100% match it.
>> 
>> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
>> view wrt D-PHY, and our code doesn't even match the spec. Hence the
>> tears. Want to propose a wording for the comment so we can apply this
>> change, without going for a full rewrite of the sequence?
>>
> is that suitable or am I overdoing it ?
>
> -               usleep_range(2, 3);
> +               /* delay for at least 2us - relaxed to 10-50 to allow
> +                * hrtimer subsystem to optimize uncritical timer handling
> +                */
> +               usleep_range(10, 50);

I'm fine with that. Or maybe just make it "relaxed to allow" without the
values.

Jani.


>
> thx!
> hofrat
Ville Syrjälä Dec. 15, 2016, 11:51 a.m. UTC | #12
On Thu, Dec 15, 2016 at 10:34:00AM +0000, Nicholas Mc Guire wrote:
> On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> > On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> > >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> > >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > >> >> > change this to a udelay(2).
> > >> >> 
> > >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > >> >> so this boils down to which is the better use of CPU. We could probably
> > >> >> relax the max delay more if that was helpful. But I'm not immediately
> > >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> > >> >> 
> > >> >> I'm sorry it's not clear in my other reply that I do appreciate
> > >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > >> >> convinced udelay() is the answer.
> > >> >
> > >> > if the delay is not critical and all that is needed 
> > >> > is an assurance that it is greater than X us then 
> > >> > usleep_range is fine with a relaxed limit. 
> > >> > So from what you wrote my patch proposal is wrong - the
> > >> > udelay() is not the way to got.
> > >> > My intent is to get rid of very small usleep_range() cases
> > >> > so if usleep_range(20,50) causes no issues with this driver
> > >> > and does not induce any performance penalty then that would 
> > >> > be the way to go I think.
> > >> 
> > >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> > >> the MIPI D-PHY spec, and I cried a little.
> > >> 
> > >> Long story short, I think usleep_range(10, 50) will be fine.
> > >
> > > Note that I really want to see a comment next to every delay like this
> > > documenting the actual hardware requirement, if the delay used by the
> > > code doesn't 100% match it.
> > 
> > Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> > view wrt D-PHY, and our code doesn't even match the spec. Hence the
> > tears. Want to propose a wording for the comment so we can apply this
> > change, without going for a full rewrite of the sequence?
> >
> is that suitable or am I overdoing it ?
> 
> -               usleep_range(2, 3);
> +               /* delay for at least 2us - relaxed to 10-50 to allow
> +                * hrtimer subsystem to optimize uncritical timer handling
> +                */

That's entirely too verbose IMO, and the reason for using usleep_range()
is pretty obvious without spelling it out.

All we really want to know is what the spec says is the minimum
acceptable delay.

> +               usleep_range(10, 50);
> 
> thx!
> hofrat
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 5b72c50..19fe86b 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -379,7 +379,7 @@  static void bxt_dsi_device_ready(struct intel_encoder *encoder)
 		val &= ~ULPS_STATE_MASK;
 		val |= (ULPS_STATE_ENTER | DEVICE_READY);
 		I915_WRITE(MIPI_DEVICE_READY(port), val);
-		usleep_range(2, 3);
+		udelay(2);
 
 		/* 3. Exit ULPS */
 		val = I915_READ(MIPI_DEVICE_READY(port));