diff mbox

drm/i915: use udelay for very short delays

Message ID 1481776147-23041-1-git-send-email-hofrat@osadl.org (mailing list archive)
State New, archived
Headers show

Commit Message

Nicholas Mc Guire Dec. 15, 2016, 4:29 a.m. UTC
Even on fast systems a 2 microsecond delay is most likely more efficient
as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
change this to a udelay(2).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
---

Problem found by coccinelle:

Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)

Patch is against 4.9.0 (localversion-next is next-20161214)

 drivers/gpu/drm/i915/intel_dsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Nicholas Mc Guire Dec. 15, 2016, 9:28 a.m. UTC | #1
On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
> 
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
> 
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

if the delay is not critical and all that is needed 
is an assurance that it is greater than X us then 
usleep_range is fine with a relaxed limit. 
So from what you wrote my patch proposal is wrong - the
udelay() is not the way to got.
My intent is to get rid of very small usleep_range() cases
so if usleep_range(20,50) causes no issues with this driver
and does not induce any performance penalty then that would 
be the way to go I think.

thx!
hofrat
Nicholas Mc Guire Dec. 15, 2016, 10:34 a.m. UTC | #2
On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> >> > change this to a udelay(2).
> >> >> 
> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> >> so this boils down to which is the better use of CPU. We could probably
> >> >> relax the max delay more if that was helpful. But I'm not immediately
> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >> >> 
> >> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> >> convinced udelay() is the answer.
> >> >
> >> > if the delay is not critical and all that is needed 
> >> > is an assurance that it is greater than X us then 
> >> > usleep_range is fine with a relaxed limit. 
> >> > So from what you wrote my patch proposal is wrong - the
> >> > udelay() is not the way to got.
> >> > My intent is to get rid of very small usleep_range() cases
> >> > so if usleep_range(20,50) causes no issues with this driver
> >> > and does not induce any performance penalty then that would 
> >> > be the way to go I think.
> >> 
> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> >> the MIPI D-PHY spec, and I cried a little.
> >> 
> >> Long story short, I think usleep_range(10, 50) will be fine.
> >
> > Note that I really want to see a comment next to every delay like this
> > documenting the actual hardware requirement, if the delay used by the
> > code doesn't 100% match it.
> 
> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> view wrt D-PHY, and our code doesn't even match the spec. Hence the
> tears. Want to propose a wording for the comment so we can apply this
> change, without going for a full rewrite of the sequence?
>
is that suitable or am I overdoing it ?

-               usleep_range(2, 3);
+               /* delay for at least 2us - relaxed to 10-50 to allow
+                * hrtimer subsystem to optimize uncritical timer handling
+                */
+               usleep_range(10, 50);

thx!
hofrat
Nicholas Mc Guire Dec. 15, 2016, 10:51 a.m. UTC | #3
On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > > change this to a udelay(2).
> > > 
> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > > so this boils down to which is the better use of CPU. We could probably
> > > relax the max delay more if that was helpful. But I'm not immediately
> > > sold on "is most likely more efficient" which sounds like a gut feeling.
> > > 
> > > I'm sorry it's not clear in my other reply that I do appreciate
> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > > convinced udelay() is the answer.
> > 
> > So one reason why we unconditionally use *sleep variants is the
> > might_sleep check. Because in the past people have used udelay and mdelay,
> > those delays had to be increased a lot because hw, and at the same time
> > someone added users of these functions to our irq helper, resulting in irq
> > handling times measures in multiple ms. That's not good.
> > 
> > So until someone can demonstrate that there's a real benefit (which let's
> > be honest, for modeset code, will never be the case) I very highly prefer
> > to use *sleep* functions. They prevent some silly things from happening by
> > accident. Micro-optimizing modeset code and hampering maitainability in
> > the process is not good.
> 
> Also, the entire premise seems backwards: usleep_range is inefficient for
> certain parameter ranges and better replaced with udelay. That makes
> sense.
> 
> But why exactly do we not fix udelay_range then, but instead do a cocci
> job crawling through all the thousands of callers? Just fix usleep(_range)
> to use udelay for very small values (and still keep the might_sleep check
> ofc) if that's more efficient!

its actually not thousands more like a few dozen:

usleep_range(min,max) in linux-stable 4.9.0

1648 calls total
1488 pass numeric values only (90.29%)
  27 min below 10us (1.81%)
  40 min above 10ms (2.68%)
     min out of spec 4.50%
  76 preprocessor constants (4.61%)
   1 min below 10us (1.31%)
   8 min above 10ms (10.52%)
     min out of spec 11.84%
  85 expressions (5.15%)
1(0) min below 10us (1.50%)*
6(2) min above 10ms (7.50%)*
     min out of spec 9.0%
Errors:
  23 where min==max  (1.39%)
   0 where max < min (0.00%)

Total:
  Bugs: 6.48%-10.70%*
  Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
  Detectable by coccinelle:
  Bugs: 74/103 (71.8%)
  Crit: 50/52 (96.1%)

*based on estimats as runtime values not known.


there is no usleep() as noted in Documentation/timers/timers-howto.txt
                - Why not usleep?
                        On slower systems, (embedded, OR perhaps a speed-
                        stepped PC!) the overhead of setting up the hrtimers
                        for usleep *may* not be worth it. Such an evaluation
                        will obviously depend on your specific situation, but
                        it is something to be aware of.

and it suggests to use different API for different ranges - sounds sane
to me and seems to cover the needs of drivers.

thx!
hofrat
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 5b72c50..19fe86b 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -379,7 +379,7 @@  static void bxt_dsi_device_ready(struct intel_encoder *encoder)
 		val &= ~ULPS_STATE_MASK;
 		val |= (ULPS_STATE_ENTER | DEVICE_READY);
 		I915_WRITE(MIPI_DEVICE_READY(port), val);
-		usleep_range(2, 3);
+		udelay(2);
 
 		/* 3. Exit ULPS */
 		val = I915_READ(MIPI_DEVICE_READY(port));