
Guarantee udelay(N) spins at least N microseconds

Message ID 55282C1F.3000600@free.fr (mailing list archive)
State New, archived

Commit Message

Mason April 10, 2015, 8:01 p.m. UTC
On 10/04/2015 18:08, Russell King - ARM Linux wrote:
> On Fri, Apr 10, 2015 at 05:30:24PM +0200, Mason wrote:
>> I appreciate (very much so) that you spend time replying to me,
>> but I also sense a lot of animosity, and I don't know what I've
>> done wrong to deserve it :-(
> 
> I'm putting the point across strongly because I really don't think
> there is an issue to be solved here.
> 
>> On 10/04/2015 17:06, Russell King - ARM Linux wrote:
>>> And what this means is that udelay(n) where 'n' is less than the
>>> period between two timer interrupts /will/ be, and is /expected to
>>> be/ potentially shorter than the requested period.
>>
>> You've made it clear how loop-based delays are implemented; and also
>> that loop-based delays are typically 1% shorter than requested.
>> (Thanks for the overview, by the way.) Please note that I haven't
>> touched the loop-based code; I'm only discussing the timer-based
>> code.
> 
> 1% is a figure I pulled out of the air.  It really depends on the CPU
> instructions per cycle, and how much work is being done in the timer
> interrupt handler.
> 
>>> There's no getting away from that, we can't estimate how long the timer
>>> interrupt takes to handle without the use of an external timer, and if
>>> we've got an external timer, we might as well use it for all delays.
>>
>> Exactly! And my patch only changes __timer_const_udelay() so again I'm
>> not touching loop-based code.
> 
> What I'm trying to get through to you is that udelay() as a _whole_ does
> not provide a guarantee that it will wait for _at least_ the time you
> asked for.  All that it does is provide an _approximate_ delay.

I've taken time to digest the information you provided.

Let's forget the subject of my message, i.e. trying to
guarantee udelay(N) spins at least N microseconds.

There is, however, an important difference between loop-based
delays and timer-based delays; CPU frequencies typically fall
in the 50-5000 MHz range, while timer frequencies typically
span tens of kHz up to hundreds of MHz. For example, 90 kHz
is sometimes provided in multimedia systems (MPEG TS).

And rounding down the cycle count becomes a real problem
with coarse timers. Consider the aforementioned 90 kHz
timer, i.e. a period T ≈ 11.1 µs.

If a driver wants to spin for 5 µs, and the author was
careful enough to double the number as a safety cushion,
they'll request 10 µs. But since 10 µs is less than one
11.1 µs timer period, the tick count gets rounded down
to 0, and the delay vanishes entirely.

Therefore, I think timer-based loops should round up.

If you NAK the patch, that's OK. I'll drop the issue,
and only apply it locally.

Regards.

Patch provided for easier reference:

Comments

Russell King - ARM Linux April 10, 2015, 8:42 p.m. UTC | #1
On Fri, Apr 10, 2015 at 10:01:35PM +0200, Mason wrote:
> There is, however, an important difference between loop-based
> delays and timer-based delays; CPU frequencies typically fall
> in the 50-5000 MHz range, while timer frequencies typically
> span tens of kHz up to hundreds of MHz. For example, 90 kHz
> is sometimes provided in multimedia systems (MPEG TS).

Why would you want to use such a slowly clocked counter for something
which is supposed to be able to produce delays in the micro-second and
potentially the nanosecond range?

get_cycles(), which is what the timer based delay is based upon, is
supposed to be a _high resolution counter_, preferably running at
the same kind of speeds as the CPU, though with a fixed clock rate.
It most definitely is not supposed to be in the kHz range.
Mason April 10, 2015, 9:22 p.m. UTC | #2
On 10/04/2015 22:42, Russell King - ARM Linux wrote:
> On Fri, Apr 10, 2015 at 10:01:35PM +0200, Mason wrote:
>> There is, however, an important difference between loop-based
>> delays and timer-based delays; CPU frequencies typically fall
>> in the 50-5000 MHz range, while timer frequencies typically
>> span tens of kHz up to hundreds of MHz. For example, 90 kHz
>> is sometimes provided in multimedia systems (MPEG TS).
> 
> Why would you want to use such a slowly clocked counter for something
> which is supposed to be able to produce delays in the micro-second and
> potentially the nanosecond range?
> 
> get_cycles(), which is what the timer based delay is based upon, is
> supposed to be a _high resolution counter_, preferably running at
> the same kind of speeds as the CPU, though with a fixed clock rate.
> It most definitely is not supposed to be in the kHz range.

If there's only a single fixed clock in the system, I'd
use it for sched_clock, clocksource, and timer delay.
Are there other options?

It was you who wrote some time ago: "Timers are preferred
because of the problems with the software delay loop."
(My system implements DVFS.)

It seems to me that a 90 kHz timer is still better than
the jiffy counter, or am I mistaken again?

Regards.
Russell King - ARM Linux April 11, 2015, 7:30 a.m. UTC | #3
On Fri, Apr 10, 2015 at 11:22:56PM +0200, Mason wrote:
> On 10/04/2015 22:42, Russell King - ARM Linux wrote:
> > On Fri, Apr 10, 2015 at 10:01:35PM +0200, Mason wrote:
> >> There is, however, an important difference between loop-based
> >> delays and timer-based delays; CPU frequencies typically fall
> >> in the 50-5000 MHz range, while timer frequencies typically
> >> span tens of kHz up to hundreds of MHz. For example, 90 kHz
> >> is sometimes provided in multimedia systems (MPEG TS).
> > 
> > Why would you want to use such a slowly clocked counter for something
> > which is supposed to be able to produce delays in the micro-second and
> > potentially the nanosecond range?
> > 
> > get_cycles(), which is what the timer based delay is based upon, is
> > supposed to be a _high resolution counter_, preferably running at
> > the same kind of speeds as the CPU, though with a fixed clock rate.
> > It most definitely is not supposed to be in the kHz range.
> 
> If there's only a single fixed clock in the system, I'd
> use it for sched_clock, clocksource, and timer delay.
> Are there other options?
> 
> It was you who wrote some time ago: "Timers are preferred
> because of the problems with the software delay loop."
> (My system implements DVFS.)
> 
> It seems to me that a 90 kHz timer is still better than
> the jiffy counter, or am I mistaken again?

Given the choice of a 90kHz timer vs using a calibrated software delay
loop, the software delay loop wins.  I never envisioned that someone
would be silly enough to think that a 90kHz timer would somehow be
suitable to replace a software delay loop calibrated against a timer.
Mason April 11, 2015, 11:57 a.m. UTC | #4
On 11/04/2015 09:30, Russell King - ARM Linux wrote:

> On Fri, Apr 10, 2015 at 11:22:56PM +0200, Mason wrote:
>
>> It was you who wrote some time ago: "Timers are preferred
>> because of the problems with the software delay loop."
>> (My system implements DVFS.)
>>
>> It seems to me that a 90 kHz timer is still better than
>> the jiffy counter, or am I mistaken again?
> 
> Given the choice of a 90kHz timer vs using a calibrated software
> delay loop, the software delay loop wins.  I never envisioned that
> someone would be silly enough to think

I'm full of surprises.

> that a 90kHz timer would somehow be suitable to replace a software
> delay loop calibrated against a timer.

Only one message ago, you were arguing that loop-based delays
could be up to 50% inaccurate. Thus, if one wanted to spin for
500 µs, they'd have to request 1 ms just to be sure. An 11 µs
accuracy looks like a better deal to me, overall.

Add DVFS to the mix, and that 500 µs loop-based delay turns into
a 50 µs delay when the other core decides to boost the cluster
from 100 MHz to 1 GHz. And then drivers break randomly.

Regards.
Russell King - ARM Linux April 11, 2015, 12:10 p.m. UTC | #5
On Sat, Apr 11, 2015 at 01:57:12PM +0200, Mason wrote:
> On 11/04/2015 09:30, Russell King - ARM Linux wrote:
> 
> > On Fri, Apr 10, 2015 at 11:22:56PM +0200, Mason wrote:
> >
> >> It was you who wrote some time ago: "Timers are preferred
> >> because of the problems with the software delay loop."
> >> (My system implements DVFS.)
> >>
> >> It seems to me that a 90 kHz timer is still better than
> >> the jiffy counter, or am I mistaken again?
> > 
> > Given the choice of a 90kHz timer vs using a calibrated software
> > delay loop, the software delay loop wins.  I never envisioned that
> > someone would be silly enough to think
> 
> I'm full of surprises.
> 
> > that a 90kHz timer would somehow be suitable to replace a software
> > delay loop calibrated against a timer.
> 
> Only one message ago, you were arguing that loop-based delays
> could be up to 50% inaccurate. Thus, if one wanted to spin for
> 500 µs, they'd have to request 1 ms just to be sure. An 11 µs
> accuracy looks like a better deal to me, overall.
> 
> Add DVFS to the mix, and that 500 µs loop-based delay turns into
> a 50 µs delay when the other core decides to boost the cluster
> from 100 MHz to 1 GHz. And then drivers break randomly.

*Think* please.

What you've just said is total rubbish.

As you've already found out, with a 90kHz clock, if you request a 5µs
delay, you get a zero delay.

That's because 90kHz has _insufficient_ resolution to be used for a
microsecond delay, and in that case it is _FAR_ better to use the
software delay loop.

Asking for a 1µs delay and getting an 11µs delay is _total_ bollocks.  It's
worse than a 50% error.  It's an 1100% error.

As I say, *THINK*.  I know it's a foreign idea.

You're not going to convince me to accept the idea that a 90kHz
counter is going to be suitable for a delay.  In fact, what you're
actually doing is convincing me that we need to put a test into
arch/arm/lib/delay.c to stop this stupidity: prevent it considering
any counter which does not have a clock rate in the MHz range.

That means we will reject your stupid 90kHz counter, and your
problem will be solved.

What this also means is that if you only have a 90kHz counter
available, doing DVFS is also a stupid idea.
Mason April 11, 2015, 1:45 p.m. UTC | #6
On 11/04/2015 14:10, Russell King - ARM Linux wrote:

> What this also means is that if you only have a 90kHz counter
> available, doing DVFS is also a stupid idea.

I must be missing something, because I don't see why DVFS
(i.e. saving energy when the system is idle) would be a
stupid idea when only a 90 kHz counter is available.

What's the connection?

Patch

diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 312d43e..3cfbd07 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -66,7 +66,7 @@ static void __timer_const_udelay(unsigned long xloops)
 {
 	unsigned long long loops = xloops;
 	loops *= arm_delay_ops.ticks_per_jiffy;
-	__timer_delay(loops >> UDELAY_SHIFT);
+	__timer_delay((loops >> UDELAY_SHIFT) + 1);
 }