__i915_spin_request() sucks

Message ID	20151113151204.GA8939@kernel.dk (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> Date: Fri, 13 Nov 2015 08:12:04 -0700 From: Jens Axboe <axboe@kernel.dk> To: Chris Wilson <chris@chris-wilson.co.uk>, Daniel Vetter <daniel.vetter@ffwll.ch>, DRI Development <dri-devel@lists.freedesktop.org>, LKML <linux-kernel@vger.kernel.org> Subject: Re: __i915_spin_request() sucks Message-ID: <20151113151204.GA8939@kernel.dk> References: <5644F850.2060803@kernel.dk> <5644F941.9090505@kernel.dk> <20151112221908.GA26194@nuc-i3427.alporthouse.com> <56451812.2050704@kernel.dk> <20151113091558.GN6247@nuc-i3427.alporthouse.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20151113091558.GN6247@nuc-i3427.alporthouse.com> FCC: imap://axboe%40kernel.dk@imap.gmail.com/[Gmail]/Sent Mail User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 Precedence: list Content-Transfer-Encoding: base64 Content-Type: text/plain; charset="utf-8"; Format="flowed" Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Message ID

20151113151204.GA8939@kernel.dk (mailing list archive)

State

New, archived

Headers

Date: Fri, 13 Nov 2015 08:12:04 -0700
From: Jens Axboe <axboe@kernel.dk>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: __i915_spin_request() sucks
Message-ID: <20151113151204.GA8939@kernel.dk>
References: <5644F850.2060803@kernel.dk> <5644F941.9090505@kernel.dk>
	<20151112221908.GA26194@nuc-i3427.alporthouse.com>
	<56451812.2050704@kernel.dk>
	<20151113091558.GN6247@nuc-i3427.alporthouse.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20151113091558.GN6247@nuc-i3427.alporthouse.com>
FCC: imap://axboe%40kernel.dk@imap.gmail.com/[Gmail]/Sent Mail
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.3.0
Precedence: list
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset="utf-8"; Format="flowed"
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Commit Message

Jens Axboe Nov. 13, 2015, 3:12 p.m. UTC

On 11/13/2015 02:15 AM, Chris Wilson wrote:
> On Thu, Nov 12, 2015 at 03:52:02PM -0700, Jens Axboe wrote:
>> On 11/12/2015 03:19 PM, Chris Wilson wrote:
>>>>> So today, I figured I'd try just killing that spin. If it fails, we'll
>>>>> punt to normal completions, so easy change. And wow, MASSIVE difference.
>>>>> I can now scroll in chrome and not rage! It's like the laptop is 10x
>>>>> faster now.
>>>>>
>>>>> Ran git blame, and found:
>>>>>
>>>>> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
>>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Date:   Tue Apr 7 16:20:41 2015 +0100
>>>>>
>>>>>      drm/i915: Optimistically spin for the request completion
>>>>>
>>>>> and read the commit message. Doesn't sound that impressive. Especially
>>>>> not for something that screws up interactive performance by a LOT.
>>>>>
>>>>> What's the deal? Revert?
>>>
>>> The tests that it improved the most were the latency sensitive tests and
>>> since my Broadwell xps13 behaves itself, I'd like to understand how it
>>> culminates in an interactivity loss.
>>>
>>> 1. Maybe it is the uninterruptible nature of the polling, making X's
>>> SIGIO jerky:
>>
>> This one still feels bad.
>>
>>> 2. Or maybe it is increased mutex contention:
>>
>> And so does this one... I had to manually apply hunks 2-3, and after
>> doing seat-of-the-pants testing for both variants, I confirmed with
>> perf that we're still seeing a ton of time in __i915_wait_request()
>> for both of them.
>>
>>> Or maybe it is an indirect effect, such as power balancing between the
>>> CPU and GPU, or just thermal throttling, or it may be the task being
>>> penalised for consuming its timeslice (for which any completion polling
>>> seems susceptible).
>>
>> Look, polls in the 1-10ms range are just insane. Either you botched
>> the commit message and really meant "~1ms at most" and in which case
>> I'd suspect you of smoking something good, or you hacked it up wrong
>> and used jiffies when you really wanted to be using some other time
>> check that really did give you 1us.
>
> What other time check? I was under the impression of setting up a
> hrtimer was expensive and jiffies was available.

Looping for 10ms is a lot more expensive :-). jiffies is always there, 
but it's WAY too coarse to be used for something like this.

You could use ktime_get(), there's a lot of helpers for checking 
time_after, adding msecs, etc. Something like the below, not tested here
yet.

>> I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.
>>
>>>> "Limit the spinning to a single jiffie (~1us) at most"
>>>>
>>>> is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
>>>> Even if I had HZ=1000, that'd still be 1ms of spinning. That's
>>>> seriously screwed up, guys.
>>>
>>> That's over and above the termination condition for blk_poll().
>>
>> ?! And this is related, how? Comparing apples and oranges. One is a
>> test opt-in feature for experimentation, the other is
>> unconditionally enabled for everyone. I believe the commit even says
>> so. See the difference? Would I use busy loop spinning waiting for
>> rotating storage completions, which are in the 1-10ms range? No,
>> with the reason being that the potential wins for spins are in the
>> usec range.
>
> Equally I expect the service interval for a batch to be around 2-20us.
> There are many workloads that execute 30-50k requests/s, and you can
> appreciate that they are sensitive to the latency in setting up an irq
> and receiving it - just as equally leaving that irq enabled saturates a
> CPU with simply executing the irq handler. So what mechanism do you use
> to guard against either a very long queue depth or an abnormal request
> causing msec+ spins?

Not disputing that polling can work, but it needs to be a bit more 
clever. Do you know which requests are fast and which ones are not? 
Could you track it? Should we make this a module option?

20usec is too long to poll. If we look at the wins of polling, we're
talking anywhere from 1-2 usec to maybe 5 usec, depending on different
factors. So spinning between 1-3 usec should be a hard limit on most
platforms. And it's somewhat of a policy decision, since it does involve
throwing CPU at the problem. There's a crossover point where below it's
always a win, but that needs a lot more work than just optimistic
spinning for everything. You also do need a check for whether the task
has been woken up, that's also missing.

As for interrupt mitigation, I'd consider that a separate problem. It's
a lot simpler than the app induced polling that i915 is doing here. So
if overloading a core with IRQs is an issue, I'd solve that differently
similarly to NAPI or blk-iopoll (not to be confused with blk_poll() that
you referenced and is app induced polling).

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5cf4a1998273..658514e899b1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1148,17 +1148,19 @@  static bool missed_irq(struct drm_i915_private *dev_priv,
 
 static int __i915_spin_request(struct drm_i915_gem_request *req)
 {
-	unsigned long timeout;
+	ktime_t start, end;
 
 	if (i915_gem_request_get_ring(req)->irq_refcount)
 		return -EBUSY;
 
-	timeout = jiffies + 1;
+	start = ktime_get();
+	end.tv64 = start.tv64;
+	ktime_add_us(end, 1);
 	while (!need_resched()) {
 		if (i915_gem_request_completed(req, true))
 			return 0;
 
-		if (time_after_eq(jiffies, timeout))
+		if (ktime_after(start, end))
 			break;
 
 		cpu_relax_lowlatency();

__i915_spin_request() sucks

Commit Message

Patch