drm/i915: WARN if we hit a signal from kernel context

Message ID	20180403175855.30858-1-daniel.vetter@ffwll.ch (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Daniel Vetter <daniel.vetter@ffwll.ch> To: Intel Graphics Development <intel-gfx@lists.freedesktop.org> Date: Tue, 3 Apr 2018 19:58:55 +0200 Message-Id: <20180403175855.30858-1-daniel.vetter@ffwll.ch> In-Reply-To: <20180403174604.30529-1-daniel.vetter@ffwll.ch> References: <20180403174604.30529-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH] drm/i915: WARN if we hit a signal from kernel context Precedence: list Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, Matthew Wilcox <willy@infradead.org> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Message ID

20180403175855.30858-1-daniel.vetter@ffwll.ch (mailing list archive)

State

New, archived

Headers

From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Date: Tue,  3 Apr 2018 19:58:55 +0200
Message-Id: <20180403175855.30858-1-daniel.vetter@ffwll.ch>
In-Reply-To: <20180403174604.30529-1-daniel.vetter@ffwll.ch>
References: <20180403174604.30529-1-daniel.vetter@ffwll.ch>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH] drm/i915: WARN if we hit a signal from kernel
	context
Precedence: list
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Matthew Wilcox <willy@infradead.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Daniel Vetter April 3, 2018, 5:58 p.m. UTC

After a discussion with Wily I got the nagging feeling we might have
some cases of nasty busy loops. The window is fairly small since we
always have a non-faulting fastpath (using page_fault_dis|enable())
first, usually followed by a pile of pending signal checks, before we
go into the slowpath copy_to|from_user that might blow up for real.

Test patch to check what CI thinks of this theory.

v2: Don't WARN on success (Ville).

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Daniel Vetter April 4, 2018, 9:24 a.m. UTC | #1

On Tue, Apr 03, 2018 at 07:58:55PM +0200, Daniel Vetter wrote:
> After a discussion with Wily I got the nagging feeling we might have
> some cases of nasty busy loops. The window is fairly small since we
> always have a non-faulting fastpath (using page_fault_dis|enable())
> first, usually followed by a pile of pending signal checks, before we
> go into the slowpath copy_to|from_user that might blow up for real.
> 
> Test patch to check what CI thinks of this theory.
> 
> v2: Don't WARN on success (Ville).
> 
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Ok, results from CI are in, we're hitting this path with our test-suite,
and nothing bad happens.

From my very layman reading of the x86 uaccess code what happens is that
we take a fault, don't insert the pte because there's a signal pending
(VM_FAULT_NOPAGE case). But since it's a kernel fault we don't just
restart the same instruction, but go do the fixup code.

That calls copy_user_handle_tail, which does a dead-slow bytewise copy. If
that fails again, then there's a different fixup handler which just gives
up and results in a short read.

In eb_relocate_slow we retry the entire thing upon short reads, and the
first thing we do is check for signals. If that's the case we bail out,
and change the errno from -EFAULT to -ERESTARTSYS.

For most other parts (and everything else in the kernel I suspect) we get
an -EFAULT or similar indicator of short reads/writes.

Kernel at large seems indeed not prepared to deal with page faults not
going through because of signals, but then I think we get away with that
because you should pass graphics buffer addresses to random things anyway.
Both GL and vk (and everything else) have explicit upload/download
functions, and you can't just write() into gpu memory. I think.

tldr; I agree with Wily that we're probably breaking kernel fault api
expectations here, but I think it's ok ...

Only way I can see us fix this reliably is if we have a
pending_signal_disable/enable() pair, which makes all interruptible sleeps
non-interruptible. I don't core think kernel devs would like that :-)

Cheers, Daniel
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9650a7b10c5f..9766fa152e05 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1902,6 +1902,7 @@ int i915_gem_fault(struct vm_fault *vmf)
>  	struct i915_vma *vma;
>  	pgoff_t page_offset;
>  	unsigned int flags;
> +	bool user_fault = vmf->flags & FAULT_FLAG_USER;
>  	int ret;
>  
>  	/* We don't use vmf->pgoff since that has the fake offset */
> @@ -2017,9 +2018,10 @@ int i915_gem_fault(struct vm_fault *vmf)
>  		 * handler to reset everything when re-faulting in
>  		 * i915_mutex_lock_interruptible.
>  		 */
> -	case 0:
>  	case -ERESTARTSYS:
>  	case -EINTR:
> +		WARN_ON(!user_fault);
> +	case 0:
>  	case -EBUSY:
>  		/*
>  		 * EBUSY is ok: this just means that another thread
> -- 
> 2.16.2
>

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9650a7b10c5f..9766fa152e05 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1902,6 +1902,7 @@  int i915_gem_fault(struct vm_fault *vmf)
 	struct i915_vma *vma;
 	pgoff_t page_offset;
 	unsigned int flags;
+	bool user_fault = vmf->flags & FAULT_FLAG_USER;
 	int ret;
 
 	/* We don't use vmf->pgoff since that has the fake offset */
@@ -2017,9 +2018,10 @@  int i915_gem_fault(struct vm_fault *vmf)
 		 * handler to reset everything when re-faulting in
 		 * i915_mutex_lock_interruptible.
 		 */
-	case 0:
 	case -ERESTARTSYS:
 	case -EINTR:
+		WARN_ON(!user_fault);
+	case 0:
 	case -EBUSY:
 		/*
 		 * EBUSY is ok: this just means that another thread

drm/i915: WARN if we hit a signal from kernel context

Commit Message

Comments

Patch