drm/mm: revert "Break long searches in fragmented address spaces"

Message ID 20200330123425.3944-1-christian.koenig@amd.com (mailing list archive)
State New, archived
Series drm/mm: revert "Break long searches in fragmented address spaces"

Commit Message

Christian König March 30, 2020, 12:34 p.m. UTC
This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d.

The drm_mm is supposed to work in atomic context, so calling schedule()
or in this case cond_resched() is illegal.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_mm.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Chris Wilson March 30, 2020, 12:40 p.m. UTC | #1
Quoting Christian König (2020-03-30 13:34:25)
> This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d.
> 
> The drm_mm is supposed to work in atomic context, so calling schedule()
> or in this case cond_resched() is illegal.

https://patchwork.freedesktop.org/patch/358535/?series=74984&rev=1

(Though I do question the wisdom in searching, rather than just doing a
cursory check, from an atomic context :)
-Chris
Christian König March 31, 2020, 8:59 a.m. UTC | #2
A not so gentle ping, since this pretty much broke all TTM based drivers.

Could we revert this for now?

Thanks,
Christian.

On 30.03.20 at 14:34, Christian König wrote:
> This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d.
>
> The drm_mm is supposed to work in atomic context, so calling schedule()
> or in this case cond_resched() is illegal.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/drm_mm.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index bc6e208949e8..8981abe8b7c9 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -45,7 +45,6 @@
>   #include <linux/export.h>
>   #include <linux/interval_tree_generic.h>
>   #include <linux/seq_file.h>
> -#include <linux/sched/signal.h>
>   #include <linux/slab.h>
>   #include <linux/stacktrace.h>
>   
> @@ -367,11 +366,6 @@ next_hole(struct drm_mm *mm,
>   	  struct drm_mm_node *node,
>   	  enum drm_mm_insert_mode mode)
>   {
> -	/* Searching is slow; check if we ran out of time/patience */
> -	cond_resched();
> -	if (fatal_signal_pending(current))
> -		return NULL;
> -
>   	switch (mode) {
>   	default:
>   	case DRM_MM_INSERT_BEST:
> @@ -563,7 +557,7 @@ int drm_mm_insert_node_in_range(struct drm_mm * const mm,
>   		return 0;
>   	}
>   
> -	return signal_pending(current) ? -ERESTARTSYS : -ENOSPC;
> +	return -ENOSPC;
>   }
>   EXPORT_SYMBOL(drm_mm_insert_node_in_range);
>
Daniel Vetter March 31, 2020, 9:16 a.m. UTC | #3
On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
> A not so gentle ping, since this pretty much broke all TTM based drivers.
> 
> Could we revert this for now?

Always ack for revert.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Needs to go to drm-misc-next-fixes, and then maybe also ask for a
backmerge since the patch landed pre-split. Also ping Maarten to do
another pull request (there's other stuff in there already anyway).
-Daniel
> 
> Thanks,
> Christian.
> 
> On 30.03.20 at 14:34, Christian König wrote:
> > This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d.
> > 
> > The drm_mm is supposed to work in atomic context, so calling schedule()
> > or in this case cond_resched() is illegal.
> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >   drivers/gpu/drm/drm_mm.c | 8 +-------
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> > index bc6e208949e8..8981abe8b7c9 100644
> > --- a/drivers/gpu/drm/drm_mm.c
> > +++ b/drivers/gpu/drm/drm_mm.c
> > @@ -45,7 +45,6 @@
> >   #include <linux/export.h>
> >   #include <linux/interval_tree_generic.h>
> >   #include <linux/seq_file.h>
> > -#include <linux/sched/signal.h>
> >   #include <linux/slab.h>
> >   #include <linux/stacktrace.h>
> > @@ -367,11 +366,6 @@ next_hole(struct drm_mm *mm,
> >   	  struct drm_mm_node *node,
> >   	  enum drm_mm_insert_mode mode)
> >   {
> > -	/* Searching is slow; check if we ran out of time/patience */
> > -	cond_resched();
> > -	if (fatal_signal_pending(current))
> > -		return NULL;
> > -
> >   	switch (mode) {
> >   	default:
> >   	case DRM_MM_INSERT_BEST:
> > @@ -563,7 +557,7 @@ int drm_mm_insert_node_in_range(struct drm_mm * const mm,
> >   		return 0;
> >   	}
> > -	return signal_pending(current) ? -ERESTARTSYS : -ENOSPC;
> > +	return -ENOSPC;
> >   }
> >   EXPORT_SYMBOL(drm_mm_insert_node_in_range);
>
Chris Wilson March 31, 2020, 9:19 a.m. UTC | #4
Quoting Christian König (2020-03-31 09:59:45)
> A not so gentle ping, since this pretty much broke all TTM based drivers.
> 
> Could we revert this for now?

Ping???
https://patchwork.freedesktop.org/patch/358535/?series=74984&rev=1
-Chris
Chris Wilson March 31, 2020, 9:20 a.m. UTC | #5
Quoting Daniel Vetter (2020-03-31 10:16:18)
> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
> > A not so gentle ping, since this pretty much broke all TTM based drivers.
> > 
> > Could we revert this for now?
> 
> Always ack for revert.
> 
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

So you didn't check the earlier patch either?
-Chris
Daniel Vetter March 31, 2020, 10:38 a.m. UTC | #6
On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Daniel Vetter (2020-03-31 10:16:18)
> > On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
> > > A not so gentle ping, since this pretty much broke all TTM based drivers.
> > >
> > > Could we revert this for now?
> >
> > Always ack for revert.
> >
> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> So you didn't check the earlier patch either?

I did, but wasn't super sold on the idea of more flags to smack an r-b
onto it, so figured I'll throw the default ack-for-revert on this
meanwhile.
-Daniel
Christian König March 31, 2020, 12:44 p.m. UTC | #7
On 31.03.20 at 12:38, Daniel Vetter wrote:
> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> Quoting Daniel Vetter (2020-03-31 10:16:18)
>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
>>>> A not so gentle ping, since this pretty much broke all TTM based drivers.
>>>>
>>>> Could we revert this for now?
>>> Always ack for revert.
>>>
>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> So you didn't check the earlier patch either?
> I did, but wasn't super sold on the idea of more flags to smack an r-b
> onto it, so figured I'll throw the default ack-for-revert on this
> meanwhile.

Mhm, and there is something wrong with either dri-devel or patchwork (I 
suspect the former).

I can't see your reply on patchwork and it entered my inbox much later 
than Daniel's mails.

Christian.

> -Daniel
Chris Wilson March 31, 2020, 1:19 p.m. UTC | #8
Quoting Daniel Vetter (2020-03-31 11:38:50)
> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Quoting Daniel Vetter (2020-03-31 10:16:18)
> > > On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
> > > > A not so gentle ping, since this pretty much broke all TTM based drivers.
> > > >
> > > > Could we revert this for now?
> > >
> > > Always ack for revert.
> > >
> > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > So you didn't check the earlier patch either?
> 
> I did, but wasn't super sold on the idea of more flags to smack an r-b
> onto it, so figured I'll throw the default ack-for-revert on this
> meanwhile.

We allow userspace to poison the drm_mm at roughly 8K intervals, a
search space of 35b with typically O(N^2) behaviour and each node
traversal (rb_next/rb_prev) will itself be costly. Even our simple tests
can generate a search of several minutes before our patience runs out.
Any drm_mm that allows for userspace to control alignment can be
arbitrarily fragmented, hence a raised eyebrow that this search would be
allowed in atomic context.
-Chris
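
To put rough numbers on the scenario described above, here is a back-of-the-envelope sketch in userspace C. It assumes that "35b" means a 35-bit (32 GiB) address space and that one node is pinned every 8 KiB. Both helper names are hypothetical and are not part of drm_mm.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of nodes when one node is placed every
 * `stride` bytes across an address space of `addr_bits` bits.
 */
static uint64_t poisoned_nodes(unsigned int addr_bits, uint64_t stride)
{
	assert(addr_bits < 64 && stride != 0);
	return (UINT64_C(1) << addr_bits) / stride;
}

/*
 * Hypothetical helper: worst-case node visits if each of the n candidate
 * holes can trigger a walk over up to n nodes (the O(N^2) case).
 */
static uint64_t quadratic_visits(uint64_t n)
{
	return n * n;
}
```

Under these assumptions, poisoned_nodes(35, 8192) yields 2^22 (about 4.2 million) nodes, and quadratic_visits() of that count is 2^44, roughly 1.8e13 node visits in the worst case.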
Christian König April 1, 2020, 7:29 a.m. UTC | #9
On 31.03.20 at 15:19, Chris Wilson wrote:
> Quoting Daniel Vetter (2020-03-31 11:38:50)
>> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>>> Quoting Daniel Vetter (2020-03-31 10:16:18)
>>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
>>>>> A not so gentle ping, since this pretty much broke all TTM based drivers.
>>>>>
>>>>> Could we revert this for now?
>>>> Always ack for revert.
>>>>
>>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> So you didn't check the earlier patch either?
>> I did, but wasn't super sold on the idea of more flags to smack an r-b
>> onto it, so figured I'll throw the default ack-for-revert on this
>> meanwhile.
> We allow userspace to poison the drm_mm at roughly 8K intervals, a
> search space of 35b with typically O(N^2) behaviour and each node
> traversal (rb_next/rb_prev) will itself be costly. Even our simple tests
> can generate a search of several minutes before our patience runs out.
> Any drm_mm that allows for userspace to control alignment can be
> arbitrarily fragmented, hence a raised eyebrow that this search would be
> allowed in atomic context.

Wow, that is indeed quite a lot.

What are the criteria used for ordering the tree? Just the size, or is it
size+alignment?

Never looked into this, but maybe we have a low hanging fruit for an 
improvement here?

I'm not 100% sure, but moving away from atomic context wouldn't be that 
easy.

Christian.

> -Chris
Chris Wilson April 1, 2020, 8:53 a.m. UTC | #10
Quoting Christian König (2020-04-01 08:29:34)
> On 31.03.20 at 15:19, Chris Wilson wrote:
> > Quoting Daniel Vetter (2020-03-31 11:38:50)
> >> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >>> Quoting Daniel Vetter (2020-03-31 10:16:18)
> >>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
> >>>>> A not so gentle ping, since this pretty much broke all TTM based drivers.
> >>>>>
> >>>>> Could we revert this for now?
> >>>> Always ack for revert.
> >>>>
> >>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>> So you didn't check the earlier patch either?
> >> I did, but wasn't super sold on the idea of more flags to smack an r-b
> >> onto it, so figured I'll throw the default ack-for-revert on this
> >> meanwhile.
> > We allow userspace to poison the drm_mm at roughly 8K intervals, a
> > search space of 35b with typically O(N^2) behaviour and each node
> > traversal (rb_next/rb_prev) will itself be costly. Even our simple tests
> > can generate a search of several minutes before our patience runs out.
> > Any drm_mm that allows for userspace to control alignment can be
> > arbitrarily fragmented, hence a raised eyebrow that this search would be
> > allowed in atomic context.
> 
> Wow, that is indeed quite a lot.
> 
> What are the criteria used for ordering the tree? Just the size or is that
> size+alignment?

The tree is just size. Alignment is a little used parameter, but there's
a requirement for userspace to be able to control it -- although it is
strictly the older interface, it is still open to abuse.

Converting the tree to [size, ffs(addr)] would help for many, but on top
of that we have zones in the drm_mm, so search-in-range can be abused on
top of search-for-alignment.
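
A minimal userspace sketch of what a [size, ffs(addr)] ordering key could look like follows. The struct and both function names are hypothetical illustrations, not the kernel's struct drm_mm_node or any existing drm_mm helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hole descriptor; not the kernel's struct drm_mm_node. */
struct hole {
	uint64_t start;
	uint64_t size;
};

/*
 * 1-based index of the lowest set bit, 0 if addr == 0 (same convention
 * as ffs()). Note: address 0 satisfies every alignment, so a real
 * implementation would need to special-case it.
 */
static unsigned int ffs64(uint64_t addr)
{
	unsigned int bit = 1;

	if (addr == 0)
		return 0;
	while ((addr & 1) == 0) {
		addr >>= 1;
		bit++;
	}
	return bit;
}

/* Order holes by size first, then by the alignment of the start address. */
static int hole_cmp(const struct hole *a, const struct hole *b)
{
	unsigned int fa, fb;

	assert(a && b);
	if (a->size != b->size)
		return a->size < b->size ? -1 : 1;

	fa = ffs64(a->start);
	fb = ffs64(b->start);
	if (fa != fb)
		return fa < fb ? -1 : 1;
	return 0;
}
```

With a key like this, holes of equal size are grouped by how well aligned their start address is, so a search for a given alignment can skip the poorly aligned ones. It does not address the search-in-range case mentioned above.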
 
> Never looked into this, but maybe we have a low hanging fruit for an 
> improvement here?

A bit -- alignment is so rarely used in practice, optimising it was not
a concern, just someone else has now noticed the potential for abuse.
They ran a test, got bored and complained that it didn't respond to ^C
for a long period of time and from that derived a proof-of-concept test to
show how it can be used by one client to upset another :|
 
> I'm not 100% sure, but moving away from atomic context wouldn't be that 
> easy.

Fair enough. I would not worry unless the layout is controllable by the
user -- but we probably want some other means of bounding the search.
-Chris
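
One way to picture "some other means of bounding the search" is a step budget that never sleeps, so it would remain usable from atomic context. The following is a userspace sketch under that assumption only: struct hole, align_up(), bounded_find() and the max_steps parameter are all hypothetical, and the linear scan stands in for the real rbtree walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hole descriptor; not the kernel's struct drm_mm_node. */
struct hole {
	uint64_t start;
	uint64_t size;
};

/* Round addr up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t alignment)
{
	return (addr + alignment - 1) & ~(alignment - 1);
}

/*
 * Scan the holes for one that fits `size` bytes at `alignment`, but
 * inspect at most `max_steps` holes. Returns 0 and stores the chosen
 * address in *out on success, -ENOSPC if nothing fit within the budget.
 */
static int bounded_find(const struct hole *holes, size_t nholes,
			uint64_t size, uint64_t alignment,
			size_t max_steps, uint64_t *out)
{
	size_t i;

	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

	for (i = 0; i < nholes && i < max_steps; i++) {
		uint64_t start = align_up(holes[i].start, alignment);
		uint64_t pad = start - holes[i].start;

		/* The hole must absorb the alignment padding and the node. */
		if (pad <= holes[i].size && holes[i].size - pad >= size) {
			*out = start;
			return 0;
		}
	}
	return -ENOSPC;
}
```

Returning -ENOSPC when the budget runs out conflates "no space" with "gave up looking"; whether callers could tolerate that is a design question this sketch leaves open.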
Christian König April 1, 2020, 9:17 a.m. UTC | #11
On 01.04.20 at 10:53, Chris Wilson wrote:
> Quoting Christian König (2020-04-01 08:29:34)
>> On 31.03.20 at 15:19, Chris Wilson wrote:
>>> Quoting Daniel Vetter (2020-03-31 11:38:50)
>>>> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>>>>> Quoting Daniel Vetter (2020-03-31 10:16:18)
>>>>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
>>>>>> [SNIP]
>>> We allow userspace to poison the drm_mm at roughly 8K intervals, a
>>> search space of 35b with typically O(N^2) behaviour and each node
>>> traversal (rb_next/rb_prev) will itself be costly. Even our simple tests
>>> can generate a search of several minutes before our patience runs out.
>>> Any drm_mm that allows for userspace to control alignment can be
>>> arbitrarily fragmented, hence a raised eyebrow that this search would be
>>> allowed in atomic context.
>> Wow, that is indeed quite a lot.
>>
>> What are the criteria used for ordering the tree? Just the size or is that
>> size+alignment?
> The tree is just size. Alignment is a little used parameter, but there's
> a requirement for userspace to be able to control it -- although it is
> strictly the older interface, it is still open to abuse.
>
> Converting the tree to [size, ffs(addr)] would help for many, but on top
> of that we have zones in the drm_mm, so search-in-range can be abused on
> top of search-for-alignment.

The difference is that search in range is not controllable by userspace, 
but at least for amdgpu the alignment is very well controllable.

>> Never looked into this, but maybe we have a low hanging fruit for an
>> improvement here?
> A bit -- alignment is so rarely used in practice, optimising it was not
> a concern, just someone else has now noticed the potential for abuse.

Well we do use alignment rather widely. IIRC we can have everything 
between 4K and 2MB based on the tiling flags, memory channel config etc
etc...

> They ran a test, got bored and complained that it didn't respond to ^C
> for a long period of time and from that derived a proof-of-concept test to
> show how it can be used by one client to upset another :|

And as far as I can see that is a really valid problem we need to fix. 
Give me a second to write a test case for this.

Thanks for pointing that out,
Christian.

>   
>> I'm not 100% sure, but moving away from atomic context wouldn't be that
>> easy.
> Fair enough. I would not worry unless the layout is controllable by the
> user -- but we probably want some other means of bounding the search.
> -Chris
Christian König April 6, 2020, 8:25 a.m. UTC | #12
[Adding Nirmoy, setting a bunch of people to BCC]

This bubbled up in our internal testing as well. Nirmoy now wants to 
take a look at it.

On 01.04.20 at 11:17, Christian König wrote:
> On 01.04.20 at 10:53, Chris Wilson wrote:
>> Quoting Christian König (2020-04-01 08:29:34)
>>> On 31.03.20 at 15:19, Chris Wilson wrote:
>>>> Quoting Daniel Vetter (2020-03-31 11:38:50)
>>>>> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson 
>>>>> <chris@chris-wilson.co.uk> wrote:
>>>>>> Quoting Daniel Vetter (2020-03-31 10:16:18)
>>>>>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote:
>>>>>>> [SNIP]
>>>> We allow userspace to poison the drm_mm at roughly 8K intervals, a
>>>> search space of 35b with typically O(N^2) behaviour and each node
>>>> traversal (rb_next/rb_prev) will itself be costly. Even our simple 
>>>> tests
>>>> can generate a search of several minutes before our patience runs 
>>>> out.
>>>> Any drm_mm that allows for userspace to control alignment can be
>>>> arbitrarily fragmented, hence a raised eyebrow that this search 
>>>> would be
>>>> allowed in atomic context.
>>> Wow, that is indeed quite a lot.
>>>
>>> What are the criteria used for ordering the tree? Just the size or is
>>> that
>>> size+alignment?
>> The tree is just size. Alignment is a little used parameter, but there's
>> a requirement for userspace to be able to control it -- although it is
>> strictly the older interface, it is still open to abuse.
>>
>> Converting the tree to [size, ffs(addr)] would help for many, but on top
>> of that we have zones in the drm_mm, so search-in-range can be abused on
>> top of search-for-alignment.
>
> The difference is that search in range is not controllable by 
> userspace, but at least for amdgpu the alignment is very well 
> controllable.
>
>>> Never looked into this, but maybe we have a low hanging fruit for an
>>> improvement here?
>> A bit -- alignment is so rarely used in practice, optimising it was not
>> a concern, just someone else has now noticed the potential for abuse.
>
> Well we do use alignment rather widely. IIRC we can have everything 
> between 4K and 2MB based on the tiling flags, memory channel config
> etc etc...
>
>> They ran a test, got bored and complained that it didn't respond to ^C
>> for a long period of time and from that derived a proof-of-concept
>> test to
>> show how it can be used by one client to upset another :|
>
> And as far as I can see that is a really valid problem we need to fix. 
> Give me a second to write a test case for this.

Nirmoy, could you tackle this first? I've come up with some very quick
and dirty code for this for our libdrm unit tests.

Ping me internally and we can chat about it.

Thanks,
Christian.

>
> Thanks for pointing that out,
> Christian.
Patch

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index bc6e208949e8..8981abe8b7c9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -45,7 +45,6 @@ 
 #include <linux/export.h>
 #include <linux/interval_tree_generic.h>
 #include <linux/seq_file.h>
-#include <linux/sched/signal.h>
 #include <linux/slab.h>
 #include <linux/stacktrace.h>
 
@@ -367,11 +366,6 @@  next_hole(struct drm_mm *mm,
 	  struct drm_mm_node *node,
 	  enum drm_mm_insert_mode mode)
 {
-	/* Searching is slow; check if we ran out of time/patience */
-	cond_resched();
-	if (fatal_signal_pending(current))
-		return NULL;
-
 	switch (mode) {
 	default:
 	case DRM_MM_INSERT_BEST:
@@ -563,7 +557,7 @@  int drm_mm_insert_node_in_range(struct drm_mm * const mm,
 		return 0;
 	}
 
-	return signal_pending(current) ? -ERESTARTSYS : -ENOSPC;
+	return -ENOSPC;
 }
 EXPORT_SYMBOL(drm_mm_insert_node_in_range);