Message ID | 20200330123425.3944-1-christian.koenig@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/mm: revert "Break long searches in fragmented address spaces" | expand |
Quoting Christian König (2020-03-30 13:34:25) > This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d. > > The drm_mm is supposed to work in atomic context, so calling schedule() > or in this case cond_resched() is illegal. https://patchwork.freedesktop.org/patch/358535/?series=74984&rev=1 (Though I do question the wisdom in searching, rather than just doing a cursory check, from an atomic context :) -Chris
A not so gentle ping, since this pretty much broke all TTM based drivers. Could we revert this for now? Thanks, Christian. Am 30.03.20 um 14:34 schrieb Christian König: > This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d. > > The drm_mm is supposed to work in atomic context, so calling schedule() > or in this case cond_resched() is illegal. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/drm_mm.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c > index bc6e208949e8..8981abe8b7c9 100644 > --- a/drivers/gpu/drm/drm_mm.c > +++ b/drivers/gpu/drm/drm_mm.c > @@ -45,7 +45,6 @@ > #include <linux/export.h> > #include <linux/interval_tree_generic.h> > #include <linux/seq_file.h> > -#include <linux/sched/signal.h> > #include <linux/slab.h> > #include <linux/stacktrace.h> > > @@ -367,11 +366,6 @@ next_hole(struct drm_mm *mm, > struct drm_mm_node *node, > enum drm_mm_insert_mode mode) > { > - /* Searching is slow; check if we ran out of time/patience */ > - cond_resched(); > - if (fatal_signal_pending(current)) > - return NULL; > - > switch (mode) { > default: > case DRM_MM_INSERT_BEST: > @@ -563,7 +557,7 @@ int drm_mm_insert_node_in_range(struct drm_mm * const mm, > return 0; > } > > - return signal_pending(current) ? -ERESTARTSYS : -ENOSPC; > + return -ENOSPC; > } > EXPORT_SYMBOL(drm_mm_insert_node_in_range); >
On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: > A not so gentle ping, since this pretty much broke all TTM based drivers. > > Could we revert this for now? Always ack for revert. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Needs to go to drm-misc-next-fixes, and then maybe also ask for a backmerge since the patch landed pre-split. Also ping Maarten to do another pull request (there's other stuff in there already anyway). -Daniel > > Thanks, > Christian. > > Am 30.03.20 um 14:34 schrieb Christian König: > > This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d. > > > > The drm_mm is supposed to work in atomic context, so calling schedule() > > or in this case cond_resched() is illegal. > > > > Signed-off-by: Christian König <christian.koenig@amd.com> > > --- > > drivers/gpu/drm/drm_mm.c | 8 +------- > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c > > index bc6e208949e8..8981abe8b7c9 100644 > > --- a/drivers/gpu/drm/drm_mm.c > > +++ b/drivers/gpu/drm/drm_mm.c > > @@ -45,7 +45,6 @@ > > #include <linux/export.h> > > #include <linux/interval_tree_generic.h> > > #include <linux/seq_file.h> > > -#include <linux/sched/signal.h> > > #include <linux/slab.h> > > #include <linux/stacktrace.h> > > @@ -367,11 +366,6 @@ next_hole(struct drm_mm *mm, > > struct drm_mm_node *node, > > enum drm_mm_insert_mode mode) > > { > > - /* Searching is slow; check if we ran out of time/patience */ > > - cond_resched(); > > - if (fatal_signal_pending(current)) > > - return NULL; > > - > > switch (mode) { > > default: > > case DRM_MM_INSERT_BEST: > > @@ -563,7 +557,7 @@ int drm_mm_insert_node_in_range(struct drm_mm * const mm, > > return 0; > > } > > - return signal_pending(current) ? -ERESTARTSYS : -ENOSPC; > > + return -ENOSPC; > > } > > EXPORT_SYMBOL(drm_mm_insert_node_in_range); >
Quoting Christian König (2020-03-31 09:59:45) > A not so gentle ping, since this pretty much broke all TTM based drivers. > > Could we revert this for now? Ping??? https://patchwork.freedesktop.org/patch/358535/?series=74984&rev=1 -Chris
Quoting Daniel Vetter (2020-03-31 10:16:18) > On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: > > A not so gentle ping, since this pretty much broke all TTM based drivers. > > > > Could we revert this for now? > > Always ack for revert. > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> So you didn't check the earlier patch either? -Chris
On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Daniel Vetter (2020-03-31 10:16:18) > > On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: > > > A not so gentle ping, since this pretty much broke all TTM based drivers. > > > > > > Could we revert this for now? > > > > Always ack for revert. > > > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > So you didn't check the earlier patch either? I did, but wasn't super sold on the idea of more flags to smack an r-b onto it, so figured I'll throw the default ack-for-revert on this meanwhile. -Daniel
Am 31.03.20 um 12:38 schrieb Daniel Vetter: > On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote: >> Quoting Daniel Vetter (2020-03-31 10:16:18) >>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: >>>> A not so gentle ping, since this pretty much broke all TTM based drivers. >>>> >>>> Could we revert this for now? >>> Always ack for revert. >>> >>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> >> So you didn't check the earlier patch either? > I did, but wasn't super sold on the idea of more flags to smack an r-b > onto it, so figured I'll throw the default ack-for-revert on this > meanwhile. Mhm, and there is something wrong with either dri-devel or patchwork (I suspect the former). I can't see your reply on patchwork and it entered my inbox much later than Daniels mails. Christian. > -Daniel
Quoting Daniel Vetter (2020-03-31 11:38:50) > On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Quoting Daniel Vetter (2020-03-31 10:16:18) > > > On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: > > > > A not so gentle ping, since this pretty much broke all TTM based drivers. > > > > > > > > Could we revert this for now? > > > > > > Always ack for revert. > > > > > > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > > > So you didn't check the earlier patch either? > > I did, but wasn't super sold on the idea of more flags to smack an r-b > onto it, so figured I'll throw the default ack-for-revert on this > meanwhile. We allow userspace to poison the drm_mm at roughly 8K intervals, a search space of 35b with typically O(N^2) behaviour and each node traversal (rb_next/rb_prev) will itself be costly. Even our simple tests can generate a search of several minutes before our patience runs out.o Any drm_mm that allows for userspace to control alignment can be arbitrarily fragmented, hence a raised eyebrow that this search would be allowed in atomic context. -Chris
Am 31.03.20 um 15:19 schrieb Chris Wilson: > Quoting Daniel Vetter (2020-03-31 11:38:50) >> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote: >>> Quoting Daniel Vetter (2020-03-31 10:16:18) >>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: >>>>> A not so gentle ping, since this pretty much broke all TTM based drivers. >>>>> >>>>> Could we revert this for now? >>>> Always ack for revert. >>>> >>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> >>> So you didn't check the earlier patch either? >> I did, but wasn't super sold on the idea of more flags to smack an r-b >> onto it, so figured I'll throw the default ack-for-revert on this >> meanwhile. > We allow userspace to poison the drm_mm at roughly 8K intervals, a > search space of 35b with typically O(N^2) behaviour and each node > traversal (rb_next/rb_prev) will itself be costly. Even our simple tests > can generate a search of several minutes before our patience runs out.o > Any drm_mm that allows for userspace to control alignment can be > arbitrarily fragmented, hence a raised eyebrow that this search would be > allowed in atomic context. Wow, that is indeed quite a lot. What is the criteria use for ordering the tree? Just the size or is that size+alignment? Never looked into this, but maybe we have a low hanging fruit for an improvement here? I'm not 100% sure, but moving away from atomic context wouldn't be that easy. Christian. > -Chris
Quoting Christian König (2020-04-01 08:29:34) > Am 31.03.20 um 15:19 schrieb Chris Wilson: > > Quoting Daniel Vetter (2020-03-31 11:38:50) > >> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote: > >>> Quoting Daniel Vetter (2020-03-31 10:16:18) > >>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: > >>>>> A not so gentle ping, since this pretty much broke all TTM based drivers. > >>>>> > >>>>> Could we revert this for now? > >>>> Always ack for revert. > >>>> > >>>> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> > >>> So you didn't check the earlier patch either? > >> I did, but wasn't super sold on the idea of more flags to smack an r-b > >> onto it, so figured I'll throw the default ack-for-revert on this > >> meanwhile. > > We allow userspace to poison the drm_mm at roughly 8K intervals, a > > search space of 35b with typically O(N^2) behaviour and each node > > traversal (rb_next/rb_prev) will itself be costly. Even our simple tests > > can generate a search of several minutes before our patience runs out.o > > Any drm_mm that allows for userspace to control alignment can be > > arbitrarily fragmented, hence a raised eyebrow that this search would be > > allowed in atomic context. > > Wow, that is indeed quite a lot. > > What is the criteria use for ordering the tree? Just the size or is that > size+alignment? The tree is just size. Alignment is a little used parameter, but there's a requirement for userspace to be able to control it -- although it is strictly the older interface, it is still open to abuse. Converting the tree to [size, ffs(addr)] would help for many, but on top of that we have zones in the drm_mm, so search-in-range can be abused on top of search-for-alignment. > Never looked into this, but maybe we have a low hanging fruit for an > improvement here? A bit -- alignment is so rarely used in practice, optimising it was not a concern, just someone else has now noticed the potential for abuse. They ran a test, get bored and complained that it didn't respond to ^C for a long period of time and from that derive a proof-of-concept test to show how it can be used by one client to upset another :| > I'm not 100% sure, but moving away from atomic context wouldn't be that > easy. Fair enough. I would not worry unless the layout is controllable by the user -- but we probably want some other means of bounding the search. -Chris
Am 01.04.20 um 10:53 schrieb Chris Wilson: > Quoting Christian König (2020-04-01 08:29:34) >> Am 31.03.20 um 15:19 schrieb Chris Wilson: >>> Quoting Daniel Vetter (2020-03-31 11:38:50) >>>> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson <chris@chris-wilson.co.uk> wrote: >>>>> Quoting Daniel Vetter (2020-03-31 10:16:18) >>>>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: >>>>>> [SNIP] >>> We allow userspace to poison the drm_mm at roughly 8K intervals, a >>> search space of 35b with typically O(N^2) behaviour and each node >>> traversal (rb_next/rb_prev) will itself be costly. Even our simple tests >>> can generate a search of several minutes before our patience runs out.o >>> Any drm_mm that allows for userspace to control alignment can be >>> arbitrarily fragmented, hence a raised eyebrow that this search would be >>> allowed in atomic context. >> Wow, that is indeed quite a lot. >> >> What is the criteria use for ordering the tree? Just the size or is that >> size+alignment? > The tree is just size. Alignment is a little used parameter, but there's > a requirement for userspace to be able to control it -- although it is > strictly the older interface, it is still open to abuse. > > Converting the tree to [size, ffs(addr)] would help for many, but on top > of that we have zones in the drm_mm, so search-in-range can be abused on > top of search-for-alignment. The difference is that search in range is not controllable by userspace, but at least for amdgpu the alignment is very well controllable. >> Never looked into this, but maybe we have a low hanging fruit for an >> improvement here? > A bit -- alignment is so rarely used in practice, optimising it was not > a concern, just someone else has now noticed the potential for abuse. Well we do use alignment rather widely. IIRC we can have everything between 4K and 2MB based on the tilling flags, memory channel config etc etc... > They ran a test, get bored and complained that it didn't respond to ^C > for a long period of time and from that derive a proof-of-concept test to > show how it can be used by one client to upset another :| And as far as I can see that is a really valid problem we need to fix. Give me a second to write a test case for this. Thanks for pointing that out, Christian. > >> I'm not 100% sure, but moving away from atomic context wouldn't be that >> easy. > Fair enough. I would not worry unless the layout is controllable by the > user -- but we probably want some other means of bounding the search. > -Chris
[Adding Nirmoy, setting bunch of people to BCC] This bubbled up in our internal testing as well. Nirmoy now wants to take a look at it. Am 01.04.20 um 11:17 schrieb Christian König: > Am 01.04.20 um 10:53 schrieb Chris Wilson: >> Quoting Christian König (2020-04-01 08:29:34) >>> Am 31.03.20 um 15:19 schrieb Chris Wilson: >>>> Quoting Daniel Vetter (2020-03-31 11:38:50) >>>>> On Tue, Mar 31, 2020 at 11:20 AM Chris Wilson >>>>> <chris@chris-wilson.co.uk> wrote: >>>>>> Quoting Daniel Vetter (2020-03-31 10:16:18) >>>>>>> On Tue, Mar 31, 2020 at 10:59:45AM +0200, Christian König wrote: >>>>>>> [SNIP] >>>> We allow userspace to poison the drm_mm at roughly 8K intervals, a >>>> search space of 35b with typically O(N^2) behaviour and each node >>>> traversal (rb_next/rb_prev) will itself be costly. Even our simple >>>> tests >>>> can generate a search of several minutes before our patience runs >>>> out.o >>>> Any drm_mm that allows for userspace to control alignment can be >>>> arbitrarily fragmented, hence a raised eyebrow that this search >>>> would be >>>> allowed in atomic context. >>> Wow, that is indeed quite a lot. >>> >>> What is the criteria use for ordering the tree? Just the size or is >>> that >>> size+alignment? >> The tree is just size. Alignment is a little used parameter, but there's >> a requirement for userspace to be able to control it -- although it is >> strictly the older interface, it is still open to abuse. >> >> Converting the tree to [size, ffs(addr)] would help for many, but on top >> of that we have zones in the drm_mm, so search-in-range can be abused on >> top of search-for-alignment. > > The difference is that search in range is not controllable by > userspace, but at least for amdgpu the alignment is very well > controllable. > >>> Never looked into this, but maybe we have a low hanging fruit for an >>> improvement here? >> A bit -- alignment is so rarely used in practice, optimising it was not >> a concern, just someone else has now noticed the potential for abuse. > > Well we do use alignment rather widely. IIRC we can have everything > between 4K and 2MB based on the tilling flags, memory channel config > etc etc... > >> They ran a test, get bored and complained that it didn't respond to ^C >> for a long period of time and from that derive a proof-of-concept >> test to >> show how it can be used by one client to upset another :| > > And as far as I can see that is a really valid problem we need to fix. > Give me a second to write a test case for this. Nirmoy, could you tackle this first? I've came up with some very quick and dirty code for this for our libdrm unit tests. Ping me internally and we can chat about it. Thanks, Christian. > > Thanks for pointing that out, > Christian.
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index bc6e208949e8..8981abe8b7c9 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -45,7 +45,6 @@ #include <linux/export.h> #include <linux/interval_tree_generic.h> #include <linux/seq_file.h> -#include <linux/sched/signal.h> #include <linux/slab.h> #include <linux/stacktrace.h> @@ -367,11 +366,6 @@ next_hole(struct drm_mm *mm, struct drm_mm_node *node, enum drm_mm_insert_mode mode) { - /* Searching is slow; check if we ran out of time/patience */ - cond_resched(); - if (fatal_signal_pending(current)) - return NULL; - switch (mode) { default: case DRM_MM_INSERT_BEST: @@ -563,7 +557,7 @@ int drm_mm_insert_node_in_range(struct drm_mm * const mm, return 0; } - return signal_pending(current) ? -ERESTARTSYS : -ENOSPC; + return -ENOSPC; } EXPORT_SYMBOL(drm_mm_insert_node_in_range);
This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d. The drm_mm is supposed to work in atomic context, so calling schedule() or in this case cond_resched() is illegal. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/drm_mm.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)