Message ID | 20240308010851.17104-2-dreaming.about.electric.sheep@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/qxl: fixes qxl_fence_wait | expand |
On 08.03.24 02:08, Alex Constantino wrote: > Fix OOM scenario by doing multiple notifications to the OOM handler through > a busy wait logic. > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > timeout. Thx for working on this. > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b9af@leemhuis.info Nitpicking: that ideally should be pointing to https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/ , as that the report and not just a reply to prod things. Ciao, Thorsten
On 08.03.24 02:08, Alex Constantino wrote: > Fix OOM scenario by doing multiple notifications to the OOM handler through > a busy wait logic. > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > timeout. > > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b9af@leemhuis.info > Reported-by: Timo Lindfors <timo.lindfors@iki.fi> > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 > Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com> > --- > drivers/gpu/drm/qxl/qxl_release.c | 20 ++++++++++++++------ > 1 file changed, 14 insertions(+), 6 deletions(-) Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two I just added to the CC), it seems to me this regression fix did not maybe any progress since it was posted. Did I miss something, is it just "we are busy with the merge window", or is there some other a reason? Just wondering, I just saw someone on a Fedora IRC channel complaining about the regression, that's why I'm asking. Would be really good to finally get this resolved... Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke > diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c > index 368d26da0d6a..51c22e7f9647 100644 > --- a/drivers/gpu/drm/qxl/qxl_release.c > +++ b/drivers/gpu/drm/qxl/qxl_release.c > @@ -20,8 +20,6 @@ > * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. > */ > > -#include <linux/delay.h> > - > #include <trace/events/dma_fence.h> > > #include "qxl_drv.h" > @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr, > { > struct qxl_device *qdev; > unsigned long cur, end = jiffies + timeout; > + signed long iterations = 1; > + signed long timeout_fraction = timeout; > > qdev = container_of(fence->lock, struct qxl_device, release_lock); > > - if (!wait_event_timeout(qdev->release_event, > + // using HZ as a factor since it is used in ttm_bo_wait_ctx too > + if (timeout_fraction > HZ) { > + iterations = timeout_fraction / HZ; > + timeout_fraction = HZ; > + } > + for (int i = 0; i < iterations; i++) { > + if (wait_event_timeout( > + qdev->release_event, > (dma_fence_is_signaled(fence) || > - (qxl_io_notify_oom(qdev), 0)), > - timeout)) > - return 0; > + (qxl_io_notify_oom(qdev), 0)), > + timeout_fraction)) > + break; > + } > > cur = jiffies; > if (time_after(cur, end))
Hi, On Wed, Mar 20, 2024 at 04:25:48PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote: > On 08.03.24 02:08, Alex Constantino wrote: > > Fix OOM scenario by doing multiple notifications to the OOM handler through > > a busy wait logic. > > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > > timeout. > > > > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > > Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b9af@leemhuis.info > > Reported-by: Timo Lindfors <timo.lindfors@iki.fi> > > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 > > Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com> > > --- > > drivers/gpu/drm/qxl/qxl_release.c | 20 ++++++++++++++------ > > 1 file changed, 14 insertions(+), 6 deletions(-) > > Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two > I just added to the CC), it seems to me this regression fix did not > maybe any progress since it was posted. Did I miss something, is it just > "we are busy with the merge window", or is there some other a reason? > Just wondering, I just saw someone on a Fedora IRC channel complaining > about the regression, that's why I'm asking. Would be really good to > finally get this resolved... I've ping'd Gerd last week about it, but he couldn't remember the details of why that patch was warranted in the first place. If it works, I'd prefer to revert the original patch that we know used to work instead of coming up with some less proven logic, which seems to be quite different to what it used to be. Alex, could you try reverting 5a838e5d5825c85556011478abde708251cc0776 and letting us know the result? Thanks! Maxime
On Thu, Apr 04, 2024 at 07:14:48PM +0100, Alex Constantino wrote: > This reverts commit 5a838e5d5825c85556011478abde708251cc0776. > > Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would > result in a '[TTM] Buffer eviction failed' exception whenever it reached a > timeout. > Due to a dependency to DMA_FENCE_WARN this also restores some code deleted > by commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2"). > > Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") > Link: https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/ > Reported-by: Timo Lindfors <timo.lindfors@iki.fi> > Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 > Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com> > --- > drivers/gpu/drm/qxl/qxl_release.c | 50 +++++++++++++++++++++++++++---- > include/linux/dma-fence.h | 7 +++++ > 2 files changed, 52 insertions(+), 5 deletions(-) Hi, This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him a patch that has triggered this response. He used to manually respond to these common problems, but in order to save his sanity (he kept writing the same thing over and over, yet to different people), I was created. Hopefully you will not take offence and will fix the problem in your patch and resubmit it so that it can be accepted into the Linux kernel tree. You are receiving this message because of the following common error(s) as indicated below: - You have marked a patch with a "Fixes:" tag for a commit that is in an older released kernel, yet you do not have a cc: stable line in the signed-off-by area at all, which means that the patch will not be applied to any older kernel releases. To properly fix this, please follow the documented rules in the Documentation/process/stable-kernel-rules.rst file for how to resolve this. If you wish to discuss this problem further, or you have questions about how to resolve this issue, please feel free to respond to this email and Greg will reply once he has dug out from the pending patches received from other developers. thanks, greg k-h's patch email bot
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c index 368d26da0d6a..51c22e7f9647 100644 --- a/drivers/gpu/drm/qxl/qxl_release.c +++ b/drivers/gpu/drm/qxl/qxl_release.c @@ -20,8 +20,6 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ -#include <linux/delay.h> - #include <trace/events/dma_fence.h> #include "qxl_drv.h" @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr, { struct qxl_device *qdev; unsigned long cur, end = jiffies + timeout; + signed long iterations = 1; + signed long timeout_fraction = timeout; qdev = container_of(fence->lock, struct qxl_device, release_lock); - if (!wait_event_timeout(qdev->release_event, + // using HZ as a factor since it is used in ttm_bo_wait_ctx too + if (timeout_fraction > HZ) { + iterations = timeout_fraction / HZ; + timeout_fraction = HZ; + } + for (int i = 0; i < iterations; i++) { + if (wait_event_timeout( + qdev->release_event, (dma_fence_is_signaled(fence) || - (qxl_io_notify_oom(qdev), 0)), - timeout)) - return 0; + (qxl_io_notify_oom(qdev), 0)), + timeout_fraction)) + break; + } cur = jiffies; if (time_after(cur, end))
Fix OOM scenario by doing multiple notifications to the OOM handler through a busy wait logic. Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would result in a '[TTM] Buffer eviction failed' exception whenever it reached a timeout. Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b9af@leemhuis.info Reported-by: Timo Lindfors <timo.lindfors@iki.fi> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514 Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com> --- drivers/gpu/drm/qxl/qxl_release.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)