
ttm: wait mem space if user allow while gpu busy

Message ID 20190422103836.4300-1-david1.zhou@amd.com (mailing list archive)
State New, archived
Series: ttm: wait mem space if user allow while gpu busy

Commit Message

Chunming Zhou April 22, 2019, 10:38 a.m. UTC
A heavy GPU job can occupy memory for a long time, which can lead to other
users failing to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Liang, Prike April 23, 2019, 1:45 a.m. UTC | #1
Acked-by: Prike Liang <Prike.Liang@amd.com>

Thanks,
Prike
-----Original Message-----
From: Chunming Zhou <david1.zhou@amd.com> 
Sent: Monday, April 22, 2019 6:39 PM
To: dri-devel@lists.freedesktop.org
Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
Subject: [PATCH] ttm: wait mem space if user allow while gpu busy

A heavy GPU job can occupy memory for a long time, which can lead to other users failing to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
--
2.17.1
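
(For illustration: with this change, ttm_bo_mem_force_space() keeps retrying eviction on -EBUSY only when the caller passed an operation context that allows waiting for the GPU. A minimal, hypothetical caller setup might look like the following; the placement and validate call are assumptions, not part of the patch:)

	/* Illustrative only: a context that opts in to waiting for busy memory. */
	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		.no_wait_gpu = false,	/* user allows waiting while the GPU is busy */
	};

	/* A validation through this context, e.g.
	 * ttm_bo_validate(bo, &placement, &ctx), would then keep retrying
	 * ttm_mem_evict_first() instead of failing as soon as the first BO on
	 * the LRU is busy.
	 */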
Christian König April 23, 2019, 1:05 p.m. UTC | #2
Well that is certainly a NAK because it can lead to deadlock in the 
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 23.04.19 at 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou <david1.zhou@amd.com>
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>   		if (mem->mm_node)
>   			break;
>   		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -		if (unlikely(ret != 0))
> -			return ret;
> +		if (unlikely(ret != 0)) {
> +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +				return ret;
> +		}
>   	} while (1);
>   	mem->mem_type = mem_type;
>   	return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chunming Zhou April 23, 2019, 1:19 p.m. UTC | #3
How about adding another condition, checking ctx->resv inline, to address your concern? As long as we don't wait on buffers from the same user, it shouldn't lead to a deadlock.

Otherwise, any other idea?
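
(For illustration only, one way to read "adding more condition ctx->resv": also give up, rather than retry, when the busy BO is protected by the caller's own reservation object, so we never busy-wait on ourselves. The busy_bo variable below is hypothetical; ttm_mem_evict_first() does not currently report which BO was busy:)

		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
		if (unlikely(ret != 0)) {
			/* Retry only if the caller allows waiting and the busy
			 * BO is not protected by our own reservation object.
			 */
			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY ||
			    (busy_bo && ctx->resv && busy_bo->resv == ctx->resv))
				return ret;
		}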

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 23.04.19 at 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou <david1.zhou@amd.com>
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>                if (mem->mm_node)
>                        break;
>                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -             if (unlikely(ret != 0))
> -                     return ret;
> +             if (unlikely(ret != 0)) {
> +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +                             return ret;
> +             }
>        } while (1);
>        mem->mem_type = mem_type;
>        return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König April 23, 2019, 2:42 p.m. UTC | #4
Well, that's not so easy off hand.

The basic problem here is that when you busy wait at this place you can 
easily run into situations where application A busy waits for B while B 
busy waits for A -> deadlock.

So what we need here is the deadlock detection logic of the ww_mutex. To 
use this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we 
need memory for (or rather the ww_mutex of its reservation object) has a 
ticket assigned.

3. If we have a ticket we grab a reference to the first BO on the LRU, 
drop the LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded we check if 
the BO is still the first one on the LRU in question (the BO could have 
moved).

5. If the BO is still the first one on the LRU in question we try to 
evict it as we would evict any other BO.

6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.
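
(For illustration of steps 2-6 only; the helper below, the way the busy BO and the ticket reach it, and most of the details are assumptions rather than existing TTM API:)

	/* Sketch: busy_bo is the first BO on the LRU that could not be reserved
	 * without blocking, ticket is the caller's ww_acquire_ctx (e.g. from DC).
	 */
	static int ttm_bo_evict_busy(struct ttm_buffer_object *busy_bo,
				     struct ww_acquire_ctx *ticket)
	{
		int ret;

		/* Step 2: without a ticket there is no deadlock detection, back off. */
		if (!ticket)
			return -EBUSY;

		/* Step 3: the caller took a reference to busy_bo and dropped the LRU
		 * lock; now take the reservation with the ticket so the ww_mutex
		 * machinery can detect A-waits-for-B-waits-for-A cycles.
		 */
		ret = ww_mutex_lock_interruptible(&busy_bo->resv->lock, ticket);
		if (ret)
			return ret == -EDEADLK ? -EBUSY : ret;	/* step 6: back off */

		/* Step 4: re-check that busy_bo is still first on the LRU in question
		 * (it could have moved while we slept).
		 * Step 5: if it is, evict it like any other BO; otherwise fall through.
		 */
		ret = -EBUSY;	/* placeholder for the actual re-check and eviction */

		ww_mutex_unlock(&busy_bo->resv->lock);
		return ret;
	}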

Have fun :)
Christian.

On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
> How about adding more condition ctx->resv inline to address your 
> concern? As well as don't wait from same user, shouldn't lead to deadlock.
>
> Otherwise, any other idea?
>
> -------- Original Message --------
> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
> From: Christian König
> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
> ,dri-devel@lists.freedesktop.org
> CC:
>
> Well that is certainly a NAK because it can lead to deadlock in the
> memory management.
>
> You can't just busy wait with all those locks held.
>
> Regards,
> Christian.
>
> Am 23.04.19 um 03:45 schrieb Liang, Prike:
> > Acked-by: Prike Liang <Prike.Liang@amd.com>
> >
> > Thanks,
> > Prike
> > -----Original Message-----
> > From: Chunming Zhou <david1.zhou@amd.com>
> > Sent: Monday, April 22, 2019 6:39 PM
> > To: dri-devel@lists.freedesktop.org
> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
> <David1.Zhou@amd.com>
> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
> >
> > heavy gpu job could occupy memory long time, which could lead to 
> other user fail to get memory.
> >
> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
> ttm_buffer_object *bo,
> >                if (mem->mm_node)
> >                        break;
> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> > -             if (unlikely(ret != 0))
> > -                     return ret;
> > +             if (unlikely(ret != 0)) {
> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> > +                             return ret;
> > +             }
> >        } while (1);
> >        mem->mem_type = mem_type;
> >        return ttm_bo_add_move_fence(bo, man, mem);
> > --
> > 2.17.1
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chunming Zhou April 23, 2019, 3:09 p.m. UTC | #5
>3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

The BO on the LRU is already locked by the CS user; can it be dropped here by the DC user? And if the DC user then grabs its lock with a ticket, how does CS grab it again?

If you think waiting in TTM has this risk, how about just adding a wrapper for the pin function, roughly as below?

amdgpu_get_pin_bo_timeout(bo, domain, timeout)
{
	do {
		amdgpu_bo_reserve(bo, false);
		r = amdgpu_bo_pin(bo, domain);

		if (!r)
			break;	/* pinned; keep the reservation for the caller */
		amdgpu_bo_unreserve(bo);
		timeout--;

	} while (timeout > 0);
}

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org
CC:

Well that's not so easy of hand.

The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.

So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.

3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved).

5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO.

6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock.

Otherwise, any other idea?

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 23.04.19 at 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou <david1.zhou@amd.com>
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>                if (mem->mm_node)
>                        break;
>                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -             if (unlikely(ret != 0))
> -                     return ret;
> +             if (unlikely(ret != 0)) {
> +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +                             return ret;
> +             }
>        } while (1);
>        mem->mem_type = mem_type;
>        return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König April 24, 2019, 7:12 a.m. UTC | #6
> how about just adding a wrapper for pin function as below?
I considered this as well and don't think it will work reliably.

We could use it as a band-aid for this specific problem, but in general
we need to improve the handling in TTM to resolve those kinds of resource
conflicts.

Regards,
Christian.

On 23.04.19 at 17:09, Zhou, David(ChunMing) wrote:
> >3. If we have a ticket we grab a reference to the first BO on the 
> LRU, drop the LRU lock and try to grab the reservation lock with the 
> ticket.
>
> The BO on LRU is already locked by cs user, can it be dropped here by 
> DC user? and then DC user grab its lock with ticket, how does CS grab 
> it again?
>
> If you think waiting in ttm has this risk, how about just adding a 
> wrapper for pin function as below?
> amdgpu_get_pin_bo_timeout()
> {
> do {
> amdgpo_bo_reserve();
> r=amdgpu_bo_pin();
>
> if(!r)
>         break;
> amdgpu_bo_unreserve();
> timeout--;
>
> } while(timeout>0);
>
> }
>
> -------- Original Message --------
> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
> From: Christian König
> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
> ,dri-devel@lists.freedesktop.org
> CC:
>
> Well that's not so easy of hand.
>
> The basic problem here is that when you busy wait at this place you 
> can easily run into situations where application A busy waits for B 
> while B busy waits for A -> deadlock.
>
> So what we need here is the deadlock detection logic of the ww_mutex. 
> To use this we at least need to do the following steps:
>
> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>
> 2. If we then run into this EBUSY condition in TTM check if the BO we 
> need memory for (or rather the ww_mutex of its reservation object) has 
> a ticket assigned.
>
> 3. If we have a ticket we grab a reference to the first BO on the LRU, 
> drop the LRU lock and try to grab the reservation lock with the ticket.
>
> 4. If getting the reservation lock with the ticket succeeded we check 
> if the BO is still the first one on the LRU in question (the BO could 
> have moved).
>
> 5. If the BO is still the first one on the LRU in question we try to 
> evict it as we would evict any other BO.
>
> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>
> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>
> Have fun :)
> Christian.
>
> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>> How about adding more condition ctx->resv inline to address your 
>> concern? As well as don't wait from same user, shouldn't lead to 
>> deadlock.
>>
>> Otherwise, any other idea?
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>> From: Christian König
>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>> ,dri-devel@lists.freedesktop.org
>> CC:
>>
>> Well that is certainly a NAK because it can lead to deadlock in the
>> memory management.
>>
>> You can't just busy wait with all those locks held.
>>
>> Regards,
>> Christian.
>>
>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>> >
>> > Thanks,
>> > Prike
>> > -----Original Message-----
>> > From: Chunming Zhou <david1.zhou@amd.com>
>> > Sent: Monday, April 22, 2019 6:39 PM
>> > To: dri-devel@lists.freedesktop.org
>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>> <David1.Zhou@amd.com>
>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>> >
>> > heavy gpu job could occupy memory long time, which could lead to 
>> other user fail to get memory.
>> >
>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>> > ---
>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
>> ttm_buffer_object *bo,
>> >                if (mem->mm_node)
>> >                        break;
>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>> > -             if (unlikely(ret != 0))
>> > -                     return ret;
>> > +             if (unlikely(ret != 0)) {
>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>> > +                             return ret;
>> > +             }
>> >        } while (1);
>> >        mem->mem_type = mem_type;
>> >        return ttm_bo_add_move_fence(bo, man, mem);
>> > --
>> > 2.17.1
>> >
>> > _______________________________________________
>> > dri-devel mailing list
>> > dri-devel@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chunming Zhou April 24, 2019, 7:17 a.m. UTC | #7
I made a patch, attached.

I'm not sure how to make a patch following your proposal. Could you make a
patch for that if mine isn't enough?

-David

On April 24, 2019 at 15:12, Christian König wrote:
>> how about just adding a wrapper for pin function as below?
> I considered this as well and don't think it will work reliable.
>
> We could use it as a band aid for this specific problem, but in 
> general we need to improve the handling in TTM to resolve those kind 
> of resource conflicts.
>
> Regards,
> Christian.
>
> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>> >3. If we have a ticket we grab a reference to the first BO on the 
>> LRU, drop the LRU lock and try to grab the reservation lock with the 
>> ticket.
>>
>> The BO on LRU is already locked by cs user, can it be dropped here by 
>> DC user? and then DC user grab its lock with ticket, how does CS grab 
>> it again?
>>
>> If you think waiting in ttm has this risk, how about just adding a 
>> wrapper for pin function as below?
>> amdgpu_get_pin_bo_timeout()
>> {
>> do {
>> amdgpo_bo_reserve();
>> r=amdgpu_bo_pin();
>>
>> if(!r)
>>         break;
>> amdgpu_bo_unreserve();
>> timeout--;
>>
>> } while(timeout>0);
>>
>> }
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>> From: Christian König
>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>> ,dri-devel@lists.freedesktop.org
>> CC:
>>
>> Well that's not so easy of hand.
>>
>> The basic problem here is that when you busy wait at this place you 
>> can easily run into situations where application A busy waits for B 
>> while B busy waits for A -> deadlock.
>>
>> So what we need here is the deadlock detection logic of the ww_mutex. 
>> To use this we at least need to do the following steps:
>>
>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>
>> 2. If we then run into this EBUSY condition in TTM check if the BO we 
>> need memory for (or rather the ww_mutex of its reservation object) 
>> has a ticket assigned.
>>
>> 3. If we have a ticket we grab a reference to the first BO on the 
>> LRU, drop the LRU lock and try to grab the reservation lock with the 
>> ticket.
>>
>> 4. If getting the reservation lock with the ticket succeeded we check 
>> if the BO is still the first one on the LRU in question (the BO could 
>> have moved).
>>
>> 5. If the BO is still the first one on the LRU in question we try to 
>> evict it as we would evict any other BO.
>>
>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>>
>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>
>> Have fun :)
>> Christian.
>>
>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>> How about adding more condition ctx->resv inline to address your 
>>> concern? As well as don't wait from same user, shouldn't lead to 
>>> deadlock.
>>>
>>> Otherwise, any other idea?
>>>
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>> From: Christian König
>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>> ,dri-devel@lists.freedesktop.org
>>> CC:
>>>
>>> Well that is certainly a NAK because it can lead to deadlock in the
>>> memory management.
>>>
>>> You can't just busy wait with all those locks held.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>> >
>>> > Thanks,
>>> > Prike
>>> > -----Original Message-----
>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>> > Sent: Monday, April 22, 2019 6:39 PM
>>> > To: dri-devel@lists.freedesktop.org
>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>> <David1.Zhou@amd.com>
>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>> >
>>> > heavy gpu job could occupy memory long time, which could lead to 
>>> other user fail to get memory.
>>> >
>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>> > ---
>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>> >
>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
>>> ttm_buffer_object *bo,
>>> >                if (mem->mm_node)
>>> >                        break;
>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>> > -             if (unlikely(ret != 0))
>>> > -                     return ret;
>>> > +             if (unlikely(ret != 0)) {
>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>> > +                             return ret;
>>> > +             }
>>> >        } while (1);
>>> >        mem->mem_type = mem_type;
>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>> > --
>>> > 2.17.1
>>> >
>>> > _______________________________________________
>>> > dri-devel mailing list
>>> > dri-devel@lists.freedesktop.org
>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
From 184941165665d80ad1992179e7652d7e4d739b2e Mon Sep 17 00:00:00 2001
From: Chunming Zhou <david1.zhou@amd.com>
Date: Wed, 24 Apr 2019 11:35:29 +0800
Subject: [PATCH] ttm: return EBUSY to user

EBUSY means there is no idle BO on the LRU which can be evicted; the error
is meaningful to the user.

Change-Id: I3857c4b99859c9d2f354cf2c486dbacb8520d0ed
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..dce90053f29e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1000,7 +1000,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 		return -EINVAL;
 	}
 
-	return (has_erestartsys) ? -ERESTARTSYS : -ENOMEM;
+	return ret == -EBUSY ? ret : (has_erestartsys) ? -ERESTARTSYS : -ENOMEM;
 }
 EXPORT_SYMBOL(ttm_bo_mem_space);
Christian König April 24, 2019, 7:30 a.m. UTC | #8
That would change the semantics of ttm_bo_mem_space() and so could 
change the return code in an IOCTL as well. Not a good idea, because that 
could have a lot of side effects.

Instead, in the calling DC code you could check if you get an -ENOMEM and 
then call schedule().

If after the schedule() we see that we now have BOs on the LRU, we can 
try again and see if pinning the frame buffer now succeeds.
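
As a minimal sketch of that idea (ttm_bo_lru_populated() below is a hypothetical 
helper standing in for "there are evictable BOs on the LRU again" — TTM does not 
export such a check today, and the amdgpu calls are only approximated), the retry 
in the DC prepare_fb path could look roughly like this:

static int dm_pin_fb_with_retry(struct amdgpu_bo *rbo, u32 domain)
{
	int r;

	for (;;) {
		r = amdgpu_bo_reserve(rbo, false);
		if (unlikely(r))
			return r;

		r = amdgpu_bo_pin(rbo, domain);
		if (!r)
			return 0;	/* caller keeps the reservation for alloc_gart */

		amdgpu_bo_unreserve(rbo);
		if (r != -ENOMEM)
			return r;

		/* give the GPU a chance to finish work and free memory */
		schedule();
		if (!ttm_bo_lru_populated(rbo->tbo.bdev))	/* hypothetical check */
			return r;
	}
}

The patch attached further down in this thread implements a variant of this that 
retries on a timeout instead of checking the LRU.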

Christian.

Am 24.04.19 um 09:17 schrieb zhoucm1:
>
> I made a patch as attached.
>
> I'm not sure how to make patch as your proposal, Could you make a 
> patch for that if mine isn't enough?
>
> -David
>
> On 2019年04月24日 15:12, Christian König wrote:
>>> how about just adding a wrapper for pin function as below?
>> I considered this as well and don't think it will work reliable.
>>
>> We could use it as a band aid for this specific problem, but in 
>> general we need to improve the handling in TTM to resolve those kind 
>> of resource conflicts.
>>
>> Regards,
>> Christian.
>>
>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>> >3. If we have a ticket we grab a reference to the first BO on the 
>>> LRU, drop the LRU lock and try to grab the reservation lock with the 
>>> ticket.
>>>
>>> The BO on LRU is already locked by cs user, can it be dropped here 
>>> by DC user? and then DC user grab its lock with ticket, how does CS 
>>> grab it again?
>>>
>>> If you think waiting in ttm has this risk, how about just adding a 
>>> wrapper for pin function as below?
>>> amdgpu_get_pin_bo_timeout()
>>> {
>>> do {
>>> amdgpo_bo_reserve();
>>> r=amdgpu_bo_pin();
>>>
>>> if(!r)
>>>         break;
>>> amdgpu_bo_unreserve();
>>> timeout--;
>>>
>>> } while(timeout>0);
>>>
>>> }
>>>
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>> From: Christian König
>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>> ,dri-devel@lists.freedesktop.org
>>> CC:
>>>
>>> Well that's not so easy of hand.
>>>
>>> The basic problem here is that when you busy wait at this place you 
>>> can easily run into situations where application A busy waits for B 
>>> while B busy waits for A -> deadlock.
>>>
>>> So what we need here is the deadlock detection logic of the 
>>> ww_mutex. To use this we at least need to do the following steps:
>>>
>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>
>>> 2. If we then run into this EBUSY condition in TTM check if the BO 
>>> we need memory for (or rather the ww_mutex of its reservation 
>>> object) has a ticket assigned.
>>>
>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>> LRU, drop the LRU lock and try to grab the reservation lock with the 
>>> ticket.
>>>
>>> 4. If getting the reservation lock with the ticket succeeded we 
>>> check if the BO is still the first one on the LRU in question (the 
>>> BO could have moved).
>>>
>>> 5. If the BO is still the first one on the LRU in question we try to 
>>> evict it as we would evict any other BO.
>>>
>>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>>>
>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>
>>> Have fun :)
>>> Christian.
>>>
>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>> How about adding more condition ctx->resv inline to address your 
>>>> concern? As well as don't wait from same user, shouldn't lead to 
>>>> deadlock.
>>>>
>>>> Otherwise, any other idea?
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>> From: Christian König
>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>> ,dri-devel@lists.freedesktop.org
>>>> CC:
>>>>
>>>> Well that is certainly a NAK because it can lead to deadlock in the
>>>> memory management.
>>>>
>>>> You can't just busy wait with all those locks held.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>> >
>>>> > Thanks,
>>>> > Prike
>>>> > -----Original Message-----
>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>> > To: dri-devel@lists.freedesktop.org
>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>>> <David1.Zhou@amd.com>
>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>> >
>>>> > heavy gpu job could occupy memory long time, which could lead to 
>>>> other user fail to get memory.
>>>> >
>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>> > ---
>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>> >
>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
>>>> ttm_buffer_object *bo,
>>>> >                if (mem->mm_node)
>>>> >                        break;
>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>> > -             if (unlikely(ret != 0))
>>>> > -                     return ret;
>>>> > +             if (unlikely(ret != 0)) {
>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>> > +                             return ret;
>>>> > +             }
>>>> >        } while (1);
>>>> >        mem->mem_type = mem_type;
>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>> > --
>>>> > 2.17.1
>>>> >
>>>> > _______________________________________________
>>>> > dri-devel mailing list
>>>> > dri-devel@lists.freedesktop.org
>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chunming Zhou April 24, 2019, 7:59 a.m. UTC | #9
How about the new attached patch?


-David


On 2019年04月24日 15:30, Christian König wrote:
> That would change the semantics of ttm_bo_mem_space() and so could 
> change the return code in an IOCTL as well. Not a good idea, cause 
> that could have a lot of side effects.
>
> Instead in the calling DC code you could check if you get an -ENOMEM 
> and then call schedule().
>
> If after the schedule() we see that we have now BOs on the LRU we can 
> try again and see if pinning the frame buffer now succeeds.
>
> Christian.
>
> Am 24.04.19 um 09:17 schrieb zhoucm1:
>>
>> I made a patch as attached.
>>
>> I'm not sure how to make patch as your proposal, Could you make a 
>> patch for that if mine isn't enough?
>>
>> -David
>>
>> On 2019年04月24日 15:12, Christian König wrote:
>>>> how about just adding a wrapper for pin function as below?
>>> I considered this as well and don't think it will work reliable.
>>>
>>> We could use it as a band aid for this specific problem, but in 
>>> general we need to improve the handling in TTM to resolve those kind 
>>> of resource conflicts.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>>> >3. If we have a ticket we grab a reference to the first BO on the 
>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>> the ticket.
>>>>
>>>> The BO on LRU is already locked by cs user, can it be dropped here 
>>>> by DC user? and then DC user grab its lock with ticket, how does CS 
>>>> grab it again?
>>>>
>>>> If you think waiting in ttm has this risk, how about just adding a 
>>>> wrapper for pin function as below?
>>>> amdgpu_get_pin_bo_timeout()
>>>> {
>>>> do {
>>>> amdgpo_bo_reserve();
>>>> r=amdgpu_bo_pin();
>>>>
>>>> if(!r)
>>>>         break;
>>>> amdgpu_bo_unreserve();
>>>> timeout--;
>>>>
>>>> } while(timeout>0);
>>>>
>>>> }
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>> From: Christian König
>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>>> ,dri-devel@lists.freedesktop.org
>>>> CC:
>>>>
>>>> Well that's not so easy of hand.
>>>>
>>>> The basic problem here is that when you busy wait at this place you 
>>>> can easily run into situations where application A busy waits for B 
>>>> while B busy waits for A -> deadlock.
>>>>
>>>> So what we need here is the deadlock detection logic of the 
>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>
>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>
>>>> 2. If we then run into this EBUSY condition in TTM check if the BO 
>>>> we need memory for (or rather the ww_mutex of its reservation 
>>>> object) has a ticket assigned.
>>>>
>>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>> the ticket.
>>>>
>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>> check if the BO is still the first one on the LRU in question (the 
>>>> BO could have moved).
>>>>
>>>> 5. If the BO is still the first one on the LRU in question we try 
>>>> to evict it as we would evict any other BO.
>>>>
>>>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>>>>
>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>
>>>> Have fun :)
>>>> Christian.
>>>>
>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>>> How about adding more condition ctx->resv inline to address your 
>>>>> concern? As well as don't wait from same user, shouldn't lead to 
>>>>> deadlock.
>>>>>
>>>>> Otherwise, any other idea?
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> From: Christian König
>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>> ,dri-devel@lists.freedesktop.org
>>>>> CC:
>>>>>
>>>>> Well that is certainly a NAK because it can lead to deadlock in the
>>>>> memory management.
>>>>>
>>>>> You can't just busy wait with all those locks held.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>> >
>>>>> > Thanks,
>>>>> > Prike
>>>>> > -----Original Message-----
>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>> > To: dri-devel@lists.freedesktop.org
>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>>>> <David1.Zhou@amd.com>
>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> >
>>>>> > heavy gpu job could occupy memory long time, which could lead to 
>>>>> other user fail to get memory.
>>>>> >
>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>> > ---
>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
>>>>> ttm_buffer_object *bo,
>>>>> >                if (mem->mm_node)
>>>>> >                        break;
>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, 
>>>>> ctx);
>>>>> > -             if (unlikely(ret != 0))
>>>>> > -                     return ret;
>>>>> > +             if (unlikely(ret != 0)) {
>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>> > +                             return ret;
>>>>> > +             }
>>>>> >        } while (1);
>>>>> >        mem->mem_type = mem_type;
>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>> > --
>>>>> > 2.17.1
>>>>> >
>>>>> > _______________________________________________
>>>>> > dri-devel mailing list
>>>>> > dri-devel@lists.freedesktop.org
>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
From 61174dd44618bbfcb035484a44435b7ed4df86a5 Mon Sep 17 00:00:00 2001
From: Chunming Zhou <david1.zhou@amd.com>
Date: Wed, 24 Apr 2019 11:35:29 +0800
Subject: [PATCH] drm/amdgpu: wait memory idle when fb is pinned fail

Heavy GPU job execution can occupy memory for a long time, which
can lead to other users failing to get memory.

Change-Id: I3857c4b99859c9d2f354cf2c486dbacb8520d0ed
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a5cacf846e1b..8577b629a640 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4093,6 +4093,8 @@ static const struct drm_plane_funcs dm_plane_funcs = {
 	.atomic_destroy_state = dm_drm_plane_destroy_state,
 };
 
+/* 100s default to wait for pin fb */
+#define DM_PIN_MEM_TIMEOUT 100000000000ULL
 static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 				      struct drm_plane_state *new_state)
 {
@@ -4103,6 +4105,7 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 	struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
 	uint64_t tiling_flags;
 	uint32_t domain;
+	u64 timeout = nsecs_to_jiffies64(DM_PIN_MEM_TIMEOUT);
 	int r;
 
 	dm_plane_state_old = to_dm_plane_state(plane->state);
@@ -4117,6 +4120,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 	obj = new_state->fb->obj[0];
 	rbo = gem_to_amdgpu_bo(obj);
 	adev = amdgpu_ttm_adev(rbo->tbo.bdev);
+
+retry:
 	r = amdgpu_bo_reserve(rbo, false);
 	if (unlikely(r != 0))
 		return r;
@@ -4131,8 +4136,18 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 		if (r != -ERESTARTSYS)
 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
 		amdgpu_bo_unreserve(rbo);
+		if (r == -ENOMEM) {
+			set_current_state(TASK_INTERRUPTIBLE);
+			if (timeout == 0)
+				return r;
+			if (signal_pending(current))
+				return r;
+			timeout = schedule_timeout(timeout);
+			goto retry;
+		}
 		return r;
 	}
+        __set_current_state(TASK_RUNNING);
 
 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);
 	if (unlikely(r != 0)) {
Daniel Vetter April 24, 2019, 8:01 a.m. UTC | #10
On Tue, Apr 23, 2019 at 4:42 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Well that's not so easy of hand.
>
> The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.
>
> So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps:
>
> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>
> 2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.
>
> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.
>
> 4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved).

I don't think you actually need this check. Once you're in this slow
reclaim mode all hope for performance is pretty much lost (you're
thrashing vram terribly), forward progress matters. Also, less code :-)

> 5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO.
>
> 6. If any of the "If's" above fail we just back off and return -EBUSY.

Another idea I pondered (but never implemented) is a slow reclaim lru
lock. Essentially there'd be two ways to walk the lru and evict bo:

- fast path: spinlock + trylock, like today

- slow path: ww_mutex lru lock, plus every bo is reserved (nested
within that ww_mutex lru lock) with a full ww_mutex_lock. Guaranteed
forward progress.

The transition would be that as soon as someone hits an EBUSY, they set
the slow reclaim flag (while briefly holding the quick reclaim spinlock,
which will drain anyone still stuck in the fast reclaim path).
Every time fast reclaim acquires the spinlock it needs to check for the
slow reclaim flag, and if that's set, fall back to slow reclaim.

Transitioning out of slow reclaim would only happen once the thread
(with its ww ticket) that hit the EBUSY has completed whatever it was
trying to do (either successfully, or failed because even evicting
everyone else didn't give you enough vram). The tricky part here is making
sure threads still in slow reclaim don't blow up if we switch back.
Since only one thread can ever actually be doing slow reclaim
(everyone is serialized on the single ww_mutex lru lock), this should be
doable by checking for the slow reclaim condition once you have the
lru ww_mutex and, if the slow reclaim condition has been lifted, switching
back to fast reclaim.

The slow reclaim condition might also need to be a full reference
count, to handle multiple threads hitting EBUSY/slow reclaim without
the book-keeping getting all confused.

The upshot of this is that it guarantees forward progress, but the perf
cliff might be too steep if this happens too often. You might need to
round it off with 1-2 retries when you hit EBUSY before forcing slow
reclaim.
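
A very rough sketch of that switch (every name below is invented for 
illustration; none of this exists in TTM) could look like:

struct reclaim_lru {
	spinlock_t	fast_lock;	/* fast path: spinlock + trylock */
	struct ww_mutex	slow_lock;	/* slow path: full ww_mutex locking */
	unsigned int	slow_count;	/* threads that forced slow reclaim */
};

/* called with an EBUSY in hand: force everyone into slow reclaim */
static void reclaim_enter_slow(struct reclaim_lru *lru)
{
	spin_lock(&lru->fast_lock);	/* drains anyone still in the fast path */
	lru->slow_count++;
	spin_unlock(&lru->fast_lock);
}

/* fast-path callers check this every time they take the spinlock */
static bool reclaim_slow_mode(struct reclaim_lru *lru)
{
	bool slow;

	spin_lock(&lru->fast_lock);
	slow = lru->slow_count > 0;
	spin_unlock(&lru->fast_lock);
	return slow;
}

/* once the EBUSY thread is done (success or failure), allow fast reclaim again */
static void reclaim_leave_slow(struct reclaim_lru *lru)
{
	spin_lock(&lru->fast_lock);
	lru->slow_count--;
	spin_unlock(&lru->fast_lock);
}

The eviction walk in slow mode would then take slow_lock with the caller's 
ww_acquire_ctx and do a full ww_mutex_lock() on every BO it touches, which is 
what gives the guaranteed forward progress at the cost of the perf cliff 
described above.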
-Daniel

> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>
> Have fun :)
> Christian.
>
> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>
> How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock.
>
> Otherwise, any other idea?
>
> -------- Original Message --------
> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
> From: Christian König
> To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
> CC:
>
> Well that is certainly a NAK because it can lead to deadlock in the
> memory management.
>
> You can't just busy wait with all those locks held.
>
> Regards,
> Christian.
>
> Am 23.04.19 um 03:45 schrieb Liang, Prike:
> > Acked-by: Prike Liang <Prike.Liang@amd.com>
> >
> > Thanks,
> > Prike
> > -----Original Message-----
> > From: Chunming Zhou <david1.zhou@amd.com>
> > Sent: Monday, April 22, 2019 6:39 PM
> > To: dri-devel@lists.freedesktop.org
> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
> >
> > heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
> >
> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
> >                if (mem->mm_node)
> >                        break;
> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> > -             if (unlikely(ret != 0))
> > -                     return ret;
> > +             if (unlikely(ret != 0)) {
> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> > +                             return ret;
> > +             }
> >        } while (1);
> >        mem->mem_type = mem_type;
> >        return ttm_bo_add_move_fence(bo, man, mem);
> > --
> > 2.17.1
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König April 24, 2019, 8:06 a.m. UTC | #11
Am 24.04.19 um 10:01 schrieb Daniel Vetter:
> On Tue, Apr 23, 2019 at 4:42 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Well that's not so easy of hand.
>>
>> The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.
>>
>> So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps:
>>
>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>
>> 2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.
>>
>> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.
>>
>> 4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved).
> I don't think you actually need this check. Once you're in this slow
> reclaim mode all hope for performance is pretty much lost (you're
> thrashin vram terribly), forward progress matters. Also, less code :-)

This step is not for performance, but for correctness.

When you drop the LRU lock and grab the ww_mutex in the slow path you 
need to double check that this BO wasn't evicted by somebody else in the 
meantime.
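
As a rough sketch of steps 3-5 with that re-check (field names only 
approximate TTM here — lru/lru_lock are the LRU list in question and its 
spinlock, and evict_bo() is a hypothetical stand-in for the normal eviction 
path that drops the reservation when done):

	spin_lock(&glob->lru_lock);
	bo = list_first_entry_or_null(lru, struct ttm_buffer_object, lru);
	if (!bo) {
		spin_unlock(&glob->lru_lock);
		return -EBUSY;
	}
	kref_get(&bo->list_kref);		/* keep the BO alive across the unlock */
	spin_unlock(&glob->lru_lock);

	ret = ww_mutex_lock(&bo->resv->lock, ticket);	/* may back off with -EDEADLK */
	if (!ret) {
		spin_lock(&glob->lru_lock);
		if (bo == list_first_entry_or_null(lru, struct ttm_buffer_object, lru)) {
			spin_unlock(&glob->lru_lock);
			ret = evict_bo(bo, ctx);	/* hypothetical: evict like any other BO */
		} else {
			/* the BO moved in the meantime, back off */
			spin_unlock(&glob->lru_lock);
			ww_mutex_unlock(&bo->resv->lock);
			ret = -EBUSY;
		}
	}
	kref_put(&bo->list_kref, ttm_bo_release_list);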

>> 5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO.
>>
>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
> Another idea I pondered (but never implemented) is a slow reclaim lru
> lock. Essentially there'd be two ways to walk the lru and evict bo:
>
> - fast path: spinlock + trylock, like today
>
> - slow path: ww_mutex lru lock, plus every bo is reserved (nested
> within that ww_mutex lru lock) with a full ww_mutex_lock. Guaranteed
> forward progress.

Offhand I don't see any advantage to this. So what is the benefit?

Regards,
Christian.

>
> Transition would be that as soon as someone hits an EBUSY, they set
> the slow reclaim flag (while holding the quick reclaim spinlock
> quickly, which will drain anyone still stuck in fast reclaim path).
> Everytime fast reclaim acquires the spinlock it needs to check for the
> slow reclaim flag, and if that's set, fall back to slow reclaim.
>
> Transitioning out of slow reclaim would only happen once the thread
> (with it's ww ticket) that hit the EBUSY has completed whatever it was
> trying to do (either successfully, or failed because even evicting
> everyone else didn't give you enough vram). Tricky part here is making
> sure threads still in slow reclaim don't blow up if we switch back.
> Since only ever one thread can be actually doing slow reclaim
> (everyone is serialized on the single ww mutex lru lock) should be
> doable by checking for the slow reclaim conditiona once you have the
> lru ww_mutex and if the slow reclaim condition is lifted, switch back
> to fast reclaim.
>
> The slow reclaim conditiona might also need to be a full reference
> count, to handle multiple threads hitting EBUSY/slow reclaim without
> the book-keeping getting all confused.
>
> Upshot of this is that it's guranteeing forward progress, but the perf
> cliff might be too steep if this happens too often. You might need to
> round it off with 1-2 retries when you hit EBUSY, before forcing slow
> reclaim.
> -Daniel
>
>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>
>> Have fun :)
>> Christian.
>>
>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>
>> How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock.
>>
>> Otherwise, any other idea?
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>> From: Christian König
>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
>> CC:
>>
>> Well that is certainly a NAK because it can lead to deadlock in the
>> memory management.
>>
>> You can't just busy wait with all those locks held.
>>
>> Regards,
>> Christian.
>>
>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>> Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>
>>> Thanks,
>>> Prike
>>> -----Original Message-----
>>> From: Chunming Zhou <david1.zhou@amd.com>
>>> Sent: Monday, April 22, 2019 6:39 PM
>>> To: dri-devel@lists.freedesktop.org
>>> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
>>> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>
>>> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
>>>
>>> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>                 if (mem->mm_node)
>>>                         break;
>>>                 ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>> -             if (unlikely(ret != 0))
>>> -                     return ret;
>>> +             if (unlikely(ret != 0)) {
>>> +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>> +                             return ret;
>>> +             }
>>>         } while (1);
>>>         mem->mem_type = mem_type;
>>>         return ttm_bo_add_move_fence(bo, man, mem);
>>> --
>>> 2.17.1
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
Christian König April 24, 2019, 8:07 a.m. UTC | #12
This is used in a work item, so you don't need to check for signals.

And checking if the LRU is populated is mandatory here or otherwise you 
can run into an endless loop.

Christian.
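
For illustration, the retry loop described here and in the quoted mail below
(run from the work item, pin again after schedule() on -ENOMEM, and only keep
retrying while the LRU actually has entries) might look roughly like this
sketch. dm_lru_populated() is a purely hypothetical helper, TTM exports no
such check, and the amdgpu_bo_reserve()/amdgpu_bo_pin() signatures are
recalled from the 5.x code rather than verified, so treat this as a sketch of
the idea only.

/*
 * Sketch of the DC-side retry: run from a work item, retry the pin after
 * schedule() on -ENOMEM, and bail out once the LRU has nothing left to give
 * (or after max_retries attempts).  dm_lru_populated() is hypothetical.
 */
static int dm_pin_fb_with_retry(struct amdgpu_bo *bo, u32 domain,
                                unsigned int max_retries)
{
        int r;

        do {
                /* no_intr = true: we are in a work item, no signal checks. */
                r = amdgpu_bo_reserve(bo, true);
                if (r)
                        return r;
                r = amdgpu_bo_pin(bo, domain);
                amdgpu_bo_unreserve(bo);

                if (r != -ENOMEM)
                        return r;       /* success or a non-retryable error */

                /* Give evictions a chance to put BOs back on the LRU. */
                schedule();
        } while (max_retries-- && dm_lru_populated(bo->tbo.bdev));

        return r;
}

The LRU check is what keeps this from looping forever, which is the point
made above about the endless loop.
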

Am 24.04.19 um 09:59 schrieb zhoucm1:
>
> how about new attached?
>
>
> -David
>
>
> On 2019年04月24日 15:30, Christian König wrote:
>> That would change the semantics of ttm_bo_mem_space() and so could 
>> change the return code in an IOCTL as well. Not a good idea, cause 
>> that could have a lot of side effects.
>>
>> Instead in the calling DC code you could check if you get an -ENOMEM 
>> and then call schedule().
>>
>> If after the schedule() we see that we have now BOs on the LRU we can 
>> try again and see if pinning the frame buffer now succeeds.
>>
>> Christian.
>>
>> Am 24.04.19 um 09:17 schrieb zhoucm1:
>>>
>>> I made a patch as attached.
>>>
>>> I'm not sure how to make patch as your proposal, Could you make a 
>>> patch for that if mine isn't enough?
>>>
>>> -David
>>>
>>> On 2019年04月24日 15:12, Christian König wrote:
>>>>> how about just adding a wrapper for pin function as below?
>>>> I considered this as well and don't think it will work reliable.
>>>>
>>>> We could use it as a band aid for this specific problem, but in 
>>>> general we need to improve the handling in TTM to resolve those 
>>>> kind of resource conflicts.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>>>> >3. If we have a ticket we grab a reference to the first BO on the 
>>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>>> the ticket.
>>>>>
>>>>> The BO on LRU is already locked by cs user, can it be dropped here 
>>>>> by DC user? and then DC user grab its lock with ticket, how does 
>>>>> CS grab it again?
>>>>>
>>>>> If you think waiting in ttm has this risk, how about just adding a 
>>>>> wrapper for pin function as below?
>>>>> amdgpu_get_pin_bo_timeout()
>>>>> {
>>>>> do {
>>>>> amdgpo_bo_reserve();
>>>>> r=amdgpu_bo_pin();
>>>>>
>>>>> if(!r)
>>>>>         break;
>>>>> amdgpu_bo_unreserve();
>>>>> timeout--;
>>>>>
>>>>> } while(timeout>0);
>>>>>
>>>>> }
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> From: Christian König
>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>>>> ,dri-devel@lists.freedesktop.org
>>>>> CC:
>>>>>
>>>>> Well that's not so easy of hand.
>>>>>
>>>>> The basic problem here is that when you busy wait at this place 
>>>>> you can easily run into situations where application A busy waits 
>>>>> for B while B busy waits for A -> deadlock.
>>>>>
>>>>> So what we need here is the deadlock detection logic of the 
>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>
>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>
>>>>> 2. If we then run into this EBUSY condition in TTM check if the BO 
>>>>> we need memory for (or rather the ww_mutex of its reservation 
>>>>> object) has a ticket assigned.
>>>>>
>>>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>>> the ticket.
>>>>>
>>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>>> check if the BO is still the first one on the LRU in question (the 
>>>>> BO could have moved).
>>>>>
>>>>> 5. If the BO is still the first one on the LRU in question we try 
>>>>> to evict it as we would evict any other BO.
>>>>>
>>>>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>>>>>
>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>>
>>>>> Have fun :)
>>>>> Christian.
>>>>>
>>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>>>> How about adding more condition ctx->resv inline to address your 
>>>>>> concern? As well as don't wait from same user, shouldn't lead to 
>>>>>> deadlock.
>>>>>>
>>>>>> Otherwise, any other idea?
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>> From: Christian König
>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>> ,dri-devel@lists.freedesktop.org
>>>>>> CC:
>>>>>>
>>>>>> Well that is certainly a NAK because it can lead to deadlock in the
>>>>>> memory management.
>>>>>>
>>>>>> You can't just busy wait with all those locks held.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Prike
>>>>>> > -----Original Message-----
>>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>> > To: dri-devel@lists.freedesktop.org
>>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>>>>> <David1.Zhou@amd.com>
>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>> >
>>>>>> > heavy gpu job could occupy memory long time, which could lead 
>>>>>> to other user fail to get memory.
>>>>>> >
>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>>> > ---
>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>> >
>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 
>>>>>> 100644
>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
>>>>>> ttm_buffer_object *bo,
>>>>>> >                if (mem->mm_node)
>>>>>> >                        break;
>>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, 
>>>>>> ctx);
>>>>>> > -             if (unlikely(ret != 0))
>>>>>> > -                     return ret;
>>>>>> > +             if (unlikely(ret != 0)) {
>>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != 
>>>>>> -EBUSY)
>>>>>> > +                             return ret;
>>>>>> > +             }
>>>>>> >        } while (1);
>>>>>> >        mem->mem_type = mem_type;
>>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>>> > --
>>>>>> > 2.17.1
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > dri-devel mailing list
>>>>>> > dri-devel@lists.freedesktop.org
>>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chunming Zhou April 24, 2019, 8:11 a.m. UTC | #13
On 2019年04月24日 16:07, Christian König wrote:
> This is used in a work item, so you don't need to check for signals.
will remove.
>
> And checking if the LRU is populated is mandatory here
How do we check that outside of TTM? The code is in dm.

> or otherwise you can run into an endless loop.
I already added a timeout for that.

-David
>
> Christian.
>
> Am 24.04.19 um 09:59 schrieb zhoucm1:
>>
>> how about new attached?
>>
>>
>> -David
>>
>>
>> On 2019年04月24日 15:30, Christian König wrote:
>>> That would change the semantics of ttm_bo_mem_space() and so could 
>>> change the return code in an IOCTL as well. Not a good idea, cause 
>>> that could have a lot of side effects.
>>>
>>> Instead in the calling DC code you could check if you get an -ENOMEM 
>>> and then call schedule().
>>>
>>> If after the schedule() we see that we have now BOs on the LRU we 
>>> can try again and see if pinning the frame buffer now succeeds.
>>>
>>> Christian.
>>>
>>> Am 24.04.19 um 09:17 schrieb zhoucm1:
>>>>
>>>> I made a patch as attached.
>>>>
>>>> I'm not sure how to make patch as your proposal, Could you make a 
>>>> patch for that if mine isn't enough?
>>>>
>>>> -David
>>>>
>>>> On 2019年04月24日 15:12, Christian König wrote:
>>>>>> how about just adding a wrapper for pin function as below?
>>>>> I considered this as well and don't think it will work reliable.
>>>>>
>>>>> We could use it as a band aid for this specific problem, but in 
>>>>> general we need to improve the handling in TTM to resolve those 
>>>>> kind of resource conflicts.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>>>>> >3. If we have a ticket we grab a reference to the first BO on 
>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock 
>>>>>> with the ticket.
>>>>>>
>>>>>> The BO on LRU is already locked by cs user, can it be dropped 
>>>>>> here by DC user? and then DC user grab its lock with ticket, how 
>>>>>> does CS grab it again?
>>>>>>
>>>>>> If you think waiting in ttm has this risk, how about just adding 
>>>>>> a wrapper for pin function as below?
>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>> {
>>>>>> do {
>>>>>> amdgpo_bo_reserve();
>>>>>> r=amdgpu_bo_pin();
>>>>>>
>>>>>> if(!r)
>>>>>>         break;
>>>>>> amdgpu_bo_unreserve();
>>>>>> timeout--;
>>>>>>
>>>>>> } while(timeout>0);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>> From: Christian König
>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>>>>> ,dri-devel@lists.freedesktop.org
>>>>>> CC:
>>>>>>
>>>>>> Well that's not so easy of hand.
>>>>>>
>>>>>> The basic problem here is that when you busy wait at this place 
>>>>>> you can easily run into situations where application A busy waits 
>>>>>> for B while B busy waits for A -> deadlock.
>>>>>>
>>>>>> So what we need here is the deadlock detection logic of the 
>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>
>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>
>>>>>> 2. If we then run into this EBUSY condition in TTM check if the 
>>>>>> BO we need memory for (or rather the ww_mutex of its reservation 
>>>>>> object) has a ticket assigned.
>>>>>>
>>>>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>>>> the ticket.
>>>>>>
>>>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>>>> check if the BO is still the first one on the LRU in question 
>>>>>> (the BO could have moved).
>>>>>>
>>>>>> 5. If the BO is still the first one on the LRU in question we try 
>>>>>> to evict it as we would evict any other BO.
>>>>>>
>>>>>> 6. If any of the "If's" above fail we just back off and return 
>>>>>> -EBUSY.
>>>>>>
>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>>>
>>>>>> Have fun :)
>>>>>> Christian.
>>>>>>
>>>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>>>>> How about adding more condition ctx->resv inline to address your 
>>>>>>> concern? As well as don't wait from same user, shouldn't lead to 
>>>>>>> deadlock.
>>>>>>>
>>>>>>> Otherwise, any other idea?
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu 
>>>>>>> busy
>>>>>>> From: Christian König
>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>>> ,dri-devel@lists.freedesktop.org
>>>>>>> CC:
>>>>>>>
>>>>>>> Well that is certainly a NAK because it can lead to deadlock in the
>>>>>>> memory management.
>>>>>>>
>>>>>>> You can't just busy wait with all those locks held.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Prike
>>>>>>> > -----Original Message-----
>>>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>> > To: dri-devel@lists.freedesktop.org
>>>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>>>>>> <David1.Zhou@amd.com>
>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>>>> >
>>>>>>> > heavy gpu job could occupy memory long time, which could lead 
>>>>>>> to other user fail to get memory.
>>>>>>> >
>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>>>> > ---
>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>> >
>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 
>>>>>>> 100644
>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
>>>>>>> ttm_buffer_object *bo,
>>>>>>> >                if (mem->mm_node)
>>>>>>> >                        break;
>>>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, 
>>>>>>> place, ctx);
>>>>>>> > -             if (unlikely(ret != 0))
>>>>>>> > -                     return ret;
>>>>>>> > +             if (unlikely(ret != 0)) {
>>>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != 
>>>>>>> -EBUSY)
>>>>>>> > +                             return ret;
>>>>>>> > +             }
>>>>>>> >        } while (1);
>>>>>>> >        mem->mem_type = mem_type;
>>>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>> > --
>>>>>>> > 2.17.1
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > dri-devel mailing list
>>>>>>> > dri-devel@lists.freedesktop.org
>>>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Christian König April 24, 2019, 8:59 a.m. UTC | #14
Am 24.04.19 um 10:11 schrieb zhoucm1:


On 2019年04月24日 16:07, Christian König wrote:
This is used in a work item, so you don't need to check for signals.
will remove.

And checking if the LRU is populated is mandatory here
How to check it outside of TTM? because the code is in dm.

Well just use a static cast on the first entry of the LRU.

We can't upstream that solution anyway, so just a hack should do for now.


or otherwise you can run into an endless loop.
I already add a timeout for that.

Thinking more about it, we most likely actually need to grab the lock on the first BO entry from the LRU.
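
To illustrate what that could look like (the field and helper names here,
bo->lru, bo->resv, ttm_bo_get()/ttm_bo_put() and reservation_object_lock(),
are recalled from the 5.1-era TTM code and not double-checked, so this is a
sketch rather than upstream material):

/*
 * Sketch only, not upstream material: peek at the first BO on an LRU list
 * and try to take its reservation with the caller's ww_acquire_ctx ticket
 * (steps 3/4 of the proposal quoted below).  Field and function names are
 * assumptions based on the 5.1-era TTM code.
 */
static struct ttm_buffer_object *
peek_and_lock_first_lru_bo(spinlock_t *lru_lock, struct list_head *lru,
                           struct ww_acquire_ctx *ticket)
{
        struct ttm_buffer_object *bo;

        spin_lock(lru_lock);
        if (list_empty(lru)) {
                spin_unlock(lru_lock);
                return NULL;
        }
        /* The "static cast": the LRU node is embedded in the BO. */
        bo = list_first_entry(lru, struct ttm_buffer_object, lru);
        ttm_bo_get(bo);                 /* keep it alive without the lock */
        spin_unlock(lru_lock);

        if (reservation_object_lock(bo->resv, ticket)) {
                ttm_bo_put(bo);
                return NULL;            /* e.g. -EDEADLK: back off, -EBUSY */
        }
        /* Caller must still re-check that the BO is first on the LRU. */
        return bo;
}

On success the caller would continue with steps 4/5 of the quoted proposal
(re-check the LRU position, then evict); on NULL it just returns -EBUSY as
before.
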


-David

Christian.

Am 24.04.19 um 09:59 schrieb zhoucm1:

how about new attached?


-David

On 2019年04月24日 15:30, Christian König wrote:
That would change the semantics of ttm_bo_mem_space() and so could change the return code in an IOCTL as well. Not a good idea, cause that could have a lot of side effects.

Instead in the calling DC code you could check if you get an -ENOMEM and then call schedule().

If after the schedule() we see that we have now BOs on the LRU we can try again and see if pinning the frame buffer now succeeds.

Christian.

Am 24.04.19 um 09:17 schrieb zhoucm1:

I made a patch as attached.

I'm not sure how to make patch as your proposal, Could you make a patch for that if mine isn't enough?

-David

On 2019年04月24日 15:12, Christian König wrote:
how about just adding a wrapper for pin function as below?
I considered this as well and don't think it will work reliable.

We could use it as a band aid for this specific problem, but in general we need to improve the handling in TTM to resolve those kind of resource conflicts.

Regards,
Christian.

Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

The BO on LRU is already locked by cs user, can it be dropped here by DC user? and then DC user grab its lock with ticket, how does CS grab it again?

If you think waiting in ttm has this risk, how about just adding a wrapper for pin function as below?
amdgpu_get_pin_bo_timeout()
{
do {
amdgpo_bo_reserve();
r=amdgpu_bo_pin();

if(!r)
        break;
amdgpu_bo_unreserve();
timeout--;

} while(timeout>0);

}

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>
CC:

Well that's not so easy of hand.

The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.

So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.

3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved).

5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO.

6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock.

Otherwise, any other idea?

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

Am 23.04.19 um 03:45 schrieb Liang, Prike:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou <david1.zhou@amd.com>
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>                if (mem->mm_node)
>                        break;
>                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -             if (unlikely(ret != 0))
> -                     return ret;
> +             if (unlikely(ret != 0)) {
> +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +                             return ret;
> +             }
>        } while (1);
>        mem->mem_type = mem_type;
>        return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chunming Zhou April 24, 2019, 9:03 a.m. UTC | #15
OK, let's drop mine then. Could you draft a solution for that?


-David


On 2019年04月24日 16:59, Koenig, Christian wrote:
> Am 24.04.19 um 10:11 schrieb zhoucm1:
>>
>>
>>
>> On 2019年04月24日 16:07, Christian König wrote:
>>> This is used in a work item, so you don't need to check for signals.
>> will remove.
>>>
>>> And checking if the LRU is populated is mandatory here
>> How to check it outside of TTM? because the code is in dm.
>
> Well just use a static cast on the first entry of the LRU.
>
> We can't upstream that solution anyway, so just a hack should do for now.
>
>>
>>> or otherwise you can run into an endless loop.
>> I already add a timeout for that.
>
> Thinking more about it we most likely actually need to grab the lock 
> on the first BO entry from the LRU.
>
>>
>> -David
>>>
>>> Christian.
>>>
>>> Am 24.04.19 um 09:59 schrieb zhoucm1:
>>>>
>>>> how about new attached?
>>>>
>>>>
>>>> -David
>>>>
>>>>
>>>> On 2019年04月24日 15:30, Christian König wrote:
>>>>> That would change the semantics of ttm_bo_mem_space() and so could 
>>>>> change the return code in an IOCTL as well. Not a good idea, cause 
>>>>> that could have a lot of side effects.
>>>>>
>>>>> Instead in the calling DC code you could check if you get an 
>>>>> -ENOMEM and then call schedule().
>>>>>
>>>>> If after the schedule() we see that we have now BOs on the LRU we 
>>>>> can try again and see if pinning the frame buffer now succeeds.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 24.04.19 um 09:17 schrieb zhoucm1:
>>>>>>
>>>>>> I made a patch as attached.
>>>>>>
>>>>>> I'm not sure how to make patch as your proposal, Could you make a 
>>>>>> patch for that if mine isn't enough?
>>>>>>
>>>>>> -David
>>>>>>
>>>>>> On 2019年04月24日 15:12, Christian König wrote:
>>>>>>>> how about just adding a wrapper for pin function as below?
>>>>>>> I considered this as well and don't think it will work reliable.
>>>>>>>
>>>>>>> We could use it as a band aid for this specific problem, but in 
>>>>>>> general we need to improve the handling in TTM to resolve those 
>>>>>>> kind of resource conflicts.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>>>>>>> >3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock 
>>>>>>>> with the ticket.
>>>>>>>>
>>>>>>>> The BO on LRU is already locked by cs user, can it be dropped 
>>>>>>>> here by DC user? and then DC user grab its lock with ticket, 
>>>>>>>> how does CS grab it again?
>>>>>>>>
>>>>>>>> If you think waiting in ttm has this risk, how about just 
>>>>>>>> adding a wrapper for pin function as below?
>>>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>>>> {
>>>>>>>> do {
>>>>>>>> amdgpu_bo_reserve();
>>>>>>>> r=amdgpu_bo_pin();
>>>>>>>>
>>>>>>>> if(!r)
>>>>>>>>         break;
>>>>>>>> amdgpu_bo_unreserve();
>>>>>>>> timeout--;
>>>>>>>>
>>>>>>>> } while(timeout>0);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>> gpu busy
>>>>>>>> From: Christian König
>>>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, 
>>>>>>>> Prike" ,dri-devel@lists.freedesktop.org
>>>>>>>> CC:
>>>>>>>>
>>>>>>>> Well that's not so easy of hand.
>>>>>>>>
>>>>>>>> The basic problem here is that when you busy wait at this place 
>>>>>>>> you can easily run into situations where application A busy 
>>>>>>>> waits for B while B busy waits for A -> deadlock.
>>>>>>>>
>>>>>>>> So what we need here is the deadlock detection logic of the 
>>>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>>>
>>>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>>>
>>>>>>>> 2. If we then run into this EBUSY condition in TTM check if the 
>>>>>>>> BO we need memory for (or rather the ww_mutex of its 
>>>>>>>> reservation object) has a ticket assigned.
>>>>>>>>
>>>>>>>> 3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock 
>>>>>>>> with the ticket.
>>>>>>>>
>>>>>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>>>>>> check if the BO is still the first one on the LRU in question 
>>>>>>>> (the BO could have moved).
>>>>>>>>
>>>>>>>> 5. If the BO is still the first one on the LRU in question we 
>>>>>>>> try to evict it as we would evict any other BO.
>>>>>>>>
>>>>>>>> 6. If any of the "If's" above fail we just back off and return 
>>>>>>>> -EBUSY.
>>>>>>>>
>>>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can 
>>>>>>>> see.
>>>>>>>>
>>>>>>>> Have fun :)
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>>>>>>> How about adding more condition ctx->resv inline to address 
>>>>>>>>> your concern? As well as don't wait from same user, shouldn't 
>>>>>>>>> lead to deadlock.
>>>>>>>>>
>>>>>>>>> Otherwise, any other idea?
>>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>> gpu busy
>>>>>>>>> From: Christian König
>>>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>>>>> ,dri-devel@lists.freedesktop.org
>>>>>>>>> CC:
>>>>>>>>>
>>>>>>>>> Well that is certainly a NAK because it can lead to deadlock 
>>>>>>>>> in the
>>>>>>>>> memory management.
>>>>>>>>>
>>>>>>>>> You can't just busy wait with all those locks held.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>>>>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Prike
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>>>> > To: dri-devel@lists.freedesktop.org
>>>>>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, 
>>>>>>>>> David(ChunMing) <David1.Zhou@amd.com>
>>>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu 
>>>>>>>>> busy
>>>>>>>>> >
>>>>>>>>> > heavy gpu job could occupy memory long time, which could 
>>>>>>>>> lead to other user fail to get memory.
>>>>>>>>> >
>>>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>>>>>> > ---
>>>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>>>> >
>>>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 
>>>>>>>>> 7c484729f9b2..6c596cc24bec 100644
>>>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> > @@ -830,8 +830,10 @@ static int 
>>>>>>>>> ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>>>>>> >                if (mem->mm_node)
>>>>>>>>> >                        break;
>>>>>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, 
>>>>>>>>> place, ctx);
>>>>>>>>> > -             if (unlikely(ret != 0))
>>>>>>>>> > -                     return ret;
>>>>>>>>> > +             if (unlikely(ret != 0)) {
>>>>>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != 
>>>>>>>>> -EBUSY)
>>>>>>>>> > +                             return ret;
>>>>>>>>> > +             }
>>>>>>>>> >        } while (1);
>>>>>>>>> >        mem->mem_type = mem_type;
>>>>>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>>>> > --
>>>>>>>>> > 2.17.1
>>>>>>>>> >
>>>>>>>>> > _______________________________________________
>>>>>>>>> > dri-devel mailing list
>>>>>>>>> > dri-devel@lists.freedesktop.org
>>>>>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> dri-devel mailing list
>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>
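
For readers trying to picture steps 2-5 of the ww_mutex proposal quoted above, here is a rough sketch of how such a helper could look if it sat next to ttm_bo_evict() in ttm_bo.c. It is an illustration only: the function name ttm_mem_evict_first_with_ticket(), the single-priority LRU walk and the exact lock/field names are assumptions about the 2019 TTM code base, not an existing or proposed API.

#include <linux/list.h>
#include <linux/reservation.h>
#include <linux/spinlock.h>
#include <linux/ww_mutex.h>
#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_bo_driver.h>

/* Sketch only: evict the first BO on @man's LRU using the caller's ticket. */
static int ttm_mem_evict_first_with_ticket(struct ttm_bo_global *glob,
					   struct ttm_mem_type_manager *man,
					   struct ww_acquire_ctx *ticket,
					   struct ttm_operation_ctx *ctx)
{
	struct ttm_buffer_object *bo;
	int ret;

	spin_lock(&glob->lru_lock);
	/* Checking that the LRU is populated is mandatory, otherwise we loop forever. */
	if (list_empty(&man->lru[0])) {
		spin_unlock(&glob->lru_lock);
		return -EBUSY;
	}

	/* Step 3: take a reference to the first BO and drop the LRU lock. */
	bo = list_first_entry(&man->lru[0], struct ttm_buffer_object, lru);
	ttm_bo_get(bo);
	spin_unlock(&glob->lru_lock);

	/* Grab the reservation with the caller's ticket; -EDEADLK means back off. */
	ret = ww_mutex_lock_interruptible(&bo->resv->lock, ticket);
	if (ret) {
		ttm_bo_put(bo);
		return ret;
	}

	/* Step 4: the BO may no longer be the first one on the LRU (it could have moved). */
	spin_lock(&glob->lru_lock);
	if (list_empty(&man->lru[0]) ||
	    bo != list_first_entry(&man->lru[0], struct ttm_buffer_object, lru)) {
		spin_unlock(&glob->lru_lock);
		ret = -EBUSY;	/* Step 6: just back off. */
		goto out_unlock;
	}
	spin_unlock(&glob->lru_lock);

	/* Step 5: evict it as we would evict any other BO. */
	ret = ttm_bo_evict(bo, ctx);

out_unlock:
	ww_mutex_unlock(&bo->resv->lock);
	ttm_bo_put(bo);
	return ret;
}
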
Christian König April 24, 2019, 10:03 a.m. UTC | #16
No, I'm busy with P2P and recoverable page faults.

Christian.

Am 24.04.19 um 11:03 schrieb zhoucm1:
>
> OK, Let's drop mine, then Could you draft a solution for that?
>
>
> -David
>
>
> On 2019年04月24日 16:59, Koenig, Christian wrote:
>> Am 24.04.19 um 10:11 schrieb zhoucm1:
>>>
>>>
>>>
>>> On 2019年04月24日 16:07, Christian König wrote:
>>>> This is used in a work item, so you don't need to check for signals.
>>> will remove.
>>>>
>>>> And checking if the LRU is populated is mandatory here
>>> How to check it outside of TTM? because the code is in dm.
>>
>> Well just use a static cast on the first entry of the LRU.
>>
>> We can't upstream that solution anyway, so just a hack should do for now.
>>
>>>
>>>> or otherwise you can run into an endless loop.
>>> I already add a timeout for that.
>>
>> Thinking more about it we most likely actually need to grab the lock 
>> on the first BO entry from the LRU.
>>
>>>
>>> -David
>>>>
>>>> Christian.
>>>>
>>>> Am 24.04.19 um 09:59 schrieb zhoucm1:
>>>>>
>>>>> how about new attached?
>>>>>
>>>>>
>>>>> -David
>>>>>
>>>>>
>>>>> On 2019年04月24日 15:30, Christian König wrote:
>>>>>> That would change the semantics of ttm_bo_mem_space() and so 
>>>>>> could change the return code in an IOCTL as well. Not a good 
>>>>>> idea, cause that could have a lot of side effects.
>>>>>>
>>>>>> Instead in the calling DC code you could check if you get an 
>>>>>> -ENOMEM and then call schedule().
>>>>>>
>>>>>> If after the schedule() we see that we have now BOs on the LRU we 
>>>>>> can try again and see if pinning the frame buffer now succeeds.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 24.04.19 um 09:17 schrieb zhoucm1:
>>>>>>>
>>>>>>> I made a patch as attached.
>>>>>>>
>>>>>>> I'm not sure how to make patch as your proposal, Could you make 
>>>>>>> a patch for that if mine isn't enough?
>>>>>>>
>>>>>>> -David
>>>>>>>
>>>>>>> On 2019年04月24日 15:12, Christian König wrote:
>>>>>>>>> how about just adding a wrapper for pin function as below?
>>>>>>>> I considered this as well and don't think it will work reliable.
>>>>>>>>
>>>>>>>> We could use it as a band aid for this specific problem, but in 
>>>>>>>> general we need to improve the handling in TTM to resolve those 
>>>>>>>> kind of resource conflicts.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>>>>>>>>> >3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation 
>>>>>>>>> lock with the ticket.
>>>>>>>>>
>>>>>>>>> The BO on LRU is already locked by cs user, can it be dropped 
>>>>>>>>> here by DC user? and then DC user grab its lock with ticket, 
>>>>>>>>> how does CS grab it again?
>>>>>>>>>
>>>>>>>>> If you think waiting in ttm has this risk, how about just 
>>>>>>>>> adding a wrapper for pin function as below?
>>>>>>>>> amdgpu_get_pin_bo_timeout()
>>>>>>>>> {
>>>>>>>>> do {
>>>>>>>>> amdgpu_bo_reserve();
>>>>>>>>> r=amdgpu_bo_pin();
>>>>>>>>>
>>>>>>>>> if(!r)
>>>>>>>>>         break;
>>>>>>>>> amdgpu_bo_unreserve();
>>>>>>>>> timeout--;
>>>>>>>>>
>>>>>>>>> } while(timeout>0);
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>> gpu busy
>>>>>>>>> From: Christian König
>>>>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, 
>>>>>>>>> Prike" ,dri-devel@lists.freedesktop.org
>>>>>>>>> CC:
>>>>>>>>>
>>>>>>>>> Well that's not so easy of hand.
>>>>>>>>>
>>>>>>>>> The basic problem here is that when you busy wait at this 
>>>>>>>>> place you can easily run into situations where application A 
>>>>>>>>> busy waits for B while B busy waits for A -> deadlock.
>>>>>>>>>
>>>>>>>>> So what we need here is the deadlock detection logic of the 
>>>>>>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>>>>>>
>>>>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>>>>>>
>>>>>>>>> 2. If we then run into this EBUSY condition in TTM check if 
>>>>>>>>> the BO we need memory for (or rather the ww_mutex of its 
>>>>>>>>> reservation object) has a ticket assigned.
>>>>>>>>>
>>>>>>>>> 3. If we have a ticket we grab a reference to the first BO on 
>>>>>>>>> the LRU, drop the LRU lock and try to grab the reservation 
>>>>>>>>> lock with the ticket.
>>>>>>>>>
>>>>>>>>> 4. If getting the reservation lock with the ticket succeeded 
>>>>>>>>> we check if the BO is still the first one on the LRU in 
>>>>>>>>> question (the BO could have moved).
>>>>>>>>>
>>>>>>>>> 5. If the BO is still the first one on the LRU in question we 
>>>>>>>>> try to evict it as we would evict any other BO.
>>>>>>>>>
>>>>>>>>> 6. If any of the "If's" above fail we just back off and return 
>>>>>>>>> -EBUSY.
>>>>>>>>>
>>>>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I 
>>>>>>>>> can see.
>>>>>>>>>
>>>>>>>>> Have fun :)
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
>>>>>>>>>> How about adding more condition ctx->resv inline to address 
>>>>>>>>>> your concern? As well as don't wait from same user, shouldn't 
>>>>>>>>>> lead to deadlock.
>>>>>>>>>>
>>>>>>>>>> Otherwise, any other idea?
>>>>>>>>>>
>>>>>>>>>> -------- Original Message --------
>>>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>>> gpu busy
>>>>>>>>>> From: Christian König
>>>>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>>>>>>> ,dri-devel@lists.freedesktop.org
>>>>>>>>>> CC:
>>>>>>>>>>
>>>>>>>>>> Well that is certainly a NAK because it can lead to deadlock 
>>>>>>>>>> in the
>>>>>>>>>> memory management.
>>>>>>>>>>
>>>>>>>>>> You can't just busy wait with all those locks held.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike:
>>>>>>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Prike
>>>>>>>>>> > -----Original Message-----
>>>>>>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>>>>>>> > To: dri-devel@lists.freedesktop.org
>>>>>>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, 
>>>>>>>>>> David(ChunMing) <David1.Zhou@amd.com>
>>>>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while 
>>>>>>>>>> gpu busy
>>>>>>>>>> >
>>>>>>>>>> > heavy gpu job could occupy memory long time, which could 
>>>>>>>>>> lead to other user fail to get memory.
>>>>>>>>>> >
>>>>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>>>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>>>>>>> > ---
>>>>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>>>>> >
>>>>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index 
>>>>>>>>>> 7c484729f9b2..6c596cc24bec 100644
>>>>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> > @@ -830,8 +830,10 @@ static int 
>>>>>>>>>> ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>>>>>>> >                if (mem->mm_node)
>>>>>>>>>> >                        break;
>>>>>>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, 
>>>>>>>>>> place, ctx);
>>>>>>>>>> > -             if (unlikely(ret != 0))
>>>>>>>>>> > -                     return ret;
>>>>>>>>>> > +             if (unlikely(ret != 0)) {
>>>>>>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret 
>>>>>>>>>> != -EBUSY)
>>>>>>>>>> > + return ret;
>>>>>>>>>> > +             }
>>>>>>>>>> >        } while (1);
>>>>>>>>>> >        mem->mem_type = mem_type;
>>>>>>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>>>>>>> > --
>>>>>>>>>> > 2.17.1
>>>>>>>>>> >
>>>>>>>>>> > _______________________________________________
>>>>>>>>>> > dri-devel mailing list
>>>>>>>>>> > dri-devel@lists.freedesktop.org
>>>>>>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> dri-devel mailing list
>>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> dri-devel mailing list
>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>
>>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
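
To make the retry idea discussed above a bit more concrete, here is a minimal sketch of the loop Christian suggests for the amdgpu display side: reserve and pin the frame buffer, and if TTM cannot make space yet, call schedule() and try again a bounded number of times (David's timeout). The helper name amdgpu_dm_pin_fb_retry(), the retry count and the -ENOMEM check are assumptions made for illustration; this is not the code that was merged.

#include <linux/errno.h>
#include <linux/sched.h>
#include "amdgpu_object.h"	/* amdgpu_bo_reserve()/amdgpu_bo_pin()/amdgpu_bo_unreserve() */

/* Sketch only: retry pinning while memory is temporarily full of busy BOs. */
static int amdgpu_dm_pin_fb_retry(struct amdgpu_bo *rbo, u32 domain)
{
	unsigned int tries = 10;	/* arbitrary bound so we can never loop forever */
	int r;

	do {
		r = amdgpu_bo_reserve(rbo, true);
		if (r)
			return r;

		r = amdgpu_bo_pin(rbo, domain);
		amdgpu_bo_unreserve(rbo);

		if (r != -ENOMEM)	/* success, or an error that retrying won't fix */
			return r;

		/*
		 * Memory is currently held by busy BOs. Let the GPU
		 * schedulers run so jobs can finish and BOs land back on
		 * the LRU, then retry. As noted above, upstream code would
		 * also have to verify that the LRU is actually populated.
		 */
		schedule();
	} while (--tries);

	return -ENOMEM;
}
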
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@  static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
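
As a usage note on the patch above: whether the new do/while loop keeps retrying on -EBUSY is decided entirely by the ttm_operation_ctx the caller passes in. A minimal sketch of a caller that opts in to waiting might look like the following; example_validate() and the omitted placement setup are made up for illustration.

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_placement.h>

/* Sketch only: the ctx flags decide whether the -EBUSY case above retries. */
static int example_validate(struct ttm_buffer_object *bo,
			    struct ttm_placement *placement)
{
	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		.no_wait_gpu = false,	/* allow the eviction loop to keep retrying */
	};

	return ttm_bo_validate(bo, placement, &ctx);
}

Note that, as the replies above point out, retrying at this level while the caller still holds its reservation locks is exactly the busy wait the patch was NAKed for, so the ctx flag alone does not make the approach safe.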