Message ID | 20190514123127.1650-11-christian.koenig@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/11] drm/ttm: Make LRU removal optional. | expand |
my only concern is how to fresh LRU when bo is from bo list. -David -------- Original Message -------- Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org CC: [CAUTION: External Email] This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1 <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="Generator" content="Microsoft Exchange Server"> <!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style> </head> <body> <div>my only concern is how to fresh LRU when bo is from bo list.<br> <br> -David<br> <br> -------- Original Message --------<br> Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<br> From: Christian König <br> To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<br> CC: <br> <br> </div> <font size="2"><span style="font-size:11pt;"> <div class="PlainText">[CAUTION: External Email]<br> <br> This avoids OOM situations when we have lots of threads<br> submitting at the same time.<br> <br> Signed-off-by: Christian König <christian.koenig@amd.com><br> ---<br> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br> 1 file changed, 1 insertion(+), 1 deletion(-)<br> <br> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> index fff558cf385b..f9240a94217b 100644<br> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,<br> }<br> <br> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,<br> - &duplicates, true);<br> + &duplicates, false);<br> if (unlikely(r != 0)) {<br> if (r != -ERESTARTSYS)<br> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");<br> --<br> 2.17.1<br> <br> </div> </span></font> </body> </html>
Hui? What do you mean with that? Christian. Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing): > my only concern is how to fresh LRU when bo is from bo list. > > -David > > -------- Original Message -------- > Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU > during CS > From: Christian König > To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" > ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org > CC: > > [CAUTION: External Email] > > This avoids OOM situations when we have lots of threads > submitting at the same time. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index fff558cf385b..f9240a94217b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct > amdgpu_cs_parser *p, > } > > r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, > - &duplicates, true); > + &duplicates, false); > if (unlikely(r != 0)) { > if (r != -ERESTARTSYS) > DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); > -- > 2.17.1 > <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">Hui? What do you mean with that?<br> <br> Christian.<br> <br> Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing):<br> </div> <blockquote type="cite" cite="mid:-vf7xt3-qgf5mz-veq8ih-okgxtz-9ehg3tx8dyemoidihe-fwj066fntvvx-x3y4nh-bn07hl-82anfo4oofx-4di7gg-3nkfhtbcgh58-yj9ws0-pthytc-oq9qcxd40s4g-249dv8-x6wbfujry6xi-mu2nvl.1557839540398@email.android.com"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <meta name="Generator" content="Microsoft Exchange Server"> <!-- converted from text --> <style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style> <div>my only concern is how to fresh LRU when bo is from bo list.<br> <br> -David<br> <br> -------- Original Message --------<br> Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<br> From: Christian König <br> To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,<a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org">dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org</a><br> CC: <br> <br> </div> <font size="2"><span style="font-size:11pt;"> <div class="PlainText">[CAUTION: External Email]<br> <br> This avoids OOM situations when we have lots of threads<br> submitting at the same time.<br> <br> Signed-off-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a><br> ---<br> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br> 1 file changed, 1 insertion(+), 1 deletion(-)<br> <br> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> index fff558cf385b..f9240a94217b 100644<br> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,<br> }<br> <br> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,<br> - &duplicates, true);<br> + &duplicates, false);<br> if (unlikely(r != 0)) {<br> if (r != -ERESTARTSYS)<br> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");<br> --<br> 2.17.1<br> <br> </div> </span></font> </blockquote> <br> </body> </html>
how to refresh LRU to keep the order align with bo list passed from user space? you can verify it by some games, performance could be different much between multiple runnings. -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org CC: [CAUTION: External Email] Hui? What do you mean with that? Christian. Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing): my only concern is how to fresh LRU when bo is from bo list. -David -------- Original Message -------- Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com><mailto:christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1 <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"> <meta content="text/html; charset=Windows-1252"> </head> <body bgcolor="#FFFFFF"> how to refresh LRU to keep the order align with bo list passed from user space?<br> <br> you can verify it by some games, performance could be different much between multiple runnings.<br> <br> -David<br> <br> -------- Original Message --------<br> Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<br> From: Christian König <br> To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<br> CC: <br> <br> <div>[CAUTION: External Email] <div> <div class="moz-cite-prefix">Hui? What do you mean with that?<br> <br> Christian.<br> <br> Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing):<br> </div> <blockquote type="cite"> <meta name="Generator" content="Microsoft Exchange Server"> <style> <!-- .EmailQuote {margin-left:1pt; padding-left:4pt; border-left:#800000 2px solid} --> </style> <div>my only concern is how to fresh LRU when bo is from bo list.<br> <br> -David<br> <br> -------- Original Message --------<br> Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<br> From: Christian König <br> To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,<a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org">dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org</a><br> CC: <br> <br> </div> <font size="2"><span style="font-size:11pt"> <div class="PlainText">[CAUTION: External Email]<br> <br> This avoids OOM situations when we have lots of threads<br> submitting at the same time.<br> <br> Signed-off-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"> <christian.koenig@amd.com></a><br> ---<br> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br> 1 file changed, 1 insertion(+), 1 deletion(-)<br> <br> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> index fff558cf385b..f9240a94217b 100644<br> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,<br> }<br> <br> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,<br> - &duplicates, true);<br> + &duplicates, false);<br> if (unlikely(r != 0)) {<br> if (r != -ERESTARTSYS)<br> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");<br> --<br> 2.17.1<br> <br> </div> </span></font></blockquote> <br> </div> </div> </body> </html>
This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times: AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears Marek On Tue, May 14, 2019 at 8:31 AM Christian König < ckoenig.leichtzumerken@gmail.com> wrote: > This avoids OOM situations when we have lots of threads > submitting at the same time. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index fff558cf385b..f9240a94217b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct > amdgpu_cs_parser *p, > } > > r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, > - &duplicates, true); > + &duplicates, false); > if (unlikely(r != 0)) { > if (r != -ERESTARTSYS) > DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); > -- > 2.17.1 > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx <div dir="ltr"><div dir="ltr"><div>This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times:</div><div><br></div><div>AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears</div><div></div><br><div>Marek<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 14, 2019 at 8:31 AM Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This avoids OOM situations when we have lots of threads<br> submitting at the same time.<br> <br> Signed-off-by: Christian König <<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>><br> ---<br> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br> 1 file changed, 1 insertion(+), 1 deletion(-)<br> <br> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> index fff558cf385b..f9240a94217b 100644<br> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,<br> }<br> <br> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,<br> - &duplicates, true);<br> + &duplicates, false);<br> if (unlikely(r != 0)) {<br> if (r != -ERESTARTSYS)<br> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");<br> -- <br> 2.17.1<br> <br> _______________________________________________<br> amd-gfx mailing list<br> <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a></blockquote></div>
Hi Christian , I just wonder when encounter ENOMEM error during pin amdgpu BOs can we retry validate again as below. With the following simply patch the Abaqus pinned issue not observed. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 11cbf63..72a32f5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, bo->placements[i].lpfn = lpfn; bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT; } - +retry: r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); if (unlikely(r)) { - dev_err(adev->dev, "%p pin failed\n", bo); - goto error; + if (r == -ENOMEM){ + goto retry; + } else { + dev_err(adev->dev, "%p pin failed\n", bo); + goto error; + } } bo->pin_count = 1; Thanks, Prike From: Marek Olšák <maraeo@gmail.com> Sent: Wednesday, May 15, 2019 3:33 AM To: Christian König <ckoenig.leichtzumerken@gmail.com> Cc: Zhou, David(ChunMing) <David1.Zhou@amd.com>; Liang, Prike <Prike.Liang@amd.com>; dri-devel <dri-devel@lists.freedesktop.org>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org> Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS [CAUTION: External Email] This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times: AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears Marek On Tue, May 14, 2019 at 8:31 AM Christian König <ckoenig.leichtzumerken@gmail.com<mailto:ckoenig.leichtzumerken@gmail.com>> wrote: This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com<mailto:christian.koenig@amd.com>> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1
Hi Prike, no, that can lead to massive problems in a real OOM situation and is not something we can do here. Christian. Am 15.05.19 um 04:00 schrieb Liang, Prike: > > Hi Christian , > > I just wonder when encounter ENOMEM error during pin amdgpu BOs can we > retry validate again as below. > > With the following simply patch the Abaqus pinned issue not observed. > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > index 11cbf63..72a32f5 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > > @@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo > *bo, u32 domain, > > bo->placements[i].lpfn = lpfn; > > bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT; > > } > > - > > +retry: > > r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); > > if (unlikely(r)) { > > - dev_err(adev->dev, "%p pin failed\n", bo); > > - goto error; > > + if (r == -ENOMEM){ > > + goto retry; > > + } else { > > + dev_err(adev->dev, "%p pin failed\n", bo); > > + goto error; > > + } > > } > > bo->pin_count = 1; > > Thanks, > > Prike > > *From:* Marek Olšák <maraeo@gmail.com> > *Sent:* Wednesday, May 15, 2019 3:33 AM > *To:* Christian König <ckoenig.leichtzumerken@gmail.com> > *Cc:* Zhou, David(ChunMing) <David1.Zhou@amd.com>; Liang, Prike > <Prike.Liang@amd.com>; dri-devel <dri-devel@lists.freedesktop.org>; > amd-gfx mailing list <amd-gfx@lists.freedesktop.org> > *Subject:* Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the > LRU during CS > > [CAUTION: External Email] > > This series fixes the OOM errors. However, if I torture the kernel > driver more, I can get it to deadlock and end up with unkillable > processes. I can also get an OOM error. I just ran the test 5 times: > > AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & > AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & > AMD_DEBUG=testgdsmm glxgears > > Marek > > On Tue, May 14, 2019 at 8:31 AM Christian König > <ckoenig.leichtzumerken@gmail.com > <mailto:ckoenig.leichtzumerken@gmail.com>> wrote: > > This avoids OOM situations when we have lots of threads > submitting at the same time. > > Signed-off-by: Christian König <christian.koenig@amd.com > <mailto:christian.koenig@amd.com>> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index fff558cf385b..f9240a94217b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct > amdgpu_cs_parser *p, > } > > r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, > - &duplicates, true); > + &duplicates, false); > if (unlikely(r != 0)) { > if (r != -ERESTARTSYS) > DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); > -- > 2.17.1 > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">Hi Prike,<br> <br> no, that can lead to massive problems in a real OOM situation and is not something we can do here.<br> <br> Christian.<br> <br> Am 15.05.19 um 04:00 schrieb Liang, Prike:<br> </div> <blockquote type="cite" cite="mid:BYAPR12MB35256D8A0583B5DD019C2925FB090@BYAPR12MB3525.namprd12.prod.outlook.com"> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <meta name="Generator" content="Microsoft Word 15 (filtered medium)"> <style><!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face {font-family:DengXian; panose-1:2 1 6 0 3 1 1 1 1 1;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:"\@等线"; panose-1:2 1 6 0 3 1 1 1 1 1;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:11.0pt; font-family:"Calibri",sans-serif;} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} p.msonormal0, li.msonormal0, div.msonormal0 {mso-style-name:msonormal; mso-margin-top-alt:auto; margin-right:0in; mso-margin-bottom-alt:auto; margin-left:0in; font-size:11.0pt; font-family:"Calibri",sans-serif;} span.EmailStyle18 {mso-style-type:personal-reply; font-family:"Calibri",sans-serif; color:windowtext;} .MsoChpDefault {mso-style-type:export-only; font-size:10.0pt;} @page WordSection1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in;} div.WordSection1 {page:WordSection1;} --></style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="1" /> </o:shapelayout></xml><![endif]--> <div class="WordSection1"> <p class="MsoNormal">Hi Christian ,<o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">I just wonder when encounter ENOMEM error during pin amdgpu BOs can we retry validate again as below.<o:p></o:p></p> <p class="MsoNormal">With the following simply patch the Abaqus pinned issue not observed. <o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c<o:p></o:p></p> <p class="MsoNormal">index 11cbf63..72a32f5 100644<o:p></o:p></p> <p class="MsoNormal">--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c<o:p></o:p></p> <p class="MsoNormal">+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c<o:p></o:p></p> <p class="MsoNormal">@@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,<o:p></o:p></p> <p class="MsoNormal"> bo->placements[i].lpfn = lpfn;<o:p></o:p></p> <p class="MsoNormal"> bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;<o:p></o:p></p> <p class="MsoNormal"> }<o:p></o:p></p> <p class="MsoNormal">-<o:p></o:p></p> <p class="MsoNormal">+retry:<o:p></o:p></p> <p class="MsoNormal"> r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);<o:p></o:p></p> <p class="MsoNormal"> if (unlikely(r)) {<o:p></o:p></p> <p class="MsoNormal">- dev_err(adev->dev, "%p pin failed\n", bo);<o:p></o:p></p> <p class="MsoNormal">- goto error;<o:p></o:p></p> <p class="MsoNormal">+ if (r == -ENOMEM){<o:p></o:p></p> <p class="MsoNormal">+ goto retry;<o:p></o:p></p> <p class="MsoNormal">+ } else {<o:p></o:p></p> <p class="MsoNormal">+ dev_err(adev->dev, "%p pin failed\n", bo);<o:p></o:p></p> <p class="MsoNormal">+ goto error;<o:p></o:p></p> <p class="MsoNormal">+ }<o:p></o:p></p> <p class="MsoNormal"> }<o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal"> bo->pin_count = 1;<o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">Thanks,<o:p></o:p></p> <p class="MsoNormal">Prike<o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <div> <div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in"> <p class="MsoNormal"><b>From:</b> Marek Olšák <a class="moz-txt-link-rfc2396E" href="mailto:maraeo@gmail.com"><maraeo@gmail.com></a> <br> <b>Sent:</b> Wednesday, May 15, 2019 3:33 AM<br> <b>To:</b> Christian König <a class="moz-txt-link-rfc2396E" href="mailto:ckoenig.leichtzumerken@gmail.com"><ckoenig.leichtzumerken@gmail.com></a><br> <b>Cc:</b> Zhou, David(ChunMing) <a class="moz-txt-link-rfc2396E" href="mailto:David1.Zhou@amd.com"><David1.Zhou@amd.com></a>; Liang, Prike <a class="moz-txt-link-rfc2396E" href="mailto:Prike.Liang@amd.com"><Prike.Liang@amd.com></a>; dri-devel <a class="moz-txt-link-rfc2396E" href="mailto:dri-devel@lists.freedesktop.org"><dri-devel@lists.freedesktop.org></a>; amd-gfx mailing list <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a><br> <b>Subject:</b> Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<o:p></o:p></p> </div> </div> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">[CAUTION: External Email] <o:p></o:p></p> <div> <div> <div> <div> <p class="MsoNormal">This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times:<o:p></o:p></p> </div> <div> <p class="MsoNormal"><o:p> </o:p></p> </div> <div> <p class="MsoNormal">AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears<o:p></o:p></p> </div> <p class="MsoNormal"><o:p> </o:p></p> <div> <p class="MsoNormal">Marek<o:p></o:p></p> </div> </div> </div> <p class="MsoNormal"><o:p> </o:p></p> <div> <div> <p class="MsoNormal">On Tue, May 14, 2019 at 8:31 AM Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>> wrote:<o:p></o:p></p> </div> <blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in"> <p class="MsoNormal">This avoids OOM situations when we have lots of threads<br> submitting at the same time.<br> <br> Signed-off-by: Christian König <<a href="mailto:christian.koenig@amd.com" target="_blank" moz-do-not-send="true">christian.koenig@amd.com</a>><br> ---<br> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br> 1 file changed, 1 insertion(+), 1 deletion(-)<br> <br> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> index fff558cf385b..f9240a94217b 100644<br> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,<br> }<br> <br> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,<br> - &duplicates, true);<br> + &duplicates, false);<br> if (unlikely(r != 0)) {<br> if (r != -ERESTARTSYS)<br> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");<br> -- <br> 2.17.1<br> <br> _______________________________________________<br> amd-gfx mailing list<br> <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" target="_blank" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><o:p></o:p></p> </blockquote> </div> </div> </div> </blockquote> <br> </body> </html>
That is a good point, but actually not a problem in practice. See the change to ttm_eu_fence_buffer_objects: > - ttm_bo_add_to_lru(bo); > + if (list_empty(&bo->lru)) > + ttm_bo_add_to_lru(bo); > + else > + ttm_bo_move_to_lru_tail(bo, NULL); We still move the BOs to the end of the LRU in the same order we have before, we just don't remove them when they are reserved. Regards, Christian. Am 14.05.19 um 16:31 schrieb Zhou, David(ChunMing): > how to refresh LRU to keep the order align with bo list passed from > user space? > > you can verify it by some games, performance could be different much > between multiple runnings. > > -David > > -------- Original Message -------- > Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU > during CS > From: Christian König > To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" > ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org > CC: > > [CAUTION: External Email] > Hui? What do you mean with that? > > Christian. > > Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing): >> my only concern is how to fresh LRU when bo is from bo list. >> >> -David >> >> -------- Original Message -------- >> Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU >> during CS >> From: Christian König >> To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" >> ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org >> CC: >> >> [CAUTION: External Email] >> >> This avoids OOM situations when we have lots of threads >> submitting at the same time. >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >> index fff558cf385b..f9240a94217b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct >> amdgpu_cs_parser *p, >> } >> >> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, >> - &duplicates, true); >> + &duplicates, false); >> if (unlikely(r != 0)) { >> if (r != -ERESTARTSYS) >> DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); >> -- >> 2.17.1 >> > > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">That is a good point, but actually not a problem in practice.<br> <br> See the change to ttm_eu_fence_buffer_objects:<br> <blockquote type="cite">- ttm_bo_add_to_lru(bo);<br> + if (list_empty(&bo->lru))<br> + ttm_bo_add_to_lru(bo);<br> + else<br> + ttm_bo_move_to_lru_tail(bo, NULL);<br> </blockquote> <br> We still move the BOs to the end of the LRU in the same order we have before, we just don't remove them when they are reserved.<br> <br> Regards,<br> Christian.<br> <br> Am 14.05.19 um 16:31 schrieb Zhou, David(ChunMing):<br> </div> <blockquote type="cite" cite="mid:-wsx1tz-kxfbz1yns7x33sra134gl11xhlux4lx3izissqr2httt4mb1vleyxgj8i7k6-q6ze8ub3ff8c4o0fxmx7niu76yg4-ybakue-3v14jw-ed5ol8ybh6o9-1ze886-hbstfi448pvq3pwhkj.1557844282594@email.android.com"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <meta content="text/html; charset=Windows-1252"> how to refresh LRU to keep the order align with bo list passed from user space?<br> <br> you can verify it by some games, performance could be different much between multiple runnings.<br> <br> -David<br> <br> -------- Original Message --------<br> Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<br> From: Christian König <br> To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,<a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org">dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org</a><br> CC: <br> <br> <div>[CAUTION: External Email] <div> <div class="moz-cite-prefix">Hui? What do you mean with that?<br> <br> Christian.<br> <br> Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing):<br> </div> <blockquote type="cite"> <meta name="Generator" content="Microsoft Exchange Server"> <style> <!-- .EmailQuote {margin-left:1pt; padding-left:4pt; border-left:#800000 2px solid} --> </style> <div>my only concern is how to fresh LRU when bo is from bo list.<br> <br> -David<br> <br> -------- Original Message --------<br> Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS<br> From: Christian König <br> To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,<a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org" moz-do-not-send="true">dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org</a><br> CC: <br> <br> </div> <font size="2"><span style="font-size:11pt"> <div class="PlainText">[CAUTION: External Email]<br> <br> This avoids OOM situations when we have lots of threads<br> submitting at the same time.<br> <br> Signed-off-by: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com" moz-do-not-send="true"> <christian.koenig@amd.com></a><br> ---<br> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br> 1 file changed, 1 insertion(+), 1 deletion(-)<br> <br> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> index fff558cf385b..f9240a94217b 100644<br> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,<br> }<br> <br> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,<br> - &duplicates, true);<br> + &duplicates, false);<br> if (unlikely(r != 0)) {<br> if (r != -ERESTARTSYS)<br> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");<br> --<br> 2.17.1<br> <br> </div> </span></font></blockquote> <br> </div> </div> <br> <fieldset class="mimeAttachmentHeader"></fieldset> <pre class="moz-quote-pre" wrap="">_______________________________________________ amd-gfx mailing list <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a> <a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a></pre> </blockquote> <br> </body> </html>
Isn't this patch trying to stop removing for all BOs from bo list? -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org CC: [CAUTION: External Email] That is a good point, but actually not a problem in practice. See the change to ttm_eu_fence_buffer_objects: - ttm_bo_add_to_lru(bo); + if (list_empty(&bo->lru)) + ttm_bo_add_to_lru(bo); + else + ttm_bo_move_to_lru_tail(bo, NULL); We still move the BOs to the end of the LRU in the same order we have before, we just don't remove them when they are reserved. Regards, Christian. Am 14.05.19 um 16:31 schrieb Zhou, David(ChunMing): how to refresh LRU to keep the order align with bo list passed from user space? you can verify it by some games, performance could be different much between multiple runnings. -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] Hui? What do you mean with that? Christian. Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing): my only concern is how to fresh LRU when bo is from bo list. -David -------- Original Message -------- Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com><mailto:christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1
BO list? No, we stop removing them from the LRU. But we still move them to the end of the LRU before releasing them. Christian. Am 15.05.19 um 16:21 schrieb Zhou, David(ChunMing): Isn't this patch trying to stop removing for all BOs from bo list? -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] That is a good point, but actually not a problem in practice. See the change to ttm_eu_fence_buffer_objects: - ttm_bo_add_to_lru(bo); + if (list_empty(&bo->lru)) + ttm_bo_add_to_lru(bo); + else + ttm_bo_move_to_lru_tail(bo, NULL); We still move the BOs to the end of the LRU in the same order we have before, we just don't remove them when they are reserved. Regards, Christian. Am 14.05.19 um 16:31 schrieb Zhou, David(ChunMing): how to refresh LRU to keep the order align with bo list passed from user space? you can verify it by some games, performance could be different much between multiple runnings. -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] Hui? What do you mean with that? Christian. Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing): my only concern is how to fresh LRU when bo is from bo list. -David -------- Original Message -------- Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com><mailto:christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1
Ah, sorry, I missed "+ ttm_bo_move_to_lru_tail(bo, NULL);". Right, moving them to end before releasing is fixing my concern. Sorry for noise. -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: "Koenig, Christian" To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org CC: [CAUTION: External Email] BO list? No, we stop removing them from the LRU. But we still move them to the end of the LRU before releasing them. Christian. Am 15.05.19 um 16:21 schrieb Zhou, David(ChunMing): Isn't this patch trying to stop removing for all BOs from bo list? -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] That is a good point, but actually not a problem in practice. See the change to ttm_eu_fence_buffer_objects: - ttm_bo_add_to_lru(bo); + if (list_empty(&bo->lru)) + ttm_bo_add_to_lru(bo); + else + ttm_bo_move_to_lru_tail(bo, NULL); We still move the BOs to the end of the LRU in the same order we have before, we just don't remove them when they are reserved. Regards, Christian. Am 14.05.19 um 16:31 schrieb Zhou, David(ChunMing): how to refresh LRU to keep the order align with bo list passed from user space? you can verify it by some games, performance could be different much between multiple runnings. -David -------- Original Message -------- Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] Hui? What do you mean with that? Christian. Am 14.05.19 um 15:12 schrieb Zhou, David(ChunMing): my only concern is how to fresh LRU when bo is from bo list. -David -------- Original Message -------- Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS From: Christian König To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org> CC: [CAUTION: External Email] This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com><mailto:christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1
Hi Christian, With the series patch set , amdgpu_vm_validate_pt_bos occasionally evicted amdgpu BOs failed and can’t find the valid first busy bo . Another problem is that during the first BOs get lock period will run into deadlock . /* check if other user occupy memory too long time */ if (!first_bo || !request_resv || !request_resv->lock.ctx) { if (first_bo) ttm_bo_put(first_bo); return -EBUSY; } if (first_bo->resv == request_resv) { ttm_bo_put(first_bo); return -EBUSY; } if (ctx->interruptible) ret = ww_mutex_lock_interruptible(&first_bo->resv->lock, request_resv->lock.ctx); else ret = ww_mutex_lock(&first_bo->resv->lock, request_resv->lock.ctx); if (ret) { ttm_bo_put(first_bo); if (ret == -EDEADLK) { ret = -EAGAIN; } return ret; } Thanks Prike From: Christian König <ckoenig.leichtzumerken@gmail.com> Sent: Wednesday, May 15, 2019 3:05 PM To: Liang, Prike <Prike.Liang@amd.com>; Marek Olšák <maraeo@gmail.com> Cc: Zhou, David(ChunMing) <David1.Zhou@amd.com>; dri-devel <dri-devel@lists.freedesktop.org>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org> Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS [CAUTION: External Email] Hi Prike, no, that can lead to massive problems in a real OOM situation and is not something we can do here. Christian. Am 15.05.19 um 04:00 schrieb Liang, Prike: Hi Christian , I just wonder when encounter ENOMEM error during pin amdgpu BOs can we retry validate again as below. With the following simply patch the Abaqus pinned issue not observed. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 11cbf63..72a32f5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain, bo->placements[i].lpfn = lpfn; bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT; } - +retry: r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); if (unlikely(r)) { - dev_err(adev->dev, "%p pin failed\n", bo); - goto error; + if (r == -ENOMEM){ + goto retry; + } else { + dev_err(adev->dev, "%p pin failed\n", bo); + goto error; + } } bo->pin_count = 1; Thanks, Prike From: Marek Olšák <maraeo@gmail.com><mailto:maraeo@gmail.com> Sent: Wednesday, May 15, 2019 3:33 AM To: Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com> Cc: Zhou, David(ChunMing) <David1.Zhou@amd.com><mailto:David1.Zhou@amd.com>; Liang, Prike <Prike.Liang@amd.com><mailto:Prike.Liang@amd.com>; dri-devel <dri-devel@lists.freedesktop.org><mailto:dri-devel@lists.freedesktop.org>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org><mailto:amd-gfx@lists.freedesktop.org> Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS [CAUTION: External Email] This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times: AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears Marek On Tue, May 14, 2019 at 8:31 AM Christian König <ckoenig.leichtzumerken@gmail.com<mailto:ckoenig.leichtzumerken@gmail.com>> wrote: This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com<mailto:christian.koenig@amd.com>> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n"); -- 2.17.1
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fff558cf385b..f9240a94217b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, } r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true, - &duplicates, true); + &duplicates, false); if (unlikely(r != 0)) { if (r != -ERESTARTSYS) DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
This avoids OOM situations when we have lots of threads submitting at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)