Message ID | 20250109005553.623947-1-airlied@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | nouveau/fence: handle cross device fences properly. (v3) | expand |
On 9/1/25 10:55, Dave Airlie wrote: > From: Dave Airlie <airlied@redhat.com> > > This is the 3rd iteration of this after talking to Ben and > Danilo, I think this makes sense now. > > The fence sync logic doesn't handle a fence sync across devices > as it tries to write to a channel offset from one device into > the fence bo from a different device, which won't work so well. > > This patch fixes that to avoid using the sync path in the case > where the fences come from different nouveau drm devices. > > This works fine on a single device as the fence bo is shared > across the devices, and mapped into each channels vma space, > the channel offsets are therefore okay to pass between sides, > so one channel can sync on the seqnos from the other by using > the offset into it's vma. > > Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> > Cc: stable@vger.kernel.org > --- > drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c > index ee5e9d40c166..a3eb1f447a29 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c > @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > if (f) { > struct nouveau_channel *prev; > bool must_wait = true; > + bool local; > > rcu_read_lock(); > prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || > - fctx->sync(f, prev, chan) == 0)) > + local = prev && prev->drm == chan->drm; > + if (local && (prev == chan || > + fctx->sync(f, prev, chan) == 0)) > must_wait = false; > rcu_read_unlock(); > if (!must_wait)
On 9/1/25 13:42, Ben Skeggs wrote: > On 9/1/25 10:55, Dave Airlie wrote: > >> From: Dave Airlie <airlied@redhat.com> >> >> This is the 3rd iteration of this after talking to Ben and >> Danilo, I think this makes sense now. >> >> The fence sync logic doesn't handle a fence sync across devices >> as it tries to write to a channel offset from one device into >> the fence bo from a different device, which won't work so well. >> >> This patch fixes that to avoid using the sync path in the case >> where the fences come from different nouveau drm devices. >> >> This works fine on a single device as the fence bo is shared >> across the devices, and mapped into each channels vma space, >> the channel offsets are therefore okay to pass between sides, >> so one channel can sync on the seqnos from the other by using >> the offset into it's vma. >> >> Signed-off-by: Dave Airlie <airlied@redhat.com> > Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Err, not sure what my fingers did there ;) Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> > >> Cc: stable@vger.kernel.org >> --- >> drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c >> b/drivers/gpu/drm/nouveau/nouveau_fence.c >> index ee5e9d40c166..a3eb1f447a29 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c >> @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, >> struct nouveau_channel *chan, >> if (f) { >> struct nouveau_channel *prev; >> bool must_wait = true; >> + bool local; >> rcu_read_lock(); >> prev = rcu_dereference(f->channel); >> - if (prev && (prev == chan || >> - fctx->sync(f, prev, chan) == 0)) >> + local = prev && prev->drm == chan->drm; >> + if (local && (prev == chan || >> + fctx->sync(f, prev, chan) == 0)) >> must_wait = false; >> rcu_read_unlock(); >> if (!must_wait)
On Thu, Jan 09, 2025 at 10:55:53AM +1000, Dave Airlie wrote: > From: Dave Airlie <airlied@redhat.com> > > This is the 3rd iteration of this after talking to Ben and > Danilo, I think this makes sense now. > > The fence sync logic doesn't handle a fence sync across devices > as it tries to write to a channel offset from one device into > the fence bo from a different device, which won't work so well. > > This patch fixes that to avoid using the sync path in the case > where the fences come from different nouveau drm devices. > > This works fine on a single device as the fence bo is shared > across the devices, and mapped into each channels vma space, > the channel offsets are therefore okay to pass between sides, > so one channel can sync on the seqnos from the other by using > the offset into it's vma. Huh, they indeed all share and map drm->fence->bo, good catch. > > Signed-off-by: Dave Airlie <airlied@redhat.com> > Cc: stable@vger.kernel.org > --- > drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c > index ee5e9d40c166..a3eb1f447a29 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c > @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > if (f) { > struct nouveau_channel *prev; > bool must_wait = true; > + bool local; > > rcu_read_lock(); > prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || > - fctx->sync(f, prev, chan) == 0)) > + local = prev && prev->drm == chan->drm; struct nouveau_channel has no pointer to a struct nouveau_drm, this should be prev->cli->drm and chan->cli->drm instead. No need to resend, I can fix it when applying the patch if you want. > + if (local && (prev == chan || > + fctx->sync(f, prev, chan) == 0)) > must_wait = false; > rcu_read_unlock(); > if (!must_wait) > -- > 2.43.0 >
Hi Dave, kernel test robot noticed the following build errors: [auto build test ERROR on linus/master] [also build test ERROR on drm-misc/drm-misc-next drm-tip/drm-tip v6.13-rc6 next-20250109] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Dave-Airlie/nouveau-fence-handle-cross-device-fences-properly-v3/20250109-085805 base: linus/master patch link: https://lore.kernel.org/r/20250109005553.623947-1-airlied%40gmail.com patch subject: [PATCH] nouveau/fence: handle cross device fences properly. (v3) config: loongarch-randconfig-002-20250110 (https://download.01.org/0day-ci/archive/20250110/202501101033.wlEjeZwK-lkp@intel.com/config) compiler: loongarch64-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250110/202501101033.wlEjeZwK-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202501101033.wlEjeZwK-lkp@intel.com/ All errors (new ones prefixed by >>): drivers/gpu/drm/nouveau/nouveau_fence.c: In function 'nouveau_fence_sync': >> drivers/gpu/drm/nouveau/nouveau_fence.c:394:53: error: 'struct nouveau_channel' has no member named 'drm' 394 | local = prev && prev->drm == chan->drm; | ^~ drivers/gpu/drm/nouveau/nouveau_fence.c:394:66: error: 'struct nouveau_channel' has no member named 'drm' 394 | local = prev && prev->drm == chan->drm; | ^~ vim +394 drivers/gpu/drm/nouveau/nouveau_fence.c 356 357 int 358 nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, 359 bool exclusive, bool intr) 360 { 361 struct nouveau_fence_chan *fctx = chan->fence; 362 struct dma_resv *resv = nvbo->bo.base.resv; 363 int i, ret; 364 365 ret = dma_resv_reserve_fences(resv, 1); 366 if (ret) 367 return ret; 368 369 /* Waiting for the writes first causes performance regressions 370 * under some circumstances. So manually wait for the reads first. 371 */ 372 for (i = 0; i < 2; ++i) { 373 struct dma_resv_iter cursor; 374 struct dma_fence *fence; 375 376 dma_resv_for_each_fence(&cursor, resv, 377 dma_resv_usage_rw(exclusive), 378 fence) { 379 enum dma_resv_usage usage; 380 struct nouveau_fence *f; 381 382 usage = dma_resv_iter_usage(&cursor); 383 if (i == 0 && usage == DMA_RESV_USAGE_WRITE) 384 continue; 385 386 f = nouveau_local_fence(fence, chan->cli->drm); 387 if (f) { 388 struct nouveau_channel *prev; 389 bool must_wait = true; 390 bool local; 391 392 rcu_read_lock(); 393 prev = rcu_dereference(f->channel); > 394 local = prev && prev->drm == chan->drm; 395 if (local && (prev == chan || 396 fctx->sync(f, prev, chan) == 0)) 397 must_wait = false; 398 rcu_read_unlock(); 399 if (!must_wait) 400 continue; 401 } 402 403 ret = dma_fence_wait(fence, intr); 404 if (ret) 405 return ret; 406 } 407 } 408 409 return 0; 410 } 411
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index ee5e9d40c166..a3eb1f447a29 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, if (f) { struct nouveau_channel *prev; bool must_wait = true; + bool local; rcu_read_lock(); prev = rcu_dereference(f->channel); - if (prev && (prev == chan || - fctx->sync(f, prev, chan) == 0)) + local = prev && prev->drm == chan->drm; + if (local && (prev == chan || + fctx->sync(f, prev, chan) == 0)) must_wait = false; rcu_read_unlock(); if (!must_wait)