
nouveau/fence: handle cross device fences properly. (v3)

Message ID 20250109005553.623947-1-airlied@gmail.com (mailing list archive)
State New
Series nouveau/fence: handle cross device fences properly. (v3)

Commit Message

Dave Airlie Jan. 9, 2025, 12:55 a.m. UTC
From: Dave Airlie <airlied@redhat.com>

This is the 3rd iteration of this after talking to Ben and
Danilo, I think this makes sense now.

The fence sync logic doesn't handle a fence sync across devices
as it tries to write to a channel offset from one device into
the fence bo from a different device, which won't work so well.

This patch fixes that by avoiding the sync path when the fences
come from different nouveau drm devices.

This works fine on a single device as the fence bo is shared
across the channels and mapped into each channel's vma space;
the channel offsets are therefore okay to pass between sides,
so one channel can sync on the seqnos from the other by using
the offset into its vma.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Ben Skeggs Jan. 9, 2025, 3:42 a.m. UTC | #1
On 9/1/25 10:55, Dave Airlie wrote:

> From: Dave Airlie <airlied@redhat.com>
>
> This is the 3rd iteration of this after talking to Ben and
> Danilo, I think this makes sense now.
>
> The fence sync logic doesn't handle a fence sync across devices
> as it tries to write to a channel offset from one device into
> the fence bo from a different device, which won't work so well.
>
> This patch fixes that by avoiding the sync path when the fences
> come from different nouveau drm devices.
>
> This works fine on a single device as the fence bo is shared
> across the channels and mapped into each channel's vma space;
> the channel offsets are therefore okay to pass between sides,
> so one channel can sync on the seqnos from the other by using
> the offset into its vma.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
> Cc: stable@vger.kernel.org
> ---
>   drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index ee5e9d40c166..a3eb1f447a29 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>   			if (f) {
>   				struct nouveau_channel *prev;
>   				bool must_wait = true;
> +				bool local;
>   
>   				rcu_read_lock();
>   				prev = rcu_dereference(f->channel);
> -				if (prev && (prev == chan ||
> -					     fctx->sync(f, prev, chan) == 0))
> +				local = prev && prev->drm == chan->drm;
> +				if (local && (prev == chan ||
> +					      fctx->sync(f, prev, chan) == 0))
>   					must_wait = false;
>   				rcu_read_unlock();
>   				if (!must_wait)
Ben Skeggs Jan. 9, 2025, 4:13 a.m. UTC | #2
On 9/1/25 13:42, Ben Skeggs wrote:

> On 9/1/25 10:55, Dave Airlie wrote:
>
>> From: Dave Airlie <airlied@redhat.com>
>>
>> This is the 3rd iteration of this after talking to Ben and
>> Danilo, I think this makes sense now.
>>
>> The fence sync logic doesn't handle a fence sync across devices
>> as it tries to write to a channel offset from one device into
>> the fence bo from a different device, which won't work so well.
>>
>> This patch fixes that by avoiding the sync path when the fences
>> come from different nouveau drm devices.
>>
>> This works fine on a single device as the fence bo is shared
>> across the channels and mapped into each channel's vma space;
>> the channel offsets are therefore okay to pass between sides,
>> so one channel can sync on the seqnos from the other by using
>> the offset into its vma.
>>
>> Signed-off-by: Dave Airlie <airlied@redhat.com>
> Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>

Err, not sure what my fingers did there ;)

Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>

>
>> Cc: stable@vger.kernel.org
>> ---
>>   drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> index ee5e9d40c166..a3eb1f447a29 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>>   			if (f) {
>>   				struct nouveau_channel *prev;
>>   				bool must_wait = true;
>> +				bool local;
>>   
>>   				rcu_read_lock();
>>   				prev = rcu_dereference(f->channel);
>> -				if (prev && (prev == chan ||
>> -					     fctx->sync(f, prev, chan) == 0))
>> +				local = prev && prev->drm == chan->drm;
>> +				if (local && (prev == chan ||
>> +					      fctx->sync(f, prev, chan) == 0))
>>   					must_wait = false;
>>   				rcu_read_unlock();
>>   				if (!must_wait)
Danilo Krummrich Jan. 9, 2025, 4:58 p.m. UTC | #3
On Thu, Jan 09, 2025 at 10:55:53AM +1000, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
> 
> This is the 3rd iteration of this after talking to Ben and
> Danilo, I think this makes sense now.
> 
> The fence sync logic doesn't handle a fence sync across devices
> as it tries to write to a channel offset from one device into
> the fence bo from a different device, which won't work so well.
> 
> This patch fixes that by avoiding the sync path when the fences
> come from different nouveau drm devices.
>
> This works fine on a single device as the fence bo is shared
> across the channels and mapped into each channel's vma space;
> the channel offsets are therefore okay to pass between sides,
> so one channel can sync on the seqnos from the other by using
> the offset into its vma.

Huh, they indeed all share and map drm->fence->bo, good catch.

> 
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index ee5e9d40c166..a3eb1f447a29 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>  			if (f) {
>  				struct nouveau_channel *prev;
>  				bool must_wait = true;
> +				bool local;
>  
>  				rcu_read_lock();
>  				prev = rcu_dereference(f->channel);
> -				if (prev && (prev == chan ||
> -					     fctx->sync(f, prev, chan) == 0))
> +				local = prev && prev->drm == chan->drm;

struct nouveau_channel has no pointer to a struct nouveau_drm; this should be
prev->cli->drm and chan->cli->drm instead.
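
I.e. something like:

	local = prev && prev->cli->drm == chan->cli->drm;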

No need to resend, I can fix it when applying the patch if you want.

> +				if (local && (prev == chan ||
> +					      fctx->sync(f, prev, chan) == 0))
>  					must_wait = false;
>  				rcu_read_unlock();
>  				if (!must_wait)
> -- 
> 2.43.0
>
kernel test robot Jan. 10, 2025, 2:28 a.m. UTC | #4
Hi Dave,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on drm-misc/drm-misc-next drm-tip/drm-tip v6.13-rc6 next-20250109]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Airlie/nouveau-fence-handle-cross-device-fences-properly-v3/20250109-085805
base:   linus/master
patch link:    https://lore.kernel.org/r/20250109005553.623947-1-airlied%40gmail.com
patch subject: [PATCH] nouveau/fence: handle cross device fences properly. (v3)
config: loongarch-randconfig-002-20250110 (https://download.01.org/0day-ci/archive/20250110/202501101033.wlEjeZwK-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250110/202501101033.wlEjeZwK-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501101033.wlEjeZwK-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/gpu/drm/nouveau/nouveau_fence.c: In function 'nouveau_fence_sync':
>> drivers/gpu/drm/nouveau/nouveau_fence.c:394:53: error: 'struct nouveau_channel' has no member named 'drm'
     394 |                                 local = prev && prev->drm == chan->drm;
         |                                                     ^~
   drivers/gpu/drm/nouveau/nouveau_fence.c:394:66: error: 'struct nouveau_channel' has no member named 'drm'
     394 |                                 local = prev && prev->drm == chan->drm;
         |                                                                  ^~


vim +394 drivers/gpu/drm/nouveau/nouveau_fence.c

   356	
   357	int
   358	nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
   359			   bool exclusive, bool intr)
   360	{
   361		struct nouveau_fence_chan *fctx = chan->fence;
   362		struct dma_resv *resv = nvbo->bo.base.resv;
   363		int i, ret;
   364	
   365		ret = dma_resv_reserve_fences(resv, 1);
   366		if (ret)
   367			return ret;
   368	
   369		/* Waiting for the writes first causes performance regressions
   370		 * under some circumstances. So manually wait for the reads first.
   371		 */
   372		for (i = 0; i < 2; ++i) {
   373			struct dma_resv_iter cursor;
   374			struct dma_fence *fence;
   375	
   376			dma_resv_for_each_fence(&cursor, resv,
   377						dma_resv_usage_rw(exclusive),
   378						fence) {
   379				enum dma_resv_usage usage;
   380				struct nouveau_fence *f;
   381	
   382				usage = dma_resv_iter_usage(&cursor);
   383				if (i == 0 && usage == DMA_RESV_USAGE_WRITE)
   384					continue;
   385	
   386				f = nouveau_local_fence(fence, chan->cli->drm);
   387				if (f) {
   388					struct nouveau_channel *prev;
   389					bool must_wait = true;
   390					bool local;
   391	
   392					rcu_read_lock();
   393					prev = rcu_dereference(f->channel);
 > 394					local = prev && prev->drm == chan->drm;
   395					if (local && (prev == chan ||
   396						      fctx->sync(f, prev, chan) == 0))
   397						must_wait = false;
   398					rcu_read_unlock();
   399					if (!must_wait)
   400						continue;
   401				}
   402	
   403				ret = dma_fence_wait(fence, intr);
   404				if (ret)
   405					return ret;
   406			}
   407		}
   408	
   409		return 0;
   410	}
   411
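
With the prev->cli->drm / chan->cli->drm correction suggested above folded in, the check at line 394 would presumably become (a sketch of the fixed hunk, not necessarily the commit as finally applied):

				rcu_read_lock();
				prev = rcu_dereference(f->channel);
				/* only take the sync fast path when both channels belong to the same drm device */
				local = prev && prev->cli->drm == chan->cli->drm;
				if (local && (prev == chan ||
					      fctx->sync(f, prev, chan) == 0))
					must_wait = false;
				rcu_read_unlock();

This mirrors how chan->cli->drm is already used a few lines earlier to look the fence up via nouveau_local_fence(fence, chan->cli->drm).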

Patch

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ee5e9d40c166..a3eb1f447a29 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -367,11 +367,13 @@  nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
 			if (f) {
 				struct nouveau_channel *prev;
 				bool must_wait = true;
+				bool local;
 
 				rcu_read_lock();
 				prev = rcu_dereference(f->channel);
-				if (prev && (prev == chan ||
-					     fctx->sync(f, prev, chan) == 0))
+				local = prev && prev->drm == chan->drm;
+				if (local && (prev == chan ||
+					      fctx->sync(f, prev, chan) == 0))
 					must_wait = false;
 				rcu_read_unlock();
 				if (!must_wait)