mbox series

[0/2] dma-fence: Rename dma_fence_is_signaled()

Message ID 20250409120640.106408-2-phasta@kernel.org (mailing list archive)
Headers show
Series dma-fence: Rename dma_fence_is_signaled() | expand

Message

Philipp Stanner April 9, 2025, 12:06 p.m. UTC
Hi all,

I'm currently debugging a Nouveau issue [1] and potentially might want to
add a function that just checks whether a fence is signaled already –
which then would obviously be called dma_fence_is_signaled().

In any case, I think it is reasonable to rename dma_fence_is_signaled()
so that it becomes very, very explicit when reading code that this is a
place where fences can get signaled.

This series obsoletes this patch: [2]

P.

[1] https://lore.kernel.org/all/20250403101353.42880-2-phasta@kernel.org/
[2] https://lore.kernel.org/all/20250408122217.61530-2-phasta@kernel.org/


Philipp Stanner (2):
  dma-fence: Rename dma_fence_is_signaled()
  dma-fence: Improve docu for dma_fence_check_and_signal()

 drivers/dma-buf/dma-fence-array.c             |  2 +-
 drivers/dma-buf/dma-fence-chain.c             |  6 +--
 drivers/dma-buf/dma-fence-unwrap.c            |  4 +-
 drivers/dma-buf/dma-fence.c                   |  6 +--
 drivers/dma-buf/dma-resv.c                    |  6 +--
 drivers/dma-buf/st-dma-fence-chain.c          | 10 ++--
 drivers/dma-buf/st-dma-fence.c                |  8 ++--
 drivers/dma-buf/sw_sync.c                     |  2 +-
 drivers/dma-buf/sync_file.c                   |  4 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c       |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c      |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c      | 10 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  6 +--
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  2 +-
 drivers/gpu/drm/drm_suballoc.c                |  6 +--
 drivers/gpu/drm/drm_syncobj.c                 |  6 +--
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c         |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c       |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |  4 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 drivers/gpu/drm/i915/i915_deps.c              |  6 +--
 drivers/gpu/drm/i915/i915_request.c           |  6 +--
 drivers/gpu/drm/i915/i915_sw_fence.c          |  4 +-
 drivers/gpu/drm/i915/i915_vma.c               |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
 drivers/gpu/drm/imagination/pvr_queue.c       | 10 ++--
 drivers/gpu/drm/lima/lima_sched.c             |  4 +-
 drivers/gpu/drm/msm/msm_gpu.c                 |  2 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c         |  2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c       |  2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c       |  4 +-
 drivers/gpu/drm/qxl/qxl_release.c             |  2 +-
 drivers/gpu/drm/scheduler/sched_entity.c      |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c        |  4 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  2 +-
 drivers/gpu/drm/v3d/v3d_sched.c               |  4 +-
 drivers/gpu/drm/vgem/vgem_fence.c             |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c         |  6 +--
 drivers/gpu/drm/xe/xe_bo.c                    |  2 +-
 drivers/gpu/drm/xe/xe_guc_submit.c            |  4 +-
 drivers/gpu/drm/xe/xe_hw_fence.c              |  2 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  2 +-
 drivers/gpu/drm/xe/xe_range_fence.c           |  2 +-
 include/linux/dma-fence.h                     | 47 ++++++++++++-------
 51 files changed, 133 insertions(+), 120 deletions(-)

Comments

Philipp Stanner April 9, 2025, 12:56 p.m. UTC | #1
On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
> On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
> > Hi Philipp,
> > 
> > On Wed,  9 Apr 2025 14:06:37 +0200
> > Philipp Stanner <phasta@kernel.org> wrote:
> > 
> > > dma_fence_is_signaled()'s name strongly reads as if this function
> > > were
> > > intended for checking whether a fence is already signaled. Also
> > > the
> > > boolean it returns hints at that.
> > > 
> > > The function's behavior, however, is more complex: it can check
> > > with a
> > > driver callback whether the hardware's sequence number indicates
> > > that
> > > the fence can already be treated as signaled, although the
> > > hardware's /
> > > driver's interrupt handler has not signaled it yet. If that's the
> > > case,
> > > the function also signals the fence.
> > > 
> > > (Presumably) this has caused a bug in Nouveau (unknown commit),
> > > where
> > > nouveau_fence_done() uses the function to check a fence, which
> > > causes a
> > > race.
> > > 
> > > Give the function a more obvious name.
> > 
> > This is just my personal view on this, but I find the new name just
> > as
> > confusing as the old one. It sounds like something is checked, but
> > it's
> > clear what, and then the fence is forcibly signaled like it would
> > be
> > if
> > you call drm_fence_signal(). Of course, this clarified by the doc,
> > but
> > given the goal was to make the function name clearly reflect what
> > it
> > does, I'm not convinced it's significantly better.
> > 
> > Maybe dma_fence_check_hw_state_and_propagate(), though it might be
> > too long of name. Oh well, feel free to ignore this comments if a
> > majority is fine with the new name.
> 
> Yoa, the name isn't perfect (the perfect name describing the whole
> behavior would be
> dma_fence_check_if_already_signaled_then_check_hardware_state_and_pro
> pa
> gate() ^^'
> 
> My intention here is to have the reader realize "watch out, the fence
> might get signaled here!", which is probably the most important event
> regarding fences, which can race, invoke the callbacks and so on.
> 
> For details readers will then check the documentation.
> 
> But I'm of course open to see if there's a majority for this or that
> name.

how about:

dma_fence_check_hw_and_signal() ?

P.

> 
> P.
> 
> 
> > 
> > Regards,
> > 
> > Boris
>
Philipp Stanner April 9, 2025, 3:04 p.m. UTC | #2
On Wed, 2025-04-09 at 16:10 +0200, Christian König wrote:
> Am 09.04.25 um 16:01 schrieb Philipp Stanner:
> > On Wed, 2025-04-09 at 15:14 +0200, Christian König wrote:
> > > Am 09.04.25 um 14:56 schrieb Philipp Stanner:
> > > > On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
> > > > > On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
> > > > > > Hi Philipp,
> > > > > > 
> > > > > > On Wed,  9 Apr 2025 14:06:37 +0200
> > > > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > > > > 
> > > > > > > dma_fence_is_signaled()'s name strongly reads as if this
> > > > > > > function
> > > > > > > were
> > > > > > > intended for checking whether a fence is already
> > > > > > > signaled.
> > > > > > > Also
> > > > > > > the
> > > > > > > boolean it returns hints at that.
> > > > > > > 
> > > > > > > The function's behavior, however, is more complex: it can
> > > > > > > check
> > > > > > > with a
> > > > > > > driver callback whether the hardware's sequence number
> > > > > > > indicates
> > > > > > > that
> > > > > > > the fence can already be treated as signaled, although
> > > > > > > the
> > > > > > > hardware's /
> > > > > > > driver's interrupt handler has not signaled it yet. If
> > > > > > > that's
> > > > > > > the
> > > > > > > case,
> > > > > > > the function also signals the fence.
> > > > > > > 
> > > > > > > (Presumably) this has caused a bug in Nouveau (unknown
> > > > > > > commit),
> > > > > > > where
> > > > > > > nouveau_fence_done() uses the function to check a fence,
> > > > > > > which
> > > > > > > causes a
> > > > > > > race.
> > > > > > > 
> > > > > > > Give the function a more obvious name.
> > > > > > This is just my personal view on this, but I find the new
> > > > > > name
> > > > > > just
> > > > > > as
> > > > > > confusing as the old one. It sounds like something is
> > > > > > checked,
> > > > > > but
> > > > > > it's
> > > > > > clear what, and then the fence is forcibly signaled like it
> > > > > > would
> > > > > > be
> > > > > > if
> > > > > > you call drm_fence_signal(). Of course, this clarified by
> > > > > > the
> > > > > > doc,
> > > > > > but
> > > > > > given the goal was to make the function name clearly
> > > > > > reflect
> > > > > > what
> > > > > > it
> > > > > > does, I'm not convinced it's significantly better.
> > > > > > 
> > > > > > Maybe dma_fence_check_hw_state_and_propagate(), though it
> > > > > > might
> > > > > > be
> > > > > > too long of name. Oh well, feel free to ignore this
> > > > > > comments if
> > > > > > a
> > > > > > majority is fine with the new name.
> > > > > Yoa, the name isn't perfect (the perfect name describing the
> > > > > whole
> > > > > behavior would be
> > > > > dma_fence_check_if_already_signaled_then_check_hardware_state
> > > > > _and
> > > > > _pro
> > > > > pa
> > > > > gate() ^^'
> > > > > 
> > > > > My intention here is to have the reader realize "watch out,
> > > > > the
> > > > > fence
> > > > > might get signaled here!", which is probably the most
> > > > > important
> > > > > event
> > > > > regarding fences, which can race, invoke the callbacks and so
> > > > > on.
> > > > > 
> > > > > For details readers will then check the documentation.
> > > > > 
> > > > > But I'm of course open to see if there's a majority for this
> > > > > or
> > > > > that
> > > > > name.
> > > > how about:
> > > > 
> > > > dma_fence_check_hw_and_signal() ?
> > > I don't think that renaming the function is a good idea in the
> > > first
> > > place.
> > > 
> > > What the function does internally is an implementation detail of
> > > the
> > > framework.
> > > 
> > > For the code using this function it's completely irrelevant if
> > > the
> > > function might also signal the fence, what matters for the caller
> > > is
> > > the returned status of the fence. I think this also counts for
> > > the
> > > dma_fence_is_signaled() documentation.
> > It does obviously matter. As it's currently implemented, a lot of
> > important things happen implicitly.
> 
> Yeah, but that's ok.
> 
> The code who calls this is the consumer of the interface and so
> shouldn't need to know this. That's why we have created the DMA fence
> framework in the first place.
> 
> For the provider side when a driver or similar implements the
> interface the relevant documentation is the dma_fence_ops structure.
> 
> > I only see improvement by making things more obvious.
> > 
> > In any case, how would you call a wrapper that just does
> > test_bit(IS_SIGNALED, …) ?
> 
> Broken, that was very intentionally removed quite shortly after we
> created the framework.
> 
> We have a few cases were implementations do check that for their
> fences, but consumers should never be allowed to touch such
> internals.

There is theory and there is practice. In practice, those internals are
being used by Nouveau, i915, Xe, vmgfx and radeon.

So it seems that we failed quite a bit at communicating clearly how the
interface should be used.

And, to repeat myself, with both name and docu of that function, I
think it is very easy to misunderstand what it's doing. You say that it
shouldn't matter – and maybe that's true, in theory. In practice, it
does matter. In practice, APIs get misused and have side-effects. And
making that harder is desirable.

In any case, I might have to add another such call to Nouveau, because
the solution preferred by you over the callback causes another race.
Certainly one could solve this in a clean way, but someone has to do
the work, and we're talking about more than a few hours here.

In any case, be so kind and look at patch 2 and tell me there if you're
at least OK with making the documentation more detailed.

P.

> 
> Regards,
> Christian.
> 
> > 
> > P.
> > 
> > > What we should improve is the documentation of the dma_fence_ops-
> > > > enable_signaling and dma_fence_ops->signaled callbacks.
> > > Especially see the comment about reference counts on
> > > enable_signaling
> > > which is missing on the signaled callback. That is most likely
> > > the
> > > root cause why nouveau implemented enable_signaling correctly but
> > > not
> > > the other one.
> > > 
> > > But putting that aside I think we should make nails with heads
> > > and
> > > let the framework guarantee that the fences stay alive until they
> > > are
> > > signaled (one way or another). This completely removes the burden
> > > to
> > > keep a reference on unsignaled fences from the drivers /
> > > implementations and make things more over all more defensive.
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > P.
> > > > 
> > > > > P.
> > > > > 
> > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Boris
>