diff mbox series

dma-buf: Precheck for a valid dma_fence before acquiring the reference

Message ID 20200221143820.2795039-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series dma-buf: Precheck for a valid dma_fence before acquiring the reference | expand

Commit Message

Chris Wilson Feb. 21, 2020, 2:38 p.m. UTC
dma_fence_get_rcu() is used to acquire a reference to under a dma-fence
under racey conditions -- a perfect recipe for a disaster. As we know
the caller may be handling stale memory, use kasan to confirm the
dma-fence, or rather its memory block, is valid before attempting to
acquire a reference. This should help us to more quickly and clearly
identify lost races.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/linux/dma-fence.h | 3 +++
 1 file changed, 3 insertions(+)

Comments

Daniel Vetter Feb. 21, 2020, 3:17 p.m. UTC | #1
On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> dma_fence_get_rcu() is used to acquire a reference to under a dma-fence
> under racey conditions -- a perfect recipe for a disaster. As we know
> the caller may be handling stale memory, use kasan to confirm the
> dma-fence, or rather its memory block, is valid before attempting to
> acquire a reference. This should help us to more quickly and clearly
> identify lost races.

Hm ... I'm a bit lost on the purpose, and what this does. Fences need
to be rcu-freed, and I have honestly no idea how kasan treats those.
Are we throwing false positives, because kasan thinks the stuff is
freed, but we're still accessing it (while the grace period hasn't
passed, so anything freed is still guaranteed to be at least in the
slab cache somewhere).

I'm not seeing how this catches lost races quicker, since the refcount
should get to 0 way before we get to the kfree. So the refcount check
on the next line should catch strictly more races than the kasan
check.
-Daniel

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  include/linux/dma-fence.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 3347c54f3a87..2805edd74738 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -301,6 +301,9 @@ static inline struct dma_fence *dma_fence_get(struct dma_fence *fence)
>   */
>  static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)
>  {
> +       if (unlikely(!kasan_check_read(fence, sizeof(*fence))))
> +               return NULL;
> +
>         if (kref_get_unless_zero(&fence->refcount))
>                 return fence;
>         else
> --
> 2.25.1
>
Chris Wilson Feb. 21, 2020, 3:23 p.m. UTC | #2
Quoting Daniel Vetter (2020-02-21 15:17:24)
> On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence
> > under racey conditions -- a perfect recipe for a disaster. As we know
> > the caller may be handling stale memory, use kasan to confirm the
> > dma-fence, or rather its memory block, is valid before attempting to
> > acquire a reference. This should help us to more quickly and clearly
> > identify lost races.
> 
> Hm ... I'm a bit lost on the purpose, and what this does. Fences need
> to be rcu-freed, and I have honestly no idea how kasan treats those.
> Are we throwing false positives, because kasan thinks the stuff is
> freed, but we're still accessing it (while the grace period hasn't
> passed, so anything freed is still guaranteed to be at least in the
> slab cache somewhere).
> 
> I'm not seeing how this catches lost races quicker, since the refcount
> should get to 0 way before we get to the kfree. So the refcount check
> on the next line should catch strictly more races than the kasan
> check.

It's not about the fence itself, but those pointing to the fence. That's
where we may find garbage, and by returning NULL the kernel keeps
working for a bit longer as you try to piece together the race.
-Chris
Chris Wilson Feb. 21, 2020, 3:26 p.m. UTC | #3
Quoting Chris Wilson (2020-02-21 15:23:38)
> Quoting Daniel Vetter (2020-02-21 15:17:24)
> > On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence
> > > under racey conditions -- a perfect recipe for a disaster. As we know
> > > the caller may be handling stale memory, use kasan to confirm the
> > > dma-fence, or rather its memory block, is valid before attempting to
> > > acquire a reference. This should help us to more quickly and clearly
> > > identify lost races.
> > 
> > Hm ... I'm a bit lost on the purpose, and what this does. Fences need
> > to be rcu-freed, and I have honestly no idea how kasan treats those.
> > Are we throwing false positives, because kasan thinks the stuff is
> > freed, but we're still accessing it (while the grace period hasn't
> > passed, so anything freed is still guaranteed to be at least in the
> > slab cache somewhere).
> > 
> > I'm not seeing how this catches lost races quicker, since the refcount
> > should get to 0 way before we get to the kfree. So the refcount check
> > on the next line should catch strictly more races than the kasan
> > check.
> 
> It's not about the fence itself, but those pointing to the fence. That's
> where we may find garbage, and by returning NULL the kernel keeps
> working for a bit longer as you try to piece together the race.

Plus given all the inlining, the kasan warning for the refcount is not
that clear about where it is called from, which is a bit of a nuisance,
so an explicit warning was much easier for finding the culprit.
-Chris
Daniel Vetter Feb. 21, 2020, 4:34 p.m. UTC | #4
On Fri, Feb 21, 2020 at 4:26 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Chris Wilson (2020-02-21 15:23:38)
> > Quoting Daniel Vetter (2020-02-21 15:17:24)
> > > On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence
> > > > under racey conditions -- a perfect recipe for a disaster. As we know
> > > > the caller may be handling stale memory, use kasan to confirm the
> > > > dma-fence, or rather its memory block, is valid before attempting to
> > > > acquire a reference. This should help us to more quickly and clearly
> > > > identify lost races.
> > >
> > > Hm ... I'm a bit lost on the purpose, and what this does. Fences need
> > > to be rcu-freed, and I have honestly no idea how kasan treats those.
> > > Are we throwing false positives, because kasan thinks the stuff is
> > > freed, but we're still accessing it (while the grace period hasn't
> > > passed, so anything freed is still guaranteed to be at least in the
> > > slab cache somewhere).
> > >
> > > I'm not seeing how this catches lost races quicker, since the refcount
> > > should get to 0 way before we get to the kfree. So the refcount check
> > > on the next line should catch strictly more races than the kasan
> > > check.
> >
> > It's not about the fence itself, but those pointing to the fence. That's
> > where we may find garbage, and by returning NULL the kernel keeps
> > working for a bit longer as you try to piece together the race.
>
> Plus given all the inlining, the kasan warning for the refcount is not
> that clear about where it is called from, which is a bit of a nuisance,
> so an explicit warning was much easier for finding the culprit.

Hm ok, that makes a lot more sense. Can you pls amend the commit
message (paste the above if you feel uninspired, this discussion
explains pretty good what's the point). And maybe a link to bug report
or include an example splat and how it's improved.

With that stuff added Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

But maybe get another co-conspirator/bug-hunter to ack this too.
-Daniel
diff mbox series

Patch

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 3347c54f3a87..2805edd74738 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -301,6 +301,9 @@  static inline struct dma_fence *dma_fence_get(struct dma_fence *fence)
  */
 static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)
 {
+	if (unlikely(!kasan_check_read(fence, sizeof(*fence))))
+		return NULL;
+
 	if (kref_get_unless_zero(&fence->refcount))
 		return fence;
 	else