
[v6,0/2] Add p2p via dmabuf to habanalabs

Message ID 20210912165309.98695-1-ogabbay@kernel.org (mailing list archive)

Message

Oded Gabbay Sept. 12, 2021, 4:53 p.m. UTC
Hi,
Re-sending this patch-set following the release of our user-space TPC
compiler and runtime library.

I would appreciate a review on this.

Thanks,
Oded

Oded Gabbay (1):
  habanalabs: define uAPI to export FD for DMA-BUF

Tomer Tayar (1):
  habanalabs: add support for dma-buf exporter

 drivers/misc/habanalabs/Kconfig             |   1 +
 drivers/misc/habanalabs/common/habanalabs.h |  22 +
 drivers/misc/habanalabs/common/memory.c     | 522 +++++++++++++++++++-
 drivers/misc/habanalabs/gaudi/gaudi.c       |   1 +
 drivers/misc/habanalabs/goya/goya.c         |   1 +
 include/uapi/misc/habanalabs.h              |  28 +-
 6 files changed, 570 insertions(+), 5 deletions(-)

Comments

Daniel Vetter Sept. 14, 2021, 2:18 p.m. UTC | #1
On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> Hi,
> Re-sending this patch-set following the release of our user-space TPC
> compiler and runtime library.
> 
> I would appreciate a review on this.

I think the big open we have is the entire revoke discussions. Having the
option to let dma-buf hang around which map to random local memory ranges,
without clear ownership link and a way to kill it sounds bad to me.

I think there's a few options:
- We require revoke support. But I've heard rdma really doesn't like that,
  I guess because taking out an MR while holding the dma_resv_lock would
  be an inversion, so can't be done. Jason, can you recap what exactly the
  hold-up was again that makes this a no-go?

- The other option I discussed is a bit more the exclusive device ownership
  model we've had for gpus in drm of the really old kind. Roughly this
  would work like this, in terms of drm_device:
  - Only the current owner (drm_master in current drm code, but should
    probably rename that to drm_owner) is allowed to use the accel driver.
    So all ioctl would fail if you're not drm_master.
  - On dropmaster/file close we'd revoke as much as possible, e.g.
    in-flight commands, mmaps, anything really that can be revoked.
  - For non-revokable things like these dma-buf we'd keep a drm_master
    reference around. This would prevent the next open to acquire
    ownership rights, which at least prevents all the nasty potential
    problems.
  - admin (or well container orchestrator) then has responsibility to
    shoot down all processes until the problem goes away (i.e. until you hit
    the one with the rdma MR which keeps the dma-buf alive)

- Not sure there's another reasonable way to do this without inviting some
  problems once we get outside of the "single kernel instance per tenant"
  use-case.

Wrt implementation there's the trouble of this reinventing a bunch of drm
stuff and concepts, but that's maybe for after we've figured out
semantics.

Also would be great if you have a pull request for the userspace runtime
that shows a bit how this all gets used and tied together. Or maybe some
pointers, since I guess retconning a PR in github is maybe a bit much.

Cheers, Daniel
Oded Gabbay Sept. 14, 2021, 2:58 p.m. UTC | #2
On Tue, Sep 14, 2021 at 5:18 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > Hi,
> > Re-sending this patch-set following the release of our user-space TPC
> > compiler and runtime library.
> >
> > I would appreciate a review on this.
>
> I think the big open we have is the entire revoke discussions. Having the
> option to let dma-buf hang around which map to random local memory ranges,
> without clear ownership link and a way to kill it sounds bad to me.
>
Hi Daniel, thanks for the reply.

What is this revocation requirement?
Is it relevant to my case, where our device has a single user at a
time (only a single process can open the device character file) and
that user has ownership of the entire device local memory?

Because I don't care if the user has this dma-buf object lying around,
as it only wastes device memory for that user. And the user can't
close the fd of the device until it has closed the fd of the dmabuf.

Or is the revocation referring to something else entirely?

> I think there's a few options:
> - We require revoke support. But I've heard rdma really doesn't like that,
>   I guess because taking out an MR while holding the dma_resv_lock would
>   be an inversion, so can't be done. Jason, can you recap what exactly the
>   hold-up was again that makes this a no-go?
>
> - The other option I discussed is a bit more the exclusive device ownership
>   model we've had for gpus in drm of the really old kind. Roughly this
>   would work like this, in terms of drm_device:
>   - Only the current owner (drm_master in current drm code, but should
>     probably rename that to drm_owner) is allowed to use the accel driver.
>     So all ioctl would fail if you're not drm_master.
>   - On dropmaster/file close we'd revoke as much as possible, e.g.
>     in-flight commands, mmaps, anything really that can be revoked.
>   - For non-revokable things like these dma-buf we'd keep a drm_master
>     reference around. This would prevent the next open to acquire
>     ownership rights, which at least prevents all the nasty potential
>     problems.
>   - admin (or well container orchestrator) then has responsibility to
>     shoot down all process until the problem goes away (i.e. until you hit
>     the one with the rdma MR which keeps the dma-buf alive)
>
> - Not sure there's another reasonable way to do this without inviting some
>   problems once we get outside of the "single kernel instance per tenant"
>   use-case.
>
> Wrt implementation there's the trouble of this reinventing a bunch of drm
> stuff and concepts, but that's maybe for after we've figured out
> semantics.
>
> Also would be great if you have a pull request for the userspace runtime
> that shows a bit how this all gets used and tied together. Or maybe some
> pointers, since I guess retconning a PR in github is maybe a bit much.

Hmm, so actually this only has an API in the hl-thunk library. I have
not put it on GitHub, but I can do that fairly quickly.
But the caller of this API is not the userspace runtime. The caller is
another library, which is responsible for scaling out training beyond
a single box of Gaudi devices. That library implements collective
operations (e.g. all-gather, all-reduce) over multiple Gaudi devices.
And in fact, the real user is the training framework (e.g. TensorFlow,
PyTorch) that calls these collective operations. The framework then
passes the dma-buf fd to libfabric (an open source project), which uses
rdma-core to pass it to the RDMA driver.

I can give you a short presentation on that if you want :)
Jason Gunthorpe Sept. 14, 2021, 4:12 p.m. UTC | #3
On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > Hi,
> > Re-sending this patch-set following the release of our user-space TPC
> > compiler and runtime library.
> > 
> > I would appreciate a review on this.
> 
> I think the big open we have is the entire revoke discussions. Having the
> option to let dma-buf hang around which map to random local memory ranges,
> without clear ownership link and a way to kill it sounds bad to me.
> 
> I think there's a few options:
> - We require revoke support. But I've heard rdma really doesn't like that,
>   I guess because taking out an MR while holding the dma_resv_lock would
>   be an inversion, so can't be done. Jason, can you recap what exactly the
>   hold-up was again that makes this a no-go?

RDMA HW can't do revoke.

So we have to exclude almost all the HW and several interesting use
cases to enable a revoke operation.

>   - For non-revokable things like these dma-buf we'd keep a drm_master
>     reference around. This would prevent the next open to acquire
>     ownership rights, which at least prevents all the nasty potential
>     problems.

This is what I generally would expect, the DMABUF FD and its DMA
memory just floats about until the unrevokable user releases it, which
happens when the FD that is driving the import eventually gets closed.

I still don't think any of the complexity is needed, pinnable memory
is a thing in Linux, just account for it in mlocked and that is
enough.
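
As a rough illustration of the "account for it in mlocked" idea, the
sketch below checks a pin request against the caller's RLIMIT_MEMLOCK
from userspace. The function name pin_request_fits and the
already_pinned counter are invented for this example; the accounting a
driver would actually do lives in the kernel and is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: would pinning `request` more bytes, on top of
 * `already_pinned`, stay within this process's RLIMIT_MEMLOCK?
 */
bool pin_request_fits(size_t already_pinned, size_t request)
{
    struct rlimit rl;

    /* Reject requests whose sum would overflow size_t. */
    if (request > SIZE_MAX - already_pinned)
        return false;

    /* If the limit cannot be read, be conservative and refuse. */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;

    /* No limit configured: any non-overflowing request fits. */
    if (rl.rlim_cur == RLIM_INFINITY)
        return true;

    return (rlim_t)(already_pinned + request) <= rl.rlim_cur;
}
```

The point of the sketch is only that pinned memory can be charged
against an existing per-process limit instead of needing a separate
revoke mechanism.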

Jason
Oded Gabbay Sept. 15, 2021, 7:45 a.m. UTC | #4
On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > Hi,
> > > Re-sending this patch-set following the release of our user-space TPC
> > > compiler and runtime library.
> > >
> > > I would appreciate a review on this.
> >
> > I think the big open we have is the entire revoke discussions. Having the
> > option to let dma-buf hang around which map to random local memory ranges,
> > without clear ownership link and a way to kill it sounds bad to me.
> >
> > I think there's a few options:
> > - We require revoke support. But I've heard rdma really doesn't like that,
> >   I guess because taking out an MR while holding the dma_resv_lock would
> >   be an inversion, so can't be done. Jason, can you recap what exactly the
> >   hold-up was again that makes this a no-go?
>
> RDMA HW can't do revoke.
>
> So we have to exclude almost all the HW and several interesting use
> cases to enable a revoke operation.
>
> >   - For non-revokable things like these dma-buf we'd keep a drm_master
> >     reference around. This would prevent the next open to acquire
> >     ownership rights, which at least prevents all the nasty potential
> >     problems.
>
> This is what I generally would expect, the DMABUF FD and its DMA
> memory just floats about until the unrevokable user releases it, which
> happens when the FD that is driving the import eventually gets closed.
This is exactly what we are doing in the driver. We make sure
everything is valid until the unrevokable user releases it and that
happens only when the dmabuf fd gets closed.
And the user can't close its fd of the device until he performs the
above, so there is no leakage between users.

>
> I still don't think any of the complexity is needed, pinnable memory
> is a thing in Linux, just account for it in mlocked and that is
> enough.
>
> Jason
Daniel Vetter Sept. 16, 2021, 12:31 p.m. UTC | #5
On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > Hi,
> > > > Re-sending this patch-set following the release of our user-space TPC
> > > > compiler and runtime library.
> > > >
> > > > I would appreciate a review on this.
> > >
> > > I think the big open we have is the entire revoke discussions. Having the
> > > option to let dma-buf hang around which map to random local memory ranges,
> > > without clear ownership link and a way to kill it sounds bad to me.
> > >
> > > I think there's a few options:
> > > - We require revoke support. But I've heard rdma really doesn't like that,
> > >   I guess because taking out an MR while holding the dma_resv_lock would
> > >   be an inversion, so can't be done. Jason, can you recap what exactly the
> > >   hold-up was again that makes this a no-go?
> >
> > RDMA HW can't do revoke.

Like why? I'm assuming when the final open handle or whatever for that MR
is closed, you do clean up everything? Or does that MR still stick around
forever too?

> > So we have to exclude almost all the HW and several interesting use
> > cases to enable a revoke operation.
> >
> > >   - For non-revokable things like these dma-buf we'd keep a drm_master
> > >     reference around. This would prevent the next open to acquire
> > >     ownership rights, which at least prevents all the nasty potential
> > >     problems.
> >
> > This is what I generally would expect, the DMABUF FD and its DMA
> > memory just floats about until the unrevokable user releases it, which
> > happens when the FD that is driving the import eventually gets closed.
> This is exactly what we are doing in the driver. We make sure
> everything is valid until the unrevokable user releases it and that
> happens only when the dmabuf fd gets closed.
> And the user can't close its fd of the device until he performs the
> above, so there is no leakage between users.

Maybe I got the device security model all wrong, but I thought Gaudi is
single user, and the only thing it protects is the system against the
Gaudi device through iommu/device gart. So roughly the following can
happen:

1. User A opens gaudi device, sets up dma-buf export

2. User A registers that with RDMA, or anything else that doesn't support
revoke.

3. User A closes gaudi device

4. User B opens gaudi device, assumes that it has full control over the
device and uploads some secrets, which happen to end up in the dma-buf
region user A set up

5. User A extracts secrets.
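
The sequence above can be replayed with a toy userspace model. This is
only a sketch of the hazard, assuming a device whose close path ignores
exports that are still alive; all toy_* names are invented for
illustration and none of this is habanalabs or RDMA code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 64

struct toy_device {
    unsigned char mem[TOY_MEM_SIZE]; /* stand-in for device-local memory */
    int owner;                       /* 0 means nobody has the device open */
};

/* A window into device memory handed to an importer (e.g. an RDMA MR). */
struct toy_export {
    struct toy_device *dev;
    size_t off, len;
};

/* Single-user device: a second open fails while somebody owns it. */
int toy_open(struct toy_device *d, int user)
{
    if (d->owner)
        return -EBUSY;
    d->owner = user;
    return 0;
}

/* Deliberately naive: releases ownership even if exports are still alive. */
void toy_close(struct toy_device *d)
{
    d->owner = 0;
}

struct toy_export toy_export_range(struct toy_device *d, size_t off, size_t len)
{
    assert(off <= TOY_MEM_SIZE && len <= TOY_MEM_SIZE - off);
    struct toy_export e = { d, off, len };
    return e;
}

/* The current owner writes into device memory. */
void toy_write(struct toy_device *d, size_t off, const void *src, size_t n)
{
    assert(off <= TOY_MEM_SIZE && n <= TOY_MEM_SIZE - off);
    memcpy(d->mem + off, src, n);
}

/* The importer reads through its window, whoever owns the device now. */
void toy_import_read(const struct toy_export *e, void *dst, size_t n)
{
    assert(n <= e->len);
    memcpy(dst, e->dev->mem + e->off, n);
}
```

Opening as user 1, exporting a range, closing, opening as user 2 and
writing into that range lets the old window read user 2's data. That is
the aliasing the steps describe, and it exists only because toy_close()
does not wait for the export to go away.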

> > I still don't think any of the complexity is needed, pinnable memory
> > is a thing in Linux, just account for it in mlocked and that is
> > enough.

It's not just mlocked memory, it's mlocked memory and I can exfiltrate it.
Mlock is fine, exfiltration not so much. It's mlock, but a global pool and
if you didn't munlock then the next mlock from a completely different user
will alias with your stuff.

Or is there something that prevents that? Oded at least explained that Gaudi
works like a gpu from 20 years ago, single user, no security at all within
the device.
-Daniel
Oded Gabbay Sept. 16, 2021, 12:44 p.m. UTC | #6
On Thu, Sep 16, 2021 at 3:31 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> > On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > > Hi,
> > > > > Re-sending this patch-set following the release of our user-space TPC
> > > > > compiler and runtime library.
> > > > >
> > > > > I would appreciate a review on this.
> > > >
> > > > I think the big open we have is the entire revoke discussions. Having the
> > > > option to let dma-buf hang around which map to random local memory ranges,
> > > > without clear ownership link and a way to kill it sounds bad to me.
> > > >
> > > > I think there's a few options:
> > > > - We require revoke support. But I've heard rdma really doesn't like that,
> > > >   I guess because taking out an MR while holding the dma_resv_lock would
> > > >   be an inversion, so can't be done. Jason, can you recap what exactly the
> > > >   hold-up was again that makes this a no-go?
> > >
> > > RDMA HW can't do revoke.
>
> Like why? I'm assuming when the final open handle or whatever for that MR
> is closed, you do clean up everything? Or does that MR still stick around
> forever too?
>
> > > So we have to exclude almost all the HW and several interesting use
> > > cases to enable a revoke operation.
> > >
> > > >   - For non-revokable things like these dma-buf we'd keep a drm_master
> > > >     reference around. This would prevent the next open to acquire
> > > >     ownership rights, which at least prevents all the nasty potential
> > > >     problems.
> > >
> > > This is what I generally would expect, the DMABUF FD and its DMA
> > > memory just floats about until the unrevokable user releases it, which
> > > happens when the FD that is driving the import eventually gets closed.
> > This is exactly what we are doing in the driver. We make sure
> > everything is valid until the unrevokable user releases it and that
> > happens only when the dmabuf fd gets closed.
> > And the user can't close its fd of the device until he performs the
> > above, so there is no leakage between users.
>
> Maybe I got the device security model all wrong, but I thought Gaudi is
> single user, and the only thing it protects is the system against the
> Gaudi device through iommu/device gart. So roughly the following can
> happen:
>
> 1. User A opens gaudi device, sets up dma-buf export
>
> 2. User A registers that with RDMA, or anything else that doesn't support
> revoke.
>
> 3. User A closes gaudi device
This can not happen without User A closing the FD of the dma-buf it exported.
We prevent User A from closing the device because when it exported the
dma-buf, the driver's code took a refcnt of the user's private
structure. You can see that in export_dmabuf_common() in the 2nd
patch. There is a call there to hl_ctx_get.
So even if User A calls close(device_fd), the driver won't let any
other user open the device until User A closes the fd of the dma-buf
object.

Moreover, once User A closes the dma-buf fd and the device is
released, the driver will scrub the device memory (this is optional,
for systems that care about security).

And AFAIK, User A can't close the dma-buf fd once it registered it
with RDMA, without doing unregister.
This can be seen in ib_umem_dmabuf_get() which calls dma_buf_get()
which does fget(fd)
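
A minimal userspace model of that reference counting, as a sketch only:
the model_* names are invented here and are not the driver's
hl_ctx_get()/export_dmabuf_common() code. One reference is held by the
open device fd and one more by each exported dma-buf; the device only
becomes available again when the last reference is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user context, pinned by the device fd and by exports. */
struct model_ctx {
    int refcount;
};

struct model_device {
    struct model_ctx *active; /* NULL when no user owns the device */
    struct model_ctx slot;    /* storage for the single context */
    bool scrubbed;            /* set when memory was wiped on release */
};

/* Drop one reference; the last put scrubs memory and frees the device. */
void model_ctx_put(struct model_device *dev)
{
    assert(dev->active && dev->active->refcount > 0);
    if (--dev->active->refcount == 0) {
        dev->scrubbed = true; /* stand-in for the optional memory scrub */
        dev->active = NULL;   /* the next open may now succeed */
    }
}

/* open(): single user at a time, refused while a context is alive. */
int model_open(struct model_device *dev)
{
    if (dev->active)
        return -EBUSY;
    dev->slot.refcount = 1;
    dev->active = &dev->slot;
    dev->scrubbed = false;
    return 0;
}

/* Exporting a dma-buf takes an extra reference on the context. */
int model_export_dmabuf(struct model_device *dev)
{
    if (!dev->active)
        return -ENODEV;
    dev->active->refcount++;
    return 0;
}

/* close(device_fd) only drops the fd's own reference. */
void model_close_device_fd(struct model_device *dev)
{
    model_ctx_put(dev);
}

/* Final release of the dma-buf (after every importer let go of it). */
void model_release_dmabuf(struct model_device *dev)
{
    model_ctx_put(dev);
}
```

In this model, calling model_close_device_fd() while an export is
outstanding leaves the context alive, so a second model_open() returns
-EBUSY until model_release_dmabuf() runs.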


>
> 4. User B opens gaudi device, assumes that it has full control over the
> device and uploads some secrets, which happen to end up in the dma-buf
> region user A set up
>
> 5. User B extracts secrets.
>
> > > I still don't think any of the complexity is needed, pinnable memory
> > > is a thing in Linux, just account for it in mlocked and that is
> > > enough.
>
> It's not mlocked memory, it's mlocked memory and I can exfiltrate it.
> Mlock is fine, exfiltration not so much. It's mlock, but a global pool and
> if you didn't munlock then the next mlock from a completely different user
> will alias with your stuff.
>
> Or is there something that prevents that? Oded at least explain that gaudi
> works like a gpu from 20 years ago, single user, no security at all within
> the device.
> -Daniel
Jason Gunthorpe Sept. 16, 2021, 1:10 p.m. UTC | #7
On Thu, Sep 16, 2021 at 02:31:34PM +0200, Daniel Vetter wrote:
> On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> > On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > > Hi,
> > > > > Re-sending this patch-set following the release of our user-space TPC
> > > > > compiler and runtime library.
> > > > >
> > > > > I would appreciate a review on this.
> > > >
> > > > I think the big open we have is the entire revoke discussions. Having the
> > > > option to let dma-buf hang around which map to random local memory ranges,
> > > > without clear ownership link and a way to kill it sounds bad to me.
> > > >
> > > > I think there's a few options:
> > > > - We require revoke support. But I've heard rdma really doesn't like that,
> > > >   I guess because taking out an MR while holding the dma_resv_lock would
> > > >   be an inversion, so can't be done. Jason, can you recap what exactly the
> > > >   hold-up was again that makes this a no-go?
> > >
> > > RDMA HW can't do revoke.
> 
> Like why? I'm assuming when the final open handle or whatever for that MR
> is closed, you do clean up everything? Or does that MR still stick around
> forever too?

It is a combination of uAPI and HW specification.

revoke here means you take an MR object and tell it to stop doing DMA
without causing the MR object to be destructed.

All the drivers can of course destruct the MR, but doing such a
destruction without explicit synchronization with user space opens
things up to a serious use-after potential that could be a security
issue.

When the open handle closes the userspace is synchronized with the
kernel and we can destruct the HW objects safely.

So, the special HW feature required is 'stop doing DMA but keep the
object in an error state' which isn't really implemented, and doesn't
extend very well to other object types beyond simple MRs.
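
To make the distinction concrete, here is a toy state machine in plain
userspace C. It is not rdma-core or kernel code, and every name in it
(model_mr, mr_revoke, hw_has_error_state, ...) is invented for
illustration. It only shows that "revoke" needs a third state between
active and destroyed, which the hardware would have to provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum mr_state { MR_ACTIVE, MR_REVOKED, MR_DESTROYED };

struct model_mr {
    enum mr_state state;
    bool hw_has_error_state; /* can the HW park an MR without destroying it? */
};

/* Attempt a DMA through the MR. */
int mr_dma(const struct model_mr *mr)
{
    switch (mr->state) {
    case MR_ACTIVE:
        return 0;
    case MR_REVOKED:
        return -EIO;    /* object still exists, but DMA is refused */
    default:
        return -EINVAL; /* stale handle: the unsynchronized-destroy hazard */
    }
}

/* Revoke: stop DMA but keep the object around in an error state. */
int mr_revoke(struct model_mr *mr)
{
    if (mr->state != MR_ACTIVE)
        return -EINVAL;
    if (!mr->hw_has_error_state)
        return -EOPNOTSUPP; /* the case described for most hardware */
    mr->state = MR_REVOKED;
    return 0;
}

/* Destroy: always possible, but only safe once userspace is in sync. */
void mr_destroy(struct model_mr *mr)
{
    assert(mr->state != MR_DESTROYED);
    mr->state = MR_DESTROYED;
}
```

With hw_has_error_state set to false, mr_revoke() fails and the MR keeps
doing DMA, so the only remaining option is mr_destroy() at a point where
userspace has been synchronized, such as the final close of the handle.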

> 1. User A opens gaudi device, sets up dma-buf export
> 
> 2. User A registers that with RDMA, or anything else that doesn't support
> revoke.
> 
> 3. User A closes gaudi device
> 
> 4. User B opens gaudi device, assumes that it has full control over the
> device and uploads some secrets, which happen to end up in the dma-buf
> region user A set up

I would expect this is blocked so long as the DMABUF exists - eg the
DMABUF will hold a fget on the FD of #1 until the DMABUF is closed, so
that #3 can't actually happen.

> It's not mlocked memory, it's mlocked memory and I can exfiltrate
> it.

That's just a bug, don't make buggy drivers :)

Jason
Oded Gabbay Sept. 16, 2021, 1:16 p.m. UTC | #8
On Thu, Sep 16, 2021 at 3:44 PM Oded Gabbay <ogabbay@kernel.org> wrote:
>
> On Thu, Sep 16, 2021 at 3:31 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > Maybe I got the device security model all wrong, but I thought Gaudi is
> > single user, and the only thing it protects is the system against the
> > Gaudi device through iommu/device gart. So roughly the following can
> > happen:
> >
> > 1. User A opens gaudi device, sets up dma-buf export
> >
> > 2. User A registers that with RDMA, or anything else that doesn't support
> > revoke.
> >
> > 3. User A closes gaudi device
> This can not happen without User A closing the FD of the dma-buf it exported.
> We prevent User A from closing the device because when it exported the
> dma-buf, the driver's code took a refcnt of the user's private
> structure. You can see that in export_dmabuf_common() in the 2nd
> patch. There is a call there to hl_ctx_get.
> So even if User A calls close(device_fd), the driver won't let any
> other user open the device until User A closes the fd of the dma-buf
> object.
>
> Moreover, once User A will close the dma-buf fd and the device is
> released, the driver will scrub the device memory (this is optional
> for systems who care about security).
>
> And AFAIK, User A can't close the dma-buf fd once it registered it
> with RDMA, without doing unregister.
> This can be seen in ib_umem_dmabuf_get() which calls dma_buf_get()
> which does fget(fd)

Adding Daniel, I don't know how his email got dropped when I replied to him...
Daniel Vetter Sept. 17, 2021, 12:25 p.m. UTC | #9
On Thu, Sep 16, 2021 at 03:44:25PM +0300, Oded Gabbay wrote:
> On Thu, Sep 16, 2021 at 3:31 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> > > On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > > > Hi,
> > > > > > Re-sending this patch-set following the release of our user-space TPC
> > > > > > compiler and runtime library.
> > > > > >
> > > > > > I would appreciate a review on this.
> > > > >
> > > > > I think the big open we have is the entire revoke discussions. Having the
> > > > > option to let dma-buf hang around which map to random local memory ranges,
> > > > > without clear ownership link and a way to kill it sounds bad to me.
> > > > >
> > > > > I think there's a few options:
> > > > > - We require revoke support. But I've heard rdma really doesn't like that,
> > > > >   I guess because taking out an MR while holding the dma_resv_lock would
> > > > >   be an inversion, so can't be done. Jason, can you recap what exactly the
> > > > >   hold-up was again that makes this a no-go?
> > > >
> > > > RDMA HW can't do revoke.
> >
> > Like why? I'm assuming when the final open handle or whatever for that MR
> > is closed, you do clean up everything? Or does that MR still stick around
> > forever too?
> >
> > > > So we have to exclude almost all the HW and several interesting use
> > > > cases to enable a revoke operation.
> > > >
> > > > >   - For non-revokable things like these dma-buf we'd keep a drm_master
> > > > >     reference around. This would prevent the next open to acquire
> > > > >     ownership rights, which at least prevents all the nasty potential
> > > > >     problems.
> > > >
> > > > This is what I generally would expect, the DMABUF FD and its DMA
> > > > memory just floats about until the unrevokable user releases it, which
> > > > happens when the FD that is driving the import eventually gets closed.
> > > This is exactly what we are doing in the driver. We make sure
> > > everything is valid until the unrevokable user releases it and that
> > > happens only when the dmabuf fd gets closed.
> > > And the user can't close its fd of the device until he performs the
> > > above, so there is no leakage between users.
> >
> > Maybe I got the device security model all wrong, but I thought Gaudi is
> > single user, and the only thing it protects is the system against the
> > Gaudi device through iommu/device gart. So roughly the following can
> > happen:
> >
> > 1. User A opens gaudi device, sets up dma-buf export
> >
> > 2. User A registers that with RDMA, or anything else that doesn't support
> > revoke.
> >
> > 3. User A closes gaudi device
> This can not happen without User A closing the FD of the dma-buf it exported.
> We prevent User A from closing the device because when it exported the
> dma-buf, the driver's code took a refcnt of the user's private
> structure. You can see that in export_dmabuf_common() in the 2nd
> patch. There is a call there to hl_ctx_get.
> So even if User A calls close(device_fd), the driver won't let any
> other user open the device until User A closes the fd of the dma-buf
> object.
> 
> Moreover, once User A will close the dma-buf fd and the device is
> released, the driver will scrub the device memory (this is optional
> for systems who care about security).
> 
> And AFAIK, User A can't close the dma-buf fd once it registered it
> with RDMA, without doing unregister.
> This can be seen in ib_umem_dmabuf_get() which calls dma_buf_get()
> which does fget(fd)

Yeah that's essentially what I was looking for. This is de facto
hand-rolling the drm_master owner tracking stuff. As long as we have
something like this in place it should be fine I think.
-Daniel

> > 4. User B opens gaudi device, assumes that it has full control over the
> > device and uploads some secrets, which happen to end up in the dma-buf
> > region user A set up
> >
> > 5. User B extracts secrets.
> >
> > > > I still don't think any of the complexity is needed, pinnable memory
> > > > is a thing in Linux, just account for it in mlocked and that is
> > > > enough.
> >
> > It's not mlocked memory, it's mlocked memory and I can exfiltrate it.
> > Mlock is fine, exfiltration not so much. It's mlock, but a global pool and
> > if you didn't munlock then the next mlock from a completely different user
> > will alias with your stuff.
> >
> > Or is there something that prevents that? Oded at least explain that gaudi
> > works like a gpu from 20 years ago, single user, no security at all within
> > the device.
> > -Daniel
Daniel Vetter Sept. 17, 2021, 12:30 p.m. UTC | #10
On Thu, Sep 16, 2021 at 10:10:14AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 16, 2021 at 02:31:34PM +0200, Daniel Vetter wrote:
> > On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> > > On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > > > Hi,
> > > > > > Re-sending this patch-set following the release of our user-space TPC
> > > > > > compiler and runtime library.
> > > > > >
> > > > > > I would appreciate a review on this.
> > > > >
> > > > > I think the big open we have is the entire revoke discussions. Having the
> > > > > option to let dma-buf hang around which map to random local memory ranges,
> > > > > without clear ownership link and a way to kill it sounds bad to me.
> > > > >
> > > > > I think there's a few options:
> > > > > - We require revoke support. But I've heard rdma really doesn't like that,
> > > > >   I guess because taking out an MR while holding the dma_resv_lock would
> > > > >   be an inversion, so can't be done. Jason, can you recap what exactly the
> > > > >   hold-up was again that makes this a no-go?
> > > >
> > > > RDMA HW can't do revoke.
> > 
> > Like why? I'm assuming when the final open handle or whatever for that MR
> > is closed, you do clean up everything? Or does that MR still stick around
> > forever too?
> 
> It is a combination of uAPI and HW specification.
> 
> revoke here means you take a MR object and tell it to stop doing DMA
> without causing the MR object to be destructed.
> 
> All the drivers can of course destruct the MR, but doing such a
> destruction without explicit synchronization with user space opens
> things up to a serious use-after potential that could be a security
> issue.
> 
> When the open handle closes the userspace is synchronized with the
> kernel and we can destruct the HW objects safely.
> 
> So, the special HW feature required is 'stop doing DMA but keep the
> object in an error state' which isn't really implemented, and doesn't
> extend very well to other object types beyond simple MRs.

Yeah revoke without destroying the MR doesn't work, and it sounds like
revoke by destroying the MR just moves the can of worms around to another
place.

> > 1. User A opens gaudi device, sets up dma-buf export
> > 
> > 2. User A registers that with RDMA, or anything else that doesn't support
> > revoke.
> > 
> > 3. User A closes gaudi device
> > 
> > 4. User B opens gaudi device, assumes that it has full control over the
> > device and uploads some secrets, which happen to end up in the dma-buf
> > region user A set up
> 
> I would expect this is blocked so long as the DMABUF exists - eg the
> DMABUF will hold a fget on the FD of #1 until the DMABUF is closed, so
> that #3 can't actually happen.
> 
> > It's not mlocked memory, it's mlocked memory and I can exfiltrate
> > it.
> 
> That's just bug, don't make buggy drivers :)

Well yeah, but given that habanalabs hand rolled this I can't just check
for the usual things we have to enforce this in drm. And generally you can
just open chardevs arbitrarily, and have multiple users fighting one
another. The troubles only start when you have private state or memory
allocations of some kind attached to the struct file (instead of the
underlying device), or something else that requires device exclusivity.
There's no standard way to do that.

Plus in many cases you really want revoke on top (can't get that here
unfortunately it seems), and the attempts to get towards a generic
revoke() just never went anywhere. So again it's all hand-rolled
per-subsystem. *insert lament about us not having done this through a
proper subsystem*

Anyway it sounds like the code takes care of that.
-Daniel
Oded Gabbay Sept. 18, 2021, 8:38 a.m. UTC | #11
On Fri, Sep 17, 2021 at 3:30 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> Anyway it sounds like the code takes care of that.
> -Daniel

Daniel, Jason,
Thanks for reviewing this code.

Can I get an R-B / A-B from you for this patch-set ?

Thanks,
Oded
Oded Gabbay Sept. 23, 2021, 9:22 a.m. UTC | #12
On Sat, Sep 18, 2021 at 11:38 AM Oded Gabbay <ogabbay@kernel.org> wrote:
> Daniel, Jason,
> Thanks for reviewing this code.
>
> Can I get an R-B / A-B from you for this patch-set ?
>
> Thanks,
> Oded

A kind reminder.

Thanks,
Oded
Oded Gabbay Sept. 28, 2021, 7:04 a.m. UTC | #13
On Thu, Sep 23, 2021 at 12:22 PM Oded Gabbay <ogabbay@kernel.org> wrote:
> A kind reminder.
>
> Thanks,
> Oded

Hi,
I know last week was LPC and maybe this got lost in the inbox, so I'm
sending it again to make sure you got my request for R-B / A-B.

Thanks,
Oded
Daniel Vetter Sept. 30, 2021, 9:13 a.m. UTC | #14
On Tue, Sep 28, 2021 at 10:04:29AM +0300, Oded Gabbay wrote:
> Hi,
> I know last week was LPC and maybe this got lost in the inbox, so I'm
> sending it again to make sure you got my request for R-B / A-B.

I was waiting for some clarity from the maintainers summit, but that's
still about as unclear as it gets. Either way, technically it sounds ok,
but I'm a bit buried so I didn't look at the code.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

But looking beyond the strict lens of dma-buf I'm still impressed by the
mess this created, to get to the same endpoint of "we open our stack" in
the same time it takes others to sort this out. I'm still looking for some
kind of plan to fix this.

Also you probably want to get Dave to ack this too, I pinged him on IRC
last week about this after the maintainers summit.
-Daniel