[03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming

Message ID 20211115020552.2378167-4-baolu.lu@linux.intel.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Series Fix BUG_ON in vfio_iommu_group_notifier()

Commit Message

Baolu Lu Nov. 15, 2021, 2:05 a.m. UTC
pci_stub allows the admin to block driver binding on a device and make
it permanently shared with userspace. Since pci_stub does not do DMA,
it is safe. However the admin must understand that using pci_stub allows
userspace to attack whatever device it was bound to.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pci-stub.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Christoph Hellwig Nov. 15, 2021, 1:21 p.m. UTC | #1
On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe.

If an IOMMU is setup and dma-iommu or friends are not used nothing is
unsafe anyway, it just is that IOMMU won't work..

> However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

I don't understand this sentence at all.
Jason Gunthorpe Nov. 15, 2021, 1:31 p.m. UTC | #2
On Mon, Nov 15, 2021 at 05:21:26AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> > pci_stub allows the admin to block driver binding on a device and make
> > it permanently shared with userspace. Since pci_stub does not do DMA,
> > it is safe.
> 
> If an IOMMU is setup and dma-iommu or friends are not used nothing is
> unsafe anyway, it just is that IOMMU won't work..
> 
> > However the admin must understand that using pci_stub allows
> > userspace to attack whatever device it was bound to.
> 
> I don't understand this sentence at all.

If userspace has control of device A and can cause A to issue DMA to
arbitrary DMA addresses then there are certain PCI topologies where A
can now issue peer to peer DMA and manipulate the MMIO registers in
device B.

A kernel driver on device B is thus subjected to concurrent
manipulation of the device registers from userspace.

So, a 'safe' kernel driver is one that can tolerate this, and an
'unsafe' driver is one where userspace can break kernel integrity.

The second issue is DMA - because there is only one iommu_domain
underlying many devices if we give that iommu_domain to userspace it
means the kernel DMA API on other devices no longer works. 

So no kernel driver doing DMA can work at all, under any PCI topology,
if userspace owns the IO page table.

Jason
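
The group-wide constraint described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the code from this series: `struct group_model`, `claim_owner()` and `release_owner()` are hypothetical names. The only idea taken from the discussion is that every device in a group shares one ownership state, so kernel and user ownership cannot coexist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ownership states, loosely following the thread's wording. */
enum dma_owner { OWNER_NONE, OWNER_KERNEL, OWNER_USER };

/* Toy stand-in for an IOMMU group: one shared owner for all its devices. */
struct group_model {
	enum dma_owner owner;
	unsigned int refs;	/* number of devices holding the current claim */
};

/* A claim succeeds only if the group is unowned or already owned the same way. */
static bool claim_owner(struct group_model *g, enum dma_owner want)
{
	assert(want != OWNER_NONE);
	if (g->refs && g->owner != want)
		return false;	/* kernel and user ownership cannot be mixed */
	g->owner = want;
	g->refs++;
	return true;
}

/* Dropping the last claim returns the group to the unowned state. */
static void release_owner(struct group_model *g)
{
	assert(g->refs > 0);
	if (--g->refs == 0)
		g->owner = OWNER_NONE;
}
```

In this model, once device A has claimed OWNER_USER, a kernel driver probing device B in the same group fails its OWNER_KERNEL claim until A releases.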
Robin Murphy Nov. 15, 2021, 3:14 p.m. UTC | #3
On 2021-11-15 13:31, Jason Gunthorpe via iommu wrote:
> On Mon, Nov 15, 2021 at 05:21:26AM -0800, Christoph Hellwig wrote:
>> On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
>>> pci_stub allows the admin to block driver binding on a device and make
>>> it permanently shared with userspace. Since pci_stub does not do DMA,
>>> it is safe.
>>
>> If an IOMMU is setup and dma-iommu or friends are not used nothing is
>> unsafe anyway, it just is that IOMMU won't work..
>>
>>> However the admin must understand that using pci_stub allows
>>> userspace to attack whatever device it was bound to.
>>
>> I don't understand this sentence at all.
> 
> If userspace has control of device A and can cause A to issue DMA to
> arbitrary DMA addresses then there are certain PCI topologies where A
> can now issue peer to peer DMA and manipulate the MMIO registers in
> device B.
> 
> A kernel driver on device B is thus subjected to concurrent
> manipulation of the device registers from userspace.
> 
> So, a 'safe' kernel driver is one that can tolerate this, and an
> 'unsafe' driver is one where userspace can break kernel integrity.

You mean in the case where the kernel driver is trying to use device B 
in a purely PIO mode, such that userspace might potentially be able to 
interfere with data being transferred in and out of the kernel? Perhaps 
it's not so clear to put that under a notion of "DMA ownership", since 
device B's DMA is irrelevant and it's really much more equivalent to 
/dev/mem access or mmaping BARs to userspace while a driver is bound.

> The second issue is DMA - because there is only one iommu_domain
> underlying many devices if we give that iommu_domain to userspace it
> means the kernel DMA API on other devices no longer works.

Actually, the DMA API itself via iommu-dma will "work" just fine in the 
sense that it will still successfully perform all its operations in the 
unattached default domain, it's just that if the driver then programs 
the device to access the returned DMA address, the device is likely to 
get a nasty surprise.

> So no kernel driver doing DMA can work at all, under any PCI topology,
> if userspace owns the IO page table.

This isn't really about userspace at all - it's true of any case where a 
kernel driver wants to attach a grouped device to its own unmanaged 
domain. The fact that the VFIO kernel driver uses its unmanaged domains 
to map user pages upon user requests is merely a VFIO detail, and VFIO 
happens to be the only common case where unmanaged domains and 
non-singleton groups intersect. I'd say that, logically, if you want to 
put policy on mutual driver/usage compatibility anywhere it should be in 
iommu_attach_group().

Robin.
Jason Gunthorpe Nov. 15, 2021, 4:17 p.m. UTC | #4
On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:

> > If userspace has control of device A and can cause A to issue DMA to
> > arbitrary DMA addresses then there are certain PCI topologies where A
> > can now issue peer to peer DMA and manipulate the MMIO registers in
> > device B.
> > 
> > A kernel driver on device B is thus subjected to concurrent
> > manipulation of the device registers from userspace.
> > 
> > So, a 'safe' kernel driver is one that can tolerate this, and an
> > 'unsafe' driver is one where userspace can break kernel integrity.
> 
> You mean in the case where the kernel driver is trying to use device B in a
> purely PIO mode, such that userspace might potentially be able to interfere
> with data being transferred in and out of the kernel?

s/PIO/MMIO, but yes basically. And not just data transfer but
userspace can interfere with the device state as well.

> Perhaps it's not so clear to put that under a notion of "DMA
> ownership", since device B's DMA is irrelevant and it's really much
> more equivalent to /dev/mem access or mmaping BARs to userspace
> while a driver is bound.

It is DMA ownership because device A's DMA is what is relevant
here. device A's DMA compromises device B. So device A asserts it has
USER ownership for DMA.

Any device in a group with USER ownership is incompatible with a
kernel driver.

> > The second issue is DMA - because there is only one iommu_domain
> > underlying many devices if we give that iommu_domain to userspace it
> > means the kernel DMA API on other devices no longer works.
> 
> Actually, the DMA API itself via iommu-dma will "work" just fine in the
> sense that it will still successfully perform all its operations in the
> unattached default domain, it's just that if the driver then programs the
> device to access the returned DMA address, the device is likely to get a
> nasty surprise.

A DMA API that returns a dma_addr_t that does not result in data
transfer to the specified buffers is not working, in my book - it
breaks the API contract.

> > So no kernel driver doing DMA can work at all, under any PCI topology,
> > if userspace owns the IO page table.
> 
> This isn't really about userspace at all - it's true of any case where a
> kernel driver wants to attach a grouped device to its own unmanaged
> domain.

This is true for the dma api issue in isolation.

I think if we have a user someday it would make sense to add another
API DMA_OWNER_DRIVER_DOMAIN that captures how the dma API doesn't work
but DMA MMIO attacks are not possible.

> The fact that the VFIO kernel driver uses its unmanaged domains to map user
> pages upon user requests is merely a VFIO detail, and VFIO happens to be the
> only common case where unmanaged domains and non-singleton groups intersect.
> I'd say that, logically, if you want to put policy on mutual driver/usage
> compatibility anywhere it should be in iommu_attach_group().

It would make sense for iommu_attach_group() to require that the
DMA_OWNERSHIP is USER or DRIVER_DOMAIN.

That has a nice symmetry with iommu_attach_device() already requiring
that the group has a single device. For a driver to use these APIs it
must ensure security, one way or another.

That is a good idea, but requires understanding what tegra is
doing. Maybe tegra is that DMA_OWNER_DRIVER_DOMAIN user?

I wouldn't want to see iommu_attach_group() change the DMA_OWNERSHIP,
I think ownership is cleaner as a dedicated API. Adding a file * and
probably the enum to iommu_attach_group() feels weird.

We need the dedicated API for the dma_configure op, and keeping
ownership split from the current domain makes more sense with the
design in the iommufd RFC.

Thanks,
Jason
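
The rule suggested above, that attaching a group to a caller-supplied domain requires the right ownership but does not change it, can be sketched as follows. This is a userspace model under assumptions, not kernel code: `attach_group_model()` and both structs are hypothetical, and the DRIVER_DOMAIN owner exists only as a proposal in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum dma_owner_model {
	MODEL_OWNER_NONE,
	MODEL_OWNER_KERNEL,
	MODEL_OWNER_USER,
	MODEL_OWNER_DRIVER_DOMAIN,	/* only proposed in the thread */
};

struct domain_model { int id; };

struct group_state {
	enum dma_owner_model owner;
	const struct domain_model *domain;	/* NULL means the default domain */
};

/*
 * Attach only checks ownership; it never changes it. Ownership is
 * assumed to be set beforehand through a separate, dedicated call.
 */
static int attach_group_model(struct group_state *g,
			      const struct domain_model *d)
{
	assert(g != NULL);
	if (d == NULL)
		return -EINVAL;
	if (g->owner != MODEL_OWNER_USER &&
	    g->owner != MODEL_OWNER_DRIVER_DOMAIN)
		return -EPERM;	/* kernel-owned or unowned group */
	if (g->domain != NULL)
		return -EBUSY;	/* already on a non-default domain */
	g->domain = d;
	return 0;
}
```

Keeping the ownership check separate from the ownership change mirrors the split API argued for above.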
Robin Murphy Nov. 15, 2021, 5:54 p.m. UTC | #5
On 2021-11-15 16:17, Jason Gunthorpe wrote:
> On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:
> 
>>> If userspace has control of device A and can cause A to issue DMA to
>>> arbitrary DMA addresses then there are certain PCI topologies where A
>>> can now issue peer to peer DMA and manipulate the MMIO registers in
>>> device B.
>>>
>>> A kernel driver on device B is thus subjected to concurrent
>>> manipulation of the device registers from userspace.
>>>
>>> So, a 'safe' kernel driver is one that can tolerate this, and an
>>> 'unsafe' driver is one where userspace can break kernel integrity.
>>
>> You mean in the case where the kernel driver is trying to use device B in a
>> purely PIO mode, such that userspace might potentially be able to interfere
>> with data being transferred in and out of the kernel?
> 
> s/PIO/MMIO, but yes basically. And not just data transfer but
> userspace can interfere with the device state as well.

Sure, but unexpected changes in device state could happen for any number 
of reasons - uncorrected ECC error, surprise removal, etc. - so if that 
can affect "kernel integrity" I'm considering it an independent problem.

>> Perhaps it's not so clear to put that under a notion of "DMA
>> ownership", since device B's DMA is irrelevant and it's really much
>> more equivalent to /dev/mem access or mmaping BARs to userspace
>> while a driver is bound.
> 
> It is DMA ownership because device A's DMA is what is relevant
> here. device A's DMA compromises device B. So device A asserts it has
> USER ownership for DMA.
> 
> Any device in a group with USER ownership is incompatible with a
> kernel driver.

I can see the argument from that angle, but you can equally look at it 
another way and say that a device with kernel ownership is incompatible 
with a kernel driver, if userspace can call write() on 
"/sys/devices/B/resource0" such that device A's kernel driver DMAs all 
over it. Maybe that particular example lands firmly under "just don't do 
that", but I'd like to figure out where exactly we should draw the line 
between "DMA" and "ability to mess with a device".

>>> The second issue is DMA - because there is only one iommu_domain
>>> underlying many devices if we give that iommu_domain to userspace it
>>> means the kernel DMA API on other devices no longer works.
>>
>> Actually, the DMA API itself via iommu-dma will "work" just fine in the
>> sense that it will still successfully perform all its operations in the
>> unattached default domain, it's just that if the driver then programs the
>> device to access the returned DMA address, the device is likely to get a
>> nasty surprise.
> 
> A DMA API that returns a dma_addr_t that does not result in data
> transfer to the specified buffers is not working, in my book - it
> breaks the API contract.
> 
>>> So no kernel driver doing DMA can work at all, under any PCI topology,
>>> if userspace owns the IO page table.
>>
>> This isn't really about userspace at all - it's true of any case where a
>> kernel driver wants to attach a grouped device to its own unmanaged
>> domain.
> 
> This is true for the dma api issue in isolation.

No, it's definitely a general IOMMU-API-level thing; you could just as 
well have two drivers both trying to attach to their own unmanaged 
domains without DMA API involvement. What I think it boils down to is 
that if multiple devices in a group are bound (or want to bind) to 
different drivers, we want to enforce some kind of consensus between 
those drivers over IOMMU API usage.

> I think if we have a user someday it would make sense to add another
> API DMA_OWNER_DRIVER_DOMAIN that captures how the dma API doesn't work
> but DMA MMIO attacks are not possible.
> 
>> The fact that the VFIO kernel driver uses its unmanaged domains to map user
>> pages upon user requests is merely a VFIO detail, and VFIO happens to be the
>> only common case where unmanaged domains and non-singleton groups intersect.
>> I'd say that, logically, if you want to put policy on mutual driver/usage
>> compatibility anywhere it should be in iommu_attach_group().
> 
> It would make sense for iommu_attach_group() to require that the
> DMA_OWNERSHIP is USER or DRIVER_DOMAIN.
> 
> That has a nice symmetry with iommu_attach_device() already requiring
> that the group has a single device. For a driver to use these APIs it
> must ensure security, one way or another.

iommu_attach_device() is supposed to be deprecated and eventually going 
away; I wouldn't look at it too much.

> That is a good idea, but requires understanding what tegra is
> doing. Maybe tegra is that DMA_OWNER_DRIVER_DOMAIN user?
> 
> I wouldn't want to see iommu_attach_group() change the DMA_OWNERSHIP,
> I think ownership is cleaner as a dedicated API. Adding a file * and
> probably the enum to iommu_attach_group() feels weird.

Indeed I wasn't imagining it changing any ownership, just preventing a 
group from being attached to a non-default domain if it contains devices 
bound to different incompatible drivers. Basically just taking the 
existing check that VFIO tries to enforce and formalising it into the 
core API. It's not too far off what we already have around changing the 
default domain type, so there seems to be room for it to all fit 
together quite nicely.

There would still need to be separate enforcement elsewhere to prevent 
new drivers binding *after* a group *has* been attached to an unmanaged 
domain, but again it can still be in those simplest terms. Tying it in 
to userspace and FDs just muddies the water far more than necessary.

Robin.

> We need the dedicated API for the dma_configure op, and keeping
> ownership split from the current domain makes more sense with the
> design in the iommufd RFC.
> 
> Thanks,
> Jason
>
Christoph Hellwig Nov. 15, 2021, 6:19 p.m. UTC | #6
On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
> > s/PIO/MMIO, but yes basically. And not just data transfer but
> > userspace can interfere with the device state as well.
> 
> Sure, but unexpected changes in device state could happen for any number of
> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
> affect "kernel integrity" I'm considering it an independent problem.

Well, most DMA is triggered by the host requesting it through MMIO.
So having access to the BAR can turn many devices into somewhat
arbitrary DMA engines.

> I can see the argument from that angle, but you can equally look at it
> another way and say that a device with kernel ownership is incompatible with
> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
> such that device A's kernel driver DMAs all over it. Maybe that particular
> example lands firmly under "just don't do that", but I'd like to figure out
> where exactly we should draw the line between "DMA" and "ability to mess
> with a device".

Userspace writing to the resourceN files with a bound driver is a nice
recipe for trouble.  Do we really allow this currently?
Robin Murphy Nov. 15, 2021, 6:44 p.m. UTC | #7
On 2021-11-15 18:19, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
>>> s/PIO/MMIO, but yes basically. And not just data transfer but
>>> userspace can interfere with the device state as well.
>>
>> Sure, but unexpected changes in device state could happen for any number of
>> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
>> affect "kernel integrity" I'm considering it an independent problem.
> 
> Well, most DMA is triggered by the host requesting it through MMIO.
> So having access to the BAR can turn many devices into somewhat
> arbitrary DMA engines.

Yup, but as far as I understand we're talking about the situation where 
the overall group is already attached to the VFIO domain by virtue of 
device A, so any unsolicited DMA by device B could only be to 
userspace's own memory.

>> I can see the argument from that angle, but you can equally look at it
>> another way and say that a device with kernel ownership is incompatible with
>> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
>> such that device A's kernel driver DMAs all over it. Maybe that particular
>> example lands firmly under "just don't do that", but I'd like to figure out
>> where exactly we should draw the line between "DMA" and "ability to mess
>> with a device".
> 
> Userspace writing to the resourceN files with a bound driver is a nice
> recipe for trouble.  Do we really allow this currently?

No idea - I just want to make sure we don't get blinkered on VFIO at 
this point and consider the potential problem space fully :)

Robin.
Jason Gunthorpe Nov. 15, 2021, 7:22 p.m. UTC | #8
On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
> On 2021-11-15 16:17, Jason Gunthorpe wrote:
> > On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:
> > 
> > > > If userspace has control of device A and can cause A to issue DMA to
> > > > arbitrary DMA addresses then there are certain PCI topologies where A
> > > > can now issue peer to peer DMA and manipulate the MMIO registers in
> > > > device B.
> > > > 
> > > > A kernel driver on device B is thus subjected to concurrent
> > > > manipulation of the device registers from userspace.
> > > > 
> > > > So, a 'safe' kernel driver is one that can tolerate this, and an
> > > > 'unsafe' driver is one where userspace can break kernel integrity.
> > > 
> > > You mean in the case where the kernel driver is trying to use device B in a
> > > purely PIO mode, such that userspace might potentially be able to interfere
> > > with data being transferred in and out of the kernel?
> > 
> > > s/PIO/MMIO, but yes basically. And not just data transfer but
> > userspace can interfere with the device state as well.
> 
> Sure, but unexpected changes in device state could happen for any number of
> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
> affect "kernel integrity" I'm considering it an independent problem.

There is a big difference in my mind between a device/HW attacking the
kernel and userspace can attack the kernel. They are both valid cases,
and I know people are also working on the device/HW attacks the kernel
problem.

This series is only about user attacks kernel.

> > > Perhaps it's not so clear to put that under a notion of "DMA
> > > ownership", since device B's DMA is irrelevant and it's really much
> > > more equivalent to /dev/mem access or mmaping BARs to userspace
> > > while a driver is bound.
> > 
> > It is DMA ownership because device A's DMA is what is relevant
> > here. device A's DMA compromises device B. So device A asserts it has
> > USER ownership for DMA.
> > 
> > Any device in a group with USER ownership is incompatible with a
> > kernel driver.
> 
> I can see the argument from that angle, but you can equally look at it
> another way and say that a device with kernel ownership is incompatible with
> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
> such that device A's kernel driver DMAs all over it. Maybe that particular
> example lands firmly under "just don't do that", but I'd like to figure out
> where exactly we should draw the line between "DMA" and "ability to mess
> with a device".

The above scenarios are already blocked by the kernel with
LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
integrity, and these days they almost all have mitigation. I would
consider any kernel integrity violation to be a bug today if
LOCKDOWN_INTEGRITY_MAX is enabled.

I don't know why you bring this up?

> > That has a nice symmetry with iommu_attach_device() already requiring
> > that the group has a single device. For a driver to use these APIs it
> > must ensure security, one way or another.
> 
> iommu_attach_device() is supposed to be deprecated and eventually going
> away; I wouldn't look at it too much.

What is the preference then? This is the only working API today,
right?

> Indeed I wasn't imagining it changing any ownership, just preventing a group
> from being attached to a non-default domain if it contains devices bound to
> different incompatible drivers. 

So this could solve just the domain/DMA API problem, but it leaves the
MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
a layered way. 

This seems like half an idea, do you have a solution for the rest?

The concept of DMA USER is important here, and it is more than just
which domain is attached.

> Basically just taking the existing check that VFIO tries to enforce
> and formalising it into the core API. It's not too far off what we
> already have around changing the default domain type, so there seems
> to be room for it to all fit together quite nicely.

VFIO also has logic related to the file

> Tying it in to userspace and FDs just muddies the water far more
> than necessary.

It isn't muddying the water, it is providing common security code that
is easy to understand.

*Which* userspace FD/process owns the iommu_group is important
security information because we can't have process A do DMA attacks on
some other process B.

Before userspace can be allowed to touch the MMIO registers it must
ensure ownership. This is also why the split API makes sense.

Jason
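
The point that it matters *which* userspace owner holds the group can be illustrated with an opaque owner cookie. This is a hedged userspace sketch: `struct user_group_model`, `user_claim()` and `user_release()` are hypothetical names, and the cookie merely stands in for whatever identifies the owning file or process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy group that remembers which user context owns it. */
struct user_group_model {
	const void *owner_cookie;	/* NULL while no user owner is set */
	unsigned int users;		/* claims held with that cookie */
};

/* A second claim is accepted only when it presents the same cookie. */
static bool user_claim(struct user_group_model *g, const void *cookie)
{
	assert(cookie != NULL);
	if (g->users && g->owner_cookie != cookie)
		return false;	/* a different owner already holds the group */
	g->owner_cookie = cookie;
	g->users++;
	return true;
}

/* The cookie is forgotten once the last claim is dropped. */
static void user_release(struct user_group_model *g)
{
	assert(g->users > 0);
	if (--g->users == 0)
		g->owner_cookie = NULL;
}
```

With two distinct cookies standing for processes A and B, B's claim on a group already held by A is refused until A releases every claim.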
Bjorn Helgaas Nov. 15, 2021, 8:48 p.m. UTC | #9
On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe. 

Can you elaborate on what "permanently shared with userspace" means
here?  I assume it's only permanent as long as pci-stub is bound to
the device?

Also, a few words about what "it is safe" means here would be helpful.

> However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

The admin isn't going to read this sentence.  Should there be a doc
update related to this?  What sort of attack does this refer to?

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>  	.name		= "pci-stub",
>  	.id_table	= NULL,	/* only dynamic id's */
>  	.probe		= pci_stub_probe,
> +	.driver		= {
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
>
Robin Murphy Nov. 15, 2021, 8:58 p.m. UTC | #10
On 2021-11-15 19:22, Jason Gunthorpe wrote:
> On Mon, Nov 15, 2021 at 05:54:42PM +0000, Robin Murphy wrote:
>> On 2021-11-15 16:17, Jason Gunthorpe wrote:
>>> On Mon, Nov 15, 2021 at 03:14:49PM +0000, Robin Murphy wrote:
>>>
>>>>> If userspace has control of device A and can cause A to issue DMA to
>>>>> arbitrary DMA addresses then there are certain PCI topologies where A
>>>>> can now issue peer to peer DMA and manipulate the MMIO registers in
>>>>> device B.
>>>>>
>>>>> A kernel driver on device B is thus subjected to concurrent
>>>>> manipulation of the device registers from userspace.
>>>>>
>>>>> So, a 'safe' kernel driver is one that can tolerate this, and an
>>>>> 'unsafe' driver is one where userspace can break kernel integrity.
>>>>
>>>> You mean in the case where the kernel driver is trying to use device B in a
>>>> purely PIO mode, such that userspace might potentially be able to interfere
>>>> with data being transferred in and out of the kernel?
>>>
>>> s/PIO/MMIO, but yes basically. And not just data transfer but
>>> userspace can interfere with the device state as well.
>>
>> Sure, but unexpected changes in device state could happen for any number of
>> reasons - uncorrected ECC error, surprise removal, etc. - so if that can
>> affect "kernel integrity" I'm considering it an independent problem.
> 
> There is a big difference in my mind between a device/HW attacking the
> kernel and userspace can attack the kernel. They are both valid cases,
> and I know people are also working on the device/HW attacks the kernel
> problem.
> 
> This series is only about user attacks kernel.

Indeed, I was just commenting that if there's any actual attack surface 
for "user makes device go wrong" then that's really a whole other issue. 
I took "device state" to mean any state *other than* what could be used 
to observe and/or subvert the kernel's normal operation of the device. 
Say it's some kind of storage device with some global state bit that 
could be flipped to disable encryption on blocks being written such that 
the medium could be attacked offline later, that's still firmly in my 
"interfering with data transfer" category.

>>>> Perhaps it's not so clear to put that under a notion of "DMA
>>>> ownership", since device B's DMA is irrelevant and it's really much
>>>> more equivalent to /dev/mem access or mmaping BARs to userspace
>>>> while a driver is bound.
>>>
>>> It is DMA ownership because device A's DMA is what is relevant
>>> here. device A's DMA compromises device B. So device A asserts it has
>>> USER ownership for DMA.
>>>
>>> Any device in a group with USER ownership is incompatible with a
>>> kernel driver.
>>
>> I can see the argument from that angle, but you can equally look at it
>> another way and say that a device with kernel ownership is incompatible with
>> a kernel driver, if userspace can call write() on "/sys/devices/B/resource0"
>> such that device A's kernel driver DMAs all over it. Maybe that particular
>> example lands firmly under "just don't do that", but I'd like to figure out
>> where exactly we should draw the line between "DMA" and "ability to mess
>> with a device".
> 
> The above scenarios are already blocked by the kernel with
> LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> integrity, and these days they almost all have mitigation. I would
> consider any kernel integrity violation to be a bug today if
> LOCKDOWN_INTEGRITY_MAX is enabled.
> 
> I don't know why you bring this up?

Because as soon as anyone brings up "security" I'm not going to blindly 
accept the implicit assumption that VFIO is the only possible way to get 
one device to mess with another. That was just a silly example in the 
most basic terms, and obviously I don't expect well-worn generic sysfs 
interfaces to be a genuine threat, but how confident are you that no 
other subsystem- or driver-level interfaces past present and future can 
ever be tricked into p2p DMA?

>>> That has a nice symmetry with iommu_attach_device() already requiring
>>> that the group has a single device. For a driver to use these APIs it
>>> must ensure security, one way or another.
>>
>> iommu_attach_device() is supposed to be deprecated and eventually going
>> away; I wouldn't look at it too much.
> 
> What is the preference then? This is the only working API today,
> right?

I believe the intent was that everyone should move to 
iommu_group_get()/iommu_attach_group() - precisely *because* 
iommu_attach_device() can't work sensibly for multi-device groups.

>> Indeed I wasn't imagining it changing any ownership, just preventing a group
>> from being attached to a non-default domain if it contains devices bound to
>> different incompatible drivers.
> 
> So this could solve just the domain/DMA API problem, but it leaves the
> MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> a layered way.
> 
> This seems like half an idea, do you have a solution for the rest?

Tell me how the p2p DMA issue can manifest if device A is prohibited 
from attaching to VFIO's unmanaged domain while device B still has a 
driver bound, and thus would fail to be assigned to userspace in the 
first place. And conversely if non-VFIO drivers are still prevented from 
binding to device B while device A remains attached to the VFIO domain.

(Bonus: if so, also tell me how that wouldn't disprove your initial 
argument anyway)

> The concept of DMA USER is important here, and it is more than just
> which domain is attached.

Tell me how a device would be assigned to userspace while its group is 
still attached to a kernel-managed default domain.

As soon as anyone calls iommu_attach_group() - or indeed 
iommu_attach_device() if more devices may end up hotplugged into the 
same group later - *that's* when the door opens for potential subversion 
of any kind, without ever having to leave kernel space.

>> Basically just taking the existing check that VFIO tries to enforce
>> and formalising it into the core API. It's not too far off what we
>> already have around changing the default domain type, so there seems
>> to be room for it to all fit together quite nicely.
> 
> VFIO also has logic related to the file

Yes, because unsurprisingly VFIO code is tailored for the specific case 
of VFIO usage rather than anything more general.

>> Tying it in to userspace and FDs just muddies the water far more
>> than necessary.
> 
> It isn't muddying the water, it is providing common security code that
> is easy to understand.
> 
> *Which* userspace FD/process owns the iommu_group is important
> security information because we can't have process A do DMA attacks on
> some other process B.

Tell me how a single group could be attached to two domains representing 
two different process address spaces at once.

In case this concept wasn't as clear as I thought, which seems to be so:

                 | dev->iommu_group->domain | dev->driver
------------------------------------------------------------
DMA_OWNER_NONE   |          default         |   unbound
DMA_OWNER_KERNEL |          default         |    bound
DMA_OWNER_USER   |        non-default       |    bound

It's literally that simple. The information's already there. And in a 
more robust form to boot, given that, as before, "user" ownership may 
still exist entirely within kernel space.

Thanks,
Robin.
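
The table above amounts to a pure function of two existing pieces of state. The following userspace sketch spells that out. `infer_owner()` and the enum are hypothetical names, and the fourth enumerator covers the one combination (unbound device, non-default domain) that the table does not list, so nothing is guessed for it.

```c
#include <assert.h>
#include <stdbool.h>

enum owner_model {
	OWNER_MODEL_NONE,
	OWNER_MODEL_KERNEL,
	OWNER_MODEL_USER,
	OWNER_MODEL_UNLISTED,	/* combination absent from the table */
};

/* Derive the ownership column from the two state columns of the table. */
static enum owner_model infer_owner(bool default_domain, bool driver_bound)
{
	if (!driver_bound)
		return default_domain ? OWNER_MODEL_NONE : OWNER_MODEL_UNLISTED;
	return default_domain ? OWNER_MODEL_KERNEL : OWNER_MODEL_USER;
}

/* Checks the three rows exactly as they are written in the table. */
static void table_self_check(void)
{
	assert(infer_owner(true, false) == OWNER_MODEL_NONE);
	assert(infer_owner(true, true) == OWNER_MODEL_KERNEL);
	assert(infer_owner(false, true) == OWNER_MODEL_USER);
}
```

Whether this derived view suffices, or whether a separately stored owner is needed, is the point under dispute in the replies.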
Jason Gunthorpe Nov. 15, 2021, 9:19 p.m. UTC | #11
On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote:
> > The above scenarios are already blocked by the kernel with
> > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> > integrity, and these days they almost all have mitigation. I would
> > consider any kernel integrity violation to be a bug today if
> > LOCKDOWN_INTEGRITY_MAX is enabled.
> > 
> > I don't know why you bring this up?
> 
> Because as soon as anyone brings up "security" I'm not going to blindly
> accept the implicit assumption that VFIO is the only possible way to get one
> device to mess with another. That was just a silly example in the most basic
> terms, and obviously I don't expect well-worn generic sysfs interfaces to be
> a genuine threat, but how confident are you that no other subsystem- or
> driver-level interfaces, past, present, and future, can ever be tricked into p2p
> DMA?

Given the definition of LOCKDOWN_INTEGRITY_MAX I will consider any
past/present/future p2p attacks as definitive kernel bugs.

Generally, allowing a device to do arbitrary DMA to a userspace
controlled address is a pretty serious bug, and directly attacking the
kernel memory is a much more interesting and serious threat vector.

> > What is the preference then? This is the only working API today,
> > right?
> 
> I believe the intent was that everyone should move to
> iommu_group_get()/iommu_attach_group() - precisely *because*
> iommu_attach_device() can't work sensibly for multi-device groups.

And iommu_attach_group() can't work sensibly for anything except VFIO
today, so hum :)

> > > Indeed I wasn't imagining it changing any ownership, just preventing a group
> > > from being attached to a non-default domain if it contains devices bound to
> > > different incompatible drivers.
> > 
> > So this could solve just the domain/DMA API problem, but it leaves the
> > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> > a layered way.
> > 
> > This seems like half an idea, do you have a solution for the rest?
> 
> Tell me how the p2p DMA issue can manifest if device A is prohibited from
> attaching to VFIO's unmanaged domain while device B still has a driver
> bound, and thus would fail to be assigned to userspace in the first place.
> And conversely if non-VFIO drivers are still prevented from binding to
> device B while device A remains attached to the VFIO domain.

You've assumed that a group is continuously attached to the same
domain during the entire period that userspace has MMIO.

Any domain detach creates a race where a kernel driver can jump in
and bind, while user space continues to have MMIO control over a
device. That violates the security invariant.

Many new flows, like PASID support, are requiring dynamically changing
the domain bound to a group.

If you want to go in this direction then we also need to have some
kind of compare and swap operation for the domain currently bound to a
group.
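
As a rough illustration of what such a compare-and-swap could look like, here
is a userspace sketch using C11 atomics. struct sketch_group and
sketch_group_swap_domain() are hypothetical names, not existing IOMMU API,
and an in-kernel version could equally be built on a lock instead of an
atomic pointer:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's iommu_domain/iommu_group. */
struct sketch_domain { int id; };

struct sketch_group {
	_Atomic(struct sketch_domain *) domain;	/* currently attached domain */
};

/*
 * Replace the attached domain only if the group is still attached to
 * `expected`. Returns false, leaving the group untouched, if some other
 * path changed the attachment in the meantime.
 */
static bool sketch_group_swap_domain(struct sketch_group *g,
				     struct sketch_domain *expected,
				     struct sketch_domain *new_domain)
{
	return atomic_compare_exchange_strong(&g->domain, &expected,
					      new_domain);
}
```

The group moves directly from one non-default domain to another, so it is
never observed without a domain attached in between.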

From a security perspective I dislike this idea a lot. Instead of
having nice clear barriers indicating the security domain we have a
very subtle 'so long as a domain is attached' idea, which is going to
get broken.

> > The concept of DMA USER is important here, and it is more than just
> > which domain is attached.
> 
> Tell me how a device would be assigned to userspace while its group is still
> attached to a kernel-managed default domain.
> 
> As soon as anyone calls iommu_attach_group() - or indeed
> iommu_attach_device() if more devices may end up hotplugged into the same
> group later - *that's* when the door opens for potential subversion of any
> kind, without ever having to leave kernel space.

The approach in this series can solve both, attach_device could switch
the device to user mode and it will block future hot plugged kernel
drivers.

> > VFIO also has logic related to the file
> 
> Yes, because unsurprisingly VFIO code is tailored for the specific case of
> VFIO usage rather than anything more general.

VFIO represents the class of users that expose the IOMMU to userspace;
I would say it is general for that class.

> > It isn't muddying the water, it is providing common security code that
> > is easy to understand.
> > 
> > *Which* userspace FD/process owns the iommu_group is important
> > security information because we can't have process A do DMA attacks on
> > some other process B.
> 
> Tell me how a single group could be attached to two domains representing two
> different process address spaces at once.

Again you are focused on domains and ignoring MMIO.

Requiring the users of the API to continuously assert a non-default
domain is a non-trivial ask.

> In case this concept wasn't as clear as I thought, which seems to be so:
>
>                  | dev->iommu_group->domain | dev->driver
> DMA_OWNER_NONE   |          default         |   unbound
> DMA_OWNER_KERNEL |          default         |    bound
> DMA_OWNER_USER   |        non-default       |    bound

Unfortunately this can't use dev->driver. Reading dev->driver of every
device in the group requires holding the device_lock. really_probe()
already holds the device lock so this becomes a user triggerable ABBA
deadlock when scaled up to N devices.

This is why this series uses only the group mutex and tracks if
drivers are bound inside the group. I view that as unavoidable.
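
A minimal userspace sketch of that idea, assuming a pthread mutex in place of
the group mutex. The struct layout and the sketch_group_claim() and
sketch_group_release() helpers are hypothetical, and the real series also
tracks which user owns the group, which is omitted here:

```c
#include <errno.h>
#include <pthread.h>

enum dma_owner { DMA_OWNER_NONE, DMA_OWNER_KERNEL, DMA_OWNER_USER };

/* Hypothetical stand-in: ownership lives in the group, under its mutex. */
struct sketch_group {
	pthread_mutex_t mutex;
	enum dma_owner owner;
	unsigned int owner_cnt;	/* number of current claims */
};

/* Claim the group for `owner`; fails if it is held with another type. */
static int sketch_group_claim(struct sketch_group *g, enum dma_owner owner)
{
	int ret = 0;

	pthread_mutex_lock(&g->mutex);
	if (g->owner_cnt && g->owner != owner) {
		ret = -EBUSY;
	} else {
		g->owner = owner;
		g->owner_cnt++;
	}
	pthread_mutex_unlock(&g->mutex);
	return ret;
}

/* Drop one claim; the group returns to DMA_OWNER_NONE after the last. */
static void sketch_group_release(struct sketch_group *g)
{
	pthread_mutex_lock(&g->mutex);
	if (g->owner_cnt && --g->owner_cnt == 0)
		g->owner = DMA_OWNER_NONE;
	pthread_mutex_unlock(&g->mutex);
}
```

Only the one group lock is taken, so no per-device lock has to be acquired
while walking the group.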

Jason
Bjorn Helgaas Nov. 15, 2021, 10:17 p.m. UTC | #12
On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe. However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

This commit log doesn't say what the patch does.  I think it tells us
something about what pci-stub *already* does ("allows admin to block
driver binding") and something about why that is safe ("does not do
DMA").

But it doesn't say what this patch changes.  Based on the subject
line, I expected something like:

  As of ("<commit subject>"), <some function>() marks the iommu_group
  as containing only devices with kernel drivers that manage DMA.

  Avoid this default behavior for pci-stub because it does not program
  any DMA itself.  This allows <some desirable behavior>.

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>  	.name		= "pci-stub",
>  	.id_table	= NULL,	/* only dynamic id's */
>  	.probe		= pci_stub_probe,
> +	.driver		= {
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
>
Baolu Lu Nov. 16, 2021, 6:05 a.m. UTC | #13
Hi Bjorn,

On 11/16/21 6:17 AM, Bjorn Helgaas wrote:
> On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
>> pci_stub allows the admin to block driver binding on a device and make
>> it permanently shared with userspace. Since pci_stub does not do DMA,
>> it is safe. However the admin must understand that using pci_stub allows
>> userspace to attack whatever device it was bound to.
> This commit log doesn't say what the patch does.  I think it tells us
> something about what pci-stub *already* does ("allows admin to block
> driver binding") and something about why that is safe ("does not do
> DMA").

Yes, you are right. This patch preserves pci_stub's existing use case
("allows admin to block driver binding") after the viability check moves
from the vfio layer to the iommu layer (done by this series).

About "safe" (should not be part of this description), there are two
sides from my understanding:

#1) The pci_stub driver itself doesn't make the device do any DMA,
     so it won't interfere with user space through device DMA.

#2) The pci_stub driver doesn't access the PCI BARs and doesn't build any
     device driver state around any value in the BARs. So other devices
     in the same iommu group (assigned to user space) have no means to
     corrupt kernel driver state via p2p access.
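
To illustrate the intended effect of the flag, here is a simplified sketch of
a bind-time check. Only the suppress_auto_claim_dma_owner field name comes
from the patch; struct sketch_driver, struct sketch_group and sketch_probe()
are hypothetical, and the driver core code in the series may look quite
different:

```c
#include <errno.h>
#include <stdbool.h>

enum dma_owner { DMA_OWNER_NONE, DMA_OWNER_KERNEL, DMA_OWNER_USER };

/* Hypothetical stand-ins for struct device_driver and the iommu group. */
struct sketch_driver {
	const char *name;
	bool suppress_auto_claim_dma_owner;
};

struct sketch_group {
	enum dma_owner owner;
	unsigned int owner_cnt;
};

/* Returns 0 if the driver may bind, -EBUSY if the group is user-owned. */
static int sketch_probe(struct sketch_group *g,
			const struct sketch_driver *drv)
{
	/* Drivers such as pci-stub bind without claiming kernel ownership. */
	if (drv->suppress_auto_claim_dma_owner)
		return 0;

	/* An ordinary kernel driver must not bind into a user-owned group. */
	if (g->owner == DMA_OWNER_USER)
		return -EBUSY;

	g->owner = DMA_OWNER_KERNEL;
	g->owner_cnt++;
	return 0;
}
```

With the flag set, binding pci-stub leaves the group's ownership state
untouched, so the other devices in the group can still be assigned to user
space.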
> 
> But it doesn't say what this patch changes.  Based on the subject
> line, I expected something like:
> 
>    As of ("<commit subject>"), <some function>() marks the iommu_group
>    as containing only devices with kernel drivers that manage DMA.
> 
>    Avoid this default behavior for pci-stub because it does not program
>    any DMA itself.  This allows <some desirable behavior>.
> 

Sure. I will rephrase the description as suggested above.

Best regards,
baolu

Patch

diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
index e408099fea52..6324c68602b4 100644
--- a/drivers/pci/pci-stub.c
+++ b/drivers/pci/pci-stub.c
@@ -36,6 +36,9 @@  static struct pci_driver stub_driver = {
 	.name		= "pci-stub",
 	.id_table	= NULL,	/* only dynamic id's */
 	.probe		= pci_stub_probe,
+	.driver		= {
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static int __init pci_stub_init(void)