
vfio/type1: Limit DMA mappings per container

Message ID 155414977872.12780.13728555131525362206.stgit@gimli.home (mailing list archive)
State New, archived
Series vfio/type1: Limit DMA mappings per container

Commit Message

Alex Williamson April 1, 2019, 8:16 p.m. UTC
Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory.  This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated with the process for the purpose of OOM-killing a task.

To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).

This fixes CVE-2019-3882.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)
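For a concrete picture of how the new bound surfaces to userspace, the
following rough sketch (not part of the patch) maps the same page at
successive IOVAs through VFIO_IOMMU_MAP_DMA until the kernel refuses.  The
container fd is assumed to be already configured (group attached,
VFIO_SET_IOMMU done), and with a default RLIMIT_MEMLOCK the locked-memory
accounting described above would return -ENOMEM long before the new
-ENOSPC limit is reached; MMIO-backed mappings are the case the new limit
really targets.

/*
 * Illustrative sketch only, not part of the patch.  "container" is
 * assumed to be a VFIO container fd that userspace has already set up.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static unsigned long count_dma_entries(int container)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (unsigned long)buf,
		.size  = page,
	};
	unsigned long n;

	if (buf == MAP_FAILED)
		return 0;

	/* Map the same page at successive IOVAs until the kernel refuses. */
	for (n = 0; ; n++) {
		map.iova = n * page;
		if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map)) {
			/* With the patch, ENOSPC marks the dma_entry_limit. */
			fprintf(stderr, "stopped after %lu mappings: %s\n",
				n, strerror(errno));
			break;
		}
	}
	return n;
}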

Comments

Peter Xu April 2, 2019, 2:41 a.m. UTC | #1
On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:

[...]

> @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  		goto out_unlock;
>  	}
>  
> +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> +		ret = -ENOSPC;
> +		goto out_unlock;
> +	}
> +
>  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>  	if (!dma) {
> +		atomic_inc(&iommu->dma_avail);

This should be the only special path that needs to revert the change.
Not sure whether this can be avoided by simply using atomic_read() or
even READ_ONCE() (I feel like we don't need atomic ops for dma_avail
because we already hold the mutex, but of course it doesn't hurt...)
to replace the atomic_add_unless() above with a check against zero,
and then do the +1/-1 in vfio_[un]link_dma() only.  But AFAICT this
patch is correct.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,
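A rough sketch of the alternative Peter outlines above, assuming
dma_avail becomes a plain counter protected by iommu->lock rather than
an atomic_t (illustrative only, not the code that was posted or merged):

/*
 * Sketch only: with dma_avail changed from an atomic_t to a plain
 * "unsigned int" in struct vfio_iommu, the +1/-1 can live in
 * vfio_[un]link_dma(), which run with iommu->lock held on the map and
 * unmap paths.
 */
static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
{
	/* existing rb-tree insertion elided */
	iommu->dma_avail--;		/* caller holds iommu->lock */
}

static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
{
	/* existing rb-tree removal elided */
	iommu->dma_avail++;		/* caller holds iommu->lock */
}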
Alex Williamson April 2, 2019, 4:34 a.m. UTC | #2
On Tue, 2 Apr 2019 10:41:15 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> 
> [...]
> 
> > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >  		goto out_unlock;
> >  	}
> >  
> > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > +		ret = -ENOSPC;
> > +		goto out_unlock;
> > +	}
> > +
> >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> >  	if (!dma) {
> > +		atomic_inc(&iommu->dma_avail);  
> 
> This should be the only special path that needs to revert the change.
> Not sure whether this can be avoided by simply using atomic_read() or
> even READ_ONCE() (I feel like we don't need atomic ops for dma_avail
> because we already hold the mutex, but of course it doesn't hurt...)
> to replace the atomic_add_unless() above with a check against zero,
> and then do the +1/-1 in vfio_[un]link_dma() only.  But AFAICT this
> patch is correct.

Thanks for the review, you're right, we're only twiddling this atomic
while holding the iommu->lock mutex, so it appears unnecessary.  Since
we're within the mutex, I think we don't even need a READ_ONCE.  We can
simply test it before alloc and decrement after.  Am I missing something
that would specifically require READ_ONCE within our mutex critical
section?  Thanks,

Alex
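Under that same plain-counter assumption, the corresponding check in
vfio_dma_do_map() could look like the sketch below (illustrative, not the
posted hunk); with the decrement living in vfio_link_dma(), the kzalloc()
failure path no longer needs an undo:

	/* iommu->lock is already held at this point in vfio_dma_do_map() */
	if (!iommu->dma_avail) {
		ret = -ENOSPC;
		goto out_unlock;
	}

	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
	if (!dma) {
		ret = -ENOMEM;	/* counter untouched, nothing to undo here */
		goto out_unlock;
	}
	/* dma_avail is decremented later, when vfio_link_dma() inserts this entry */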
Peter Xu April 2, 2019, 5:18 a.m. UTC | #3
On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote:
> On Tue, 2 Apr 2019 10:41:15 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> > 
> > [...]
> > 
> > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > >  		goto out_unlock;
> > >  	}
> > >  
> > > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > > +		ret = -ENOSPC;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> > >  	if (!dma) {
> > > +		atomic_inc(&iommu->dma_avail);  
> > 
> > This should be the only special path that needs to revert the change.
> > Not sure whether this can be avoided by simply using atomic_read() or
> > even READ_ONCE() (I feel like we don't need atomic ops for dma_avail
> > because we already hold the mutex, but of course it doesn't hurt...)
> > to replace the atomic_add_unless() above with a check against zero,
> > and then do the +1/-1 in vfio_[un]link_dma() only.  But AFAICT this
> > patch is correct.
> 
> Thanks for the review, you're right, we're only twiddling this atomic
> while holding the iommu->lock mutex, so it appears unnecessary.  Since
> we're within the mutex, I think we don't even need a READ_ONCE.  We can
> simply test it before alloc and decrement after.  Am I missing something
> that would specifically require READ_ONCE within our mutex critical
> section?  Thanks,

I'm not very clear on this and I'd be glad to learn more about it.
My understanding is that [READ|WRITE]_ONCE() is the same as a volatile
memory access and makes sure we don't keep the variable in a register.
So if the mutex semantics can guarantee that (say, a "*addr = val"
followed by a mutex_unlock() makes sure "val" definitely lands in the
memory at "addr"), then I do think it's fine even without it (which
would correspond to WRITE_ONCE(*addr, val) in this case).

Thanks,
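As a minimal illustration of the ordering question (hypothetical helper
names, assuming the plain-counter variant discussed above): data that is
only touched while holding the same mutex needs no
READ_ONCE()/WRITE_ONCE(), since mutex_unlock() has release semantics and
the next mutex_lock() acquires, so a plain store is visible to the next
lock holder:

/* Hypothetical helpers, for illustration only. */
static void consume_entry(struct vfio_iommu *iommu)
{
	mutex_lock(&iommu->lock);
	iommu->dma_avail--;		/* plain store under the lock */
	mutex_unlock(&iommu->lock);	/* release: publishes the store */
}

static void release_entry(struct vfio_iommu *iommu)
{
	mutex_lock(&iommu->lock);	/* acquire: sees the store above */
	iommu->dma_avail++;
	mutex_unlock(&iommu->lock);
}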
Alex Williamson April 2, 2019, 2:40 p.m. UTC | #4
On Tue, 2 Apr 2019 13:18:02 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote:
> > On Tue, 2 Apr 2019 10:41:15 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> > > 
> > > [...]
> > >   
> > > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > > >  		goto out_unlock;
> > > >  	}
> > > >  
> > > > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > > > +		ret = -ENOSPC;
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> > > >  	if (!dma) {
> > > > +		atomic_inc(&iommu->dma_avail);    
> > > 
> > > This should be the only special path that needs to revert the change.
> > > Not sure whether this can be avoided by simply using atomic_read() or
> > > even READ_ONCE() (I feel like we don't need atomic ops for dma_avail
> > > because we already hold the mutex, but of course it doesn't hurt...)
> > > to replace the atomic_add_unless() above with a check against zero,
> > > and then do the +1/-1 in vfio_[un]link_dma() only.  But AFAICT this
> > > patch is correct.
> > 
> > Thanks for the review, you're right, we're only twiddling this atomic
> > while holding the iommu->lock mutex, so it appears unnecessary.  Since
> > we're within the mutex, I think we don't even need a READ_ONCE.  We can
> > simply test it before alloc and decrement after.  Am I missing something
> > that would specifically require READ_ONCE within our mutex critical
> > section?  Thanks,  
> 
> I'm not very clear on this and I'd be glad to learn more about it.
> My understanding is that [READ|WRITE]_ONCE() is the same as a volatile
> memory access and makes sure we don't keep the variable in a register.
> So if the mutex semantics can guarantee that (say, a "*addr = val"
> followed by a mutex_unlock() makes sure "val" definitely lands in the
> memory at "addr"), then I do think it's fine even without it (which
> would correspond to WRITE_ONCE(*addr, val) in this case).

The READ_ONCE()/WRITE_ONCE() macros are about compiler-level guarantees,
but here we have the mutex protecting concurrent access to the data.  I
don't see that there's anything special about a counter on the iommu
object that needs more attention than any other element that might get
modified in these critical sections.  Thanks,

Alex
Cornelia Huck April 2, 2019, 2:58 p.m. UTC | #5
On Mon, 01 Apr 2019 14:16:52 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> Memory backed DMA mappings are accounted against a user's locked
> memory limit, including multiple mappings of the same memory.  This
> accounting bounds the number of such mappings that a user can create.
> However, DMA mappings that are not backed by memory, such as DMA
> mappings of device MMIO via mmaps, do not make use of page pinning
> and therefore do not count against the user's locked memory limit.
> These mappings still consume memory, but the memory is not well
> associated with the process for the purpose of OOM-killing a task.
> 
> To add bounding on this use case, we introduce a limit to the total
> number of concurrent DMA mappings that a user is allowed to create.
> This limit is exposed as a tunable module option where the default
> value of 64K is expected to be well in excess of any reasonable use
> case (a large virtual machine configuration would typically only make
> use of tens of concurrent mappings).
> 
> This fixes CVE-2019-3882.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

Patch

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 73652e21efec..7fc8fd7d4dc7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -58,12 +58,18 @@  module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+static unsigned int dma_entry_limit __read_mostly = U16_MAX;
+module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
+MODULE_PARM_DESC(dma_entry_limit,
+		 "Maximum number of user DMA mappings per container (65535).");
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	struct blocking_notifier_head notifier;
+	atomic_t		dma_avail;
 	bool			v2;
 	bool			nesting;
 };
@@ -836,6 +842,7 @@  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
 	kfree(dma);
+	atomic_inc(&iommu->dma_avail);
 }
 
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
@@ -1081,8 +1088,14 @@  static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		goto out_unlock;
 	}
 
+	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
+		ret = -ENOSPC;
+		goto out_unlock;
+	}
+
 	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
 	if (!dma) {
+		atomic_inc(&iommu->dma_avail);
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
@@ -1583,6 +1596,7 @@  static void *vfio_iommu_type1_open(unsigned long arg)
 
 	INIT_LIST_HEAD(&iommu->domain_list);
 	iommu->dma_list = RB_ROOT;
+	atomic_set(&iommu->dma_avail, dma_entry_limit);
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);