
[RFCv2,01/24] iommu: Add RCU-protected page free support

Message ID 20230518204650.14541-2-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Series IOMMUFD Dirty Tracking

Commit Message

Joao Martins May 18, 2023, 8:46 p.m. UTC
From: Lu Baolu <baolu.lu@linux.intel.com>

The IOMMU page tables are updated using the iommu_map/unmap()
interfaces. Currently, there is no mandatory requirement for drivers to
use locks to ensure concurrent updates to page tables, because it is
assumed that overlapping IOVA ranges do not have concurrent updates.
Therefore, IOMMU drivers only need to take care of concurrent updates
to the same level of page table entries.

But enabling new features challenges this assumption. For example, the
hardware-assisted dirty page tracking feature requires scanning page
tables through interfaces other than mapping and unmapping. This might
result in a use-after-free scenario in which one level of the page
table has been freed by the unmap() interface while another thread is
scanning the next-level page table.

This adds RCU-protected page free support so that the pages are only
freed and reused after an RCU grace period. Hence, the page tables are
safe to scan within an rcu_read_lock() critical section. Considering
that scanning the page table is a rare case, this also adds a domain
flag, and the RCU-protected page free path is only used when this flag
is set.
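
For illustration, here is a minimal sketch of how a driver could opt a
domain into this mode and route retired page-table pages through the
new helper. domain_set_concurrent_traversal() and
iommu_free_pgtbl_pages() are the interfaces added by this patch; the
example_* functions and the freelist handling are hypothetical.

#include <linux/iommu.h>
#include <linux/list.h>

static void example_enable_dirty_tracking(struct iommu_domain *domain)
{
	/*
	 * Page tables of this domain may now be walked concurrently,
	 * so retired table pages must survive an RCU grace period
	 * before being reused.
	 */
	domain_set_concurrent_traversal(domain, true);
}

static void example_unmap_cleanup(struct iommu_domain *domain,
				  struct list_head *freelist)
{
	/*
	 * Instead of calling __free_pages() directly in the unmap
	 * path, the driver collects retired page-table pages on
	 * @freelist and lets the core decide: immediate free, or
	 * call_rcu() when concurrent_traversal is set.
	 */
	iommu_free_pgtbl_pages(domain, freelist);
}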

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 23 +++++++++++++++++++++++
 include/linux/iommu.h | 10 ++++++++++
 2 files changed, 33 insertions(+)

Comments

Jason Gunthorpe May 19, 2023, 1:32 p.m. UTC | #1
On Thu, May 18, 2023 at 09:46:27PM +0100, Joao Martins wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> The IOMMU page tables are updated using the iommu_map/unmap()
> interfaces. Currently, there is no mandatory requirement for drivers to
> use locks to ensure concurrent updates to page tables, because it is
> assumed that overlapping IOVA ranges do not have concurrent updates.
> Therefore, IOMMU drivers only need to take care of concurrent updates
> to the same level of page table entries.
> 
> But enabling new features challenges this assumption. For example, the
> hardware-assisted dirty page tracking feature requires scanning page
> tables through interfaces other than mapping and unmapping. This might
> result in a use-after-free scenario in which one level of the page
> table has been freed by the unmap() interface while another thread is
> scanning the next-level page table.

I'm not convinced. The basic model we have is that the caller has to
bring the range locking, and the caller has to promise it doesn't do
overlapping operations on ranges.

iommufd implements this with area based IOVA range locking.

So, I don't really see an obvious reason why we can't also require
that the dirty reporting hold the area and domain locks while it is
calling into the iommu driver?

Then we don't have a locking or RCU problem here.

Jason
Joao Martins May 19, 2023, 4:48 p.m. UTC | #2
On 19/05/2023 14:32, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 09:46:27PM +0100, Joao Martins wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> The IOMMU page tables are updated using the iommu_map/unmap()
>> interfaces. Currently, there is no mandatory requirement for drivers to
>> use locks to ensure concurrent updates to page tables, because it is
>> assumed that overlapping IOVA ranges do not have concurrent updates.
>> Therefore, IOMMU drivers only need to take care of concurrent updates
>> to the same level of page table entries.
>>
>> But enabling new features challenges this assumption. For example, the
>> hardware-assisted dirty page tracking feature requires scanning page
>> tables through interfaces other than mapping and unmapping. This might
>> result in a use-after-free scenario in which one level of the page
>> table has been freed by the unmap() interface while another thread is
>> scanning the next-level page table.
> 
> I'm not convinced.. The basic model we have is that the caller has to
> bring the range locking and the caller has to promise it doesn't do
> overlapping things to ranges.
> 
> iommufd implements this with area based IOVA range locking.
> 
Right

> So, I don't really see an obvious reason why we can't also require
> that the dirty reporting hold the area lock and domain locks while it
> is calling the iommu driver?
> 
> Then we don't have a locking or RCU problem here.
> 
I would rather keep this based on area range locking -- I think I got
confused by the discussion in the other thread about using RCU for the
IOPTE walks. I'll remove this from the series for now; if it is later
deemed necessary, it can come back as a separate change.
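
To make the area-locking alternative concrete, here is a rough sketch
of what holding the range locks across the driver call could look like.
All names (example_iopt, example_area, the locks, and the driver hook)
are hypothetical stand-ins loosely modeled on iommufd's area-based IOVA
range locking, not actual iommufd API.

#include <linux/iommu.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>

/* Hypothetical stand-ins for iommufd's area-based IOVA tracking. */
struct example_iopt {
	struct rw_semaphore rwsem;	/* protects the set of areas */
};

struct example_area {
	struct example_iopt *iopt;
	struct mutex pages_mutex;	/* serializes map/unmap on this area */
	unsigned long iova;
	size_t length;
};

/* Hypothetical driver hook that scans the page tables for dirty bits. */
int example_driver_read_and_clear_dirty(struct iommu_domain *domain,
					unsigned long iova, size_t length);

static int example_read_dirty(struct example_area *area,
			      struct iommu_domain *domain)
{
	int ret;

	/* Pin the IOVA range so the area cannot go away under us. */
	down_read(&area->iopt->rwsem);
	/*
	 * Exclude concurrent map/unmap on this range, so no page-table
	 * level can be freed underneath the walk below -- no RCU needed.
	 */
	mutex_lock(&area->pages_mutex);

	ret = example_driver_read_and_clear_dirty(domain, area->iova,
						  area->length);

	mutex_unlock(&area->pages_mutex);
	up_read(&area->iopt->rwsem);
	return ret;
}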

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 91573efd9488..2088caae5074 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3432,3 +3432,26 @@  struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 
 	return domain;
 }
+
+static void pgtble_page_free_rcu(struct rcu_head *rcu)
+{
+	struct page *page = container_of(rcu, struct page, rcu_head);
+
+	__free_pages(page, 0);
+}
+
+void iommu_free_pgtbl_pages(struct iommu_domain *domain,
+			    struct list_head *pages)
+{
+	struct page *page, *next;
+
+	if (!domain->concurrent_traversal) {
+		put_pages_list(pages);
+		return;
+	}
+
+	list_for_each_entry_safe(page, next, pages, lru) {
+		list_del(&page->lru);
+		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
+	}
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e8c9a7da1060..39d25645a5ab 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -110,6 +110,7 @@  struct iommu_domain {
 			int users;
 		};
 	};
+	unsigned long concurrent_traversal:1;
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -697,6 +698,12 @@  static inline void dev_iommu_priv_set(struct device *dev, void *priv)
 	dev->iommu->priv = priv;
 }
 
+static inline void domain_set_concurrent_traversal(struct iommu_domain *domain,
+						   bool value)
+{
+	domain->concurrent_traversal = value;
+}
+
 int iommu_probe_device(struct device *dev);
 
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
@@ -721,6 +728,9 @@  void iommu_detach_device_pasid(struct iommu_domain *domain,
 struct iommu_domain *
 iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
 			       unsigned int type);
+
+void iommu_free_pgtbl_pages(struct iommu_domain *domain,
+			    struct list_head *pages);
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};