[v2] xen/gntdev: Avoid blocking in unmap_grant_pages()

Message ID 20220525184153.6059-1-demi@invisiblethingslab.com (mailing list archive)
State Superseded
Series [v2] xen/gntdev: Avoid blocking in unmap_grant_pages()

Commit Message

Demi Marie Obenour May 25, 2022, 6:41 p.m. UTC
unmap_grant_pages() currently waits for the pages to no longer be used.
In https://github.com/QubesOS/qubes-issues/issues/7481, this led to a
deadlock against i915: i915 was waiting for gntdev's MMU notifier to
finish, while gntdev was waiting for i915 to free its pages.  I also
believe this is responsible for various deadlocks I have experienced in
the past.

Avoid these problems by making unmap_grant_pages() asynchronous.  This
requires making it return void, as any errors will not be available when
the function returns.  Fortunately, the only use of the return value is a
WARN_ON(), which can be replaced by a WARN_ON() at the point where the
error is detected.  Additionally, a failed call will no longer prevent
further calls from being made, but this is harmless.

Because unmap_grant_pages() is now async, the grant handle will be set to
INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same
handle.  Instead, a separate bool array is allocated for this purpose.
This wastes memory, but stuffing this information in padding bytes is
too fragile.  Furthermore, it is necessary to grab a reference to the
map before making the asynchronous call, and release the reference when
the call returns.

Fixes: 745282256c75 ("xen/gntdev: safely unmap grants in case they are still in use")
Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 drivers/xen/gntdev-common.h |   5 ++
 drivers/xen/gntdev.c        | 100 +++++++++++++++++++-----------------
 2 files changed, 59 insertions(+), 46 deletions(-)
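
The heart of the change, condensed from the full patch at the bottom of this
page (a readability paraphrase, not a verbatim excerpt of the diff):

/* Before: the notifier path blocked until every other user of the pages
 * (e.g. i915) had let go of them, which is what allowed the deadlock. */
static int __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
                               int pages)
{
        struct gntab_unmap_queue_data unmap_data;

        /* ... fill in unmap_ops/kunmap_ops/pages/count for the range ... */
        return gnttab_unmap_refs_sync(&unmap_data);  /* may sleep indefinitely */
}

/* After: the unmap is merely queued here; status checks, handle
 * invalidation and the WARN_ON() move into the completion callback. */
static void __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
                                int pages)
{
        /* ... fill in map->unmap_data for the range ... */
        map->unmap_data.done = __unmap_grant_pages_done;
        map->unmap_data.data = map;
        refcount_inc(&map->users);  /* dropped in __unmap_grant_pages_done() */
        gnttab_unmap_refs_async(&map->unmap_data);
}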

Comments

Jürgen Groß May 30, 2022, 6:41 a.m. UTC | #1
On 25.05.22 20:41, Demi Marie Obenour wrote:
> unmap_grant_pages() currently waits for the pages to no longer be used.
> In https://github.com/QubesOS/qubes-issues/issues/7481, this led to a
> deadlock against i915: i915 was waiting for gntdev's MMU notifier to
> finish, while gntdev was waiting for i915 to free its pages.  I also
> believe this is responsible for various deadlocks I have experienced in
> the past.
> 
> Avoid these problems by making unmap_grant_pages async.  This requires
> making it return void, as any errors will not be available when the
> function returns.  Fortunately, the only use of the return value is a
> WARN_ON(), which can be replaced by a WARN_ON when the error is
> detected.  Additionally, a failed call will not prevent further calls
> from being made, but this is harmless.
> 
> Because unmap_grant_pages is now async, the grant handle will be set to
> INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same
> handle.  Instead, a separate bool array is allocated for this purpose.
> This wastes memory, but stuffing this information in padding bytes is
> too fragile.  Furthermore, it is necessary to grab a reference to the
> map before making the asynchronous call, and release the reference when
> the call returns.

I think there is even more syncing needed:

- In the error path of gntdev_mmap(), unmap_grant_pages() is being called and
   it is assumed that map is available again afterwards. This should be rather
   easy to avoid by adding a counter of active mappings to struct
   gntdev_grant_map (the number of pages not yet unmapped). In case this
   counter is not zero, gntdev_mmap() should bail out early.

- gntdev_put_map() is calling unmap_grant_pages() in case the refcount has
   dropped to zero. This call can set the refcount to 1 again, so another delay
   is needed before freeing map. I think unmap_grant_pages() should return
   early in case the count of mapped pages is zero (see above), thus avoiding
   incrementing the refcount of map if nothing is to be done. This would enable
   gntdev_put_map() to just return after the call to unmap_grant_pages() in case
   the refcount has been incremented again.


Juergen
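
A rough sketch of the active-mappings counter suggested above; the field name
live_grants, the error code, and the exact placement are illustrative guesses,
not part of any posted patch:

/* Illustrative sketch only: count pages that are mapped and not yet
 * unmapped; the async completion callback would decrement it per batch. */
struct gntdev_grant_map {
        /* ... existing fields ... */
        atomic_t live_grants;
};

static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
{
        /* ... after looking up map ... */

        /* An earlier (now asynchronous) unmap may still be in flight:
         * bail out early instead of reusing the map. */
        if (atomic_read(&map->live_grants) != 0)
                return -EAGAIN;

        /* ... existing mapping logic ... */
}

static void unmap_grant_pages(struct gntdev_grant_map *map, int offset,
                              int pages)
{
        /* Nothing mapped: return without touching map's refcount, so
         * gntdev_put_map() may free map right after this call returns. */
        if (atomic_read(&map->live_grants) == 0)
                return;

        /* ... existing unmap logic ... */
}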
Demi Marie Obenour May 30, 2022, 5:50 p.m. UTC | #2
On Mon, May 30, 2022 at 08:41:20AM +0200, Juergen Gross wrote:
> On 25.05.22 20:41, Demi Marie Obenour wrote:
> > unmap_grant_pages() currently waits for the pages to no longer be used.
> > In https://github.com/QubesOS/qubes-issues/issues/7481, this led to a
> > deadlock against i915: i915 was waiting for gntdev's MMU notifier to
> > finish, while gntdev was waiting for i915 to free its pages.  I also
> > believe this is responsible for various deadlocks I have experienced in
> > the past.
> > 
> > Avoid these problems by making unmap_grant_pages async.  This requires
> > making it return void, as any errors will not be available when the
> > function returns.  Fortunately, the only use of the return value is a
> > WARN_ON(), which can be replaced by a WARN_ON when the error is
> > detected.  Additionally, a failed call will not prevent further calls
> > from being made, but this is harmless.
> > 
> > Because unmap_grant_pages is now async, the grant handle will be set to
> > INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same
> > handle.  Instead, a separate bool array is allocated for this purpose.
> > This wastes memory, but stuffing this information in padding bytes is
> > too fragile.  Furthermore, it is necessary to grab a reference to the
> > map before making the asynchronous call, and release the reference when
> > the call returns.
> 
> I think there is even more syncing needed:
> 
> - In the error path of gntdev_mmap() unmap_grant_pages() is being called and
>   it is assumed, map is available afterwards again. This should be rather easy
>   to avoid by adding a counter of active mappings to struct gntdev_grant_map
>   (number of pages not being unmapped yet). In case this counter is not zero
>   gntdev_mmap() should bail out early.

Is it possible to just unmap the pages directly here?  I don’t think
there can be any other users of these pages yet.  Userspace could race
against the unmap and cause a page fault, but that should just cause
userspace to get SIGSEGV or SIGBUS.  In any case this code can use the
sync version; if it gets blocked that’s userspace’s problem.

> - gntdev_put_map() is calling unmap_grant_pages() in case the refcount has
>   dropped to zero. This call can set the refcount to 1 again, so there is
>   another delay needed before freeing map. I think unmap_grant_pages() should
>   return in case the count of mapped pages is zero (see above), thus avoiding
>   to increment the refcount of map if nothing is to be done. This would enable
>   gntdev_put_map() to just return after the call of unmap_grant_pages() in case
>   the refcount has been incremented again.

I will change this in v3, but I do wonder if gntdev is using the wrong
MMU notifier callback.  It seems that the appropriate callback is
actually release(): if I understand correctly, release() is called
precisely when the refcount on the physical page is about to drop to 0,
and that is what we want.
Jürgen Groß June 2, 2022, 9:54 a.m. UTC | #3
On 30.05.22 19:50, Demi Marie Obenour wrote:
> On Mon, May 30, 2022 at 08:41:20AM +0200, Juergen Gross wrote:
>> On 25.05.22 20:41, Demi Marie Obenour wrote:
>>> unmap_grant_pages() currently waits for the pages to no longer be used.
>>> In https://github.com/QubesOS/qubes-issues/issues/7481, this led to a
>>> deadlock against i915: i915 was waiting for gntdev's MMU notifier to
>>> finish, while gntdev was waiting for i915 to free its pages.  I also
>>> believe this is responsible for various deadlocks I have experienced in
>>> the past.
>>>
>>> Avoid these problems by making unmap_grant_pages async.  This requires
>>> making it return void, as any errors will not be available when the
>>> function returns.  Fortunately, the only use of the return value is a
>>> WARN_ON(), which can be replaced by a WARN_ON when the error is
>>> detected.  Additionally, a failed call will not prevent further calls
>>> from being made, but this is harmless.
>>>
>>> Because unmap_grant_pages is now async, the grant handle will be set to
>>> INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same
>>> handle.  Instead, a separate bool array is allocated for this purpose.
>>> This wastes memory, but stuffing this information in padding bytes is
>>> too fragile.  Furthermore, it is necessary to grab a reference to the
>>> map before making the asynchronous call, and release the reference when
>>> the call returns.
>>
>> I think there is even more syncing needed:
>>
>> - In the error path of gntdev_mmap() unmap_grant_pages() is being called and
>>    it is assumed, map is available afterwards again. This should be rather easy
>>    to avoid by adding a counter of active mappings to struct gntdev_grant_map
>>    (number of pages not being unmapped yet). In case this counter is not zero
>>    gntdev_mmap() should bail out early.
> 
> Is it possible to just unmap the pages directly here?  I don’t think
> there can be any other users of these pages yet.  Userspace could race
> against the unmap and cause a page fault, but that should just cause
> userspace to get SIGSEGV or SIGBUS.  In any case this code can use the
> sync version; if it gets blocked that’s userspace’s problem.
> 
>> - gntdev_put_map() is calling unmap_grant_pages() in case the refcount has
>>    dropped to zero. This call can set the refcount to 1 again, so there is
>>    another delay needed before freeing map. I think unmap_grant_pages() should
>>    return in case the count of mapped pages is zero (see above), thus avoiding
>>    to increment the refcount of map if nothing is to be done. This would enable
>>    gntdev_put_map() to just return after the call of unmap_grant_pages() in case
>>    the refcount has been incremented again.
> 
> I will change this in v3, but I do wonder if gntdev is using the wrong
> MMU notifier callback.  It seems that the appropriate callback is
> actually release(): if I understand correctly, release() is called
> precisely when the refcount on the physical page is about to drop to 0,
> and that is what we want.

No, I don't think this is correct.

release() is being called when the refcount of the address space is about
to drop to 0. It has no page granularity.


Juergen
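
For context on that last point: gntdev registers a per-range interval
notifier, and its invalidate() callback is what ends up calling
unmap_grant_pages(); release() belongs to the address-space-wide
mmu_notifier_ops and fires once when the whole mm goes away. Abridged from
<linux/mmu_notifier.h> (only the callbacks relevant here, signatures as of
kernels around this thread):

/* Address-space-wide notifier: release() is called once, when the address
 * space itself is being torn down; it has no per-page granularity. */
struct mmu_notifier_ops {
        void (*release)(struct mmu_notifier *subscription, struct mm_struct *mm);
        /* ... */
};

/* Range-based interval notifier: what gntdev actually uses. */
struct mmu_interval_notifier_ops {
        bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
                           const struct mmu_notifier_range *range,
                           unsigned long cur_seq);
};

/* The registration in gntdev.c uses only the interval notifier: */
static const struct mmu_interval_notifier_ops gntdev_mmu_ops = {
        .invalidate = gntdev_invalidate,
};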

Patch

diff --git a/drivers/xen/gntdev-common.h b/drivers/xen/gntdev-common.h
index 20d7d059dadb..a268cdb1f7bf 100644
--- a/drivers/xen/gntdev-common.h
+++ b/drivers/xen/gntdev-common.h
@@ -16,6 +16,7 @@ 
 #include <linux/mmu_notifier.h>
 #include <linux/types.h>
 #include <xen/interface/event_channel.h>
+#include <xen/grant_table.h>
 
 struct gntdev_dmabuf_priv;
 
@@ -56,6 +57,7 @@  struct gntdev_grant_map {
 	struct gnttab_unmap_grant_ref *unmap_ops;
 	struct gnttab_map_grant_ref   *kmap_ops;
 	struct gnttab_unmap_grant_ref *kunmap_ops;
+	bool *being_removed;
 	struct page **pages;
 	unsigned long pages_vm_start;
 
@@ -73,6 +75,9 @@  struct gntdev_grant_map {
 	/* Needed to avoid allocation in gnttab_dma_free_pages(). */
 	xen_pfn_t *frames;
 #endif
+
+	/* Needed to avoid allocation in __unmap_grant_pages */
+	struct gntab_unmap_queue_data unmap_data;
 };
 
 struct gntdev_grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 59ffea800079..90bd2b5ef7dd 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -35,6 +35,7 @@ 
 #include <linux/slab.h>
 #include <linux/highmem.h>
 #include <linux/refcount.h>
+#include <linux/workqueue.h>
 
 #include <xen/xen.h>
 #include <xen/grant_table.h>
@@ -62,8 +63,8 @@  MODULE_PARM_DESC(limit,
 
 static int use_ptemod;
 
-static int unmap_grant_pages(struct gntdev_grant_map *map,
-			     int offset, int pages);
+static void unmap_grant_pages(struct gntdev_grant_map *map,
+			      int offset, int pages);
 
 static struct miscdevice gntdev_miscdev;
 
@@ -120,6 +121,7 @@  static void gntdev_free_map(struct gntdev_grant_map *map)
 	kvfree(map->unmap_ops);
 	kvfree(map->kmap_ops);
 	kvfree(map->kunmap_ops);
+	kvfree(map->being_removed);
 	kfree(map);
 }
 
@@ -140,10 +142,13 @@  struct gntdev_grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
 	add->unmap_ops = kvmalloc_array(count, sizeof(add->unmap_ops[0]),
 					GFP_KERNEL);
 	add->pages     = kvcalloc(count, sizeof(add->pages[0]), GFP_KERNEL);
+	add->being_removed =
+		kvcalloc(count, sizeof(add->being_removed[0]), GFP_KERNEL);
 	if (NULL == add->grants    ||
 	    NULL == add->map_ops   ||
 	    NULL == add->unmap_ops ||
-	    NULL == add->pages)
+	    NULL == add->pages     ||
+	    NULL == add->being_removed)
 		goto err;
 	if (use_ptemod) {
 		add->kmap_ops   = kvmalloc_array(count, sizeof(add->kmap_ops[0]),
@@ -349,79 +354,84 @@  int gntdev_map_grant_pages(struct gntdev_grant_map *map)
 	return err;
 }
 
-static int __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
-			       int pages)
+static void __unmap_grant_pages_done(int result,
+		struct gntab_unmap_queue_data *data)
 {
-	int i, err = 0;
-	struct gntab_unmap_queue_data unmap_data;
-
-	if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
-		int pgno = (map->notify.addr >> PAGE_SHIFT);
-		if (pgno >= offset && pgno < offset + pages) {
-			/* No need for kmap, pages are in lowmem */
-			uint8_t *tmp = pfn_to_kaddr(page_to_pfn(map->pages[pgno]));
-			tmp[map->notify.addr & (PAGE_SIZE-1)] = 0;
-			map->notify.flags &= ~UNMAP_NOTIFY_CLEAR_BYTE;
-		}
-	}
-
-	unmap_data.unmap_ops = map->unmap_ops + offset;
-	unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
-	unmap_data.pages = map->pages + offset;
-	unmap_data.count = pages;
-
-	err = gnttab_unmap_refs_sync(&unmap_data);
-	if (err)
-		return err;
+	unsigned int i;
+	struct gntdev_grant_map *map = data->data;
+	unsigned int offset = data->unmap_ops - map->unmap_ops;
 
-	for (i = 0; i < pages; i++) {
-		if (map->unmap_ops[offset+i].status)
-			err = -EINVAL;
+	for (i = 0; i < data->count; i++) {
+		WARN_ON(map->unmap_ops[offset+i].status);
 		pr_debug("unmap handle=%d st=%d\n",
 			map->unmap_ops[offset+i].handle,
 			map->unmap_ops[offset+i].status);
 		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
 		if (use_ptemod) {
-			if (map->kunmap_ops[offset+i].status)
-				err = -EINVAL;
+			WARN_ON(map->kunmap_ops[offset+i].status);
 			pr_debug("kunmap handle=%u st=%d\n",
 				 map->kunmap_ops[offset+i].handle,
 				 map->kunmap_ops[offset+i].status);
 			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
 		}
 	}
-	return err;
+
+	/* Release reference taken by __unmap_grant_pages */
+	gntdev_put_map(NULL, map);
 }
 
-static int unmap_grant_pages(struct gntdev_grant_map *map, int offset,
-			     int pages)
+static void __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
+			       int pages)
+{
+	if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
+		int pgno = (map->notify.addr >> PAGE_SHIFT);
+
+		if (pgno >= offset && pgno < offset + pages) {
+			/* No need for kmap, pages are in lowmem */
+			uint8_t *tmp = pfn_to_kaddr(page_to_pfn(map->pages[pgno]));
+
+			tmp[map->notify.addr & (PAGE_SIZE-1)] = 0;
+			map->notify.flags &= ~UNMAP_NOTIFY_CLEAR_BYTE;
+		}
+	}
+
+	map->unmap_data.unmap_ops = map->unmap_ops + offset;
+	map->unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
+	map->unmap_data.pages = map->pages + offset;
+	map->unmap_data.count = pages;
+	map->unmap_data.done = __unmap_grant_pages_done;
+	map->unmap_data.data = map;
+	refcount_inc(&map->users); /* to keep map alive during async call below */
+
+	gnttab_unmap_refs_async(&map->unmap_data);
+}
+
+static void unmap_grant_pages(struct gntdev_grant_map *map, int offset,
+			      int pages)
 {
-	int range, err = 0;
+	int range;
 
 	pr_debug("unmap %d+%d [%d+%d]\n", map->index, map->count, offset, pages);
 
 	/* It is possible the requested range will have a "hole" where we
 	 * already unmapped some of the grants. Only unmap valid ranges.
 	 */
-	while (pages && !err) {
-		while (pages &&
-		       map->unmap_ops[offset].handle == INVALID_GRANT_HANDLE) {
+	while (pages) {
+		while (pages && map->being_removed[offset]) {
 			offset++;
 			pages--;
 		}
 		range = 0;
 		while (range < pages) {
-			if (map->unmap_ops[offset + range].handle ==
-			    INVALID_GRANT_HANDLE)
+			if (map->being_removed[offset + range])
 				break;
+			map->being_removed[offset + range] = true;
 			range++;
 		}
-		err = __unmap_grant_pages(map, offset, range);
+		__unmap_grant_pages(map, offset, range);
 		offset += range;
 		pages -= range;
 	}
-
-	return err;
 }
 
 /* ------------------------------------------------------------------ */
@@ -473,7 +483,6 @@  static bool gntdev_invalidate(struct mmu_interval_notifier *mn,
 	struct gntdev_grant_map *map =
 		container_of(mn, struct gntdev_grant_map, notifier);
 	unsigned long mstart, mend;
-	int err;
 
 	if (!mmu_notifier_range_blockable(range))
 		return false;
@@ -494,10 +503,9 @@  static bool gntdev_invalidate(struct mmu_interval_notifier *mn,
 			map->index, map->count,
 			map->vma->vm_start, map->vma->vm_end,
 			range->start, range->end, mstart, mend);
-	err = unmap_grant_pages(map,
+	unmap_grant_pages(map,
 				(mstart - map->vma->vm_start) >> PAGE_SHIFT,
 				(mend - mstart) >> PAGE_SHIFT);
-	WARN_ON(err);
 
 	return true;
 }