diff mbox series

[5/5] mm/gup: Take mmap_sem in get_dump_page()

Message ID 20200428032745.133556-6-jannh@google.com (mailing list archive)
State New, archived
Headers show
Series Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there | expand

Commit Message

Jann Horn April 28, 2020, 3:27 a.m. UTC
Properly take the mmap_sem before calling into the GUP code from
get_dump_page(); and play nice, allowing __get_user_pages_locked() to drop
the mmap_sem if it has to sleep.

This requires adjusting the check in __get_user_pages_locked() to be
slightly less strict: While `vmas != NULL` is normally incompatible with
the lock-dropping retry logic, it's fine if we only want a single page,
because then retries can only happen when we haven't grabbed any pages yet.

Signed-off-by: Jann Horn <jannh@google.com>
---
 mm/gup.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Comments

Linus Torvalds April 28, 2020, 3:50 a.m. UTC | #1
On Mon, Apr 27, 2020 at 8:28 PM Jann Horn <jannh@google.com> wrote:
>
> Properly take the mmap_sem before calling into the GUP code from
> get_dump_page(); and play nice, allowing __get_user_pages_locked() to drop
> the mmap_sem if it has to sleep.

This makes my skin crawl.

The only reason for this all is that page cache flushing.

My gut feeling is that it should be done by get_user_pages() anyway,
since all the other users presumably want it to be coherent in the
cache.

And in fact, looking at __get_user_pages(), it already does that

                if (pages) {
                        pages[i] = page;
                        flush_anon_page(vma, page, start);
                        flush_dcache_page(page);
                        ctx.page_mask = 0;
                }

and I think that the get_dump_page() logic is unnecessary to begin with.

               Linus
Jann Horn April 28, 2020, 6:10 a.m. UTC | #2
On Tue, Apr 28, 2020 at 5:50 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Apr 27, 2020 at 8:28 PM Jann Horn <jannh@google.com> wrote:
> >
> > Properly take the mmap_sem before calling into the GUP code from
> > get_dump_page(); and play nice, allowing __get_user_pages_locked() to drop
> > the mmap_sem if it has to sleep.
>
> This makes my skin crawl.
>
> The only reason for this all is that page cache flushing.
>
> My gut feeling is that it should be done by get_user_pages() anyway,
> since all the other users presumably want it to be coherent in the
> cache.
>
> And in fact, looking at __get_user_pages(), it already does that
>
>                 if (pages) {
>                         pages[i] = page;
>                         flush_anon_page(vma, page, start);
>                         flush_dcache_page(page);
>                         ctx.page_mask = 0;
>                 }
>
> and I think that the get_dump_page() logic is unnecessary to begin with.

Ah! And even though flush_cache_page() is broader than
flush_dcache_page(), that's actually unnecessary, right? Since the
kernel only wants to read from the page, and therefore e.g. the icache
is irrelevant?

Yay! :) I did think this was a bit gnarly, and it's nice to know that
this can be simplified.

(And now I'm going to avert my eyes from the GUP code before I start
thinking too hard about how much it sucks that FOLL_LONGTERM doesn't
drop the mmap_sem across the access and how much I dislike the whole
idea of FOLL_LONGTERM in general...)
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 9a7e83772f1fe..4bb4149c0e259 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1261,7 +1261,8 @@  static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 
 	if (locked) {
 		/* if VM_FAULT_RETRY can be returned, vmas become invalid */
-		BUG_ON(vmas);
+		if (WARN_ON(vmas && nr_pages != 1))
+			return -EFAULT;
 		/* check caller initialized locked */
 		BUG_ON(*locked != 1);
 	}
@@ -1548,18 +1549,28 @@  static long __get_user_pages_locked(struct task_struct *tsk,
  * NULL wherever the ZERO_PAGE, or an anonymous pte_none, has been found -
  * allowing a hole to be left in the corefile to save diskspace.
  *
- * Called without mmap_sem, but after all other threads have been killed.
+ * Called without mmap_sem (takes and releases the mmap_sem by itself).
  */
 struct page *get_dump_page(unsigned long addr)
 {
+	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct page *page;
+	int locked = 1;
+	int ret;
 
-	if (__get_user_pages(current, current->mm, addr, 1,
-			     FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma,
-			     NULL) < 1)
+	if (down_read_killable(&mm->mmap_sem))
+		return NULL;
+	ret = __get_user_pages_locked(current, mm, addr, 1, &page, &vma,
+				      &locked,
+				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
+	if (ret != 1) {
+		if (locked)
+			up_read(&mm->mmap_sem);
 		return NULL;
+	}
 	flush_cache_page(vma, addr, page_to_pfn(page));
+	up_read(&mm->mmap_sem);
 	return page;
 }