From patchwork Wed Apr 26 20:52:29 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andy Lutomirski <luto@kernel.org>
X-Patchwork-Id: 9702011
Return-Path: <xen-devel-bounces@lists.xen.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	38F28603F4 for <patchwork-xen-devel@patchwork.kernel.org>;
	Wed, 26 Apr 2017 20:55:29 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1E3EE27F94
	for <patchwork-xen-devel@patchwork.kernel.org>;
	Wed, 26 Apr 2017 20:55:29 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 11B072858D; Wed, 26 Apr 2017 20:55:29 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
	autolearn=ham version=3.3.1
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 7C62727F94
	for <patchwork-xen-devel@patchwork.kernel.org>;
	Wed, 26 Apr 2017 20:55:27 +0000 (UTC)
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.84_2)
	(envelope-from <xen-devel-bounces@lists.xen.org>)
	id 1d3Tvo-0003bq-40; Wed, 26 Apr 2017 20:53:00 +0000
Received: from mail6.bemta6.messagelabs.com ([193.109.254.103])
	by lists.xenproject.org with esmtp (Exim 4.84_2)
	(envelope-from <luto@kernel.org>) id 1d3Tvm-0003bk-TK
	for xen-devel@lists.xenproject.org; Wed, 26 Apr 2017 20:52:58 +0000
Received: from [193.109.254.147] by server-6.bemta-6.messagelabs.com id
	25/11-03920-AA801095; Wed, 26 Apr 2017 20:52:58 +0000
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrBIsWRWlGSWpSXmKPExsVybKJsh+5KDsZ
	IgxtPWS2+b5nM5MDocfjDFZYAxijWzLyk/IoE1owlC3+wFvSoV7x8XNTAeFeui5GLQ0hgF6NE
	V/9FRgjnCJPEqVezoJwZQJmNB9i6GDk5JATyJG6vnsoEYRdJLL7ynAXE5hUQlDg58wmYLSTgL
	dG+bTFQPQcHm4C6REunL0iYRUBVYt/0+8wQrYkSyxYuYINoDZDo3fQRzBYWUJL41byNCWSviM
	A0RonuDytZQRLMAjUSUy7eYwKZyQw0c/08IRBTQqBAYv/haIiRXhKLblxihbDVJK6e28Q8gVF
	oFpLjZiE0L2BkWsWoUZxaVJZapGtkqZdUlJmeUZKbmJmja2hgppebWlycmJ6ak5hUrJecn7uJ
	ERiyDECwg/HAosBDjJIcTEqivOtXMEQK8SXlp1RmJBZnxBeV5qQWH2KU4eBQkuC1Y2eMFBIsS
	k1PrUjLzAFGD0xagoNHSYRXkA0ozVtckJhbnJkOkTrFaMnx4vL790wc75Z+AJJPVv54zyTEkp
	eflyolzpsGMk8ApCGjNA9uHCzCLzHKSgnzMgIdKMRTkFqUm1mCKv+KUZyDUUmYdwLIFJ7MvBK
	4ra+ADmICOojFhQHkoJJEhJRUA2PZBIEl/L/YOYP+Pt6zz8/ko++ra43nnppNfL73saSV9ppD
	8psXfNkrzrMv8/i+yaIbJsoypVyaat0gJPO1Vo0zOGfKgYa0uTs/3THUP9H4hXHmZaeq2A/RR
	owKjkc+WCyR3PtqXaRp9/laLcVHrSoHOmOPPs3wXPTq5tQHLK2r7xhVhTN0uNxVYinOSDTUYi
	4qTgQA9eJsrOsCAAA=
X-Env-Sender: luto@kernel.org
X-Msg-Ref: server-4.tower-27.messagelabs.com!1493239975!98375762!1
X-Originating-IP: [198.145.29.136]
X-SpamReason: No, hits=0.8 required=7.0 tests=BODY_RANDOM_LONG, RCVD_BY_IP
X-StarScan-Received: 
X-StarScan-Version: 9.4.12; banners=-,-,-
X-VirusChecked: Checked
Received: (qmail 2884 invoked from network); 26 Apr 2017 20:52:56 -0000
Received: from mail.kernel.org (HELO mail.kernel.org) (198.145.29.136)
	by server-4.tower-27.messagelabs.com with DHE-RSA-AES256-GCM-SHA384
	encrypted SMTP; 26 Apr 2017 20:52:56 -0000
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id EAD2320165
	for <xen-devel@lists.xenproject.org>;
	Wed, 26 Apr 2017 20:52:52 +0000 (UTC)
Received: from mail-ua0-f172.google.com (mail-ua0-f172.google.com
	[209.85.217.172])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
	bits)) (No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id AE6A5201F2
	for <xen-devel@lists.xenproject.org>;
	Wed, 26 Apr 2017 20:52:50 +0000 (UTC)
Received: by mail-ua0-f172.google.com with SMTP id 110so8198433uas.3
	for <xen-devel@lists.xenproject.org>;
	Wed, 26 Apr 2017 13:52:50 -0700 (PDT)
X-Gm-Message-State: AN3rC/6yv8Fs4dptBlZeG1+n1MfnCN9BYhvRBSKjli7wH7de6IYCOVJ2
	9y2DOypqVq8/PkeKXzXVhxd6Kcwuxp9F
X-Received: by 10.159.59.108 with SMTP id j44mr1120462uah.49.1493239969788;
	Wed, 26 Apr 2017 13:52:49 -0700 (PDT)
Received: by 10.103.88.143 with HTTP; Wed, 26 Apr 2017 13:52:29 -0700 (PDT)
From: Andy Lutomirski <luto@kernel.org>
Date: Wed, 26 Apr 2017 13:52:29 -0700
X-Gmail-Original-Message-ID: 
 <CALCETrUPa9xvugPNcTmShJFfgSesa31dD-wy0hY3XnH1Knjn6g@mail.gmail.com>
Message-ID: 
 <CALCETrUPa9xvugPNcTmShJFfgSesa31dD-wy0hY3XnH1Knjn6g@mail.gmail.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>
X-Virus-Scanned: ClamAV using ClamSMTP
Cc: X86 ML <x86@kernel.org>, Borislav Petkov <bp@alien8.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [Xen-devel] xen_exit_mmap() questions
X-BeenThere: xen-devel@lists.xen.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xen.org>
List-Unsubscribe: <https://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <https://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

I was trying to understand xen_drop_mm_ref() to update it for some
changes I'm working on, and I'm wondering whether we need
xen_exit_mmap() at all.

AFAICS the intent is to force all CPUs to drop their lazy uses of the
mm being destroyed so that it can be unpinned before the page tables
are torn down, which makes the teardown faster.  That should speed up
xen_set_pud() and xen_set_pmd(), but the benefit looks rather limited
to me.  Could we get away with deleting it?

Also, this code in drop_other_mm_ref() looks dubious to me:

    /* If this cpu still has a stale cr3 reference, then make sure
       it has been flushed. */
    if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
        load_cr3(swapper_pg_dir);

If cr3 hasn't been flushed to the hypervisor because we're in a lazy
mode, why would load_cr3() help?  Shouldn't this be xen_mc_flush()
instead?
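
To make the question concrete, here is a toy userspace model of the
concern.  It is only a sketch: every name in it (toy_cpu, toy_load_cr3,
toy_mc_flush) is hypothetical and none of it is the kernel's actual
code.  It assumes that a cr3 write made in lazy mode is merely queued,
that hyp_cr3 plays the role of xen_current_cr3, and that the CPU is
still in lazy mode when the check runs, which may not hold on the real
IPI path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MC_MAX 8

/* Toy per-CPU state; all names are hypothetical, not kernel API. */
struct toy_cpu {
    unsigned long hyp_cr3;         /* cr3 the "hypervisor" really uses
                                      (stands in for xen_current_cr3) */
    unsigned long pending[MC_MAX]; /* queued cr3 writes, not yet issued */
    size_t npending;
    bool lazy;                     /* true while batching in lazy mode */
};

/* Issue every queued write to the "hypervisor", oldest first. */
static void toy_mc_flush(struct toy_cpu *c)
{
    for (size_t i = 0; i < c->npending; i++)
        c->hyp_cr3 = c->pending[i];
    c->npending = 0;
}

/* Queue a cr3 write; it only takes effect at once outside lazy mode. */
static void toy_load_cr3(struct toy_cpu *c, unsigned long cr3)
{
    assert(c->npending < MC_MAX);
    c->pending[c->npending++] = cr3;
    if (!c->lazy)
        toy_mc_flush(c);
}
```

In this model, calling toy_load_cr3() while lazy is set leaves hyp_cr3
untouched, so the stale reference only goes away once toy_mc_flush()
runs.  Whether the real code behaves this way is exactly what I'm
asking.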

Anyway, the whitespace-damaged patch below seems to result in a
fully-functional kernel:

 static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
@@ -1544,6 +1449,8 @@ static int xen_pgd_alloc(struct mm_struct *mm)

 static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
+    xen_pgd_unpin(mm);
+
 #ifdef CONFIG_X86_64
     pgd_t *user_pgd = xen_get_user_pgd(pgd);

@@ -2465,7 +2372,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {

     .activate_mm = xen_activate_mm,
     .dup_mmap = xen_dup_mmap,
-    .exit_mmap = xen_exit_mmap,

     .lazy_mode = {
         .enter = paravirt_enter_lazy_mmu,

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 37cb5aad71de..e4e073844cbf 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -998,101 +998,6 @@ static void xen_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
     spin_unlock(&mm->page_table_lock);
 }

-
-#ifdef CONFIG_SMP
-/* Another cpu may still have their %cr3 pointing at the pagetable, so
-   we need to repoint it somewhere else before we can unpin it. */
-static void drop_other_mm_ref(void *info)
-{
-    struct mm_struct *mm = info;
-    struct mm_struct *active_mm;
-
-    active_mm = this_cpu_read(cpu_tlbstate.active_mm);
-
-    if (active_mm == mm && this_cpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
-        leave_mm(smp_processor_id());
-
-    /* If this cpu still has a stale cr3 reference, then make sure
-       it has been flushed. */
-    if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
-        load_cr3(swapper_pg_dir);
-}
-
-static void xen_drop_mm_ref(struct mm_struct *mm)
-{
-    cpumask_var_t mask;
-    unsigned cpu;
-
-    if (current->active_mm == mm) {
-        if (current->mm == mm)
-            load_cr3(swapper_pg_dir);
-        else
-            leave_mm(smp_processor_id());
-    }
-
-    /* Get the "official" set of cpus referring to our pagetable. */
-    if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
-        for_each_online_cpu(cpu) {
-            if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
-                && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
-                continue;
-            smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
-        }
-        return;
-    }
-    cpumask_copy(mask, mm_cpumask(mm));
-
-    /* It's possible that a vcpu may have a stale reference to our
-       cr3, because its in lazy mode, and it hasn't yet flushed
-       its set of pending hypercalls yet.  In this case, we can
-       look at its actual current cr3 value, and force it to flush
-       if needed. */
-    for_each_online_cpu(cpu) {
-        if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd))
-            cpumask_set_cpu(cpu, mask);
-    }
-
-    if (!cpumask_empty(mask))
-        smp_call_function_many(mask, drop_other_mm_ref, mm, 1);
-    free_cpumask_var(mask);
-}
-#else
-static void xen_drop_mm_ref(struct mm_struct *mm)
-{
-    if (current->active_mm == mm)
-        load_cr3(swapper_pg_dir);
-}
-#endif
-
-/*
- * While a process runs, Xen pins its pagetables, which means that the
- * hypervisor forces it to be read-only, and it controls all updates
- * to it.  This means that all pagetable updates have to go via the
- * hypervisor, which is moderately expensive.
- *
- * Since we're pulling the pagetable down, we switch to use init_mm,
- * unpin old process pagetable and mark it all read-write, which
- * allows further operations on it to be simple memory accesses.
- *
- * The only subtle point is that another CPU may be still using the
- * pagetable because of lazy tlb flushing.  This means we need need to
- * switch all CPUs off this pagetable before we can unpin it.
- */
-static void xen_exit_mmap(struct mm_struct *mm)
-{
-    get_cpu();        /* make sure we don't move around */
-    xen_drop_mm_ref(mm);
-    put_cpu();
-
-    spin_lock(&mm->page_table_lock);
-
-    /* pgd may not be pinned in the error exit path of execve */
-    if (xen_page_pinned(mm->pgd))
-        xen_pgd_unpin(mm);
-
-    spin_unlock(&mm->page_table_lock);
-}
-
 static void xen_post_allocator_init(void);