From patchwork Thu May 2 12:19:08 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 10926669
Message-Id: <5CCAE03C020000780022B2E8@prv1-mh.provo.novell.com>
Date: Thu, 02 May 2019 06:19:08 -0600
From: "Jan Beulich"
To: "xen-devel"
References: <5CCAD5ED020000780022B2A2@prv1-mh.provo.novell.com>
In-Reply-To: <5CCAD5ED020000780022B2A2@prv1-mh.provo.novell.com>
Subject: [Xen-devel] [PATCH 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4()
Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monne

We really need to flush the TLB just once, if we do so with or after the
CR3 write. The only case where two flushes are unavoidable is when we
mean to turn off CR4.PGE (perhaps just temporarily; see the code
comment).

Signed-off-by: Jan Beulich

--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -104,82 +104,65 @@ static void do_tlb_flush(void)
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
     unsigned long flags, old_cr4;
-    unsigned int old_pcid;
     u32 t;
 
+    /* Throughout this function we make this assumption: */
+    ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
+
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
     t = pre_flush();
 
     old_cr4 = read_cr4();
-    if ( old_cr4 & X86_CR4_PGE )
+    ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
+
+    /*
+     * We need to write CR4 before CR3 if we're about to enable PCIDE, at the
+     * very least when the new PCID is non-zero.
+     *
+     * As we also need to do two CR4 writes in total when PGE is enabled and
+     * is to remain enabled, do the one temporarily turning off the bit right
+     * here as well.
+     *
+     * The only TLB flushing effect we depend on here is in case we move from
+     * PGE set to PCIDE set, where we want global page entries gone (and none
+     * to re-appear) after this write.
+     */
+    if ( !(old_cr4 & X86_CR4_PCIDE) &&
+         ((cr4 & X86_CR4_PCIDE) || (cr4 & old_cr4 & X86_CR4_PGE)) )
     {
-        /*
-         * X86_CR4_PGE set means PCID is inactive.
-         * We have to purge the TLB via flipping cr4.pge.
-         */
         old_cr4 = cr4 & ~X86_CR4_PGE;
         write_cr4(old_cr4);
     }
-    else if ( use_invpcid )
-    {
-        /*
-         * Flushing the TLB via INVPCID is necessary only in case PCIDs are
-         * in use, which is true only with INVPCID being available.
-         * Without PCID usage the following write_cr3() will purge the TLB
-         * (we are in the cr4.pge off path) of all entries.
-         * Using invpcid_flush_all_nonglobals() seems to be faster than
-         * invpcid_flush_all(), so use that.
-         */
-        invpcid_flush_all_nonglobals();
-
-        /*
-         * CR4.PCIDE needs to be set before the CR3 write below. Otherwise
-         * - the CR3 write will fault when CR3.NOFLUSH is set (which is the
-         *   case normally),
-         * - the subsequent CR4 write will fault if CR3.PCID != 0.
-         */
-        if ( (old_cr4 & X86_CR4_PCIDE) < (cr4 & X86_CR4_PCIDE) )
-        {
-            write_cr4(cr4);
-            old_cr4 = cr4;
-        }
-    }
 
     /*
-     * If we don't change PCIDs, the CR3 write below needs to flush this very
-     * PCID, even when a full flush was performed above, as we are currently
-     * accumulating TLB entries again from the old address space.
-     * NB: Clearing the bit when we don't use PCID is benign (as it is clear
-     * already in that case), but allows the if() to be more simple.
+     * If the CR4 write is to turn off PCIDE, we don't need the CR3 write to
+     * flush anything, as that transition is a full flush itself.
      */
-    old_pcid = cr3_pcid(read_cr3());
-    if ( old_pcid == cr3_pcid(cr3) )
-        cr3 &= ~X86_CR3_NOFLUSH;
-
+    if ( (old_cr4 & X86_CR4_PCIDE) > (cr4 & X86_CR4_PCIDE) )
+        cr3 |= X86_CR3_NOFLUSH;
     write_cr3(cr3);
 
     if ( old_cr4 != cr4 )
         write_cr4(cr4);
 
     /*
-     * Make sure no TLB entries related to the old PCID created between
-     * flushing the TLB and writing the new %cr3 value remain in the TLB.
-     *
-     * The write to CR4 just above has performed a wider flush in certain
-     * cases, which therefore get excluded here. Since that write is
-     * conditional, note in particular that it won't be skipped if PCIDE
-     * transitions from 1 to 0. This is because the CR4 write further up will
-     * have been skipped in this case, as PCIDE and PGE won't both be set at
-     * the same time.
-     *
-     * Note also that PGE is always clear in old_cr4.
+     *  PGE | PCIDE | flush at
+     * ------+-------+------------------------
+     * 0->0 | 0->0  | CR3 write
+     * 0->0 | 0->1  | n/a (see 1st CR4 write)
+     * 0->x | 1->0  | CR4 write
+     * x->1 | x->1  | n/a
+     * 0->0 | 1->1  | INVPCID
+     * 0->1 | 0->0  | CR3 and CR4 writes
+     * 1->0 | 0->0  | CR4 write
+     * 1->0 | 0->1  | n/a (see 1st CR4 write)
+     * 1->1 | 0->0  | n/a (see 1st CR4 write)
+     * 1->x | 1->x  | n/a
      */
-    if ( old_pcid != cr3_pcid(cr3) &&
-         !(cr4 & X86_CR4_PGE) &&
-         (old_cr4 & X86_CR4_PCIDE) <= (cr4 & X86_CR4_PCIDE) )
-        invpcid_flush_single_context(old_pcid);
+    if ( cr4 & X86_CR4_PCIDE )
+        invpcid_flush_all_nonglobals();
 
     post_flush(t);
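
Not part of the patch: the decision logic that switch_cr3_cr4() ends up with
can be modelled outside the hypervisor, which may help when checking rows of
the PGE/PCIDE table in the final comment. The following is only a sketch under
stated assumptions. struct flush_plan and plan_switch() are hypothetical names
that do not exist in Xen, the two CR4 constants are defined locally using the
architectural bit positions (PGE is bit 7, PCIDE is bit 17), and all other CR4
bits are assumed to be identical in the old and new values. The model records
which writes the function would issue; it touches no hardware state and says
nothing about what the caller already put into the CR3 value.

```c
#include <assert.h>
#include <stdbool.h>

/* Local definitions for the sketch (architectural bit positions). */
#define X86_CR4_PGE   (1UL << 7)
#define X86_CR4_PCIDE (1UL << 17)

/* Hypothetical record of the operations the patched function would issue. */
struct flush_plan {
    bool early_cr4_write; /* CR4 written before the CR3 write */
    bool cr3_noflush;     /* NOFLUSH forced on in the CR3 value */
    bool late_cr4_write;  /* CR4 written after the CR3 write */
    bool invpcid;         /* invpcid_flush_all_nonglobals() issued */
};

/* Hypothetical helper mirroring the conditions in the patched function. */
static struct flush_plan plan_switch(unsigned long old_cr4, unsigned long cr4)
{
    struct flush_plan p = { false, false, false, false };

    /* Same assumption as the two ASSERT()s: PCIDE and PGE never both set. */
    assert(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
    assert(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));

    /* First CR4 write: enabling PCIDE, or PGE set and staying set. */
    if ( !(old_cr4 & X86_CR4_PCIDE) &&
         ((cr4 & X86_CR4_PCIDE) || (cr4 & old_cr4 & X86_CR4_PGE)) )
    {
        old_cr4 = cr4 & ~X86_CR4_PGE;
        p.early_cr4_write = true;
    }

    /* Turning PCIDE off: the later CR4 write is a full flush by itself. */
    if ( (old_cr4 & X86_CR4_PCIDE) > (cr4 & X86_CR4_PCIDE) )
        p.cr3_noflush = true;

    /* Second CR4 write, only if the register does not yet hold cr4. */
    if ( old_cr4 != cr4 )
        p.late_cr4_write = true;

    /* With PCIDE in effect, flush via INVPCID after the CR3 write. */
    if ( cr4 & X86_CR4_PCIDE )
        p.invpcid = true;

    return p;
}
```

For example, plan_switch(X86_CR4_PGE, X86_CR4_PGE) reports both CR4 writes and
no INVPCID, which is the "1->1 | 0->0" row, while
plan_switch(X86_CR4_PCIDE, X86_CR4_PCIDE) reports only the INVPCID, which is
the "0->0 | 1->1" row.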