From patchwork Tue Sep 17 06:13:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 11148101 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3C9B1708 for ; Tue, 17 Sep 2019 06:14:58 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C93AB206C2 for ; Tue, 17 Sep 2019 06:14:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C93AB206C2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iA6jy-0000dd-R6; Tue, 17 Sep 2019 06:13:30 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iA6jx-0000dR-9v for xen-devel@lists.xenproject.org; Tue, 17 Sep 2019 06:13:29 +0000 X-Inumbo-ID: 4398043a-d912-11e9-9601-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 4398043a-d912-11e9-9601-12813bfff9fa; Tue, 17 Sep 2019 06:13:28 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8DFB2AC4D; Tue, 17 Sep 2019 06:13:27 +0000 (UTC) From: Jan Beulich To: "xen-devel@lists.xenproject.org" References: <625c29ba-cfb8-88ee-eb15-5f2019198865@suse.com> Message-ID: <05363bde-6b5b-56e2-1f85-b64f995d82cf@suse.com> Date: Tue, 17 Sep 2019 08:13:33 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <625c29ba-cfb8-88ee-eb15-5f2019198865@suse.com> Content-Language: en-US Subject: [Xen-devel] [PATCH v2 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4() X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: George Dunlap , Andrew Cooper , Wei Liu , =?utf-8?q?Roger_Pau_Monn=C3=A9?= Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" We really need to flush the TLB just once, if we do so with or after the CR3 write. The only case where two flushes are unavoidable is when we mean to turn off CR4.PGE (perhaps just temporarily; see the code comment). Signed-off-by: Jan Beulich Reviewed-by: Roger Pau Monné Acked-by: Andrew Cooper --- a/xen/arch/x86/flushtlb.c +++ b/xen/arch/x86/flushtlb.c @@ -104,82 +104,65 @@ static void do_tlb_flush(void) void switch_cr3_cr4(unsigned long cr3, unsigned long cr4) { unsigned long flags, old_cr4; - unsigned int old_pcid; u32 t; + /* Throughout this function we make this assumption: */ + ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE)); + /* This non-reentrant function is sometimes called in interrupt context. */ local_irq_save(flags); t = pre_flush(); old_cr4 = read_cr4(); - if ( old_cr4 & X86_CR4_PGE ) + ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE)); + + /* + * We need to write CR4 before CR3 if we're about to enable PCIDE, at the + * very least when the new PCID is non-zero. + * + * As we also need to do two CR4 writes in total when PGE is enabled and + * is to remain enabled, do the one temporarily turning off the bit right + * here as well. + * + * The only TLB flushing effect we depend on here is in case we move from + * PGE set to PCIDE set, where we want global page entries gone (and none + * to re-appear) after this write. + */ + if ( !(old_cr4 & X86_CR4_PCIDE) && + ((cr4 & X86_CR4_PCIDE) || (cr4 & old_cr4 & X86_CR4_PGE)) ) { - /* - * X86_CR4_PGE set means PCID is inactive. - * We have to purge the TLB via flipping cr4.pge. - */ old_cr4 = cr4 & ~X86_CR4_PGE; write_cr4(old_cr4); } - else if ( use_invpcid ) - { - /* - * Flushing the TLB via INVPCID is necessary only in case PCIDs are - * in use, which is true only with INVPCID being available. - * Without PCID usage the following write_cr3() will purge the TLB - * (we are in the cr4.pge off path) of all entries. - * Using invpcid_flush_all_nonglobals() seems to be faster than - * invpcid_flush_all(), so use that. - */ - invpcid_flush_all_nonglobals(); - - /* - * CR4.PCIDE needs to be set before the CR3 write below. Otherwise - * - the CR3 write will fault when CR3.NOFLUSH is set (which is the - * case normally), - * - the subsequent CR4 write will fault if CR3.PCID != 0. - */ - if ( (old_cr4 & X86_CR4_PCIDE) < (cr4 & X86_CR4_PCIDE) ) - { - write_cr4(cr4); - old_cr4 = cr4; - } - } /* - * If we don't change PCIDs, the CR3 write below needs to flush this very - * PCID, even when a full flush was performed above, as we are currently - * accumulating TLB entries again from the old address space. - * NB: Clearing the bit when we don't use PCID is benign (as it is clear - * already in that case), but allows the if() to be more simple. + * If the CR4 write is to turn off PCIDE, we don't need the CR3 write to + * flush anything, as that transition is a full flush itself. */ - old_pcid = cr3_pcid(read_cr3()); - if ( old_pcid == cr3_pcid(cr3) ) - cr3 &= ~X86_CR3_NOFLUSH; - + if ( (old_cr4 & X86_CR4_PCIDE) > (cr4 & X86_CR4_PCIDE) ) + cr3 |= X86_CR3_NOFLUSH; write_cr3(cr3); if ( old_cr4 != cr4 ) write_cr4(cr4); /* - * Make sure no TLB entries related to the old PCID created between - * flushing the TLB and writing the new %cr3 value remain in the TLB. - * - * The write to CR4 just above has performed a wider flush in certain - * cases, which therefore get excluded here. Since that write is - * conditional, note in particular that it won't be skipped if PCIDE - * transitions from 1 to 0. This is because the CR4 write further up will - * have been skipped in this case, as PCIDE and PGE won't both be set at - * the same time. - * - * Note also that PGE is always clear in old_cr4. + * PGE | PCIDE | flush at + * ------+-------+------------------------ + * 0->0 | 0->0 | CR3 write + * 0->0 | 0->1 | n/a (see 1st CR4 write) + * 0->x | 1->0 | CR4 write + * x->1 | x->1 | n/a + * 0->0 | 1->1 | INVPCID + * 0->1 | 0->0 | CR3 and CR4 writes + * 1->0 | 0->0 | CR4 write + * 1->0 | 0->1 | n/a (see 1st CR4 write) + * 1->1 | 0->0 | n/a (see 1st CR4 write) + * 1->x | 1->x | n/a */ - if ( old_pcid != cr3_pcid(cr3) && - !(cr4 & X86_CR4_PGE) && - (old_cr4 & X86_CR4_PCIDE) <= (cr4 & X86_CR4_PCIDE) ) - invpcid_flush_single_context(old_pcid); + if ( cr4 & X86_CR4_PCIDE ) + invpcid_flush_all_nonglobals(); post_flush(t);