From patchwork Mon May 16 09:28:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 9099311
Subject: Re: x86: strange behavior of invlpg
To: Nadav Amit, kvm@vger.kernel.org
References: <2C79F043-EF77-4C44-BE36-1CEDE16E788F@gmail.com>
From: Paolo Bonzini
Message-ID: <573992B7.20300@redhat.com>
Date: Mon, 16 May 2016 11:28:23 +0200
In-Reply-To: <2C79F043-EF77-4C44-BE36-1CEDE16E788F@gmail.com>
X-Mailing-List: kvm@vger.kernel.org

On 14/05/2016 11:35, Nadav Amit wrote:
> I encountered a strange phenomenon and I would appreciate your sanity check
> and opinion. It looks as if 'invlpg' that runs in a VM causes a very broad
> flush.
>
> I created a small kvm-unit-test (below) to show what I talk about. The test
> touches 50 pages, and then either: (1) runs full flush, (2) runs invlpg to
> an arbitrary (other) address, or (3) runs memory barrier.
>
> It appears that the execution time of the test is indeed determined by TLB
> misses, since the runtime of the memory barrier flavor is considerably lower.

Did you check the performance counters?  Another explanation is that
there are no TLB misses, but CR3 writes are optimized in such a way
that they do not incur TLB misses either.  (Disclaimer: I didn't check
the performance counters to prove the alternative theory ;)).

> What I find strange is that if I compute the net access time for tests 1 & 2,
> by deducting the time of the flushes, the time is almost identical. I am aware
> that invlpg flushes the page-walk caches, but I would still expect the invlpg
> flavor to run considerably faster than the full-flush flavor.

That's interesting.  I guess you're using EPT, because I get very
similar numbers on an Ivy Bridge laptop:

  with invlpg:        902,224,568
  with full flush:    880,103,513
  invlpg only         113,186,461
  full flushes only   100,236,620
  access net          104,454,125
  w/full flush net    779,866,893
  w/invlpg net        789,038,107

(commas added for readability).

Out of curiosity I tried making all pages global (patch after my
signature).  Both invlpg and the write to CR3 become much faster, but
invlpg is now faster than full flush, even though in theory it should
be the opposite...

  with invlpg:        223,079,661
  with full flush:    294,280,788
  invlpg only         126,236,334
  full flushes only   107,614,525
  access net           90,830,503
  w/full flush net    186,666,263
  w/invlpg net         96,843,327

Thanks for the interesting test!

Paolo

> Am I missing something?
>
>
> On my Haswell EP I get the following results:
>
> with invlpg:        948965249
> with full flush:    1047927009
> invlpg only         127682028
> full flushes only   224055273
> access net          107691277    --> considerably lower than w/flushes
> w/full flush net    823871736
> w/invlpg net        821283221    --> almost identical to full-flush net
>
> ---
>
>
> #include "libcflat.h"
> #include "fwcfg.h"
> #include "vm.h"
> #include "smp.h"
>
> #define N_PAGES	(50)
> #define ITERATIONS (500000)
> volatile char buf[N_PAGES * PAGE_SIZE] __attribute__ ((aligned (PAGE_SIZE)));
>
> int main(void)
> {
> 	void *another_addr = (void*)0x50f9000;
> 	int i, j;
> 	unsigned long t_start, t_single, t_full, t_single_only, t_full_only,
> 		      t_access;
> 	unsigned long cr3;
> 	char v = 0;
>
> 	setup_vm();
>
> 	cr3 = read_cr3();
>
> 	t_start = rdtsc();
> 	for (i = 0; i < ITERATIONS; i++) {
> 		invlpg(another_addr);
> 		for (j = 0; j < N_PAGES; j++)
> 			v = buf[PAGE_SIZE * j];
> 	}
> 	t_single = rdtsc() - t_start;
> 	printf("with invlpg:        %lu\n", t_single);
>
> 	t_start = rdtsc();
> 	for (i = 0; i < ITERATIONS; i++) {
> 		write_cr3(cr3);
> 		for (j = 0; j < N_PAGES; j++)
> 			v = buf[PAGE_SIZE * j];
> 	}
> 	t_full = rdtsc() - t_start;
> 	printf("with full flush:    %lu\n", t_full);
>
> 	t_start = rdtsc();
> 	for (i = 0; i < ITERATIONS; i++)
> 		invlpg(another_addr);
> 	t_single_only = rdtsc() - t_start;
> 	printf("invlpg only         %lu\n", t_single_only);
>
> 	t_start = rdtsc();
> 	for (i = 0; i < ITERATIONS; i++)
> 		write_cr3(cr3);
> 	t_full_only = rdtsc() - t_start;
> 	printf("full flushes only   %lu\n", t_full_only);
>
> 	t_start = rdtsc();
> 	for (i = 0; i < ITERATIONS; i++) {
> 		for (j = 0; j < N_PAGES; j++)
> 			v = buf[PAGE_SIZE * j];
> 		mb();
> 	}
> 	t_access = rdtsc()-t_start;
> 	printf("access net          %lu\n", t_access);
> 	printf("w/full flush net    %lu\n", t_full - t_full_only);
> 	printf("w/invlpg net        %lu\n", t_single - t_single_only);
>
> 	(void)v;
> 	return 0;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

---
diff --git a/lib/x86/vm.c b/lib/x86/vm.c
index 7ce7bbc..3b9b81a 100644
--- a/lib/x86/vm.c
+++ b/lib/x86/vm.c
@@ -2,6 +2,7 @@
 #include "vm.h"
 #include "libcflat.h"
 
+#define PTE_GLOBAL 256
 #define PAGE_SIZE 4096ul
 #ifdef __x86_64__
 #define LARGE_PAGE_SIZE (512 * PAGE_SIZE)
@@ -106,14 +107,14 @@ unsigned long *install_large_page(unsigned long *cr3,
                                   void *virt)
 {
     return install_pte(cr3, 2, virt,
-		       phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_PSE, 0);
+		       phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_PSE | PTE_GLOBAL, 0);
 }
 
 unsigned long *install_page(unsigned long *cr3,
                             unsigned long phys,
                             void *virt)
 {
-    return install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
+    return install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_GLOBAL, 0);
 }
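
For anyone re-checking the numbers: the "net" rows in the result tables
are just the combined run minus the matching flush-only run, and
dividing a net figure by ITERATIONS * N_PAGES (500000 * 50 accesses in
the test) gives a rough per-access cost in TSC cycles.  A minimal sketch
of that arithmetic is below.  The helper names net_cycles() and
cycles_per_access() are hypothetical; they are not part of
kvm-unit-tests and are only meant to illustrate the calculation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: cycles attributable to the memory accesses alone,
 * i.e. the combined (flush + access) run minus the flush-only run.
 */
uint64_t net_cycles(uint64_t with_flush, uint64_t flush_only)
{
	/* The combined loop does strictly more work than the flush-only one. */
	assert(with_flush >= flush_only);
	return with_flush - flush_only;
}

/*
 * Hypothetical helper: average TSC cycles per page access, given a net
 * cycle count and the loop bounds used by the test.
 */
double cycles_per_access(uint64_t net, uint64_t iterations, uint64_t n_pages)
{
	return (double)net / (double)(iterations * n_pages);
}
```

With the Ivy Bridge figures, net_cycles(880103513, 100236620) gives
779866893, matching the "w/full flush net" row.  Feeding that into
cycles_per_access(779866893, 500000, 50) comes to roughly 31 cycles per
access, against roughly 4 for the "access net" run without any flush.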