From patchwork Mon Jul 22 04:26:20 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wanpeng Li
X-Patchwork-Id: 11051537
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini, =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?=, Thomas Lambertz, anthony, stable@vger.kernel.org
Subject: [PATCH 1/2] KVM: X86: Fix fpu state crash in kvm guest
Date: Mon, 22 Jul 2019 12:26:20 +0800
Message-Id: <1563769581-20293-1-git-send-email-wanpengli@tencent.com>
X-Mailer: git-send-email 2.7.4
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Wanpeng Li

The idea before commit 240c35a37 was that we have the
following FPU states:

               userspace (QEMU)             guest
---------------------------------------------------------------------------
               processor                    vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
               vcpu->arch.user_fpu          processor
>>> preempt out
               vcpu->arch.user_fpu          current->thread.fpu
>>> preempt in
               vcpu->arch.user_fpu          processor
>>> back to userspace
>>> kvm_put_guest_fpu
               processor                    vcpu->arch.guest_fpu
---------------------------------------------------------------------------

With the new lazy model we want to get the state back to the processor
from current->thread.fpu when the task is scheduled in.

Reported-by: Thomas Lambertz
Reported-by: anthony
Tested-by: anthony
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Thomas Lambertz
Cc: anthony
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/x86.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf2afdf..bdcd250 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_x86_ops->vcpu_load(vcpu, cpu);
 
+	fpregs_assert_state_consistent();
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
 	/* Apply any externally detected TSC adjustments (due to suspend) */
 	if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
 		adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,9 +7994,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	trace_kvm_entry(vcpu->vcpu_id);
 	guest_enter_irqoff();
 
-	fpregs_assert_state_consistent();
-	if (test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_return();
+	WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
 
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);

From patchwork Mon Jul 22 04:26:21 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wanpeng Li
X-Patchwork-Id: 11051539
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini, =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?=
Subject: [PATCH 2/2] KVM: X86: Dynamically allocate user_fpu
Date: Mon, 22 Jul 2019 12:26:21 +0800
Message-Id: <1563769581-20293-2-git-send-email-wanpengli@tencent.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1563769581-20293-1-git-send-email-wanpengli@tencent.com>
References: <1563769581-20293-1-git-send-email-wanpengli@tencent.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Wanpeng Li

After reverting commit 240c35a3783a (kvm: x86: Use task structs fpu field for
user), struct kvm_vcpu is 19456 bytes on my server. PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serverless
scenarios, one host can service hundreds or thousands of
firecracker/kata-container instances; however, a new instance will fail to
launch once memory is too fragmented to allocate the kvm_vcpu struct on the
host. This was observed in some cloud provider production environments.

This patch dynamically allocates user_fpu; kvm_vcpu is now 15168 bytes on my
Skylake server.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c              | 13 ++++++++++++-
 arch/x86/kvm/vmx/vmx.c          | 13 ++++++++++++-
 arch/x86/kvm/x86.c              |  4 ++--
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f938ac..7b0a4ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -616,7 +616,7 @@ struct kvm_vcpu_arch {
 	 * "guest_fpu" state here contains the guest FPU context, with the
 	 * host PRKU bits.
 	 */
-	struct fpu user_fpu;
+	struct fpu *user_fpu;
 	struct fpu *guest_fpu;
 
 	u64 xcr0;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 19f69df..7eafc69 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2143,12 +2143,20 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 		goto out;
 	}
 
+	svm->vcpu.arch.user_fpu = kmem_cache_zalloc(x86_fpu_cache,
+						     GFP_KERNEL_ACCOUNT);
+	if (!svm->vcpu.arch.user_fpu) {
+		printk(KERN_ERR "kvm: failed to allocate kvm userspace's fpu\n");
+		err = -ENOMEM;
+		goto free_partial_svm;
+	}
+
 	svm->vcpu.arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
 						     GFP_KERNEL_ACCOUNT);
 	if (!svm->vcpu.arch.guest_fpu) {
 		printk(KERN_ERR "kvm: failed to allocate vcpu's fpu\n");
 		err = -ENOMEM;
-		goto free_partial_svm;
+		goto free_user_fpu;
 	}
 
 	err = kvm_vcpu_init(&svm->vcpu, kvm, id);
@@ -2211,6 +2219,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	kvm_vcpu_uninit(&svm->vcpu);
 free_svm:
 	kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
+free_user_fpu:
+	kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.user_fpu);
 free_partial_svm:
 	kmem_cache_free(kvm_vcpu_cache, svm);
 out:
@@ -2241,6 +2251,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_page(virt_to_page(svm->nested.hsave));
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
+	kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.user_fpu);
 	kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a279447..074385c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6598,6 +6598,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 	free_loaded_vmcs(vmx->loaded_vmcs);
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
+	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.user_fpu);
 	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 }
@@ -6613,12 +6614,20 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	if (!vmx)
 		return ERR_PTR(-ENOMEM);
 
+	vmx->vcpu.arch.user_fpu = kmem_cache_zalloc(x86_fpu_cache,
+			GFP_KERNEL_ACCOUNT);
+	if (!vmx->vcpu.arch.user_fpu) {
+		printk(KERN_ERR "kvm: failed to allocate kvm userspace's fpu\n");
+		err = -ENOMEM;
+		goto free_partial_vcpu;
+	}
+
 	vmx->vcpu.arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
 			GFP_KERNEL_ACCOUNT);
 	if (!vmx->vcpu.arch.guest_fpu) {
 		printk(KERN_ERR "kvm: failed to allocate vcpu's fpu\n");
 		err = -ENOMEM;
-		goto free_partial_vcpu;
+		goto free_user_fpu;
 	}
 
 	vmx->vpid = allocate_vpid();
@@ -6721,6 +6730,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 free_vcpu:
 	free_vpid(vmx->vpid);
 	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
+free_user_fpu:
+	kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.user_fpu);
 free_partial_vcpu:
 	kmem_cache_free(kvm_vcpu_cache, vmx);
 	return ERR_PTR(err);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bdcd250..09dbc93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8272,7 +8272,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	fpregs_lock();
 
-	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
+	copy_fpregs_to_fpstate(vcpu->arch.user_fpu);
 	/* PKRU is separately restored in kvm_x86_ops->run. */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
 				~XFEATURE_MASK_PKRU);
@@ -8289,7 +8289,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	fpregs_lock();
 
 	copy_fpregs_to_fpstate(vcpu->arch.guest_fpu);
-	copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
+	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
 
 	fpregs_mark_activate();
 	fpregs_unlock();
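
Note on the sizes quoted in the 2/2 commit message: they can be turned into
page allocation orders with a small calculation. The sketch below is
illustrative only and is not part of the patch. It assumes 4 KiB pages, and the
names sketch_order_for_size, sketch_report, SKETCH_PAGE_SIZE and
SKETCH_COSTLY_ORDER are hypothetical. It computes the smallest power-of-two run
of pages that can hold one object of a given size, which is not necessarily the
order the slab allocator picks for kvm_vcpu_cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, the usual x86-64 configuration. */
#define SKETCH_PAGE_SIZE 4096u
/* Value quoted in the commit message as PAGE_ALLOC_COSTLY_ORDER(3). */
#define SKETCH_COSTLY_ORDER 3u

/* Return the smallest order such that (page_size << order) >= size. */
unsigned int sketch_order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < size)
		order++;
	return order;
}

/* Print the order for one object size next to the quoted threshold. */
void sketch_report(const char *label, size_t size)
{
	unsigned int order = sketch_order_for_size(size, SKETCH_PAGE_SIZE);

	printf("%s: %zu bytes -> order %u (quoted costly order: %u)\n",
	       label, size, order, SKETCH_COSTLY_ORDER);
}
```

Under those assumptions, sketch_order_for_size(19456, SKETCH_PAGE_SIZE)
evaluates to 3 and sketch_order_for_size(15168, SKETCH_PAGE_SIZE) evaluates to
2, so a single kvm_vcpu needs an eight-page contiguous run before the change
and a four-page run after it.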