From patchwork Tue Oct 20 06:18:43 2020
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 11845753
From: "Kirill A. Shutemov"
To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Paolo Bonzini,
    Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel
Cc: David Rientjes, Andrea Arcangeli, Kees Cook, Will Drewry,
    "Edgecombe, Rick P", "Kleen, Andi", Liran Alon, Mike Rapoport,
    x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [RFCv2 00/16] KVM protected memory extension
Date: Tue, 20 Oct 2020 09:18:43 +0300
Message-Id: <20201020061859.18385-1-kirill.shutemov@linux.intel.com>

== Background / Problem ==

There are a number of hardware features (MKTME, SEV) which protect guest
memory from some unauthorized host access. The patchset proposes a purely
software feature that mitigates some of the same host-side read-only
attacks.

== What does this set mitigate? ==

 - Host kernel "accidental" access to guest data (think speculation)

 - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))

 - Host userspace access to guest data (compromised QEMU)

 - Guest privilege escalation via compromised QEMU device emulation

== What does this set NOT mitigate? ==

 - Full host kernel compromise. The kernel will just map the pages again.

 - Hardware attacks

The second RFC revision addresses /most/ of the feedback. I still haven't
found a good solution for reboot and kexec. Unprotecting all the memory on
such operations defeats the goal of the feature.
Clearing up most of the memory before unprotecting what is required for
reboot (or kexec) is tedious and error-prone. Maybe we should just declare
them unsupported?

== Series Overview ==

The hardware features protect guest data by encrypting it and then
ensuring that only the right guest can decrypt it. This has the
side-effect of making the kernel direct map and the userspace mapping
(QEMU et al) useless. But it teaches us something very useful: neither the
kernel nor the userspace mappings are really necessary for normal guest
operation.

Instead of using encryption, this series simply unmaps the memory. One
advantage compared to allowing access to ciphertext is that bad accesses
get caught instead of simply reading garbage.

Protection from physical attacks needs to be provided by some other means.
On Intel platforms, (single-key) Total Memory Encryption (TME) provides
mitigation against physical attacks, such as DIMM interposers sniffing
memory bus traffic.

The patchset modifies both the host and the guest kernel. The guest OS
must enable the feature via a hypercall and mark any memory range that has
to be shared with the host: DMA regions, bounce buffers, etc. SEV does
this marking via a bit in the guest's page table, while this approach uses
a hypercall.

For removing the userspace mapping, we use a trick similar to what NUMA
balancing does: convert memory that belongs to KVM memory slots to
PROT_NONE. All existing entries are converted to PROT_NONE with
mprotect(), and newly faulted-in pages get PROT_NONE from the updated
vm_page_prot.

The new VMA flag -- VM_KVM_PROTECTED -- indicates that the pages in the
VMA must be treated in a special way in the GUP and fault paths. The flag
allows GUP to return the page even though it is mapped with PROT_NONE, but
only if the new GUP flag -- FOLL_KVM -- is specified. Any userspace access
to the memory would result in SIGBUS. Any GUP access without FOLL_KVM
would result in -EFAULT.

Removing the userspace mapping of the guest memory from the QEMU process
can help to address some guest privilege escalation attacks. Consider the
case when an unprivileged guest user exploits a bug in QEMU device
emulation to gain access to data it cannot normally access within the
guest.

Any anonymous page faulted into a VM_KVM_PROTECTED VMA gets removed from
the direct mapping with kernel_map_pages(). Note that kernel_map_pages()
only flushes the local TLB. I think it's a reasonable compromise between
security and performance. Zapping the PTE would bring the page back to the
direct mapping after clearing.

At least for now, we don't remove file-backed pages from the direct
mapping. File-backed pages could be accessed via read/write syscalls, so
handling them would add complexity.

Occasionally, the host kernel has to access guest memory that was not made
shared by the guest. For instance, this happens for instruction emulation.
Normally it is done via copy_to/from_user(), which would now fail with
-EFAULT. We introduce a new pair of helpers: copy_to/from_guest() (see the
sketch below). The new helpers acquire the page via GUP, map it into the
kernel address space with a kmap_atomic()-style mechanism and only then
copy the data.

For some instruction emulation, copying is not good enough: cmpxchg
emulation has to have direct access to the guest memory. __kvm_map_gfn()
is modified to accommodate this case.
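To make the copy_to/from_guest() flow a bit more concrete, here is a
minimal, illustrative sketch (not the patchset's actual code) of what the
copy-from side could look like. The helper name, chunking and error
handling are simplified assumptions; FOLL_KVM is the new GUP flag this
series introduces.

/*
 * Illustrative sketch only. Pin the guest page via GUP with FOLL_KVM
 * (so PROT_NONE, VM_KVM_PROTECTED mappings are allowed), map it
 * temporarily with kmap_atomic(), copy the data, then unmap and
 * release the page.
 */
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>

static int example_copy_from_guest(void *to, unsigned long from, size_t len)
{
	while (len) {
		unsigned long offset = from & ~PAGE_MASK;
		size_t chunk = min_t(size_t, len, PAGE_SIZE - offset);
		struct page *page;
		void *vaddr;

		if (get_user_pages_unlocked(from, 1, &page, FOLL_KVM) != 1)
			return -EFAULT;

		vaddr = kmap_atomic(page);
		memcpy(to, vaddr + offset, chunk);
		kunmap_atomic(vaddr);
		put_page(page);

		to += chunk;
		from += chunk;
		len -= chunk;
	}
	return 0;
}

The userspace side of the same memory stays PROT_NONE, so a plain
copy_from_user() on it still fails with -EFAULT, as intended.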
The patchset is on top of v5.9.

Kirill A. Shutemov (16):
  x86/mm: Move force_dma_unencrypted() to common code
  x86/kvm: Introduce KVM memory protection feature
  x86/kvm: Make DMA pages shared
  x86/kvm: Use bounce buffers for KVM memory protection
  x86/kvm: Make VirtIO use DMA API in KVM guest
  x86/kvmclock: Share hvclock memory with the host
  x86/realmode: Share trampoline area if KVM memory protection enabled
  KVM: Use GUP instead of copy_from/to_user() to access guest memory
  KVM: mm: Introduce VM_KVM_PROTECTED
  KVM: x86: Use GUP for page walk instead of __get_user()
  KVM: Protected memory extension
  KVM: x86: Enabled protected memory extension
  KVM: Rework copy_to/from_guest() to avoid direct mapping
  KVM: Handle protected memory in __kvm_map_gfn()/__kvm_unmap_gfn()
  KVM: Unmap protected pages from direct mapping
  mm: Do not use zero page for VM_KVM_PROTECTED VMAs

 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   2 +-
 arch/s390/include/asm/pgtable.h        |   2 +-
 arch/x86/Kconfig                       |  11 +-
 arch/x86/include/asm/cpufeatures.h     |   1 +
 arch/x86/include/asm/io.h              |   6 +-
 arch/x86/include/asm/kvm_para.h        |   5 +
 arch/x86/include/asm/pgtable_types.h   |   1 +
 arch/x86/include/uapi/asm/kvm_para.h   |   3 +-
 arch/x86/kernel/kvm.c                  |  20 +++
 arch/x86/kernel/kvmclock.c             |   2 +-
 arch/x86/kernel/pci-swiotlb.c          |   3 +-
 arch/x86/kvm/Kconfig                   |   1 +
 arch/x86/kvm/cpuid.c                   |   3 +-
 arch/x86/kvm/mmu/mmu.c                 |   6 +-
 arch/x86/kvm/mmu/paging_tmpl.h         |  10 +-
 arch/x86/kvm/x86.c                     |   9 +
 arch/x86/mm/Makefile                   |   2 +
 arch/x86/mm/ioremap.c                  |  16 +-
 arch/x86/mm/mem_encrypt.c              |  51 ------
 arch/x86/mm/mem_encrypt_common.c       |  62 +++++++
 arch/x86/mm/pat/set_memory.c           |   7 +
 arch/x86/realmode/init.c               |   7 +-
 drivers/virtio/virtio_ring.c           |   4 +
 include/linux/kvm_host.h               |  11 +-
 include/linux/kvm_types.h              |   1 +
 include/linux/mm.h                     |  21 ++-
 include/uapi/linux/kvm_para.h          |   5 +-
 mm/gup.c                               |  20 ++-
 mm/huge_memory.c                       |  31 +++-
 mm/ksm.c                               |   2 +
 mm/memory.c                            |  18 +-
 mm/mmap.c                              |   3 +
 mm/rmap.c                              |   4 +
 virt/kvm/Kconfig                       |   3 +
 virt/kvm/async_pf.c                    |   2 +-
 virt/kvm/kvm_main.c                    | 238 +++++++++++++++++++++----
 virt/lib/Makefile                      |   1 +
 virt/lib/mem_protected.c               | 193 ++++++++++++++++++++
 39 files changed, 666 insertions(+), 123 deletions(-)
 create mode 100644 arch/x86/mm/mem_encrypt_common.c
 create mode 100644 virt/lib/mem_protected.c