From patchwork Tue Jan 11 11:33:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com" X-Patchwork-Id: 12709735 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA27EC433F5 for ; Tue, 11 Jan 2022 11:33:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DB51F6B0075; Tue, 11 Jan 2022 06:33:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D14396B007B; Tue, 11 Jan 2022 06:33:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF1816B007D; Tue, 11 Jan 2022 06:33:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0235.hostedemail.com [216.40.44.235]) by kanga.kvack.org (Postfix) with ESMTP id 8FB616B0075 for ; Tue, 11 Jan 2022 06:33:18 -0500 (EST) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 54794824C421 for ; Tue, 11 Jan 2022 11:33:18 +0000 (UTC) X-FDA: 79017795276.31.23A4D6B Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf13.hostedemail.com (Postfix) with ESMTP id 54A4720002 for ; Tue, 11 Jan 2022 11:33:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900797; x=1673436797; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eAlNpFgI2dj8DL18robIsH5Fta03JBWQ7Hr8f4oKmC0=; b=WIeLSgtcfy04zkVgeRZF2FfRvWG7qh1pRsw85stlNWai65UvryQAJo0u xeJOZ+7dNApEZU/QVregQbTfGb6cjOBwH58ut33tuQ8L/HtkKIqGePNM0 1oEozy7mvzorPLPQrBMIhImvH65E0GDKJgQw0LlTVAVWQ8vSd6mG1s5Lm KVCHcaC4Wh7TFCCK2/0x7p1Er4zdkDTk99WPo0d4CFY4hnP5cBHHI+Jz+ 
ECWqbmhudoxBOk45u3peIhptHGo9+3nG5qMwDgwuBBjROfYCTuQvHa7G5 W+KiTB4NXTgSNIUddYEzZtOkjq9dk0ZrToNrJYiMkfaFZafdpXB1Iz6WD w==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="242277578" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="242277578" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:14 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="515063272" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga007.jf.intel.com with ESMTP; 11 Jan 2022 03:33:08 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 20859125; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 1/7] mm: Add support for unaccepted memory Date: Tue, 11 Jan 2022 14:33:08 +0300 Message-Id: <20220111113314.27173-2-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 54A4720002 X-Stat-Signature: 8gym5ectpmuptxyeiofyn6gy4z9dkmh9 Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=WIeLSgtc; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf13.hostedemail.com: domain of kirill.shutemov@linux.intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=kirill.shutemov@linux.intel.com X-Rspamd-Server: rspam06 X-HE-Tag: 1641900797-335957 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

UEFI Specification version 2.9 introduces the concept of memory acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform.

Accepting memory is costly and it makes the VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead.

Support of such memory requires a few changes in core-mm code:

  - memblock has to accept memory on allocation;

  - page allocator has to accept memory on the first allocation of the page;

The memblock change is trivial.

The page allocator is modified to accept pages on the first allocation. PageOffline() is used to indicate that the page requires acceptance. The flag is currently used by hotplug and ballooning. Such pages are not available to the page allocator.
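The accept-on-first-allocation idea can be modelled outside the kernel. The following is a minimal sketch, not the patch's code: `struct toy_page`, `toy_accept()`, `toy_maybe_set_page_offline()` and `toy_alloc()` are hypothetical names invented for illustration, and the "platform" acceptance call is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the state this sketch needs. */
struct toy_page {
	bool offline;   /* models PageOffline(): page still needs acceptance */
	bool accepted;  /* models the platform's view of the page */
};

static unsigned int accept_calls; /* counts simulated platform calls */

/* Hypothetical stand-in for the platform-specific acceptance protocol. */
static void toy_accept(struct toy_page *page)
{
	page->accepted = true;
	accept_calls++;
}

/* Boot time: mark a not-yet-accepted page instead of accepting it. */
static void toy_maybe_set_page_offline(struct toy_page *page)
{
	if (!page->accepted)
		page->offline = true;
}

/* Allocation time: accept on first use, then clear the marker. */
static struct toy_page *toy_alloc(struct toy_page *page)
{
	if (page->offline) {
		toy_accept(page);
		page->offline = false;
	}
	assert(page->accepted);
	return page;
}
```

Allocating the same toy page twice triggers the costly acceptance step only once, which is the property the deferred scheme relies on.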
Architecture has to provide three helpers if it wants to support unaccepted memory: - accept_memory() makes a range of physical addresses accepted. - maybe_set_page_offline() marks a page PageOffline() if it requires acceptance. Used during boot to put pages on free lists. - accept_and_clear_page_offline() makes a page accepted and clears PageOffline(). Signed-off-by: Kirill A. Shutemov --- include/linux/page-flags.h | 4 ++++ mm/internal.h | 15 +++++++++++++++ mm/memblock.c | 1 + mm/page_alloc.c | 21 ++++++++++++++++++++- 4 files changed, 40 insertions(+), 1 deletion(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 52ec4b5e5615..281f70da329c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -887,6 +887,10 @@ PAGE_TYPE_OPS(Buddy, buddy) * any further access to page content. PFN walkers that read content of random * pages should check PageOffline() and synchronize with such drivers using * page_offline_freeze()/page_offline_thaw(). + * + * If a PageOffline() page encountered on a buddy allocator's free list it has + * to be "accepted" before it can be used. + * See accept_and_clear_page_offline() and CONFIG_UNACCEPTED_MEMORY. 
*/ PAGE_TYPE_OPS(Offline, offline) diff --git a/mm/internal.h b/mm/internal.h index 3b79a5c9427a..1738a4e2a27e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -713,4 +713,19 @@ void vunmap_range_noflush(unsigned long start, unsigned long end); int numa_migrate_prep(struct page *page, struct vm_area_struct *vma, unsigned long addr, int page_nid, int *flags); +#ifndef CONFIG_UNACCEPTED_MEMORY +static inline void maybe_set_page_offline(struct page *page, unsigned int order) +{ +} + +static inline void accept_and_clear_page_offline(struct page *page, + unsigned int order) +{ +} + +static inline void accept_memory(phys_addr_t start, phys_addr_t end) +{ +} +#endif + #endif /* __MM_INTERNAL_H */ diff --git a/mm/memblock.c b/mm/memblock.c index 1018e50566f3..6dfa594192de 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1400,6 +1400,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size, */ kmemleak_alloc_phys(found, size, 0, 0); + accept_memory(found, found + size); return found; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..5707b4b5f774 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1064,6 +1064,7 @@ static inline void __free_one_page(struct page *page, unsigned int max_order; struct page *buddy; bool to_tail; + bool offline = PageOffline(page); max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order); @@ -1097,6 +1098,10 @@ static inline void __free_one_page(struct page *page, clear_page_guard(zone, buddy, order, migratetype); else del_page_from_free_list(buddy, zone, order); + + if (PageOffline(buddy)) + offline = true; + combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -1130,6 +1135,9 @@ static inline void __free_one_page(struct page *page, done_merging: set_buddy_order(page, order); + if (offline) + __SetPageOffline(page); + if (fpi_flags & FPI_TO_TAIL) to_tail = true; else if (is_shuffle_order(order)) @@ -1155,7 +1163,8 @@ static inline void __free_one_page(struct page 
*page, static inline bool page_expected_state(struct page *page, unsigned long check_flags) { - if (unlikely(atomic_read(&page->_mapcount) != -1)) + if (unlikely(atomic_read(&page->_mapcount) != -1) && + !PageOffline(page)) return false; if (unlikely((unsigned long)page->mapping | @@ -1734,6 +1743,8 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn, { if (early_page_uninitialised(pfn)) return; + + maybe_set_page_offline(page, order); __free_pages_core(page, order); } @@ -1823,10 +1834,12 @@ static void __init deferred_free_range(unsigned long pfn, if (nr_pages == pageblock_nr_pages && (pfn & (pageblock_nr_pages - 1)) == 0) { set_pageblock_migratetype(page, MIGRATE_MOVABLE); + maybe_set_page_offline(page, pageblock_order); __free_pages_core(page, pageblock_order); return; } + accept_memory(pfn << PAGE_SHIFT, (pfn + nr_pages) << PAGE_SHIFT); for (i = 0; i < nr_pages; i++, page++, pfn++) { if ((pfn & (pageblock_nr_pages - 1)) == 0) set_pageblock_migratetype(page, MIGRATE_MOVABLE); @@ -2297,6 +2310,9 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high, migratetype)) continue; + if (PageOffline(page)) + __SetPageOffline(&page[size]); + add_to_free_list(&page[size], zone, high, migratetype); set_buddy_order(&page[size], high); } @@ -2393,6 +2409,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order, */ kernel_unpoison_pages(page, 1 << order); + if (PageOffline(page)) + accept_and_clear_page_offline(page, order); + /* * As memory initialization might be integrated into KASAN, * kasan_alloc_pages and kernel_init_free_pages must be From patchwork Tue Jan 11 11:33:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com" X-Patchwork-Id: 12709731 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A35CC433EF for ; Tue, 11 Jan 2022 11:33:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 54DEC6B0072; Tue, 11 Jan 2022 06:33:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4D6506B0073; Tue, 11 Jan 2022 06:33:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 39E2D6B0074; Tue, 11 Jan 2022 06:33:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0138.hostedemail.com [216.40.44.138]) by kanga.kvack.org (Postfix) with ESMTP id 267D36B0072 for ; Tue, 11 Jan 2022 06:33:16 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id D3E9F94FB2 for ; Tue, 11 Jan 2022 11:33:15 +0000 (UTC) X-FDA: 79017795150.24.648C2DB Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf14.hostedemail.com (Postfix) with ESMTP id 1BE99100007 for ; Tue, 11 Jan 2022 11:33:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900795; x=1673436795; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HaLasq89QiEz30QdU5oVDEDYHaeFFEnWUHw6RN/QsG4=; b=VtnqvK0Al5GSDqo95u+QpqKW/GcImt0EVXQO59Yf+FcI1KIVtBlxQM5N CV4T9INpLbWbcFR6nclwj0dAKBFBMF8zzPlPMuETxdnJsqMzMAhlzbNKh cH0gB6Ko0JGuoI7odzsgX8lnz3ReKADTTr/8baaE/61ZGfq3CVGHcR0Tp rTg7e7bASQWp+YrSi5kz9dmIz68hKFPoAFDmRQTSEoufncoe8CF4CjMWN zS1gaAD4QoAVgaAdrPmRw2QO++seXwMveMzcZvt9ErVxXKjHhUEPCunvQ BtHNrDMXUfWHu5WBuX6a5U4BjkqBqTzHNy0rO9LjreMPseQUOGgV+/8Op Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="243261606" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="243261606" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="490351596" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 11 Jan 2022 03:33:07 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 35429232; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 2/7] efi/x86: Get full memory map in allocate_e820() Date: Tue, 11 Jan 2022 14:33:09 +0300 Message-Id: <20220111113314.27173-3-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 1BE99100007 X-Stat-Signature: txudfcmj5ga65gqgnxdkmc8d8zsiwn9j Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=VtnqvK0A; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf14.hostedemail.com: domain of kirill.shutemov@linux.intel.com has no SPF policy when checking 134.134.136.24) smtp.mailfrom=kirill.shutemov@linux.intel.com X-Rspamd-Server: rspam02 X-HE-Tag: 1641900794-223114 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Currently, allocate_e820() is only interested in the size of the memory map and the size of a memory descriptor, which it uses to determine how many e820 entries the kernel needs.
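The entry-count calculation that allocate_e820() performs can be sketched in isolation. This is an illustrative model only: `extra_e820_entries()` is a hypothetical helper, and the constants 128 and 8 are assumed stand-ins for ARRAY_SIZE(params->e820_table) and EFI_MMAP_NR_SLACK_SLOTS; check the kernel headers in use for the real values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; check the kernel headers in use. */
#define TABLE_ENTRIES 128u  /* stands in for ARRAY_SIZE(params->e820_table) */
#define SLACK_SLOTS     8u  /* stands in for EFI_MMAP_NR_SLACK_SLOTS */

/*
 * Number of extra e820 entries that must be allocated outside
 * boot_params, given the size of the EFI memory map and the size of
 * one memory descriptor.
 */
static uint32_t extra_e820_entries(uint64_t map_size, uint64_t desc_size)
{
	uint32_t nr_desc = (uint32_t)(map_size / desc_size);

	/* The in-place table, minus the slack, is large enough. */
	if (nr_desc <= TABLE_ENTRIES - SLACK_SLOTS)
		return 0;

	/* Written as add-then-subtract to avoid unsigned underflow. */
	return nr_desc + SLACK_SLOTS - TABLE_ENTRIES;
}
```

Under these assumed constants, a map of up to 120 descriptors fits in the in-place table, and each descriptor beyond that needs one extra entry.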
UEFI Specification version 2.9 introduces a new memory type -- unaccepted memory. To track unaccepted memory, the kernel needs to allocate a bitmap. The size of the bitmap depends on the maximum physical address present in the system. A full memory map is required to find the maximum address.

Modify allocate_e820() to get a full memory map.

This is preparation for the next patch, which implements handling of unaccepted memory in the EFI stub.

Signed-off-by: Kirill A. Shutemov
--- drivers/firmware/efi/libstub/x86-stub.c | 28 ++++++++++++------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index f14c4ff5839f..a0b946182b5e 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -569,30 +569,28 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext, } static efi_status_t allocate_e820(struct boot_params *params, + struct efi_boot_memmap *map, struct setup_data **e820ext, u32 *e820ext_size) { - unsigned long map_size, desc_size, map_key; efi_status_t status; - __u32 nr_desc, desc_version; + __u32 nr_desc; - /* Only need the size of the mem map and size of each mem descriptor */ - map_size = 0; - status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key, - &desc_size, &desc_version); - if (status != EFI_BUFFER_TOO_SMALL) - return (status != EFI_SUCCESS) ? 
status : EFI_UNSUPPORTED; - - nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS; + status = efi_get_memory_map(map); + if (status != EFI_SUCCESS) + return status; - if (nr_desc > ARRAY_SIZE(params->e820_table)) { - u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table); + nr_desc = *map->map_size / *map->desc_size; + if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) { + u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) + + EFI_MMAP_NR_SLACK_SLOTS; status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size); if (status != EFI_SUCCESS) - return status; + goto out; } - +out: + efi_bs_call(free_pool, *map->map); return EFI_SUCCESS; } @@ -642,7 +640,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle) priv.boot_params = boot_params; priv.efi = &boot_params->efi_info; - status = allocate_e820(boot_params, &e820ext, &e820ext_size); + status = allocate_e820(boot_params, &map, &e820ext, &e820ext_size); if (status != EFI_SUCCESS) return status; From patchwork Tue Jan 11 11:33:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com" X-Patchwork-Id: 12709732 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22BAAC433F5 for ; Tue, 11 Jan 2022 11:33:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E75C86B0073; Tue, 11 Jan 2022 06:33:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E24EA6B0074; Tue, 11 Jan 2022 06:33:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CC6296B0075; Tue, 11 Jan 2022 06:33:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0141.hostedemail.com [216.40.44.141]) by 
kanga.kvack.org (Postfix) with ESMTP id BBAE66B0073 for ; Tue, 11 Jan 2022 06:33:16 -0500 (EST) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 71117181D6051 for ; Tue, 11 Jan 2022 11:33:16 +0000 (UTC) X-FDA: 79017795192.31.C4863FD Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf12.hostedemail.com (Postfix) with ESMTP id 806BA40002 for ; Tue, 11 Jan 2022 11:33:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1641900795; x=1673436795; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=r8ybxKYOCLta3T20mMeFLRdqIJU8SN1mkWkRljEEK/k=; b=I6PH46BulF71tXZqgnjHoFDAwMuEtLaEIg6x8YkuUEdkWprBAJUSvxKe 0cMjAhKBMuWznQaCcIMcpzUGHf5j1N/UCDBtJlE/zWdwmMv9oh7mrv63e 73LqftOnvyB5niky6PqXI7IFuAIkml573RvdSOpR3a1ULOI9rGLkiNVoa WJjj5I4ZmHvvGrCykUzdwS0iXYUfhuRuvO3v5zPg/Wo6yCSRWdrGOd8LE lHLVnleTxX3LflvyvYYwf6HKZEWmvBaphUNvl2hEVYlj1UESc6gq8AHu8 jW7b8x4yRLWmKz1xZKQpzmjh5FlF6EadvGoecxWArF2x3SFjIAU469Z51 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10223"; a="242277572" X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="242277572" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2022 03:33:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,279,1635231600"; d="scan'208";a="558334285" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2022 03:33:08 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 3AF3E2C7; Tue, 11 Jan 2022 13:33:19 +0200 (EET) From: "Kirill A. 
Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 3/7] efi/x86: Implement support for unaccepted memory Date: Tue, 11 Jan 2022 14:33:10 +0300 Message-Id: <20220111113314.27173-4-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 806BA40002 X-Stat-Signature: p4tegyp73kr6bh8rua63uix9c9zwmmjm Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=I6PH46Bu; spf=none (imf12.hostedemail.com: domain of kirill.shutemov@linux.intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=kirill.shutemov@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1641900795-148049 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

UEFI Specification version 2.9 introduces the concept of memory acceptance: some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform.

Accepting memory is costly and it makes the VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead.
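This patch tracks unaccepted memory in a bitmap with one bit per 2MiB and accepts any unaligned edges of a range upfront. The sizing figures quoted in this commit message, and the way a range splits into an upfront-accepted head and tail plus a run of bitmap bits, can be checked with a small standalone sketch. It is not the patch's implementation: `bitmap_bytes()`, `split_range()` and `struct split` are hypothetical names, and `UNIT_SIZE` stands in for PMD_SIZE on x86-64.

```c
#include <assert.h>
#include <stdint.h>

#define UNIT_SIZE     (UINT64_C(2) << 20)  /* 2MiB tracked per bit */
#define BITS_PER_BYTE 8

/* Bytes of bitmap needed to cover physical addresses [0, max_addr). */
static uint64_t bitmap_bytes(uint64_t max_addr)
{
	uint64_t per_byte = UNIT_SIZE * BITS_PER_BYTE;  /* 16MiB per byte */

	return (max_addr + per_byte - 1) / per_byte;
}

struct split {
	uint64_t head_end;    /* [start, head_end) is accepted upfront */
	uint64_t tail_start;  /* [tail_start, end) is accepted upfront */
	uint64_t first_bit;   /* first bitmap bit to set */
	uint64_t nbits;       /* number of bitmap bits to set */
};

/* Split [start, end) into unaligned edges and whole 2MiB units. */
static struct split split_range(uint64_t start, uint64_t end)
{
	struct split s;
	uint64_t up = (start + UNIT_SIZE - 1) & ~(UNIT_SIZE - 1);
	uint64_t down = end & ~(UNIT_SIZE - 1);

	if (up >= down) {
		/* No whole 2MiB unit inside: accept the entire range now. */
		s.head_end = end;
		s.tail_start = end;
		s.first_bit = 0;
		s.nbits = 0;
		return s;
	}

	s.head_end = up;
	s.tail_start = down;
	s.first_bit = up / UNIT_SIZE;
	s.nbits = (down - up) / UNIT_SIZE;
	return s;
}
```

With these definitions, a 64GiB address space needs 4096 bytes of bitmap (one 4k page) and a 4PiB address space needs 256MiB, matching the figures in the text. A range from 1MiB to 7MiB has its first and last megabyte accepted upfront and bits 1 and 2 set.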
The kernel needs to know what memory has been accepted. Firmware communicates this information via the memory map: a new memory type -- EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for the kernel: e820 has to be modified on every page acceptance. That fragments the table, and the e820 table has only a limited number of entries.

Another option is to mark such memory as usable in e820 and track whether the range has been accepted in a bitmap. One bit in the bitmap represents 2MiB in the address space: one 4k page is enough to track 64GiB of physical address space.

In the worst-case scenario -- a huge hole in the middle of the address space -- it needs 256MiB to handle 4PiB of the address space.

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The bitmap is allocated and constructed in the EFI stub and passed down to the kernel via boot_params. allocate_e820() allocates the bitmap if unaccepted memory is present, sizing it according to the maximum address in the memory map.

The same boot_params.unaccepted_memory can be used to pass the bitmap between two kernels on kexec, but that use case is not yet implemented.

Signed-off-by: Kirill A. 
Shutemov --- Documentation/x86/zero-page.rst | 1 + arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/bitmap.c | 24 ++++++++ arch/x86/boot/compressed/unaccepted_memory.c | 45 +++++++++++++++ arch/x86/include/asm/unaccepted_memory.h | 12 ++++ arch/x86/include/uapi/asm/bootparam.h | 3 +- drivers/firmware/efi/Kconfig | 14 +++++ drivers/firmware/efi/efi.c | 1 + drivers/firmware/efi/libstub/x86-stub.c | 60 +++++++++++++++++++- include/linux/efi.h | 3 +- 10 files changed, 161 insertions(+), 3 deletions(-) create mode 100644 arch/x86/boot/compressed/bitmap.c create mode 100644 arch/x86/boot/compressed/unaccepted_memory.c create mode 100644 arch/x86/include/asm/unaccepted_memory.h diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst index f088f5881666..8e3447a4b373 100644 --- a/Documentation/x86/zero-page.rst +++ b/Documentation/x86/zero-page.rst @@ -42,4 +42,5 @@ Offset/Size Proto Name Meaning 2D0/A00 ALL e820_table E820 memory map table (array of struct e820_entry) D00/1EC ALL eddbuf EDD data (array of struct edd_info) +EEC/008 ALL unaccepted_memory Bitmap of unaccepted memory (1bit == 2M) =========== ===== ======================= ================================================= diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 1bfe30ebadbe..f5b49e74d728 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -100,6 +100,7 @@ endif vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdcall.o +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/unaccepted_memory.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a diff --git a/arch/x86/boot/compressed/bitmap.c b/arch/x86/boot/compressed/bitmap.c new file mode 100644 index 000000000000..bf58b259380a --- /dev/null 
+++ b/arch/x86/boot/compressed/bitmap.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Taken from lib/string.c */ + +#include + +void __bitmap_set(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_set >= 0) { + *p |= mask_to_set; + len -= bits_to_set; + bits_to_set = BITS_PER_LONG; + mask_to_set = ~0UL; + p++; + } + if (len) { + mask_to_set &= BITMAP_LAST_WORD_MASK(size); + *p |= mask_to_set; + } +} diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c new file mode 100644 index 000000000000..d8081cde0eed --- /dev/null +++ b/arch/x86/boot/compressed/unaccepted_memory.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "error.h" +#include "misc.h" + +static inline void __accept_memory(phys_addr_t start, phys_addr_t end) +{ + /* Platform-specific memory-acceptance call goes here */ + error("Cannot accept memory"); +} + +void mark_unaccepted(struct boot_params *params, u64 start, u64 end) +{ + /* + * The accepted memory bitmap only works at PMD_SIZE granularity. + * If a request comes in to mark memory as unaccepted which is not + * PMD_SIZE-aligned, simply accept the memory now since it can not be + * *marked* as unaccepted. 
+ */ + + /* Immediately accept whole range if it is within a PMD_SIZE block: */ + if ((start & PMD_MASK) == (end & PMD_MASK)) { + npages = (end - start) / PAGE_SIZE; + __accept_memory(start, start + npages * PAGE_SIZE); + return; + } + + /* Immediately accept a unaccepted_memory, + start / PMD_SIZE, (end - start) / PMD_SIZE); +} diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h new file mode 100644 index 000000000000..cbc24040b853 --- /dev/null +++ b/arch/x86/include/asm/unaccepted_memory.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2020 Intel Corporation */ +#ifndef _ASM_X86_UNACCEPTED_MEMORY_H +#define _ASM_X86_UNACCEPTED_MEMORY_H + +#include + +struct boot_params; + +void mark_unaccepted(struct boot_params *params, u64 start, u64 num); + +#endif diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index b25d3f82c2f3..16bc686a198d 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -217,7 +217,8 @@ struct boot_params { struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ __u8 _pad8[48]; /* 0xcd0 */ struct edd_info eddbuf[EDDMAXNR]; /* 0xd00 */ - __u8 _pad9[276]; /* 0xeec */ + __u64 unaccepted_memory; /* 0xeec */ + __u8 _pad9[268]; /* 0xef4 */ } __attribute__((packed)); /** diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 2c3dac5ecb36..36c1bf33f112 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -243,6 +243,20 @@ config EFI_DISABLE_PCI_DMA options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be used to override this option. +config UNACCEPTED_MEMORY + bool + depends on EFI_STUB + help + Some Virtual Machine platforms, such as Intel TDX, introduce + the concept of memory acceptance, requiring memory to be accepted + before it can be used by the guest. 
This protects against a class of + attacks by the virtual machine platform. + + UEFI specification v2.9 introduced EFI_UNACCEPTED_MEMORY memory type. + + This option adds support for unaccepted memory and makes such memory + usable by kernel. + endmenu config EFI_EMBEDDED_FIRMWARE diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index ae79c3300129..abe862c381b6 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -740,6 +740,7 @@ static __initdata char memory_type_name[][13] = { "MMIO Port", "PAL Code", "Persistent", + "Unaccepted", }; char * __init efi_md_typeattr_format(char *buf, size_t size, diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index a0b946182b5e..346b12d6f1b2 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -9,12 +9,14 @@ #include #include #include +#include #include #include #include #include #include +#include #include "efistub.h" @@ -504,6 +506,13 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s e820_type = E820_TYPE_PMEM; break; + case EFI_UNACCEPTED_MEMORY: + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + continue; + e820_type = E820_TYPE_RAM; + mark_unaccepted(params, d->phys_addr, + d->phys_addr + PAGE_SIZE * d->num_pages); + break; default: continue; } @@ -575,6 +584,9 @@ static efi_status_t allocate_e820(struct boot_params *params, { efi_status_t status; __u32 nr_desc; + bool unaccepted_memory_present = false; + u64 max_addr = 0; + int i; status = efi_get_memory_map(map); if (status != EFI_SUCCESS) @@ -589,9 +601,55 @@ static efi_status_t allocate_e820(struct boot_params *params, if (status != EFI_SUCCESS) goto out; } + + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + goto out; + + /* Check if there's any unaccepted memory and find the max address */ + for (i = 0; i < nr_desc; i++) { + efi_memory_desc_t *d; + + d = efi_early_memdesc_ptr(*map->map, *map->desc_size, i); + 
if (d->type == EFI_UNACCEPTED_MEMORY) + unaccepted_memory_present = true; + if (d->phys_addr + d->num_pages * PAGE_SIZE > max_addr) + max_addr = d->phys_addr + d->num_pages * PAGE_SIZE; + } + + /* + * If unaccepted memory present allocate a bitmap to track what memory + * has to be accepted before access. + * + * One bit in the bitmap represents 2MiB in the address space: one 4k + * page is enough to track 64GiB or physical address space. + * + * In the worst case scenario -- a huge hole in the middle of the + * address space -- It needs 256MiB to handle 4PiB of the address + * space. + * + * TODO: handle situation if params->unaccepted_memory has already set. + * It's required to deal with kexec. + * + * The bitmap will be populated in setup_e820() according to the memory + * map after efi_exit_boot_services(). + */ + if (unaccepted_memory_present) { + unsigned long *unaccepted_memory = NULL; + u64 size = DIV_ROUND_UP(max_addr, PMD_SIZE * BITS_PER_BYTE); + + status = efi_allocate_pages(size, + (unsigned long *)&unaccepted_memory, + ULONG_MAX); + if (status != EFI_SUCCESS) + goto out; + memset(unaccepted_memory, 0, size); + params->unaccepted_memory = (u64)unaccepted_memory; + } + out: efi_bs_call(free_pool, *map->map); - return EFI_SUCCESS; + return status; + } struct exit_boot_struct { diff --git a/include/linux/efi.h b/include/linux/efi.h index dbd39b20e034..270333b9b94d 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -108,7 +108,8 @@ typedef struct { #define EFI_MEMORY_MAPPED_IO_PORT_SPACE 12 #define EFI_PAL_CODE 13 #define EFI_PERSISTENT_MEMORY 14 -#define EFI_MAX_MEMORY_TYPE 15 +#define EFI_UNACCEPTED_MEMORY 15 +#define EFI_MAX_MEMORY_TYPE 16 /* Attribute values: */ #define EFI_MEMORY_UC ((u64)0x0000000000000001ULL) /* uncached */ From patchwork Tue Jan 11 11:33:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com" X-Patchwork-Id: 12709733 
From: "Kirill A. Shutemov"
Subject: [PATCHv2 4/7] x86/boot/compressed: Handle unaccepted memory
Date: Tue, 11 Jan 2022 14:33:11 +0300
Message-Id: <20220111113314.27173-5-kirill.shutemov@linux.intel.com>
In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>
References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>

Firmware is responsible for accepting the memory where the compressed
kernel image and initrd land, but the kernel has to accept the memory for
the decompression buffer itself. Accept that memory just before
decompression starts.

KASLR is allowed to use unaccepted memory for the output buffer.

Signed-off-by: Kirill A. Shutemov
---
 arch/x86/boot/compressed/bitmap.c            | 62 ++++++++++++++++++++
 arch/x86/boot/compressed/kaslr.c             | 14 ++++-
 arch/x86/boot/compressed/misc.c              |  9 +++
 arch/x86/boot/compressed/unaccepted_memory.c | 13 ++++
 arch/x86/include/asm/unaccepted_memory.h     |  2 +
 5 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/bitmap.c b/arch/x86/boot/compressed/bitmap.c
index bf58b259380a..ba2de61c0823 100644
--- a/arch/x86/boot/compressed/bitmap.c
+++ b/arch/x86/boot/compressed/bitmap.c
@@ -2,6 +2,48 @@
 /* Taken from lib/string.c */
 
 #include
+#include
+#include
+
+unsigned long _find_next_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long nbits,
+		unsigned long start, unsigned long invert, unsigned long le)
+{
+	unsigned long tmp, mask;
+
+	if (unlikely(start >= nbits))
+		return nbits;
+
+	tmp = addr1[start / BITS_PER_LONG];
+	if (addr2)
+		tmp &= addr2[start / BITS_PER_LONG];
+	tmp ^= invert;
+
+	/* Handle 1st word. */
+	mask = BITMAP_FIRST_WORD_MASK(start);
+	if (le)
+		mask = swab(mask);
+
+	tmp &= mask;
+
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr1[start / BITS_PER_LONG];
+		if (addr2)
+			tmp &= addr2[start / BITS_PER_LONG];
+		tmp ^= invert;
+	}
+
+	if (le)
+		tmp = swab(tmp);
+
+	return min(start + __ffs(tmp), nbits);
+}
 
 void __bitmap_set(unsigned long *map, unsigned int start, int len)
 {
@@ -22,3 +64,23 @@ void __bitmap_set(unsigned long *map, unsigned int start, int len)
 		*p |= mask_to_set;
 	}
 }
+
+void __bitmap_clear(unsigned long *map, unsigned int start, int len)
+{
+	unsigned long *p = map + BIT_WORD(start);
+	const unsigned int size = start + len;
+	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+
+	while (len - bits_to_clear >= 0) {
+		*p &= ~mask_to_clear;
+		len -= bits_to_clear;
+		bits_to_clear = BITS_PER_LONG;
+		mask_to_clear = ~0UL;
+		p++;
+	}
+	if (len) {
+		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+		*p &= ~mask_to_clear;
+	}
+}
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 411b268bc0a2..59db90626042 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 		 * but in practice there's firmware where using that memory leads
 		 * to crashes.
 		 *
-		 * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
+		 * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
+		 * supported) are guaranteed to be free.
 		 */
-		if (md->type != EFI_CONVENTIONAL_MEMORY)
+
+		switch (md->type) {
+		case EFI_CONVENTIONAL_MEMORY:
+			break;
+		case EFI_UNACCEPTED_MEMORY:
+			if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+				break;
 			continue;
+		default:
+			continue;
+		}
 
 		if (efi_soft_reserve_enabled() &&
 		    (md->attribute & EFI_MEMORY_SP))
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index d8373d766672..1e3efd0a8e11 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -18,6 +18,7 @@
 #include "../string.h"
 #include "../voffset.h"
 #include
+#include
 
 /*
  * WARNING!!
@@ -446,6 +447,14 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #endif
 
 	debug_putstr("\nDecompressing Linux... ");
+
+	if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) &&
+	    boot_params->unaccepted_memory) {
+		debug_putstr("Accepting memory... ");
+		accept_memory((phys_addr_t)output,
+			      (phys_addr_t)output + needed_size);
+	}
+
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
 	parse_elf(output);
diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c
index d8081cde0eed..91db800d5f5e 100644
--- a/arch/x86/boot/compressed/unaccepted_memory.c
+++ b/arch/x86/boot/compressed/unaccepted_memory.c
@@ -43,3 +43,16 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end)
 	bitmap_set((unsigned long *)params->unaccepted_memory,
 		   start / PMD_SIZE, (end - start) / PMD_SIZE);
 }
+
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+	unsigned long *unaccepted_memory;
+	unsigned int rs, re;
+
+	unaccepted_memory = (unsigned long *)boot_params->unaccepted_memory;
+	bitmap_for_each_set_region(unaccepted_memory, rs, re,
+				   start / PMD_SIZE, end / PMD_SIZE) {
+		__accept_memory(rs * PMD_SIZE, re * PMD_SIZE);
+		bitmap_clear(unaccepted_memory, rs, re - rs);
+	}
+}
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index cbc24040b853..f1f835d3cd78 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -9,4 +9,6 @@ struct boot_params;
 
 void mark_unaccepted(struct boot_params *params, u64 start, u64 num);
 
+void accept_memory(phys_addr_t start, phys_addr_t end);
+
 #endif

From patchwork Tue Jan 11 11:33:12 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com"
X-Patchwork-Id: 12709736
From: "Kirill A. Shutemov"
Subject: [PATCHv2 5/7] x86/mm: Reserve unaccepted memory bitmap
Date: Tue, 11 Jan 2022 14:33:12 +0300
Message-Id: <20220111113314.27173-6-kirill.shutemov@linux.intel.com>
In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>
References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>

The unaccepted memory bitmap is allocated during the decompression stage
and handed over to the main kernel image via boot_params. The bitmap is
used to track whether memory has been accepted.

Reserve the unaccepted memory bitmap to prevent its memory from being
reallocated for other purposes.

Signed-off-by: Kirill A. Shutemov
---
 arch/x86/kernel/e820.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index bc0657f0deed..dc9048e2d421 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1290,6 +1290,16 @@ void __init e820__memory_setup(void)
 
 	pr_info("BIOS-provided physical RAM map:\n");
 	e820__print_table(who);
+
+	/* Mark unaccepted memory bitmap reserved */
+	if (boot_params.unaccepted_memory) {
+		unsigned long size;
+
+		/* One bit per 2MB */
+		size = DIV_ROUND_UP(e820__end_of_ram_pfn() * PAGE_SIZE,
+				    PMD_SIZE * BITS_PER_BYTE);
+		memblock_reserve(boot_params.unaccepted_memory, size);
+	}
 }
 
 void __init e820__memblock_setup(void)

From patchwork Tue Jan 11 11:33:13 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com"
X-Patchwork-Id: 12709738
From: "Kirill A. Shutemov"
Subject: [PATCHv2 6/7] x86/mm: Provide helpers for unaccepted memory
Date: Tue, 11 Jan 2022 14:33:13 +0300
Message-Id: <20220111113314.27173-7-kirill.shutemov@linux.intel.com>
In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>
References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>

Core-mm requires a few helpers to support unaccepted memory:

 - accept_memory() checks the range of addresses against the bitmap and
   accepts memory if needed;

 - maybe_set_page_offline() checks the bitmap and marks a page with
   PageOffline() if memory acceptance is required on the first
   allocation of the page;

 - accept_and_clear_page_offline() accepts memory for the page and
   clears PageOffline().

Signed-off-by: Kirill A. Shutemov
---
 arch/x86/boot/compressed/unaccepted_memory.c |  3 +-
 arch/x86/include/asm/page.h                  |  5 ++
 arch/x86/include/asm/unaccepted_memory.h     |  3 +
 arch/x86/mm/Makefile                         |  2 +
 arch/x86/mm/unaccepted_memory.c              | 90 ++++++++++++++++++++
 5 files changed, 101 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/mm/unaccepted_memory.c

diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c
index 91db800d5f5e..b6caca4d3d22 100644
--- a/arch/x86/boot/compressed/unaccepted_memory.c
+++ b/arch/x86/boot/compressed/unaccepted_memory.c
@@ -20,8 +20,7 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end)
 
 	/* Immediately accept whole range if it is within a PMD_SIZE block: */
 	if ((start & PMD_MASK) == (end & PMD_MASK)) {
-		npages = (end - start) / PAGE_SIZE;
-		__accept_memory(start, start + npages * PAGE_SIZE);
+		__accept_memory(start, end);
 		return;
 	}
 
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 4d5810c8fab7..1e56d76ca474 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,11 @@
 struct page;
 
 #include
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+#include
+#endif
+
 extern struct range pfn_mapped[];
 extern int nr_pfn_mapped;
 
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index f1f835d3cd78..8a06ac8fc9e9 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -6,9 +6,12 @@
 #include
 
 struct boot_params;
+struct page;
 
 void mark_unaccepted(struct boot_params *params, u64 start, u64 num);
 
 void accept_memory(phys_addr_t start, phys_addr_t end);
+void maybe_set_page_offline(struct page *page, unsigned int order);
+void accept_and_clear_page_offline(struct page *page, unsigned int order);
 
 #endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index fe3d3061fc11..e327f83e6bbf 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -60,3 +60,5 @@
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_UNACCEPTED_MEMORY)	+= unaccepted_memory.o
diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
new file mode 100644
index 000000000000..984eaead0b11
--- /dev/null
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -0,0 +1,90 @@
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+static DEFINE_SPINLOCK(unaccepted_memory_lock);
+
+#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
+
+static void __accept_memory(phys_addr_t start, phys_addr_t end)
+{
+	unsigned long *unaccepted_memory;
+	unsigned int rs, re;
+
+	unaccepted_memory = __va(boot_params.unaccepted_memory);
+	bitmap_for_each_set_region(unaccepted_memory, rs, re,
+				   start / PMD_SIZE,
+				   DIV_ROUND_UP(end, PMD_SIZE)) {
+		/* Platform-specific memory-acceptance call goes here */
+		panic("Cannot accept memory");
+		bitmap_clear(unaccepted_memory, rs, re - rs);
+	}
+}
+
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+	unsigned long flags;
+	if (!boot_params.unaccepted_memory)
+		return;
+
+	spin_lock_irqsave(&unaccepted_memory_lock, flags);
+	__accept_memory(start, end);
+	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+}
+
+void __init maybe_set_page_offline(struct page *page, unsigned int order)
+{
+	unsigned long *unaccepted_memory;
+	phys_addr_t addr = page_to_phys(page);
+	unsigned long flags;
+	bool unaccepted = false;
+	unsigned int i;
+
+	if (!boot_params.unaccepted_memory)
+		return;
+
+	unaccepted_memory = __va(boot_params.unaccepted_memory);
+	spin_lock_irqsave(&unaccepted_memory_lock, flags);
+	if (order < PMD_ORDER) {
+		BUG_ON(test_bit(addr / PMD_SIZE, unaccepted_memory));
+		goto out;
+	}
+
+	for (i = 0; i < (1 << (order - PMD_ORDER)); i++) {
+		if (test_bit(addr / PMD_SIZE + i, unaccepted_memory)) {
+			unaccepted = true;
+			break;
+		}
+	}
+
+	/* At least part of the page is unaccepted */
+	if (unaccepted)
+		__SetPageOffline(page);
+out:
+	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+}
+
+void accept_and_clear_page_offline(struct page *page, unsigned int order)
+{
+	phys_addr_t addr = round_down(page_to_phys(page), PMD_SIZE);
+	int i;
+
+	/* PageOffline() page on a free list, but no unaccepted memory? Hm. */
+	WARN_ON_ONCE(!boot_params.unaccepted_memory);
+
+	page = pfn_to_page(addr >> PAGE_SHIFT);
+	if (order < PMD_ORDER)
+		order = PMD_ORDER;
+
+	accept_memory(addr, addr + (PAGE_SIZE << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		if (PageOffline(page + i))
+			__ClearPageOffline(page + i);
+	}
+}

From patchwork Tue Jan 11 11:33:14 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "kirill.shutemov@linux.intel.com"
X-Patchwork-Id: 12709737
From: "Kirill A. Shutemov"
Subject: [PATCHv2 7/7] x86/tdx: Unaccepted memory support
Date: Tue, 11 Jan 2022 14:33:14 +0300
Message-Id: <20220111113314.27173-8-kirill.shutemov@linux.intel.com>
In-Reply-To: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>
References: <20220111113314.27173-1-kirill.shutemov@linux.intel.com>

All preparation is complete. Hook up TDX-specific code to accept memory.

There are two tdx_accept_memory() implementations: one in the main kernel
and one in the decompressor.

The implementation in the core kernel uses tdx_hcall_request_gpa_type().
That helper is not available in the decompressor, so a self-contained
implementation is added there instead.

Signed-off-by: Kirill A. Shutemov
---
 arch/x86/Kconfig                             |  1 +
 arch/x86/boot/compressed/tdx.c               | 67 ++++++++++++++++++++
 arch/x86/boot/compressed/unaccepted_memory.c |  9 ++-
 arch/x86/include/asm/tdx.h                   |  2 +
 arch/x86/kernel/tdx.c                        |  7 ++
 arch/x86/mm/unaccepted_memory.c              |  6 +-
 6 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e2ed1684f399..5d0f99bd3538 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -879,6 +879,7 @@ config INTEL_TDX_GUEST
 	select ARCH_HAS_CC_PLATFORM
 	select X86_MCE
 	select X86_MEM_ENCRYPT
+	select UNACCEPTED_MEMORY
 	help
 	  Support running as a guest under Intel TDX. Without this support,
 	  the guest kernel can not boot or run under TDX.
diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
index 50c8145bd0f3..587e6d948953 100644
--- a/arch/x86/boot/compressed/tdx.c
+++ b/arch/x86/boot/compressed/tdx.c
@@ -5,12 +5,54 @@
 
 #include "../cpuflags.h"
 #include "../string.h"
+#include "error.h"
 
+#include
+
+#define TDX_HYPERCALL_STANDARD	0
 #define TDX_CPUID_LEAF_ID	0x21
 #define TDX_IDENT		"IntelTDX    "
 
+/*
+ * Used in __tdx_module_call() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_module_output {
+	u64 rcx;
+	u64 rdx;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+};
+
+/*
+ * Used in __tdx_hypercall() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the VMM. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_hypercall_output {
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+};
+
 static bool tdx_guest_detected;
 
+/* Helper function used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+		      struct tdx_module_output *out);
+
+/* Helper function used to request services from VMM */
+u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
+		    u64 r15, struct tdx_hypercall_output *out);
+
 void early_tdx_detect(void)
 {
 	u32 eax, sig[3];
@@ -28,3 +70,28 @@ bool early_is_tdx_guest(void)
 {
 	return tdx_guest_detected;
 }
+
+#define TDACCEPTPAGE		6
+#define TDVMCALL_MAP_GPA	0x10001
+
+void tdx_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+	struct tdx_hypercall_output outl = {0};
+	int i;
+
+	if (__tdx_hypercall(TDX_HYPERCALL_STANDARD, TDVMCALL_MAP_GPA,
+			    start, end, 0, 0, &outl)) {
+		error("Cannot accept memory: MapGPA failed\n");
+	}
+
+	/*
+	 * For shared->private conversion, accept the page using TDACCEPTPAGE
+	 * TDX module call.
+	 */
+	for (i = 0; i < (end - start) / PAGE_SIZE; i++) {
+		if (__tdx_module_call(TDACCEPTPAGE, start + i * PAGE_SIZE,
+				      0, 0, 0, NULL)) {
+			error("Cannot accept memory: page accept failed\n");
+		}
+	}
+}
diff --git a/arch/x86/boot/compressed/unaccepted_memory.c b/arch/x86/boot/compressed/unaccepted_memory.c
index b6caca4d3d22..c23526c25e50 100644
--- a/arch/x86/boot/compressed/unaccepted_memory.c
+++ b/arch/x86/boot/compressed/unaccepted_memory.c
@@ -2,11 +2,15 @@
 
 #include "error.h"
 #include "misc.h"
+#include "tdx.h"
 
 static inline void __accept_memory(phys_addr_t start, phys_addr_t end)
 {
 	/* Platform-specific memory-acceptance call goes here */
-	error("Cannot accept memory");
+	if (early_is_tdx_guest())
+		tdx_accept_memory(start, end);
+	else
+		error("Cannot accept memory");
 }
 
 void mark_unaccepted(struct boot_params *params, u64 start, u64 end)
@@ -18,6 +22,9 @@ void mark_unaccepted(struct boot_params *params, u64 start, u64 end)
 	 * *marked* as unaccepted.
 	 */
 
+	/* __accept_memory() needs to know if kernel runs in TDX environment */
+	early_tdx_detect();
+
 	/* Immediately accept whole range if it is within a PMD_SIZE block: */
 	if ((start & PMD_MASK) == (end & PMD_MASK)) {
 		__accept_memory(start, end);
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 6d901cb6d607..fbbe4644cc7b 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -90,6 +90,8 @@ phys_addr_t tdx_shared_mask(void);
 int tdx_hcall_request_gpa_type(phys_addr_t start, phys_addr_t end,
 			       enum tdx_map_type map_type);
 
+extern void tdx_accept_memory(phys_addr_t start, phys_addr_t end);
+
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 0f8f7285c05b..a0ff720425d8 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -162,6 +162,13 @@ int tdx_hcall_request_gpa_type(phys_addr_t start, phys_addr_t end,
 	return 0;
 }
 
+void tdx_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+	if (tdx_hcall_request_gpa_type(start, end, TDX_MAP_PRIVATE)) {
+		panic("Accepting memory failed\n");
+	}
+}
+
 static u64 __cpuidle _tdx_halt(const bool irq_disabled, const bool do_sti)
 {
 	/*
diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
index 984eaead0b11..9f468d58d51f 100644
--- a/arch/x86/mm/unaccepted_memory.c
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -5,6 +5,7 @@
 
 #include
 #include
+#include
 #include
 
 static DEFINE_SPINLOCK(unaccepted_memory_lock);
@@ -21,7 +22,10 @@ static void __accept_memory(phys_addr_t start, phys_addr_t end)
 				   start / PMD_SIZE,
 				   DIV_ROUND_UP(end, PMD_SIZE)) {
 		/* Platform-specific memory-acceptance call goes here */
-		panic("Cannot accept memory");
+		if (cc_platform_has(CC_ATTR_GUEST_TDX))
+			tdx_accept_memory(rs * PMD_SIZE, re * PMD_SIZE);
+		else
+			panic("Cannot accept memory");
 		bitmap_clear(unaccepted_memory, rs, re - rs);
 	}
 }
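
For readers who want to experiment with the bitmap bookkeeping used across
this series, here is a rough user-space sketch. It is not code from the
patches. It assumes one bit per 2MiB unit (mirroring PMD_SIZE on x86-64) and
a byte-granular bitmap. The names UNIT_SIZE, bitmap_bytes(),
mark_unaccepted_range() and accept_range() are hypothetical stand-ins for
the DIV_ROUND_UP() sizing, mark_unaccepted() and accept_memory(). The
platform-specific acceptance call is replaced by a simple counter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one bitmap bit tracks one 2MiB unit (PMD_SIZE on x86-64). */
#define UNIT_SIZE	(2ULL << 20)

/* Hypothetical helper: bitmap bytes needed to cover [0, max_addr). */
static uint64_t bitmap_bytes(uint64_t max_addr)
{
	uint64_t span = UNIT_SIZE * 8;	/* address space per bitmap byte */

	return (max_addr + span - 1) / span;
}

/*
 * Hypothetical helper: mark every unit that lies fully inside [start, end)
 * as unaccepted. Partial units at either end are left unmarked here; the
 * assumption is that the caller accepts those eagerly.
 */
static void mark_unaccepted_range(uint8_t *map, uint64_t start, uint64_t end)
{
	uint64_t first = (start + UNIT_SIZE - 1) / UNIT_SIZE;
	uint64_t last = end / UNIT_SIZE;

	assert(start <= end);
	for (uint64_t u = first; u < last; u++)
		map[u / 8] |= (uint8_t)(1u << (u % 8));
}

/*
 * Hypothetical helper: "accept" every still-unaccepted unit overlapping
 * [start, end) and clear its bit. Returns the number of units accepted;
 * a real implementation would issue the platform call at that point.
 */
static unsigned int accept_range(uint8_t *map, uint64_t start, uint64_t end)
{
	uint64_t first = start / UNIT_SIZE;
	uint64_t last = (end + UNIT_SIZE - 1) / UNIT_SIZE;
	unsigned int accepted = 0;

	assert(start <= end);
	for (uint64_t u = first; u < last; u++) {
		uint8_t bit = (uint8_t)(1u << (u % 8));

		if (map[u / 8] & bit) {
			map[u / 8] &= (uint8_t)~bit;
			accepted++;
		}
	}
	return accepted;
}
```

With these definitions, bitmap_bytes(64ULL << 30) evaluates to 4096 (one 4k
page per 64GiB) and bitmap_bytes(1ULL << 52) to 256MiB, which matches the
figures quoted in the allocate_e820() comment. Accepting a range a second
time is a no-op because the bits have already been cleared.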