[v1,15/26] x86/sev: Introduce snp leaked pages list

Message ID	20231230161954.569267-16-michael.roth@amd.com (mailing list archive)
State	New, archived
Series	Add AMD Secure Nested Paging (SEV-SNP) Initialization Support
[v1,00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support
[v1,01/26] x86/cpufeatures: Add SEV-SNP CPU feature
[v1,02/26] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
[v1,03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support
[v1,04/26] x86/sev: Add the host SEV-SNP initialization support
[v1,05/26] x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled
[v1,06/26] x86/sev: Add RMP entry lookup helpers
[v1,07/26] x86/fault: Add helper for dumping RMP entries
[v1,08/26] x86/traps: Define RMP violation #PF error code
[v1,09/26] x86/fault: Dump RMP table information when RMP page faults occur
[v1,10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
[v1,11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
[v1,12/26] crypto: ccp: Define the SEV-SNP commands
[v1,13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
[v1,14/26] crypto: ccp: Provide API to issue SEV and SNP commands
[v1,15/26] x86/sev: Introduce snp leaked pages list
[v1,16/26] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
[v1,17/26] crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
[v1,18/26] crypto: ccp: Handle legacy SEV commands when SNP is enabled
[v1,19/26] iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
[v1,20/26] crypto: ccp: Add debug support for decrypting pages
[v1,21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
[v1,22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
[v1,23/26] x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
[v1,24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command
[v1,25/26] crypto: ccp: Add the SNP_COMMIT command
[v1,26/26] crypto: ccp: Add the SNP_SET_CONFIG command

Headers

Received-SPF: Pass (protection.outlook.com: domain of amd.com designates
 165.204.84.17 as permitted sender) receiver=protection.outlook.com;
 client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C
From: Michael Roth <michael.roth@amd.com>
To: <x86@kernel.org>
CC: <kvm@vger.kernel.org>, <linux-coco@lists.linux.dev>, <linux-mm@kvack.org>,
	<linux-crypto@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<tglx@linutronix.de>, <mingo@redhat.com>, <jroedel@suse.de>,
	<thomas.lendacky@amd.com>, <hpa@zytor.com>, <ardb@kernel.org>,
	<pbonzini@redhat.com>, <seanjc@google.com>, <vkuznets@redhat.com>,
	<jmattson@google.com>, <luto@kernel.org>, <dave.hansen@linux.intel.com>,
	<slp@redhat.com>, <pgonda@google.com>, <peterz@infradead.org>,
	<srinivas.pandruvada@linux.intel.com>, <rientjes@google.com>,
	<tobin@ibm.com>, <bp@alien8.de>, <vbabka@suse.cz>, <kirill@shutemov.name>,
	<ak@linux.intel.com>, <tony.luck@intel.com>,
	<sathyanarayanan.kuppuswamy@linux.intel.com>, <alpergun@google.com>,
	<jarkko@kernel.org>, <ashish.kalra@amd.com>, <nikunj.dadhania@amd.com>,
	<pankaj.gupta@amd.com>, <liam.merwick@oracle.com>, <zhi.a.wang@intel.com>
Subject: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list
Date: Sat, 30 Dec 2023 10:19:43 -0600
Message-ID: <20231230161954.569267-16-michael.roth@amd.com>
In-Reply-To: <20231230161954.569267-1-michael.roth@amd.com>
References: <20231230161954.569267-1-michael.roth@amd.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain

Commit Message

Michael Roth Dec. 30, 2023, 4:19 p.m. UTC

From: Ashish Kalra <ashish.kalra@amd.com>

It is unsafe to release pages back to the page allocator if they
have been transitioned to firmware/guest state and can't be reclaimed
or transitioned back to hypervisor/shared state. In this case, add
them to an internal leaked pages list to ensure that they are not freed
or touched/accessed, either of which would cause fatal page faults.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: relocate to arch/x86/virt/svm/sev.c]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/virt/svm/sev.c    | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

Comments

Vlastimil Babka Jan. 8, 2024, 10:45 a.m. UTC | #1

On 12/30/23 17:19, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Pages are unsafe to be released back to the page-allocator, if they
> have been transitioned to firmware/guest state and can't be reclaimed
> or transitioned back to hypervisor/shared state. In this case add
> them to an internal leaked pages list to ensure that they are not freed
> or touched/accessed to cause fatal page faults.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: relocate to arch/x86/virt/svm/sev.c]
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Hi, sorry I didn't respond in time to the last mail discussing the previous
version in
https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
due to the upcoming holidays.

I would rather avoid the approach of allocating container objects:
- it's allocating memory when effectively losing memory, a dangerous thing
- are all the callers and their context ok with GFP_KERNEL?
- GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
current process, it's probably not its fault the pages are leaked? Also the
charging can fail?
- given the benefit of having leaked pages on a list is basically just
debugging (i.e. crash dump or drgn inspection) this seems too heavy

I think it would be better and sufficient to use page->lru for order-0 and
head pages, and simply skip tail pages (possibly with adjusted warning
message for that case).

Vlastimil
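
To illustrate the suggestion above, here is a minimal userspace sketch, not
kernel code. The names struct list_node, struct fake_page and leak_pages() are
hypothetical stand-ins for struct list_head, struct page (with its lru field)
and snp_leak_pages(). The sketch assumes a page is either order-0, a compound
head, or a compound tail. It chains order-0 and head pages through a link
embedded in the page object itself, so nothing is allocated while memory is
being lost, and it skips tail pages while still counting them.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical model of struct page: only the fields this sketch needs. */
struct fake_page {
	unsigned long pfn;
	int is_tail;            /* models PageTail() */
	struct list_node lru;   /* models page->lru */
};

/* Empty circular list: the head points at itself. */
static struct list_node leaked = { &leaked, &leaked };
static unsigned long nr_leaked;

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Record npages consecutive pages as leaked. Only non-tail pages are
 * linked; every page is counted. No memory is allocated.
 */
static void leak_pages(struct fake_page *page, unsigned int npages)
{
	assert(page != NULL);

	while (npages--) {
		if (!page->is_tail)
			list_add_tail_node(&page->lru, &leaked);
		nr_leaked++;
		page++;
	}
}

/* Count list entries by walking the embedded links. */
static unsigned long leaked_list_len(void)
{
	unsigned long n = 0;

	for (struct list_node *p = leaked.next; p != &leaked; p = p->next)
		n++;
	return n;
}
```

Leaking one order-0 page followed by a four-page compound page would, in this
model, put two entries on the list (the order-0 page and the head) while the
counter advances by five.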

<snip>

> +
> +void snp_leak_pages(u64 pfn, unsigned int npages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	struct leaked_page *leak;
> +
> +	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> +
> +	spin_lock(&snp_leaked_pages_list_lock);
> +	while (npages--) {
> +		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
> +		if (!leak)
> +			goto unlock;

Should we skip the dump_rmpentry() in such a case?

> +		leak->page = page;
> +		list_add_tail(&leak->list, &snp_leaked_pages_list);
> +		dump_rmpentry(pfn);
> +		snp_nr_leaked_pages++;
> +		pfn++;
> +		page++;
> +	}
> +unlock:
> +	spin_unlock(&snp_leaked_pages_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);

Kalra, Ashish Jan. 9, 2024, 10:19 p.m. UTC | #2

Hello Vlastimil,

On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
> On 12/30/23 17:19, Michael Roth wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> Pages are unsafe to be released back to the page-allocator, if they
>> have been transitioned to firmware/guest state and can't be reclaimed
>> or transitioned back to hypervisor/shared state. In this case add
>> them to an internal leaked pages list to ensure that they are not freed
>> or touched/accessed to cause fatal page faults.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Hi, sorry I didn't respond in time to the last mail discussing previous
> version in
> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
> due to upcoming holidays.
>
> I would rather avoid the approach of allocating container objects:
> - it's allocating memory when effectively losing memory, a dangerous thing
> - are all the callers and their context ok with GFP_KERNEL?
> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
> current process, it's probably not its fault the pages are leaked? Also the
> charging can fail?
> - given the benefit of having leaked pages on a list is basically just
> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>
> I think it would be better and sufficient to use page->lru for order-0 and
> head pages, and simply skip tail pages (possibly with adjusted warning
> message for that case).
>
> Vlastimil
>
> <snip>

Considering the above thoughts, this is an updated version of
snp_leak_pages(). I look forward to any review comments/feedback you
have on it:

void snp_leak_pages(u64 pfn, unsigned int npages)
{
         struct page *page = pfn_to_page(pfn);

         pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);

         spin_lock(&snp_leaked_pages_list_lock);
         while (npages--) {
                 /*
                  * Reuse the page's buddy list for chaining into the leaked
                  * pages list. This page should not be on a free list currently
                  * and is also unsafe to be added to a free list.
                  */
                 if ((likely(!PageCompound(page))) || (PageCompound(page) &&
                     !PageTail(page) && compound_head(page) == page))
                         /*
                          * Skip inserting tail pages of compound page as
                          * page->buddy_list of tail pages is not usable.
                          */
                         list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
                 sev_dump_rmpentry(pfn);
                 snp_nr_leaked_pages++;
                 pfn++;
                 page++;
         }
         spin_unlock(&snp_leaked_pages_list_lock);
}

Thanks, Ashish
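
One way to read the condition in the function above: in a simplified model
where a page is exactly one of order-0, compound head, or compound tail, the
test appears to reduce to "not a tail page". The userspace sketch below checks
that exhaustively. The enum and helper names are hypothetical and only model
PageCompound(), PageTail() and the compound_head(page) == page comparison;
they are not the kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-way model of page kinds (assumption of this sketch). */
enum kind { ORDER0, HEAD, TAIL };

static bool page_compound(enum kind k) { return k != ORDER0; }
static bool page_tail(enum kind k)     { return k == TAIL; }

/* Models compound_head(page) == page: true for everything except tails. */
static bool is_own_head(enum kind k)   { return k != TAIL; }

/* The condition as written in the posted snp_leak_pages(). */
static bool posted_condition(enum kind k)
{
	return !page_compound(k) ||
	       (page_compound(k) && !page_tail(k) && is_own_head(k));
}

/* Candidate simplification under the model above. */
static bool simplified_condition(enum kind k)
{
	return !page_tail(k);
}

/* Compare both conditions for every page kind in the model. */
static bool conditions_agree(void)
{
	for (int k = ORDER0; k <= TAIL; k++) {
		if (posted_condition((enum kind)k) !=
		    simplified_condition((enum kind)k))
			return false;
	}
	return true;
}

/* Optional self-check for callers that want a hard failure. */
static void self_check(void)
{
	assert(conditions_agree());
}
```

Under these assumptions both forms link order-0 and head pages and skip tail
pages, so the extra comparisons add no filtering in this model.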

Vlastimil Babka Jan. 10, 2024, 8:59 a.m. UTC | #3

On 1/9/24 23:19, Kalra, Ashish wrote:
> Hello Vlastimil,
> 
> On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
>> On 12/30/23 17:19, Michael Roth wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> Pages are unsafe to be released back to the page-allocator, if they
>>> have been transitioned to firmware/guest state and can't be reclaimed
>>> or transitioned back to hypervisor/shared state. In this case add
>>> them to an internal leaked pages list to ensure that they are not freed
>>> or touched/accessed to cause fatal page faults.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> Hi, sorry I didn't respond in time to the last mail discussing previous
>> version in
>> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
>> due to upcoming holidays.
>>
>> I would rather avoid the approach of allocating container objects:
>> - it's allocating memory when effectively losing memory, a dangerous thing
>> - are all the callers and their context ok with GFP_KERNEL?
>> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
>> current process, it's probably not its fault the pages are leaked? Also the
>> charging can fail?
>> - given the benefit of having leaked pages on a list is basically just
>> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>>
>> I think it would be better and sufficient to use page->lru for order-0 and
>> head pages, and simply skip tail pages (possibly with adjusted warning
>> message for that case).
>>
>> Vlastimil
>>
>> <snip>
> 
> Considering the above thoughts, this is updated version of 
> snp_leak_pages(), looking forward to any review comments/feedback you 
> have on the same:
> 
> void snp_leak_pages(u64 pfn, unsigned int npages)
> {
>          struct page *page = pfn_to_page(pfn);
> 
>          pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> 
>          spin_lock(&snp_leaked_pages_list_lock);
>          while (npages--) {
>                  /*
>                   * Reuse the page's buddy list for chaining into the leaked
>                   * pages list. This page should not be on a free list currently
>                   * and is also unsafe to be added to a free list.
>                   */
>                  if ((likely(!PageCompound(page))) || (PageCompound(page) &&
>                      !PageTail(page) && compound_head(page) == page))

This is unnecessarily paranoid wrt that compound_head(page) test, but OTOH
doesn't handle the weird case when we're leaking less than whole compound
page (if that can even happen). So I'd suggest:

while (npages) {

  if (likely(!PageCompound(page)) ||
      (PageHead(page) && compound_nr(page) <= npages))
	list_add_tail(&page->buddy_list, ...);

  ... (no change from yours)

  npages--;
}

(or an equivalent for()) perhaps
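
The effect of the compound_nr(page) <= npages test can be sketched in
userspace as follows. This is an illustration under stated assumptions, not
the kernel code: struct fake_page and leak_range() are hypothetical, the head,
tail and nr fields model PageHead(), PageTail() and compound_nr(), and a
boolean flag stands in for the list insertion. Because npages is decremented
at the end of each iteration, the value seen at a head page is the number of
pages still to be leaked, including the head itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the page state the suggested loop looks at. */
struct fake_page {
	bool head;          /* models PageHead() */
	bool tail;          /* models PageTail() */
	unsigned int nr;    /* models compound_nr() for head pages */
	bool linked;        /* stands in for list_add_tail() */
};

/*
 * Walk npages pages starting at page. A page is linked if it is order-0,
 * or if it is a head whose whole compound page lies inside the remaining
 * range. Returns the number of linked pages; *counted is bumped for every
 * page visited, linked or not.
 */
static unsigned int leak_range(struct fake_page *page, unsigned int npages,
			       unsigned int *counted)
{
	unsigned int linked = 0;

	assert(page != NULL && counted != NULL);

	while (npages) {
		bool compound = page->head || page->tail;

		if (!compound || (page->head && page->nr <= npages)) {
			page->linked = true;
			linked++;
		}
		(*counted)++;
		page++;
		npages--;
	}
	return linked;
}
```

In this model, leaking all four pages of an order-2 compound page links only
the head, while leaking just the first two pages of it links nothing, which
is the partial-compound case mentioned above.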

>                          /*
>                           * Skip inserting tail pages of compound page as
>                           * page->buddy_list of tail pages is not usable.
>                           */
>                          list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>                  sev_dump_rmpentry(pfn);
>                  snp_nr_leaked_pages++;
>                  pfn++;
>                  page++;
>          }
>          spin_unlock(&snp_leaked_pages_list_lock);
> }
> 
> Thanks, Ashish
>

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d3ccb7a0c7e9..435ba9bc4510 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -264,6 +264,7 @@  void snp_dump_hva_rmpentry(unsigned long address);
 int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -275,6 +276,7 @@  static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 	return -ENODEV;
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ee182351d93a..0f2e1ce241b5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -60,6 +60,17 @@  static u64 probed_rmp_base, probed_rmp_size;
 static struct rmpentry *rmptable __ro_after_init;
 static u64 rmptable_max_pfn __ro_after_init;
 
+/* List of pages which are leaked and cannot be reclaimed */
+struct leaked_page {
+	struct page *page;
+	struct list_head list;
+};
+
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static unsigned long snp_nr_leaked_pages;
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"SEV-SNP: " fmt
 
@@ -476,3 +487,27 @@  int rmp_make_shared(u64 pfn, enum pg_level level)
 	return rmpupdate(pfn, &state);
 }
 EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct leaked_page *leak;
+
+	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+	spin_lock(&snp_leaked_pages_list_lock);
+	while (npages--) {
+		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
+		if (!leak)
+			goto unlock;
+		leak->page = page;
+		list_add_tail(&leak->list, &snp_leaked_pages_list);
+		dump_rmpentry(pfn);
+		snp_nr_leaked_pages++;
+		pfn++;
+		page++;
+	}
+unlock:
+	spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
