diff mbox series

[v1,15/26] x86/sev: Introduce snp leaked pages list

Message ID 20231230161954.569267-16-michael.roth@amd.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Initialization Support | expand

Commit Message

Michael Roth Dec. 30, 2023, 4:19 p.m. UTC
From: Ashish Kalra <ashish.kalra@amd.com>

Pages are unsafe to be released back to the page-allocator, if they
have been transitioned to firmware/guest state and can't be reclaimed
or transitioned back to hypervisor/shared state. In this case add
them to an internal leaked pages list to ensure that they are not freed
or touched/accessed to cause fatal page faults.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: relocate to arch/x86/virt/svm/sev.c]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/virt/svm/sev.c    | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

Comments

Vlastimil Babka Jan. 8, 2024, 10:45 a.m. UTC | #1
On 12/30/23 17:19, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Pages are unsafe to be released back to the page-allocator, if they
> have been transitioned to firmware/guest state and can't be reclaimed
> or transitioned back to hypervisor/shared state. In this case add
> them to an internal leaked pages list to ensure that they are not freed
> or touched/accessed to cause fatal page faults.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: relocate to arch/x86/virt/svm/sev.c]
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Hi, sorry I didn't respond in time to the last mail discussing previous
version in
https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
due to upcoming holidays.

I would rather avoid the approach of allocating container objects:
- it's allocating memory when effectively losing memory, a dangerous thing
- are all the callers and their context ok with GFP_KERNEL?
- GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
current process, it's probably not its fault the pages are leaked? Also the
charging can fail?
- given the benefit of having leaked pages on a list is basically just
debugging (i.e. crash dump or drgn inspection) this seems too heavy

I think it would be better and sufficient to use page->lru for order-0 and
head pages, and simply skip tail pages (possibly with adjusted warning
message for that case).

Vlastimil

<snip>

> +
> +void snp_leak_pages(u64 pfn, unsigned int npages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	struct leaked_page *leak;
> +
> +	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> +
> +	spin_lock(&snp_leaked_pages_list_lock);
> +	while (npages--) {
> +		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
> +		if (!leak)
> +			goto unlock;

Should we skip the dump_rmpentry() in such a case?

> +		leak->page = page;
> +		list_add_tail(&leak->list, &snp_leaked_pages_list);
> +		dump_rmpentry(pfn);
> +		snp_nr_leaked_pages++;
> +		pfn++;
> +		page++;
> +	}
> +unlock:
> +	spin_unlock(&snp_leaked_pages_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);
Kalra, Ashish Jan. 9, 2024, 10:19 p.m. UTC | #2
Hello Vlastimil,

On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
> On 12/30/23 17:19, Michael Roth wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> Pages are unsafe to be released back to the page-allocator, if they
>> have been transitioned to firmware/guest state and can't be reclaimed
>> or transitioned back to hypervisor/shared state. In this case add
>> them to an internal leaked pages list to ensure that they are not freed
>> or touched/accessed to cause fatal page faults.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Hi, sorry I didn't respond in time to the last mail discussing previous
> version in
> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
> due to upcoming holidays.
>
> I would rather avoid the approach of allocating container objects:
> - it's allocating memory when effectively losing memory, a dangerous thing
> - are all the callers and their context ok with GFP_KERNEL?
> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
> current process, it's probably not its fault the pages are leaked? Also the
> charging can fail?
> - given the benefit of having leaked pages on a list is basically just
> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>
> I think it would be better and sufficient to use page->lru for order-0 and
> head pages, and simply skip tail pages (possibly with adjusted warning
> message for that case).
>
> Vlastimil
>
> <snip

Considering the above thoughts, this is updated version of 
snp_leak_pages(), looking forward to any review comments/feedback you 
have on the same:

void snp_leak_pages(u64 pfn, unsigned int npages)
{
         struct page *page = pfn_to_page(pfn);

         pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, 
pfn, pfn + npages);

         spin_lock(&snp_leaked_pages_list_lock);
         while (npages--) {
                 /*
                  * Reuse the page's buddy list for chaining into the leaked
                  * pages list. This page should not be on a free list 
currently
                  * and is also unsafe to be added to a free list.
                  */
                 if ((likely(!PageCompound(page))) || (PageCompound(page) &&
                     !PageTail(page) && compound_head(page) == page))
                         /*
                          * Skip inserting tail pages of compound page as
                          * page->buddy_list of tail pages is not usable.
                          */
                         list_add_tail(&page->buddy_list, 
&snp_leaked_pages_list);
                 sev_dump_rmpentry(pfn);
                 snp_nr_leaked_pages++;
                 pfn++;
                 page++;
         }
         spin_unlock(&snp_leaked_pages_list_lock);
}

Thanks, Ashish
Vlastimil Babka Jan. 10, 2024, 8:59 a.m. UTC | #3
On 1/9/24 23:19, Kalra, Ashish wrote:
> Hello Vlastimil,
> 
> On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
>> On 12/30/23 17:19, Michael Roth wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> Pages are unsafe to be released back to the page-allocator, if they
>>> have been transitioned to firmware/guest state and can't be reclaimed
>>> or transitioned back to hypervisor/shared state. In this case add
>>> them to an internal leaked pages list to ensure that they are not freed
>>> or touched/accessed to cause fatal page faults.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> Hi, sorry I didn't respond in time to the last mail discussing previous
>> version in
>> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
>> due to upcoming holidays.
>>
>> I would rather avoid the approach of allocating container objects:
>> - it's allocating memory when effectively losing memory, a dangerous thing
>> - are all the callers and their context ok with GFP_KERNEL?
>> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
>> current process, it's probably not its fault the pages are leaked? Also the
>> charging can fail?
>> - given the benefit of having leaked pages on a list is basically just
>> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>>
>> I think it would be better and sufficient to use page->lru for order-0 and
>> head pages, and simply skip tail pages (possibly with adjusted warning
>> message for that case).
>>
>> Vlastimil
>>
>> <snip
> 
> Considering the above thoughts, this is updated version of 
> snp_leak_pages(), looking forward to any review comments/feedback you 
> have on the same:
> 
> void snp_leak_pages(u64 pfn, unsigned int npages)
> {
>          struct page *page = pfn_to_page(pfn);
> 
>          pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, 
> pfn, pfn + npages);
> 
>          spin_lock(&snp_leaked_pages_list_lock);
>          while (npages--) {
>                  /*
>                   * Reuse the page's buddy list for chaining into the leaked
>                   * pages list. This page should not be on a free list 
> currently
>                   * and is also unsafe to be added to a free list.
>                   */
>                  if ((likely(!PageCompound(page))) || (PageCompound(page) &&
>                      !PageTail(page) && compound_head(page) == page))

This is unnecessarily paranoid wrt that compound_head(page) test, but OTOH
doesn't handle the weird case when we're leaking less than whole compound
page (if that can even happen). So I'd suggest:

while (npages) {

  if ((likely(!PageCompound(page))) || (PageHead(page) && compound_nr(page)
<= npages))
	list_add_tail(&page->buddy_list, ...)
  }

  ... (no change from yours)

  npages--;
}

(or an equivalent for()) perhaps

>                          /*
>                           * Skip inserting tail pages of compound page as
>                           * page->buddy_list of tail pages is not usable.
>                           */
>                          list_add_tail(&page->buddy_list, 
> &snp_leaked_pages_list);
>                  sev_dump_rmpentry(pfn);
>                  snp_nr_leaked_pages++;
>                  pfn++;
>                  page++;
>          }
>          spin_unlock(&snp_leaked_pages_list_lock);
> }
> 
> Thanks, Ashish
>
diff mbox series

Patch

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d3ccb7a0c7e9..435ba9bc4510 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -264,6 +264,7 @@  void snp_dump_hva_rmpentry(unsigned long address);
 int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -275,6 +276,7 @@  static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 	return -ENODEV;
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ee182351d93a..0f2e1ce241b5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -60,6 +60,17 @@  static u64 probed_rmp_base, probed_rmp_size;
 static struct rmpentry *rmptable __ro_after_init;
 static u64 rmptable_max_pfn __ro_after_init;
 
+/* List of pages which are leaked and cannot be reclaimed */
+struct leaked_page {
+	struct page *page;
+	struct list_head list;
+};
+
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static unsigned long snp_nr_leaked_pages;
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"SEV-SNP: " fmt
 
@@ -476,3 +487,27 @@  int rmp_make_shared(u64 pfn, enum pg_level level)
 	return rmpupdate(pfn, &state);
 }
 EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct leaked_page *leak;
+
+	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+	spin_lock(&snp_leaked_pages_list_lock);
+	while (npages--) {
+		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
+		if (!leak)
+			goto unlock;
+		leak->page = page;
+		list_add_tail(&leak->list, &snp_leaked_pages_list);
+		dump_rmpentry(pfn);
+		snp_nr_leaked_pages++;
+		pfn++;
+		page++;
+	}
+unlock:
+	spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);