diff mbox series

[v3,03/14] KVM: s390: pv: leak the ASCE page when destroy fails

Message ID 20210804154046.88552-4-imbrenda@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series KVM: s390: pv: implement lazy destroy | expand

Commit Message

Claudio Imbrenda Aug. 4, 2021, 3:40 p.m. UTC
When a protected VM is created, the topmost level of page tables of its
ASCE is marked by the Ultravisor; any attempt to use that memory for
protected virtualization will result in failure.

Only a successful Destroy Configuration UVC will remove the marking.

When the Destroy Configuration UVC fails, the topmost level of page
tables of the VM does not get its marking cleared; to avoid issues it
must not be used again.

Since the page becomes in practice unusable, we set it aside and leak it.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h |  2 ++
 arch/s390/kvm/pv.c           |  4 ++-
 arch/s390/mm/gmap.c          | 55 ++++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 1 deletion(-)

Comments

David Hildenbrand Aug. 6, 2021, 7:31 a.m. UTC | #1
On 04.08.21 17:40, Claudio Imbrenda wrote:
> When a protected VM is created, the topmost level of page tables of its
> ASCE is marked by the Ultravisor; any attempt to use that memory for
> protected virtualization will result in failure.
> 
> Only a successful Destroy Configuration UVC will remove the marking.
> 
> When the Destroy Configuration UVC fails, the topmost level of page
> tables of the VM does not get its marking cleared; to avoid issues it
> must not be used again.
> 
> Since the page becomes in practice unusable, we set it aside and leak it.

Instead of leaking, can't we add it to some list and try again later? Or 
do we only expect permanent errors?

Also, we really should bail out loud (pr_warn) to tell the admin that 
something really nasty is going on.
Claudio Imbrenda Aug. 6, 2021, 9:32 a.m. UTC | #2
On Fri, 6 Aug 2021 09:31:54 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 04.08.21 17:40, Claudio Imbrenda wrote:
> > When a protected VM is created, the topmost level of page tables of
> > its ASCE is marked by the Ultravisor; any attempt to use that
> > memory for protected virtualization will result in failure.
> > 
> > Only a successful Destroy Configuration UVC will remove the marking.
> > 
> > When the Destroy Configuration UVC fails, the topmost level of page
> > tables of the VM does not get its marking cleared; to avoid issues
> > it must not be used again.
> > 
> > Since the page becomes in practice unusable, we set it aside and
> > leak it.  
> 
> Instead of leaking, can't we add it to some list and try again later?
> Or do we only expect permanent errors?

once the secure VM has been destroyed unsuccessfully, there is nothing
that can be done, this is a permanent error

> Also, we really should bail out loud (pr_warn) to tell the admin that 
> something really nasty is going on.

when a destroy secure VM UVC fails, there are already other warnings
printed, no need to add one more
David Hildenbrand Aug. 6, 2021, 11:39 a.m. UTC | #3
On 06.08.21 11:32, Claudio Imbrenda wrote:
> On Fri, 6 Aug 2021 09:31:54 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 04.08.21 17:40, Claudio Imbrenda wrote:
>>> When a protected VM is created, the topmost level of page tables of
>>> its ASCE is marked by the Ultravisor; any attempt to use that
>>> memory for protected virtualization will result in failure.
>>>
>>> Only a successful Destroy Configuration UVC will remove the marking.
>>>
>>> When the Destroy Configuration UVC fails, the topmost level of page
>>> tables of the VM does not get its marking cleared; to avoid issues
>>> it must not be used again.
>>>
>>> Since the page becomes in practice unusable, we set it aside and
>>> leak it.
>>
>> Instead of leaking, can't we add it to some list and try again later?
>> Or do we only expect permanent errors?
> 
> once the secure VM has been destroyed unsuccessfully, there is nothing
> that can be done, this is a permanent error
> 
>> Also, we really should bail out loud (pr_warn) to tell the admin that
>> something really nasty is going on.
> 
> when a destroy secure VM UVC fails, there are already other warnings
> printed, no need to add one more
> 

Okay, makes sense then to me, thanks! Might be worth adding some of that 
info to the patch description.
diff mbox series

Patch

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 40264f60b0da..746e18bf8984 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -148,4 +148,6 @@  void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int gmap_mark_unmergeable(void);
 void s390_reset_acc(struct mm_struct *mm);
+void s390_remove_old_asce(struct gmap *gmap);
+int s390_replace_asce(struct gmap *gmap);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index e007df11a2fe..f3dd5dd00013 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -169,9 +169,11 @@  int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
-	/* Inteded memory leak on "impossible" error */
+	/* Intended memory leak on "impossible" error */
 	if (!cc)
 		kvm_s390_pv_dealloc_vm(kvm);
+	else
+		s390_replace_asce(kvm->arch.gmap);
 	return cc ? -EIO : 0;
 }
 
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 9bb2c7512cd5..5a138f6220c4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2706,3 +2706,58 @@  void s390_reset_acc(struct mm_struct *mm)
 	mmput(mm);
 }
 EXPORT_SYMBOL_GPL(s390_reset_acc);
+
+/*
+ * Remove the topmost level of page tables from the list of page tables of
+ * the gmap.
+ * This means that it will not be freed when the VM is torn down, and needs
+ * to be handled separately by the caller, unless an intentional leak is
+ * intended.
+ */
+void s390_remove_old_asce(struct gmap *gmap)
+{
+	struct page *old;
+
+	old = virt_to_page(gmap->table);
+	spin_lock(&gmap->guest_table_lock);
+	list_del(&old->lru);
+	spin_unlock(&gmap->guest_table_lock);
+	/* in case the ASCE needs to be "removed" multiple times */
+	INIT_LIST_HEAD(&old->lru);
+}
+EXPORT_SYMBOL_GPL(s390_remove_old_asce);
+
+/*
+ * Try to replace the current ASCE with another equivalent one.
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is removed from the list, therefore the caller
+ * has to make sure to save a pointer to it beforehands, unless an
+ * intentional leak is intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	unsigned long asce;
+	struct page *page;
+	void *table;
+
+	s390_remove_old_asce(gmap);
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+	if (!page)
+		return -ENOMEM;
+	table = page_to_virt(page);
+	memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
+
+	spin_lock(&gmap->guest_table_lock);
+	list_add(&page->lru, &gmap->crst_list);
+	spin_unlock(&gmap->guest_table_lock);
+
+	asce = (gmap->asce & ~PAGE_MASK) | __pa(table);
+	WRITE_ONCE(gmap->asce, asce);
+	WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
+	WRITE_ONCE(gmap->table, table);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(s390_replace_asce);