
[v1,01/11] KVM: s390: pv: leak the ASCE page when destroy fails

Message ID 20210517200758.22593-2-imbrenda@linux.ibm.com (mailing list archive)
State New
Series KVM: s390: pv: implement lazy destroy

Commit Message

Claudio Imbrenda May 17, 2021, 8:07 p.m. UTC
When the destroy configuration UVC fails, the page pointed to by the
ASCE of the VM becomes poisoned and, to avoid issues, must not be
used again.

Since the page is then unusable in practice, we set it aside and leak it.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/pv.c | 53 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 52 insertions(+), 1 deletion(-)

Comments

Janosch Frank May 18, 2021, 10:26 a.m. UTC | #1
On 5/17/21 10:07 PM, Claudio Imbrenda wrote:
> When the destroy configuration UVC fails, the page pointed to by the
> ASCE of the VM becomes poisoned, and, to avoid issues it must not be
> used again.
> 
> Since the page becomes in practice unusable, we set it aside and leak it.

I think we need something a bit more specific.

On creation of a protected guest, the topmost level of page tables is
marked by the Ultravisor and can only be used as top-level page tables
for the protected guest that was created. If another protected guest
were to re-use those pages for its top-level page tables, the UV would
throw errors.

When a destroy fails, the UV will not remove the markings, so these pages
are basically unusable, since we can't guarantee that they won't be used
for a guest ASCE in the future.

Hence we choose to leak those pages in the very unlikely event that a
destroy fails.


LGTM

> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  arch/s390/kvm/pv.c | 53 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 813b6e93dc83..e0532ab725bf 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -150,6 +150,55 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>  	return -ENOMEM;
>  }
>  
> +/*
> + * Remove the topmost level of page tables from the list of page tables of
> + * the gmap.
> + * This means that it will not be freed when the VM is torn down, and needs
> + * to be handled separately by the caller, unless an intentional leak is
> + * intended.
> + */
> +static void kvm_s390_pv_remove_old_asce(struct kvm *kvm)
> +{
> +	struct page *old;
> +
> +	old = virt_to_page(kvm->arch.gmap->table);
> +	list_del(&old->lru);
> +	/* in case the ASCE needs to be "removed" multiple times */
> +	INIT_LIST_HEAD(&old->lru);

?

> +}
> +
> +/*
> + * Try to replace the current ASCE with another equivalent one.
> + * If the allocation of the new top level page table fails, the ASCE is not
> + * replaced.
> + * In any case, the old ASCE is removed from the list, therefore the caller
> + * has to make sure to save a pointer to it beforehands, unless an
> + * intentional leak is intended.
> + */
> +static int kvm_s390_pv_replace_asce(struct kvm *kvm)
> +{
> +	unsigned long asce;
> +	struct page *page;
> +	void *table;
> +
> +	kvm_s390_pv_remove_old_asce(kvm);
> +
> +	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> +	if (!page)
> +		return -ENOMEM;
> +	list_add(&page->lru, &kvm->arch.gmap->crst_list);
> +
> +	table = page_to_virt(page);
> +	memcpy(table, kvm->arch.gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
> +
> +	asce = (kvm->arch.gmap->asce & ~PAGE_MASK) | __pa(table);
> +	WRITE_ONCE(kvm->arch.gmap->asce, asce);
> +	WRITE_ONCE(kvm->mm->context.gmap_asce, asce);
> +	WRITE_ONCE(kvm->arch.gmap->table, table);
> +
> +	return 0;
> +}
> +
>  /* this should not fail, but if it does, we must not free the donated memory */
>  int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>  {
> @@ -164,9 +213,11 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>  	atomic_set(&kvm->mm->context.is_protected, 0);
>  	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
>  	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
> -	/* Inteded memory leak on "impossible" error */
> +	/* Intended memory leak on "impossible" error */
>  	if (!cc)
>  		kvm_s390_pv_dealloc_vm(kvm);
> +	else
> +		kvm_s390_pv_replace_asce(kvm);
>  	return cc ? -EIO : 0;
>  }
>  
>
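On the "?" next to INIT_LIST_HEAD() above: the kernel's list_del() leaves the removed entry's next/prev pointers poisoned, so a second list_del() on the same entry would dereference poison values. Re-initialising the entry makes it point to itself, which turns any later removal into a no-op; that seems to be what the "removed multiple times" comment in the patch refers to. Below is a minimal userspace sketch of that behaviour. The struct and function names are stand-ins written for this example and are not the kernel implementation (in particular, the sketch does not poison pointers).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

/* Make a node (or a list head) point to itself, i.e. an empty list. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry from its neighbours. The entry's own pointers are left
 * stale here; the kernel's list_del() would poison them instead.
 */
static void list_unlink(struct list_head *entry)
{
	assert(entry->next != NULL && entry->prev != NULL);
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Unlink, then re-initialise so that unlinking again is a harmless no-op. */
static void list_unlink_init(struct list_head *entry)
{
	list_unlink(entry);
	init_list_head(entry);
}

/* A list is empty when its head points to itself. */
static int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Calling list_unlink_init() twice on the same entry leaves the rest of the list untouched, because the second call only rewrites the entry's self-pointers. In the kernel, list_del_init() combines the same two steps.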
Claudio Imbrenda May 18, 2021, 10:40 a.m. UTC | #2
On Tue, 18 May 2021 12:26:51 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 5/17/21 10:07 PM, Claudio Imbrenda wrote:
> > When the destroy configuration UVC fails, the page pointed to by the
> > ASCE of the VM becomes poisoned, and, to avoid issues it must not be
> > used again.
> > 
> > Since the page becomes in practice unusable, we set it aside and
> > leak it.  
> 
> I think we need something a bit more specific.
> 
> On creation of a protected guest the top most level of page tables are
> marked by the Ultravisor and can only be used as top level page tables
> for the protected guest that was created. If another protected guest
> would re-use those pages for its top level page tables the UV would
> throw errors.
> 
> When a destroy fails the UV will not remove the markings so these
> pages are basically unusable since we can't guarantee that they won't
> be used for a guest ASCE in the future.
> 
> Hence we choose to leak those pages in the very unlikely event that a
> destroy fails.

It's more than that. The top level page, once marked, also cannot be
used as backing for the virtual and real memory areas donated with the
create secure configuration and create secure cpu UVCs.

And there might also be other circumstances in which that page cannot
be used that I am not aware of.

Janosch Frank May 18, 2021, noon UTC | #3
On 5/18/21 12:40 PM, Claudio Imbrenda wrote:
> On Tue, 18 May 2021 12:26:51 +0200
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 5/17/21 10:07 PM, Claudio Imbrenda wrote:
>>> When the destroy configuration UVC fails, the page pointed to by the
>>> ASCE of the VM becomes poisoned, and, to avoid issues it must not be
>>> used again.
>>>
>>> Since the page becomes in practice unusable, we set it aside and
>>> leak it.  
>>
>> I think we need something a bit more specific.
>>
>> On creation of a protected guest the top most level of page tables are
>> marked by the Ultravisor and can only be used as top level page tables
>> for the protected guest that was created. If another protected guest
>> would re-use those pages for its top level page tables the UV would
>> throw errors.
>>
>> When a destroy fails the UV will not remove the markings so these
>> pages are basically unusable since we can't guarantee that they won't
>> be used for a guest ASCE in the future.
>>
>> Hence we choose to leak those pages in the very unlikely event that a
>> destroy fails.
> 
> it's more than that. the top level page, once marked, also cannot be
> used as backing for the virtual and real memory areas donated with the
> create secure configuration and create secure cpu UVCs.
> 
> and there might also other circumstances in which that page cannot be
> used that I am not aware of
> 

Even more reason to document it :)


Patch

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 813b6e93dc83..e0532ab725bf 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -150,6 +150,55 @@  static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	return -ENOMEM;
 }
 
+/*
+ * Remove the topmost level of page tables from the list of page tables of
+ * the gmap.
+ * This means that it will not be freed when the VM is torn down, and needs
+ * to be handled separately by the caller, unless the leak is
+ * intentional.
+ */
+static void kvm_s390_pv_remove_old_asce(struct kvm *kvm)
+{
+	struct page *old;
+
+	old = virt_to_page(kvm->arch.gmap->table);
+	list_del(&old->lru);
+	/* in case the ASCE needs to be "removed" multiple times */
+	INIT_LIST_HEAD(&old->lru);
+}
+
+/*
+ * Try to replace the current ASCE with another equivalent one.
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is removed from the list, therefore the caller
+ * has to make sure to save a pointer to it beforehand, unless the
+ * leak is intentional.
+ */
+static int kvm_s390_pv_replace_asce(struct kvm *kvm)
+{
+	unsigned long asce;
+	struct page *page;
+	void *table;
+
+	kvm_s390_pv_remove_old_asce(kvm);
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+	if (!page)
+		return -ENOMEM;
+	list_add(&page->lru, &kvm->arch.gmap->crst_list);
+
+	table = page_to_virt(page);
+	memcpy(table, kvm->arch.gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
+
+	asce = (kvm->arch.gmap->asce & ~PAGE_MASK) | __pa(table);
+	WRITE_ONCE(kvm->arch.gmap->asce, asce);
+	WRITE_ONCE(kvm->mm->context.gmap_asce, asce);
+	WRITE_ONCE(kvm->arch.gmap->table, table);
+
+	return 0;
+}
+
 /* this should not fail, but if it does, we must not free the donated memory */
 int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
@@ -164,9 +213,11 @@  int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
-	/* Inteded memory leak on "impossible" error */
+	/* Intended memory leak on "impossible" error */
 	if (!cc)
 		kvm_s390_pv_dealloc_vm(kvm);
+	else
+		kvm_s390_pv_replace_asce(kvm);
 	return cc ? -EIO : 0;
 }
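
The arithmetic in kvm_s390_pv_replace_asce() can be tried out in userspace. The sketch below assumes PAGE_SHIFT is 12 and CRST_ALLOC_ORDER is 2 (believed to match s390, but treat both as assumptions); the SKETCH_ macros and sketch_ functions are names made up for this example. It shows the size of the copied table and how the new ASCE keeps the low control bits of the old one while taking the origin of the new table.

```c
#include <assert.h>
#include <string.h>

/* Assumed values for illustration; the kernel takes them from its headers. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_CRST_ORDER	2

/* Number of bytes in the top level table, as passed to memcpy(). */
static unsigned long sketch_crst_bytes(void)
{
	return 1UL << (SKETCH_CRST_ORDER + SKETCH_PAGE_SHIFT);
}

/* Copy a whole top level table from src to dst. */
static void sketch_copy_table(void *dst, const void *src)
{
	memcpy(dst, src, sketch_crst_bytes());
}

/*
 * Keep the bits of the old ASCE below the page boundary and combine
 * them with the (page aligned) address of the new table.
 */
static unsigned long sketch_new_asce(unsigned long old_asce,
				     unsigned long new_table_pa)
{
	assert((new_table_pa & ~SKETCH_PAGE_MASK) == 0);
	return (old_asce & ~SKETCH_PAGE_MASK) | new_table_pa;
}
```

With these assumed values the copy covers 16384 bytes, and an old ASCE of 0x123451c7 combined with a new table at 0xabcde000 yields 0xabcde1c7: only the origin changes, the low bits are carried over.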