
[v5,02/25] KVM: arm64: Allow attaching of non-coalescable pages to a hyp pool

Message ID 20221020133827.5541-3-will@kernel.org (mailing list archive)
State New, archived
Series KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2

Commit Message

Will Deacon Oct. 20, 2022, 1:38 p.m. UTC
From: Quentin Perret <qperret@google.com>

All the contiguous pages used to initialize a 'struct hyp_pool' are
considered coalescable, which means that the hyp page allocator will
actively try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the initial
memory range given to a hyp_pool is currently unsupported.

In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy donation
of pages from the host to the hypervisor when allocating guest stage-2
page-table pages at EL2.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Oliver Upton Oct. 28, 2022, 12:17 a.m. UTC | #1
On Thu, Oct 20, 2022 at 02:38:04PM +0100, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> All the contiguous pages used to initialize a 'struct hyp_pool' are
> considered coalescable, which means that the hyp page allocator will
> actively try to merge them with their buddies on the hyp_put_page() path.
> However, using hyp_put_page() on a page that is not part of the initial
> memory range given to a hyp_pool is currently unsupported.
> 
> In order to allow dynamically extending hyp pools at run-time, add a
> check to __hyp_attach_page() to allow inserting 'external' pages into
> the free-list of order 0. This will be necessary to allow lazy donation
> of pages from the host to the hypervisor when allocating guest stage-2
> page-table pages at EL2.

Is it ever going to be the case that we wind up mixing static and
dynamic memory within the same buddy allocator? Reading ahead a bit it
would seem pKVM uses separate allocators (i.e. pkvm_hyp_vm::pool for
donated memory) but just wanted to make sure.

I suppose what I'm getting at is the fact that the pool range makes
little sense in this case. Adding a field to hyp_pool describing the
type of pool that it is would make this more readable, such that we know
a pool contains only donated memory, and thus zero order pages should
never be coalesced.
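
For illustration only, the kind of field being suggested here might be
sketched as below; the enum, the 'type' member of struct hyp_pool and the
helper name are all invented for the sketch, not part of the series:

/*
 * Hypothetical sketch of the suggestion above: tag each hyp_pool with the
 * kind of memory it manages ('type' would be a new struct hyp_pool field),
 * rather than inferring it from a physical range check.
 */
enum hyp_pool_type {
        HYP_POOL_STATIC,        /* carved out of the initial hyp memory range */
        HYP_POOL_DONATED,       /* refilled page by page by host donations */
};

static inline bool hyp_pool_can_coalesce(const struct hyp_pool *pool)
{
        /* Donated pools never hold physically contiguous buddies to merge. */
        return pool->type == HYP_POOL_STATIC;
}

__hyp_attach_page() could then test hyp_pool_can_coalesce(pool) instead of
comparing the page's physical address against the pool range.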

> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/hyp/nvhe/page_alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> index 1ded09fc9b10..0d15227aced8 100644
> --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> @@ -93,11 +93,15 @@ static inline struct hyp_page *node_to_page(struct list_head *node)
>  static void __hyp_attach_page(struct hyp_pool *pool,
>  			      struct hyp_page *p)
>  {
> +	phys_addr_t phys = hyp_page_to_phys(p);
>  	unsigned short order = p->order;
>  	struct hyp_page *buddy;
>  
>  	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
>  
> +	if (phys < pool->range_start || phys >= pool->range_end)
> +		goto insert;
> +

Assuming this is kept as-is...

This check reads really odd to me, but I understand how it applies to
the use case here. Perhaps create a helper (to be shared with
__find_buddy_nocheck()) and add a nice comment atop it describing the
significance of pages that exist outside the boundaries of the buddy
allocator.

--
Thanks,
Oliver
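
For illustration, the helper Oliver suggests above could be shaped roughly
like this (the name and comment are invented, not part of the patch):

/*
 * Hypothetical helper. Pages whose physical address falls outside
 * [range_start, range_end) were not handed to this pool at init time, so
 * their physical neighbours do not belong to the allocator and must never
 * be treated as buddies; such pages only ever sit on the order-0 free list.
 */
static inline bool hyp_phys_in_pool(const struct hyp_pool *pool,
                                    phys_addr_t phys)
{
        return phys >= pool->range_start && phys < pool->range_end;
}

The new check in __hyp_attach_page() would then read
'if (!hyp_phys_in_pool(pool, phys)) goto insert;', and
__find_buddy_nocheck() could reuse the same predicate for its range check.
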
Oliver Upton Oct. 28, 2022, 8:09 a.m. UTC | #2
On Fri, Oct 28, 2022 at 12:17:40AM +0000, Oliver Upton wrote:
> On Thu, Oct 20, 2022 at 02:38:04PM +0100, Will Deacon wrote:
> > From: Quentin Perret <qperret@google.com>
> > 
> > All the contiguous pages used to initialize a 'struct hyp_pool' are
> > considered coalescable, which means that the hyp page allocator will
> > actively try to merge them with their buddies on the hyp_put_page() path.
> > However, using hyp_put_page() on a page that is not part of the initial
> > memory range given to a hyp_pool is currently unsupported.
> > 
> > In order to allow dynamically extending hyp pools at run-time, add a
> > check to __hyp_attach_page() to allow inserting 'external' pages into
> > the free-list of order 0. This will be necessary to allow lazy donation
> > of pages from the host to the hypervisor when allocating guest stage-2
> > page-table pages at EL2.
> 
> Is it ever going to be the case that we wind up mixing static and
> dynamic memory within the same buddy allocator? Reading ahead a bit it
> would seem pKVM uses separate allocators (i.e. pkvm_hyp_vm::pool for
> donated memory) but just wanted to make sure.
> 
> I suppose what I'm getting at is the fact that the pool range makes
> little sense in this case. Adding a field to hyp_pool describing the
> type of pool that it is would make this more readable, such that we know
> a pool contains only donated memory, and thus zero order pages should
> never be coalesced.
> 
> > Tested-by: Vincent Donnefort <vdonnefort@google.com>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/nvhe/page_alloc.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > index 1ded09fc9b10..0d15227aced8 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > @@ -93,11 +93,15 @@ static inline struct hyp_page *node_to_page(struct list_head *node)
> >  static void __hyp_attach_page(struct hyp_pool *pool,
> >  			      struct hyp_page *p)
> >  {
> > +	phys_addr_t phys = hyp_page_to_phys(p);
> >  	unsigned short order = p->order;
> >  	struct hyp_page *buddy;
> >  
> >  	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
> >  
> > +	if (phys < pool->range_start || phys >= pool->range_end)
> > +		goto insert;
> > +
> 
> Assuming this is kept as-is...
> 
> This check reads really odd to me, but I understand how it applies to
> the use case here. Perhaps create a helper (to be shared with
> __find_buddy_nocheck()) and add a nice comment atop it describing the
> significance of pages that exist outside the boundaries of the buddy
> allocator.

Sorry, I'm a moron. The check in __find_buddy_nocheck() is of course
necessary and irrelevant to the comment I've made above. But maybe I've
proved my point by tripping over it? :-)

--
Thanks,
Oliver
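
(For reference, the __find_buddy_nocheck() check mentioned above is, roughly,
the early-out below, which stops the allocator from ever treating memory
outside the pool as a buddy:)

static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
                                             struct hyp_page *p,
                                             unsigned short order)
{
        phys_addr_t addr = hyp_page_to_phys(p);

        addr ^= (PAGE_SIZE << order);

        /* A buddy outside the pool range belongs to something else. */
        if (addr < pool->range_start || addr >= pool->range_end)
                return NULL;

        return hyp_phys_to_page(addr);
}
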
Quentin Perret Oct. 28, 2022, 9:28 a.m. UTC | #3
Hey Oliver,

On Friday 28 Oct 2022 at 00:17:40 (+0000), Oliver Upton wrote:
> On Thu, Oct 20, 2022 at 02:38:04PM +0100, Will Deacon wrote:
> > From: Quentin Perret <qperret@google.com>
> > 
> > All the contiguous pages used to initialize a 'struct hyp_pool' are
> > considered coalescable, which means that the hyp page allocator will
> > actively try to merge them with their buddies on the hyp_put_page() path.
> > However, using hyp_put_page() on a page that is not part of the initial
> > memory range given to a hyp_pool is currently unsupported.
> > 
> > In order to allow dynamically extending hyp pools at run-time, add a
> > check to __hyp_attach_page() to allow inserting 'external' pages into
> > the free-list of order 0. This will be necessary to allow lazy donation
> > of pages from the host to the hypervisor when allocating guest stage-2
> > page-table pages at EL2.
> 
> Is it ever going to be the case that we wind up mixing static and
> dynamic memory within the same buddy allocator? Reading ahead a bit it
> would seem pKVM uses separate allocators (i.e. pkvm_hyp_vm::pool for
> donated memory) but just wanted to make sure.

So, in the code we have that builds on top of this, yes, but to be
frank it's a bit of a corner case. The per-guest pool is initially
populated with a physically contiguous memory range that covers the
need to allocate the guest's stage-2 PGD, which may be up to 16
contiguous pages IIRC. But aside from that, all subsequent allocations
will be at order 0 granularity, and those pages are added to the pool
dynamically.

> I suppose what I'm getting at is the fact that the pool range makes
> little sense in this case. Adding a field to hyp_pool describing the
> type of pool that it is would make this more readable, such that we know
> a pool contains only donated memory, and thus zero order pages should
> never be coalesced.

Right. In fact I think it would be fair to say that the buddy allocator
is overkill for what we need so far. The only high-order allocations we
do are for concatenated stage-2 PGDs (host and guest), but these could
be easily special-cased, and we'd be left with only page-size allocs for
which a simple free list would do just fine. I've considered removing
the buddy allocator TBH, but it doesn't really hurt to have it, and it
might turn out to be useful in the future (e.g. SMMU support might
require high-order allocs IIUC, and the ability to coalesce would come in
handy then).
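
For illustration, the lifecycle described above boils down to something like
the sketch below (take_donated_page() is a placeholder for the lazy
host-donation path, and the vmemmap/refcount setup for the donated page is
elided):

/*
 * Sketch only: a per-guest pool serves order-0 stage-2 table pages and
 * falls back to host-donated pages when it runs dry.
 */
static void *guest_s2_alloc_example(struct hyp_pool *pool)
{
        void *addr = hyp_alloc_pages(pool, 0);

        if (!addr)
                addr = take_donated_page();     /* placeholder */

        return addr;
}

static void guest_s2_free_example(struct hyp_pool *pool, void *addr)
{
        /*
         * With this patch, a donated page freed here is attached to the
         * pool's order-0 free list even though its physical address lies
         * outside [range_start, range_end), so hyp_alloc_pages() can hand
         * it out again later.
         */
        hyp_put_page(pool, addr);
}
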
Quentin Perret Oct. 28, 2022, 9:29 a.m. UTC | #4
On Friday 28 Oct 2022 at 08:09:28 (+0000), Oliver Upton wrote:
> On Fri, Oct 28, 2022 at 12:17:40AM +0000, Oliver Upton wrote:
> > Assuming this is kept as-is...
> > 
> > This check reads really odd to me, but I understand how it applies to
> > the use case here. Perhaps create a helper (to be shared with
> > __find_buddy_nocheck()) and add a nice comment atop it describing the
> > significance of pages that exist outside the boundaries of the buddy
> > allocator.
> 
> Sorry, I'm a moron. The check in __find_buddy_nocheck() is of course
> necessary and irrelevant to the comment I've made above. But maybe I've
> proved my point by tripping over it? :-)

A comment won't hurt for sure, I'll add that for the next version.

Cheers!
Quentin

Patch

diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 1ded09fc9b10..0d15227aced8 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -93,11 +93,15 @@  static inline struct hyp_page *node_to_page(struct list_head *node)
 static void __hyp_attach_page(struct hyp_pool *pool,
 			      struct hyp_page *p)
 {
+	phys_addr_t phys = hyp_page_to_phys(p);
 	unsigned short order = p->order;
 	struct hyp_page *buddy;
 
 	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
+	if (phys < pool->range_start || phys >= pool->range_end)
+		goto insert;
+
 	/*
 	 * Only the first struct hyp_page of a high-order page (otherwise known
 	 * as the 'head') should have p->order set. The non-head pages should
@@ -116,6 +120,7 @@  static void __hyp_attach_page(struct hyp_pool *pool,
 		p = min(p, buddy);
 	}
 
+insert:
 	/* Mark the new head, and insert it */
 	p->order = order;
 	page_add_to_list(p, &pool->free_area[order]);