
KVM: arm64: Consider NUMA affinity when allocating per-CPU stack_page

Message ID 20240415033614.43518-1-lirongqing@baidu.com (mailing list archive)
State New
Series KVM: arm64: Consider NUMA affinity when allocating per-CPU stack_page

Commit Message

Li,Rongqing April 15, 2024, 3:36 a.m. UTC
The per-CPU hyp stack pages are predominantly accessed from the CPU they
belong to, so allocate them node-local to improve performance.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/arm64/kvm/arm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
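
A minimal sketch of the resulting pattern (illustration only, not part of the
diff; it assumes the existing teardown path keeps freeing the stack page via
free_page() on the stored virtual address):

	/* Prefer the CPU's home node; alloc_pages_node() may still fall back
	 * to other nodes, so behaviour matches __get_free_page() apart from
	 * placement. */
	struct page *page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
	unsigned long stack_page = page ? (unsigned long)page_address(page) : 0UL;

	/* page_address() returns the linear-map address, so it can later be
	 * released the usual way: */
	free_page(stack_page);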

Comments

Marc Zyngier April 15, 2024, 7:36 a.m. UTC | #1
On Mon, 15 Apr 2024 04:36:14 +0100,
Li RongQing <lirongqing@baidu.com> wrote:
> 
> per-CPU stack_page are dominantly accessed from their own local CPUs,
> so allocate them node-local to improve performance.

Do you have any performance data to back this up?

Given that this is only used in the non-VHE case, and that by doing so
you have left quite a lot of performance on the floor already, I'm
even more surprised to see the performance argument.

Furthermore, you don't address the allocation of per-CPU data, which
has a much larger potential impact, given how the HYP code is
structured.

> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  arch/arm64/kvm/arm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c4a0a35..d745d01 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2330,15 +2330,15 @@ static int __init init_hyp_mode(void)
>  	 * Allocate stack pages for Hypervisor-mode
>  	 */
>  	for_each_possible_cpu(cpu) {
> -		unsigned long stack_page;
> +		struct page *page;
>  
> -		stack_page = __get_free_page(GFP_KERNEL);
> -		if (!stack_page) {
> +		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
> +		if (!page) {
>  			err = -ENOMEM;
>  			goto out_err;
>  		}
>  
> -		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
> +		per_cpu(kvm_arm_hyp_stack_page, cpu) = page_address(page);
>  	}
>  
>  	/*

The patch itself looks correct, but it is partial and lacks convincing
data.

	M.
Li,Rongqing April 18, 2024, 6:53 a.m. UTC | #2
> Li RongQing <lirongqing@baidu.com> wrote:
> >
> > per-CPU stack_page are dominantly accessed from their own local CPUs,
> > so allocate them node-local to improve performance.
> 
> Do you have any performance data to back this up?
> 
> Given that this is only used in the non-VHE case, and that by doing so you have
> left quite a lot of performance on the floor already, I'm even more surprised to
> see the performance argument.
> 

Sorry, I do not have a setup to test it yet.

> Furthermore, you don't address the allocation of per-CPU data, which has a
> much larger potential impact, given how the HYP code is structured.
> 

I will add it in V2, along the lines of the sketch below.
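
This is only a rough, untested sketch; the helper names are taken from the
current init_hyp_mode() per-CPU loop in arm.c and may need adjusting:

-		page = alloc_pages(GFP_KERNEL, nvhe_percpu_order());
+		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL,
+					nvhe_percpu_order());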

Thanks

-LiRongQing
Marc Zyngier April 18, 2024, 9:03 a.m. UTC | #3
On Thu, 18 Apr 2024 07:53:58 +0100,
"Li,Rongqing" <lirongqing@baidu.com> wrote:
> > Li RongQing <lirongqing@baidu.com> wrote:
> > >
> > > per-CPU stack_page are dominantly accessed from their own local CPUs,
> > > so allocate them node-local to improve performance.
> > 
> > Do you have any performance data to back this up?
> > 
> > Given that this is only used in the non-VHE case, and that by doing so you have
> > left quite a lot of performance on the floor already, I'm even more surprised to
> > see the performance argument.
> > 
> 
> Sorry, I have not setup to test it.
> 
> > Furthermore, you don't address the allocation of per-CPU data, which has a
> > much larger potential impact, given how the HYP code is structured.
> > 
> 
> I will add it in V2

Please test your patch before posting it. I'm not interested in
looking at patches that have not been tested by their author.

Thanks,

	M.

Patch

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c4a0a35..d745d01 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2330,15 +2330,15 @@ static int __init init_hyp_mode(void)
 	 * Allocate stack pages for Hypervisor-mode
 	 */
 	for_each_possible_cpu(cpu) {
-		unsigned long stack_page;
+		struct page *page;
 
-		stack_page = __get_free_page(GFP_KERNEL);
-		if (!stack_page) {
+		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
+		if (!page) {
 			err = -ENOMEM;
 			goto out_err;
 		}
 
-		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = page_address(page);
 	}
 
 	/*