| Message ID | 20240415033614.43518-1-lirongqing@baidu.com (mailing list archive) |
|---|---|
| State | New |
| Series | KVM: arm64: Consider NUMA affinity when allocating per-CPU stack_page |
On Mon, 15 Apr 2024 04:36:14 +0100,
Li RongQing <lirongqing@baidu.com> wrote:
>
> per-CPU stack_page are dominantly accessed from their own local CPUs,
> so allocate them node-local to improve performance.

Do you have any performance data to back this up?

Given that this is only used in the non-VHE case, and that by doing so
you have left quite a lot of performance on the floor already, I'm
even more surprised to see the performance argument.

Furthermore, you don't address the allocation of per-CPU data, which
has a much larger potential impact, given how the HYP code is
structured.

>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  arch/arm64/kvm/arm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c4a0a35..d745d01 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2330,15 +2330,15 @@ static int __init init_hyp_mode(void)
>  	 * Allocate stack pages for Hypervisor-mode
>  	 */
>  	for_each_possible_cpu(cpu) {
> -		unsigned long stack_page;
> +		struct page *page;
>
> -		stack_page = __get_free_page(GFP_KERNEL);
> -		if (!stack_page) {
> +		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
> +		if (!page) {
>  			err = -ENOMEM;
>  			goto out_err;
>  		}
>
> -		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
> +		per_cpu(kvm_arm_hyp_stack_page, cpu) = page_address(page);
>  	}
>
>  	/*

The patch itself looks correct, though partial and lacking convincing
data.

	M.
> Li RongQing <lirongqing@baidu.com> wrote:
> >
> > per-CPU stack_page are dominantly accessed from their own local CPUs,
> > so allocate them node-local to improve performance.
>
> Do you have any performance data to back this up?
>
> Given that this is only used in the non-VHE case, and that by doing so
> you have left quite a lot of performance on the floor already, I'm
> even more surprised to see the performance argument.
>

Sorry, I have not setup to test it.

> Furthermore, you don't address the allocation of per-CPU data, which
> has a much larger potential impact, given how the HYP code is
> structured.
>

I will add it in V2

Thanks

-LiRongQing
On Thu, 18 Apr 2024 07:53:58 +0100,
"Li,Rongqing" <lirongqing@baidu.com> wrote:
>
> > Li RongQing <lirongqing@baidu.com> wrote:
> > >
> > > per-CPU stack_page are dominantly accessed from their own local CPUs,
> > > so allocate them node-local to improve performance.
> >
> > Do you have any performance data to back this up?
> >
> > Given that this is only used in the non-VHE case, and that by doing so
> > you have left quite a lot of performance on the floor already, I'm
> > even more surprised to see the performance argument.
> >
>
> Sorry, I have not setup to test it.
>
> > Furthermore, you don't address the allocation of per-CPU data, which
> > has a much larger potential impact, given how the HYP code is
> > structured.
> >
>
> I will add it in V2

Please test your patch before posting it. I'm not interested in
looking at patches that have not been tested by their author.

Thanks,

	M.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c4a0a35..d745d01 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2330,15 +2330,15 @@ static int __init init_hyp_mode(void)
 	 * Allocate stack pages for Hypervisor-mode
 	 */
 	for_each_possible_cpu(cpu) {
-		unsigned long stack_page;
+		struct page *page;
 
-		stack_page = __get_free_page(GFP_KERNEL);
-		if (!stack_page) {
+		page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
+		if (!page) {
 			err = -ENOMEM;
 			goto out_err;
 		}
 
-		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
+		per_cpu(kvm_arm_hyp_stack_page, cpu) = page_address(page);
 	}
 
 	/*
per-CPU stack_page are dominantly accessed from their own local CPUs,
so allocate them node-local to improve performance.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/arm64/kvm/arm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)