diff mbox series

KVM: x86/mmu: Don't create kvm-nx-lpage-re kthread if not itlb_multihit

Message ID 1679555884-32544-1-git-send-email-lirongqing@baidu.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/mmu: Don't create kvm-nx-lpage-re kthread if not itlb_multihit | expand

Commit Message

Li RongQing March 23, 2023, 7:18 a.m. UTC
From: Li RongQing <lirongqing@baidu.com>

If the CPU does not have the X86_BUG_ITLB_MULTIHIT bug, there is no need
to create the kvm-nx-lpage-recovery kthread

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/x86/kvm/mmu/mmu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Sean Christopherson March 23, 2023, 2:20 p.m. UTC | #1
On Thu, Mar 23, 2023, lirongqing@baidu.com wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> is not needed to create

Unless userspace forces the mitigation to be enabled, which can be done while KVM
is running.  I agree that spinning up a kthread that is unlikely to be used is
less than ideal, but the ~8KiB or so overhead is per-VM and not really all that
notable, e.g. KVM's page tables easily exceed that.

The kthread could be spun up on demand, but that adds a non-trivial amount of
complexity due to the kthread being per-VM, and KVM and userspace would have to
deal with potential errors in a path that really shouldn't fail.

If we really want to avoid the overhead, one idea would be to add a "never" option
to the module param and make it sticky.
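Sean's "sticky" idea can be modeled in plain userspace C. The sketch below uses invented names (it is not the eventual KVM patch, which Sean posts later in the thread); it only shows the shape of a write handler that latches into a hard-disabled state and rejects all later writes:

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Model of a "sticky" module parameter: once the user writes "never",
 * the mitigation is hard-disabled and every later write fails with -EPERM. */
static bool hard_disabled;
static bool mitigation_on;

static int set_param(const char *val)
{
	if (hard_disabled)
		return -EPERM;		/* sticky: no further changes allowed */

	if (!strcmp(val, "never")) {
		mitigation_on = false;
		hard_disabled = true;	/* latch; cannot be undone at runtime */
	} else if (!strcmp(val, "on")) {
		mitigation_on = true;
	} else if (!strcmp(val, "off")) {
		mitigation_on = false;
	} else {
		return -EINVAL;
	}
	return 0;
}
```

The real kernel version would hang this off `kernel_param_ops.set`; the point here is just the one-way latch.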
Huang, Kai March 23, 2023, 10:32 p.m. UTC | #2
On Thu, 2023-03-23 at 07:20 -0700, Sean Christopherson wrote:
> On Thu, Mar 23, 2023, lirongqing@baidu.com wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> > 
> > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> > is not needed to create
> 
> Unless userspace forces the mitigation to be enabled, which can be done while KVM
> is running.  
> 

Wondering why userspace would want to force the mitigation to be enabled if the
CPU doesn't have such a bug?
Sean Christopherson March 23, 2023, 10:40 p.m. UTC | #3
On Thu, Mar 23, 2023, Huang, Kai wrote:
> On Thu, 2023-03-23 at 07:20 -0700, Sean Christopherson wrote:
> > On Thu, Mar 23, 2023, lirongqing@baidu.com wrote:
> > > From: Li RongQing <lirongqing@baidu.com>
> > > 
> > > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> > > is not needed to create
> > 
> > Unless userspace forces the mitigation to be enabled, which can be done while KVM
> > is running.
> > 
> 
> Wondering why does userspace want to force the mitigation to be enabled if CPU
> doesn't have such bug?

It's definitely useful for testing, but the real motivation is so that the
mitigation can be enabled without a kernel reboot (or reloading KVM), i.e. without
having to drain VMs off the host, if it turns out that the host CPU actually is
vulnerable.  I.e. to guard against "Nope, not vulnerable!  Oh, wait, just kidding!".
Huang, Kai March 23, 2023, 11:16 p.m. UTC | #4
On Thu, 2023-03-23 at 15:40 -0700, Sean Christopherson wrote:
> On Thu, Mar 23, 2023, Huang, Kai wrote:
> > On Thu, 2023-03-23 at 07:20 -0700, Sean Christopherson wrote:
> > > On Thu, Mar 23, 2023, lirongqing@baidu.com wrote:
> > > > From: Li RongQing <lirongqing@baidu.com>
> > > > 
> > > > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> > > > is not needed to create
> > > 
> > > Unless userspace forces the mitigation to be enabled, which can be done while KVM
> > > is running.
> > > 
> > 
> > Wondering why does userspace want to force the mitigation to be enabled if CPU
> > doesn't have such bug?
> 
> It's definitely useful for testing, but the real motivation is so that the
> mitgation can be enabled without a kernel reboot (or reloading KVM), i.e. without
> having to drain VMs off the host, if it turns out that the host CPU actually is
> vulnerable.  I.e. to guard against "Nope, not vulnerable!  Oh, wait, just kidding!".

Never thought about this case.  Thanks!
Li RongQing March 30, 2023, 8:18 a.m. UTC | #5
> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com>
> Sent: Thursday, March 23, 2023 10:20 PM
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: pbonzini@redhat.com; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de; dave.hansen@linux.intel.com; kvm@vger.kernel.org;
> x86@kernel.org
> Subject: Re: [PATCH] KVM: x86/mmu: Don't create kvm-nx-lpage-re kthread if
> not itlb_multihit
> 
> On Thu, Mar 23, 2023, lirongqing@baidu.com wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread is
> > not needed to create
> 
> Unless userspace forces the mitigation to be enabled, which can be done while
> KVM is running.  I agree that spinning up a kthread that is unlikely to be used is
> less than ideal, but the ~8KiB or so overhead is per-VM and not really all that
> notable, e.g. KVM's page tables easily exceed that.
> 

A thread will create lots of proc files, and slow down commands like ps.

-Li
Sean Christopherson March 30, 2023, 7:19 p.m. UTC | #6
On Thu, Mar 30, 2023, Li,Rongqing wrote:
> > From: Sean Christopherson <seanjc@google.com>
> > On Thu, Mar 23, 2023, lirongqing@baidu.com wrote:
> > > From: Li RongQing <lirongqing@baidu.com>
> > >
> > > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread is
> > > not needed to create
> > 
> > Unless userspace forces the mitigation to be enabled, which can be done while
> > KVM is running.  I agree that spinning up a kthread that is unlikely to be used is
> > less than ideal, but the ~8KiB or so overhead is per-VM and not really all that
> > notable, e.g. KVM's page tables easily exceed that.
> > 
> 
> A thread will create lots of proc file, and slow the command, like: ps

"ps" doesn't seem like a performance critical operation though.  Again, I'm not
completely against letting the admin opt-out entirely, but I do want sufficient
justification for the extra complexity, both in the code base and in documentation.
Robert Hoo May 2, 2023, 2:07 a.m. UTC | #7
On 3/23/2023 3:18 PM, lirongqing@baidu.com wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> is not needed to create

(directed by Sean from 
https://lore.kernel.org/kvm/ZE%2FR1%2FhvbuWmD8mw@google.com/ here.)

No, I think it should be tied to the "nx_huge_pages" value rather than 
directly/partially to boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT).
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8354262..be98c69 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6667,6 +6667,11 @@ static bool get_nx_auto_mode(void)
>   	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
>   }
>   
> +static bool cpu_has_itlb_multihit(void)
> +{
> +	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
> +}
> +
>   static void __set_nx_huge_pages(bool val)
>   {
>   	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
> @@ -6677,6 +6682,11 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
>   	bool old_val = nx_huge_pages;
>   	bool new_val;
>   
> +	if (!cpu_has_itlb_multihit()) {
> +		__set_nx_huge_pages(false);
> +		return 0;
> +	}
> +
It's rude to simply return here just because 
!boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT), leaving everything else behind, i.e. 
leaving the sysfs node below useless.
If you meant to do this, you should remove those sysfs APIs as well, because of 
!boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT).

>   	/* In "auto" mode deploy workaround only if CPU has the bug. */
>   	if (sysfs_streq(val, "off"))
>   		new_val = 0;
> @@ -6816,6 +6826,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
>   	uint old_period, new_period;
>   	int err;
>   
> +	if (!cpu_has_itlb_multihit())
> +		return 0;
> +
>   	was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
>   
>   	err = param_set_uint(val, kp);
> @@ -6971,6 +6984,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
>   {
>   	int err;
>   
> +	if (!cpu_has_itlb_multihit())
> +		return 0;
> +
It's rude to simply return. kvm_mmu_post_init_vm(), by its name, is far more than 
nx_hugepage stuff, though at present that is the only thing in it.
I would rather write

	if (cpu_has_itlb_multihit()) {
		...
	}

Consider the people who will modify this function in the future.
>   	err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
>   					  "kvm-nx-lpage-recovery",
>   					  &kvm->arch.nx_huge_page_recovery_thread);
> @@ -6982,6 +6998,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
>   
>   void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
>   {
> +	if (!cpu_has_itlb_multihit())
> +		return;
Ditto. It looks (wrongly) like: if !cpu_has_itlb_multihit(), there is no need to do 
anything about pre_destroy_vm.

>   	if (kvm->arch.nx_huge_page_recovery_thread)
>   		kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
>   }

To summarize: regardless of the concrete patch/implementation, what Sean more 
urgently needs is real-world justification regarding the NX_hugepage mitigation, 
which I believe you have at hand: why you would like to do this, and what 
real-world issue caused by this bothers you. You could add more description.

With regard to NX_hugepage, I see people dislike it [1][2][3], but on HW 
with itlb_multihit, they've no choice but to use it to mitigate.

[1] this patch
[2] 
https://lore.kernel.org/kvm/CANZk6aSv5ta3emitOfWKxaB-JvURBVu-sXqFnCz9PKXhqjbV9w@mail.gmail.com/
[3] 
https://lore.kernel.org/kvm/20220613212523.3436117-1-bgardon@google.com/ 
(merged)
zhuangel570 May 5, 2023, 12:42 p.m. UTC | #8
FYI, this is our test scenario, simulating a FaaS business: every VM is assigned
0.1 core, lots of VMs are started running in the background (such as 800 VMs on a
machine with 80 cores), then a burst of 10 VMs is created, and we see 100ms+
latency in creating "kvm-nx-lpage-recovery".

On Tue, May 2, 2023 at 10:20 AM Robert Hoo <robert.hoo.linux@gmail.com> wrote:
>
> On 3/23/2023 3:18 PM, lirongqing@baidu.com wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> > is not needed to create
>
> (directed by Sean from
> https://lore.kernel.org/kvm/ZE%2FR1%2FhvbuWmD8mw@google.com/ here.)
>
> No, I think it should tie to "nx_huge_pages" value rather than
> directly/partially tie to boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT).
> >
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 8354262..be98c69 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -6667,6 +6667,11 @@ static bool get_nx_auto_mode(void)
> >       return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
> >   }
> >
> > +static bool cpu_has_itlb_multihit(void)
> > +{
> > +     return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
> > +}
> > +
> >   static void __set_nx_huge_pages(bool val)
> >   {
> >       nx_huge_pages = itlb_multihit_kvm_mitigation = val;
> > @@ -6677,6 +6682,11 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
> >       bool old_val = nx_huge_pages;
> >       bool new_val;
> >
> > +     if (!cpu_has_itlb_multihit()) {
> > +             __set_nx_huge_pages(false);
> > +             return 0;
> > +     }
> > +
> It's rude simply return here just because
> !boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT), leaving all else behind, i.e.
> leaving below sysfs node useless.
> If you meant to do this, you should clear these sysfs APIs because of
> !boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT).
>
> >       /* In "auto" mode deploy workaround only if CPU has the bug. */
> >       if (sysfs_streq(val, "off"))
> >               new_val = 0;
> > @@ -6816,6 +6826,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
> >       uint old_period, new_period;
> >       int err;
> >
> > +     if (!cpu_has_itlb_multihit())
> > +             return 0;
> > +
> >       was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
> >
> >       err = param_set_uint(val, kp);
> > @@ -6971,6 +6984,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
> >   {
> >       int err;
> >
> > +     if (!cpu_has_itlb_multihit())
> > +             return 0;
> > +
> It's rude to simply return. kvm_mmu_post_init_vm() by name is far more than
> nx_hugepage stuff, though at present only this stuff in.
> I would rather
>
>         if (cpu_has_itlb_multihit()) {
>                 ...
>         }
>
> Consider people in the future when they do modifications on this function.
> >       err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
> >                                         "kvm-nx-lpage-recovery",
> >                                         &kvm->arch.nx_huge_page_recovery_thread);
> > @@ -6982,6 +6998,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
> >
> >   void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> >   {
> > +     if (!cpu_has_itlb_multihit())
> > +             return;
> > Ditto. It looks (wrongly) like: if !cpu_has_itlb_multihit(), no need to do
> anything about pre_destroy_vm.
>
> >       if (kvm->arch.nx_huge_page_recovery_thread)
> >               kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
> >   }
>
> To summary, regardless of the concrete patch/implementation, what Sean more
> urgently needs is real world justification to mitigate NX_hugepage; which I
> believe you have at hand: why would you like to do this, what real world
> issue caused by this bothers you. You could have more descriptions.
>
> With regards to NX_hugepage, I see people dislike it [1][2][3], but on HW
> with itlb_multihit, they've no choice but to use it to mitigate.
>
> [1] this patch
> [2]
> https://lore.kernel.org/kvm/CANZk6aSv5ta3emitOfWKxaB-JvURBVu-sXqFnCz9PKXhqjbV9w@mail.gmail.com/
> [3]
> https://lore.kernel.org/kvm/20220613212523.3436117-1-bgardon@google.com/
> (merged)
Sean Christopherson May 5, 2023, 5:44 p.m. UTC | #9
On Fri, May 05, 2023, zhuangel570 wrote:
> FYI, this is our test scenario, simulating the FaaS business, every VM assign
> 0.1 core, starting lots VMs run in backgroud (such as 800 VM on a machine
> with 80 cores), then burst create 10 VMs, then got 100ms+ latency in creating
> "kvm-nx-lpage-recovery".
> 
> On Tue, May 2, 2023 at 10:20 AM Robert Hoo <robert.hoo.linux@gmail.com> wrote:
> >
> > On 3/23/2023 3:18 PM, lirongqing@baidu.com wrote:
> > > From: Li RongQing <lirongqing@baidu.com>
> > >
> > > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> > > is not needed to create
> >
> > (directed by Sean from
> > https://lore.kernel.org/kvm/ZE%2FR1%2FhvbuWmD8mw@google.com/ here.)
> >
> > No, I think it should tie to "nx_huge_pages" value rather than
> > directly/partially tie to boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT).

Lightly tested.  This is what I'm thinking for a "never" param.  Unless someone
has an alternative idea, I'll post a formal patch after more testing. 

---
 arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..14713c050196 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -58,6 +58,8 @@
 
 extern bool itlb_multihit_kvm_mitigation;
 
+static bool nx_hugepage_mitigation_hard_disabled;
+
 int __read_mostly nx_huge_pages = -1;
 static uint __read_mostly nx_huge_pages_recovery_period_ms;
 #ifdef CONFIG_PREEMPT_RT
@@ -67,12 +69,13 @@ static uint __read_mostly nx_huge_pages_recovery_ratio = 0;
 static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
 #endif
 
+static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp);
 static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
 static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp);
 
 static const struct kernel_param_ops nx_huge_pages_ops = {
 	.set = set_nx_huge_pages,
-	.get = param_get_bool,
+	.get = get_nx_huge_pages,
 };
 
 static const struct kernel_param_ops nx_huge_pages_recovery_param_ops = {
@@ -6844,6 +6847,14 @@ static void mmu_destroy_caches(void)
 	kmem_cache_destroy(mmu_page_header_cache);
 }
 
+static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
+{
+	if (nx_hugepage_mitigation_hard_disabled)
+		return sprintf(buffer, "never\n");
+
+	return param_get_bool(buffer, kp);
+}
+
 static bool get_nx_auto_mode(void)
 {
 	/* Return true when CPU has the bug, and mitigations are ON */
@@ -6860,15 +6871,29 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 	bool old_val = nx_huge_pages;
 	bool new_val;
 
+	if (nx_hugepage_mitigation_hard_disabled)
+		return -EPERM;
+
 	/* In "auto" mode deploy workaround only if CPU has the bug. */
-	if (sysfs_streq(val, "off"))
+	if (sysfs_streq(val, "off")) {
 		new_val = 0;
-	else if (sysfs_streq(val, "force"))
+	} else if (sysfs_streq(val, "force")) {
 		new_val = 1;
-	else if (sysfs_streq(val, "auto"))
+	} else if (sysfs_streq(val, "auto")) {
 		new_val = get_nx_auto_mode();
-	else if (kstrtobool(val, &new_val) < 0)
+	} if (sysfs_streq(val, "never")) {
+		new_val = 0;
+
+		mutex_lock(&kvm_lock);
+		if (!list_empty(&vm_list)) {
+			mutex_unlock(&kvm_lock);
+			return -EBUSY;
+		}
+		nx_hugepage_mitigation_hard_disabled = true;
+		mutex_unlock(&kvm_lock);
+	} else if (kstrtobool(val, &new_val) < 0) {
 		return -EINVAL;
+	}
 
 	__set_nx_huge_pages(new_val);
 
@@ -7006,6 +7031,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
 	uint old_period, new_period;
 	int err;
 
+	if (nx_hugepage_mitigation_hard_disabled)
+		return -EPERM;
+
 	was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
 
 	err = param_set_uint(val, kp);
@@ -7161,6 +7189,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 {
 	int err;
 
+	if (nx_hugepage_mitigation_hard_disabled)
+		return 0;
+
 	err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
 					  "kvm-nx-lpage-recovery",
 					  &kvm->arch.nx_huge_page_recovery_thread);

base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
--
Jim Mattson May 5, 2023, 5:56 p.m. UTC | #10
On Mon, May 1, 2023 at 7:07 PM Robert Hoo <robert.hoo.linux@gmail.com> wrote:

> With regards to NX_hugepage, I see people dislike it [1][2][3], but on HW
> with itlb_multihit, they've no choice but to use it to mitigate.

I think it's safe to say that no one likes the NX-hugepage mitigation.
It seems that we've gone to extremes to prevent this one specific DoS
vector. Do we have confidence that we have comparable protection from
*all* DoS vectors? If we let just one slip through, then there isn't
much point in going crazy about others.
zhuangel570 May 6, 2023, 7:12 a.m. UTC | #11
The "never" parameter works for environments without the ITLB MULTIHIT issue. But
for vulnerable environments, should we prohibit users from turning off
software mitigations?

As for the nx_huge_page_recovery_thread worker thread, this is a way to optimize
the software mitigation, and it may not be needed in all cases.
For example, on a vulnerable machine, the software mitigation needs to be enabled,
but the worker thread may not be needed when the VM determines that huge pages
are not in use (not sure).

Do you think it is possible to introduce a new parameter to disable worker
threads?

On Sat, May 6, 2023 at 1:44 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, May 05, 2023, zhuangel570 wrote:
> > FYI, this is our test scenario, simulating the FaaS business, every VM assign
> > 0.1 core, starting lots VMs run in backgroud (such as 800 VM on a machine
> > with 80 cores), then burst create 10 VMs, then got 100ms+ latency in creating
> > "kvm-nx-lpage-recovery".
> >
> > On Tue, May 2, 2023 at 10:20 AM Robert Hoo <robert.hoo.linux@gmail.com> wrote:
> > >
> > > On 3/23/2023 3:18 PM, lirongqing@baidu.com wrote:
> > > > From: Li RongQing <lirongqing@baidu.com>
> > > >
> > > > if CPU has not X86_BUG_ITLB_MULTIHIT bug, kvm-nx-lpage-re kthread
> > > > is not needed to create
> > >
> > > (directed by Sean from
> > > https://lore.kernel.org/kvm/ZE%2FR1%2FhvbuWmD8mw@google.com/ here.)
> > >
> > > No, I think it should tie to "nx_huge_pages" value rather than
> > > directly/partially tie to boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT).
>
> Lightly tested.  This is what I'm thinking for a "never" param.  Unless someone
> has an alternative idea, I'll post a formal patch after more testing.
>
> ---
>  arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..14713c050196 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -58,6 +58,8 @@
>
>  extern bool itlb_multihit_kvm_mitigation;
>
> +static bool nx_hugepage_mitigation_hard_disabled;
> +
>  int __read_mostly nx_huge_pages = -1;
>  static uint __read_mostly nx_huge_pages_recovery_period_ms;
>  #ifdef CONFIG_PREEMPT_RT
> @@ -67,12 +69,13 @@ static uint __read_mostly nx_huge_pages_recovery_ratio = 0;
>  static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
>  #endif
>
> +static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp);
>  static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
>  static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp);
>
>  static const struct kernel_param_ops nx_huge_pages_ops = {
>         .set = set_nx_huge_pages,
> -       .get = param_get_bool,
> +       .get = get_nx_huge_pages,
>  };
>
>  static const struct kernel_param_ops nx_huge_pages_recovery_param_ops = {
> @@ -6844,6 +6847,14 @@ static void mmu_destroy_caches(void)
>         kmem_cache_destroy(mmu_page_header_cache);
>  }
>
> +static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
> +{
> +       if (nx_hugepage_mitigation_hard_disabled)
> +               return sprintf(buffer, "never\n");
> +
> +       return param_get_bool(buffer, kp);
> +}
> +
>  static bool get_nx_auto_mode(void)
>  {
>         /* Return true when CPU has the bug, and mitigations are ON */
> @@ -6860,15 +6871,29 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
>         bool old_val = nx_huge_pages;
>         bool new_val;
>
> +       if (nx_hugepage_mitigation_hard_disabled)
> +               return -EPERM;
> +
>         /* In "auto" mode deploy workaround only if CPU has the bug. */
> -       if (sysfs_streq(val, "off"))
> +       if (sysfs_streq(val, "off")) {
>                 new_val = 0;
> -       else if (sysfs_streq(val, "force"))
> +       } else if (sysfs_streq(val, "force")) {
>                 new_val = 1;
> -       else if (sysfs_streq(val, "auto"))
> +       } else if (sysfs_streq(val, "auto")) {
>                 new_val = get_nx_auto_mode();
> -       else if (kstrtobool(val, &new_val) < 0)
> +       } if (sysfs_streq(val, "never")) {
> +               new_val = 0;
> +
> +               mutex_lock(&kvm_lock);
> +               if (!list_empty(&vm_list)) {
> +                       mutex_unlock(&kvm_lock);
> +                       return -EBUSY;
> +               }
> +               nx_hugepage_mitigation_hard_disabled = true;
> +               mutex_unlock(&kvm_lock);
> +       } else if (kstrtobool(val, &new_val) < 0) {
>                 return -EINVAL;
> +       }
>
>         __set_nx_huge_pages(new_val);
>
> @@ -7006,6 +7031,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
>         uint old_period, new_period;
>         int err;
>
> +       if (nx_hugepage_mitigation_hard_disabled)
> +               return -EPERM;
> +
>         was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
>
>         err = param_set_uint(val, kp);
> @@ -7161,6 +7189,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
>  {
>         int err;
>
> +       if (nx_hugepage_mitigation_hard_disabled)
> +               return 0;
> +
>         err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
>                                           "kvm-nx-lpage-recovery",
>                                           &kvm->arch.nx_huge_page_recovery_thread);
>
> base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
> --
>
Robert Hoo May 6, 2023, 2:49 p.m. UTC | #12
On 5/6/2023 1:44 AM, Sean Christopherson wrote:
> Lightly tested.  This is what I'm thinking for a "never" param.  Unless someone
> has an alternative idea, I'll post a formal patch after more testing.
> 
> ---
>   arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..14713c050196 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -58,6 +58,8 @@
>   
>   extern bool itlb_multihit_kvm_mitigation;
>   
> +static bool nx_hugepage_mitigation_hard_disabled;
> +
>   int __read_mostly nx_huge_pages = -1;
>   static uint __read_mostly nx_huge_pages_recovery_period_ms;
>   #ifdef CONFIG_PREEMPT_RT
> @@ -67,12 +69,13 @@ static uint __read_mostly nx_huge_pages_recovery_ratio = 0;
>   static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
>   #endif
>   
> +static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp);
>   static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
>   static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp);
>   
>   static const struct kernel_param_ops nx_huge_pages_ops = {
>   	.set = set_nx_huge_pages,
> -	.get = param_get_bool,
> +	.get = get_nx_huge_pages,
>   };
>   
>   static const struct kernel_param_ops nx_huge_pages_recovery_param_ops = {
> @@ -6844,6 +6847,14 @@ static void mmu_destroy_caches(void)
>   	kmem_cache_destroy(mmu_page_header_cache);
>   }
>   
> +static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
> +{
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return sprintf(buffer, "never\n");
> +
> +	return param_get_bool(buffer, kp);
> +}
> +
>   static bool get_nx_auto_mode(void)
>   {
>   	/* Return true when CPU has the bug, and mitigations are ON */
> @@ -6860,15 +6871,29 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
>   	bool old_val = nx_huge_pages;
>   	bool new_val;
>   
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return -EPERM;
> +
>   	/* In "auto" mode deploy workaround only if CPU has the bug. */
> -	if (sysfs_streq(val, "off"))
> +	if (sysfs_streq(val, "off")) {
>   		new_val = 0;
> -	else if (sysfs_streq(val, "force"))
> +	} else if (sysfs_streq(val, "force")) {
>   		new_val = 1;
> -	else if (sysfs_streq(val, "auto"))
> +	} else if (sysfs_streq(val, "auto")) {
>   		new_val = get_nx_auto_mode();
> -	else if (kstrtobool(val, &new_val) < 0)
> +	} if (sysfs_streq(val, "never")) {

if --> else if?

And, what's the difference between "off" and "never"?

> +		new_val = 0;
> +
> +		mutex_lock(&kvm_lock);
> +		if (!list_empty(&vm_list)) {
> +			mutex_unlock(&kvm_lock);
> +			return -EBUSY;
> +		}
> +		nx_hugepage_mitigation_hard_disabled = true;
> +		mutex_unlock(&kvm_lock);
> +	} else if (kstrtobool(val, &new_val) < 0) {
>   		return -EINVAL;
> +	}
>   
>   	__set_nx_huge_pages(new_val);
>   
> @@ -7006,6 +7031,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
>   	uint old_period, new_period;
>   	int err;
>   
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return -EPERM;
> +
>   	was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
>   
>   	err = param_set_uint(val, kp);
> @@ -7161,6 +7189,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
>   {
>   	int err;
>   
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return 0;
> +
I suggest

	if (!nx_hugepage_mitigation_hard_disabled) {
		create worker thread
	}

As this function's name is kvm_mmu_post_init_vm(), 
nx_hugepage_mitigation_hard_disabled means there is no need to create the worker 
thread, not that there is no need to do post-init at all. Currently the only 
stuff in post-init is creating the nx_hugepage work thread, but this is for the 
sake of future additions.

>   	err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
>   					  "kvm-nx-lpage-recovery",
>   					  &kvm->arch.nx_huge_page_recovery_thread);
> 
> base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
Robert Hoo May 6, 2023, 2:59 p.m. UTC | #13
On 5/6/2023 3:12 PM, zhuangel570 wrote:
> The "never" parameter works for environments without ITLB MULTIHIT issue. But
> for vulnerable environments, should we prohibit users from turning off
> software mitigations?
> 
> As for the nx_huge_page_recovery_thread worker thread, this is a solution to
> optimize software mitigation, maybe not needed in all cases.
> For example, on a vulnerable machine, software mitigations need to be enabled,
> but worker threads may not be needed when the VM determines that huge pages
> are not in use (not sure).

Then nx_hugepage is totally not needed:)
> 
> Do you think it is possible to introduce a new parameter to disable worker
> threads?

I suggest not. I would perceive this kthread as an ingredient of the nx_hugepage 
solution.
zhuangel570 May 6, 2023, 3:30 p.m. UTC | #14
Got it, I like the solution, thanks!

On Sat, May 6, 2023 at 10:59 PM Robert Hoo <robert.hoo.linux@gmail.com> wrote:
>
> On 5/6/2023 3:12 PM, zhuangel570 wrote:
> > The "never" parameter works for environments without ITLB MULTIHIT issue. But
> > for vulnerable environments, should we prohibit users from turning off
> > software mitigations?
> >
> > As for the nx_huge_page_recovery_thread worker thread, this is a solution to
> > optimize software mitigation, maybe not needed in all cases.
> > For example, on a vulnerable machine, software mitigations need to be enabled,
> > but worker threads may not be needed when the VM determines that huge pages
> > are not in use (not sure).
>
> Then nx_hugepage is totally not needed:)
> >
> > Do you think it is possible to introduce a new parameter to disable worker
> > threads?
>
> I suggest no. I would perceive this kthread as ingredient of nx_hugepage
> solution.
>
>
Robert Hoo May 7, 2023, 1:18 a.m. UTC | #15
On 5/6/2023 10:49 PM, Robert Hoo wrote:
> 
> And, what's the difference between "off" and "never"?
> 
Ah, after a night's sleep, my cognitive abilities are restored.
"never" is intended to block any other settings once a VM has been created.
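The off/never distinction can be spelled out with a small userspace model. This is a hypothetical simplification of Sean's draft (with `vm_count` standing in for the kvm_lock/vm_list check): "off" merely clears the current value and stays reversible, while "never" is refused with -EBUSY while any VM exists and otherwise latches the hard-disable permanently:

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

static bool nx_huge_pages;
static bool hard_disabled;
static int  vm_count;			/* stands in for !list_empty(&vm_list) */

static int write_nx_huge_pages(const char *val)
{
	if (hard_disabled)
		return -EPERM;		/* "never" was set earlier */

	if (!strcmp(val, "off")) {
		nx_huge_pages = false;	/* reversible: "force" still works later */
	} else if (!strcmp(val, "force")) {
		nx_huge_pages = true;
	} else if (!strcmp(val, "never")) {
		if (vm_count)		/* mirrors the vm_list check under kvm_lock */
			return -EBUSY;
		nx_huge_pages = false;
		hard_disabled = true;	/* irreversible until module reload */
	} else {
		return -EINVAL;
	}
	return 0;
}
```

So "off" is a value, while "never" is a one-way state transition that is only reachable before any VM has been created.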
diff mbox series

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8354262..be98c69 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6667,6 +6667,11 @@  static bool get_nx_auto_mode(void)
 	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
 }
 
+static bool cpu_has_itlb_multihit(void)
+{
+	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
+}
+
 static void __set_nx_huge_pages(bool val)
 {
 	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
@@ -6677,6 +6682,11 @@  static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 	bool old_val = nx_huge_pages;
 	bool new_val;
 
+	if (!cpu_has_itlb_multihit()) {
+		__set_nx_huge_pages(false);
+		return 0;
+	}
+
 	/* In "auto" mode deploy workaround only if CPU has the bug. */
 	if (sysfs_streq(val, "off"))
 		new_val = 0;
@@ -6816,6 +6826,9 @@  static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
 	uint old_period, new_period;
 	int err;
 
+	if (!cpu_has_itlb_multihit())
+		return 0;
+
 	was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
 
 	err = param_set_uint(val, kp);
@@ -6971,6 +6984,9 @@  int kvm_mmu_post_init_vm(struct kvm *kvm)
 {
 	int err;
 
+	if (!cpu_has_itlb_multihit())
+		return 0;
+
 	err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
 					  "kvm-nx-lpage-recovery",
 					  &kvm->arch.nx_huge_page_recovery_thread);
@@ -6982,6 +6998,9 @@  int kvm_mmu_post_init_vm(struct kvm *kvm)
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 {
+	if (!cpu_has_itlb_multihit())
+		return;
+
 	if (kvm->arch.nx_huge_page_recovery_thread)
 		kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
 }