
[3/3] KVM: x86: frequency change hypercalls

Message ID 20170202174921.983532379@redhat.com (mailing list archive)
State New, archived

Commit Message

Marcelo Tosatti Feb. 2, 2017, 5:47 p.m. UTC
Implement min/max/up/down frequency change 
KVM hypercalls. To be used by DPDK implementation.

Also allow such hypercalls from guest userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 Documentation/virtual/kvm/hypercalls.txt |   45 +++++++++++++++++++
 arch/x86/kvm/x86.c                       |   71 ++++++++++++++++++++++++++++++-
 include/uapi/linux/kvm_para.h            |    5 ++
 3 files changed, 120 insertions(+), 1 deletion(-)

Comments

Marcelo Tosatti Feb. 2, 2017, 6:01 p.m. UTC | #1
On Thu, Feb 02, 2017 at 03:47:58PM -0200, Marcelo Tosatti wrote:
> Implement min/max/up/down frequency change 
> KVM hypercalls. To be used by DPDK implementation.
> 
> Also allow such hypercalls from guest userspace.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
>  Documentation/virtual/kvm/hypercalls.txt |   45 +++++++++++++++++++
>  arch/x86/kvm/x86.c                       |   71 ++++++++++++++++++++++++++++++-
>  include/uapi/linux/kvm_para.h            |    5 ++
>  3 files changed, 120 insertions(+), 1 deletion(-)
> 
> Index: kvm-pvfreq/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm-pvfreq.orig/arch/x86/kvm/x86.c	2017-02-02 11:17:17.063756725 -0200
> +++ kvm-pvfreq/arch/x86/kvm/x86.c	2017-02-02 11:17:17.822752510 -0200
> @@ -6219,10 +6219,58 @@
>  	kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
>  }
>  
> +#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE
> +/* call into cpufreq-userspace governor */
> +static int kvm_pvfreq_up(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +	int cpu = get_cpu();
> +
> +	ret = cpufreq_userspace_freq_up(cpu);
> +	put_cpu();
> +
> +	return ret;
> +}
> +
> +static int kvm_pvfreq_down(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +	int cpu = get_cpu();
> +
> +	ret = cpufreq_userspace_freq_down(cpu);
> +	put_cpu();
> +
> +	return ret;
> +}
> +
> +static int kvm_pvfreq_max(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +	int cpu = get_cpu();
> +
> +	ret = cpufreq_userspace_freq_max(cpu);
> +	put_cpu();
> +
> +	return ret;
> +}
> +
> +static int kvm_pvfreq_min(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +	int cpu = get_cpu();
> +
> +	ret = cpufreq_userspace_freq_min(cpu);
> +	put_cpu();
> +
> +	return ret;
> +}
> +#endif
> +
>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long nr, a0, a1, a2, a3, ret;
>  	int op_64_bit, r;
> +	bool cpl_check;
>  
>  	r = kvm_skip_emulated_instruction(vcpu);
>  
> @@ -6246,7 +6294,13 @@
>  		a3 &= 0xFFFFFFFF;
>  	}
>  
> -	if (kvm_x86_ops->get_cpl(vcpu) != 0) {
> +	cpl_check = true;
> +	if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN ||
> +	    nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX)
> +		if (vcpu->arch.allow_freq_hypercall == true)
> +			cpl_check = false;
> +
> +	if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) {
>  		ret = -KVM_EPERM;
>  		goto out;

This should fail with EPERM if vcpu->arch.allow_freq_hypercall ==
false, independently of the CPL.

Will resend with that (and other comments) in v2.
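
A standalone sketch of one reading of the check described above, not the
v2 code. The helper names is_freq_hypercall() and hypercall_permission()
are hypothetical; the hypercall numbers are the ones this patch adds to
kvm_para.h, and KVM_EPERM is taken to be EPERM as in that header. The
frequency hypercalls depend only on the per-vcpu flag, while every other
hypercall keeps the CPL 0 requirement.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypercall numbers as proposed in this patch. */
#define KVM_HC_FREQ_UP		10
#define KVM_HC_FREQ_DOWN	11
#define KVM_HC_FREQ_MAX		12
#define KVM_HC_FREQ_MIN		13
#define KVM_EPERM		EPERM

/* Hypothetical helper: true for the four frequency hypercalls. */
static bool is_freq_hypercall(unsigned long nr)
{
	return nr >= KVM_HC_FREQ_UP && nr <= KVM_HC_FREQ_MIN;
}

/*
 * Hypothetical helper: returns 0 if the hypercall may proceed and
 * -KVM_EPERM otherwise.
 */
static long hypercall_permission(unsigned long nr, int cpl, bool allow_freq)
{
	assert(cpl >= 0 && cpl <= 3);

	/* Frequency hypercalls: gated by the flag, at any CPL. */
	if (is_freq_hypercall(nr))
		return allow_freq ? 0 : -KVM_EPERM;

	/* Everything else: guest kernel (CPL 0) only, as before. */
	return cpl == 0 ? 0 : -KVM_EPERM;
}
```

With this shape, a frequency hypercall from CPL 0 is also rejected when
the flag is clear, which the posted hunk does not do.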
Radim Krčmář Feb. 3, 2017, 5:40 p.m. UTC | #2
2017-02-02 15:47-0200, Marcelo Tosatti:
> Implement min/max/up/down frequency change 
> KVM hypercalls. To be used by DPDK implementation.
> 
> Also allow such hypercalls from guest userspace.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
> Index: kvm-pvfreq/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm-pvfreq.orig/arch/x86/kvm/x86.c	2017-02-02 11:17:17.063756725 -0200
> +++ kvm-pvfreq/arch/x86/kvm/x86.c	2017-02-02 11:17:17.822752510 -0200
> @@ -6219,10 +6219,58 @@

[Here lived copy-paste.]

>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long nr, a0, a1, a2, a3, ret;
>  	int op_64_bit, r;
> +	bool cpl_check;
>  
>  	r = kvm_skip_emulated_instruction(vcpu);
>  
> @@ -6246,7 +6294,13 @@
>  		a3 &= 0xFFFFFFFF;
>  	}
>  
> -	if (kvm_x86_ops->get_cpl(vcpu) != 0) {
> +	cpl_check = true;
> +	if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN ||
> +	    nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX)
> +		if (vcpu->arch.allow_freq_hypercall == true)
> +			cpl_check = false;
> +
> +	if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) {
>  		ret = -KVM_EPERM;
>  		goto out;
>  	}
> @@ -6262,6 +6316,21 @@
>  	case KVM_HC_CLOCK_PAIRING:
>  		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
>  		break;
> +#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE

CONFIG_CPU_FREQ_GOV_USERSPACE should be checked when enabling the
capability.

> +	case KVM_HC_FREQ_UP:
> +		ret = kvm_pvfreq_up(vcpu);
> +		break;
> +	case KVM_HC_FREQ_DOWN:
> +		ret = kvm_pvfreq_down(vcpu);
> +		break;
> +	case KVM_HC_FREQ_MAX:
> +		ret = kvm_pvfreq_max(vcpu);
> +		break;
> +	case KVM_HC_FREQ_MIN:
> +		ret = kvm_pvfreq_min(vcpu);
> +		break;

Having 4 hypercalls for this is overkill.
You can make it one hypercall with an argument.

And the argument doesn't have to be enum {UP, DOWN, MAX, MIN}, but an
int, which would also allow you to do -2 steps.
A number over the capabilities of stepping would just map to MAX/MIN.

Avoiding an absolute scale for interface simplifies migration, where the
guest cannot really depend much on this.  Except that calling it with
MIN (INT_MIN) will get the minimum and MAX (INT_MAX) the maximum
frequency.

Please explicitly say in the documentation that things like the number of
steps, which the guest can learn by doing MAX and then -1 until the
hypercall fails, are undefined and should not be depended upon.

Userspace might still want to know the number of steps to avoid useless
hypercalls -- I think we should return a different value when the limit
is reached, not just after the guest wants to go past it.
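
To make the proposal concrete, here is a minimal standalone sketch of
the semantics I have in mind. It is an illustration only: pvfreq_step(),
the PVFREQ_* return values and the idea of indexing frequencies from 0
(lowest) to nr_states - 1 (highest) are all made up for the example and
are not an existing interface.

```c
#include <assert.h>
#include <limits.h>

#define PVFREQ_OK	0	/* moved, still between the limits */
#define PVFREQ_AT_LIMIT	1	/* now at the lowest or highest state */

/*
 * Hypothetical model of a single hypercall taking a signed step count.
 * cur is the current state index, delta the requested number of steps.
 * The new index is stored in *out.
 */
static int pvfreq_step(int cur, int nr_states, int delta, int *out)
{
	long long target;

	assert(nr_states > 0 && cur >= 0 && cur < nr_states);

	/* Widen first so that INT_MIN/INT_MAX cannot overflow. */
	target = (long long)cur + delta;

	/* Requests past either end saturate to MIN/MAX. */
	if (target < 0)
		target = 0;
	if (target > nr_states - 1)
		target = nr_states - 1;

	*out = (int)target;

	/* Report the limit as soon as it is reached. */
	if (*out == 0 || *out == nr_states - 1)
		return PVFREQ_AT_LIMIT;
	return PVFREQ_OK;
}
```

Calling it with INT_MAX or INT_MIN always lands on the highest or lowest
state, so the guest keeps what the MAX and MIN hypercalls gave it.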

> +#endif
> +
>  	default:
>  		ret = -KVM_ENOSYS;
>  		break;

And thinking more about migration, userspace cannot learn the current
frequency (at least MIN/MAX), so the new host will just pick at random,
which will break userspace's expectations that it cannot increase or
decrease the frequency.  Is migration left for the future, because DPDK
doesn't migrate anyway?

Thanks.
Marcelo Tosatti Feb. 3, 2017, 6:24 p.m. UTC | #3
On Fri, Feb 03, 2017 at 06:40:34PM +0100, Radim Krcmar wrote:
> 2017-02-02 15:47-0200, Marcelo Tosatti:
> > Implement min/max/up/down frequency change 
> > KVM hypercalls. To be used by DPDK implementation.
> > 
> > Also allow such hypercalls from guest userspace.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > ---
> > Index: kvm-pvfreq/arch/x86/kvm/x86.c
> > ===================================================================
> > --- kvm-pvfreq.orig/arch/x86/kvm/x86.c	2017-02-02 11:17:17.063756725 -0200
> > +++ kvm-pvfreq/arch/x86/kvm/x86.c	2017-02-02 11:17:17.822752510 -0200
> > @@ -6219,10 +6219,58 @@
> 
> [Here lived copy-paste.]
> 
> >  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned long nr, a0, a1, a2, a3, ret;
> >  	int op_64_bit, r;
> > +	bool cpl_check;
> >  
> >  	r = kvm_skip_emulated_instruction(vcpu);
> >  
> > @@ -6246,7 +6294,13 @@
> >  		a3 &= 0xFFFFFFFF;
> >  	}
> >  
> > -	if (kvm_x86_ops->get_cpl(vcpu) != 0) {
> > +	cpl_check = true;
> > +	if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN ||
> > +	    nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX)
> > +		if (vcpu->arch.allow_freq_hypercall == true)
> > +			cpl_check = false;
> > +
> > +	if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) {
> >  		ret = -KVM_EPERM;
> >  		goto out;
> >  	}
> > @@ -6262,6 +6316,21 @@
> >  	case KVM_HC_CLOCK_PAIRING:
> >  		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
> >  		break;
> > +#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE
> 
> CONFIG_CPU_FREQ_GOV_USERSPACE should be checked when enabling the
> capability.
> 
> > +	case KVM_HC_FREQ_UP:
> > +		ret = kvm_pvfreq_up(vcpu);
> > +		break;
> > +	case KVM_HC_FREQ_DOWN:
> > +		ret = kvm_pvfreq_down(vcpu);
> > +		break;
> > +	case KVM_HC_FREQ_MAX:
> > +		ret = kvm_pvfreq_max(vcpu);
> > +		break;
> > +	case KVM_HC_FREQ_MIN:
> > +		ret = kvm_pvfreq_min(vcpu);
> > +		break;
> 
> Having 4 hypercalls for this is overkill.
> You can make it one hypercall with an argument.

Fine.

> And the argument doesn't have to be enum {UP, DOWN, MAX, MIN}, but an
> int, which would also allow you to do -2 steps.

Are you suggesting to have an integer to signify the number of steps up
or down?

> A number over the capabilities of stepping would just map to MAX/MIN.

Then MAX == any positive value above the number of steps
     MIN == any negative value below the negative of number of steps

Sure.

> Avoiding an absolute scale for interface simplifies migration, where the
> guest cannot really depend much on this.  Except that calling it with
> MIN (INT_MIN) will get the minimum and MAX (INT_MAX) the maximum
> frequency.

Are you suggesting for the hypercall to return the maximum/minimum
frequency if called with the highest integer and lowest negative integer 
respectively? (That same hypercall).

Sure.

> Please explicitly say in the documentation that things like the number of
> steps, which the guest can learn by doing MAX and then -1 until the
> hypercall fails, are undefined and should not be depended upon.

Sure, because it fails over migration.

> Userspace might still want to know the number of steps to avoid useless
> hypercalls -- I think we should return a different value when the limit
> is reached, not just after the guest wants to go past it.

Are you suggesting to return a different value when going from 

max-1 -> max  
and
min+1 -> min

frequencies?

Fine.

> > +#endif
> > +
> >  	default:
> >  		ret = -KVM_ENOSYS;
> >  		break;
> 
> And thinking more about migration, userspace cannot learn the current
> frequency (at least MIN/MAX), so the new host will just pick at random,
> which will break userspace's expectations that it cannot increase or
> decrease the frequency.  Is migration left for the future, because DPDK
> doesn't migrate anyway?
> 
> Thanks.

The new host should start with the highest frequency always. Then
the frequency tuning algorithm can reduce frequency afterwards.

Migration is a desired feature for DPDK, so it should be supported
(that's one reason why virtio-net drivers are used in the guest BTW).
Radim Krčmář Feb. 3, 2017, 7:28 p.m. UTC | #4
2017-02-03 16:24-0200, Marcelo Tosatti:
> On Fri, Feb 03, 2017 at 06:40:34PM +0100, Radim Krcmar wrote:
>> You can make it one hypercall with an argument.
> 
> Fine.
> 
>> And the argument doesn't have to be enum {UP, DOWN, MAX, MIN}, but an
>> int, which would also allow you to do -2 steps.
> 
> Are you suggesting to have an integer to signify the number of steps up
> or down?

Yes.

>> A number over the capabilities of stepping would just map to MAX/MIN.
> 
> Then MAX == any positive value above the number of steps
>      MIN == any negative value below the negative of number of steps
> 
> Sure.
> 
>> Avoiding an absolute scale for interface simplifies migration, where the
>> guest cannot really depend much on this.  Except that calling it with
>> MIN (INT_MIN) will get the minimum and MAX (INT_MAX) the maximum
>> frequency.
> 
> Are you suggesting for the hypercall to return the maximum/minimum
> frequency if called with the highest integer and lowest negative integer 
> respectively? (That same hypercall).

No, I meant that we will guarantee that the guest will always get (the
CPU will be in) the minimal frequency when hypercall parameter is
INT_MIN and the maximal with INT_MAX -- just so the guest wouldn't lose
the ability which you provided by MIN and MAX hypercalls.

(We could also make a stronger assertion that there is never going to be
 more than INT_MAX steps, CPUs that run KVM will probably never have
 that fine frequency control.)

>> Please explicitly say in the documentation that things like the number of
>> steps, which the guest can learn by doing MAX and then -1 until the
>> hypercall fails, are undefined and should not be depended upon.
> 
> Sure, because it fails over migration.
> 
>> Userspace might still want to know the number of steps to avoid useless
>> hypercalls -- I think we should return a different value when the limit
>> is reached, not just after the guest wants to go past it.
> 
> Are you suggesting to return a different value when going from 
> 
> max-1 -> max  
> and
> min+1 -> min
> 
> frequencies?

Yes.  Like you do now when going "up" from "max".
It saves one call of the hypercall.

> Fine.
> 
>> > +#endif
>> > +
>> >  	default:
>> >  		ret = -KVM_ENOSYS;
>> >  		break;
>> 
>> And thinking more about migration, userspace cannot learn the current
>> frequency (at least MIN/MAX), so the new host will just pick at random,
>> which will break userspace's expectations that it cannot increase or
>> decrease the frequency.  Is migration left for the future, because DPDK
>> doesn't migrate anyway?
>> 
>> Thanks.
> 
> The new host should start with the highest frequency always. Then
> the frequency tuning algorithm can reduce frequency afterwards.

That is not going to work on migration.

Suppose we do that and the CPU is in minimal frequency before the
migration.  This means that queue is below the threshold and userspace
knows that it is in minimum frequency (because we provide that
information when going down), so it doesn't trigger useless hypercalls.

After migration, the host would set frequency to maximum, but userspace
would still think that it is minimal, so it would not decrease it.

The only reason for this series -- power saving -- is lost.
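
The scenario can be modelled in a few lines. This is a toy simulation
under assumptions of my own, not code from the series: struct host,
struct guest, host_freq_down(), guest_request_down() and migrate() are
all hypothetical names, and the guest policy is simply "never ask to go
down once the host has reported the minimum".

```c
#include <assert.h>
#include <stdbool.h>

struct host  { int idx; int nr_states; };	/* 0 = lowest frequency */
struct guest { bool at_min; int hypercalls; };	/* guest's cached view  */

/* Host side: step down once, return 1 when the minimum is reached. */
static int host_freq_down(struct host *h)
{
	assert(h->nr_states > 0);
	if (h->idx > 0)
		h->idx--;
	return h->idx == 0 ? 1 : 0;
}

/* Guest side: skip the hypercall when the cached state says "min". */
static void guest_request_down(struct guest *g, struct host *h)
{
	if (g->at_min)
		return;
	g->hypercalls++;
	if (host_freq_down(h) == 1)
		g->at_min = true;
}

/* Migration: the new host starts at maximum, the guest state is kept. */
static void migrate(struct host *h, int dest_states)
{
	h->nr_states = dest_states;
	h->idx = dest_states - 1;
}
```

After migrate(), guest_request_down() returns early because at_min is
still set, so the destination CPU stays at its highest frequency until
something invalidates the guest's cached state.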

> Migration is a desired feature for DPDK, so it should be supported
> (that's one reason why virtio-net drivers are used in the guest BTW).

Oh, nice,

thanks.

Patch

Index: kvm-pvfreq/arch/x86/kvm/x86.c
===================================================================
--- kvm-pvfreq.orig/arch/x86/kvm/x86.c	2017-02-02 11:17:17.063756725 -0200
+++ kvm-pvfreq/arch/x86/kvm/x86.c	2017-02-02 11:17:17.822752510 -0200
@@ -6219,10 +6219,58 @@ 
 	kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
 }
 
+#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE
+/* call into cpufreq-userspace governor */
+static int kvm_pvfreq_up(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	int cpu = get_cpu();
+
+	ret = cpufreq_userspace_freq_up(cpu);
+	put_cpu();
+
+	return ret;
+}
+
+static int kvm_pvfreq_down(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	int cpu = get_cpu();
+
+	ret = cpufreq_userspace_freq_down(cpu);
+	put_cpu();
+
+	return ret;
+}
+
+static int kvm_pvfreq_max(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	int cpu = get_cpu();
+
+	ret = cpufreq_userspace_freq_max(cpu);
+	put_cpu();
+
+	return ret;
+}
+
+static int kvm_pvfreq_min(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	int cpu = get_cpu();
+
+	ret = cpufreq_userspace_freq_min(cpu);
+	put_cpu();
+
+	return ret;
+}
+#endif
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
 	int op_64_bit, r;
+	bool cpl_check;
 
 	r = kvm_skip_emulated_instruction(vcpu);
 
@@ -6246,7 +6294,13 @@ 
 		a3 &= 0xFFFFFFFF;
 	}
 
-	if (kvm_x86_ops->get_cpl(vcpu) != 0) {
+	cpl_check = true;
+	if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN ||
+	    nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX)
+		if (vcpu->arch.allow_freq_hypercall == true)
+			cpl_check = false;
+
+	if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) {
 		ret = -KVM_EPERM;
 		goto out;
 	}
@@ -6262,6 +6316,21 @@ 
 	case KVM_HC_CLOCK_PAIRING:
 		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
 		break;
+#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE
+	case KVM_HC_FREQ_UP:
+		ret = kvm_pvfreq_up(vcpu);
+		break;
+	case KVM_HC_FREQ_DOWN:
+		ret = kvm_pvfreq_down(vcpu);
+		break;
+	case KVM_HC_FREQ_MAX:
+		ret = kvm_pvfreq_max(vcpu);
+		break;
+	case KVM_HC_FREQ_MIN:
+		ret = kvm_pvfreq_min(vcpu);
+		break;
+#endif
+
 	default:
 		ret = -KVM_ENOSYS;
 		break;
Index: kvm-pvfreq/include/uapi/linux/kvm_para.h
===================================================================
--- kvm-pvfreq.orig/include/uapi/linux/kvm_para.h	2017-02-02 10:51:53.741217306 -0200
+++ kvm-pvfreq/include/uapi/linux/kvm_para.h	2017-02-02 11:17:17.824752499 -0200
@@ -25,6 +25,11 @@ 
 #define KVM_HC_MIPS_EXIT_VM		7
 #define KVM_HC_MIPS_CONSOLE_OUTPUT	8
 #define KVM_HC_CLOCK_PAIRING		9
+#define KVM_HC_FREQ_UP			10
+#define KVM_HC_FREQ_DOWN		11
+#define KVM_HC_FREQ_MAX			12
+#define KVM_HC_FREQ_MIN			13
+
 
 /*
  * hypercalls use architecture specific
Index: kvm-pvfreq/Documentation/virtual/kvm/hypercalls.txt
===================================================================
--- kvm-pvfreq.orig/Documentation/virtual/kvm/hypercalls.txt	2017-02-02 10:51:53.741217306 -0200
+++ kvm-pvfreq/Documentation/virtual/kvm/hypercalls.txt	2017-02-02 15:29:24.401692793 -0200
@@ -116,3 +116,48 @@ 
 
 Returns KVM_EOPNOTSUPP if the host does not use TSC clocksource,
 or if clock type is different than KVM_CLOCK_PAIRING_WALLCLOCK.
+
+7. KVM_HC_FREQ_UP
+-----------------
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to increase frequency to the next
+higher frequency.
+Usage example: DPDK power aware applications, that run on
+isolated CPUs. No input argument, returns 0 if success,
+1 if already at highest frequency, negative error otherwise.
+
+8. KVM_HC_FREQ_DOWN
+-------------------
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to decrease frequency to the next
+lower frequency.
+Usage example: DPDK power aware applications, that run on
+isolated CPUs. No input argument, returns 0 if success,
+1 if already at lowest frequency, negative error otherwise.
+
+9. KVM_HC_FREQ_MIN
+------------------
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to decrease frequency to the
+minimum frequency.
+Usage example: DPDK power aware applications, that run
+on isolated CPUs. No input argument, returns 0 if success,
+negative error otherwise.
+
+10. KVM_HC_FREQ_MAX
+-------------------
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to increase frequency to the
+maximum frequency.
+Usage example: DPDK power aware applications, that run
+on isolated CPUs. No input argument, returns 0 if success,
+negative error otherwise.
+