[v3] x86: svm: use kvm_fast_pio_in()

Message ID 20150303164235.GB2494@potion.brq.redhat.com (mailing list archive)
State New, archived

Commit Message

Radim Krčmář March 3, 2015, 4:42 p.m. UTC
2015-03-02 15:02-0600, Joel Schopp:
> From: David Kaplan <David.Kaplan@amd.com>
> 
> We can make the IN instruction go faster, the same way it is already done for
> the OUT instruction.

(How much faster do benchmarks run?)

> Changes from v2[Joel]:
> 	* changed rax from u32 to unsigned long
> 	* changed a couple return 0 to BUG_ON()
> 	* changed 8 to sizeof(new_rax)
> 	* added trace hook
> 	* removed redundant clearing of count
> Changes from v1[Joel]
> 	* Added kvm_fast_pio_in() implementation that was left out of v1
> 
> Signed-off-by: David Kaplan <David.Kaplan@amd.com>
> [extracted from larger unrelated patch, forward ported, addressed reviews, tested]
> Signed-off-by: Joel Schopp <joel.schopp@amd.com>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -5463,6 +5463,36 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
>  }
>  EXPORT_SYMBOL_GPL(kvm_fast_pio_out);
>  
> +static int complete_fast_pio(struct kvm_vcpu *vcpu)

(complete_fast_pio_in()?)

> +{
> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);

Shouldn't we handle writes to EAX differently from writes to AX and AL,
because of the implicit zero extension?

> +
> +	BUG_ON(!vcpu->arch.pio.count);
> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));

(Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
 sufficient.)

> +
> +	memcpy(&new_rax, vcpu->arch.pio_data,
> +	       vcpu->arch.pio.count * vcpu->arch.pio.size);
> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> +	vcpu->arch.pio.count = 0;

I think it is better to call emulator_pio_in_emulated directly, like

   	emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
   			vcpu->arch.pio.port, &new_rax, 1);
   	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);

because we know that vcpu->arch.pio.count != 0.

Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
(A better name is always welcome.)

---
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Joel Schopp March 3, 2015, 7:48 p.m. UTC | #1
Thank you for your detailed review on several of my patches.

>>  
>> +static int complete_fast_pio(struct kvm_vcpu *vcpu)
> (complete_fast_pio_in()?)
If I do a v4 I'll adopt that name.
>> +{
>> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
> Shouldn't we handle writes to EAX differently from writes to AX and AL,
> because of the implicit zero extension?
I don't think the implicit zero extension hurts us here, but maybe there
is something I'm missing that I need to understand. Could you explain this
further?
>
>> +
>> +	BUG_ON(!vcpu->arch.pio.count);
>> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
> (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
>  sufficient.)
I prefer the checks that are there now after your last review,
especially since surrounded by BUG_ON they only run on debug kernels.

>
>> +
>> +	memcpy(&new_rax, vcpu->arch.pio_data,
>> +	       vcpu->arch.pio.count * vcpu->arch.pio.size);
>> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
>> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
>> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>> +	vcpu->arch.pio.count = 0;
> I think it is better to call emulator_pio_in_emulated directly, like
>
>    	emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
>    			vcpu->arch.pio.port, &new_rax, 1);
>    	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>
> because we know that vcpu->arch.pio.count != 0.
I think two extra lines of code in my patch vs your suggestion are worth
it to a) reduce execution path length, b) increase readability, c) avoid
breaking the abstraction by not checking the return code, and d) avoid any
future bugs introduced by changes to the function that would make it return
a value other than 1.
>
> Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
> (A better name is always welcome.)
The pointer chasing is making me dizzy.  I'm not sure why
emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
immediately translate that to a vcpu and never use the x86_emulate_ctxt.
Why not pass the vcpu in the first place?

Radim Krčmář March 3, 2015, 8:42 p.m. UTC | #2
2015-03-03 13:48-0600, Joel Schopp:
> >> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
> > Shouldn't we handle writes to EAX differently from writes to AX and AL,
> > because of the implicit zero extension?
> I don't think the implicit zero extension hurts us here, but maybe there
> is something I'm missing that I need to understand. Could you explain this
> further?

According to APM vol.2, 2.5.3 Operands and Results, when using EAX,
we should zero upper 32 bits of RAX:

  Zero Extension of Results. In 64-bit mode, when performing 32-bit
  operations with a GPR destination, the processor zero-extends the 32-bit
  result into the full 64-bit destination. Both 8-bit and 16-bit
  operations on GPRs preserve all unwritten upper bits of the destination
  GPR. This is consistent with legacy 16-bit and 32-bit semantics for
  partial-width results.

Is IN not covered?

> >> +	BUG_ON(!vcpu->arch.pio.count);
> >> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
> > (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
> >  sufficient.)
> I prefer the checks that are there now after your last review,
> especially since surrounded by BUG_ON they only run on debug kernels.

BUG_ON is checked on essentially all kernels that run KVM.
(All distribution-based configs should have it.)

If we wanted to validate the size, then this is strictly better:
  BUG_ON(vcpu->arch.pio.count != 1 || vcpu->arch.pio.size > sizeof(new_rax))

> >> +	memcpy(&new_rax, vcpu->arch.pio_data,
> >> +	       vcpu->arch.pio.count * vcpu->arch.pio.size);
> >> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
> >> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
> >> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> >> +	vcpu->arch.pio.count = 0;
> > I think it is better to call emulator_pio_in_emulated directly, like
> >
> >    	emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
> >    			vcpu->arch.pio.port, &new_rax, 1);
> >    	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> >
> > because we know that vcpu->arch.pio.count != 0.
> I think two extra lines of code in my patch vs your suggestion are worth
> it to a) reduce execution path length, b) increase readability, c) avoid
> breaking the abstraction by not checking the return code, and d) avoid any
> future bugs introduced by changes to the function that would make it return
> a value other than 1.

True, it is horrible.  The attached patch should have addressed (c) and
(d), and it could be inlined to match (a).

Pasting the same code creates bug opportunities when we forget to modify
all places.  This class of problems can be harder to deal with than (c)
and (d), because we can't simply print all callers.

> > Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
> > (A better name is always welcome.)
> The pointer chasing is making me dizzy.  I'm not sure why
> emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
> immediately translate that to a vcpu and never use the x86_emulate_ctxt.
> Why not pass the vcpu in the first place?

It is a part of x86_emulate_ops, where ctxt is more important ...
Paolo Bonzini April 7, 2015, 12:55 p.m. UTC | #3
On 03/03/2015 21:42, Radim Krčmář wrote:
> 2015-03-03 13:48-0600, Joel Schopp:
>>>> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
>>> Shouldn't we handle writes to EAX differently from writes to AX and AL,
>>> because of the implicit zero extension?
>> I don't think the implicit zero extension hurts us here, but maybe there
>> is something I'm missing that I need to understand. Could you explain this
>> further?
> 
> According to APM vol.2, 2.5.3 Operands and Results, when using EAX,
> we should zero upper 32 bits of RAX:
> 
>   Zero Extension of Results. In 64-bit mode, when performing 32-bit
>   operations with a GPR destination, the processor zero-extends the 32-bit
>   result into the full 64-bit destination. Both 8-bit and 16-bit
>   operations on GPRs preserve all unwritten upper bits of the destination
>   GPR. This is consistent with legacy 16-bit and 32-bit semantics for
>   partial-width results.
> 
> Is IN not covered?

It is.  You need to zero the upper 32 bits.
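
To make the rule concrete, here is a minimal user-space sketch of how the
result of a 1-, 2- or 4-byte IN would be merged into RAX in 64-bit mode. The
helper name merge_in_result() is hypothetical and is not part of the patch or
of KVM; it only illustrates the APM text quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): merge the result of a
 * 1-, 2- or 4-byte IN into a 64-bit RAX value.
 */
static uint64_t merge_in_result(uint64_t old_rax, uint32_t data, int size)
{
	assert(size == 1 || size == 2 || size == 4);

	switch (size) {
	case 1:
		/* IN AL: bits 63:8 of RAX are preserved. */
		return (old_rax & ~UINT64_C(0xff)) | (data & 0xff);
	case 2:
		/* IN AX: bits 63:16 of RAX are preserved. */
		return (old_rax & ~UINT64_C(0xffff)) | (data & 0xffff);
	default:
		/* IN EAX: the 32-bit result is zero-extended to 64 bits. */
		return (uint64_t)data;
	}
}
```

With the old value read from RAX first, only the 4-byte case discards the
upper half; the 1- and 2-byte cases depend on that old value being valid.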

>>>> +	BUG_ON(!vcpu->arch.pio.count);
>>>> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
>>> (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
>>>  sufficient.)
>> I prefer the checks that are there now after your last review,
>> especially since surrounded by BUG_ON they only run on debug kernels.
> 
> BUG_ON is checked on essentially all kernels that run KVM.
> (All distribution-based configs should have it.)

Correct.

> If we wanted to validate the size, then this is strictly better:
>   BUG_ON(vcpu->arch.pio.count != 1 || vcpu->arch.pio.size > sizeof(new_rax))

That would be a very weird assertion considering that
vcpu->arch.pio.size will architecturally be at most 4.

The first arm of the || is sufficient.
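
As a rough illustration of why the two forms are not equivalent, the sketch
below models the v3 checks and the 'count == 1' check as plain predicates. The
function names are made up for this example, and the sizes are assumed to be
limited to 1, 2 and 4 as stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the two v3 checks: count is nonzero and the data fits in RAX. */
static bool v3_checks_pass(unsigned int count, unsigned int size)
{
	return count != 0 && (size_t)count * size <= sizeof(unsigned long);
}

/* Model of the check suggested in review: exactly one element. */
static bool single_element(unsigned int count)
{
	return count == 1;
}

/* Everything accepted by the stricter check is accepted by the v3 checks. */
static void check_implication(void)
{
	static const unsigned int sizes[] = { 1, 2, 4 };

	for (unsigned int count = 0; count <= 8; count++)
		for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
			if (single_element(count))
				assert(v3_checks_pass(count, sizes[i]));
}
```

The reverse does not hold: count = 2 with size = 2 passes the v3 checks but is
not a single IN, which is the case the stricter check rejects.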

>>>> +	memcpy(&new_rax, vcpu->arch.pio_data,
>>>> +	       vcpu->arch.pio.count * vcpu->arch.pio.size);
>>>> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
>>>> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
>>>> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>>>> +	vcpu->arch.pio.count = 0;
>>> I think it is better to call emulator_pio_in_emulated directly, like
>>>
>>>    	emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
>>>    			vcpu->arch.pio.port, &new_rax, 1);
>>>    	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>>>
>>> because we know that vcpu->arch.pio.count != 0.
> 
> Pasting the same code creates bug opportunities when we forget to modify
> all places.  This class of problems can be harder to deal with than (c)
> and (d), because we can't simply print all callers.

I agree with this and prefer calling emulator_pio_in_emulated in
complete_fast_pio_in, indeed.

>>> Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
>>> (A better name is always welcome.)

No need for that.

>> The pointer chasing is making me dizzy.  I'm not sure why
>> emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
>> immediately translate that to a vcpu and never use the x86_emulate_ctxt.
>> Why not pass the vcpu in the first place?

Because the emulator is written to be usable outside the Linux kernel as
well.
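
For readers unfamiliar with the pattern, the ctxt-to-vcpu conversion can be
sketched in user space as below. This assumes the context is embedded in the
vcpu structure, as '&vcpu->arch.emulate_ctxt' in the quoted code suggests. The
struct and function names here are simplified stand-ins, not the kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; field names are illustrative. */
struct emulate_ctxt {
	int mode;
};

struct vcpu {
	int id;
	struct emulate_ctxt ctxt;	/* embedded in the vcpu, not a pointer */
};

/* Recover the enclosing vcpu from a pointer to its embedded context. */
static struct vcpu *ctxt_to_vcpu(struct emulate_ctxt *ctxt)
{
	return (struct vcpu *)((char *)ctxt - offsetof(struct vcpu, ctxt));
}

/* An emulator-style callback only ever receives the context pointer. */
static int callback_vcpu_id(struct emulate_ctxt *ctxt)
{
	struct vcpu *v = ctxt_to_vcpu(ctxt);

	assert(&v->ctxt == ctxt);
	return v->id;
}
```

The callback signature stays free of vcpu, so the emulator core never needs to
know the layout of the structure that contains its context.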

Also, the fast path (used if kernel_pio returns 0) doesn't read
VCPU_REGS_RAX, thus using an uninitialized variable here:

>>> +	unsigned long val;
>>> +	int ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size,
>>> +					   port, &val, 1);
>>> +
>>> +	if (ret)
>>> +		kvm_register_write(vcpu, VCPU_REGS_RAX, val);

Thanks,

Paolo

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 96a8333f3db0..d0e5b086f2e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4663,22 +4663,23 @@  static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
 	return 0;
 }
 
+static void emulator_complete_pio_in(struct kvm_vcpu *vcpu, int size,
+		unsigned short port, void *val, unsigned int count)
+{
+	memcpy(val, vcpu->arch.pio_data, size * count);
+	trace_kvm_pio(KVM_PIO_IN, port, size, count, vcpu->arch.pio_data);
+	vcpu->arch.pio.count = 0;
+}
+
 static int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
 				    int size, unsigned short port, void *val,
 				    unsigned int count)
 {
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	int ret;
 
-	if (vcpu->arch.pio.count)
-		goto data_avail;
-
-	ret = emulator_pio_in_out(vcpu, size, port, val, count, true);
-	if (ret) {
-data_avail:
-		memcpy(val, vcpu->arch.pio_data, size * count);
-		trace_kvm_pio(KVM_PIO_IN, port, size, count, vcpu->arch.pio_data);
-		vcpu->arch.pio.count = 0;
+	if (vcpu->arch.pio.count ||
+	    emulator_pio_in_out(vcpu, size, port, val, count, true)) {
+		emulator_complete_pio_in(vcpu, size, port, val, count);
 		return 1;
 	}