Message ID | 20150303164235.GB2494@potion.brq.redhat.com (mailing list archive) |
---|---|
State | New, archived |
Thank you for your detailed review on several of my patches.

>> +static int complete_fast_pio(struct kvm_vcpu *vcpu)
> (complete_fast_pio_in()?)

If I do a v4 I'll adopt that name.

>> +{
>> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
> Shouldn't we handle writes in EAX differently than in AX and AL, because
> of implicit zero extension.

I don't think the implicit zero extension hurts us here, but maybe there
is something I'm missing that I need to understand. Could you explain
this further?

>> +
>> +	BUG_ON(!vcpu->arch.pio.count);
>> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
> (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
> sufficient.)

I prefer the checks that are there now after your last review,
especially since surrounded by BUG_ON they only run on debug kernels.

>> +
>> +	memcpy(&new_rax, vcpu, sizeof(new_rax));
>> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
>> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
>> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>> +	vcpu->arch.pio.count = 0;
> I think it is better to call emulator_pio_in_emulated directly, like
>
>   emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
>                            vcpu->arch.pio.port, &new_rax, 1);
>   kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>
> because we know that vcpu->arch.pio.count != 0.

I think two extra lines of code in my patch vs your suggestion are worth
it to a) reduce execution path length b) increase readability c) avoid
breaking the abstraction by not checking the return code d) avoid any
future bugs introduced by changes to the function that would return a
value other than 1.

> Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
> (A better name is always welcome.)

The pointer chasing is making me dizzy. I'm not sure why
emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
immediately translate that to a vcpu and never use the x86_emulate_ctxt.
Why not pass the vcpu in the first place?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
2015-03-03 13:48-0600, Joel Schopp:
> >> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
> > Shouldn't we handle writes in EAX differently than in AX and AL, because
> > of implicit zero extension.
> I don't think the implicit zero extension hurts us here, but maybe there
> is something I'm missing that I need to understand. Could you explain
> this further?

According to APM vol.2, 2.5.3 Operands and Results, when using EAX,
we should zero upper 32 bits of RAX:

  Zero Extension of Results. In 64-bit mode, when performing 32-bit
  operations with a GPR destination, the processor zero-extends the 32-bit
  result into the full 64-bit destination. Both 8-bit and 16-bit
  operations on GPRs preserve all unwritten upper bits of the destination
  GPR. This is consistent with legacy 16-bit and 32-bit semantics for
  partial-width results.

Is IN not covered?

> >> +	BUG_ON(!vcpu->arch.pio.count);
> >> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
> > (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
> > sufficient.)
> I prefer the checks that are there now after your last review,
> especially since surrounded by BUG_ON they only run on debug kernels.

BUG_ON is checked on essentially all kernels that run KVM.
(All distribution-based configs should have it.)

If we wanted to validate the size, then this is strictly better:
  BUG_ON(vcpu->arch.pio.count != 1 || vcpu->arch.pio.size > sizeof(new_rax))

> >> +	memcpy(&new_rax, vcpu, sizeof(new_rax));
> >> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
> >> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
> >> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> >> +	vcpu->arch.pio.count = 0;
> > I think it is better to call emulator_pio_in_emulated directly, like
> >
> >   emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
> >                            vcpu->arch.pio.port, &new_rax, 1);
> >   kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
> >
> > because we know that vcpu->arch.pio.count != 0.
> I think two extra lines of code in my patch vs your suggestion are worth
> it to a) reduce execution path length b) increase readability c) avoid
> breaking the abstraction by not checking the return code d) avoid any
> future bugs introduced by changes to the function that would return a
> value other than 1.

True, it is horrible, the attached patch should have addressed (c) and
(d), and it could be inlined to match (a).

Pasting the same code creates bug opportunities when we forget to modify
all places. This class of problems can be harder to deal with than (c)
and (d), because we can't simply print all callers.

> > Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
> > (A better name is always welcome.)
> The pointer chasing is making me dizzy. I'm not sure why
> emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
> immediately translate that to a vcpu and never use the x86_emulate_ctxt.
> Why not pass the vcpu in the first place?

It is a part of x86_emulate_ops, where ctxt is more important ...
On 03/03/2015 21:42, Radim Krčmář wrote:
> 2015-03-03 13:48-0600, Joel Schopp:
>>>> +	unsigned long new_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
>>> Shouldn't we handle writes in EAX differently than in AX and AL, because
>>> of implicit zero extension.
>> I don't think the implicit zero extension hurts us here, but maybe there
>> is something I'm missing that I need to understand. Could you explain
>> this further?
>
> According to APM vol.2, 2.5.3 Operands and Results, when using EAX,
> we should zero upper 32 bits of RAX:
>
>   Zero Extension of Results. In 64-bit mode, when performing 32-bit
>   operations with a GPR destination, the processor zero-extends the 32-bit
>   result into the full 64-bit destination. Both 8-bit and 16-bit
>   operations on GPRs preserve all unwritten upper bits of the destination
>   GPR. This is consistent with legacy 16-bit and 32-bit semantics for
>   partial-width results.
>
> Is IN not covered?

It is. You need to zero the upper 32 bits.

>>>> +	BUG_ON(!vcpu->arch.pio.count);
>>>> +	BUG_ON(vcpu->arch.pio.count * vcpu->arch.pio.size > sizeof(new_rax));
>>> (Looking at it again, a check for 'vcpu->arch.pio.count == 1' would be
>>> sufficient.)
>> I prefer the checks that are there now after your last review,
>> especially since surrounded by BUG_ON they only run on debug kernels.
>
> BUG_ON is checked on essentially all kernels that run KVM.
> (All distribution-based configs should have it.)

Correct.

> If we wanted to validate the size, then this is strictly better:
>   BUG_ON(vcpu->arch.pio.count != 1 || vcpu->arch.pio.size > sizeof(new_rax))

That would be a very weird assertion considering that vcpu->arch.pio.size
will architecturally be at most 4. The first arm of the || is sufficient.

>>>> +	memcpy(&new_rax, vcpu, sizeof(new_rax));
>>>> +	trace_kvm_pio(KVM_PIO_IN, vcpu->arch.pio.port, vcpu->arch.pio.size,
>>>> +		      vcpu->arch.pio.count, vcpu->arch.pio_data);
>>>> +	kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>>>> +	vcpu->arch.pio.count = 0;
>>> I think it is better to call emulator_pio_in_emulated directly, like
>>>
>>>   emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
>>>                            vcpu->arch.pio.port, &new_rax, 1);
>>>   kvm_register_write(vcpu, VCPU_REGS_RAX, new_rax);
>>>
>>> because we know that vcpu->arch.pio.count != 0.
>
> Pasting the same code creates bug opportunities when we forget to modify
> all places. This class of problems can be harder to deal with than (c)
> and (d), because we can't simply print all callers.

I agree with this and prefer calling emulator_pio_in_emulated in
complete_fast_pio_in, indeed.

>>> Refactoring could avoid the weird vcpu->ctxt->vcpu conversion.
>>> (A better name is always welcome.)

No need for that.

>> The pointer chasing is making me dizzy. I'm not sure why
>> emulator_pio_in_emulated takes an x86_emulate_ctxt when all it does is
>> immediately translate that to a vcpu and never use the x86_emulate_ctxt.
>> Why not pass the vcpu in the first place?

Because the emulator is written to be usable outside the Linux kernel as
well.

Also, the fast path (used if kernel_pio returns 0) doesn't read
VCPU_REGS_RAX, thus using an uninitialized variable here:

>>> +	unsigned long val;
>>> +	int ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size,
>>> +					   port, &val, 1);
>>> +
>>> +	if (ret)
>>> +		kvm_register_write(vcpu, VCPU_REGS_RAX, val);

Thanks,

Paolo
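For illustration only, the register-width rule quoted from the APM in this thread can be sketched in user-space C. This is not code from any version of the patch: merge_pio_in_result() and its parameters are hypothetical names, and the sketch assumes 64-bit mode and a port read of 1, 2 or 4 bytes. It also shows why the old RAX value has to be read first for the 8-bit and 16-bit cases, which is the point behind the uninitialized-variable remark.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: compute the new 64-bit RAX
 * after an IN of `size` bytes returned `data`, following the rule quoted
 * from the APM (64-bit mode assumed).
 */
static uint64_t merge_pio_in_result(uint64_t old_rax, uint32_t data,
				    unsigned int size)
{
	/* Port reads into the accumulator are 1, 2 or 4 bytes wide. */
	assert(size == 1 || size == 2 || size == 4);

	switch (size) {
	case 1:	/* IN AL: bits 63:8 of the old value are preserved */
		return (old_rax & ~UINT64_C(0xff)) | (data & 0xffu);
	case 2:	/* IN AX: bits 63:16 of the old value are preserved */
		return (old_rax & ~UINT64_C(0xffff)) | (data & 0xffffu);
	default: /* IN EAX: the 32-bit result is zero-extended to 64 bits */
		return (uint64_t)data;
	}
}
```

With an old RAX of 0x1122334455667788, a 1-byte read of 0xAB gives 0x11223344556677AB, while a 4-byte read of 0xDEADBEEF gives 0x00000000DEADBEEF.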
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 96a8333f3db0..d0e5b086f2e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4663,22 +4663,23 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
 	return 0;
 }
 
+static void emulator_complete_pio_in(struct kvm_vcpu *vcpu, int size,
+		unsigned short port, void *val, unsigned int count)
+{
+	memcpy(val, vcpu->arch.pio_data, size * count);
+	trace_kvm_pio(KVM_PIO_IN, port, size, count, vcpu->arch.pio_data);
+	vcpu->arch.pio.count = 0;
+}
+
 static int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
 				    int size, unsigned short port, void *val,
 				    unsigned int count)
 {
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	int ret;
 
-	if (vcpu->arch.pio.count)
-		goto data_avail;
-
-	ret = emulator_pio_in_out(vcpu, size, port, val, count, true);
-	if (ret) {
-data_avail:
-		memcpy(val, vcpu->arch.pio_data, size * count);
-		trace_kvm_pio(KVM_PIO_IN, port, size, count, vcpu->arch.pio_data);
-		vcpu->arch.pio.count = 0;
+	if (vcpu->arch.pio.count ||
+	    emulator_pio_in_out(vcpu, size, port, val, count, true)) {
+		emulator_complete_pio_in(vcpu, size, port, val, count);
 		return 1;
 	}
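As a rough aid to reading the control flow in the diff, here is a simplified user-space model of the two-phase port read. Everything in it is an assumption made for illustration: struct pio_model and the model_* functions are hypothetical stand-ins, tracing is omitted, and the real emulator_pio_in_out() does more bookkeeping than shown. The only thing the sketch tries to capture is that one completion helper serves both the "data already pending" case and the "answered immediately" case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the PIO state kept in vcpu->arch. */
struct pio_model {
	unsigned int count;        /* non-zero while a read awaits completion */
	unsigned char data[8];     /* stand-in for vcpu->arch.pio_data */
	int in_kernel_device;      /* assumption: 1 if the read is answered at once */
	unsigned char device_byte; /* byte value the modelled device returns */
};

/* Models emulator_pio_in_out(): 1 if data is ready now, 0 if it is pending. */
static int model_pio_in_out(struct pio_model *p, int size, unsigned int count)
{
	if (p->in_kernel_device) {
		memset(p->data, p->device_byte, (size_t)size * count);
		return 1;
	}
	p->count = count;	/* remember that a completion is outstanding */
	return 0;
}

/* Models emulator_complete_pio_in(): copy the data out, clear the pending state. */
static void model_complete_pio_in(struct pio_model *p, int size, void *val,
				  unsigned int count)
{
	memcpy(val, p->data, (size_t)size * count);
	p->count = 0;
}

/* Models emulator_pio_in_emulated() after the refactoring shown in the diff. */
static int model_pio_in(struct pio_model *p, int size, void *val,
			unsigned int count)
{
	assert((size_t)size * count <= sizeof(p->data));

	if (p->count || model_pio_in_out(p, size, count)) {
		model_complete_pio_in(p, size, val, count);
		return 1;
	}
	return 0;
}
```

In this model a first call with no in-kernel device returns 0 and leaves count set; once the buffer has been filled, a second call returns 1 through the same helper.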