Message ID | 20191104230001.27774-4-aarcange@redhat.com (mailing list archive)
---|---
State | New, archived
Series | KVM monolithic v3
On 04/11/19 23:59, Andrea Arcangeli wrote:
> kvm_x86_set_hv_timer and kvm_x86_cancel_hv_timer need to be defined
> for the 32-bit kernel build to succeed, but they can't be called.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bd17ad61f7e3..1a58ae38c8f2 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7195,6 +7195,17 @@ void kvm_x86_cancel_hv_timer(struct kvm_vcpu *vcpu)
>  {
>  	to_vmx(vcpu)->hv_deadline_tsc = -1;
>  }
> +#else
> +int kvm_x86_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
> +			 bool *expired)
> +{
> +	BUG();
> +}
> +
> +void kvm_x86_cancel_hv_timer(struct kvm_vcpu *vcpu)
> +{
> +	BUG();
> +}
>  #endif
>
>  void kvm_x86_sched_in(struct kvm_vcpu *vcpu, int cpu)

I'll check for how long this has been broken. It may be the proof that
we can actually drop 32-bit KVM support.

Paolo
On 05/11/19 11:04, Paolo Bonzini wrote:
> On 04/11/19 23:59, Andrea Arcangeli wrote:
>> kvm_x86_set_hv_timer and kvm_x86_cancel_hv_timer need to be defined
>> for the 32-bit kernel build to succeed, but they can't be called.
>> [...]
>
> I'll check for how long this has been broken. It may be the proof that
> we can actually drop 32-bit KVM support.

Ah no, I was confused because this series is not bisectable (in addition
to doing two things at the same time, namely the monolithic kvm.ko and
the retpoline eliminations).

I have picked up the patches that are independent of the monolithic
kvm.ko work or can be considered bugfixes.

For the rest, please do this before posting again:

- ensure that everything is bisectable

- look into how to remove the modpost warnings.  A simple (though
somewhat ugly) way is to keep a kvm.ko module that includes common
virt/kvm/ code as well as, for x86 only, page_track.o.  A few functions,
such as kvm_mmu_gfn_disallow_lpage and kvm_mmu_gfn_allow_lpage, would
have to be moved into mmu.h, but that's not a big deal.

- provide at least some examples of replacing the NULL kvm_x86_ops
checks with error codes in the function (or just early "return"s).  I
can help with the others, but remember that for the patch to be merged,
kvm_x86_ops must be removed completely.

Thanks,

Paolo
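As an illustration of that last request, here is a minimal sketch of the
transformation; the "before" call site is paraphrased rather than the
actual lapic.c code, and enable_hv_timer is a hypothetical capability
flag:

```c
#include <linux/types.h>
#include <linux/errno.h>

struct kvm_vcpu;

static bool enable_hv_timer;		/* hypothetical capability flag */

/* Before: callers must NULL-check the kvm_x86_ops callback
 * (paraphrased from the lapic.c call site):
 *
 *	if (!kvm_x86_ops->set_hv_timer)
 *		return false;
 *
 * After: the vendor function always exists and reports "unsupported"
 * itself, so the pointer test disappears from the caller.
 */
int kvm_x86_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
			 bool *expired)
{
	if (!enable_hv_timer)
		return -EOPNOTSUPP;	/* early error return */
	/* ... program the VMX preemption timer ... */
	return 0;
}
```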
On Tue, Nov 05, 2019 at 11:37:47AM +0100, Paolo Bonzini wrote:
> For the rest, please do this before posting again:
>
> - ensure that everything is bisectable

x86-64 is already bisectable.

I didn't check all the other archs for bisectability.

Even 4/13 is suboptimal and needs to be redone later in a more optimal
way. I prefer all logic changes to happen at later steps so one can at
least bisect to something that functionally works like before. And 4/13
also would need to be merged into the huge patch if one wants to
guarantee bisectability on all CPUs, but it'll just be hidden there in
the huge patch.

Obviously I can squash both 3/13 and 4/13 into 2/13, but I don't feel
I'd be doing the right thing by squashing them just to increase
bisectability.

> - look into how to remove the modpost warnings.  A simple (though
> somewhat ugly) way is to keep a kvm.ko module that includes common
> virt/kvm/ code as well as, for x86 only, page_track.o.  A few functions,
> such as kvm_mmu_gfn_disallow_lpage and kvm_mmu_gfn_allow_lpage, would
> have to be moved into mmu.h, but that's not a big deal.

I think we should:

1) whitelist to shut off the warnings on demand

2) verify that if two modules register the same exported symbol the
   second one fails to load, and that the module code is robust about
   that; this hopefully is already the case

Provided 2) is verified, the whitelist is more efficient than losing 4k
of RAM in all the KVM hypervisors out there.

> - provide at least some examples of replacing the NULL kvm_x86_ops
> checks with error codes in the function (or just early "return"s).  I
> can help with the others, but remember that for the patch to be merged,
> kvm_x86_ops must be removed completely.

Even if kvm_x86_ops weren't guaranteed to go away, this would already
provide all the performance benefit to the KVM users, so I wouldn't see
a reason not to apply it even if kvm_x86_ops cannot go away.

That said, it will go away and there's no concern about it. It's just
that the patchset seems large enough already and it rebases with heavy
rejects at every port. I simply stopped at the first self-contained
step that provides all the performance benefits.

If I go ahead and remove kvm_x86_ops, how do I know it won't reject
heavily the next day I rebase and I'll have to redo it all from scratch?
If you explain how you're going to guarantee that I won't have to do
that work more than once, I'd be happy to go ahead.

Thanks,
Andrea
On 05/11/19 14:54, Andrea Arcangeli wrote:
> x86-64 is already bisectable.
>
> I didn't check all the other archs for bisectability.
>
> Even 4/13 is suboptimal and needs to be redone later in a more optimal
> way. I prefer all logic changes to happen at later steps so one can at
> least bisect to something that functionally works like before. And 4/13
> also would need to be merged into the huge patch if one wants to
> guarantee bisectability on all CPUs, but it'll just be hidden there in
> the huge patch.
>
> Obviously I can squash both 3/13 and 4/13 into 2/13, but I don't feel
> I'd be doing the right thing by squashing them just to increase
> bisectability.

You can reorder the patches so that kvm_x86_ops assignments never
happen.  That way, 4/13 for example would be moved to the very
beginning.

>> - look into how to remove the modpost warnings.  A simple (though
>> somewhat ugly) way is to keep a kvm.ko module that includes common
>> virt/kvm/ code as well as, for x86 only, page_track.o.  A few functions,
>> such as kvm_mmu_gfn_disallow_lpage and kvm_mmu_gfn_allow_lpage, would
>> have to be moved into mmu.h, but that's not a big deal.
>
> I think we should:
>
> 1) whitelist to shut off the warnings on demand

Do you mean adding a whitelist to modpost?  That would work, though I am
not sure if the module maintainer (Jessica Yu) would accept that.

> 2) verify that if two modules register the same exported symbol the
>    second one fails to load, and that the module code is robust about
>    that; this hopefully is already the case
>
> Provided 2) is verified, the whitelist is more efficient than losing 4k
> of RAM in all the KVM hypervisors out there.

I agree.

>> - provide at least some examples of replacing the NULL kvm_x86_ops
>> checks with error codes in the function (or just early "return"s).  I
>> can help with the others, but remember that for the patch to be merged,
>> kvm_x86_ops must be removed completely.
>
> Even if kvm_x86_ops weren't guaranteed to go away, this would already
> provide all the performance benefit to the KVM users, so I wouldn't see
> a reason not to apply it even if kvm_x86_ops cannot go away.

The answer is maintainability.  My suggestion is that we start looking
into removing all assignments and tests of kvm_x86_ops, one step at a
time.  Until this is done, unfortunately we won't be able to reap the
performance benefit.  But the advantage is that this can be done in many
separate submissions; it doesn't have to be one huge patch.

Once this is done, removing kvm_x86_ops is trivial in the end.  It's
okay if the intermediate step has minimal performance regressions; we
know what it will look like.  I have to order patches with maintenance
first and performance second, if possible.

By the way, we are already planning to make some module parameters
per-VM instead of global, so this refactoring would also help that
effort.

> That said, it will go away and there's no concern about it. It's just
> that the patchset seems large enough already and it rebases with heavy
> rejects at every port. I simply stopped at the first self-contained
> step that provides all the performance benefits.

That is good enough to prove the feasibility of the idea, so I agree
that was a good plan.

Paolo

> If I go ahead and remove kvm_x86_ops, how do I know it won't reject
> heavily the next day I rebase and I'll have to redo it all from scratch?
> If you explain how you're going to guarantee that I won't have to do
> that work more than once, I'd be happy to go ahead.
On Tue, Nov 05, 2019 at 03:09:37PM +0100, Paolo Bonzini wrote:
> You can reorder the patches so that kvm_x86_ops assignments never
> happen.  That way, 4/13 for example would be moved to the very
> beginning.

Ok, 3/13 and 4/13 can both go before 2/13; I reordered them fine, the
end result at least is the same and the intermediate result should
improve. Hopefully this is the best solution for those two outliers.
Once this is committed I expect Sean to take over 4/13 with a more
optimal version that gets rid of that branch, like he proposed
initially.

>>> - look into how to remove the modpost warnings.  A simple (though
>>> somewhat ugly) way is to keep a kvm.ko module that includes common
>>> virt/kvm/ code as well as, for x86 only, page_track.o.  A few functions,
>>> such as kvm_mmu_gfn_disallow_lpage and kvm_mmu_gfn_allow_lpage, would
>>> have to be moved into mmu.h, but that's not a big deal.
>>
>> I think we should:
>>
>> 1) whitelist to shut off the warnings on demand
>
> Do you mean adding a whitelist to modpost?  That would work, though I am
> not sure if the module maintainer (Jessica Yu) would accept that.

Yes, that's exactly what I meant.

>> 2) verify that if two modules register the same exported symbol the
>>    second one fails to load, and that the module code is robust about
>>    that; this hopefully is already the case
>>
>> Provided 2) is verified, the whitelist is more efficient than losing 4k
>> of RAM in all the KVM hypervisors out there.
>
> I agree.

Ok, so I enlarged the CC list accordingly to check how the whitelist can
be done in modpost.

>>> - provide at least some examples of replacing the NULL kvm_x86_ops
>>> checks with error codes in the function (or just early "return"s).  I
>>> can help with the others, but remember that for the patch to be merged,
>>> kvm_x86_ops must be removed completely.
>>
>> Even if kvm_x86_ops weren't guaranteed to go away, this would already
>> provide all the performance benefit to the KVM users, so I wouldn't see
>> a reason not to apply it even if kvm_x86_ops cannot go away.
>
> The answer is maintainability.  My suggestion is that we start looking
> into removing all assignments and tests of kvm_x86_ops, one step at a
> time.  Until this is done, unfortunately we won't be able to reap the
> performance benefit.  But the advantage is that this can be done in many

There's not much performance benefit left from the removal of
kvm_x86_ops. It'll only remove a few branches at best (and only if we
don't have to replace the branches on the pointer check with other
branches on a static variable to disambiguate the different cases).

> separate submissions; it doesn't have to be one huge patch.
>
> Once this is done, removing kvm_x86_ops is trivial in the end.  It's
> okay if the intermediate step has minimal performance regressions; we
> know what it will look like.  I have to order patches with maintenance
> first and performance second, if possible.

The removal of kvm_x86_ops is just a badly needed code cleanup and of
course I agree it must happen sooner rather than later. I'm just trying
to avoid running into rejects on those further cleanup commits too.

> By the way, we are already planning to make some module parameters
> per-VM instead of global, so this refactoring would also help that
> effort.
>
>> That said, it will go away and there's no concern about it. It's just
>> that the patchset seems large enough already and it rebases with heavy
>> rejects at every port. I simply stopped at the first self-contained
>> step that provides all the performance benefits.
>
> That is good enough to prove the feasibility of the idea, so I agree
> that was a good plan.

All right; I'm not exactly sure what the plan is and whether it's ok to
do it over time, or if I should go ahead doing all logic changes while
the big patch remains out of tree. If you apply it and reorder 4/13 and
3/13 before 2/13 in a rebase like I did locally, it should already be a
good starting point in my view, and modpost also can be fixed over time;
the warnings appear harmless so far.

Thanks,
Andrea
On 05/11/19 15:56, Andrea Arcangeli wrote:
>>> I think we should:
>>>
>>> 1) whitelist to shut off the warnings on demand
>>
>> Do you mean adding a whitelist to modpost?  That would work, though I am
>> not sure if the module maintainer (Jessica Yu) would accept that.
>
> Yes, that's exactly what I meant.

Ok, thanks.  Jessica, the issue here is that we have two (mutually
exclusive) modules providing the same interface to a third module.
Andrea will check that, when the same symbol is exported by two modules,
the second-loaded module correctly fails insmod.  If that is okay, we
will also need modpost not to warn for these symbols in
sym_add_exported.

>> The answer is maintainability.  My suggestion is that we start looking
>> into removing all assignments and tests of kvm_x86_ops, one step at a
>> time.  Until this is done, unfortunately we won't be able to reap the
>> performance benefit.  But the advantage is that this can be done in many
>
> There's not much performance benefit left from the removal of
> kvm_x86_ops.

Indeed; what I mean is that until then we will have to keep the
retpolines.  Not removing kvm_x86_ops leaves an unsustainable mess in
terms of maintainability, therefore we will need to refactor the code
first.  Once the refactoring is over, kvm_x86_ops can be dropped easily,
just like kvm_pmu_ops in this version of the series.

The good thing is that the modpost discussion can proceed in parallel.

> The removal of kvm_x86_ops is just a badly needed code cleanup and of
> course I agree it must happen sooner rather than later. I'm just trying
> to avoid running into rejects on those further cleanup commits too.

>> That is good enough to prove the feasibility of the idea, so I agree
>> that was a good plan.
>
> All right; I'm not exactly sure what the plan is and whether it's ok to
> do it over time, or if I should go ahead doing all logic changes while
> the big patch remains out of tree.

Yes, the changes to remove tests and assignments to kvm_x86_ops must
happen first.  I understand that the big patch is a conflict magnet, but
once all the refactoring is done it will be very easy to review and it
will get in quickly.

Paolo
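A minimal sketch of what the whitelist could look like on the modpost
side, assuming a hypothetical table and helper; sym_add_exported is the
real function named above, but the surrounding code and the example
symbol are guesses, not an actual patch (ARRAY_SIZE is assumed from
modpost.h):

```c
/* scripts/mod/modpost.c (sketch only, not an actual patch) */
#include <string.h>

static const char *const dup_export_whitelist[] = {
	"kvm_vcpu_kick",	/* example: a common KVM symbol that both
				 * kvm-intel.ko and kvm-amd.ko would export */
};

static int dup_export_whitelisted(const char *name)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(dup_export_whitelist); i++)
		if (strcmp(dup_export_whitelist[i], name) == 0)
			return 1;
	return 0;
}

/* ...and inside sym_add_exported(), the "exported twice" warning would
 * be skipped for whitelisted symbols, e.g.:
 *
 *	if (sym && sym->module != mod && !dup_export_whitelisted(name))
 *		warn("%s: '%s' exported twice. Previous export was in %s\n",
 *		     mod->name, name, sym->module->name);
 */
```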
+++ Paolo Bonzini [05/11/19 16:10 +0100]:
> On 05/11/19 15:56, Andrea Arcangeli wrote:
>>>> I think we should:
>>>>
>>>> 1) whitelist to shut off the warnings on demand
>>>
>>> Do you mean adding a whitelist to modpost?  That would work, though I am
>>> not sure if the module maintainer (Jessica Yu) would accept that.
>>
>> Yes, that's exactly what I meant.
>
> Ok, thanks.  Jessica, the issue here is that we have two (mutually
> exclusive) modules providing the same interface to a third module.
> Andrea will check that, when the same symbol is exported by two modules,
> the second-loaded module correctly fails insmod.

Hi Paolo, thanks for getting me up to speed.  The module loader already
rejects loading a module with duplicate exported symbols.

> If that is okay, we will also need modpost not to warn for these
> symbols in sym_add_exported.

I think it's certainly doable in modpost; for example, we could pass a
list of whitelisted symbols and have modpost read them in and not warn
if it encounters the whitelisted symbols more than once.  Modpost will
also have to be modified to accommodate duplicate symbols.  I'm not sure
how ugly this would be without seeing the actual patch.  And I am not
sure what Masahiro (who takes care of all things kbuild-related) thinks
of this idea.  But before implementing all this, is there absolutely no
way around having the duplicated exported symbols?  (E.g., could the
modules be configured/built in a mutually exclusive way?  I'm lacking
the context from the rest of the thread, so I'm not sure which modules
are the problematic ones.)

Thanks,

Jessica
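For context, the rejection Jessica refers to lives in kernel/module.c.
The sketch below paraphrases that logic from memory; it is not verbatim,
and symbol_already_exported() is a hypothetical stand-in for the real
find_symbol() lookup:

```c
#include <linux/errno.h>
#include <linux/printk.h>
#include <linux/module.h>

/* hypothetical stand-in for the real find_symbol() lookup */
bool symbol_already_exported(const char *name);

/* Paraphrased sketch of the kernel/module.c check: at load time, each
 * symbol the incoming module exports is looked up among vmlinux and
 * already-loaded modules; any hit aborts the load.
 */
static int verify_exported_symbols_sketch(struct module *mod)
{
	unsigned int i;

	for (i = 0; i < mod->num_syms; i++) {
		if (symbol_already_exported(mod->syms[i].name)) {
			pr_err("%s: exports duplicate symbol %s\n",
			       mod->name, mod->syms[i].name);
			return -ENOEXEC;
		}
	}
	return 0;
}
```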
On 08/11/19 14:56, Jessica Yu wrote:
> And I am not sure what Masahiro (who takes care of all things
> kbuild-related) thinks of this idea.  But before implementing all this,
> is there absolutely no way around having the duplicated exported
> symbols?  (E.g., could the modules be configured/built in a mutually
> exclusive way?  I'm lacking the context from the rest of the thread, so
> I'm not sure which modules are the problematic ones.)

The problematic modules are kvm_intel and kvm_amd, so we cannot build
them in a mutually exclusive way (but we know it won't make sense to
load both).  We will have to build only one of them when built into
vmlinux, but the module case must support building both.

Currently we put the common symbols in kvm.ko, and kvm.ko acts as a kind
of "library" for kvm_intel.ko and kvm_amd.ko.  The problem is that
kvm_intel.ko and kvm_amd.ko currently pass a large array of function
pointers to kvm.ko, and Andrea measured a substantial performance
penalty from retpolines when kvm.ko calls back through those pointers.
Therefore he would like to remove kvm.ko, and that would result in
symbols exported from two modules.

I suppose we could use a code patching mechanism to avoid the
retpolines.  Andrea, what do you think about that?  That would have the
advantage that we won't have to remove kvm_x86_ops. :)

Thanks,

Paolo
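To make the two layouts concrete, here is a minimal sketch of the call
patterns being compared. The callback names follow the kvm_x86_ops
members touched by this series, but the snippet is illustrative rather
than the actual code:

```c
#include <linux/types.h>

struct kvm_vcpu;

/* Library layout (kvm.ko): common code reaches the vendor module
 * through a table of function pointers, so every callback is an
 * indirect branch and, with CONFIG_RETPOLINE, each one goes through a
 * retpoline thunk.
 */
struct kvm_x86_ops {
	int  (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
			     bool *expired);
	void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
	/* ... on the order of a hundred more callbacks ... */
};

extern struct kvm_x86_ops *kvm_x86_ops;

static void library_layout(struct kvm_vcpu *vcpu)
{
	kvm_x86_ops->cancel_hv_timer(vcpu);	/* retpolined indirect call */
}

/* Monolithic layout (kvm-intel.ko / kvm-amd.ko): the common code is
 * linked into the vendor module and calls the vendor function directly,
 * as in the patch at the top of this thread.
 */
void kvm_x86_cancel_hv_timer(struct kvm_vcpu *vcpu);

static void monolithic_layout(struct kvm_vcpu *vcpu)
{
	kvm_x86_cancel_hv_timer(vcpu);		/* plain direct call */
}
```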
On Fri, Nov 08, 2019 at 08:51:04PM +0100, Paolo Bonzini wrote:
> I suppose we could use a code patching mechanism to avoid the
> retpolines.  Andrea, what do you think about that?  That would have the
> advantage that we won't have to remove kvm_x86_ops. :)

Page 17 covers pvops:

https://people.redhat.com/~aarcange/slides/2019-KVM-monolithic.pdf
On 08/11/19 21:01, Andrea Arcangeli wrote:
> On Fri, Nov 08, 2019 at 08:51:04PM +0100, Paolo Bonzini wrote:
>> I suppose we could use a code patching mechanism to avoid the
>> retpolines.  Andrea, what do you think about that?  That would have the
>> advantage that we won't have to remove kvm_x86_ops. :)
>
> Page 17 covers pvops:
>
> https://people.redhat.com/~aarcange/slides/2019-KVM-monolithic.pdf

You can patch call instructions directly using text_poke when
kvm_intel.ko or kvm_amd.ko is loaded; I'm not sure why that would be
worse for TLB or RAM usage.  The hard part is recording the location of
the call sites using some pushsection/popsection magic.

Paolo
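The mechanism Paolo sketches here is essentially what later landed
upstream as static calls, which KVM eventually adopted for kvm_x86_ops.
The sketch below uses that later API purely to illustrate the idea; it
did not exist at the time of this thread, and the names are
illustrative:

```c
#include <linux/static_call.h>
#include <linux/kvm_host.h>

void vmx_vcpu_run(struct kvm_vcpu *vcpu);
void svm_vcpu_run(struct kvm_vcpu *vcpu);

/* Define a patchable call site, initially routed to the VMX
 * implementation.  Behind the scenes the call-site locations are
 * recorded in a dedicated ELF section (the "pushsection/popsection
 * magic") so they can be re-pointed later with text_poke().
 */
DEFINE_STATIC_CALL(kvm_x86_run, vmx_vcpu_run);

/* At kvm-amd module load, every recorded call site is patched over to
 * the SVM implementation in one pass:
 */
static int __init svm_setup_example(void)
{
	static_call_update(kvm_x86_run, svm_vcpu_run);
	return 0;
}

/* Callers compile down to a plain direct call: no function pointer
 * load, no retpoline thunk.
 */
static void vcpu_enter_guest_example(struct kvm_vcpu *vcpu)
{
	static_call(kvm_x86_run)(vcpu);
}
```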
On Fri, Nov 08, 2019 at 10:02:52PM +0100, Paolo Bonzini wrote:
> kvm_intel.ko or kvm_amd.ko is loaded; I'm not sure why that would be
> worse for TLB or RAM usage.  The hard part is recording the location of
> the call sites

Let's ignore the different code complexity of supporting self-modifying
code: kvm.ko and kvm-*.ko will be located in different pages, hence
it'll waste an extra iTLB entry on every vmexit and 2k of RAM on
average.  The L1 icache also will be wasted.  It'll simply run slower.

Now about the code complexity, it is even higher than pvops:

   KVM                           pvops
   =========                     =============
1) Changes daily                 Never change

2) Patched at runtime            Patched only at boot time early on
   during module load
   and multiple times
   at every load of kvm-*.ko

3) The patching points to        All patch destinations are linked
   code in kernel modules        into the kernel

Why exactly should we go through such a complication when it runs slower
in the end, and it's much more complex to implement and maintain, and in
fact even more complex than pvops already is?

Runtime patching of the indirect calls, as pvops does, is strictly
required when you are forced to resolve the linking at runtime.  The
alternative would be to ship two different Linux kernels for PV and bare
metal.  Maintaining a whole new kernel rpm, and having to install a
different rpm depending on hypervisor or bare metal, is troublesome, so
pvops is worth it.

With kvm-amd and kvm-intel we can avoid the whole runtime patching of
the call sites, as already proven by the KVM monolithic patchset, and
it'll run faster on the CPU and it'll save RAM; so I'm not exactly sure
how anybody could prefer runtime patching here when the only benefit is
a few MiB of disk space saved.

Furthermore, by linking the thing statically we'll also enable LTO and
other gcc features which would never be possible with those indirect
calls.

Thanks,
Andrea
On 08/11/19 22:26, Andrea Arcangeli wrote:
> On Fri, Nov 08, 2019 at 10:02:52PM +0100, Paolo Bonzini wrote:
>> kvm_intel.ko or kvm_amd.ko is loaded; I'm not sure why that would be
>> worse for TLB or RAM usage.  The hard part is recording the location of
>> the call sites
>
> Let's ignore the different code complexity of supporting self-modifying
> code: kvm.ko and kvm-*.ko will be located in different pages, hence
> it'll waste an extra iTLB entry on every vmexit and 2k of RAM on
> average.

This is unlikely to make a difference, since kvm.o and kvm-intel.o are
overall about 700 KiB in size.  You do lose some inlining opportunities
with LTO, but without LTO the L1 cache benefits are debatable too.  The
real loss is in the complexity, I agree with you about that.

> Now about the code complexity, it is even higher than pvops:
>
>    KVM                           pvops
>    =========                     =============
> 1) Changes daily                 Never change
>
> 2) Patched at runtime            Patched only at boot time early on
>    during module load
>    and multiple times
>    at every load of kvm-*.ko
>
> 3) The patching points to        All patch destinations are linked
>    code in kernel modules        into the kernel
>
> Why exactly should we go through such a complication when it runs slower
> in the end, and it's much more complex to implement and maintain, and in
> fact even more complex than pvops already is?

For completeness, one advantage of patching would be to keep support for
built-in Intel+AMD.

The modpost patch should be pretty small, and since Jessica seemed quite
open to it, let's do that.

Thanks,

Paolo

> Furthermore, by linking the thing statically we'll also enable LTO and
> other gcc features which would never be possible with those indirect
> calls.
>
> Thanks,
> Andrea
(+CC: Lucas De Marchi, the kmod maintainer)

On Sat, Nov 9, 2019 at 4:51 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 08/11/19 14:56, Jessica Yu wrote:
> > And I am not sure what Masahiro (who takes care of all things
> > kbuild-related) thinks of this idea.  But before implementing all this,
> > is there absolutely no way around having the duplicated exported
> > symbols?  (E.g., could the modules be configured/built in a mutually
> > exclusive way?  I'm lacking the context from the rest of the thread, so
> > I'm not sure which modules are the problematic ones.)

I do not think having a whitelist in modpost is maintainable.

> The problematic modules are kvm_intel and kvm_amd, so we cannot build
> them in a mutually exclusive way (but we know it won't make sense to
> load both).  We will have to build only one of them when built into
> vmlinux, but the module case must support building both.
>
> Currently we put the common symbols in kvm.ko, and kvm.ko acts as a kind
> of "library" for kvm_intel.ko and kvm_amd.ko.  The problem is that
> kvm_intel.ko and kvm_amd.ko currently pass a large array of function
> pointers to kvm.ko, and Andrea measured a substantial performance
> penalty from retpolines when kvm.ko calls back through those pointers.
>
> Therefore he would like to remove kvm.ko, and that would result in
> symbols exported from two modules.

If this is a general demand, I think we can relax the 'exported twice'
warning; we could show the warning only when the previous export is from
vmlinux.

However, I am not sure how the module dependency should be handled when
the same symbol is exported from multiple modules.

Let's say the same symbol is exported from foo.ko and bar.ko, and foo.ko
appears first in modules.order.  In this case, I think foo.ko should be
considered to have a higher priority than bar.ko.

The modpost records MODULE_INFO(depends, foo.ko) correctly, but
modules.{dep,symbols} do not seem to reflect that.  Maybe depmod does
not take multiple exports into consideration?

If we change this, I want it to work consistently.

Masahiro Yamada

> I suppose we could use a code patching mechanism to avoid the
> retpolines.  Andrea, what do you think about that?  That would have the
> advantage that we won't have to remove kvm_x86_ops. :)
>
> Thanks,
>
> Paolo
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bd17ad61f7e3..1a58ae38c8f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7195,6 +7195,17 @@ void kvm_x86_cancel_hv_timer(struct kvm_vcpu *vcpu)
 {
 	to_vmx(vcpu)->hv_deadline_tsc = -1;
 }
+#else
+int kvm_x86_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+			 bool *expired)
+{
+	BUG();
+}
+
+void kvm_x86_cancel_hv_timer(struct kvm_vcpu *vcpu)
+{
+	BUG();
+}
 #endif
 
 void kvm_x86_sched_in(struct kvm_vcpu *vcpu, int cpu)
kvm_x86_set_hv_timer and kvm_x86_cancel_hv_timer need to be defined
for the 32-bit kernel build to succeed, but they can't be called.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)