Message ID | 1374750001-28527-10-git-send-email-gleb@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On 25/07/2013 12:59, Gleb Natapov wrote:
> +#if PTTYPE == PTTYPE_EPT
> +#define CHECK_BAD_MT_XWR(G) mmu->bad_mt_xwr & (1ull << ((G) & 0x3f));
> +#else
> +#define CHECK_BAD_MT_XWR(G) 0;
> +#endif
> +
>  static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
>  {
>  	int bit7;
>
>  	bit7 = (gpte >> 7) & 1;
> -	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
> +	return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0) ||
> +		CHECK_BAD_MT_XWR(gpte);
>  }
>
> +#undef CHECK_BAD_MT_XWR

Instead of a macro, you can do

	if (...)
		return true;
#if PTTYPE == PTTYPE_EPT
	if (...)
		return true;
#endif
	return false;

The compiler should be smart enough to generate the same code for
non-EPT PTTYPE.

>
> +	/*
> +	 * Use PFERR_RSVD_MASK in erorr_code to to tell if EPT
> +	 * misconfiguration requires to be injected. The detection is
> +	 * done by is_rsvd_bits_set() above.

erorr_code -> error_code

This patch has warnings for unused static functions.  You can squash
them, or split them differently according to file boundaries (i.e. mmu.c
first, vmx.c second).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 29, 2013 at 10:59:31AM +0200, Paolo Bonzini wrote:
> On 25/07/2013 12:59, Gleb Natapov wrote:
> > +#if PTTYPE == PTTYPE_EPT
> > +#define CHECK_BAD_MT_XWR(G) mmu->bad_mt_xwr & (1ull << ((G) & 0x3f));
> > +#else
> > +#define CHECK_BAD_MT_XWR(G) 0;
> > +#endif
> > +
> >  static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
> >  {
> >  	int bit7;
> >
> >  	bit7 = (gpte >> 7) & 1;
> > -	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
> > +	return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0) ||
> > +		CHECK_BAD_MT_XWR(gpte);
> >  }
> >
> > +#undef CHECK_BAD_MT_XWR
>
> Instead of a macro, you can do
>
> 	if (...)
> 		return true;
> #if PTTYPE == PTTYPE_EPT
> 	if (...)
> 		return true;
> #endif
> 	return false;
>
> The compiler should be smart enough to generate the same code for
> non-EPT PTTYPE.
>
The idea behind this rsvd_bits_mask trickery is to produce code that
does not have conditional branches.  I don't want to rely on the
compiler to do the right thing.  On the other hand, I am not sure that
just dropping these ifdefs here and checking mmu->bad_mt_xwr even in
the non-EPT case is not a good idea.  The overhead should not be
measurable.

> >
> > +	/*
> > +	 * Use PFERR_RSVD_MASK in erorr_code to to tell if EPT
> > +	 * misconfiguration requires to be injected. The detection is
> > +	 * done by is_rsvd_bits_set() above.
>
> erorr_code -> error_code
>
> This patch has warnings for unused static functions.  You can squash
> them, or split them differently according to file boundaries (i.e. mmu.c
> first, vmx.c second).
>
I prefer to have an in-between patch with a warning rather than divide
code that logically belongs together between patches.

--
	Gleb.
On 29/07/2013 12:52, Gleb Natapov wrote:
> On Mon, Jul 29, 2013 at 10:59:31AM +0200, Paolo Bonzini wrote:
>> On 25/07/2013 12:59, Gleb Natapov wrote:
>>> +#if PTTYPE == PTTYPE_EPT
>>> +#define CHECK_BAD_MT_XWR(G) mmu->bad_mt_xwr & (1ull << ((G) & 0x3f));
>>> +#else
>>> +#define CHECK_BAD_MT_XWR(G) 0;

No semicolons here, BTW.

>>> +#endif
>>> +
>>>  static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
>>>  {
>>>  	int bit7;
>>>
>>>  	bit7 = (gpte >> 7) & 1;
>>> -	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
>>> +	return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0) ||
>>> +		CHECK_BAD_MT_XWR(gpte);
>>>  }
>>>
>>> +#undef CHECK_BAD_MT_XWR
>>
>> Instead of a macro, you can do
>>
>> 	if (...)
>> 		return true;
>> #if PTTYPE == PTTYPE_EPT
>> 	if (...)
>> 		return true;
>> #endif
>> 	return false;
>>
>> The compiler should be smart enough to generate the same code for
>> non-EPT PTTYPE.
>>
> The idea behind this rsvd_bits_mask trickery is to produce code that
> does not have conditional branches.

If you want to have no conditional branches, you need to use "|" not
"||" (and you also need an "!= 0" in CHECK_BAD_MT_XWR).  As you wrote
it, the compiler is most likely to generate exactly the same code that I
suggested.

But I think what you _really_ want is not to avoid conditional branches.
What you want is to always inline is_rsvd_bits_set, so that the compiler
can merge these "if"s with the one where the caller calls
is_rsvd_bits_set.  So just mark is_rsvd_bits_set as inline.

> I don't want to rely on compiler to do
> the right things. On the other hand I am not sure that just dropping this
> ifdefs here and checking mmu->bad_mt_xwr for non ept case is not a
> good idea. The overhead should not be measurable.

That's also a possibility, but I think if you mark is_rsvd_bits_set as
inline it is better to leave the "if"s separate.

>>>
>>> +	/*
>>> +	 * Use PFERR_RSVD_MASK in erorr_code to to tell if EPT
>>> +	 * misconfiguration requires to be injected. The detection is
>>> +	 * done by is_rsvd_bits_set() above.
>>
>> erorr_code -> error_code
>>
>> This patch has warnings for unused static functions.  You can squash
>> them, or split them differently according to file boundaries (i.e. mmu.c
>> first, vmx.c second).
>>
> I prefer to have an in between patch with a warning, but do not divide
> code that logically belongs together between patches.

Fine.

Paolo
On Mon, Jul 29, 2013 at 12:59:54PM +0200, Paolo Bonzini wrote:
> On 29/07/2013 12:52, Gleb Natapov wrote:
> > On Mon, Jul 29, 2013 at 10:59:31AM +0200, Paolo Bonzini wrote:
> >> On 25/07/2013 12:59, Gleb Natapov wrote:
> >>> +#if PTTYPE == PTTYPE_EPT
> >>> +#define CHECK_BAD_MT_XWR(G) mmu->bad_mt_xwr & (1ull << ((G) & 0x3f));
> >>> +#else
> >>> +#define CHECK_BAD_MT_XWR(G) 0;
>
> No semicolons here, BTW.
>
> >>> [...]
> >>
> >> Instead of a macro, you can do
> >>
> >> 	if (...)
> >> 		return true;
> >> #if PTTYPE == PTTYPE_EPT
> >> 	if (...)
> >> 		return true;
> >> #endif
> >> 	return false;
> >>
> >> The compiler should be smart enough to generate the same code for
> >> non-EPT PTTYPE.
> >>
> > The idea behind this rsvd_bits_mask trickery is to produce code that
> > does not have conditional branches.
>
> If you want to have no conditional branches, you need to use "|" not
> "||" (and you also need an "!= 0" in CHECK_BAD_MT_XWR).  As you wrote
> it, the compiler is most likely to generate exactly the same code that I
> suggested.

OK, I can add that :)  I still prefer to have the ifdefs outside the
function, not inside.

>
> But I think what you _really_ want is not avoiding conditional branches.

The idea is that it is hard for branch prediction to predict the
correct result when that result depends on the guest's page tables,
which can contain anything, so in some places the shadow paging code
uses boolean logic to avoid branches.  In this case it is hard to avoid
the if() anyway, since the function invocation is in the if().

> What you want is always inline is_rsvd_bits_set, so that the compiler
> can merge these "if"s with the one where the caller calls
> is_rsvd_bits_set.  So just mark is_rsvd_bits_set as inline.

Will do, but it is inlined regardless.  It's very small.

>
> > I don't want to rely on compiler to do
> > the right things. On the other hand I am not sure that just dropping this
> > ifdefs here and checking mmu->bad_mt_xwr for non ept case is not a
> > good idea. The overhead should not be measurable.
>
> That's also a possibility, but I think if you mark is_rsvd_bits_set as
> inline it is better to leave the ifs separate.
>
> [...]
>
> > I prefer to have an in between patch with a warning, but do not divide
> > code that logically belongs together between patches.
>
> Fine.
>
> Paolo

--
	Gleb.
On 29/07/2013 13:43, Gleb Natapov wrote:
> On Mon, Jul 29, 2013 at 12:59:54PM +0200, Paolo Bonzini wrote:
>> On 29/07/2013 12:52, Gleb Natapov wrote:
>>> [...]
>>> The idea behind this rsvd_bits_mask trickery is to produce code that
>>> does not have conditional branches.
>>
>> If you want to have no conditional branches, you need to use "|" not
>> "||" (and you also need an "!= 0" in CHECK_BAD_MT_XWR).  As you wrote
>> it, the compiler is most likely to generate exactly the same code that I
>> suggested.
> OK. I can add that :) I still prefer to have ifdefs outside the
> function, not inside.
>
>>
>> But I think what you _really_ want is not avoiding conditional branches.
> The idea is that it is hard for branch prediction to predict correct
> result when correct result depends on guest's page table that can
> contain anything, so in some places shadow paging code uses boolean
> logic to avoid branches, in this case it is hard to avoid if() anyway
> since the function invocation is in the if().

Yes, I get the idea, but is_rsvd_bits_set should be predicted unlikely,
no?  If the guest has to run, it must use mostly valid ptes. :)

Especially if you change prefetch_invalid_gpte to do the reserved bits
test after the present test (so that is_rsvd_bits_set is only called on
present pagetables), is_rsvd_bits_set's result should be really
well-predicted.  At this point (and especially since the function
invocation is always in "if"s), using boolean logic to avoid branches
does not make much sense anymore for this function.

The compiler may still use setCC and boolean logic, even if you write it
as "if"s; and in simple cases like this, after inlining and seeing an
"if" the compiler may undo all your careful choice of "|" vs. "||".  So
it's better anyway to ensure is_rsvd_bits_set is called only when it's
unlikely.

Paolo
On Mon, Jul 29, 2013 at 02:05:44PM +0200, Paolo Bonzini wrote:
> On 29/07/2013 13:43, Gleb Natapov wrote:
> > [...]
> >> But I think what you _really_ want is not avoiding conditional branches.
> > The idea is that it is hard for branch prediction to predict correct
> > result when correct result depends on guest's page table that can
> > contain anything, so in some places shadow paging code uses boolean
> > logic to avoid branches, in this case it is hard to avoid if() anyway
> > since the function invocation is in the if().
>
> Yes, I get the idea, but is_rsvd_bits_set should be predicted unlikely,
> no?  If the guest has to run, it must use mostly valid ptes. :)
>
You see, you are confused and you want branch prediction not to be? :)
If your guest is KVM, is_rsvd_bits_set() will likely be true much more
often than not, because KVM misconfigures EPT entries to cache MMIO
addresses, so all the "unlikely" cases will be fixed by shadow pages and
will not reappear (until the shadow pages are zapped), but misconfigured
entries will continue to produce violations.

> Especially if you change prefetch_invalid_gpte to do the reserved bits
> test after the present test (so that is_rsvd_bits_set is only called on
> present pagetables), is_rsvd_bits_set's result should be really
> well-predicted.

Nope, for EPT page tables "present" is not a single bit; it is three
bits which by themselves can have invalid values.  For regular paging
you are probably right.

> At this point (and especially since function invocation
> is always in "if"s), using boolean logic to avoid branches does not make
> much sense anymore for this function.

That's true.

>
> The compiler may still use setCC and boolean logic, even if you write it
> as "if"s; and in simple cases like this, after inlining and seeing an
> "if" the compiler may undo all your careful choice of "|" vs. "||".  So
> it's better anyway to ensure is_rsvd_bits_set is called only when it's
> unlikely.
>
> Paolo

--
	Gleb.
On 29/07/2013 14:34, Gleb Natapov wrote:
>>>> But I think what you _really_ want is not avoiding conditional branches.
>>> The idea is that it is hard for branch prediction to predict correct
>>> result when correct result depends on guest's page table that can
>>> contain anything, so in some places shadow paging code uses boolean
>>> logic to avoid branches, in this case it is hard to avoid if() anyway
>>> since the function invocation is in the if().
>>
>> Yes, I get the idea, but is_rsvd_bits_set should be predicted unlikely,
>> no?  If the guest has to run, it must use mostly valid ptes. :)
>>
> You see, you are confused and you want branch prediction not to be? :)
> If your guest is KVM is_rsvd_bits_set() will be likely much more then
> unlikely because KVM misconfigures EPT entries to cache MMIO addresses,
> so all the "unlikely" cases will be fixed by shadow pages and will not
> reappear (until shadow pages are zapped), but misconfigured entries will
> continue to produces violations.

But then:

1) MMIO is a slow path anyway; losing 10 cycles on a mispredicted branch
is not going to help much.  Fast page faults are all I would optimize for.

2) in cases like this you just do not use likely/unlikely; the branch
will be very unlikely in the beginning, and very likely once shadow
pages are filled or in the no-EPT case.  Just let the branch predictor
adjust; it will probably do better than boolean tricks.

>> Especially if you change prefetch_invalid_gpte to do the reserved bits
>> test after the present test (so that is_rsvd_bits_set is only called on
>> present pagetables), is_rsvd_bits_set's result should be really
>> well-predicted.
> Nope, for ept page tables present is not a single bit, it is three bits
> which by themselves can have invalid values.

We're not checking the validity of the bits in the is_present_gpte test,
we're checking it in the is_rsvd_bits_set test (is_present_gpte is doing
just "(pte & 7) != 0").  It doesn't change anything in the outcome of
prefetch_invalid_gpte, and it makes the ordering consistent with
walk_addr_generic, which already tests presence before reserved bits.

So doing this swap should be a win anyway.

>> At this point (and especially since function invocation
>> is always in "if"s), using boolean logic to avoid branches does not make
>> much sense anymore for this function.
>
> That's true.

So are you going to change to "if"s?

Paolo
On Mon, Jul 29, 2013 at 03:11:39PM +0200, Paolo Bonzini wrote:
> On 29/07/2013 14:34, Gleb Natapov wrote:
> > [...]
> > You see, you are confused and you want branch prediction not to be? :)
> > If your guest is KVM is_rsvd_bits_set() will be likely much more then
> > unlikely because KVM misconfigures EPT entries to cache MMIO addresses,
> > so all the "unlikely" cases will be fixed by shadow pages and will not
> > reappear (until shadow pages are zapped), but misconfigured entries will
> > continue to produces violations.
>
> But then:
>
> 1) MMIO is a slow path anyway, losing 10 cycles on a mispredicted branch
> is not going to help much.  Fast page faults are all I would optimize for.
>
Of course, for that the check should be fast.

> 2) in cases like this you just do not use likely/unlikely; the branch
> will be very unlikely in the beginning, and very likely once shadow
> pages are filled or in the no-EPT case.  Just let the branch predictor
> adjust, it will probably do better than boolean tricks.
>
likely/unlikely are usually useless anyway.  If you can avoid the if()
altogether this is a win, since there is no branch to predict.

> >> Especially if you change prefetch_invalid_gpte to do the reserved bits
> >> test after the present test (so that is_rsvd_bits_set is only called on
> >> present pagetables), is_rsvd_bits_set's result should be really
> >> well-predicted.
> > Nope, for ept page tables present is not a single bit, it is three bits
> > which by themselves can have invalid values.
>
> We're not checking the validity of the bits in the is_present_gpte test,
> we're checking it in the is_rsvd_bits_set test (is_present_gpte is doing
> just "(pte & 7) != 0").  It doesn't change anything in the outcome of
> prefetch_invalid_gpte, and it makes the ordering consistent with
> walk_addr_generic which already tests presence before reserved bits.
>
> So doing this swap should be a win anyway.
>
> >> At this point (and especially since function invocation
> >> is always in "if"s), using boolean logic to avoid branches does not make
> >> much sense anymore for this function.
> >
> > That's true.
>
> So are you going to change to "if"s?
>
I think it will be better just to always check mmu->bad_mt_xwr.  (I
dislike ifdefs, if you haven't noticed :)).

--
	Gleb.
On 29/07/2013 15:20, Gleb Natapov wrote:
>> 2) in cases like this you just do not use likely/unlikely; the branch
>> will be very unlikely in the beginning, and very likely once shadow
>> pages are filled or in the no-EPT case.  Just let the branch predictor
>> adjust, it will probably do better than boolean tricks.
>>
> likely/unlikely are usually useless anyway. If you can avoid if()
> altogether this is a win since there is no branch to predict.

However, if the branches are dynamically well-predicted,

	if (simple)
		...
	if (complex)
		...

is likely faster than

	if (simple | complex)

because the branches then are very very cheap, and it pays off to not
always evaluate the complex branch.

In this case, the reserved bit test is the relatively complex one; it
has a couple of memory accesses and a longish chain of dependencies.

>>>> Especially if you change prefetch_invalid_gpte to do the reserved bits
>>>> test after the present test (so that is_rsvd_bits_set is only called on
>>>> present pagetables), is_rsvd_bits_set's result should be really
>>>> well-predicted.
>>> Nope, for ept page tables present is not a single bit, it is three bits
>>> which by themselves can have invalid values.
>>
>> We're not checking the validity of the bits in the is_present_gpte test,
>> we're checking it in the is_rsvd_bits_set test (is_present_gpte is doing
>> just "(pte & 7) != 0").  It doesn't change anything in the outcome of
>> prefetch_invalid_gpte, and it makes the ordering consistent with
>> walk_addr_generic which already tests presence before reserved bits.
>>
>> So doing this swap should be a win anyway.
>>
>>>> At this point (and especially since function invocation
>>>> is always in "if"s), using boolean logic to avoid branches does not make
>>>> much sense anymore for this function.
>>>
>>> That's true.
>>
>> So are you going to change to "if"s?
>>
> I think it will be better just to check mmu->bad_mt_xwr always. (I
> dislike ifdefs if you haven't noticed :)).

Yeah, I also thought of always checking bad_mt_xwr and even using it to
subsume the present check too, i.e. turning it into
is_rsvd_bits_set_or_nonpresent.  It checks the same bits that are used
in the present check (well, a superset).  You can then check for
presence separately if you care, which you don't in
prefetch_invalid_gpte.  It requires small changes in the callers but
nothing major.

But it still seems to me that we're in the above "if (simple ||
complex)" case, and having a separate "if (!present)" check will be
faster.

Paolo
On Mon, Jul 29, 2013 at 04:12:33PM +0200, Paolo Bonzini wrote:
> On 29/07/2013 15:20, Gleb Natapov wrote:
> >> 2) in cases like this you just do not use likely/unlikely; the branch
> >> will be very unlikely in the beginning, and very likely once shadow
> >> pages are filled or in the no-EPT case.  Just let the branch predictor
> >> adjust, it will probably do better than boolean tricks.
> >>
> > likely/unlikely are usually useless anyway. If you can avoid if()
> > altogether this is a win since there is no branch to predict.
>
> However, if the branches are dynamically well-predicted,
>
> 	if (simple)
> 		...
> 	if (complex)
> 		...
>
> is likely faster than
>
> 	if (simple | complex)
>
> because the branches then are very very cheap, and it pays off to not
> always evaluate the complex branch.
>
Good point about "|" always evaluating both sides.  Is this the case
with "if (simple != 0 | complex != 0)" too, where theoretically the
compiler may see that if "simple != 0" is true there is no need to
evaluate the second one?

> In this case, the reserved bit test is the relatively complex one, it
> has a couple memory accesses and a longish chain of dependencies.
>
> [...]
>
> > I think it will be better just to check mmu->bad_mt_xwr always. (I
> > dislike ifdefs if you haven't noticed :)).
>
> Yeah, I also thought of always checking bad_mt_xwr and even using it to
> subsume the present check too, i.e. turning it into
> is_rsvd_bits_set_or_nonpresent.  It checks the same bits that are used
> in the present check (well, a superset).  You can then check for
> presence separately if you care, which you don't in
> prefetch_invalid_gpte.  It requires small changes in the callers but
> nothing major.

I do not get what is_rsvd_bits_set_or_nonpresent() will check exactly,
or why we need it; there are two places where we check present/reserved,
and in one of them we need to know which one it is.

Anyway, the order of checks in prefetch_invalid_gpte() is not relevant
to this patchset, so let's leave it to a separate discussion.

>
> But it still seems to me that we're in the above "if (simple ||
> complex)" case and having a separate "if (!present)" check will be faster.
>
> Paolo

--
	Gleb.
On 29/07/2013 18:24, Gleb Natapov wrote:
> On Mon, Jul 29, 2013 at 04:12:33PM +0200, Paolo Bonzini wrote:
>> On 29/07/2013 15:20, Gleb Natapov wrote:
>>>> 2) in cases like this you just do not use likely/unlikely; the branch
>>>> will be very unlikely in the beginning, and very likely once shadow
>>>> pages are filled or in the no-EPT case.  Just let the branch predictor
>>>> adjust, it will probably do better than boolean tricks.
>>>>
>>> likely/unlikely are usually useless anyway. If you can avoid if()
>>> altogether this is a win since there is no branch to predict.
>>
>> However, if the branches are dynamically well-predicted,
>>
>> 	if (simple)
>> 		...
>> 	if (complex)
>> 		...
>>
>> is likely faster than
>>
>> 	if (simple | complex)
>>
>> because the branches then are very very cheap, and it pays off to not
>> always evaluate the complex branch.
>
> Good point about about "|" always evaluating both. Is this the case
> with if (simple !=0 | complex != 0) too where theoretically compiler may
> see that if simple !=0 is true no need to evaluate the second one?

Yes (only if complex doesn't have any side effects, which is the case here).

>> Yeah, I also thought of always checking bad_mt_xwr and even using it to
>> subsume the present check too, i.e. turning it into
>> is_rsvd_bits_set_or_nonpresent.  It checks the same bits that are used
>> in the present check (well, a superset).  You can then check for
>> presence separately if you care, which you don't in
>> prefetch_invalid_gpte.  It requires small changes in the callers but
>> nothing major.
>
> I do not get what is_rsvd_bits_set_or_nonpresent() will check exactly
> and why do we needed it, there are two places where we check
> present/reserved and in one of them we need to know which one it is.

You can OR bad_mt_xwr with 0x5555555555555555ULL (I think).  Then your
implementation of is_rsvd_bits_set() using bad_mt_xwr will return true
in all cases where the pte is non-present.  You can then call
is_present_pte to discriminate the two cases.

	if (is_rsvd_bits_set_or_nonpresent) {
		if (!present)
			...
		else
			...
	}

In more abstract terms this is:

	if (simple)
		...
	if (complex)
		...

to

	if (simple_or_complex) {
		if (simple)
			...
		else
			...
	}

This can actually make sense if simple is almost always false, because
then you save something from not evaluating it on the fast path.

But in this case, adding bad_mt_xwr to the non-EPT case is a small loss.

> Anyway order of checks in prefetch_invalid_gpte() is not relevant to
> that patchset, so lets better leave it to a separate discussion.

Yes.

Paolo
On Mon, Jul 29, 2013 at 06:36:12PM +0200, Paolo Bonzini wrote:
> Il 29/07/2013 18:24, Gleb Natapov ha scritto:
> > On Mon, Jul 29, 2013 at 04:12:33PM +0200, Paolo Bonzini wrote:
> >> Il 29/07/2013 15:20, Gleb Natapov ha scritto:
> >>>> 2) in cases like this you just do not use likely/unlikely; the branch
> >>>> will be very unlikely in the beginning, and very likely once shadow
> >>>> pages are filled or in the no-EPT case. Just let the branch predictor
> >>>> adjust, it will probably do better than boolean tricks.
> >>>>
> >>> likely/unlikely are usually useless anyway. If you can avoid if()
> >>> altogether this is a win since there is no branch to predict.
> >>
> >> However, if the branches are dynamically well-predicted,
> >>
> >>	if (simple)
> >>		...
> >>	if (complex)
> >>		...
> >>
> >> is likely faster than
> >>
> >>	if (simple | complex)
> >>
> >> because the branches then are very very cheap, and it pays off to not
> >> always evaluate the complex branch.
> >
> > Good point about "|" always evaluating both. Is this the case
> > with if (simple !=0 | complex != 0) too where theoretically compiler may
> > see that if simple !=0 is true no need to evaluate the second one?
>
> Yes (only if complex doesn't have any side effects, which is the case here).
>
> >> Yeah, I also thought of always checking bad_mt_xwr and even using it to
> >> subsume the present check too, i.e. turning it into
> >> is_rsvd_bits_set_or_nonpresent. It checks the same bits that are used
> >> in the present check (well, a superset). You can then check for
> >> presence separately if you care, which you don't in
> >> prefetch_invalid_gpte. It requires small changes in the callers but
> >> nothing major.
> >
> > I do not get what is_rsvd_bits_set_or_nonpresent() will check exactly
> > and why do we needed it, there are two places where we check
> > present/reserved and in one of them we need to know which one it is.
>
> You can OR bad_mt_xwr with 0x5555555555555555ULL (I think). Then your
With 0x1010101.

> implementation of is_rsvd_bits_set() using bad_mt_xwr will return true
> in all cases where the pte is non-present. You can then call
> is_present_pte to discriminate the two cases.
>
>	if (is_rsvd_bits_set_or_nonpresent) {
>		if (!present)
>			...
>		else
>			...
>	}
>
> In more abstract terms this is:
>
>	if (simple)
>		...
>	if (complex)
>		...
>
> to
>
>	if (simple_or_complex) {
>		if (simple)
>			...
>		else
>			...
>	}
>
> This can actually make sense if simple is almost always false, because
> then you save something from not evaluating it on the fast path.
>
> But in this case, adding bad_mt_xwr to the non-EPT case is a small loss.
>
> > Anyway order of checks in prefetch_invalid_gpte() is not relevant to
> > that patchset, so lets better leave it to a separate discussion.
>
> Yes.
>
> Paolo
>
> >>
> >> But it still seems to me that we're in the above "if (simple ||
> >> complex)" case and having a separate "if (!present)" check will be faster.
> >>
> >> Paolo
> >
> > --
> > 			Gleb.

--
			Gleb.
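[Editor's note: a small user-space sketch of the "subsume the present check" idea discussed above. The function and macro names are hypothetical; only the bit layout is taken from the patch. The 6-bit index into bad_mt_xwr is XWR in bits 0-2 and the EPT memory type in bits 3-5, so a pte is non-present when XWR == 0, i.e. at indexes 0, 8, 16, ..., 56 — the 0x0101010101010101 pattern Gleb points at (Paolo's 0x5555... would instead flag every even index):

```c
#include <stdbool.h>
#include <stdint.h>

/* Indexes whose XWR bits are all zero: every 8th bit of the 64-bit map. */
#define EPT_NONPRESENT_IDX_MASK 0x0101010101010101ull

/* An EPT pte is present when any of the R/W/X bits is set. */
static bool is_present_gpte(uint64_t gpte)
{
	return (gpte & 0x7) != 0;
}

/* One lookup covers both the misconfiguration and the present check:
 * OR the non-present indexes into bad_mt_xwr before testing. */
static bool is_rsvd_bits_set_or_nonpresent(uint64_t bad_mt_xwr, uint64_t gpte)
{
	uint64_t mask = bad_mt_xwr | EPT_NONPRESENT_IDX_MASK;

	return (mask & (1ull << (gpte & 0x3f))) != 0;
}
```

A caller that needs to distinguish the two outcomes can then test is_present_gpte() separately, as Paolo suggests.]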
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 531f47c..58a17c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -286,6 +286,7 @@ struct kvm_mmu {
 	u64 *pae_root;
 	u64 *lm_root;
 	u64 rsvd_bits_mask[2][4];
+	u64 bad_mt_xwr;
 
 	/*
 	 * Bitmap: bit set = last pte in walk
@@ -512,6 +513,9 @@ struct kvm_vcpu_arch {
 	 * instruction.
 	 */
 	bool write_fault_to_shadow_pgtable;
+
+	/* set at EPT violation at this point */
+	unsigned long exit_qualification;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3df3ac3..58ae9db 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3521,6 +3521,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	int maxphyaddr = cpuid_maxphyaddr(vcpu);
 	u64 exb_bit_rsvd = 0;
 
+	context->bad_mt_xwr = 0;
+
 	if (!context->nx)
 		exb_bit_rsvd = rsvd_bits(63, 63);
 	switch (context->root_level) {
@@ -3576,6 +3578,38 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	}
 }
 
+static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
+		struct kvm_mmu *context, bool execonly)
+{
+	int maxphyaddr = cpuid_maxphyaddr(vcpu);
+	int pte;
+
+	context->rsvd_bits_mask[0][3] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
+	context->rsvd_bits_mask[0][2] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
+	context->rsvd_bits_mask[0][1] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
+	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+
+	/* large page */
+	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
+	context->rsvd_bits_mask[1][2] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
+	context->rsvd_bits_mask[1][1] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
+	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+
+	for (pte = 0; pte < 64; pte++) {
+		int rwx_bits = pte & 7;
+		int mt = pte >> 3;
+		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
+				rwx_bits == 0x2 || rwx_bits == 0x6 ||
+				(rwx_bits == 0x4 && !execonly))
+			context->bad_mt_xwr |= (1ull << pte);
+	}
+}
+
 static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 		struct kvm_mmu *mmu)
 {
 	unsigned bit, byte, pfec;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 23a19a5..58d2f87 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -121,14 +121,23 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
 #endif
 }
 
+#if PTTYPE == PTTYPE_EPT
+#define CHECK_BAD_MT_XWR(G) mmu->bad_mt_xwr & (1ull << ((G) & 0x3f));
+#else
+#define CHECK_BAD_MT_XWR(G) 0;
+#endif
+
 static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
 {
 	int bit7;
 
 	bit7 = (gpte >> 7) & 1;
-	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
+	return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0) ||
+		CHECK_BAD_MT_XWR(gpte);
 }
 
+#undef CHECK_BAD_MT_XWR
+
 static inline int FNAME(is_present_gpte)(unsigned long pte)
 {
 #if PTTYPE != PTTYPE_EPT
@@ -376,6 +385,25 @@ error:
 	walker->fault.vector = PF_VECTOR;
 	walker->fault.error_code_valid = true;
 	walker->fault.error_code = errcode;
+
+#if PTTYPE == PTTYPE_EPT
+	/*
+	 * Use PFERR_RSVD_MASK in erorr_code to to tell if EPT
+	 * misconfiguration requires to be injected. The detection is
+	 * done by is_rsvd_bits_set() above.
+	 *
+	 * We set up the value of exit_qualification to inject:
+	 * [2:0] - Derive from [2:0] of real exit_qualification at EPT violation
+	 * [5:3] - Calculated by the page walk of the guest EPT page tables
+	 * [7:8] - Clear to 0.
+	 *
+	 * The other bits are set to 0.
+	 */
+	if (!(errcode & PFERR_RSVD_MASK)) {
+		vcpu->arch.exit_qualification &= 0x7;
+		vcpu->arch.exit_qualification |= ((pt_access & pte) & 0x7) << 3;
+	}
+#endif
 
 	walker->fault.address = addr;
 	walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fc24370..bbfff8d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5321,6 +5321,8 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	/* ept page table is present? */
 	error_code |= (exit_qualification >> 3) & 0x1;
 
+	vcpu->arch.exit_qualification = exit_qualification;
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
@@ -7416,6 +7418,21 @@ static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 	entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	struct vmcs12 *vmcs12;
+	nested_vmx_vmexit(vcpu);
+	vmcs12 = get_vmcs12(vcpu);
+
+	if (fault->error_code & PFERR_RSVD_MASK)
+		vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+	else
+		vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
+	vmcs12->exit_qualification = vcpu->arch.exit_qualification;
+	vmcs12->guest_physical_address = fault->address;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
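[Editor's note: for experimenting outside the kernel, the bad_mt_xwr bitmap built by reset_rsvds_bits_mask_ept() in the patch above can be recomputed stand-alone. The loop body is copied from the patch; the wrapper function is not part of KVM. Bit i of the result is set when the low 6 pte bits equal i and describe an illegal combination: reserved EPT memory types 2, 3, 7, write-only (XWR=010), write+execute without read (XWR=110), or execute-only (XWR=100) when the CPU does not support it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Recompute the bad_mt_xwr bitmap for a given execute-only capability. */
static uint64_t compute_bad_mt_xwr(bool execonly)
{
	uint64_t bad_mt_xwr = 0;
	int pte;

	for (pte = 0; pte < 64; pte++) {
		int rwx_bits = pte & 7;	/* XWR permission bits */
		int mt = pte >> 3;	/* EPT memory type */

		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
				rwx_bits == 0x2 || rwx_bits == 0x6 ||
				(rwx_bits == 0x4 && !execonly))
			bad_mt_xwr |= (1ull << pte);
	}
	return bad_mt_xwr;
}
```

A gpte is then a misconfiguration exactly when bit (gpte & 0x3f) of this map is set, which is what the CHECK_BAD_MT_XWR macro in the patch tests.]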