Message ID | cover.1711035400.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | KVM: x86: Make bus clock frequency for vAPIC timer configurable | expand |
On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote: > > Summary > ------- > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC > bus clock frequency for APIC timer emulation. > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the > frequency in nanoseconds. When using this capability, the user space > VMM should configure CPUID leaf 0x15 to advertise the frequency. Looks good to me and... Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> The only thing missing is actually integrating it into TDX qemu patches and testing the resulting TD. I think we are making a fair assumption that the problem should be resolved based on the analysis, but we have not actually tested that part. Is that right? What do you think?
On Tue, Apr 16, 2024, Rick P Edgecombe wrote: > On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote: > > > > Summary > > ------- > > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC > > bus clock frequency for APIC timer emulation. > > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the > > frequency in nanoseconds. When using this capability, the user space > > VMM should configure CPUID leaf 0x15 to advertise the frequency. > > Looks good to me and... > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > The only thing missing is actually integrating it into TDX qemu patches and > testing the resulting TD. I think we are making a fair assumption that the > problem should be resolved based on the analysis, but we have not actually > tested that part. Is that right? Please tell me that Rick is wrong, and that this actually has been tested with a TDX guest. I don't care _who_ tested it, or with what VMM it has been tested, but _someone_ needs to verify that this actually fixes the TDX issue.
On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote: > On Tue, Apr 16, 2024, Rick P Edgecombe wrote: > > On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote: > > > > > > Summary > > > ------- > > > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC > > > bus clock frequency for APIC timer emulation. > > > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the > > > frequency in nanoseconds. When using this capability, the user space > > > VMM should configure CPUID leaf 0x15 to advertise the frequency. > > > > Looks good to me and... > > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > > > The only thing missing is actually integrating it into TDX qemu patches and > > testing the resulting TD. I think we are making a fair assumption that the > > problem should be resolved based on the analysis, but we have not actually > > tested that part. Is that right? > > Please tell me that Rick is wrong, and that this actually has been tested with > a TDX guest. I don't care _who_ tested it, or with what VMM it has been > tested, > but _someone_ needs to verify that this actually fixes the TDX issue. It is in the process of getting a TDX test developed (or rather updated). Agreed, it requires verification that it fixes the original TDX issue. That is why I raised it. Reinette was working on this internally and some iterations were happening, but we are trying to work on the public list as much as possible per your earlier comments. So that is why she posted it. There was at least some level of TDX integration in the past. I'm not sure what exactly was tested, but we are going to re-verify it with the latest everything.
On Wed, Apr 24, 2024, Rick P Edgecombe wrote: > On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote: > > On Tue, Apr 16, 2024, Rick P Edgecombe wrote: > > > On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote: > > > > > > > > Summary > > > > ------- > > > > Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC > > > > bus clock frequency for APIC timer emulation. > > > > Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the > > > > frequency in nanoseconds. When using this capability, the user space > > > > VMM should configure CPUID leaf 0x15 to advertise the frequency. > > > > > > Looks good to me and... > > > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > > > > > The only thing missing is actually integrating it into TDX qemu patches and > > > testing the resulting TD. I think we are making a fair assumption that the > > > problem should be resolved based on the analysis, but we have not actually > > > tested that part. Is that right? > > > > Please tell me that Rick is wrong, and that this actually has been tested with > > a TDX guest. I don't care _who_ tested it, or with what VMM it has been > > tested, but _someone_ needs to verify that this actually fixes the TDX issue. > > It is in the process of getting a TDX test developed (or rather updated). > Agreed, it requires verification that it fixes the original TDX issue. That is > why I raised it. > > Reinette was working on this internally and some iterations were happening, but > we are trying to work on the public list as much as possible per your earlier > comments. So that is why she posted it. I have no problem posting "early", but Documentation/process/maintainer-kvm-x86.rst clearly states under Testing that: If you can't fully test a change, e.g. due to lack of hardware, clearly state what level of testing you were able to do, e.g. in the cover letter. I was assuming that this was actually *fully* tested, because nothing suggests otherwise. And _that_ is a problem, e.g. I was planning on applying this series for 6.10, which would have made for quite the mess if it turns out that this doesn't actually solve the TDX problem. > There was at least some level of TDX integration in the past. I'm not sure what > exactly was tested, but we are going to re-verify it with the latest everything. Honest question, is it a big lift to re-test the QEMU+KVM TDX changes, e.g. to verify this new capability actually does what we hope it does? If testing is a big lift, what are the pain points? Or perhaps a better question is, is there anything we (both upstream people, and end users like Google) can do to make re-testing less awful?
+Vishal On Wed, 2024-04-24 at 10:23 -0700, Sean Christopherson wrote: > I have no problem posting "early", but Documentation/process/maintainer-kvm- > x86.rst > clearly states under Testing that: > > If you can't fully test a change, e.g. due to lack of hardware, clearly > state > what level of testing you were able to do, e.g. in the cover letter. > > I was assuming that this was actually *fully* tested, because nothing suggests > otherwise. And _that_ is a problem, e.g. I was planning on applying this > series > for 6.10, which would have made for quite the mess if it turns out that this > doesn't actually solve the TDX problem. Ok, sorry. Will definitely be clear about this in the future. There is a little bit of chaos on our end right now as new people spin up and we adjust our working model to be more upstream focused. Thanks for being clear. Yes, please don't apply until we have the full testing done. It may not be far away though, per Reinette. > > > There was at least some level of TDX integration in the past. I'm not sure > > what > > exactly was tested, but we are going to re-verify it with the latest > > everything. > > Honest question, is it a big lift to re-test the QEMU+KVM TDX changes, e.g. to > verify this new capability actually does what we hope it does? If testing is > a > big lift, what are the pain points? Or perhaps a better question is, is there > anything we (both upstream people, and end users like Google) can do to make > re-testing less awful? I wouldn't say a big lift, but probably more than usual. Most of the testing challenges come from updating and matching specific, often out of tree components. We need to have specific OVMF, TDX module, QEMU, KVM core patches, KVM breakout series, and tests versions that all match. To wrangle it, automated testing is something we are working on internally right now. I can't think of anything to ask of upstream specifically. But Vishal might. As for Google, the TDX selftests are useful. We need to update them ourselves to keep up with uAPI changes. We could do a little more co-development on those? As in, not wait until we post new versions to get updates from Google's side. Just an idea off the top of my head. As for the TDX kvm unit tests updates [0]. They have not had much review. I think we maybe have enough TDX patches to focus on already though. Long term though, I have been wondering about how to prevent TDX regressions especially on the MMU pieces. It is one thing to have the TDX setups available for maintainers, but most normal developers will likely not have access to TDX HW for a bit. Just a problem without a solution. [0] https://lore.kernel.org/kvm/20231218072247.2573516-1-qian.wen@intel.com/#t
On Wed, Apr 24, 2024, Rick P Edgecombe wrote: > Long term though, I have been wondering about how to prevent TDX regressions > especially on the MMU pieces. It is one thing to have the TDX setups available > for maintainers, but most normal developers will likely not have access to TDX > HW for a bit. Just a problem without a solution. I wouldn't worry too much about hardware availability. As you said, it's not a problem we can really solve, and we already have to be concious of the fact that not all developers have comparable hardware. E.g. most people don't have a 4-sock, multi-hundred CPU system with TiBs of RAM. Not being able to test at all is obviously a little different, but it's not entirely new. Instead, I would encourage spending time and effort (after things have settled down patch wise) to build out selftests. I tried to run a "real" SEV-ES VM and gave up because I needed the "right" OVMF build, blah blah blah. At some point I'll probably bite the bullet and get a "full" CoCo setup working, but it's not exactly at the top of my todo list, in no small part because the triage and debug experience when things go wrong is miles and miles better in selftests.
(+Yao Yuan) Hi Sean and Rick, On 4/24/2024 9:38 AM, Edgecombe, Rick P wrote: > On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote: >> On Tue, Apr 16, 2024, Rick P Edgecombe wrote: >>> On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote: >>>> >>>> Summary >>>> ------- >>>> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC >>>> bus clock frequency for APIC timer emulation. >>>> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the >>>> frequency in nanoseconds. When using this capability, the user space >>>> VMM should configure CPUID leaf 0x15 to advertise the frequency. >>> >>> Looks good to me and... >>> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> >>> >>> The only thing missing is actually integrating it into TDX qemu patches and >>> testing the resulting TD. I think we are making a fair assumption that the >>> problem should be resolved based on the analysis, but we have not actually >>> tested that part. Is that right? >> >> Please tell me that Rick is wrong, and that this actually has been tested with >> a TDX guest. I don't care _who_ tested it, or with what VMM it has been >> tested, >> but _someone_ needs to verify that this actually fixes the TDX issue. > > It is in the process of getting a TDX test developed (or rather updated). > Agreed, it requires verification that it fixes the original TDX issue. That is > why I raised it. > > Reinette was working on this internally and some iterations were happening, but > we are trying to work on the public list as much as possible per your earlier > comments. So that is why she posted it. > > There was at least some level of TDX integration in the past. I'm not sure what > exactly was tested, but we are going to re-verify it with the latest everything. Apologies for the delay. I am the one needing to do this testing and it took me a while to ramp up on all the parts (and I am still learning). I encountered quite the roadblock (for me) along the way that was caused by a lingering timer (presumably left by TDVF). Thank you so much to Isaku and Yao Yuan for helping me to root cause this. I believe that this is unique to the kvm-unit-tests that does not reset the environment like the OS. A modified x86/apic.c:test_apic_timer_one_shot() was used to test this feature. Below I provide the diff of essential parts against https://github.com/intel/kvm-unit-tests-tdx/blob/tdx/x86/apic.c for your reference. With these modifications it can be confirmed that the test within a TD fails without the work in this series, and passes with it. This was tested against a host kernel running a snapshot of the ongoing KVM TDX work and corresponding QEMU changes (including a QEMU change that enables the new capability introduced in this series). Below are the core changes made to the existing APIC test. The two major changes are: (a) stop any lingering timers before the test starts, (b) use CPUID 0x15 in TDX to accurately determine the TSC and APIC frequencies instead of making 1GHz assumption and use similar check as the kselftest test introduced in this series (I did have to increase the amount with which the frequency is allowed to deviate by 1% in my testing). Please note that there are some more changes needed to run this test in TDX since all APIC tests are not appropriate for TDX. This snippet was used in my testing and I will work with kvm-unit-test folks on the next steps to have it integrated. @@ -477,11 +478,29 @@ static void lvtt_handler(isr_regs_t *regs) static void test_apic_timer_one_shot(void) { - uint64_t tsc1, tsc2; static const uint32_t interval = 0x10000; + uint64_t measured_apic_freq, tsc2, tsc1; + uint32_t tsc_freq = 0, apic_freq = 0; + struct cpuid cpuid_tsc = {}; #define APIC_LVT_TIMER_VECTOR (0xee) + /* + * CPUID 0x15 is not available in VMX, can use it to obtain + * TSC and APIC frequency for accurate testing + */ + if (is_tdx_guest()) { + cpuid_tsc = raw_cpuid(0x15, 0); + tsc_freq = cpuid_tsc.c * cpuid_tsc.b / cpuid_tsc.a; + apic_freq = cpuid_tsc.c; + } + /* + stop already fired local timer + the test case can be negative failure if the timer fired + after installed lvtt_handler but *before* + write to TIMICT again. + */ + apic_write(APIC_TMICT, 0); handle_irq(APIC_LVT_TIMER_VECTOR, lvtt_handler); /* One shot mode */ @@ -503,8 +522,16 @@ static void test_apic_timer_one_shot(void) * cases, the following should satisfy on all modern * processors. */ - report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval), - "APIC LVT timer one shot"); + if (is_tdx_guest()) { + measured_apic_freq = interval * (tsc_freq / (tsc2 - tsc1)); + report((lvtt_counter == 1) && + (measured_apic_freq < apic_freq * 102 / 100) && + (measured_apic_freq > apic_freq * 98 / 100), + "APIC LVT timer one shot"); + } else { + report((lvtt_counter == 1) && (tsc2 - tsc1 >= interval), + "APIC LVT timer one shot"); + } } Reinette
Hi Sean, On 4/24/2024 10:23 AM, Sean Christopherson wrote: > On Wed, Apr 24, 2024, Rick P Edgecombe wrote: >> On Wed, 2024-04-24 at 09:13 -0700, Sean Christopherson wrote: >>> On Tue, Apr 16, 2024, Rick P Edgecombe wrote: >>>> On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote: >>>>> >>>>> Summary >>>>> ------- >>>>> Add KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC >>>>> bus clock frequency for APIC timer emulation. >>>>> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the >>>>> frequency in nanoseconds. When using this capability, the user space >>>>> VMM should configure CPUID leaf 0x15 to advertise the frequency. >>>> >>>> Looks good to me and... >>>> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> >>>> >>>> The only thing missing is actually integrating it into TDX qemu patches and >>>> testing the resulting TD. I think we are making a fair assumption that the >>>> problem should be resolved based on the analysis, but we have not actually >>>> tested that part. Is that right? >>> >>> Please tell me that Rick is wrong, and that this actually has been tested with >>> a TDX guest. I don't care _who_ tested it, or with what VMM it has been >>> tested, but _someone_ needs to verify that this actually fixes the TDX issue. >> >> It is in the process of getting a TDX test developed (or rather updated). >> Agreed, it requires verification that it fixes the original TDX issue. That is >> why I raised it. >> >> Reinette was working on this internally and some iterations were happening, but >> we are trying to work on the public list as much as possible per your earlier >> comments. So that is why she posted it. > > I have no problem posting "early", but Documentation/process/maintainer-kvm-x86.rst > clearly states under Testing that: > > If you can't fully test a change, e.g. due to lack of hardware, clearly state > what level of testing you were able to do, e.g. in the cover letter. > > I was assuming that this was actually *fully* tested, because nothing suggests > otherwise. And _that_ is a problem, e.g. I was planning on applying this series > for 6.10, which would have made for quite the mess if it turns out that this > doesn't actually solve the TDX problem. There was one vote for the capability name to rather be KVM_CAP_X86_APIC_BUS_CYCLES_NS [1] I'd be happy to resubmit with the name changed but after reading your statement above it is not clear to me what name is preferred: KVM_CAP_X86_APIC_BUS_FREQUENCY as used in this series that seem to meet your approval or KVM_CAP_X86_APIC_BUS_CYCLES_NS. Please let me know what you prefer. Thank you. Reinette [1] https://lore.kernel.org/lkml/1e26b405-f382-45f4-9dd5-3ea5db68302a@intel.com/
On Wed, Apr 24, 2024, Reinette Chatre wrote: > There was one vote for the capability name to rather be KVM_CAP_X86_APIC_BUS_CYCLES_NS [1] > > I'd be happy to resubmit with the name changed but after reading your > statement above it is not clear to me what name is preferred: > KVM_CAP_X86_APIC_BUS_FREQUENCY as used in this series that seem to meet your > approval or KVM_CAP_X86_APIC_BUS_CYCLES_NS. > > Please let me know what you prefer. Both work for me, I don't have a strong preference.
Hi Sean, On 4/25/2024 9:17 AM, Sean Christopherson wrote: > On Wed, Apr 24, 2024, Reinette Chatre wrote: >> There was one vote for the capability name to rather be KVM_CAP_X86_APIC_BUS_CYCLES_NS [1] >> >> I'd be happy to resubmit with the name changed but after reading your >> statement above it is not clear to me what name is preferred: >> KVM_CAP_X86_APIC_BUS_FREQUENCY as used in this series that seem to meet your >> approval or KVM_CAP_X86_APIC_BUS_CYCLES_NS. >> >> Please let me know what you prefer. > > Both work for me, I don't have a strong preference. Thank you. I'll resubmit with the capability name changed to KVM_CAP_X86_APIC_BUS_CYCLES_NS. This is the only planned change. Reinette