Message ID | 162826611747.32391.16149996928851353357.stgit@bmoger-ubuntu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Couple of SVM fixes | expand |
On 8/6/21 11:53 AM, Sean Christopherson wrote: > On Fri, Aug 06, 2021, Babu Moger wrote: >> From: Babu Moger <Babu.Moger@amd.com> >> >> The test ./x86/access fails with a timeout. This is due to the number test >> combination. The test cases increase exponentially as the features get >> enabled. The new machine adds the feature AC_CPU_CR4_PKE. The default >> timeout is 180 seconds. Seen this problem both on AMD and Intel machines. >> >> #./tests/access >> qemu-system-x86_64: terminating on signal 15 from pid 20050 (timeout) >> FAIL access (timeout; duration=180) >> >> This test can take about 7 minutes without timeout. >> time ./tests/access >> 58982405 tests, 0 failures >> PASS access >> >> real 7m10.063s >> user 7m9.063s >> sys 0m0.309s >> >> Fix the problem by adding few more limit checks. > > Please state somewhere in the changelog what is actually being changed, and the > actual effect of the change. E.g. > > Disallow protection keys testcase in combination with reserved bit > testcasess to further limit the number of tests run to avoid timeouts on > systems with support for protection keys. > > Disallowing this combination reduces the total number of tests from > 58982405 to <???>, and the runtime from ~7 minutes to <???> Sure. Will do. > >> Signed-off-by: Babu Moger <Babu.Moger@amd.com> >> --- >> x86/access.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/x86/access.c b/x86/access.c >> index 47807cc..e371dd5 100644 >> --- a/x86/access.c >> +++ b/x86/access.c >> @@ -317,9 +317,9 @@ static _Bool ac_test_legal(ac_test_t *at) >> /* >> * Shorten the test by avoiding testing too many reserved bit combinations >> */ >> - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) >> + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_CPU_CR4_PKE)) > 1) >> return false; >> - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) >> + if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36) + F(AC_CPU_CR4_PKE)) > 1) > > Why are protection keys the sacrifical lamb? Simply because they're the newest? Yes. I added it because it was new ):. > > And before we start killing off arbitrary combinations, what about sharding the > test so that testcases that depend on a specific CR0/CR4/EFER bit, i.e. trigger > a VM-Exit when the configuration changes, are separate runs? Being able to run > a specific combination would also hopefully make it easier to debug issues as > the user could specify which combo to run without having to modify the code and > recompile. > > That probably won't actually reduce the total run time, but it would make each > run a separate test and give developers a warm fuzzy feeling that they're indeed > making progress :-) > > Not sure how this could be automagically expressed this in unittest.cfg though... Let me investigate if we can do that fairly easy. Will let you know. Thanks Babu > >> return false; >> >> return true; >>
On 8/9/21 2:43 PM, Babu Moger wrote: > > > On 8/6/21 11:53 AM, Sean Christopherson wrote: >> On Fri, Aug 06, 2021, Babu Moger wrote: >>> From: Babu Moger <Babu.Moger@amd.com> >>> >>> The test ./x86/access fails with a timeout. This is due to the number test >>> combination. The test cases increase exponentially as the features get >>> enabled. The new machine adds the feature AC_CPU_CR4_PKE. The default >>> timeout is 180 seconds. Seen this problem both on AMD and Intel machines. >>> >>> #./tests/access >>> qemu-system-x86_64: terminating on signal 15 from pid 20050 (timeout) >>> FAIL access (timeout; duration=180) >>> >>> This test can take about 7 minutes without timeout. >>> time ./tests/access >>> 58982405 tests, 0 failures >>> PASS access >>> >>> real 7m10.063s >>> user 7m9.063s >>> sys 0m0.309s >>> >>> Fix the problem by adding few more limit checks. >> >> Please state somewhere in the changelog what is actually being changed, and the >> actual effect of the change. E.g. >> >> Disallow protection keys testcase in combination with reserved bit >> testcasess to further limit the number of tests run to avoid timeouts on >> systems with support for protection keys. >> >> Disallowing this combination reduces the total number of tests from >> 58982405 to <???>, and the runtime from ~7 minutes to <???> > > Sure. Will do. >> >>> Signed-off-by: Babu Moger <Babu.Moger@amd.com> >>> --- >>> x86/access.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/x86/access.c b/x86/access.c >>> index 47807cc..e371dd5 100644 >>> --- a/x86/access.c >>> +++ b/x86/access.c >>> @@ -317,9 +317,9 @@ static _Bool ac_test_legal(ac_test_t *at) >>> /* >>> * Shorten the test by avoiding testing too many reserved bit combinations >>> */ >>> - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) >>> + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_CPU_CR4_PKE)) > 1) >>> return false; >>> - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) >>> + if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36) + F(AC_CPU_CR4_PKE)) > 1) >> >> Why are protection keys the sacrifical lamb? Simply because they're the newest? > > Yes. I added it because it was new ):. >> >> And before we start killing off arbitrary combinations, what about sharding the >> test so that testcases that depend on a specific CR0/CR4/EFER bit, i.e. trigger >> a VM-Exit when the configuration changes, are separate runs? Being able to run >> a specific combination would also hopefully make it easier to debug issues as >> the user could specify which combo to run without having to modify the code and >> recompile. >> >> That probably won't actually reduce the total run time, but it would make each >> run a separate test and give developers a warm fuzzy feeling that they're indeed >> making progress :-) >> >> Not sure how this could be automagically expressed this in unittest.cfg though... As we know now that we cannot run a huge number of tests without running into timeout, I was thinking of adding a extra parameter "max_runs" for these tests and add a check in ac_test_run to limit the number of runs. The max_runs will be set to default 10000000. But it can be changed in unittests.cfg. Something like this. [access] file = access.flat arch = x86_64 -extra_params = -cpu max +extra_params = -cpu max -append 10000000 timeout = 180 Thoughts? > > Let me investigate if we can do that fairly easy. Will let you know. > Thanks > Babu >> >>> return false; >>> >>> return true; >>>
On 8/10/21 11:59 AM, Babu Moger wrote: > > > On 8/9/21 2:43 PM, Babu Moger wrote: >> >> >> On 8/6/21 11:53 AM, Sean Christopherson wrote: >>> On Fri, Aug 06, 2021, Babu Moger wrote: >>>> From: Babu Moger <Babu.Moger@amd.com> >>>> >>>> The test ./x86/access fails with a timeout. This is due to the number test >>>> combination. The test cases increase exponentially as the features get >>>> enabled. The new machine adds the feature AC_CPU_CR4_PKE. The default >>>> timeout is 180 seconds. Seen this problem both on AMD and Intel machines. >>>> >>>> #./tests/access >>>> qemu-system-x86_64: terminating on signal 15 from pid 20050 (timeout) >>>> FAIL access (timeout; duration=180) >>>> >>>> This test can take about 7 minutes without timeout. >>>> time ./tests/access >>>> 58982405 tests, 0 failures >>>> PASS access >>>> >>>> real 7m10.063s >>>> user 7m9.063s >>>> sys 0m0.309s >>>> >>>> Fix the problem by adding few more limit checks. >>> >>> Please state somewhere in the changelog what is actually being changed, and the >>> actual effect of the change. E.g. >>> >>> Disallow protection keys testcase in combination with reserved bit >>> testcasess to further limit the number of tests run to avoid timeouts on >>> systems with support for protection keys. >>> >>> Disallowing this combination reduces the total number of tests from >>> 58982405 to <???>, and the runtime from ~7 minutes to <???> >> >> Sure. Will do. >>> >>>> Signed-off-by: Babu Moger <Babu.Moger@amd.com> >>>> --- >>>> x86/access.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/x86/access.c b/x86/access.c >>>> index 47807cc..e371dd5 100644 >>>> --- a/x86/access.c >>>> +++ b/x86/access.c >>>> @@ -317,9 +317,9 @@ static _Bool ac_test_legal(ac_test_t *at) >>>> /* >>>> * Shorten the test by avoiding testing too many reserved bit combinations >>>> */ >>>> - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) >>>> + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_CPU_CR4_PKE)) > 1) >>>> return false; >>>> - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) >>>> + if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36) + F(AC_CPU_CR4_PKE)) > 1) >>> >>> Why are protection keys the sacrifical lamb? Simply because they're the newest? >> >> Yes. I added it because it was new ):. >>> >>> And before we start killing off arbitrary combinations, what about sharding the >>> test so that testcases that depend on a specific CR0/CR4/EFER bit, i.e. trigger >>> a VM-Exit when the configuration changes, are separate runs? Being able to run >>> a specific combination would also hopefully make it easier to debug issues as >>> the user could specify which combo to run without having to modify the code and >>> recompile. >>> >>> That probably won't actually reduce the total run time, but it would make each >>> run a separate test and give developers a warm fuzzy feeling that they're indeed >>> making progress :-) >>> >>> Not sure how this could be automagically expressed this in unittest.cfg though... > > As we know now that we cannot run a huge number of tests without running > into timeout, I was thinking of adding a extra parameter "max_runs" for > these tests and add a check in ac_test_run to limit the number of runs. > The max_runs will be set to default 10000000. But it can be changed in > unittests.cfg. Something like this. > > [access] > file = access.flat > arch = x86_64 > -extra_params = -cpu max > +extra_params = -cpu max -append 10000000 No. This will not work. The PKU feature flag is bit 30. That is 2^30 iterations to cover the tests for this feature. Looks like I need to split the tests into PKU and non PKU tests. For PKU tests I may need to change the bump frequency (in ac_test_bump_one) to much higher value. Right now, it is 1. Let me try that, > timeout = 180 > > Thoughts? > >> >> Let me investigate if we can do that fairly easy. Will let you know. >> Thanks >> Babu >>> >>>> return false; >>>> >>>> return true; >>>>
On 11/08/21 01:38, Babu Moger wrote: > No. This will not work. The PKU feature flag is bit 30. That is 2^30 > iterations to cover the tests for this feature. Looks like I need to split > the tests into PKU and non PKU tests. For PKU tests I may need to change > the bump frequency (in ac_test_bump_one) to much higher value. Right now, > it is 1. Let me try that, The simplest way to cut on tests, which is actually similar to this patch, would be: - do not try all combinations of PTE access bits when reserved bits are set - do not try combinations with more than one reserved bit set Paolo
On 8/11/21 2:09 AM, Paolo Bonzini wrote: > On 11/08/21 01:38, Babu Moger wrote: >> No. This will not work. The PKU feature flag is bit 30. That is 2^30 >> iterations to cover the tests for this feature. Looks like I need to split >> the tests into PKU and non PKU tests. For PKU tests I may need to change >> the bump frequency (in ac_test_bump_one) to much higher value. Right now, >> it is 1. Let me try that, > > The simplest way to cut on tests, which is actually similar to this patch, > would be: > > - do not try all combinations of PTE access bits when reserved bits are set > > - do not try combinations with more than one reserved bit set Did you mean this? Just doing this reduces the combination by huge number. I don't need to add your first PTE access combinations. diff --git a/x86/access.c b/x86/access.c index 47807cc..a730b6b 100644 --- a/x86/access.c +++ b/x86/access.c @@ -317,9 +317,7 @@ static _Bool ac_test_legal(ac_test_t *at) /* * Shorten the test by avoiding testing too many reserved bit combinations */ - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) - return false; - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) return false; return true;
On Wed, Aug 11, 2021, Babu Moger wrote: > > On 8/11/21 2:09 AM, Paolo Bonzini wrote: > > On 11/08/21 01:38, Babu Moger wrote: > >> No. This will not work. The PKU feature flag is bit 30. That is 2^30 > >> iterations to cover the tests for this feature. Looks like I need to split > >> the tests into PKU and non PKU tests. For PKU tests I may need to change > >> the bump frequency (in ac_test_bump_one) to much higher value. Right now, > >> it is 1. Let me try that, > > > > The simplest way to cut on tests, which is actually similar to this patch, > > would be: > > > > - do not try all combinations of PTE access bits when reserved bits are set > > > > - do not try combinations with more than one reserved bit set > > Did you mean this? Just doing this reduces the combination by huge number. > I don't need to add your first PTE access combinations. > > diff --git a/x86/access.c b/x86/access.c > index 47807cc..a730b6b 100644 > --- a/x86/access.c > +++ b/x86/access.c > @@ -317,9 +317,7 @@ static _Bool ac_test_legal(ac_test_t *at) > /* > * Shorten the test by avoiding testing too many reserved bit > combinations > */ > - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) > - return false; > - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) > + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + > F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) > return false; > > return true; Looks good to me, is it sufficient to keep the overall runtime sane?. And maybe update the comment too, e.g. something like /* * Skip testing multiple reserved bits to shorten the test. Reserved * bit page faults are terminal and multiple reserved bits do not affect * the error code; the odds of a KVM bug are super low, and the odds of * actually being able to detect a bug are even lower. */
On 8/11/21 11:13 AM, Sean Christopherson wrote: > On Wed, Aug 11, 2021, Babu Moger wrote: >> >> On 8/11/21 2:09 AM, Paolo Bonzini wrote: >>> On 11/08/21 01:38, Babu Moger wrote: >>>> No. This will not work. The PKU feature flag is bit 30. That is 2^30 >>>> iterations to cover the tests for this feature. Looks like I need to split >>>> the tests into PKU and non PKU tests. For PKU tests I may need to change >>>> the bump frequency (in ac_test_bump_one) to much higher value. Right now, >>>> it is 1. Let me try that, >>> >>> The simplest way to cut on tests, which is actually similar to this patch, >>> would be: >>> >>> - do not try all combinations of PTE access bits when reserved bits are set >>> >>> - do not try combinations with more than one reserved bit set >> >> Did you mean this? Just doing this reduces the combination by huge number. >> I don't need to add your first PTE access combinations. >> >> diff --git a/x86/access.c b/x86/access.c >> index 47807cc..a730b6b 100644 >> --- a/x86/access.c >> +++ b/x86/access.c >> @@ -317,9 +317,7 @@ static _Bool ac_test_legal(ac_test_t *at) >> /* >> * Shorten the test by avoiding testing too many reserved bit >> combinations >> */ >> - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) >> - return false; >> - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) >> + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + >> F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) >> return false; >> >> return true; > > Looks good to me, is it sufficient to keep the overall runtime sane?. And maybe It keeps the running time about 2 minutes. > update the comment too, e.g. something like > > /* > * Skip testing multiple reserved bits to shorten the test. Reserved > * bit page faults are terminal and multiple reserved bits do not affect > * the error code; the odds of a KVM bug are super low, and the odds of > * actually being able to detect a bug are even lower. > */ > Sure. Will update the commit log. Thanks Babu
diff --git a/x86/access.c b/x86/access.c index 47807cc..e371dd5 100644 --- a/x86/access.c +++ b/x86/access.c @@ -317,9 +317,9 @@ static _Bool ac_test_legal(ac_test_t *at) /* * Shorten the test by avoiding testing too many reserved bit combinations */ - if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1) + if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_CPU_CR4_PKE)) > 1) return false; - if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1) + if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36) + F(AC_CPU_CR4_PKE)) > 1) return false; return true;