Message ID | 20210316161844.1658-2-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/msr: Fixes for XSA-351 | expand |
On 16.03.2021 17:18, Andrew Cooper wrote: > In hindsight, this was a poor move. Some of these MSRs require probing for, > causing unhelpful spew into xl dmesg, as well as spew from unit tests > explicitly checking behaviour. I can indeed see your point for MSRs that require probing. But what about the others (which, as it seems, is the majority)? And perhaps specifically what about the entire WRMSR side, which won't be related to probing? I'm not opposed to the change, but I'd like to understand the reasoning for every one of the MSRs, not just a subset. Of course such ever-growing lists of case labels aren't very nice - this going away was one of the things I particularly liked about the original change. Jan > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -175,6 +175,30 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > > switch ( msr ) > { > + /* Write-only */ > + case MSR_AMD_PATCHLOADER: > + case MSR_IA32_UCODE_WRITE: > + case MSR_PRED_CMD: > + case MSR_FLUSH_CMD: > + > + /* Not offered to guests. */ > + case MSR_TEST_CTRL: > + case MSR_CORE_CAPABILITIES: > + case MSR_TSX_FORCE_ABORT: > + case MSR_TSX_CTRL: > + case MSR_MCU_OPT_CTRL: > + case MSR_RTIT_OUTPUT_BASE ... MSR_RTIT_ADDR_B(7): > + case MSR_U_CET: > + case MSR_S_CET: > + case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE: > + case MSR_AMD64_LWP_CFG: > + case MSR_AMD64_LWP_CBADDR: > + case MSR_PPIN_CTL: > + case MSR_PPIN: > + case MSR_AMD_PPIN_CTL: > + case MSR_AMD_PPIN: > + goto gp_fault; > + > case MSR_IA32_FEATURE_CONTROL: > /* > * Architecturally, availability of this MSR is enumerated by the > @@ -382,6 +406,30 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) > { > uint64_t rsvd; > > + /* Read-only */ > + case MSR_IA32_PLATFORM_ID: > + case MSR_CORE_CAPABILITIES: > + case MSR_INTEL_CORE_THREAD_COUNT: > + case MSR_INTEL_PLATFORM_INFO: > + case MSR_ARCH_CAPABILITIES: > + > + /* Not offered to guests. */ > + case MSR_TEST_CTRL: > + case MSR_TSX_FORCE_ABORT: > + case MSR_TSX_CTRL: > + case MSR_MCU_OPT_CTRL: > + case MSR_RTIT_OUTPUT_BASE ... MSR_RTIT_ADDR_B(7): > + case MSR_U_CET: > + case MSR_S_CET: > + case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE: > + case MSR_AMD64_LWP_CFG: > + case MSR_AMD64_LWP_CBADDR: > + case MSR_PPIN_CTL: > + case MSR_PPIN: > + case MSR_AMD_PPIN_CTL: > + case MSR_AMD_PPIN: > + goto gp_fault; > + > case MSR_AMD_PATCHLEVEL: > BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL); > /* >
On Tue, Mar 16, 2021 at 04:18:42PM +0000, Andrew Cooper wrote: > In hindsight, this was a poor move. Some of these MSRs require probing for, > causing unhelpful spew into xl dmesg, as well as spew from unit tests > explicitly checking behaviour. > > This restores behaviour close to that of Xen 4.14. I think it might be worth adding that guest access to those MSRs will now always trigger a #GP, even when msr_relaxed is used. This is however fine, as that's not a regression when compared to older Xen versions, where access to the MSRs also trigger a #GP unconditionally. I assume the wrmsr side is added so that when using msr_relaxed Xen also injects a #GP for writes to those MSRs, as it would do for reads? Thanks, Roger.
I have read this thread and with my release manager hat on I feel confused and/or ignorant. Patch 3/ has a good explanation of what the problem is it is addressing and why this is important for 4.15. But then there is Jan's most recent reply starting "I find all of this confusing". Jan, can you please tell me in words of one syllable what the implication of that message is ? In particular is any of what you say a reason for me to withhold my release-ack ? AFAICT there is no explanation for why patches 1/ and 2/ deserve to go into 4.15. We are late in the freeze now, so I would ideally be looking for a clear and compelling argument. I'd also like to understand what the risks are of taking these. Can someone please enlighten me ? Thanks, Ian.
On 17/03/2021 13:37, Ian Jackson wrote: > I have read this thread and with my release manager hat on I feel > confused and/or ignorant. > > Patch 3/ has a good explanation of what the problem is it is > addressing and why this is important for 4.15. But then there is > Jan's most recent reply starting "I find all of this confusing". Jan, > can you please tell me in words of one syllable what the implication > of that message is ? In particular is any of what you say a reason > for me to withhold my release-ack ? > > AFAICT there is no explanation for why patches 1/ and 2/ deserve to go > into 4.15. We are late in the freeze now, so I would ideally be > looking for a clear and compelling argument. I'd also like to > understand what the risks are of taking these. Can someone please > enlighten me ? To make the code in 4.15 match 4.14, so patch 3 can be written in the first place. Also, as a side benefit, patches 1 and 2 reduce the quantity of logspew from the impacted MSRs. We cannot simply take patch 3 as-is, and say "4.14 and earlier" for backport, because that still forces end users to specify msr_relaxed to unbreak their Solaris guests, which is usability regression vs 4.14 ~Andrew
Andrew Cooper writes ("Re: [PATCH 1/3] Revert "x86/msr: drop compatibility #GP handling in guest_{rd,wr}msr()""): > On 17/03/2021 13:37, Ian Jackson wrote: > > AFAICT there is no explanation for why patches 1/ and 2/ deserve to go > > into 4.15. I see now, rereading the thread, that there was a sentence about this in each patch betwen the commit message and the diff. Sorry for missing that. (Although TBH at least one of those messages could usefully have gone into the commit message, as useful motivational background.) > > We are late in the freeze now, so I would ideally be > > looking for a clear and compelling argument. I'd also like to > > understand what the risks are of taking these. Can someone please > > enlighten me ? > > To make the code in 4.15 match 4.14, so patch 3 can be written in the > first place. > > Also, as a side benefit, patches 1 and 2 reduce the quantity of logspew > from the impacted MSRs. > > We cannot simply take patch 3 as-is, and say "4.14 and earlier" for > backport, because that still forces end users to specify msr_relaxed to > unbreak their Solaris guests, which is usability regression vs 4.14 This is plausible and going in the right direction but I still feel uncertain. Jan, what is your summary opinion about patch 3 ? Roger, can I get your opinion about the possible downside risks of this patch ? Thanks, Ian.
On Wed, Mar 17, 2021 at 02:46:20PM +0000, Ian Jackson wrote: > Andrew Cooper writes ("Re: [PATCH 1/3] Revert "x86/msr: drop compatibility #GP handling in guest_{rd,wr}msr()""): > > On 17/03/2021 13:37, Ian Jackson wrote: > > > AFAICT there is no explanation for why patches 1/ and 2/ deserve to go > > > into 4.15. > > I see now, rereading the thread, that there was a sentence about this > in each patch betwen the commit message and the diff. Sorry for > missing that. (Although TBH at least one of those messages could > usefully have gone into the commit message, as useful motivational > background.) > > > > We are late in the freeze now, so I would ideally be > > > looking for a clear and compelling argument. I'd also like to > > > understand what the risks are of taking these. Can someone please > > > enlighten me ? > > > > To make the code in 4.15 match 4.14, so patch 3 can be written in the > > first place. > > > > Also, as a side benefit, patches 1 and 2 reduce the quantity of logspew > > from the impacted MSRs. > > > > We cannot simply take patch 3 as-is, and say "4.14 and earlier" for > > backport, because that still forces end users to specify msr_relaxed to > > unbreak their Solaris guests, which is usability regression vs 4.14 > > This is plausible and going in the right direction but I still feel > uncertain. > > Jan, what is your summary opinion about patch 3 ? > > Roger, can I get your opinion about the possible downside risks of > this patch ? For patches 1 and 2 the risk is low I think. This is already the same handling that we do in pre-4.15, so it's unlikely to cause issues. From a guests PoV they don't change the result of trying to access any of the modified MSRs, accessing them will still result in a #GP being injected to the guest. The main risk for patch 3 would be that reporting 0 for MSR_RAPL_POWER_UNIT would result in some kind of issue on certain guests, or that it triggers the poking of other MSRs in the expectation that they would be available. I think those are quite unlikely, and the patch fixes a real issue with Solaris guests. Roger.
Roger Pau Monné writes ("Re: [PATCH 1/3] Revert "x86/msr: drop compatibility #GP handling in guest_{rd,wr}msr()""): > On Wed, Mar 17, 2021 at 02:46:20PM +0000, Ian Jackson wrote: > > Roger, can I get your opinion about the possible downside risks of > > this patch ? > > For patches 1 and 2 the risk is low I think. This is already the same > handling that we do in pre-4.15, so it's unlikely to cause issues. > >From a guests PoV they don't change the result of trying to access any > of the modified MSRs, accessing them will still result in a #GP being > injected to the guest. > > The main risk for patch 3 would be that reporting 0 for > MSR_RAPL_POWER_UNIT would result in some kind of issue on certain > guests, or that it triggers the poking of other MSRs in the > expectation that they would be available. I think those are quite > unlikely, and the patch fixes a real issue with Solaris guests. Thanks. That is very helpful. All three patches Release-Acked-by: Ian Jackson <iwj@xenproject.org> but subject to Jan's questions on patch 3 being resolved somehow. Ian.
On 17.03.2021 14:37, Ian Jackson wrote: > I have read this thread and with my release manager hat on I feel > confused and/or ignorant. > > Patch 3/ has a good explanation of what the problem is it is > addressing and why this is important for 4.15. But then there is > Jan's most recent reply starting "I find all of this confusing". Jan, > can you please tell me in words of one syllable what the implication > of that message is ? In particular is any of what you say a reason > for me to withhold my release-ack ? Answering the last question first - I don't think so. Something may indeed want doing here beyond what we already have, and it may well be precisely what Andrew is proposing, possibly just with extended descriptions and/or comments. My confusion about patch 3 is that it (a) claims behavior in turbostat that I can't locate and (b) implies (describes) behavior of code that I find entirely unexpected (as in: not making sense to me). And I'm sorry: Not all of the words are of one syllable. Jan
On 17.03.2021 15:46, Ian Jackson wrote:
> Jan, what is your summary opinion about patch 3 ?
Just to answer this question explicitly: I can't form one yet
without further information provided to me.
Jan
On 16/03/2021 16:58, Jan Beulich wrote: > On 16.03.2021 17:18, Andrew Cooper wrote: >> In hindsight, this was a poor move. Some of these MSRs require probing for, >> causing unhelpful spew into xl dmesg, as well as spew from unit tests >> explicitly checking behaviour. > I can indeed see your point for MSRs that require probing. But what about > the others (which, as it seems, is the majority)? And perhaps specifically > what about the entire WRMSR side, which won't be related to probing? I'm > not opposed to the change, but I'd like to understand the reasoning for > every one of the MSRs, not just a subset. > > Of course such ever-growing lists of case labels aren't very nice - this > going away was one of the things I particularly liked about the original > change. The logging in the default case is only useful when it is genuinely MSRs we haven't considered. It is very useful at pointing bugs in guests, or bugs in Xen, but only when the logging is not drowned out by things we know about. ~Andrew
On 18/03/2021 09:35, Jan Beulich wrote: > On 17.03.2021 14:37, Ian Jackson wrote: >> I have read this thread and with my release manager hat on I feel >> confused and/or ignorant. >> >> Patch 3/ has a good explanation of what the problem is it is >> addressing and why this is important for 4.15. But then there is >> Jan's most recent reply starting "I find all of this confusing". Jan, >> can you please tell me in words of one syllable what the implication >> of that message is ? In particular is any of what you say a reason >> for me to withhold my release-ack ? > Answering the last question first - I don't think so. Something may > indeed want doing here beyond what we already have, and it may well > be precisely what Andrew is proposing, possibly just with extended > descriptions and/or comments. My confusion about patch 3 is that it > (a) claims behavior in turbostat that I can't locate and (b) implies > (describes) behavior of code that I find entirely unexpected (as in: > not making sense to me). Turbostat was discussed on the LKML thread where KVM fixed the same bug, IIRC, but I'm struggling to find the reference. ~Andrew
On 19.03.2021 13:59, Andrew Cooper wrote: > On 16/03/2021 16:58, Jan Beulich wrote: >> On 16.03.2021 17:18, Andrew Cooper wrote: >>> In hindsight, this was a poor move. Some of these MSRs require probing for, >>> causing unhelpful spew into xl dmesg, as well as spew from unit tests >>> explicitly checking behaviour. >> I can indeed see your point for MSRs that require probing. But what about >> the others (which, as it seems, is the majority)? And perhaps specifically >> what about the entire WRMSR side, which won't be related to probing? I'm >> not opposed to the change, but I'd like to understand the reasoning for >> every one of the MSRs, not just a subset. >> >> Of course such ever-growing lists of case labels aren't very nice - this >> going away was one of the things I particularly liked about the original >> change. > > The logging in the default case is only useful when it is genuinely MSRs > we haven't considered. > > It is very useful at pointing bugs in guests, or bugs in Xen, but only > when the logging is not drowned out by things we know about. So would you mind adjusting the description accordingly? Right now, the way it's written, it reads (to my non-native interpretation) as entirely focusing on guests' probing needs. Even an adjustment as simple as "In hindsight, this was a poor move. Some of these MSRs require probing for, cause unhelpful spew into xl dmesg, or cause spew from unit tests explicitly checking behaviour." would already shift the focus imo. Jan
On 19/03/2021 13:56, Jan Beulich wrote: > On 19.03.2021 13:59, Andrew Cooper wrote: >> On 16/03/2021 16:58, Jan Beulich wrote: >>> On 16.03.2021 17:18, Andrew Cooper wrote: >>>> In hindsight, this was a poor move. Some of these MSRs require probing for, >>>> causing unhelpful spew into xl dmesg, as well as spew from unit tests >>>> explicitly checking behaviour. >>> I can indeed see your point for MSRs that require probing. But what about >>> the others (which, as it seems, is the majority)? And perhaps specifically >>> what about the entire WRMSR side, which won't be related to probing? I'm >>> not opposed to the change, but I'd like to understand the reasoning for >>> every one of the MSRs, not just a subset. >>> >>> Of course such ever-growing lists of case labels aren't very nice - this >>> going away was one of the things I particularly liked about the original >>> change. >> The logging in the default case is only useful when it is genuinely MSRs >> we haven't considered. >> >> It is very useful at pointing bugs in guests, or bugs in Xen, but only >> when the logging is not drowned out by things we know about. > So would you mind adjusting the description accordingly? Right now, the > way it's written, it reads (to my non-native interpretation) as entirely > focusing on guests' probing needs. Even an adjustment as simple as > > "In hindsight, this was a poor move. Some of these MSRs require probing for, > cause unhelpful spew into xl dmesg, or cause spew from unit tests > explicitly checking behaviour." > > would already shift the focus imo. Sure. I'll try to make this clearer. ~Andrew
Andrew Cooper writes ("[PATCH for-4.15 0/3] x86/msr: Fixes for XSA-351"): > This is slightly complicated. Patches 1 and 2 rearrange the code to look and > behave more like 4.14, and patch 3 fixes a Solaris (and turbostat) bug in a > manner which can be backported to all security trees. As far as I can tell this series needs a respin ? I have been through the thread and AFAICT the only comments were on the commit message for patch 2. Patchex 1 and 3 already have a release-ack. Patch 2 does not have any mind of maintainer review. I would like this series to go in today. Jan, since Andrew doesn't seem to have been able to do that respin yet, would you be able to rewrite the commit message of message 2 taking into account the two comments from you an from Roger ? I think that is all that's needed for these three to go into tree. Ian.
Ian Jackson writes ("Re: [PATCH for-4.15 0/3] x86/msr: Fixes for XSA-351 [and 1 more messages]"): > Andrew Cooper writes ("[PATCH for-4.15 0/3] x86/msr: Fixes for XSA-351"): > > This is slightly complicated. Patches 1 and 2 rearrange the code to look and > > behave more like 4.14, and patch 3 fixes a Solaris (and turbostat) bug in a > > manner which can be backported to all security trees. > > As far as I can tell this series needs a respin ? > > I have been through the thread and AFAICT the only comments were on > the commit message for patch 2. Patchex 1 and 3 already have a > release-ack. Patch 2 does not have any mind of maintainer review. Err, this is wrong. It is [PATCH 1/3] Revert "x86/msr: drop compatibility #GP handling in guest_{rd, wr}msr()" which has comments on the commit message from Jan: So would you mind adjusting the description accordingly? [...] "In hindsight, this was a poor move [..,] and Roger: I think it might be worth adding that guest access to those MSRs will now always trigger a #GP [...] I could easily fold in Jan's comments but it would be better for someone more familiar with the code do handle Roger's since Roger doesn't provide a precise wording. These two [PATCH 2/3] x86/msr: Forward port XSA-351 changes from 4.14 [PATCH 3/3] x86/msr: Fix Solaris and turbostat following XSA-351 have reviews from Roger. > I would like this series to go in today. > > Jan, since Andrew doesn't seem to have been able to do that respin > yet, would you be able to rewrite the commit message of message 2 > taking into account the two comments from you an from Roger ? > > I think that is all that's needed for these three to go into tree. Ian.
On Fri, Mar 26, 2021 at 02:30:29PM +0000, Ian Jackson wrote: > These two > > [PATCH 2/3] x86/msr: Forward port XSA-351 changes from 4.14 > [PATCH 3/3] x86/msr: Fix Solaris and turbostat following XSA-351 > > have reviews from Roger. From what I read on IRC patch 3 could cause issues with some Linux versions? I'm afraid it's not clear to me from the IRC conversation what's the issue exactly, so for the sake of not going over this again in the future, can we please have the issue with patch 3 mentioned in a reply to the patch itself? Or else it seems weird that a patch with a RB and a RAB didn't go in for no apparent reason. Thanks, Roger.
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 0ebcb04259..c3a988bd11 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -175,6 +175,30 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) switch ( msr ) { + /* Write-only */ + case MSR_AMD_PATCHLOADER: + case MSR_IA32_UCODE_WRITE: + case MSR_PRED_CMD: + case MSR_FLUSH_CMD: + + /* Not offered to guests. */ + case MSR_TEST_CTRL: + case MSR_CORE_CAPABILITIES: + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: + case MSR_MCU_OPT_CTRL: + case MSR_RTIT_OUTPUT_BASE ... MSR_RTIT_ADDR_B(7): + case MSR_U_CET: + case MSR_S_CET: + case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE: + case MSR_AMD64_LWP_CFG: + case MSR_AMD64_LWP_CBADDR: + case MSR_PPIN_CTL: + case MSR_PPIN: + case MSR_AMD_PPIN_CTL: + case MSR_AMD_PPIN: + goto gp_fault; + case MSR_IA32_FEATURE_CONTROL: /* * Architecturally, availability of this MSR is enumerated by the @@ -382,6 +406,30 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val) { uint64_t rsvd; + /* Read-only */ + case MSR_IA32_PLATFORM_ID: + case MSR_CORE_CAPABILITIES: + case MSR_INTEL_CORE_THREAD_COUNT: + case MSR_INTEL_PLATFORM_INFO: + case MSR_ARCH_CAPABILITIES: + + /* Not offered to guests. */ + case MSR_TEST_CTRL: + case MSR_TSX_FORCE_ABORT: + case MSR_TSX_CTRL: + case MSR_MCU_OPT_CTRL: + case MSR_RTIT_OUTPUT_BASE ... MSR_RTIT_ADDR_B(7): + case MSR_U_CET: + case MSR_S_CET: + case MSR_PL0_SSP ... MSR_INTERRUPT_SSP_TABLE: + case MSR_AMD64_LWP_CFG: + case MSR_AMD64_LWP_CBADDR: + case MSR_PPIN_CTL: + case MSR_PPIN: + case MSR_AMD_PPIN_CTL: + case MSR_AMD_PPIN: + goto gp_fault; + case MSR_AMD_PATCHLEVEL: BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL); /*
In hindsight, this was a poor move. Some of these MSRs require probing for, causing unhelpful spew into xl dmesg, as well as spew from unit tests explicitly checking behaviour. This restores behaviour close to that of Xen 4.14. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: Ian Jackson <iwj@xenproject.org> For 4.15. Restoring behaviour closer to 4.14, and prereq for a bugfix needing backporting. --- xen/arch/x86/msr.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+)