diff mbox series

[RFC,03/22] x86/msr: always allow a pinned Dom0 to read any unknown MSR

Message ID 4c04e5661688cf1de3e3fd668b0a78b23b6d7b2e.1698261255.git.edwin.torok@cloud.com (mailing list archive)
State New, archived
Headers show
Series vPMU bugfixes and support for PMUv5 | expand

Commit Message

Edwin Török Oct. 25, 2023, 7:29 p.m. UTC
From: Edwin Török <edvin.torok@citrix.com>

This can be useful if you realize you have to inspect the value of an
MSR in production, without having to change into a new Xen first that
handles the MSR.

E.g. SMI count didn't use to be explicitly allowed in the past
(it now is, see a previous commit), but there could be other MSRs that
are useful when tracking down issues.

Backport: 4.15+

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
---
 xen/arch/x86/hvm/svm/svm.c     | 3 +++
 xen/arch/x86/hvm/vmx/vmx.c     | 3 +++
 xen/arch/x86/pv/emul-priv-op.c | 3 +++
 3 files changed, 9 insertions(+)

Comments

Jan Beulich Oct. 30, 2023, 4:29 p.m. UTC | #1
On 25.10.2023 21:29, Edwin Török wrote:
> This can be useful if you realize you have to inspect the value of an
> MSR in production, without having to change into a new Xen first that
> handles the MSR.

Yet on a non-pinned Dom0 you'd still be lost. Since iirc we generally
advise against pinning, I wonder of how much use such a change would be,
when it effectively undoes what we deliberately did a while ago.

> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1933,6 +1933,9 @@ static int cf_check svm_msr_read_intercept(
>          break;
>  
>      default:
> +        if ( is_hwdom_pinned_vcpu(v) && !rdmsr_safe(msr, *msr_content) )
> +            break;
> +
>          if ( d->arch.msr_relaxed && !rdmsr_safe(msr, tmp) )
>          {
>              *msr_content = 0;

If we went as far as undoing some of what was done, I'd then wonder
whether instead we should mandate relaxed mode to be enabled on such a
Dom0. Then, instead of returning fake 0 here, the actual value could
be returned in the specific case of (pinned?) Dom0.

Jan
Edwin Török Oct. 31, 2023, 9:31 a.m. UTC | #2
> On 30 Oct 2023, at 16:29, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 25.10.2023 21:29, Edwin Török wrote:
>> This can be useful if you realize you have to inspect the value of an
>> MSR in production, without having to change into a new Xen first that
>> handles the MSR.
> 
> Yet on a non-pinned Dom0 you'd still be lost. Since iirc we generally
> advise against pinning,

You can temporarily pin while debugging the issue (e.g. pin just 1 CPU from Dom0, and "walk" all your physical CPUs with it if you have to,
so that you query them all), e.g. with 'xl vcpu-pin'.
Although that is more invasive than reading a value.
 
Or alternatively have another (privileged) interface to read the MSR for a given core without exposing it to any guests, that way you don't affect the running system at all
(which would be preferable in a production environment), i.e. a Xen equivalent of 'rdmsr'.

> I wonder of how much use such a change would be,
> when it effectively undoes what we deliberately did a while ago.
> 
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1933,6 +1933,9 @@ static int cf_check svm_msr_read_intercept(
>>         break;
>> 
>>     default:
>> +        if ( is_hwdom_pinned_vcpu(v) && !rdmsr_safe(msr, *msr_content) )
>> +            break;
>> +
>>         if ( d->arch.msr_relaxed && !rdmsr_safe(msr, tmp) )
>>         {
>>             *msr_content = 0;
> 
> If we went as far as undoing some of what was done, I'd then wonder
> whether instead we should mandate relaxed mode to be enabled on such a
> Dom0. Then, instead of returning fake 0 here, the actual value could
> be returned in the specific case of (pinned?) Dom0.


Can relaxed mode be enabled at runtime? I'd be happy with either solution, but it should be something that can be enabled at runtime
(if you have to reboot Xen then you may lose the bug repro that you want to gather more information on).
Although changing such a setting in a production environment may still be risky, because the guest will then become very confused that it has previously read some 0s, now there are some real values, and later when you flip the switch off it gets 0s again.

Best regards,
--Edwin
Jan Beulich Oct. 31, 2023, 10:12 a.m. UTC | #3
On 31.10.2023 10:31, Edwin Torok wrote:
>> On 30 Oct 2023, at 16:29, Jan Beulich <jbeulich@suse.com> wrote:
>> On 25.10.2023 21:29, Edwin Török wrote:
>>> This can be useful if you realize you have to inspect the value of an
>>> MSR in production, without having to change into a new Xen first that
>>> handles the MSR.
>>
>> Yet on a non-pinned Dom0 you'd still be lost. Since iirc we generally
>> advise against pinning,
> 
> You can temporarily pin while debugging the issue (e.g. pin just 1 CPU from Dom0, and "walk" all your physical CPUs with it if you have to,
> so that you query them all), e.g. with 'xl vcpu-pin'.
> Although that is more invasive than reading a value.
>  
> Or alternatively have another (privileged) interface to read the MSR for a given core without exposing it to any guests, that way you don't affect the running system at all
> (which would be preferable in a production environment), i.e. a Xen equivalent of 'rdmsr'.

The interface we have (XENPF_resource_op) is, despite being privileged,
deliberately (so far at least) not permitting access to arbitrary MSRs.

In our old XenoLinux forward port we had an extension to the msr.ko
module to allow pCPU-based MSR accesses (and I had a private extension
to the rdmsr/wrmsr user space tools making use of that), but even that
would have been subject to restrictions enforced by Xen as to which
MSRs are accessible.

>> I wonder of how much use such a change would be,
>> when it effectively undoes what we deliberately did a while ago.
>>
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -1933,6 +1933,9 @@ static int cf_check svm_msr_read_intercept(
>>>         break;
>>>
>>>     default:
>>> +        if ( is_hwdom_pinned_vcpu(v) && !rdmsr_safe(msr, *msr_content) )
>>> +            break;
>>> +
>>>         if ( d->arch.msr_relaxed && !rdmsr_safe(msr, tmp) )
>>>         {
>>>             *msr_content = 0;
>>
>> If we went as far as undoing some of what was done, I'd then wonder
>> whether instead we should mandate relaxed mode to be enabled on such a
>> Dom0. Then, instead of returning fake 0 here, the actual value could
>> be returned in the specific case of (pinned?) Dom0.
> 
> 
> Can relaxed mode be enabled at runtime?

Not right now, no. But a hypfs control could certainly be added, with
suitable justification.

> I'd be happy with either solution, but it should be something that can be enabled at runtime
> (if you have to reboot Xen then you may lose the bug repro that you want to gather more information on).
> Although changing such a setting in a production environment may still be risky, because the guest will then become very confused that it has previously read some 0s, now there are some real values, and later when you flip the switch off it gets 0s again.

Indeed. If you flipped such a control for any domain at runtime, you'd
better first check that this wouldn't cause any such issues.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 24c417ca71..45f8e1ffd1 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1933,6 +1933,9 @@  static int cf_check svm_msr_read_intercept(
         break;
 
     default:
+        if ( is_hwdom_pinned_vcpu(v) && !rdmsr_safe(msr, *msr_content) )
+            break;
+
         if ( d->arch.msr_relaxed && !rdmsr_safe(msr, tmp) )
         {
             *msr_content = 0;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1edc7f1e91..f6e5123f66 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3377,6 +3377,9 @@  static int cf_check vmx_msr_read_intercept(
         if ( vmx_read_guest_msr(curr, msr, msr_content) == 0 )
             break;
 
+        if ( is_hwdom_pinned_vcpu(curr) && !rdmsr_safe(msr, *msr_content) )
+            return X86EMUL_OKAY;
+
         if ( is_last_branch_msr(msr) )
         {
             *msr_content = 0;
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 0d9f84f458..978ae679a2 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -985,6 +985,9 @@  static int cf_check read_msr(
         }
         /* fall through */
     default:
+        if ( is_hwdom_pinned_vcpu(curr) && !rdmsr_safe(reg, *val) )
+            return X86EMUL_OKAY;
+
         if ( currd->arch.msr_relaxed && !rdmsr_safe(reg, tmp) )
         {
             *val = 0;