diff mbox series

[2/2] x86/hvm: fix write emulation of RO ranges

Message ID 20250408093156.83277-3-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series x86/hvm: fixes for RO MMIO emulation | expand

Commit Message

Roger Pau Monné April 8, 2025, 9:31 a.m. UTC
When running on AMD hardware in HVM mode the guest linear address (GLA)
will not be provided to hvm_emulate_one_mmio(), and instead is
unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
always report an error, as the fault GLA generated by the emulation of the
access won't be ~0.

Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
when the guest is PV.

Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/mm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Jan Beulich April 8, 2025, 1:57 p.m. UTC | #1
On 08.04.2025 11:31, Roger Pau Monne wrote:
> When running on AMD hardware in HVM mode the guest linear address (GLA)
> will not be provided to hvm_emulate_one_mmio(), and instead is
> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> always report an error, as the fault GLA generated by the emulation of the
> access won't be ~0.

Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
generally whenever .gla_valid isn't set).

> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> when the guest is PV.

This narrows checking too much, imo. For VT-x we could continue to do so,
provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
the gla_valid flag visible there.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5187,7 +5187,12 @@ int cf_check mmio_ro_emulated_write(
>  
>      /* Only allow naturally-aligned stores at the original %cr2 address. */
>      if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
> -         offset != mmio_ro_ctxt->cr2 )
> +         /*
> +          * HVM domains might not have a valid fault GLA in the context, as AMD
> +          * NPT faults don't report the faulting GLA.  It's also possible for
> +          * the fault to happen in non-paging modes.
> +          */
> +         (is_pv_domain(current->domain) && offset != mmio_ro_ctxt->cr2) )
>      {
>          gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
>                  mmio_ro_ctxt->cr2, offset, bytes);

Is logging the supposed CR2 value useful then for cases where the GLA
isn't valid? I fear it might be more confusing than helpful.

Jan
Andrew Cooper April 8, 2025, 2 p.m. UTC | #2
On 08/04/2025 10:31 am, Roger Pau Monne wrote:
> When running on AMD hardware in HVM mode the guest linear address (GLA)
> will not be provided to hvm_emulate_one_mmio(), and instead is
> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> always report an error, as the fault GLA generated by the emulation of the
> access won't be ~0.
>
> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> when the guest is PV.
>
> Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I think there are several bugs here.

We do get %cr2 reliably for PV and Shadow guests.

Intel EPT may or may not give us GLA.  e.g. writes for pagetable A/D
updates don't get GLA.

Defaulting to ~0 isn't ok.  We need some kind of GLA-valid signal,
except for HAP guests, it isn't even the GLA we care about, it's the GPA
which identifies the MMIO region.

We shouldn't terminate the emulation if there's no GLA to check.  In the
case that we don't have a GLA, we should translate the memory operand
and cross-check the GPA.  We'll definitely have one of the two to hand.

~Andrew
Roger Pau Monné April 9, 2025, 9:07 a.m. UTC | #3
On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> On 08.04.2025 11:31, Roger Pau Monne wrote:
> > When running on AMD hardware in HVM mode the guest linear address (GLA)
> > will not be provided to hvm_emulate_one_mmio(), and instead is
> > unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> > always report an error, as the fault GLA generated by the emulation of the
> > access won't be ~0.
> 
> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> generally whenever .gla_valid isn't set).

Oh, yes, good catch.  I didn't notice that one.  We should move all
those checks to use a paddr rather than a gla.

> > Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> > when the guest is PV.
> 
> This narrows checking too much, imo. For VT-x we could continue to do so,
> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> the gla_valid flag visible there.

I don't think we should rely on the gla at all in
mmio_ro_emulated_write(), and instead just use the physical address.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5187,7 +5187,12 @@ int cf_check mmio_ro_emulated_write(
> >  
> >      /* Only allow naturally-aligned stores at the original %cr2 address. */
> >      if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
> > -         offset != mmio_ro_ctxt->cr2 )
> > +         /*
> > +          * HVM domains might not have a valid fault GLA in the context, as AMD
> > +          * NPT faults don't report the faulting GLA.  It's also possible for
> > +          * the fault to happen in non-paging modes.
> > +          */
> > +         (is_pv_domain(current->domain) && offset != mmio_ro_ctxt->cr2) )
> >      {
> >          gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
> >                  mmio_ro_ctxt->cr2, offset, bytes);
> 
> Is logging the supposed CR2 value useful then for cases where the GLA
> isn't valid? I fear it might be more confusing than helpful.

Since it was a debug printk I was kind of OK with leaving it, but
given the other comments I think I will have to rework
mmio_ro_emulated_write() so that it doesn't use the gla anymore.

Thanks, Roger.
Roger Pau Monné April 9, 2025, 9:47 a.m. UTC | #4
On Tue, Apr 08, 2025 at 03:00:28PM +0100, Andrew Cooper wrote:
> On 08/04/2025 10:31 am, Roger Pau Monne wrote:
> > When running on AMD hardware in HVM mode the guest linear address (GLA)
> > will not be provided to hvm_emulate_one_mmio(), and instead is
> > unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> > always report an error, as the fault GLA generated by the emulation of the
> > access won't be ~0.
> >
> > Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> > when the guest is PV.
> >
> > Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I think there are several bugs here.
> 
> We do get %cr2 reliably for PV and Shadow guests.
> 
> Intel EPT may or may not give us GLA.  e.g. writes for pagetable A/D
> updates don't get GLA.
> 
> Defaulting to ~0 isn't ok.  We need some kind of GLA-valid signal,
> except for HAP guests, it isn't even the GLA we care about, it's the GPA
> which identifies the MMIO region.
> 
> We shouldn't terminate the emulation if there's no GLA to check.  In the
> case that we don't have a GLA, we should translate the memory operand
> and cross-check the GPA.  We'll definitely have one of the two to hand.

I guess I will have to switch to a more complex approach for HVM and
use logic similar to hvmemul_write() to figure out the mfn, and
compare it with the fault provided one.

Thanks, Roger.
Jan Beulich April 9, 2025, 10 a.m. UTC | #5
On 09.04.2025 11:07, Roger Pau Monné wrote:
> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>> always report an error, as the fault GLA generated by the emulation of the
>>> access won't be ~0.
>>
>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>> generally whenever .gla_valid isn't set).
> 
> Oh, yes, good catch.  I didn't notice that one.  We should move all
> those checks to use a paddr rather than a gla.

Really that function could just be passed the offset into the page.

>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>> when the guest is PV.
>>
>> This narrows checking too much, imo. For VT-x we could continue to do so,
>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>> the gla_valid flag visible there.
> 
> I don't think we should rely on the gla at all in
> mmio_ro_emulated_write(), and instead just use the physical address.

But you can't validate a physical address against a CR2 value. And I view
this validation as meaningful, to guard (best effort, but still) against
e.g. insn re-writing under our feet.

Jan
Roger Pau Monné April 9, 2025, 10:39 a.m. UTC | #6
On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> On 09.04.2025 11:07, Roger Pau Monné wrote:
> > On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>> always report an error, as the fault GLA generated by the emulation of the
> >>> access won't be ~0.
> >>
> >> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >> generally whenever .gla_valid isn't set).
> > 
> > Oh, yes, good catch.  I didn't notice that one.  We should move all
> > those checks to use a paddr rather than a gla.
> 
> Really that function could just be passed the offset into the page.
> 
> >>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>> when the guest is PV.
> >>
> >> This narrows checking too much, imo. For VT-x we could continue to do so,
> >> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >> the gla_valid flag visible there.
> > 
> > I don't think we should rely on the gla at all in
> > mmio_ro_emulated_write(), and instead just use the physical address.
> 
> But you can't validate a physical address against a CR2 value. And I view
> this validation as meaningful, to guard (best effort, but still) against
> e.g. insn re-writing under our feet.

But we have the mfn in mmio_ro_ctxt, and could possibly use that to
validate?  I could expand the context to include the offset also, so
that we could fully validate it.

Thanks, Roger.
Jan Beulich April 9, 2025, 12:59 p.m. UTC | #7
On 09.04.2025 12:39, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
>> On 09.04.2025 11:07, Roger Pau Monné wrote:
>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>>>> always report an error, as the fault GLA generated by the emulation of the
>>>>> access won't be ~0.
>>>>
>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>>>> generally whenever .gla_valid isn't set).
>>>
>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
>>> those checks to use a paddr rather than a gla.
>>
>> Really that function could just be passed the offset into the page.
>>
>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>>>> when the guest is PV.
>>>>
>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>>>> the gla_valid flag visible there.
>>>
>>> I don't think we should rely on the gla at all in
>>> mmio_ro_emulated_write(), and instead just use the physical address.
>>
>> But you can't validate a physical address against a CR2 value. And I view
>> this validation as meaningful, to guard (best effort, but still) against
>> e.g. insn re-writing under our feet.
> 
> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> validate?  I could expand the context to include the offset also, so
> that we could fully validate it.

How would you use the MFN to validate against the VA in CR2?

Jan
Roger Pau Monné April 9, 2025, 1:33 p.m. UTC | #8
On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
> On 09.04.2025 12:39, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 11:07, Roger Pau Monné wrote:
> >>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>>>> always report an error, as the fault GLA generated by the emulation of the
> >>>>> access won't be ~0.
> >>>>
> >>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >>>> generally whenever .gla_valid isn't set).
> >>>
> >>> Oh, yes, good catch.  I didn't notice that one.  We should move all
> >>> those checks to use a paddr rather than a gla.
> >>
> >> Really that function could just be passed the offset into the page.
> >>
> >>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>>>> when the guest is PV.
> >>>>
> >>>> This narrows checking too much, imo. For VT-x we could continue to do so,
> >>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >>>> the gla_valid flag visible there.
> >>>
> >>> I don't think we should rely on the gla at all in
> >>> mmio_ro_emulated_write(), and instead just use the physical address.
> >>
> >> But you can't validate a physical address against a CR2 value. And I view
> >> this validation as meaningful, to guard (best effort, but still) against
> >> e.g. insn re-writing under our feet.
> > 
> > But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> > validate?  I could expand the context to include the offset also, so
> > that we could fully validate it.
> 
> How would you use the MFN to validate against the VA in CR2?

I would use hvmemul_virtual_to_linear() and hvm_translate_get_page()
to get the underlying mfn of the linear address.  But maybe there's a
part of this that I'm missing, I've certainly haven't tried to
implement any of it.

Thanks, Roger.
Jan Beulich April 9, 2025, 1:50 p.m. UTC | #9
On 09.04.2025 15:33, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
>> On 09.04.2025 12:39, Roger Pau Monné wrote:
>>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
>>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
>>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>>>>>> always report an error, as the fault GLA generated by the emulation of the
>>>>>>> access won't be ~0.
>>>>>>
>>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>>>>>> generally whenever .gla_valid isn't set).
>>>>>
>>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
>>>>> those checks to use a paddr rather than a gla.
>>>>
>>>> Really that function could just be passed the offset into the page.
>>>>
>>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>>>>>> when the guest is PV.
>>>>>>
>>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
>>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>>>>>> the gla_valid flag visible there.
>>>>>
>>>>> I don't think we should rely on the gla at all in
>>>>> mmio_ro_emulated_write(), and instead just use the physical address.
>>>>
>>>> But you can't validate a physical address against a CR2 value. And I view
>>>> this validation as meaningful, to guard (best effort, but still) against
>>>> e.g. insn re-writing under our feet.
>>>
>>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
>>> validate?  I could expand the context to include the offset also, so
>>> that we could fully validate it.
>>
>> How would you use the MFN to validate against the VA in CR2?
> 
> I would use hvmemul_virtual_to_linear()

If you mean to use the CR2 as input, you wouldn't need this. I said VA in
my earlier reply, yes, but strictly speaking that's a linear address.

> and hvm_translate_get_page()
> to get the underlying mfn of the linear address.  But maybe there's a
> part of this that I'm missing, I've certainly haven't tried to
> implement any of it.

Hmm, I see. I didn't think of doing it this way round. There's certainly
at least one caveat with this approach: Multiple linear addresses can all
map to the same GFN and hence MFN. Checking against the original linear
address (when available) doesn't have such an issue.

Jan
Roger Pau Monné April 9, 2025, 2:01 p.m. UTC | #10
On Wed, Apr 09, 2025 at 03:50:13PM +0200, Jan Beulich wrote:
> On 09.04.2025 15:33, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 12:39, Roger Pau Monné wrote:
> >>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> >>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
> >>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>>>>>> always report an error, as the fault GLA generated by the emulation of the
> >>>>>>> access won't be ~0.
> >>>>>>
> >>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >>>>>> generally whenever .gla_valid isn't set).
> >>>>>
> >>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
> >>>>> those checks to use a paddr rather than a gla.
> >>>>
> >>>> Really that function could just be passed the offset into the page.
> >>>>
> >>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>>>>>> when the guest is PV.
> >>>>>>
> >>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
> >>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >>>>>> the gla_valid flag visible there.
> >>>>>
> >>>>> I don't think we should rely on the gla at all in
> >>>>> mmio_ro_emulated_write(), and instead just use the physical address.
> >>>>
> >>>> But you can't validate a physical address against a CR2 value. And I view
> >>>> this validation as meaningful, to guard (best effort, but still) against
> >>>> e.g. insn re-writing under our feet.
> >>>
> >>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> >>> validate?  I could expand the context to include the offset also, so
> >>> that we could fully validate it.
> >>
> >> How would you use the MFN to validate against the VA in CR2?
> > 
> > I would use hvmemul_virtual_to_linear()
> 
> If you mean to use the CR2 as input, you wouldn't need this. I said VA in
> my earlier reply, yes, but strictly speaking that's a linear address.

I was thinking about using the segment and offset parameters of the
mmio_ro_emulated_write() call.

> > and hvm_translate_get_page()
> > to get the underlying mfn of the linear address.  But maybe there's a
> > part of this that I'm missing, I've certainly haven't tried to
> > implement any of it.
> 
> Hmm, I see. I didn't think of doing it this way round. There's certainly
> at least one caveat with this approach: Multiple linear addresses can all
> map to the same GFN and hence MFN. Checking against the original linear
> address (when available) doesn't have such an issue.

I see... Yet for AMD that address is not uniformly available as part
of the vmexit information?  As I understand the checks done in
mmio_ro_emulated_write() are to ensure correctness, but carrying the
access even when the %cr2 check fail wouldn't cause issues to Xen
itself?

Thanks, Roger.
Jan Beulich April 9, 2025, 2:08 p.m. UTC | #11
On 09.04.2025 16:01, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 03:50:13PM +0200, Jan Beulich wrote:
>> On 09.04.2025 15:33, Roger Pau Monné wrote:
>>> On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
>>>> On 09.04.2025 12:39, Roger Pau Monné wrote:
>>>>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
>>>>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
>>>>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>>>>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>>>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>>>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>>>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>>>>>>>> always report an error, as the fault GLA generated by the emulation of the
>>>>>>>>> access won't be ~0.
>>>>>>>>
>>>>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>>>>>>>> generally whenever .gla_valid isn't set).
>>>>>>>
>>>>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
>>>>>>> those checks to use a paddr rather than a gla.
>>>>>>
>>>>>> Really that function could just be passed the offset into the page.
>>>>>>
>>>>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>>>>>>>> when the guest is PV.
>>>>>>>>
>>>>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
>>>>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>>>>>>>> the gla_valid flag visible there.
>>>>>>>
>>>>>>> I don't think we should rely on the gla at all in
>>>>>>> mmio_ro_emulated_write(), and instead just use the physical address.
>>>>>>
>>>>>> But you can't validate a physical address against a CR2 value. And I view
>>>>>> this validation as meaningful, to guard (best effort, but still) against
>>>>>> e.g. insn re-writing under our feet.
>>>>>
>>>>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
>>>>> validate?  I could expand the context to include the offset also, so
>>>>> that we could fully validate it.
>>>>
>>>> How would you use the MFN to validate against the VA in CR2?
>>>
>>> I would use hvmemul_virtual_to_linear()
>>
>> If you mean to use the CR2 as input, you wouldn't need this. I said VA in
>> my earlier reply, yes, but strictly speaking that's a linear address.
> 
> I was thinking about using the segment and offset parameters of the
> mmio_ro_emulated_write() call.
> 
>>> and hvm_translate_get_page()
>>> to get the underlying mfn of the linear address.  But maybe there's a
>>> part of this that I'm missing, I've certainly haven't tried to
>>> implement any of it.
>>
>> Hmm, I see. I didn't think of doing it this way round. There's certainly
>> at least one caveat with this approach: Multiple linear addresses can all
>> map to the same GFN and hence MFN. Checking against the original linear
>> address (when available) doesn't have such an issue.
> 
> I see... Yet for AMD that address is not uniformly available as part
> of the vmexit information?

Even stronger, I thought: It's uniformly not available.

>  As I understand the checks done in
> mmio_ro_emulated_write() are to ensure correctness, but carrying the
> access even when the %cr2 check fail wouldn't cause issues to Xen
> itself?

Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
checking is there to avoid guests playing games. Whether that prevents
merely in-guest just-bugs or actual XSAs we can't know until we find a
case where the game playing might make us do something wrong. I expect
it's unlikely for Xen itself to be affected. But an in-guest privilege
escalation would already be bad enough.

So why don't we do the linear address check as we do today (provided a
linear address is available), and only use the alternative approach when
no linear address is available?

Jan
Marek Marczykowski-Górecki April 9, 2025, 2:27 p.m. UTC | #12
On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> On 08.04.2025 11:31, Roger Pau Monne wrote:
> > When running on AMD hardware in HVM mode the guest linear address (GLA)
> > will not be provided to hvm_emulate_one_mmio(), and instead is
> > unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> > always report an error, as the fault GLA generated by the emulation of the
> > access won't be ~0.
> 
> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> generally whenever .gla_valid isn't set).

That may explain issues I see when using XHCI console on AMD (I can
crash the whole thing using sequence of driver binding/unbinding in
dom0). That's actually the hw12 runner in the other series, but tests
that are included in gitlab do not trigger the issue (fortunately?). But
also, it may be a different issue, as it affects PV dom0 too...

Anyway, I can probably test a patch if subpage_mmio_write_accept() works
as intended (I'll need to check if that path is exercised on AMD too as
it depends on xhci caps layout - it was definitely used on Intel).
Roger Pau Monné April 9, 2025, 3:33 p.m. UTC | #13
On Wed, Apr 09, 2025 at 04:08:47PM +0200, Jan Beulich wrote:
> On 09.04.2025 16:01, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 03:50:13PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 15:33, Roger Pau Monné wrote:
> >>> On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
> >>>> On 09.04.2025 12:39, Roger Pau Monné wrote:
> >>>>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> >>>>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
> >>>>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >>>>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>>>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>>>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>>>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>>>>>>>> always report an error, as the fault GLA generated by the emulation of the
> >>>>>>>>> access won't be ~0.
> >>>>>>>>
> >>>>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >>>>>>>> generally whenever .gla_valid isn't set).
> >>>>>>>
> >>>>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
> >>>>>>> those checks to use a paddr rather than a gla.
> >>>>>>
> >>>>>> Really that function could just be passed the offset into the page.
> >>>>>>
> >>>>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>>>>>>>> when the guest is PV.
> >>>>>>>>
> >>>>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
> >>>>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >>>>>>>> the gla_valid flag visible there.
> >>>>>>>
> >>>>>>> I don't think we should rely on the gla at all in
> >>>>>>> mmio_ro_emulated_write(), and instead just use the physical address.
> >>>>>>
> >>>>>> But you can't validate a physical address against a CR2 value. And I view
> >>>>>> this validation as meaningful, to guard (best effort, but still) against
> >>>>>> e.g. insn re-writing under our feet.
> >>>>>
> >>>>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> >>>>> validate?  I could expand the context to include the offset also, so
> >>>>> that we could fully validate it.
> >>>>
> >>>> How would you use the MFN to validate against the VA in CR2?
> >>>
> >>> I would use hvmemul_virtual_to_linear()
> >>
> >> If you mean to use the CR2 as input, you wouldn't need this. I said VA in
> >> my earlier reply, yes, but strictly speaking that's a linear address.
> > 
> > I was thinking about using the segment and offset parameters of the
> > mmio_ro_emulated_write() call.
> > 
> >>> and hvm_translate_get_page()
> >>> to get the underlying mfn of the linear address.  But maybe there's a
> >>> part of this that I'm missing, I've certainly haven't tried to
> >>> implement any of it.
> >>
> >> Hmm, I see. I didn't think of doing it this way round. There's certainly
> >> at least one caveat with this approach: Multiple linear addresses can all
> >> map to the same GFN and hence MFN. Checking against the original linear
> >> address (when available) doesn't have such an issue.
> > 
> > I see... Yet for AMD that address is not uniformly available as part
> > of the vmexit information?
> 
> Even stronger, I thought: It's uniformly not available.

Oh yes, that's what I meant to say but got the words the other way
around.

> >  As I understand the checks done in
> > mmio_ro_emulated_write() are to ensure correctness, but carrying the
> > access even when the %cr2 check fail wouldn't cause issues to Xen
> > itself?
> 
> Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
> checking is there to avoid guests playing games. Whether that prevents
> merely in-guest just-bugs or actual XSAs we can't know until we find a
> case where the game playing might make us do something wrong. I expect
> it's unlikely for Xen itself to be affected. But an in-guest privilege
> escalation would already be bad enough.

I see.  That was kind of my understanding of the checks.  At least on
HVM it feels a bit weird to handle r/o regions this way.  It would IMO
be more natural to use an hvm_io_handler, but that's maybe because I'm
more familiar with those.

And in that regard, hvm_io_handler don't seem to do any of the extra
checking that mmio_ro_emulated_write() does with the %cr2, but maybe
that's done by some higher layer?  AFAICT that would ultimately get
called by hvmemul_read(), and there are no checks there at all.

> So why don't we do the linear address check as we do today (provided a
> linear address is available), and only use the alternative approach when
> no linear address is available?

I can try to do that, albeit as said above, at least for HVM guests
that checking of %cr2 seems to be quite inconsistent, as
hvmemul_{read,write}() won't do any of it.

Thanks, Roger.
Jan Beulich April 10, 2025, 6:27 a.m. UTC | #14
On 09.04.2025 17:33, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 04:08:47PM +0200, Jan Beulich wrote:
>> On 09.04.2025 16:01, Roger Pau Monné wrote:
>>>  As I understand the checks done in
>>> mmio_ro_emulated_write() are to ensure correctness, but carrying the
>>> access even when the %cr2 check fail wouldn't cause issues to Xen
>>> itself?
>>
>> Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
>> checking is there to avoid guests playing games. Whether that prevents
>> merely in-guest just-bugs or actual XSAs we can't know until we find a
>> case where the game playing might make us do something wrong. I expect
>> it's unlikely for Xen itself to be affected. But an in-guest privilege
>> escalation would already be bad enough.
> 
> I see.  That was kind of my understanding of the checks.  At least on
> HVM it feels a bit weird to handle r/o regions this way.  It would IMO
> be more natural to use an hvm_io_handler, but that's maybe because I'm
> more familiar with those.

I guess this would be an option; I assume it's the way it is because PVHv1
inherited it from PV, and PVHv2 inherited it from PVHv1.

> And in that regard, hvm_io_handler don't seem to do any of the extra
> checking that mmio_ro_emulated_write() does with the %cr2, but maybe
> that's done by some higher layer?  AFAICT that would ultimately get
> called by hvmemul_read(), and there are no checks there at all.

That more general framework isn't page-fault specific, and hence there's
no CR2 recorded to check against.

Jan
Roger Pau Monné April 10, 2025, 7:02 a.m. UTC | #15
On Thu, Apr 10, 2025 at 08:27:49AM +0200, Jan Beulich wrote:
> On 09.04.2025 17:33, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 04:08:47PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 16:01, Roger Pau Monné wrote:
> >>>  As I understand the checks done in
> >>> mmio_ro_emulated_write() are to ensure correctness, but carrying the
> >>> access even when the %cr2 check fail wouldn't cause issues to Xen
> >>> itself?
> >>
> >> Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
> >> checking is there to avoid guests playing games. Whether that prevents
> >> merely in-guest just-bugs or actual XSAs we can't know until we find a
> >> case where the game playing might make us do something wrong. I expect
> >> it's unlikely for Xen itself to be affected. But an in-guest privilege
> >> escalation would already be bad enough.
> > 
> > I see.  That was kind of my understanding of the checks.  At least on
> > HVM it feels a bit weird to handle r/o regions this way.  It would IMO
> > be more natural to use an hvm_io_handler, but that's maybe because I'm
> > more familiar with those.
> 
> I guess this would be an option; I assume it's the way it is because PVHv1
> inherited it from PV, and PVHv2 inherited it from PVHv1.

I have a draft with this approach, and it seems quite better, as it
allows to get rid of hvm_emulate_one_mmio() and the special casing
done in hvm_hap_nested_page_fault().

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4fecd37aeca0..79836705c51e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5187,7 +5187,12 @@  int cf_check mmio_ro_emulated_write(
 
     /* Only allow naturally-aligned stores at the original %cr2 address. */
     if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
-         offset != mmio_ro_ctxt->cr2 )
+         /*
+          * HVM domains might not have a valid fault GLA in the context, as AMD
+          * NPT faults don't report the faulting GLA.  It's also possible for
+          * the fault to happen in non-paging modes.
+          */
+         (is_pv_domain(current->domain) && offset != mmio_ro_ctxt->cr2) )
     {
         gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
                 mmio_ro_ctxt->cr2, offset, bytes);