
[4.15] x86: mirror compat argument translation area for 32-bit PV

Message ID bdedf018-b6c4-d0da-fb4b-8cf2d048c3b1@suse.com (mailing list archive)
State Superseded

Commit Message

Jan Beulich Feb. 22, 2021, 10:27 a.m. UTC
Now that we guard the entire Xen VA space against speculative abuse
through hypervisor accesses to guest memory, the argument translation
area's VA also needs to live outside this range, at least for 32-bit PV
guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
uniformly.

While this could be conditionalized upon CONFIG_PV32 &&
CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
keeps the code more legible imo.

Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
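
For readers tracing the address arithmetic, here is a minimal standalone
sketch (not Xen code: the L4 slot numbers follow the layout comment in
xen/include/asm-x86/config.h, and the 0x200000 offset into the area is
purely illustrative) of how a primary per-domain VA is rebased into the
slot-511 mirror:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* VA covered by a given L4 (PML4) slot; each slot spans 2^39 bytes. */
static uint64_t pml4_addr(unsigned int slot)
{
    uint64_t va = (uint64_t)slot << 39;
    /* Slots 256-511 live in the sign-extended upper canonical half. */
    return (slot & 0x100) ? va | 0xffff000000000000ULL : va;
}

int main(void)
{
    uint64_t perdomain  = pml4_addr(260); /* PERDOMAIN_VIRT_START (primary) */
    uint64_t perdomain2 = pml4_addr(511); /* PERDOMAIN2_VIRT_START (mirror) */

    /* An illustrative primary VA inside the argument translation area... */
    uint64_t xlat = perdomain + 0x200000;

    /* ...rebased the same way the new COMPAT_ARG_XLAT_VIRT_BASE does it. */
    uint64_t xlat_alt = xlat + (perdomain2 - perdomain);

    printf("primary VA: %#" PRIx64 "\n", xlat);     /* 0xffff820000200000 */
    printf("mirror VA:  %#" PRIx64 "\n", xlat_alt); /* 0xffffff8000200000 */
    return 0;
}

Both addresses reach the same physical pages; only the mirror lies outside
the VA range guarded against speculative guest accesses.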

Comments

Ian Jackson Feb. 22, 2021, 11:26 a.m. UTC | #1
Jan Beulich writes ("[PATCH][4.15] x86: mirror compat argument translation area for 32-bit PV"):
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.
> 
> While this could be conditionalized upon CONFIG_PV32 &&
> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
> keeps the code more legible imo.
> 
> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Despite the fact that this is trying to fix the current breakage, I
would still want to see a full maintainer review.

Thanks,
Ian.
Roger Pau Monné Feb. 22, 2021, 11:35 a.m. UTC | #2
On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.

Since you are double mapping the per-domain virtual area, won't it
make more sense to map it just once outside of the Xen virtual space
area? (so it's always using PML4_ADDR(511))

Is there anything concerning in the per-domain area that should be
protected against speculative accesses?

Thanks, Roger.
Jan Beulich Feb. 22, 2021, 2:12 p.m. UTC | #3
On 22.02.2021 12:35, Roger Pau Monné wrote:
> On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
>> Now that we guard the entire Xen VA space against speculative abuse
>> through hypervisor accesses to guest memory, the argument translation
>> area's VA also needs to live outside this range, at least for 32-bit PV
>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>> uniformly.
> 
> Since you are double mapping the per-domain virtual area, won't it
> make more sense to map it just once outside of the Xen virtual space
> area? (so it's always using PML4_ADDR(511))

This would then require conditionals in paths using other parts of
the per-domain mappings for 64-bit PV, as the same range is under
guest control there.

> Is there anything concerning in the per-domain area that should be
> protected against speculative accesses?

First of all this is an unrelated question - I'm not changing what
gets accessed there, only through which addresses these accesses
happen. What lives there are the GDT/LDT mappings, the map cache, and
the argument translation area. The guest has no control (or only very
limited control, in the GDT/LDT case) over the accesses made to this
space.

Jan
Roger Pau Monné Feb. 22, 2021, 2:13 p.m. UTC | #4
On Mon, Feb 22, 2021 at 12:35:21PM +0100, Roger Pau Monné wrote:
> On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
> > Now that we guard the entire Xen VA space against speculative abuse
> > through hypervisor accesses to guest memory, the argument translation
> > area's VA also needs to live outside this range, at least for 32-bit PV
> > guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> > uniformly.
> 
> Since you are double mapping the per-domain virtual area, won't it
> make more sense to map it just once outside of the Xen virtual space
> area? (so it's always using PML4_ADDR(511))

Right, that's not possible for PV 64bit domains because it's guest
owned linear address space in that case.

It seems like paravirt_ctxt_switch_to will modify the root_pgt to set
the PERDOMAIN_VIRT_START entry; does the same need to be done for
PERDOMAIN2_VIRT_START?

I would also consider giving the slot a more meaningful name, as
PERDOMAIN2_VIRT_START makes it seem like a new per-domain scratch
space, when it's just a different mapping of the existing physical
memory.

Maybe PERDOMAIN_MIRROR_VIRT_START? Or PERDOMAIN_XLAT_VIRT_START?

Thanks, Roger.
Andrew Cooper Feb. 22, 2021, 2:14 p.m. UTC | #5
On 22/02/2021 10:27, Jan Beulich wrote:
> Now that we guard the entire Xen VA space against speculative abuse
> through hypervisor accesses to guest memory, the argument translation
> area's VA also needs to live outside this range, at least for 32-bit PV
> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> uniformly.
>
> While this could be conditionalized upon CONFIG_PV32 &&
> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
> keeps the code more legible imo.
>
> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>      }
> +
> +    /* Slot 511: Per-domain mappings mirror. */
> +    if ( !is_pv_64bit_domain(d) )
> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);

This virtual address is inside the extended directmap.  You're going to
need to rearrange more things than just this, to make it safe.

While largely a theoretical risk as far as the directmap goes, there is
now a rather higher risk of colliding with the ERR_PTR() range.  Its bad
enough this infrastructure is inherently unsafe with 64bit PV guests,

~Andrew
Jan Beulich Feb. 22, 2021, 2:20 p.m. UTC | #6
On 22.02.2021 15:13, Roger Pau Monné wrote:
> On Mon, Feb 22, 2021 at 12:35:21PM +0100, Roger Pau Monné wrote:
>> On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
>>> Now that we guard the entire Xen VA space against speculative abuse
>>> through hypervisor accesses to guest memory, the argument translation
>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>> uniformly.
>>
>> Since you are double mapping the per-domain virtual area, won't it
>> make more sense to map it just once outside of the Xen virtual space
>> area? (so it's always using PML4_ADDR(511))
> 
> Right, that's not possible for PV 64bit domains because it's guest
> owned linear address space in that case.
> 
> It seems like paravirt_ctxt_switch_to will modify the root_pgt to set
> the PERDOMAIN_VIRT_START entry; does the same need to be done for
> PERDOMAIN2_VIRT_START?

I don't think so, no. Argument translation doesn't happen when
the restricted page tables are in use, and all other uses of
the per-domain area continue to use the "normal" VA.

> I would also consider giving the slot a more meaningful name, as
> PERDOMAIN2_VIRT_START makes it seem like a new per-domain scratch
> space, when it's just a different mapping of the existing physical
> memory.
> 
> Maybe PERDOMAIN_MIRROR_VIRT_START? Or PERDOMAIN_XLAT_VIRT_START?

XLAT would be too specific - while we use it for xlat only, it's
still all of the mappings that appear at the alternate addresses.
I did consider using MIRROR, but it got too long for my taste.
Now that I think about it maybe PERDOMAIN_ALT_VIRT_START would do?

Jan
Jan Beulich Feb. 22, 2021, 2:22 p.m. UTC | #7
On 22.02.2021 15:14, Andrew Cooper wrote:
> On 22/02/2021 10:27, Jan Beulich wrote:
>> Now that we guard the entire Xen VA space against speculative abuse
>> through hypervisor accesses to guest memory, the argument translation
>> area's VA also needs to live outside this range, at least for 32-bit PV
>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>> uniformly.
>>
>> While this could be conditionalized upon CONFIG_PV32 &&
>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>> keeps the code more legible imo.
>>
>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>      }
>> +
>> +    /* Slot 511: Per-domain mappings mirror. */
>> +    if ( !is_pv_64bit_domain(d) )
>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
> 
> This virtual address is inside the extended directmap.

No. That one covers only the range excluding the last L4 slot.

>  You're going to
> need to rearrange more things than just this, to make it safe.

I specifically picked that entry because I don't think further
arrangements are needed.

> While largely a theoretical risk as far as the directmap goes, there is
> now a rather higher risk of colliding with the ERR_PTR() range.  It's bad
> enough that this infrastructure is inherently unsafe with 64-bit PV guests.

The ERR_PTR() range is still _far_ away from the sub-ranges we
use in the per-domain area.

Jan
Roger Pau Monné Feb. 22, 2021, 2:46 p.m. UTC | #8
On Mon, Feb 22, 2021 at 03:20:24PM +0100, Jan Beulich wrote:
> On 22.02.2021 15:13, Roger Pau Monné wrote:
> > On Mon, Feb 22, 2021 at 12:35:21PM +0100, Roger Pau Monné wrote:
> >> On Mon, Feb 22, 2021 at 11:27:07AM +0100, Jan Beulich wrote:
> >>> Now that we guard the entire Xen VA space against speculative abuse
> >>> through hypervisor accesses to guest memory, the argument translation
> >>> area's VA also needs to live outside this range, at least for 32-bit PV
> >>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> >>> uniformly.
> >>
> >> Since you are double mapping the per-domain virtual area, won't it
> >> make more sense to map it just once outside of the Xen virtual space
> >> area? (so it's always using PML4_ADDR(511))
> > 
> > Right, that's not possible for PV 64bit domains because it's guest
> > owned linear address space in that case.
> > 
> > It seems like paravirt_ctxt_switch_to will modify the root_pgt to set
> > the PERDOMAIN_VIRT_START entry; does the same need to be done for
> > PERDOMAIN2_VIRT_START?
> 
> I don't think so, no. Argument translation doesn't happen when
> the restricted page tables are in use, and all other uses of
> the per-domain area continue to use the "normal" VA.

Oh, OK, thanks for the clarification. AFAICT the PERDOMAIN2_VIRT_START
slot won't get populated on the restricted page tables, and hence will
always trigger a page fault if access is attempted with those tables
loaded.

> > I would also consider giving the slot a more meaningful name, as
> > PERDOMAIN2_VIRT_START makes it seem like a new per-domain scratch
> > space, when it's just a different mapping of the existing physical
> > memory.
> > 
> > Maybe PERDOMAIN_MIRROR_VIRT_START? Or PERDOMAIN_XLAT_VIRT_START?
> 
> XLAT would be too specific - while we use it for xlat only, it's
> still all of the mappings that appear at the alternate addresses.

Well, given that such mappings won't be available when running 64-bit
PV guests, I still think the area is unlikely to be used for anything
that's not XLAT-specific, as anything else would also need to work for
64-bit PV guests.

> I did consider using MIRROR, but it got too long for my taste.
> Now that I think about it maybe PERDOMAIN_ALT_VIRT_START would do?

Indeed, I would prefer that rather than PERDOMAIN2_VIRT_START if you
still consider XLAT to be too specific.

Thanks, Roger.
Andrew Cooper Feb. 22, 2021, 4:47 p.m. UTC | #9
On 22/02/2021 14:22, Jan Beulich wrote:
> On 22.02.2021 15:14, Andrew Cooper wrote:
>> On 22/02/2021 10:27, Jan Beulich wrote:
>>> Now that we guard the entire Xen VA space against speculative abuse
>>> through hypervisor accesses to guest memory, the argument translation
>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>> uniformly.
>>>
>>> While this could be conditionalized upon CONFIG_PV32 &&
>>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>>> keeps the code more legible imo.
>>>
>>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>>      }
>>> +
>>> +    /* Slot 511: Per-domain mappings mirror. */
>>> +    if ( !is_pv_64bit_domain(d) )
>>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>> This virtual address is inside the extended directmap.
> No. That one covers only the range excluding the last L4 slot.
>
>>   You're going to
>> need to rearrange more things than just this, to make it safe.
> I specifically picked that entry because I don't think further
> arrangements are needed.

map_domain_page() will blindly hand out this virtual address if an
appropriate mfn is passed, because there are no suitability checks.

The error handling isn't great, but at least any attempt to use that
pointer would fault, which is now no longer the case.

LA57 machines can have RAM or NVDIMMs in a range which will tickle this
bug.  In fact, they can have MFNs which would wrap around 0 into guest
space.

~Andrew
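
To put rough numbers on the collision and wrap-around concern, a
standalone sketch follows (the slot-511 constant matches the layout
comment this patch modifies; the directmap base and the simplistic VA
formula are assumptions, abstracting what map_domain_page() does when a
full directmap is available):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define PAGE_SHIFT            12
#define DIRECTMAP_VIRT_START  0xffff830000000000ULL /* L4 slot 262 (assumed) */
#define PERDOMAIN2_VIRT_START 0xffffff8000000000ULL /* L4 slot 511 */

/* The unchecked arithmetic at issue: directmap VA of a machine frame. */
static uint64_t directmap_va(uint64_t mfn)
{
    return DIRECTMAP_VIRT_START + (mfn << PAGE_SHIFT);
}

int main(void)
{
    /* First MFN whose directmap VA spills into L4 slot 511... */
    uint64_t spill = (PERDOMAIN2_VIRT_START - DIRECTMAP_VIRT_START) >> PAGE_SHIFT;
    /* ...and the first whose VA wraps past 2^64 into the guest half. */
    uint64_t wrap  = (0 - DIRECTMAP_VIRT_START) >> PAGE_SHIFT;

    printf("spills into slot 511 from mfn %#" PRIx64 " (~%" PRIu64 " TiB phys)\n",
           spill, (spill << PAGE_SHIFT) >> 40);
    printf("directmap_va(spill) = %#" PRIx64 "\n", directmap_va(spill));
    printf("wraps around from mfn %#" PRIx64 " (~%" PRIu64 " TiB phys)\n",
           wrap, (wrap << PAGE_SHIFT) >> 40);
    return 0;
}

Both thresholds (~124 and ~125 TiB) are well below the 52-bit physical
address width LA57 parts can expose.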
Roger Pau Monné Feb. 22, 2021, 7:36 p.m. UTC | #10
On Mon, Feb 22, 2021 at 04:47:38PM +0000, Andrew Cooper wrote:
> On 22/02/2021 14:22, Jan Beulich wrote:
> > On 22.02.2021 15:14, Andrew Cooper wrote:
> >> On 22/02/2021 10:27, Jan Beulich wrote:
> >>> Now that we guard the entire Xen VA space against speculative abuse
> >>> through hypervisor accesses to guest memory, the argument translation
> >>> area's VA also needs to live outside this range, at least for 32-bit PV
> >>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
> >>> uniformly.
> >>>
> >>> While this could be conditionalized upon CONFIG_PV32 &&
> >>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
> >>> keeps the code more legible imo.
> >>>
> >>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>
> >>> --- a/xen/arch/x86/mm.c
> >>> +++ b/xen/arch/x86/mm.c
> >>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
> >>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
> >>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
> >>>      }
> >>> +
> >>> +    /* Slot 511: Per-domain mappings mirror. */
> >>> +    if ( !is_pv_64bit_domain(d) )
> >>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
> >>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
> >> This virtual address is inside the extended directmap.
> > No. That one covers only the range excluding the last L4 slot.
> >
> >>   You're going to
> >> need to rearrange more things than just this, to make it safe.
> > I specifically picked that entry because I don't think further
> > arrangements are needed.
> 
> map_domain_page() will blindly hand out this virtual address if an
> appropriate mfn is passed, because there are no suitability checks.
> 
> The error handling isn't great, but at least any attempt to use that
> pointer would fault, which is now no longer the case.

AFAICT map_domain_page will never populate the error page virtual
address, as the slot end (MAPCACHE_VIRT_END) is way lower than
-MAX_ERRNO?

We could add:

BUILD_BUG_ON((PERDOMAIN_VIRT_SLOT(PERDOMAIN_SLOTS) - 1) >= (unsigned long)-MAX_ERRNO);

For safety somewhere.

Roger.
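
Expanding on that suggested check, the distances involved can be worked
through in a standalone sketch (MAX_ERRNO taken as 4095 per
xen/include/xen/err.h, and three 1GiB per-domain sub-slots per the 4.15
headers - both assumptions here, not verified against the tree):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define MAX_ERRNO              4095
#define PERDOMAIN2_VIRT_START  0xffffff8000000000ULL /* L4 slot 511 */
#define PERDOMAIN_SLOTS        3
#define PERDOMAIN_SLOT_BYTES   (1ULL << 30)          /* 1GiB per sub-slot */

int main(void)
{
    /* End of the per-domain sub-ranges as mirrored into slot 511. */
    uint64_t perdomain_end = PERDOMAIN2_VIRT_START +
                             PERDOMAIN_SLOTS * PERDOMAIN_SLOT_BYTES;
    /* Lowest address the ERR_PTR() encoding can produce. */
    uint64_t err_ptr_start = (uint64_t)-MAX_ERRNO;

    printf("per-domain mirror ends at %#" PRIx64 "\n", perdomain_end);
    printf("ERR_PTR() range starts at %#" PRIx64 "\n", err_ptr_start);
    printf("gap: ~%" PRIu64 " GiB\n", (err_ptr_start - perdomain_end) >> 30);
    return 0;
}

The ~508GiB gap printed here is what the proposed BUILD_BUG_ON() would
pin down at compile time.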
Jan Beulich Feb. 23, 2021, 7:13 a.m. UTC | #11
On 22.02.2021 17:47, Andrew Cooper wrote:
> On 22/02/2021 14:22, Jan Beulich wrote:
>> On 22.02.2021 15:14, Andrew Cooper wrote:
>>> On 22/02/2021 10:27, Jan Beulich wrote:
>>>> Now that we guard the entire Xen VA space against speculative abuse
>>>> through hypervisor accesses to guest memory, the argument translation
>>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>>> uniformly.
>>>>
>>>> While this could be conditionalized upon CONFIG_PV32 &&
>>>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>>>> keeps the code more legible imo.
>>>>
>>>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>>>      }
>>>> +
>>>> +    /* Slot 511: Per-domain mappings mirror. */
>>>> +    if ( !is_pv_64bit_domain(d) )
>>>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>>>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>>> This virtual address is inside the extended directmap.
>> No. That one covers only the range excluding the last L4 slot.
>>
>>>   You're going to
>>> need to rearrange more things than just this, to make it safe.
>> I specifically picked that entry because I don't think further
>> arrangements are needed.
> 
> map_domain_page() will blindly hand out this virtual address if an
> appropriate mfn is passed, because there are no suitability checks.
> 
> The error handling isn't great, but at least any attempt to use that
> pointer would fault, which is now no longer the case.
> 
> LA57 machines can have RAM or NVDIMMs in a range which will tickle this
> bug.  In fact, they can have MFNs which would wrap around 0 into guest
> space.

This latter fact would be a far worse problem than accesses through
the last L4 entry, populated or not. However, I don't really follow
your concern: There are ample cases where functions assume to be
passed sane arguments. A pretty good (imo) comparison here is with
mfn_to_page(), which also assumes a sane MFN (i.e. one with a value
representable in frame_table[]). If there were a bug, it would be
either the caller taking an MFN out of thin air, or us introducing
MFNs we can't cover in any of direct map, frame table, or M2P. But
afaict we do guard against the latter (look for the "Ignoring
inaccessible memory range" log messages in setup.c).

In any event - imo any such bug would need fixing there, rather
than being an argument against the change here.

Also, besides your objection going quite a bit too far for my taste,
I don't see any form of alternative suggestion. Do you want the mirror
range to be put below the canonical boundary? Bearing in mind your
wrapping consideration, just about _any_ VA would be unsuitable.

Jan
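
The mfn_to_page() comparison can be made concrete with a short sketch (a
simplification: real Xen indexes frame_table via the PDX machinery rather
than by raw MFN, and the structure layout below is invented):

#include <stdint.h>

struct page_info {
    uint64_t bookkeeping[4]; /* stand-in for the real structure */
};

static struct page_info *frame_table; /* a fixed VA in the hypervisor */

/*
 * Blind arithmetic, analogous to the map_domain_page() situation: nothing
 * here checks that mfn has a frame_table[] entry backing it, so an insane
 * MFN simply indexes past the end of the table.
 */
static struct page_info *mfn_to_page(uint64_t mfn)
{
    return frame_table + mfn;
}

int main(void)
{
    struct page_info table[1];

    frame_table = table;
    /* With a one-entry table, any mfn above 0 already points out of bounds. */
    return mfn_to_page(0) == &table[0] ? 0 : 1;
}

The disagreement in this sub-thread is over where that sanity checking is
supposed to live: in the callers and the boot-time culling of
unrepresentable ranges, or in the accessor itself.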
Andrew Cooper Feb. 24, 2021, 7:04 p.m. UTC | #12
On 23/02/2021 07:13, Jan Beulich wrote:
> On 22.02.2021 17:47, Andrew Cooper wrote:
>> On 22/02/2021 14:22, Jan Beulich wrote:
>>> On 22.02.2021 15:14, Andrew Cooper wrote:
>>>> On 22/02/2021 10:27, Jan Beulich wrote:
>>>>> Now that we guard the entire Xen VA space against speculative abuse
>>>>> through hypervisor accesses to guest memory, the argument translation
>>>>> area's VA also needs to live outside this range, at least for 32-bit PV
>>>>> guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
>>>>> uniformly.
>>>>>
>>>>> While this could be conditionalized upon CONFIG_PV32 &&
>>>>> CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
>>>>> keeps the code more legible imo.
>>>>>
>>>>> Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> --- a/xen/arch/x86/mm.c
>>>>> +++ b/xen/arch/x86/mm.c
>>>>> @@ -1727,6 +1727,11 @@ void init_xen_l4_slots(l4_pgentry_t *l4t
>>>>>                 (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
>>>>>                  l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
>>>>>      }
>>>>> +
>>>>> +    /* Slot 511: Per-domain mappings mirror. */
>>>>> +    if ( !is_pv_64bit_domain(d) )
>>>>> +        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
>>>>> +            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
>>>> This virtual address is inside the extended directmap.
>>> No. That one covers only the range excluding the last L4 slot.
>>>
>>>>   You're going to
>>>> need to rearrange more things than just this, to make it safe.
>>> I specifically picked that entry because I don't think further
>>> arrangements are needed.
>> map_domain_page() will blindly hand out this virtual address if an
>> appropriate mfn is passed, because there are no suitability checks.
>>
>> The error handling isn't great, but at least any attempt to use that
>> pointer would fault, which is now no longer the case.
>>
>> LA57 machines can have RAM or NVDIMMs in a range which will tickle this
>> bug.  In fact, they can have MFNs which would wrap around 0 into guest
>> space.
> This latter fact would be a far worse problem than accesses through
> the last L4 entry, populated or not. However, I don't really follow
> your concern: There are ample cases where functions assume to be
> passed sane arguments. A pretty good (imo) comparison here is with
> mfn_to_page(), which also will assume a sane MFN (i.e. one with a
> representable (in frame_table[]) value. If there was a bug, it
> would be either the caller taking an MFN out of thin air, or us
> introducing MFNs we can't cover in any of direct map, frame table,
> or M2P. But afaict there is guarding against the latter (look for
> the "Ignoring inaccessible memory range" loge messages in setup.c).

I'm not trying to say that this patch has introduced the bug.

But we should absolutely have defence in depth where appropriate.  I
don't mind if it is an unrelated change, but 4.15 does start trying to
introduce support for IceLake and I think this qualifies as a reasonable
precaution to add.

> In any event - imo any such bug would need fixing there, rather
> than being an argument against the change here.
>
> Also, besides your objection going quite a bit too far for my taste,
> I miss any form of an alternative suggestion. Do you want the mirror
> range to be put below the canonical boundary? Taking in mind your
> wrapping consideration, about _any_ VA would be unsuitable.

Honestly, I want the XLAT area to disappear entirely.  This is partly
PTSD from the acquire_resource fixes, but was an opinion held from
before that series as well, and I'm convinced that the result without
XLAT will be clearer and faster code.

Obviously, that's not an option at this point in 4.15.


I'd forgotten that the lower half of the address space was available,
and I do prefer that idea.  We don't need to pack everything
together in the upper half.

~Andrew

Patch

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1727,6 +1727,11 @@  void init_xen_l4_slots(l4_pgentry_t *l4t
                (ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
                 l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
     }
+
+    /* Slot 511: Per-domain mappings mirror. */
+    if ( !is_pv_64bit_domain(d) )
+        l4t[l4_table_offset(PERDOMAIN2_VIRT_START)] =
+            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
 }
 
 bool fill_ro_mpt(mfn_t mfn)
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -159,11 +159,11 @@  extern unsigned char boot_edid_info[128]
  *    1:1 direct mapping of all physical memory.
 #endif
  *  0xffff880000000000 - 0xffffffffffffffff [120TB,             PML4:272-511]
- *    PV: Guest-defined use.
+ *    PV (64-bit): Guest-defined use.
  *  0xffff880000000000 - 0xffffff7fffffffff [119.5TB,           PML4:272-510]
  *    HVM/idle: continuation of 1:1 mapping
  *  0xffffff8000000000 - 0xffffffffffffffff [512GB, 2^39 bytes  PML4:511]
- *    HVM/idle: unused
+ *    HVM / 32-bit PV: Secondary per-domain mappings.
  *
  * Compatibility guest area layout:
  *  0x0000000000000000 - 0x00000000f57fffff [3928MB,            PML4:0]
@@ -242,6 +242,9 @@  extern unsigned char boot_edid_info[128]
 #endif
 #define DIRECTMAP_VIRT_END      (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)
 
+/* Slot 511: secondary per-domain mappings (for compat xlat area accesses). */
+#define PERDOMAIN2_VIRT_START   (PML4_ADDR(511))
+
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_PV32
--- a/xen/include/asm-x86/x86_64/uaccess.h
+++ b/xen/include/asm-x86/x86_64/uaccess.h
@@ -1,7 +1,17 @@ 
 #ifndef __X86_64_UACCESS_H
 #define __X86_64_UACCESS_H
 
-#define COMPAT_ARG_XLAT_VIRT_BASE ((void *)ARG_XLAT_START(current))
+/*
+ * With CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS (apparent) PV guest accesses
+ * are prohibited to touch the Xen private VA range.  The compat argument
+ * translation area, therefore, can't live within this range.  Domains
+ * (potentially) in need of argument translation (32-bit PV, possibly HVM) get
+ * a secondary mapping installed, which needs to be used for such accesses in
+ * the PV case, and will also be used for HVM to avoid extra conditionals.
+ */
+#define COMPAT_ARG_XLAT_VIRT_BASE ((void *)ARG_XLAT_START(current) + \
+                                   (PERDOMAIN2_VIRT_START - \
+                                    PERDOMAIN_VIRT_START))
 #define COMPAT_ARG_XLAT_SIZE      (2*PAGE_SIZE)
 struct vcpu;
 int setup_compat_arg_xlat(struct vcpu *v);