diff mbox series

[v5,1/3] x86/iommu: remove regions not to be mapped

Message ID 20240124172953.29814-2-roger.pau@citrix.com (mailing list archive)
State New
Series x86/iommu: improve setup time of hwdom IOMMU

Commit Message

Roger Pau Monne Jan. 24, 2024, 5:29 p.m. UTC
Introduce the code to remove regions not to be mapped from the rangeset
that will be used to set up the IOMMU page tables for the hardware domain.

This change also introduces two new functions: remove_xen_ranges() and
vpci_subtract_mmcfg() that copy the logic in xen_in_range() and
vpci_is_mmcfg_address() respectively and remove the ranges that would otherwise
be intercepted by the original functions.

Note that the rangeset is still not populated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v4:
 - Fix off-by-one when removing the Xen used ranges, as the rangesets are
   inclusive.

Changes since v3:
 - Remove unnecessary line wrapping.

Changes since v1:
 - Split from bigger patch.
---
 xen/arch/x86/hvm/io.c               | 16 ++++++++
 xen/arch/x86/include/asm/hvm/io.h   |  3 ++
 xen/arch/x86/include/asm/setup.h    |  1 +
 xen/arch/x86/setup.c                | 48 +++++++++++++++++++++++
 xen/drivers/passthrough/x86/iommu.c | 61 +++++++++++++++++++++++++++++
 5 files changed, 129 insertions(+)

Comments

Jan Beulich Jan. 25, 2024, 8:34 a.m. UTC | #1
On 24.01.2024 18:29, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
>      return vpci_mmcfg_find(d, addr);
>  }
>  
> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
> +{
> +    const struct hvm_mmcfg *mmcfg;
> +
> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
> +    {
> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));

Along the lines of this, ...

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
>      return 0;
>  }
>  
> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
> +{
> +    paddr_t start, end;
> +    int rc;
> +
> +    /* S3 resume code (and other real mode trampoline code) */
> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);

... did you perhaps mean

                               PFN_DOWN(bootsym_phys(trampoline_end) - 1));

here (and then similarly below, except there the difference is benign I
think, for the labels being page-aligned)?

Jan
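
The difference between the two forms only shows up when the end address is not page aligned. A minimal standalone sketch of that point follows; PAGE_SHIFT and PFN_DOWN are redefined locally on the assumption of 4k pages rather than taken from Xen's headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration only (assumed 4k pages, not Xen's headers). */
#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/*
 * Last frame of the half-open byte range ending at 'end', computed as in
 * the v5 patch: the subtraction is applied to the frame number.
 */
static uint64_t last_pfn_frame_sub(uint64_t end)
{
    return PFN_DOWN(end) - 1;
}

/*
 * Last frame computed as suggested in review: the subtraction is applied
 * to the address before converting to a frame number.
 */
static uint64_t last_pfn_addr_sub(uint64_t end)
{
    assert(end > 0); /* an empty range ending at 0 has no last byte */
    return PFN_DOWN(end - 1);
}
```

For a page-aligned end such as 0x2000 both helpers return frame 1. For an unaligned end such as 0x2980 the first returns frame 1 and the second returns frame 2, so the first form would leave the partially used final frame in the rangeset.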
Roger Pau Monne Jan. 25, 2024, 8:47 a.m. UTC | #2
On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
> On 24.01.2024 18:29, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
> >      return vpci_mmcfg_find(d, addr);
> >  }
> >  
> > +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
> > +{
> > +    const struct hvm_mmcfg *mmcfg;
> > +
> > +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
> > +    {
> > +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
> > +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
> 
> Along the lines of this, ...
> 
> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
> >      return 0;
> >  }
> >  
> > +int __hwdom_init remove_xen_ranges(struct rangeset *r)
> > +{
> > +    paddr_t start, end;
> > +    int rc;
> > +
> > +    /* S3 resume code (and other real mode trampoline code) */
> > +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
> > +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
> 
> ... did you perhaps mean
> 
>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
> 
> here (and then similarly below, except there the difference is benign I
> think, for the labels being page-aligned)?

They are all page aligned, so I didn't care much, but now that you
point it out, it might be safer to do the subtraction from the address
instead of the frame number, just in case.

Thanks, Roger.
Jan Beulich Jan. 25, 2024, 11:13 a.m. UTC | #3
On 25.01.2024 09:47, Roger Pau Monné wrote:
> On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
>> On 24.01.2024 18:29, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/io.c
>>> +++ b/xen/arch/x86/hvm/io.c
>>> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
>>>      return vpci_mmcfg_find(d, addr);
>>>  }
>>>  
>>> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
>>> +{
>>> +    const struct hvm_mmcfg *mmcfg;
>>> +
>>> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
>>> +    {
>>> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
>>> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
>>
>> Along the lines of this, ...
>>
>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
>>>      return 0;
>>>  }
>>>  
>>> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
>>> +{
>>> +    paddr_t start, end;
>>> +    int rc;
>>> +
>>> +    /* S3 resume code (and other real mode trampoline code) */
>>> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
>>> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
>>
>> ... did you perhaps mean
>>
>>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
>>
>> here (and then similarly below, except there the difference is benign I
>> think, for the labels being page-aligned)?
> 
> They are all page aligned, so I didn't care much, but now that you
> point it out, it might be safer to do the subtraction from the address
> instead of the frame number, just in case.

Hmm, no, for me neither trampoline_end nor trampoline_start are page
aligned. While bootsym_phys(trampoline_start) is, I don't think
bootsym_phys(trampoline_end) normally would be (it might only be by
coincidence).

Jan
Roger Pau Monne Jan. 25, 2024, 12:37 p.m. UTC | #4
On Thu, Jan 25, 2024 at 12:13:01PM +0100, Jan Beulich wrote:
> On 25.01.2024 09:47, Roger Pau Monné wrote:
> > On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
> >> On 24.01.2024 18:29, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/hvm/io.c
> >>> +++ b/xen/arch/x86/hvm/io.c
> >>> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
> >>>      return vpci_mmcfg_find(d, addr);
> >>>  }
> >>>  
> >>> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
> >>> +{
> >>> +    const struct hvm_mmcfg *mmcfg;
> >>> +
> >>> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
> >>> +    {
> >>> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
> >>> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
> >>
> >> Along the lines of this, ...
> >>
> >>> --- a/xen/arch/x86/setup.c
> >>> +++ b/xen/arch/x86/setup.c
> >>> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
> >>>      return 0;
> >>>  }
> >>>  
> >>> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
> >>> +{
> >>> +    paddr_t start, end;
> >>> +    int rc;
> >>> +
> >>> +    /* S3 resume code (and other real mode trampoline code) */
> >>> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
> >>> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
> >>
> >> ... did you perhaps mean
> >>
> >>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
> >>
> >> here (and then similarly below, except there the difference is benign I
> >> think, for the labels being page-aligned)?
> > 
> > They are all page aligned, so I didn't care much, but now that you
> > point it out, it might be safer to do the subtraction from the address
> > instead of the frame number, just in case.
> 
> Hmm, no, for me neither trampoline_end nor trampoline_start are page
> aligned. While bootsym_phys(trampoline_start) is, I don't think
> bootsym_phys(trampoline_end) normally would be (it might only be by
> coincidence).

Oh, so it had been a coincidence of the build I was using I guess then.

Thanks, Roger.
Andrew Cooper Jan. 25, 2024, 12:55 p.m. UTC | #5
On 25/01/2024 12:37 pm, Roger Pau Monné wrote:
> On Thu, Jan 25, 2024 at 12:13:01PM +0100, Jan Beulich wrote:
>> On 25.01.2024 09:47, Roger Pau Monné wrote:
>>> On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
>>>> On 24.01.2024 18:29, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/hvm/io.c
>>>>> +++ b/xen/arch/x86/hvm/io.c
>>>>> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
>>>>>      return vpci_mmcfg_find(d, addr);
>>>>>  }
>>>>>  
>>>>> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
>>>>> +{
>>>>> +    const struct hvm_mmcfg *mmcfg;
>>>>> +
>>>>> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
>>>>> +    {
>>>>> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
>>>>> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
>>>> Along the lines of this, ...
>>>>
>>>>> --- a/xen/arch/x86/setup.c
>>>>> +++ b/xen/arch/x86/setup.c
>>>>> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
>>>>>      return 0;
>>>>>  }
>>>>>  
>>>>> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
>>>>> +{
>>>>> +    paddr_t start, end;
>>>>> +    int rc;
>>>>> +
>>>>> +    /* S3 resume code (and other real mode trampoline code) */
>>>>> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
>>>>> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
>>>> ... did you perhaps mean
>>>>
>>>>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
>>>>
>>>> here (and then similarly below, except there the difference is benign I
>>>> think, for the labels being page-aligned)?
>>> They are all page aligned, so I didn't care much, but now that you
>>> point it out, it might be safer to do the subtraction from the address
>>> instead of the frame number, just in case.
>> Hmm, no, for me neither trampoline_end nor trampoline_start are page
>> aligned. While bootsym_phys(trampoline_start) is, I don't think
>> bootsym_phys(trampoline_end) normally would be (it might only be by
>> coincidence).
> Oh, so it had been a coincidence of the build I was using I guess then.

trampoline_start has to be page aligned because of constraints from SIPI
and S3 (can't remember which one is the 4k constraint, but it's in the
comments).

On APs (and indeed, in Xen's pagetables), the trampoline is only a
single 4k page.

However, trampoline_end is quite a lot longer because there's various
things that get done on the BSP only, including recovering the E820 map,
EDID/etc in 16bit mode.

That said, we don't edit the trampoline very often, so if it happened to
work for you first time around, it probably hasn't changed since.

~Andrew
Jan Beulich Jan. 25, 2024, 1:13 p.m. UTC | #6
On 25.01.2024 13:55, Andrew Cooper wrote:
> On 25/01/2024 12:37 pm, Roger Pau Monné wrote:
>> On Thu, Jan 25, 2024 at 12:13:01PM +0100, Jan Beulich wrote:
>>> On 25.01.2024 09:47, Roger Pau Monné wrote:
>>>> On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
>>>>> On 24.01.2024 18:29, Roger Pau Monne wrote:
>>>>>> --- a/xen/arch/x86/hvm/io.c
>>>>>> +++ b/xen/arch/x86/hvm/io.c
>>>>>> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
>>>>>>      return vpci_mmcfg_find(d, addr);
>>>>>>  }
>>>>>>  
>>>>>> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
>>>>>> +{
>>>>>> +    const struct hvm_mmcfg *mmcfg;
>>>>>> +
>>>>>> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
>>>>>> +    {
>>>>>> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
>>>>>> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
>>>>> Along the lines of this, ...
>>>>>
>>>>>> --- a/xen/arch/x86/setup.c
>>>>>> +++ b/xen/arch/x86/setup.c
>>>>>> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
>>>>>>      return 0;
>>>>>>  }
>>>>>>  
>>>>>> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
>>>>>> +{
>>>>>> +    paddr_t start, end;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    /* S3 resume code (and other real mode trampoline code) */
>>>>>> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
>>>>>> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
>>>>> ... did you perhaps mean
>>>>>
>>>>>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
>>>>>
>>>>> here (and then similarly below, except there the difference is benign I
>>>>> think, for the labels being page-aligned)?
>>>> They are all page aligned, so I didn't care much, but now that you
>>>> point it out, it might be safer to do the subtraction from the address
>>>> instead of the frame number, just in case.
>>> Hmm, no, for me neither trampoline_end nor trampoline_start are page
>>> aligned. While bootsym_phys(trampoline_start) is, I don't think
>>> bootsym_phys(trampoline_end) normally would be (it might only be by
>>> coincidence).
>> Oh, so it had been a coincidence of the build I was using I guess then.
> 
> trampoline_start has to be page aligned because of constraints from SIPI
> and S3 (can't remember which one is the 4k constraint, but it's in the
> comments).

What you're talking about is the copy of the trampoline code/data in
low memory. trampoline_{start,end} themselves point into the Xen image.

> On APs (and indeed, in Xen's pagetables), the trampoline is only a
> single 4k page.
> 
> However, trampoline_end is quite a lot longer because there's various
> things that get done on the BSP only, including recovering the E820 map,
> EDID/etc in 16bit mode.

And this BSP-only part really wouldn't need removing here, I think.
The issue is that the BSP-only and also-AP plus S3-wakeup parts aren't
properly delimited (hmm, maybe wakeup_stack can be used for this
purpose). But if, as you say, we map only a single page, we could as
well limit logic here to just that.

Jan

> That said, we don't edit the trampoline very often, so if it happened to
> work for you first time around, it probably hasn't changed since.
> 
> ~Andrew
Andrew Cooper Jan. 25, 2024, 1:22 p.m. UTC | #7
On 25/01/2024 1:13 pm, Jan Beulich wrote:
> On 25.01.2024 13:55, Andrew Cooper wrote:
>> On 25/01/2024 12:37 pm, Roger Pau Monné wrote:
>>> On Thu, Jan 25, 2024 at 12:13:01PM +0100, Jan Beulich wrote:
>>>> On 25.01.2024 09:47, Roger Pau Monné wrote:
>>>>> On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
>>>>>> On 24.01.2024 18:29, Roger Pau Monne wrote:
>>>>>>> --- a/xen/arch/x86/hvm/io.c
>>>>>>> +++ b/xen/arch/x86/hvm/io.c
>>>>>>> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
>>>>>>>      return vpci_mmcfg_find(d, addr);
>>>>>>>  }
>>>>>>>  
>>>>>>> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
>>>>>>> +{
>>>>>>> +    const struct hvm_mmcfg *mmcfg;
>>>>>>> +
>>>>>>> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
>>>>>>> +    {
>>>>>>> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
>>>>>>> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
>>>>>> Along the lines of this, ...
>>>>>>
>>>>>>> --- a/xen/arch/x86/setup.c
>>>>>>> +++ b/xen/arch/x86/setup.c
>>>>>>> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
>>>>>>>      return 0;
>>>>>>>  }
>>>>>>>  
>>>>>>> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
>>>>>>> +{
>>>>>>> +    paddr_t start, end;
>>>>>>> +    int rc;
>>>>>>> +
>>>>>>> +    /* S3 resume code (and other real mode trampoline code) */
>>>>>>> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
>>>>>>> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
>>>>>> ... did you perhaps mean
>>>>>>
>>>>>>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
>>>>>>
>>>>>> here (and then similarly below, except there the difference is benign I
>>>>>> think, for the labels being page-aligned)?
>>>>> They are all page aligned, so I didn't care much, but now that you
>>>>> point it out, it might be safer to do the subtraction from the address
>>>>> instead of the frame number, just in case.
>>>> Hmm, no, for me neither trampoline_end nor trampoline_start are page
>>>> aligned. While bootsym_phys(trampoline_start) is, I don't think
>>>> bootsym_phys(trampoline_end) normally would be (it might only be by
>>>> coincidence).
>>> Oh, so it had been a coincidence of the build I was using I guess then.
>> trampoline_start has to be page aligned because of constraints from SIPI
>> and S3 (can't remember which one is the 4k constraint, but it's in the
>> comments).
> What you're talking about is the copy of the trampoline code/data in
> low memory. trampoline_{start,end} themselves point into the Xen image.

True, but we're operating on bootsym_phys(trampoline_start) which had
better be aligned.

We hard-code (by virtue of only filling in 1 single 4k PTE in the
pagetables) that the AP trampoline is 4k.

The range here should be 4k only too, or we're (falsely) marking lowmem
adjacent to the AP trampoline as a Xen range when it's not.

~Andrew
Roger Pau Monne Jan. 25, 2024, 2:37 p.m. UTC | #8
On Thu, Jan 25, 2024 at 01:22:15PM +0000, Andrew Cooper wrote:
> On 25/01/2024 1:13 pm, Jan Beulich wrote:
> > On 25.01.2024 13:55, Andrew Cooper wrote:
> >> On 25/01/2024 12:37 pm, Roger Pau Monné wrote:
> >>> On Thu, Jan 25, 2024 at 12:13:01PM +0100, Jan Beulich wrote:
> >>>> On 25.01.2024 09:47, Roger Pau Monné wrote:
> >>>>> On Thu, Jan 25, 2024 at 09:34:40AM +0100, Jan Beulich wrote:
> >>>>>> On 24.01.2024 18:29, Roger Pau Monne wrote:
> >>>>>>> --- a/xen/arch/x86/hvm/io.c
> >>>>>>> +++ b/xen/arch/x86/hvm/io.c
> >>>>>>> @@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
> >>>>>>>      return vpci_mmcfg_find(d, addr);
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
> >>>>>>> +{
> >>>>>>> +    const struct hvm_mmcfg *mmcfg;
> >>>>>>> +
> >>>>>>> +    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
> >>>>>>> +    {
> >>>>>>> +        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
> >>>>>>> +                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
> >>>>>> Along the lines of this, ...
> >>>>>>
> >>>>>>> --- a/xen/arch/x86/setup.c
> >>>>>>> +++ b/xen/arch/x86/setup.c
> >>>>>>> @@ -2138,6 +2138,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
> >>>>>>>      return 0;
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +int __hwdom_init remove_xen_ranges(struct rangeset *r)
> >>>>>>> +{
> >>>>>>> +    paddr_t start, end;
> >>>>>>> +    int rc;
> >>>>>>> +
> >>>>>>> +    /* S3 resume code (and other real mode trampoline code) */
> >>>>>>> +    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
> >>>>>>> +                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
> >>>>>> ... did you perhaps mean
> >>>>>>
> >>>>>>                                PFN_DOWN(bootsym_phys(trampoline_end) - 1));
> >>>>>>
> >>>>>> here (and then similarly below, except there the difference is benign I
> >>>>>> think, for the labels being page-aligned)?
> >>>>> They are all page aligned, so I didn't care much, but now that you
> >>>>> point it out, it might be safer to do the subtraction from the address
> >>>>> instead of the frame number, just in case.
> >>>> Hmm, no, for me neither trampoline_end nor trampoline_start are page
> >>>> aligned. While bootsym_phys(trampoline_start) is, I don't think
> >>>> bootsym_phys(trampoline_end) normally would be (it might only be by
> >>>> coincidence).
> >>> Oh, so it had been a coincidence of the build I was using I guess then.
> >> trampoline_start has to be page aligned because of constraints from SIPI
> >> and S3 (can't remember which one is the 4k constraint, but it's in the
> >> comments).
> > What you're talking about is the copy of the trampoline code/data in
> > low memory. trampoline_{start,end} themselves point into the Xen image.
> 
> True, but we're operating on bootsym_phys(trampoline_start) which had
> better be aligned.
> 
> We hard-code (by virtue of only filling in 1 single 4k PTE in the
> pagetables) that the AP trampoline is 4k.
> 
> The range here should be 4k only too, or we're (falsely) marking lowmem
> adjacent to the AP trampoline as a Xen range when it's not.

Hm, looking at zap_low_mappings() we do seem to possibly map more than
one page, in fact on my current build trampoline_end -
trampoline_start is 6528.

Thanks, Roger.
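
The 6528-byte figure can be worked through as follows. This is a rough sketch under stated assumptions: 4k pages, a page-aligned low-memory start, locally redefined PAGE_SHIFT and PFN_DOWN, and hypothetical helper names. None of it is taken from Xen itself.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration only (assumed 4k pages, not Xen's headers). */
#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Number of 4k frames touched by the byte range [start, start + size). */
static uint64_t frames_spanned(uint64_t start, uint64_t size)
{
    assert(size > 0); /* an empty range has no last byte */
    return PFN_DOWN(start + size - 1) - PFN_DOWN(start) + 1;
}

/*
 * Number of frames in the inclusive range the v5 patch would pass to
 * rangeset_remove_range(): first = PFN_DOWN(start),
 * last = PFN_DOWN(start + size) - 1.
 */
static uint64_t frames_removed_v5(uint64_t start, uint64_t size)
{
    return (PFN_DOWN(start + size) - 1) - PFN_DOWN(start) + 1;
}
```

With a hypothetical page-aligned start of 0x98000 and a size of 6528 bytes (0x1980), the trampoline touches two frames, while the v5 form of the call would remove only one. Whether the second frame ought to be removed at all is the question left open in the discussion above.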
Patch

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index d75af83ad01f..a42854c52b65 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -369,6 +369,22 @@  bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
     return vpci_mmcfg_find(d, addr);
 }
 
+int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
+{
+    const struct hvm_mmcfg *mmcfg;
+
+    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
+    {
+        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
+                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
+
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 static unsigned int vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
                                            paddr_t addr, pci_sbdf_t *sbdf)
 {
diff --git a/xen/arch/x86/include/asm/hvm/io.h b/xen/arch/x86/include/asm/hvm/io.h
index a97731657801..e1e5e6fe7491 100644
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -156,6 +156,9 @@  void destroy_vpci_mmcfg(struct domain *d);
 /* Check if an address is between a MMCFG region for a domain. */
 bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr);
 
+/* Remove MMCFG regions from a given rangeset. */
+int vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index 9a460e4db8f4..cd07d98101d8 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -37,6 +37,7 @@  void discard_initial_images(void);
 void *bootstrap_map(const module_t *mod);
 
 int xen_in_range(unsigned long mfn);
+int remove_xen_ranges(struct rangeset *r);
 
 extern uint8_t kbd_shift_flags;
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 897b7e92082e..c9f65c3a70b8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2138,6 +2138,54 @@  int __hwdom_init xen_in_range(unsigned long mfn)
     return 0;
 }
 
+int __hwdom_init remove_xen_ranges(struct rangeset *r)
+{
+    paddr_t start, end;
+    int rc;
+
+    /* S3 resume code (and other real mode trampoline code) */
+    rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
+                               PFN_DOWN(bootsym_phys(trampoline_end)) - 1);
+    if ( rc )
+        return rc;
+
+    /*
+     * This needs to remain in sync with the uses of the same symbols in
+     * - __start_xen()
+     * - is_xen_fixed_mfn()
+     * - tboot_shutdown()
+     */
+    /* hypervisor .text + .rodata */
+    rc = rangeset_remove_range(r, PFN_DOWN(__pa(&_stext)),
+                               PFN_DOWN(__pa(&__2M_rodata_end)) - 1);
+    if ( rc )
+        return rc;
+
+    /* hypervisor .data + .bss */
+    if ( efi_boot_mem_unused(&start, &end) )
+    {
+        ASSERT(__pa(start) >= __pa(&__2M_rwdata_start));
+        rc = rangeset_remove_range(r, PFN_DOWN(__pa(&__2M_rwdata_start)),
+                                   PFN_DOWN(__pa(start)) - 1);
+        if ( rc )
+            return rc;
+        ASSERT(__pa(end) <= __pa(&__2M_rwdata_end));
+        rc = rangeset_remove_range(r, PFN_DOWN(__pa(end)),
+                                   PFN_DOWN(__pa(&__2M_rwdata_end)) - 1);
+        if ( rc )
+            return rc;
+    }
+    else
+    {
+        rc = rangeset_remove_range(r, PFN_DOWN(__pa(&__2M_rwdata_start)),
+                                   PFN_DOWN(__pa(&__2M_rwdata_end)) - 1);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 static int __hwdom_init cf_check io_bitmap_cb(
     unsigned long s, unsigned long e, void *ctx)
 {
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 59b0c7e980ca..fc5215a9dc40 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -370,6 +370,14 @@  static unsigned int __hwdom_init hwdom_iommu_map(const struct domain *d,
     return perms;
 }
 
+static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
+                                              void *data)
+{
+    struct rangeset *map = data;
+
+    return rangeset_remove_range(map, s, e);
+}
+
 struct map_data {
     struct domain *d;
     unsigned int flush_flags;
@@ -533,6 +541,59 @@  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
             goto commit;
     }
 
+    /* Remove any areas in-use by Xen. */
+    rc = remove_xen_ranges(map);
+    if ( rc )
+        panic("IOMMU failed to remove Xen ranges: %d\n", rc);
+
+    /* Remove any overlap with the Interrupt Address Range. */
+    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
+    if ( rc )
+        panic("IOMMU failed to remove Interrupt Address Range: %d\n", rc);
+
+    /* If emulating IO-APIC(s) make sure the base address is unmapped. */
+    if ( has_vioapic(d) )
+    {
+        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
+        {
+            rc = rangeset_remove_singleton(map,
+                PFN_DOWN(domain_vioapic(d, i)->base_address));
+            if ( rc )
+                panic("IOMMU failed to remove IO-APIC: %d\n", rc);
+        }
+    }
+
+    if ( is_pv_domain(d) )
+    {
+        /*
+         * Be consistent with CPU mappings: Dom0 is permitted to establish r/o
+         * ones there (also for e.g. HPET in certain cases), so it should also
+         * have such established for IOMMUs.  Remove any read-only ranges here,
+         * since ranges in mmio_ro_ranges are already explicitly mapped below
+         * in read-only mode.
+         */
+        rc = rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL, map_subtract, map);
+        if ( rc )
+            panic("IOMMU failed to remove read-only regions: %d\n", rc);
+    }
+
+    if ( has_vpci(d) )
+    {
+        /*
+         * TODO: runtime added MMCFG regions are not checked to make sure they
+         * don't overlap with already mapped regions, thus preventing trapping.
+         */
+        rc = vpci_subtract_mmcfg(d, map);
+        if ( rc )
+            panic("IOMMU unable to remove MMCFG areas: %d\n", rc);
+    }
+
+    /* Remove any regions past the last address addressable by the domain. */
+    rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)),
+                               ~0UL);
+    if ( rc )
+        panic("IOMMU unable to remove unaddressable ranges: %d\n", rc);
+
     if ( iommu_verbose )
         printk(XENLOG_INFO "%pd: identity mappings for IOMMU:\n", d);