Message ID | 20250214092928.28932-3-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/pvh: workaround missing MMIO regions in dom0 p2m | expand |
On 14.02.2025 10:29, Roger Pau Monne wrote: > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. > > ### dom0 > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, > - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) > + cpuid-faulting=<bool>, msr-relaxed=<bool>, > + pf-fixup=<bool> ] (x86) > > = List of [ sve=<integer> ] (Arm64) > > @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems. > > If using this option is necessary to fix an issue, please report a bug. > > +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and > + defaults to false. > + > + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO > + regions into the p2m, such mode relies on Xen dom0 builder populating > + the p2m with all MMIO regions that dom0 should access. However Xen > + doesn't have a complete picture of the host memory map, due to not > + being able to process ACPI dynamic tables. > + > + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions > + to the p2m in response to page-faults generated by dom0 trying to access > + unpopulated entries in the p2m. I wonder if this is to implementation focused for a command line option doc. In particular the multiple uses of "p2m" are standing out in this regard. > --- a/xen/arch/x86/dom0_build.c > +++ b/xen/arch/x86/dom0_build.c > @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const char *e) > opt_dom0_cpuid_faulting = val; > else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) > opt_dom0_msr_relaxed = val; > +#ifdef CONFIG_HVM > + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) > + opt_dom0_pf_fixup = val; > +#endif > else > return -EINVAL; I fear the scope of these sub-options is getting increasingly confusing. opt_dom0_msr_relaxed is what its name says - specific to Dom0. opt_dom0_cpuid_faulting, otoh, is a control domain option (i.e. also applicable to a [hypothetical?] late ctrldom). Now you add an option that's applicable to the hardware domain, i.e. also coverting late-hwdom. > --- a/xen/arch/x86/hvm/emulate.c > +++ b/xen/arch/x86/hvm/emulate.c > @@ -338,8 +338,38 @@ static int hvmemul_do_io( > if ( !s ) > { > if ( is_mmio && is_hardware_domain(currd) ) > - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", > - dir ? "read" : "write", addr, size); > + { > + /* > + * PVH dom0 is likely missing MMIO mappings on the p2m, due to > + * the incomplete information Xen has about the memory layout. > + * > + * Either print a message to note dom0 attempted to access an > + * unpopulated GPA, or try to fixup the p2m by creating an > + * identity mapping for the faulting GPA. > + */ > + if ( opt_dom0_pf_fixup ) > + { > + int inner_rc = hvm_hwdom_fixup_p2m(addr); Why not use rc, as we do elsewhere in the function? > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -13,6 +13,7 @@ > #include <xen/lib.h> > #include <xen/trace.h> > #include <xen/sched.h> > +#include <xen/iocap.h> > #include <xen/irq.h> > #include <xen/softirq.h> > #include <xen/domain.h> > @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) > return rc; > } > > +bool __ro_after_init opt_dom0_pf_fixup; > +int hvm_hwdom_fixup_p2m(paddr_t addr) The placement here looks odd to me. Why not as static function in emulate.c? Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c? > +{ > + unsigned long gfn = paddr_to_pfn(addr); > + struct domain *currd = current->domain; > + p2m_type_t type; > + mfn_t mfn; > + int rc; > + > + ASSERT(is_hardware_domain(currd)); > + ASSERT(!altp2m_active(currd)); > + > + /* > + * Fixups are only applied for MMIO holes, and rely on the hardware domain > + * having identity mappings for non RAM regions (gfn == mfn). > + */ > + if ( !iomem_access_permitted(currd, gfn, gfn) || > + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > + return -EPERM; > + > + mfn = get_gfn(currd, gfn, &type); > + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; I understand this is to cover the case where two vCPU-s access the same GFN at about the same time. However, the "success" log message at the call site being debug-only means we may be silently hiding bugs in release builds, if e.g. we get here despite the GFN having had an identity mapping already for ages. Jan
On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > On 14.02.2025 10:29, Roger Pau Monne wrote: > > --- a/docs/misc/xen-command-line.pandoc > > +++ b/docs/misc/xen-command-line.pandoc > > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. > > > > ### dom0 > > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, > > - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) > > + cpuid-faulting=<bool>, msr-relaxed=<bool>, > > + pf-fixup=<bool> ] (x86) > > > > = List of [ sve=<integer> ] (Arm64) > > > > @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems. > > > > If using this option is necessary to fix an issue, please report a bug. > > > > +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and > > + defaults to false. > > + > > + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO > > + regions into the p2m, such mode relies on Xen dom0 builder populating > > + the p2m with all MMIO regions that dom0 should access. However Xen > > + doesn't have a complete picture of the host memory map, due to not > > + being able to process ACPI dynamic tables. > > + > > + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions > > + to the p2m in response to page-faults generated by dom0 trying to access > > + unpopulated entries in the p2m. > > I wonder if this is to implementation focused for a command line option doc. > In particular the multiple uses of "p2m" are standing out in this regard. Hm, let me try to change p2m with 'dom0 physical memory map' or similar. > > --- a/xen/arch/x86/dom0_build.c > > +++ b/xen/arch/x86/dom0_build.c > > @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const char *e) > > opt_dom0_cpuid_faulting = val; > > else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) > > opt_dom0_msr_relaxed = val; > > +#ifdef CONFIG_HVM > > + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) > > + opt_dom0_pf_fixup = val; > > +#endif > > else > > return -EINVAL; > > I fear the scope of these sub-options is getting increasingly confusing. > opt_dom0_msr_relaxed is what its name says - specific to Dom0. > opt_dom0_cpuid_faulting, otoh, is a control domain option (i.e. also > applicable to a [hypothetical?] late ctrldom). Now you add an option > that's applicable to the hardware domain, i.e. also coverting late-hwdom. It's kind of a mixed bag, but ATM I would leave it as-is because it's likely easier for users to find the options if they are grouped together. WE can always add more fine grained options if there's a desired for them. > > --- a/xen/arch/x86/hvm/emulate.c > > +++ b/xen/arch/x86/hvm/emulate.c > > @@ -338,8 +338,38 @@ static int hvmemul_do_io( > > if ( !s ) > > { > > if ( is_mmio && is_hardware_domain(currd) ) > > - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", > > - dir ? "read" : "write", addr, size); > > + { > > + /* > > + * PVH dom0 is likely missing MMIO mappings on the p2m, due to > > + * the incomplete information Xen has about the memory layout. > > + * > > + * Either print a message to note dom0 attempted to access an > > + * unpopulated GPA, or try to fixup the p2m by creating an > > + * identity mapping for the faulting GPA. > > + */ > > + if ( opt_dom0_pf_fixup ) > > + { > > + int inner_rc = hvm_hwdom_fixup_p2m(addr); > > Why not use rc, as we do elsewhere in the function? hvm_hwdom_fixup_p2m() returns an errno, while rc in this context contains X86EMUL_ values. I could indeed re-use rc, it just felt wrong to mix different error address spaces on the same variable. > > --- a/xen/arch/x86/hvm/hvm.c > > +++ b/xen/arch/x86/hvm/hvm.c > > @@ -13,6 +13,7 @@ > > #include <xen/lib.h> > > #include <xen/trace.h> > > #include <xen/sched.h> > > +#include <xen/iocap.h> > > #include <xen/irq.h> > > #include <xen/softirq.h> > > #include <xen/domain.h> > > @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) > > return rc; > > } > > > > +bool __ro_after_init opt_dom0_pf_fixup; > > +int hvm_hwdom_fixup_p2m(paddr_t addr) > > The placement here looks odd to me. Why not as static function in emulate.c? > Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c? I don't have a strong opinion, if you are fine with it a static function in emulate.c might be the best then. > > +{ > > + unsigned long gfn = paddr_to_pfn(addr); > > + struct domain *currd = current->domain; > > + p2m_type_t type; > > + mfn_t mfn; > > + int rc; > > + > > + ASSERT(is_hardware_domain(currd)); > > + ASSERT(!altp2m_active(currd)); > > + > > + /* > > + * Fixups are only applied for MMIO holes, and rely on the hardware domain > > + * having identity mappings for non RAM regions (gfn == mfn). > > + */ > > + if ( !iomem_access_permitted(currd, gfn, gfn) || > > + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > > + return -EPERM; > > + > > + mfn = get_gfn(currd, gfn, &type); > > + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > > + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > > I understand this is to cover the case where two vCPU-s access the same GFN > at about the same time. However, the "success" log message at the call site > being debug-only means we may be silently hiding bugs in release builds, if > e.g. we get here despite the GFN having had an identity mapping already for > ages. Possibly, but what would be your suggestion to fix this? I will think about it, but I can't immediately see a solution that's not simply to make the message printed by the caller to be gprintk() instead of gdprintk() so catch such bugs. Would you agree to that? Thanks, Roger.
On 14.02.2025 13:38, Roger Pau Monné wrote: > On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >> On 14.02.2025 10:29, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/hvm/emulate.c >>> +++ b/xen/arch/x86/hvm/emulate.c >>> @@ -338,8 +338,38 @@ static int hvmemul_do_io( >>> if ( !s ) >>> { >>> if ( is_mmio && is_hardware_domain(currd) ) >>> - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", >>> - dir ? "read" : "write", addr, size); >>> + { >>> + /* >>> + * PVH dom0 is likely missing MMIO mappings on the p2m, due to >>> + * the incomplete information Xen has about the memory layout. >>> + * >>> + * Either print a message to note dom0 attempted to access an >>> + * unpopulated GPA, or try to fixup the p2m by creating an >>> + * identity mapping for the faulting GPA. >>> + */ >>> + if ( opt_dom0_pf_fixup ) >>> + { >>> + int inner_rc = hvm_hwdom_fixup_p2m(addr); >> >> Why not use rc, as we do elsewhere in the function? > > hvm_hwdom_fixup_p2m() returns an errno, while rc in this context > contains X86EMUL_ values. I could indeed re-use rc, it just felt > wrong to mix different error address spaces on the same variable. Hmm, yes, I see. >>> --- a/xen/arch/x86/hvm/hvm.c >>> +++ b/xen/arch/x86/hvm/hvm.c >>> @@ -13,6 +13,7 @@ >>> #include <xen/lib.h> >>> #include <xen/trace.h> >>> #include <xen/sched.h> >>> +#include <xen/iocap.h> >>> #include <xen/irq.h> >>> #include <xen/softirq.h> >>> #include <xen/domain.h> >>> @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) >>> return rc; >>> } >>> >>> +bool __ro_after_init opt_dom0_pf_fixup; >>> +int hvm_hwdom_fixup_p2m(paddr_t addr) >> >> The placement here looks odd to me. Why not as static function in emulate.c? >> Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c? > > I don't have a strong opinion, if you are fine with it a static > function in emulate.c might be the best then. I'd be fine with either of the suggested options. mm/p2m.c is perhaps the more logical home for such a function, yet the option of having it static is quite appealing, too. Hence why I came to think of that one first. >>> +{ >>> + unsigned long gfn = paddr_to_pfn(addr); >>> + struct domain *currd = current->domain; >>> + p2m_type_t type; >>> + mfn_t mfn; >>> + int rc; >>> + >>> + ASSERT(is_hardware_domain(currd)); >>> + ASSERT(!altp2m_active(currd)); >>> + >>> + /* >>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>> + * having identity mappings for non RAM regions (gfn == mfn). >>> + */ >>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>> + return -EPERM; >>> + >>> + mfn = get_gfn(currd, gfn, &type); >>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >> >> I understand this is to cover the case where two vCPU-s access the same GFN >> at about the same time. However, the "success" log message at the call site >> being debug-only means we may be silently hiding bugs in release builds, if >> e.g. we get here despite the GFN having had an identity mapping already for >> ages. > > Possibly, but what would be your suggestion to fix this? I will think > about it, but I can't immediately see a solution that's not simply to > make the message printed by the caller to be gprintk() instead of > gdprintk() so catch such bugs. Would you agree to that? My thinking was that it might be best to propagate a distinguishable error code (perhaps -EEXIST, with its present use then replaced) out of the function, and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a comment explaining things a little. Jan
On 14/02/2025 9:29 am, Roger Pau Monne wrote: > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc > index 9bbd00baef91..e9884de07e9e 100644 > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. > > ### dom0 > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, > - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) > + cpuid-faulting=<bool>, msr-relaxed=<bool>, > + pf-fixup=<bool> ] (x86) > > = List of [ sve=<integer> ] (Arm64) > Looking at this, we need to make dom0= disjoint between architectures like we do for other options. I'll do a patch. ~Andrew
On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: > On 14.02.2025 13:38, Roger Pau Monné wrote: > > On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > >> On 14.02.2025 10:29, Roger Pau Monne wrote: > >>> +{ > >>> + unsigned long gfn = paddr_to_pfn(addr); > >>> + struct domain *currd = current->domain; > >>> + p2m_type_t type; > >>> + mfn_t mfn; > >>> + int rc; > >>> + > >>> + ASSERT(is_hardware_domain(currd)); > >>> + ASSERT(!altp2m_active(currd)); > >>> + > >>> + /* > >>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain > >>> + * having identity mappings for non RAM regions (gfn == mfn). > >>> + */ > >>> + if ( !iomem_access_permitted(currd, gfn, gfn) || > >>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > >>> + return -EPERM; > >>> + > >>> + mfn = get_gfn(currd, gfn, &type); > >>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > >>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > >> > >> I understand this is to cover the case where two vCPU-s access the same GFN > >> at about the same time. However, the "success" log message at the call site > >> being debug-only means we may be silently hiding bugs in release builds, if > >> e.g. we get here despite the GFN having had an identity mapping already for > >> ages. > > > > Possibly, but what would be your suggestion to fix this? I will think > > about it, but I can't immediately see a solution that's not simply to > > make the message printed by the caller to be gprintk() instead of > > gdprintk() so catch such bugs. Would you agree to that? > > My thinking was that it might be best to propagate a distinguishable error > code (perhaps -EEXIST, with its present use then replaced) out of the function, > and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a > comment explaining things a little. I think it would be easier if I just made those gprintk() instead of gdprintk(), all with severity XENLOG_DEBUG except for the one that reports the failure of the fixup function that is XENLOG_WARNING. Would you be OK with that? Thanks, Roger.
On 17.02.2025 09:25, Roger Pau Monné wrote: > On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: >> On 14.02.2025 13:38, Roger Pau Monné wrote: >>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >>>> On 14.02.2025 10:29, Roger Pau Monne wrote: >>>>> +{ >>>>> + unsigned long gfn = paddr_to_pfn(addr); >>>>> + struct domain *currd = current->domain; >>>>> + p2m_type_t type; >>>>> + mfn_t mfn; >>>>> + int rc; >>>>> + >>>>> + ASSERT(is_hardware_domain(currd)); >>>>> + ASSERT(!altp2m_active(currd)); >>>>> + >>>>> + /* >>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>>>> + * having identity mappings for non RAM regions (gfn == mfn). >>>>> + */ >>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>>>> + return -EPERM; >>>>> + >>>>> + mfn = get_gfn(currd, gfn, &type); >>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >>>> >>>> I understand this is to cover the case where two vCPU-s access the same GFN >>>> at about the same time. However, the "success" log message at the call site >>>> being debug-only means we may be silently hiding bugs in release builds, if >>>> e.g. we get here despite the GFN having had an identity mapping already for >>>> ages. >>> >>> Possibly, but what would be your suggestion to fix this? I will think >>> about it, but I can't immediately see a solution that's not simply to >>> make the message printed by the caller to be gprintk() instead of >>> gdprintk() so catch such bugs. Would you agree to that? >> >> My thinking was that it might be best to propagate a distinguishable error >> code (perhaps -EEXIST, with its present use then replaced) out of the function, >> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a >> comment explaining things a little. > > I think it would be easier if I just made those gprintk() instead of > gdprintk(), all with severity XENLOG_DEBUG except for the one that > reports the failure of the fixup function that is XENLOG_WARNING. > Would you be OK with that? Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by default, I think it wouldn't be nice if many of them might appear in release builds with guest_loglevel=all. What I find difficult is to predict how high the chances are to see any of them (and then possibly multiple times). Jan
On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: > On 17.02.2025 09:25, Roger Pau Monné wrote: > > On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: > >> On 14.02.2025 13:38, Roger Pau Monné wrote: > >>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > >>>> On 14.02.2025 10:29, Roger Pau Monne wrote: > >>>>> +{ > >>>>> + unsigned long gfn = paddr_to_pfn(addr); > >>>>> + struct domain *currd = current->domain; > >>>>> + p2m_type_t type; > >>>>> + mfn_t mfn; > >>>>> + int rc; > >>>>> + > >>>>> + ASSERT(is_hardware_domain(currd)); > >>>>> + ASSERT(!altp2m_active(currd)); > >>>>> + > >>>>> + /* > >>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain > >>>>> + * having identity mappings for non RAM regions (gfn == mfn). > >>>>> + */ > >>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || > >>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > >>>>> + return -EPERM; > >>>>> + > >>>>> + mfn = get_gfn(currd, gfn, &type); > >>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > >>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > >>>> > >>>> I understand this is to cover the case where two vCPU-s access the same GFN > >>>> at about the same time. However, the "success" log message at the call site > >>>> being debug-only means we may be silently hiding bugs in release builds, if > >>>> e.g. we get here despite the GFN having had an identity mapping already for > >>>> ages. > >>> > >>> Possibly, but what would be your suggestion to fix this? I will think > >>> about it, but I can't immediately see a solution that's not simply to > >>> make the message printed by the caller to be gprintk() instead of > >>> gdprintk() so catch such bugs. Would you agree to that? > >> > >> My thinking was that it might be best to propagate a distinguishable error > >> code (perhaps -EEXIST, with its present use then replaced) out of the function, > >> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a > >> comment explaining things a little. > > > > I think it would be easier if I just made those gprintk() instead of > > gdprintk(), all with severity XENLOG_DEBUG except for the one that > > reports the failure of the fixup function that is XENLOG_WARNING. > > Would you be OK with that? > > Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by > default, I think it wouldn't be nice if many of them might appear in release > builds with guest_loglevel=all. What I find difficult is to predict how high > the chances are to see any of them (and then possibly multiple times). I think getting those messages even in non-debug builds might be helpful for debugging purposes. Sometimes it's difficult for users to switch to a debug build of Xen if not provided by their upstream. FWIW, on my Intel NUC I see three of those: (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added Would you be fine with this approach: bool __ro_after_init opt_dom0_pf_fixup; static int hwdom_fixup_p2m(paddr_t addr) { unsigned long gfn = paddr_to_pfn(addr); struct domain *currd = current->domain; p2m_type_t type; mfn_t mfn; int rc; ASSERT(is_hardware_domain(currd)); ASSERT(!altp2m_active(currd)); /* * Fixups are only applied for MMIO holes, and rely on the hardware domain * having identity mappings for non RAM regions (gfn == mfn). */ if ( !iomem_access_permitted(currd, gfn, gfn) || !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) return -EPERM; mfn = get_gfn(currd, gfn, &type); if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; else rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); put_gfn(currd, gfn); return rc; } [...] int inner_rc = hwdom_fixup_p2m(addr); if ( !inner_rc || inner_rc == -EEXIST ) { gdprintk(XENLOG_DEBUG, "fixup p2m mapping for page %lx %s\n", paddr_to_pfn(addr), !inner_rc ? "added" : "already present"); rc = X86EMUL_RETRY; vio->req.state = STATE_IOREQ_NONE; break; } gprintk(XENLOG_WARNING, "unable to fixup memory %s to %#lx size %u: %d\n", dir ? "read" : "write", addr, size, inner_rc); Thanks, Roger.
On 17.02.2025 11:20, Roger Pau Monné wrote: > On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: >> On 17.02.2025 09:25, Roger Pau Monné wrote: >>> On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: >>>> On 14.02.2025 13:38, Roger Pau Monné wrote: >>>>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >>>>>> On 14.02.2025 10:29, Roger Pau Monne wrote: >>>>>>> +{ >>>>>>> + unsigned long gfn = paddr_to_pfn(addr); >>>>>>> + struct domain *currd = current->domain; >>>>>>> + p2m_type_t type; >>>>>>> + mfn_t mfn; >>>>>>> + int rc; >>>>>>> + >>>>>>> + ASSERT(is_hardware_domain(currd)); >>>>>>> + ASSERT(!altp2m_active(currd)); >>>>>>> + >>>>>>> + /* >>>>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>>>>>> + * having identity mappings for non RAM regions (gfn == mfn). >>>>>>> + */ >>>>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>>>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>>>>>> + return -EPERM; >>>>>>> + >>>>>>> + mfn = get_gfn(currd, gfn, &type); >>>>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>>>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >>>>>> >>>>>> I understand this is to cover the case where two vCPU-s access the same GFN >>>>>> at about the same time. However, the "success" log message at the call site >>>>>> being debug-only means we may be silently hiding bugs in release builds, if >>>>>> e.g. we get here despite the GFN having had an identity mapping already for >>>>>> ages. >>>>> >>>>> Possibly, but what would be your suggestion to fix this? I will think >>>>> about it, but I can't immediately see a solution that's not simply to >>>>> make the message printed by the caller to be gprintk() instead of >>>>> gdprintk() so catch such bugs. Would you agree to that? >>>> >>>> My thinking was that it might be best to propagate a distinguishable error >>>> code (perhaps -EEXIST, with its present use then replaced) out of the function, >>>> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a >>>> comment explaining things a little. >>> >>> I think it would be easier if I just made those gprintk() instead of >>> gdprintk(), all with severity XENLOG_DEBUG except for the one that >>> reports the failure of the fixup function that is XENLOG_WARNING. >>> Would you be OK with that? >> >> Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by >> default, I think it wouldn't be nice if many of them might appear in release >> builds with guest_loglevel=all. What I find difficult is to predict how high >> the chances are to see any of them (and then possibly multiple times). > > I think getting those messages even in non-debug builds might be > helpful for debugging purposes. Sometimes it's difficult for users to > switch to a debug build of Xen if not provided by their upstream. > > FWIW, on my Intel NUC I see three of those: > > (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added > (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added > (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added For my understanding: Did Xen with a PVH Dom0 not work on the NUC before? Or else how come it survived without this fixing up of mappings? > Would you be fine with this approach: > > bool __ro_after_init opt_dom0_pf_fixup; > static int hwdom_fixup_p2m(paddr_t addr) > { > unsigned long gfn = paddr_to_pfn(addr); > struct domain *currd = current->domain; > p2m_type_t type; > mfn_t mfn; > int rc; > > ASSERT(is_hardware_domain(currd)); > ASSERT(!altp2m_active(currd)); > > /* > * Fixups are only applied for MMIO holes, and rely on the hardware domain > * having identity mappings for non RAM regions (gfn == mfn). > */ > if ( !iomem_access_permitted(currd, gfn, gfn) || > !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > return -EPERM; > > mfn = get_gfn(currd, gfn, &type); > if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; > else > rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); > put_gfn(currd, gfn); > > return rc; > } > [...] > int inner_rc = hwdom_fixup_p2m(addr); > > if ( !inner_rc || inner_rc == -EEXIST ) > { > gdprintk(XENLOG_DEBUG, > "fixup p2m mapping for page %lx %s\n", > paddr_to_pfn(addr), > !inner_rc ? "added" : "already present"); As before, I think the "already present" message wants to be present also in release build logs. As opposed to the "added" one. Yet at the same time, if e.g. you and Andrew agree on the shape above, I won't stand in the way. Jan
On Mon, Feb 17, 2025 at 11:27:52AM +0100, Jan Beulich wrote: > On 17.02.2025 11:20, Roger Pau Monné wrote: > > On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: > >> On 17.02.2025 09:25, Roger Pau Monné wrote: > >>> On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: > >>>> On 14.02.2025 13:38, Roger Pau Monné wrote: > >>>>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > >>>>>> On 14.02.2025 10:29, Roger Pau Monne wrote: > >>>>>>> +{ > >>>>>>> + unsigned long gfn = paddr_to_pfn(addr); > >>>>>>> + struct domain *currd = current->domain; > >>>>>>> + p2m_type_t type; > >>>>>>> + mfn_t mfn; > >>>>>>> + int rc; > >>>>>>> + > >>>>>>> + ASSERT(is_hardware_domain(currd)); > >>>>>>> + ASSERT(!altp2m_active(currd)); > >>>>>>> + > >>>>>>> + /* > >>>>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain > >>>>>>> + * having identity mappings for non RAM regions (gfn == mfn). > >>>>>>> + */ > >>>>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || > >>>>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > >>>>>>> + return -EPERM; > >>>>>>> + > >>>>>>> + mfn = get_gfn(currd, gfn, &type); > >>>>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > >>>>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > >>>>>> > >>>>>> I understand this is to cover the case where two vCPU-s access the same GFN > >>>>>> at about the same time. However, the "success" log message at the call site > >>>>>> being debug-only means we may be silently hiding bugs in release builds, if > >>>>>> e.g. we get here despite the GFN having had an identity mapping already for > >>>>>> ages. > >>>>> > >>>>> Possibly, but what would be your suggestion to fix this? I will think > >>>>> about it, but I can't immediately see a solution that's not simply to > >>>>> make the message printed by the caller to be gprintk() instead of > >>>>> gdprintk() so catch such bugs. Would you agree to that? > >>>> > >>>> My thinking was that it might be best to propagate a distinguishable error > >>>> code (perhaps -EEXIST, with its present use then replaced) out of the function, > >>>> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a > >>>> comment explaining things a little. > >>> > >>> I think it would be easier if I just made those gprintk() instead of > >>> gdprintk(), all with severity XENLOG_DEBUG except for the one that > >>> reports the failure of the fixup function that is XENLOG_WARNING. > >>> Would you be OK with that? > >> > >> Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by > >> default, I think it wouldn't be nice if many of them might appear in release > >> builds with guest_loglevel=all. What I find difficult is to predict how high > >> the chances are to see any of them (and then possibly multiple times). > > > > I think getting those messages even in non-debug builds might be > > helpful for debugging purposes. Sometimes it's difficult for users to > > switch to a debug build of Xen if not provided by their upstream. > > > > FWIW, on my Intel NUC I see three of those: > > > > (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added > > (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added > > (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added > > For my understanding: Did Xen with a PVH Dom0 not work on the NUC before? Or > else how come it survived without this fixing up of mappings? I've got no idea what's in those addresses. It did survive and seems to work fine without those identity mappings in the p2m. I assume that returning ~0 for reads and ignoring writes what good enough. > > > Would you be fine with this approach: > > > > bool __ro_after_init opt_dom0_pf_fixup; > > static int hwdom_fixup_p2m(paddr_t addr) > > { > > unsigned long gfn = paddr_to_pfn(addr); > > struct domain *currd = current->domain; > > p2m_type_t type; > > mfn_t mfn; > > int rc; > > > > ASSERT(is_hardware_domain(currd)); > > ASSERT(!altp2m_active(currd)); > > > > /* > > * Fixups are only applied for MMIO holes, and rely on the hardware domain > > * having identity mappings for non RAM regions (gfn == mfn). > > */ > > if ( !iomem_access_permitted(currd, gfn, gfn) || > > !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > > return -EPERM; > > > > mfn = get_gfn(currd, gfn, &type); > > if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > > rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; > > else > > rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); > > put_gfn(currd, gfn); > > > > return rc; > > } > > [...] > > int inner_rc = hwdom_fixup_p2m(addr); > > > > if ( !inner_rc || inner_rc == -EEXIST ) > > { > > gdprintk(XENLOG_DEBUG, > > "fixup p2m mapping for page %lx %s\n", > > paddr_to_pfn(addr), > > !inner_rc ? "added" : "already present"); > > As before, I think the "already present" message wants to be present also in > release build logs. As opposed to the "added" one. Yet at the same time, if > e.g. you and Andrew agree on the shape above, I won't stand in the way. I didn't want to add yet another level of indentation, as it then becomes: int inner_rc = hwdom_fixup_p2m(addr); if ( !inner_rc || inner_rc == -EEXIST ) { if ( !inner_rc ) gdprintk(XENLOG_DEBUG, "fixup p2m mapping for page %lx added\n", paddr_to_pfn(addr)); else gprintk(XENLOG_INFO, "fixup p2m mapping for page %lx already present\n", paddr_to_pfn(addr)); Would you be OK with the above proposal then? Thanks, Roger.
On 17.02.2025 11:51, Roger Pau Monné wrote: > On Mon, Feb 17, 2025 at 11:27:52AM +0100, Jan Beulich wrote: >> On 17.02.2025 11:20, Roger Pau Monné wrote: >>> On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: >>>> On 17.02.2025 09:25, Roger Pau Monné wrote: >>>>> On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: >>>>>> On 14.02.2025 13:38, Roger Pau Monné wrote: >>>>>>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >>>>>>>> On 14.02.2025 10:29, Roger Pau Monne wrote: >>>>>>>>> +{ >>>>>>>>> + unsigned long gfn = paddr_to_pfn(addr); >>>>>>>>> + struct domain *currd = current->domain; >>>>>>>>> + p2m_type_t type; >>>>>>>>> + mfn_t mfn; >>>>>>>>> + int rc; >>>>>>>>> + >>>>>>>>> + ASSERT(is_hardware_domain(currd)); >>>>>>>>> + ASSERT(!altp2m_active(currd)); >>>>>>>>> + >>>>>>>>> + /* >>>>>>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>>>>>>>> + * having identity mappings for non RAM regions (gfn == mfn). >>>>>>>>> + */ >>>>>>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>>>>>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>>>>>>>> + return -EPERM; >>>>>>>>> + >>>>>>>>> + mfn = get_gfn(currd, gfn, &type); >>>>>>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>>>>>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >>>>>>>> >>>>>>>> I understand this is to cover the case where two vCPU-s access the same GFN >>>>>>>> at about the same time. However, the "success" log message at the call site >>>>>>>> being debug-only means we may be silently hiding bugs in release builds, if >>>>>>>> e.g. we get here despite the GFN having had an identity mapping already for >>>>>>>> ages. >>>>>>> >>>>>>> Possibly, but what would be your suggestion to fix this? I will think >>>>>>> about it, but I can't immediately see a solution that's not simply to >>>>>>> make the message printed by the caller to be gprintk() instead of >>>>>>> gdprintk() so catch such bugs. Would you agree to that? >>>>>> >>>>>> My thinking was that it might be best to propagate a distinguishable error >>>>>> code (perhaps -EEXIST, with its present use then replaced) out of the function, >>>>>> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a >>>>>> comment explaining things a little. >>>>> >>>>> I think it would be easier if I just made those gprintk() instead of >>>>> gdprintk(), all with severity XENLOG_DEBUG except for the one that >>>>> reports the failure of the fixup function that is XENLOG_WARNING. >>>>> Would you be OK with that? >>>> >>>> Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by >>>> default, I think it wouldn't be nice if many of them might appear in release >>>> builds with guest_loglevel=all. What I find difficult is to predict how high >>>> the chances are to see any of them (and then possibly multiple times). >>> >>> I think getting those messages even in non-debug builds might be >>> helpful for debugging purposes. Sometimes it's difficult for users to >>> switch to a debug build of Xen if not provided by their upstream. >>> >>> FWIW, on my Intel NUC I see three of those: >>> >>> (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added >>> (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added >>> (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added >> >> For my understanding: Did Xen with a PVH Dom0 not work on the NUC before? Or >> else how come it survived without this fixing up of mappings? > > I've got no idea what's in those addresses. It did survive and seems > to work fine without those identity mappings in the p2m. I assume > that returning ~0 for reads and ignoring writes what good enough. On one hand I find this concerning. Otoh this way we'll maybe learn what issues were so far papered over by said read/write behavior. (Of course there's also a small chance of this opening up new problem areas.) >>> Would you be fine with this approach: >>> >>> bool __ro_after_init opt_dom0_pf_fixup; >>> static int hwdom_fixup_p2m(paddr_t addr) >>> { >>> unsigned long gfn = paddr_to_pfn(addr); >>> struct domain *currd = current->domain; >>> p2m_type_t type; >>> mfn_t mfn; >>> int rc; >>> >>> ASSERT(is_hardware_domain(currd)); >>> ASSERT(!altp2m_active(currd)); >>> >>> /* >>> * Fixups are only applied for MMIO holes, and rely on the hardware domain >>> * having identity mappings for non RAM regions (gfn == mfn). >>> */ >>> if ( !iomem_access_permitted(currd, gfn, gfn) || >>> !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>> return -EPERM; >>> >>> mfn = get_gfn(currd, gfn, &type); >>> if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>> rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; >>> else >>> rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); >>> put_gfn(currd, gfn); >>> >>> return rc; >>> } >>> [...] >>> int inner_rc = hwdom_fixup_p2m(addr); >>> >>> if ( !inner_rc || inner_rc == -EEXIST ) >>> { >>> gdprintk(XENLOG_DEBUG, >>> "fixup p2m mapping for page %lx %s\n", >>> paddr_to_pfn(addr), >>> !inner_rc ? "added" : "already present"); >> >> As before, I think the "already present" message wants to be present also in >> release build logs. As opposed to the "added" one. Yet at the same time, if >> e.g. you and Andrew agree on the shape above, I won't stand in the way. > > I didn't want to add yet another level of indentation, as it then > becomes: > > int inner_rc = hwdom_fixup_p2m(addr); > > if ( !inner_rc || inner_rc == -EEXIST ) > { > if ( !inner_rc ) > gdprintk(XENLOG_DEBUG, > "fixup p2m mapping for page %lx added\n", > paddr_to_pfn(addr)); > else > gprintk(XENLOG_INFO, > "fixup p2m mapping for page %lx already present\n", > paddr_to_pfn(addr)); > > Would you be OK with the above proposal then? Yes (with off-by-1 indentation corrected). It's unfortunate that this can't be written in a more compact form. Jan
diff --git a/CHANGELOG.md b/CHANGELOG.md index 1de1d1eca17f..e5e6ab3a8902 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,16 @@ Notable changes to Xen will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +## [4.21.0 UNRELEASED](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD + +### Changed + +### Added + - On x86: + - Option to attempt to fixup p2m page-faults on PVH dom0. + +### Removed + ## [4.20.0 UNRELEASED](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD ### Changed diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 9bbd00baef91..e9884de07e9e 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. ### dom0 = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) + cpuid-faulting=<bool>, msr-relaxed=<bool>, + pf-fixup=<bool> ] (x86) = List of [ sve=<integer> ] (Arm64) @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems. If using this option is necessary to fix an issue, please report a bug. +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and + defaults to false. + + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO + regions into the p2m, such mode relies on Xen dom0 builder populating + the p2m with all MMIO regions that dom0 should access. However Xen + doesn't have a complete picture of the host memory map, due to not + being able to process ACPI dynamic tables. + + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions + to the p2m in response to page-faults generated by dom0 trying to access + unpopulated entries in the p2m. + Enables features on dom0 on Arm systems. * The `sve` integer parameter enables Arm SVE usage for Dom0 and sets the diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c index d1b4ef83b2d0..34b6166f4922 100644 --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const char *e) opt_dom0_cpuid_faulting = val; else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) opt_dom0_msr_relaxed = val; +#ifdef CONFIG_HVM + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) + opt_dom0_pf_fixup = val; +#endif else return -EINVAL; diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 8aa7e49c056c..aa16ed0e9cac 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -338,8 +338,38 @@ static int hvmemul_do_io( if ( !s ) { if ( is_mmio && is_hardware_domain(currd) ) - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", - dir ? "read" : "write", addr, size); + { + /* + * PVH dom0 is likely missing MMIO mappings on the p2m, due to + * the incomplete information Xen has about the memory layout. + * + * Either print a message to note dom0 attempted to access an + * unpopulated GPA, or try to fixup the p2m by creating an + * identity mapping for the faulting GPA. + */ + if ( opt_dom0_pf_fixup ) + { + int inner_rc = hvm_hwdom_fixup_p2m(addr); + + if ( !inner_rc ) + { + gdprintk(XENLOG_DEBUG, + "fixup p2m mapping for page %lx added\n", + paddr_to_pfn(addr)); + rc = X86EMUL_RETRY; + vio->req.state = STATE_IOREQ_NONE; + break; + } + + gprintk(XENLOG_WARNING, + "unable to fixup memory %s to %#lx size %u: %d\n", + dir ? "read" : "write", addr, size, inner_rc); + } + else + gdprintk(XENLOG_DEBUG, + "unhandled memory %s to %#lx size %u\n", + dir ? "read" : "write", addr, size); + } rc = hvm_process_io_intercept(&null_handler, &p); vio->req.state = STATE_IOREQ_NONE; } diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 39e39ce4ce36..4505868f025c 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -13,6 +13,7 @@ #include <xen/lib.h> #include <xen/trace.h> #include <xen/sched.h> +#include <xen/iocap.h> #include <xen/irq.h> #include <xen/softirq.h> #include <xen/domain.h> @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) return rc; } +bool __ro_after_init opt_dom0_pf_fixup; +int hvm_hwdom_fixup_p2m(paddr_t addr) +{ + unsigned long gfn = paddr_to_pfn(addr); + struct domain *currd = current->domain; + p2m_type_t type; + mfn_t mfn; + int rc; + + ASSERT(is_hardware_domain(currd)); + ASSERT(!altp2m_active(currd)); + + /* + * Fixups are only applied for MMIO holes, and rely on the hardware domain + * having identity mappings for non RAM regions (gfn == mfn). + */ + if ( !iomem_access_permitted(currd, gfn, gfn) || + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) + return -EPERM; + + mfn = get_gfn(currd, gfn, &type); + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; + else + rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); + put_gfn(currd, gfn); + + return rc; +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h index cad3a9427801..e084e1c7d665 100644 --- a/xen/arch/x86/include/asm/hvm/hvm.h +++ b/xen/arch/x86/include/asm/hvm/hvm.h @@ -772,6 +772,11 @@ static inline int hvm_vmtrace_reset(struct vcpu *v) return -EOPNOTSUPP; } +/* For PVH dom0: signal whether to attempt fixup of p2m page-faults. */ +extern bool opt_dom0_pf_fixup; +/* Attempt to fixup a p2m page-fault by adding an identity mapping entry. */ +int hvm_hwdom_fixup_p2m(paddr_t addr); + /* * Accessors for registers which have per-guest-type or per-vendor locations * (e.g. VMCS, msr load/save lists, VMCB, VMLOAD lazy, etc).
When building a PVH dom0 Xen attempts to map all (relevant) MMIO regions into the p2m for dom0 access. However the information Xen has about the host memory map is limited. Xen doesn't have access to any resources described in ACPI dynamic tables, and hence the p2m mappings provided might not be complete. PV doesn't suffer from this issue because a PV dom0 is capable of mapping into it's page-tables any address not explicitly banned in d->iomem_caps. Introduce a new command line options that allows Xen to attempt to fixup the p2m page-faults, by creating p2m identity maps in response to p2m page-faults. This is aimed as a workaround to small ACPI regions Xen doesn't know about. Note that missing large MMIO regions mapped in this way will lead to slowness due to the VM exit processing, plus the mappings will always use small pages. The ultimate aim is to attempt to bring better parity with a classic PV dom0. Note such fixup rely on the CPU doing the access to the unpopulated address. If the access is attempted from a device instead there's no possible way to fixup, as IOMMU page-fault are asynchronous. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Only slightly tested on my local PVH dom0 deployment. --- CHANGELOG.md | 10 +++++++++ docs/misc/xen-command-line.pandoc | 16 +++++++++++++- xen/arch/x86/dom0_build.c | 4 ++++ xen/arch/x86/hvm/emulate.c | 34 ++++++++++++++++++++++++++++-- xen/arch/x86/hvm/hvm.c | 31 +++++++++++++++++++++++++++ xen/arch/x86/include/asm/hvm/hvm.h | 5 +++++ 6 files changed, 97 insertions(+), 3 deletions(-)