Message ID | 20220725175013.893-1-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/pv: Inject #GP for implicit grant unmaps | expand |
On 25.07.2022 19:50, Andrew Cooper wrote: > This is a debug behaviour to identify buggy kernels. Crashing the domain is > the most unhelpful thing to do, because it discards the relevant context. > > Instead, inject #GP[0] like other permission errors in x86. In particular, > this lets the kernel provide a backtrace that's actually helpful to a > developer trying to figure out what's going wrong. > > As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not > l1e_owner's fault if dom0 using superpowers triggers an implicit unmap. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Albeit preferably with ... > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) > gdprintk(XENLOG_WARNING, > "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", > l1e_get_intpte(l1e)); > - domain_crash(l1e_owner); > + pv_inject_hw_exception(TRAP_gp_fault, 0); > } > #endif ... the gdprintk() adjusted to also log l1e_owner. Jan
On 26/07/2022 07:29, Jan Beulich wrote: > On 25.07.2022 19:50, Andrew Cooper wrote: >> This is a debug behaviour to identify buggy kernels. Crashing the domain is >> the most unhelpful thing to do, because it discards the relevant context. >> >> Instead, inject #GP[0] like other permission errors in x86. In particular, >> this lets the kernel provide a backtrace that's actually helpful to a >> developer trying to figure out what's going wrong. >> >> As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not >> l1e_owner's fault if dom0 using superpowers triggers an implicit unmap. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Acked-by: Jan Beulich <jbeulich@suse.com> > > Albeit preferably with ... > >> --- a/xen/arch/x86/mm.c >> +++ b/xen/arch/x86/mm.c >> @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) >> gdprintk(XENLOG_WARNING, >> "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", >> l1e_get_intpte(l1e)); >> - domain_crash(l1e_owner); >> + pv_inject_hw_exception(TRAP_gp_fault, 0); >> } >> #endif > ... the gdprintk() adjusted to also log l1e_owner. Ok, how about this incremental? diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index b3393385ffb6..74054fb5f4ee 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1229,9 +1229,9 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) && !l1e_owner->is_shutting_down && !l1e_owner->is_dying ) { - gdprintk(XENLOG_WARNING, - "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", - l1e_get_intpte(l1e)); + gprintk(XENLOG_WARNING, + "Attempt to implicitly %pd's gntmap PTE %" PRIpte "\n", + l1e_owner, l1e_get_intpte(l1e)); pv_inject_hw_exception(TRAP_gp_fault, 0); } #endif The printk() needs to not be omitted in release builds which happen to have this logic compiled in. ~Andrew
On 26.07.2022 13:51, Andrew Cooper wrote: > On 26/07/2022 07:29, Jan Beulich wrote: >> On 25.07.2022 19:50, Andrew Cooper wrote: >>> This is a debug behaviour to identify buggy kernels. Crashing the domain is >>> the most unhelpful thing to do, because it discards the relevant context. >>> >>> Instead, inject #GP[0] like other permission errors in x86. In particular, >>> this lets the kernel provide a backtrace that's actually helpful to a >>> developer trying to figure out what's going wrong. >>> >>> As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not >>> l1e_owner's fault if dom0 using superpowers triggers an implicit unmap. >>> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> Acked-by: Jan Beulich <jbeulich@suse.com> >> >> Albeit preferably with ... >> >>> --- a/xen/arch/x86/mm.c >>> +++ b/xen/arch/x86/mm.c >>> @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) >>> gdprintk(XENLOG_WARNING, >>> "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", >>> l1e_get_intpte(l1e)); >>> - domain_crash(l1e_owner); >>> + pv_inject_hw_exception(TRAP_gp_fault, 0); >>> } >>> #endif >> ... the gdprintk() adjusted to also log l1e_owner. > > Ok, how about this incremental? > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c > index b3393385ffb6..74054fb5f4ee 100644 > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -1229,9 +1229,9 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct > domain *l1e_owner) > if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) && > !l1e_owner->is_shutting_down && !l1e_owner->is_dying ) > { > - gdprintk(XENLOG_WARNING, > - "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", > - l1e_get_intpte(l1e)); > + gprintk(XENLOG_WARNING, > + "Attempt to implicitly %pd's gntmap PTE %" PRIpte "\n", > + l1e_owner, l1e_get_intpte(l1e)); DYM to drop "unmap"? With it restored (or anything similar to that effect), fine with me. Jan
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 5b81d5fbdbb2..b3393385ffb6 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1232,7 +1232,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) gdprintk(XENLOG_WARNING, "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", l1e_get_intpte(l1e)); - domain_crash(l1e_owner); + pv_inject_hw_exception(TRAP_gp_fault, 0); } #endif
This is a debug behaviour to identify buggy kernels. Crashing the domain is the most unhelpful thing to do, because it discards the relevant context. Instead, inject #GP[0] like other permission errors in x86. In particular, this lets the kernel provide a backtrace that's actually helpful to a developer trying to figure out what's going wrong. As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not l1e_owner's fault if dom0 using superpowers triggers an implicit unmap. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Juergen Gross <jgross@suse.com> This is a prerequisite to investigating https://github.com/QubesOS/qubes-issues/issues/7631 which is looking like an error in Linux's gntdev driver. --- xen/arch/x86/mm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)