Message ID | 10fb0354-9018-a714-44b7-efda1b579aa1@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3,01/14] AMD/IOMMU: free more memory when cleaning up after error | expand |
On 16/07/2019 17:41, Jan Beulich wrote: > When there are sufficiently many devices listed in the ACPI tables (no > matter if they actually exist), output may take way longer than the > watchdog would like. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > v3: New. > --- > TBD: Seeing the volume of output I wonder whether we should further > suppress logging headers of devices which have no active entry > (i.e. emit the header only upon finding the first IRTE worth > logging). And while minor for the total volume of output I'm > also unconvinced logging both a "per device" header line and a > "shared" one makes sense, when only one of the two can actually > be followed by actual contents. I don't have a system I can access at the moment, so can't judge how bad it is right now. However, I would advocate the removal of irrelevant information. Either way, this is debugging so Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> As an observation, I wonder whether continually sprinkling process_pending_softirqs() is the best thing to do for keyhandlers. We've got a number of other which incur the wrath of the watchdog (grant table in particular), which in practice means they are typically broken when they are actually used for debugging production. As these are for debugging only, might it be a better idea to stop the watchdog while keyhandlers are running? The only useful thing we actually manage here is to stop the watchdog killing us. ~Andrew
On Tue, Jul 16, 2019 at 04:41:21PM +0000, Jan Beulich wrote: > When there are sufficiently many devices listed in the ACPI tables (no > matter if they actually exist), output may take way longer than the > watchdog would like. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Brian Woods <brian.woods@amd.com> > --- > v3: New. > --- > TBD: Seeing the volume of output I wonder whether we should further > suppress logging headers of devices which have no active entry > (i.e. emit the header only upon finding the first IRTE worth > logging). And while minor for the total volume of output I'm > also unconvinced logging both a "per device" header line and a > "shared" one makes sense, when only one of the two can actually > be followed by actual contents. > > --- a/xen/drivers/passthrough/amd/iommu_intr.c > +++ b/xen/drivers/passthrough/amd/iommu_intr.c > @@ -22,6 +22,7 @@ > #include <asm/hvm/svm/amd-iommu-proto.h> > #include <asm/io_apic.h> > #include <xen/keyhandler.h> > +#include <xen/softirq.h> > > struct irte_basic { > bool remap_en:1; > @@ -917,6 +918,8 @@ static int dump_intremap_mapping(const s > dump_intremap_table(iommu, ivrs_mapping->intremap_table); > spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags); > > + process_pending_softirqs(); > + > return 0; > } > >
On 19.07.2019 19:55, Andrew Cooper wrote: > On 16/07/2019 17:41, Jan Beulich wrote: >> When there are sufficiently many devices listed in the ACPI tables (no >> matter if they actually exist), output may take way longer than the >> watchdog would like. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> --- >> v3: New. >> --- >> TBD: Seeing the volume of output I wonder whether we should further >> suppress logging headers of devices which have no active entry >> (i.e. emit the header only upon finding the first IRTE worth >> logging). And while minor for the total volume of output I'm >> also unconvinced logging both a "per device" header line and a >> "shared" one makes sense, when only one of the two can actually >> be followed by actual contents. > > I don't have a system I can access at the moment, so can't judge how bad > it is right now. However, I would advocate the removal of irrelevant > information. I'll try to get to putting together another patch to this effect. > Either way, this is debugging so Acked-by: Andrew Cooper > <andrew.cooper3@citrix.com> Thanks, also for all the other review of this series! > As an observation, I wonder whether continually sprinkling > process_pending_softirqs() is the best thing to do for keyhandlers. > We've got a number of other which incur the wrath of the watchdog (grant > table in particular), which in practice means they are typically broken > when they are actually used for debugging production. > > As these are for debugging only, might it be a better idea to stop the > watchdog while keyhandlers are running? The only useful thing we > actually manage here is to stop the watchdog killing us. Hmm, I would agree going this route if the watchdog could be disabled on a per-CPU basis, but right now watchdog_disable() is a system wide action. Jan
On 22/07/2019 09:49, Jan Beulich wrote: > On 19.07.2019 19:55, Andrew Cooper wrote: >> On 16/07/2019 17:41, Jan Beulich wrote: >> As an observation, I wonder whether continually sprinkling >> process_pending_softirqs() is the best thing to do for keyhandlers. >> We've got a number of other which incur the wrath of the watchdog (grant >> table in particular), which in practice means they are typically broken >> when they are actually used for debugging production. >> >> As these are for debugging only, might it be a better idea to stop the >> watchdog while keyhandlers are running? The only useful thing we >> actually manage here is to stop the watchdog killing us. > Hmm, I would agree going this route if the watchdog could be disabled > on a per-CPU basis, but right now watchdog_disable() is a system wide > action. It needs to be disabled system-wide. Disabling only the local CPU will still cause a watchdog timeout on other CPUs which are waiting on the current CPU to complete some action. Most keyhandlers run with interrupts enabled so we will be fine WRT TLB flushes, etc, but things like vcpu_pause() will block until softirqs are processed again, and we need to prevent those CPUs from taking a timeout. For other CPUs which really are having problems, the timeout will still trip 5 seconds after the keyhandler completes, and we'll still get a backtrace out of it. ~Andrew
--- a/xen/drivers/passthrough/amd/iommu_intr.c +++ b/xen/drivers/passthrough/amd/iommu_intr.c @@ -22,6 +22,7 @@ #include <asm/hvm/svm/amd-iommu-proto.h> #include <asm/io_apic.h> #include <xen/keyhandler.h> +#include <xen/softirq.h> struct irte_basic { bool remap_en:1; @@ -917,6 +918,8 @@ static int dump_intremap_mapping(const s dump_intremap_table(iommu, ivrs_mapping->intremap_table); spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags); + process_pending_softirqs(); + return 0; }
When there are sufficiently many devices listed in the ACPI tables (no matter if they actually exist), output may take way longer than the watchdog would like. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v3: New. --- TBD: Seeing the volume of output I wonder whether we should further suppress logging headers of devices which have no active entry (i.e. emit the header only upon finding the first IRTE worth logging). And while minor for the total volume of output I'm also unconvinced logging both a "per device" header line and a "shared" one makes sense, when only one of the two can actually be followed by actual contents.