Message ID | 20191107150609.93004-3-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/ioapic: fix clear_IO_APIC_pin when using interrupt remapping | expand |
On 07.11.2019 16:06, Roger Pau Monne wrote: > clear_IO_APIC_pin can be called after the iommu has been enabled, and > using raw entry reads and writes will result in a misconfiguration of > the entries already setup to use the interrupt remapping table. I'm afraid I don't understand this: Raw reads and writes don't even go to the IOMMU interrupt remapping code, so how would the assertion be triggered? > (XEN) [ 10.082154] ENABLING IO-APIC IRQs > (XEN) [ 10.087789] -> Using new ACK method > (XEN) [ 10.093738] Assertion 'get_rte_index(rte) == offset' failed at iommu_intr.c:328 > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> "Reported-by: Sergey ..." ahead of this? > --- a/xen/arch/x86/io_apic.c > +++ b/xen/arch/x86/io_apic.c > @@ -514,13 +514,13 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) > entry.mask = 1; > __ioapic_write_entry(apic, pin, false, entry); > } > - entry = __ioapic_read_entry(apic, pin, true); > + entry = __ioapic_read_entry(apic, pin, false); > > if (entry.irr) { > /* Make sure the trigger mode is set to level. */ > if (!entry.trigger) { > entry.trigger = 1; > - __ioapic_write_entry(apic, pin, true, entry); > + __ioapic_write_entry(apic, pin, false, entry); > } All we do here is set the trigger bit. No translation back and forth of the RTE should be needed. > @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) > */ > memset(&entry, 0, sizeof(entry)); > entry.mask = 1; > - __ioapic_write_entry(apic, pin, true, entry); > + __ioapic_write_entry(apic, pin, false, entry); I may be able to understand why this one can't use raw mode, but as per above a better overall description is needed. > - entry = __ioapic_read_entry(apic, pin, true); > + entry = __ioapic_read_entry(apic, pin, false); > if (entry.irr) > printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", > IO_APIC_ID(apic), pin); This read again shouldn't need conversion, as the IRR bit doesn't get touched (I think) by the interrupt remapping code during the translation it does. Jan
On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: > On 07.11.2019 16:06, Roger Pau Monne wrote: > > clear_IO_APIC_pin can be called after the iommu has been enabled, and > > using raw entry reads and writes will result in a misconfiguration of > > the entries already setup to use the interrupt remapping table. > > I'm afraid I don't understand this: Raw reads and writes don't even > go to the IOMMU interrupt remapping code, so how would the assertion > be triggered? Because the code does something like: memset(&rte, 0, ...); ... __ioapic_write_entry(apic, pin, true, rte); At which point you misconfigure an ioapic entry that was already setup to point to an interrupt remapping entry, and the AMD IOMMU code chokes in the assert below. > > > (XEN) [ 10.082154] ENABLING IO-APIC IRQs > > (XEN) [ 10.087789] -> Using new ACK method > > (XEN) [ 10.093738] Assertion 'get_rte_index(rte) == offset' failed at iommu_intr.c:328 > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > "Reported-by: Sergey ..." ahead of this? Oh yes. > > --- a/xen/arch/x86/io_apic.c > > +++ b/xen/arch/x86/io_apic.c > > @@ -514,13 +514,13 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) > > entry.mask = 1; > > __ioapic_write_entry(apic, pin, false, entry); > > } > > - entry = __ioapic_read_entry(apic, pin, true); > > + entry = __ioapic_read_entry(apic, pin, false); > > > > if (entry.irr) { > > /* Make sure the trigger mode is set to level. */ > > if (!entry.trigger) { > > entry.trigger = 1; > > - __ioapic_write_entry(apic, pin, true, entry); > > + __ioapic_write_entry(apic, pin, false, entry); > > } > > All we do here is set the trigger bit. No translation back and forth > of the RTE should be needed. > > > @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) > > */ > > memset(&entry, 0, sizeof(entry)); > > entry.mask = 1; > > - __ioapic_write_entry(apic, pin, true, entry); > > + __ioapic_write_entry(apic, pin, false, entry); > > I may be able to understand why this one can't use raw mode, but as > per above a better overall description is needed. Yes, this is the one that's actually incorrect, but see my reasoning below. > > > - entry = __ioapic_read_entry(apic, pin, true); > > + entry = __ioapic_read_entry(apic, pin, false); > > if (entry.irr) > > printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", > > IO_APIC_ID(apic), pin); > > This read again shouldn't need conversion, as the IRR bit doesn't > get touched (I think) by the interrupt remapping code during the > translation it does. TBH, I think raw mode should only be used by the iommu code in order to setup the entries to point to the interrupt remapping table, everything else shouldn't be using raw mode. While it's true that some of the cases here are safe to use raw mode I would discourage it's usage as it can lead to issues, and this is not a performance critical path anyway. Thanks, Roger.
On Thu, Nov 07, 2019 at 04:46:32PM +0100, Roger Pau Monné wrote: > On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: > > On 07.11.2019 16:06, Roger Pau Monne wrote: > > > clear_IO_APIC_pin can be called after the iommu has been enabled, and > > > using raw entry reads and writes will result in a misconfiguration of > > > the entries already setup to use the interrupt remapping table. > > > > I'm afraid I don't understand this: Raw reads and writes don't even > > go to the IOMMU interrupt remapping code, so how would the assertion > > be triggered? > > Because the code does something like: > > memset(&rte, 0, ...); > ... > __ioapic_write_entry(apic, pin, true, rte); > > At which point you misconfigure an ioapic entry that was already setup > to point to an interrupt remapping entry, and the AMD IOMMU code > chokes in the assert below. Just to clarify since I think my reply hasn't been fully clear, the ASSERT doesn't trigger in clear_IO_APIC_pin, but at a later point when the IO-APIC entry is configured. Thanks, Roger.
On 07.11.2019 16:46, Roger Pau Monné wrote: > On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: >> On 07.11.2019 16:06, Roger Pau Monne wrote: >>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) >>> */ >>> memset(&entry, 0, sizeof(entry)); >>> entry.mask = 1; >>> - __ioapic_write_entry(apic, pin, true, entry); >>> + __ioapic_write_entry(apic, pin, false, entry); >> >> I may be able to understand why this one can't use raw mode, but as >> per above a better overall description is needed. > > Yes, this is the one that's actually incorrect, but see my reasoning > below. > >> >>> - entry = __ioapic_read_entry(apic, pin, true); >>> + entry = __ioapic_read_entry(apic, pin, false); >>> if (entry.irr) >>> printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", >>> IO_APIC_ID(apic), pin); >> >> This read again shouldn't need conversion, as the IRR bit doesn't >> get touched (I think) by the interrupt remapping code during the >> translation it does. > > TBH, I think raw mode should only be used by the iommu code in order > to setup the entries to point to the interrupt remapping table, > everything else shouldn't be using raw mode. While it's true that some > of the cases here are safe to use raw mode I would discourage it's > usage as it can lead to issues, and this is not a performance critical > path anyway. You also should take the other possible perspective - not using raw mode means going through interrupt remapping logic, which can (needlessly) trigger errors. I think you want to break the patch into a necessary and an optional part. The optional part should be discussed separately and deferred until after 4.13. Jan
On Thu, Nov 07, 2019 at 04:56:09PM +0100, Jan Beulich wrote: > On 07.11.2019 16:46, Roger Pau Monné wrote: > > On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: > >> On 07.11.2019 16:06, Roger Pau Monne wrote: > >>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) > >>> */ > >>> memset(&entry, 0, sizeof(entry)); > >>> entry.mask = 1; > >>> - __ioapic_write_entry(apic, pin, true, entry); > >>> + __ioapic_write_entry(apic, pin, false, entry); > >> > >> I may be able to understand why this one can't use raw mode, but as > >> per above a better overall description is needed. > > > > Yes, this is the one that's actually incorrect, but see my reasoning > > below. > > > >> > >>> - entry = __ioapic_read_entry(apic, pin, true); > >>> + entry = __ioapic_read_entry(apic, pin, false); > >>> if (entry.irr) > >>> printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", > >>> IO_APIC_ID(apic), pin); > >> > >> This read again shouldn't need conversion, as the IRR bit doesn't > >> get touched (I think) by the interrupt remapping code during the > >> translation it does. > > > > TBH, I think raw mode should only be used by the iommu code in order > > to setup the entries to point to the interrupt remapping table, > > everything else shouldn't be using raw mode. While it's true that some > > of the cases here are safe to use raw mode I would discourage it's > > usage as it can lead to issues, and this is not a performance critical > > path anyway. > > You also should take the other possible perspective - not using > raw mode means going through interrupt remapping logic, which > can (needlessly) trigger errors. I think you want to break the > patch into a necessary and an optional part. The optional part > should be discussed separately and deferred until after 4.13. IMO generic IO-APIC code has not business playing with raw entries when interrupt remapping is enabled, the layout of IO-APIC entries in that case is vendor-specific, and hence the generic IO-APIC code is not able to parse it. For example the code in clear_IO_APIC_pin modifies the mask or the trigger fields of RAW entries, is there any guarantee that those fields don't have different meanings/layout when interrupt remapping is enabled? I can split the specific bugfix into a separate patch, but IMO the code in clear_IO_APIC_pin is not safe. Thanks, Roger.
On 08.11.2019 10:27, Roger Pau Monné wrote: > On Thu, Nov 07, 2019 at 04:56:09PM +0100, Jan Beulich wrote: >> On 07.11.2019 16:46, Roger Pau Monné wrote: >>> On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: >>>> On 07.11.2019 16:06, Roger Pau Monne wrote: >>>>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) >>>>> */ >>>>> memset(&entry, 0, sizeof(entry)); >>>>> entry.mask = 1; >>>>> - __ioapic_write_entry(apic, pin, true, entry); >>>>> + __ioapic_write_entry(apic, pin, false, entry); >>>> >>>> I may be able to understand why this one can't use raw mode, but as >>>> per above a better overall description is needed. >>> >>> Yes, this is the one that's actually incorrect, but see my reasoning >>> below. >>> >>>> >>>>> - entry = __ioapic_read_entry(apic, pin, true); >>>>> + entry = __ioapic_read_entry(apic, pin, false); >>>>> if (entry.irr) >>>>> printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", >>>>> IO_APIC_ID(apic), pin); >>>> >>>> This read again shouldn't need conversion, as the IRR bit doesn't >>>> get touched (I think) by the interrupt remapping code during the >>>> translation it does. >>> >>> TBH, I think raw mode should only be used by the iommu code in order >>> to setup the entries to point to the interrupt remapping table, >>> everything else shouldn't be using raw mode. While it's true that some >>> of the cases here are safe to use raw mode I would discourage it's >>> usage as it can lead to issues, and this is not a performance critical >>> path anyway. >> >> You also should take the other possible perspective - not using >> raw mode means going through interrupt remapping logic, which >> can (needlessly) trigger errors. I think you want to break the >> patch into a necessary and an optional part. The optional part >> should be discussed separately and deferred until after 4.13. > > IMO generic IO-APIC code has not business playing with raw entries > when interrupt remapping is enabled, the layout of IO-APIC entries in > that case is vendor-specific, and hence the generic IO-APIC code is > not able to parse it. > > For example the code in clear_IO_APIC_pin modifies the mask or the > trigger fields of RAW entries, is there any guarantee that those > fields don't have different meanings/layout when interrupt remapping > is enabled? From an abstract pov there's no such guarantee, but in practice the meaning of the fields doesn't change. You make a good point though nevertheless: For VT-d the trigger mode fields in RTE and IRTE need to match, so the interrupt remapping code needs to see the trigger mode change. See below for a possible alternative patch. > I can split the specific bugfix into a separate patch, but IMO the > code in clear_IO_APIC_pin is not safe. A change is needed, yes, but in particular because of the use of the function from clear_IO_APIC(), in turn called from disable_IO_APIC(), yet in turn used e.g. during emergency shutdown after a crash, I'd like the function to do as simple operations as possible, i.e. specifically avoid going through interrupt remapping code (because its data structures may also be corrupted at that point) unless really needed (hence the alternative patch suggestion below). As an aside, iommu_crash_shutdown() - even if actually doing something, i.e. disabling interrupt remapping - does _not_ cause the RTEs to be re-written in non-translated format. Jan --- unstable.orig/xen/arch/x86/io_apic.c +++ unstable/xen/arch/x86/io_apic.c @@ -519,8 +519,9 @@ static void clear_IO_APIC_pin(unsigned i if (entry.irr) { /* Make sure the trigger mode is set to level. */ if (!entry.trigger) { + entry = __ioapic_read_entry(apic, pin, false); entry.trigger = 1; - __ioapic_write_entry(apic, pin, TRUE, entry); + __ioapic_write_entry(apic, pin, false, entry); } __io_apic_eoi(apic, entry.vector, pin); } @@ -530,7 +531,7 @@ static void clear_IO_APIC_pin(unsigned i */ memset(&entry, 0, sizeof(entry)); entry.mask = 1; - __ioapic_write_entry(apic, pin, TRUE, entry); + __ioapic_write_entry(apic, pin, false, entry); entry = __ioapic_read_entry(apic, pin, TRUE); if (entry.irr)
On Fri, Nov 08, 2019 at 11:16:26AM +0100, Jan Beulich wrote: > On 08.11.2019 10:27, Roger Pau Monné wrote: > > On Thu, Nov 07, 2019 at 04:56:09PM +0100, Jan Beulich wrote: > >> On 07.11.2019 16:46, Roger Pau Monné wrote: > >>> On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: > >>>> On 07.11.2019 16:06, Roger Pau Monne wrote: > >>>>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) > >>>>> */ > >>>>> memset(&entry, 0, sizeof(entry)); > >>>>> entry.mask = 1; > >>>>> - __ioapic_write_entry(apic, pin, true, entry); > >>>>> + __ioapic_write_entry(apic, pin, false, entry); > >>>> > >>>> I may be able to understand why this one can't use raw mode, but as > >>>> per above a better overall description is needed. > >>> > >>> Yes, this is the one that's actually incorrect, but see my reasoning > >>> below. > >>> > >>>> > >>>>> - entry = __ioapic_read_entry(apic, pin, true); > >>>>> + entry = __ioapic_read_entry(apic, pin, false); > >>>>> if (entry.irr) > >>>>> printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", > >>>>> IO_APIC_ID(apic), pin); > >>>> > >>>> This read again shouldn't need conversion, as the IRR bit doesn't > >>>> get touched (I think) by the interrupt remapping code during the > >>>> translation it does. > >>> > >>> TBH, I think raw mode should only be used by the iommu code in order > >>> to setup the entries to point to the interrupt remapping table, > >>> everything else shouldn't be using raw mode. While it's true that some > >>> of the cases here are safe to use raw mode I would discourage it's > >>> usage as it can lead to issues, and this is not a performance critical > >>> path anyway. > >> > >> You also should take the other possible perspective - not using > >> raw mode means going through interrupt remapping logic, which > >> can (needlessly) trigger errors. I think you want to break the > >> patch into a necessary and an optional part. The optional part > >> should be discussed separately and deferred until after 4.13. > > > > IMO generic IO-APIC code has not business playing with raw entries > > when interrupt remapping is enabled, the layout of IO-APIC entries in > > that case is vendor-specific, and hence the generic IO-APIC code is > > not able to parse it. > > > > For example the code in clear_IO_APIC_pin modifies the mask or the > > trigger fields of RAW entries, is there any guarantee that those > > fields don't have different meanings/layout when interrupt remapping > > is enabled? > > From an abstract pov there's no such guarantee, but in practice > the meaning of the fields doesn't change. You make a good point > though nevertheless: For VT-d the trigger mode fields in RTE and > IRTE need to match, so the interrupt remapping code needs to see > the trigger mode change. See below for a possible alternative > patch. > > > I can split the specific bugfix into a separate patch, but IMO the > > code in clear_IO_APIC_pin is not safe. > > A change is needed, yes, but in particular because of the use of > the function from clear_IO_APIC(), in turn called from > disable_IO_APIC(), yet in turn used e.g. during emergency > shutdown after a crash, I'd like the function to do as simple > operations as possible, i.e. specifically avoid going through > interrupt remapping code (because its data structures may also > be corrupted at that point) unless really needed (hence the > alternative patch suggestion below). Isn't just masking the entries fine when disabling the IO-APIC, or a full wipe of all the entries is required? I would be fine with having a maskall_IO_APIC function that reads entries in raw format, sets the mask bit and writes the entry back in raw format as long as it's annotated that this is done in order to limit as much a possible the chances of hitting corrupted data in the crash case. > As an aside, iommu_crash_shutdown() - even if actually doing > something, i.e. disabling interrupt remapping - does _not_ > cause the RTEs to be re-written in non-translated format. I'm not sure whether that will work, it's possible that some entries can't be translated because they use x2APIC IDs, and thus don't fit in a non-translated IO-APIC entry. > Jan > > --- unstable.orig/xen/arch/x86/io_apic.c > +++ unstable/xen/arch/x86/io_apic.c > @@ -519,8 +519,9 @@ static void clear_IO_APIC_pin(unsigned i > if (entry.irr) { > /* Make sure the trigger mode is set to level. */ > if (!entry.trigger) { > + entry = __ioapic_read_entry(apic, pin, false); > entry.trigger = 1; > - __ioapic_write_entry(apic, pin, TRUE, entry); > + __ioapic_write_entry(apic, pin, false, entry); > } > __io_apic_eoi(apic, entry.vector, pin); > } > @@ -530,7 +531,7 @@ static void clear_IO_APIC_pin(unsigned i > */ > memset(&entry, 0, sizeof(entry)); > entry.mask = 1; > - __ioapic_write_entry(apic, pin, TRUE, entry); > + __ioapic_write_entry(apic, pin, false, entry); > > entry = __ioapic_read_entry(apic, pin, TRUE); > if (entry.irr) Well, this is certainly better than what's there currently, and should fix the issue reported by Sergey, albeit I still think checking the irr or the trigger fields of a raw entry when using interrupt remapping is not safe future wise. There's no guarantee that future interrupt remapping implementations don't clobber the non-translated fields with different ones when using a remapped IO-APIC entry, and hence while this fixes the current issue at hand it seems fragile. Anyway, I think this should be fixed ASAP, so if you are happy with this version that's fine for me. Do you want me to pick this up and rebase it on top of my TRUE/FALSE removal patch, or would you rather send it formally standalone? Thanks, Roger.
On 08.11.2019 17:07, Roger Pau Monné wrote: > On Fri, Nov 08, 2019 at 11:16:26AM +0100, Jan Beulich wrote: >> On 08.11.2019 10:27, Roger Pau Monné wrote: >>> On Thu, Nov 07, 2019 at 04:56:09PM +0100, Jan Beulich wrote: >>>> On 07.11.2019 16:46, Roger Pau Monné wrote: >>>>> On Thu, Nov 07, 2019 at 04:28:56PM +0100, Jan Beulich wrote: >>>>>> On 07.11.2019 16:06, Roger Pau Monne wrote: >>>>>>> @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) >>>>>>> */ >>>>>>> memset(&entry, 0, sizeof(entry)); >>>>>>> entry.mask = 1; >>>>>>> - __ioapic_write_entry(apic, pin, true, entry); >>>>>>> + __ioapic_write_entry(apic, pin, false, entry); >>>>>> >>>>>> I may be able to understand why this one can't use raw mode, but as >>>>>> per above a better overall description is needed. >>>>> >>>>> Yes, this is the one that's actually incorrect, but see my reasoning >>>>> below. >>>>> >>>>>> >>>>>>> - entry = __ioapic_read_entry(apic, pin, true); >>>>>>> + entry = __ioapic_read_entry(apic, pin, false); >>>>>>> if (entry.irr) >>>>>>> printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", >>>>>>> IO_APIC_ID(apic), pin); >>>>>> >>>>>> This read again shouldn't need conversion, as the IRR bit doesn't >>>>>> get touched (I think) by the interrupt remapping code during the >>>>>> translation it does. >>>>> >>>>> TBH, I think raw mode should only be used by the iommu code in order >>>>> to setup the entries to point to the interrupt remapping table, >>>>> everything else shouldn't be using raw mode. While it's true that some >>>>> of the cases here are safe to use raw mode I would discourage it's >>>>> usage as it can lead to issues, and this is not a performance critical >>>>> path anyway. >>>> >>>> You also should take the other possible perspective - not using >>>> raw mode means going through interrupt remapping logic, which >>>> can (needlessly) trigger errors. I think you want to break the >>>> patch into a necessary and an optional part. The optional part >>>> should be discussed separately and deferred until after 4.13. >>> >>> IMO generic IO-APIC code has not business playing with raw entries >>> when interrupt remapping is enabled, the layout of IO-APIC entries in >>> that case is vendor-specific, and hence the generic IO-APIC code is >>> not able to parse it. >>> >>> For example the code in clear_IO_APIC_pin modifies the mask or the >>> trigger fields of RAW entries, is there any guarantee that those >>> fields don't have different meanings/layout when interrupt remapping >>> is enabled? >> >> From an abstract pov there's no such guarantee, but in practice >> the meaning of the fields doesn't change. You make a good point >> though nevertheless: For VT-d the trigger mode fields in RTE and >> IRTE need to match, so the interrupt remapping code needs to see >> the trigger mode change. See below for a possible alternative >> patch. >> >>> I can split the specific bugfix into a separate patch, but IMO the >>> code in clear_IO_APIC_pin is not safe. >> >> A change is needed, yes, but in particular because of the use of >> the function from clear_IO_APIC(), in turn called from >> disable_IO_APIC(), yet in turn used e.g. during emergency >> shutdown after a crash, I'd like the function to do as simple >> operations as possible, i.e. specifically avoid going through >> interrupt remapping code (because its data structures may also >> be corrupted at that point) unless really needed (hence the >> alternative patch suggestion below). > > Isn't just masking the entries fine when disabling the IO-APIC, or a > full wipe of all the entries is required? Just masking the RTE _should_ be fine (but you never know). > I would be fine with having a maskall_IO_APIC function that reads > entries in raw format, sets the mask bit and writes the entry back in > raw format as long as it's annotated that this is done in order to > limit as much a possible the chances of hitting corrupted data in the > crash case. Right, certainly something we may want to do for 4.14. >> As an aside, iommu_crash_shutdown() - even if actually doing >> something, i.e. disabling interrupt remapping - does _not_ >> cause the RTEs to be re-written in non-translated format. > > I'm not sure whether that will work, it's possible that some entries > can't be translated because they use x2APIC IDs, and thus don't fit in > a non-translated IO-APIC entry. And of course there's no expectation that interrupts would still work, but any inspection (e.g. via dumping) of the RTEs would be misleading at that point. >> --- unstable.orig/xen/arch/x86/io_apic.c >> +++ unstable/xen/arch/x86/io_apic.c >> @@ -519,8 +519,9 @@ static void clear_IO_APIC_pin(unsigned i >> if (entry.irr) { >> /* Make sure the trigger mode is set to level. */ >> if (!entry.trigger) { >> + entry = __ioapic_read_entry(apic, pin, false); >> entry.trigger = 1; >> - __ioapic_write_entry(apic, pin, TRUE, entry); >> + __ioapic_write_entry(apic, pin, false, entry); >> } >> __io_apic_eoi(apic, entry.vector, pin); >> } >> @@ -530,7 +531,7 @@ static void clear_IO_APIC_pin(unsigned i >> */ >> memset(&entry, 0, sizeof(entry)); >> entry.mask = 1; >> - __ioapic_write_entry(apic, pin, TRUE, entry); >> + __ioapic_write_entry(apic, pin, false, entry); >> >> entry = __ioapic_read_entry(apic, pin, TRUE); >> if (entry.irr) > > Well, this is certainly better than what's there currently, and should > fix the issue reported by Sergey, albeit I still think checking the > irr or the trigger fields of a raw entry when using interrupt > remapping is not safe future wise. > > There's no guarantee that future interrupt remapping implementations > don't clobber the non-translated fields with different ones when using > a remapped IO-APIC entry, and hence while this fixes the current issue > at hand it seems fragile. > > Anyway, I think this should be fixed ASAP, so if you are happy with > this version that's fine for me. Do you want me to pick this up and > rebase it on top of my TRUE/FALSE removal patch, or would you rather > send it formally standalone? I'd appreciate you making this a v2 of your original patch, ideally with a further improved description. Jan
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c index b9c66acdb3..13b41b46a3 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -514,13 +514,13 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) entry.mask = 1; __ioapic_write_entry(apic, pin, false, entry); } - entry = __ioapic_read_entry(apic, pin, true); + entry = __ioapic_read_entry(apic, pin, false); if (entry.irr) { /* Make sure the trigger mode is set to level. */ if (!entry.trigger) { entry.trigger = 1; - __ioapic_write_entry(apic, pin, true, entry); + __ioapic_write_entry(apic, pin, false, entry); } __io_apic_eoi(apic, entry.vector, pin); } @@ -530,9 +530,9 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin) */ memset(&entry, 0, sizeof(entry)); entry.mask = 1; - __ioapic_write_entry(apic, pin, true, entry); + __ioapic_write_entry(apic, pin, false, entry); - entry = __ioapic_read_entry(apic, pin, true); + entry = __ioapic_read_entry(apic, pin, false); if (entry.irr) printk(KERN_ERR "IO-APIC%02x-%u: Unable to reset IRR\n", IO_APIC_ID(apic), pin);
clear_IO_APIC_pin can be called after the iommu has been enabled, and using raw entry reads and writes will result in a misconfiguration of the entries already setup to use the interrupt remapping table. This fixes the following panic seen on AMD Rome boxes: (XEN) [ 10.082154] ENABLING IO-APIC IRQs (XEN) [ 10.087789] -> Using new ACK method (XEN) [ 10.093738] Assertion 'get_rte_index(rte) == offset' failed at iommu_intr.c:328 Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Cc: Juergen Gross <jgross@suse.com> --- xen/arch/x86/io_apic.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)