Message ID | 20210115142820.35224-4-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/intr: guest interrupt handling fixes/cleanup | expand |
On 15.01.2021 15:28, Roger Pau Monne wrote: > When pins are cleared from either ISR or IRR as part of the > initialization sequence forward the clearing of those pins to the dpci > EOI handler, as it is equivalent to an EOI. Not doing so can bring the > interrupt controller state out of sync with the dpci handling logic, > that expects a notification when a pin has been EOI'ed. The question though is what this clearing of ISR and some of IRR during ICW1 is based upon: Going through various manuals (up-to-date from Nov 2020, intermediate, and all the way through to an old hard copy from 1993) I can't find a single mention of ICW1 having any effect on ISR or IRR, despite both soft copy ones being detailed about other state getting changed. There is "Following initialization, an interrupt request (IRQ) input must make a low-to-high transition to generate an interrupt", but I'm afraid this can be read to mean different things. And if it was meant to describe an effect on ISR and/or IRR, it would imo be in conflict with us keeping IRR bits of level triggered interrupts. > @@ -217,6 +219,24 @@ static void vpic_ioport_write( > } > > vpic->init_state = ((val & 3) << 2) | 1; > + vpic_update_int_output(vpic); > + vpic_unlock(vpic); > + > + /* > + * Forward the EOI of any pending or in service interrupt that has > + * been cleared from IRR or ISR, or else the dpci logic will get > + * out of sync with the state of the interrupt controller. > + */ > + while ( pending ) > + { > + unsigned int pin = __scanbit(pending, 8); > + > + ASSERT(pin < 8); > + hvm_dpci_eoi(current->domain, > + hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); > + __clear_bit(pin, &pending); > + } > + goto unmask; A similar consideration applies here (albeit just as an observation, as being orthogonal to your change): A PIC becomes ready for handling interrupts only at the end of the ICWx sequence. Hence it would seem to me that invoking pt_may_unmask_irq() (maybe also vpic_update_int_output()) at ICW1 is premature. To me this seems particularly relevant considering that ICW1 clears IMR, and hence an interrupt becoming pending between ICW1 and ICW2 wouldn't know which vector to use. Or maybe on that side of things, vpic_update_int_output() should really do if ( vpic->int_output == (!vpic->init_state && irq >= 0) ) return; /* INT line transition L->H or H->L. */ vpic->int_output = !vpic->init_state && !vpic->int_output; ? Jan
On Fri, Jan 22, 2021 at 10:02:15AM +0100, Jan Beulich wrote: > On 15.01.2021 15:28, Roger Pau Monne wrote: > > When pins are cleared from either ISR or IRR as part of the > > initialization sequence forward the clearing of those pins to the dpci > > EOI handler, as it is equivalent to an EOI. Not doing so can bring the > > interrupt controller state out of sync with the dpci handling logic, > > that expects a notification when a pin has been EOI'ed. > > The question though is what this clearing of ISR and some of > IRR during ICW1 is based upon: Going through various manuals > (up-to-date from Nov 2020, intermediate, and all the way > through to an old hard copy from 1993) I can't find a single > mention of ICW1 having any effect on ISR or IRR, despite both > soft copy ones being detailed about other state getting > changed. > > There is "Following initialization, an interrupt request (IRQ) > input must make a low-to-high transition to generate an > interrupt", but I'm afraid this can be read to mean different > things. And if it was meant to describe an effect on ISR > and/or IRR, it would imo be in conflict with us keeping IRR > bits of level triggered interrupts. I have an old copy of the 8259A spec, and it does mention the same quote that you have above. I also wondered while I was adjusting this code whether what we do is fine. I have to admit I haven't considered changing this logic much because I don't have an effective way to test it. I've also taken a look at what others do, QEMU for example will do exactly the same logic as Xen during ICW1, bhyve OTOH will fully zero IRR and leave ISR as is. Doing a bit of archaeology in QEMU I've found the following change: commit aa24822bdc7c4e74afbc6fa1324b01cf067da7cb Author: Jan Kiszka <jan.kiszka@siemens.com> Date: Tue Jan 24 16:29:29 2012 +0100 i8259: Do not clear level-triggered lines in IRR on init When an input line is handled as level-triggered, it will immediately raise an IRQ on the output of a PIC again that goes through an init reset. So only clear the edge-triggered inputs from IRR in that scenario. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Which seems to point out there's a reasoning behind what it's currently implemented. This seems to be against the spec as there's no low-to-high transition? > > @@ -217,6 +219,24 @@ static void vpic_ioport_write( > > } > > > > vpic->init_state = ((val & 3) << 2) | 1; > > + vpic_update_int_output(vpic); > > + vpic_unlock(vpic); > > + > > + /* > > + * Forward the EOI of any pending or in service interrupt that has > > + * been cleared from IRR or ISR, or else the dpci logic will get > > + * out of sync with the state of the interrupt controller. > > + */ > > + while ( pending ) > > + { > > + unsigned int pin = __scanbit(pending, 8); > > + > > + ASSERT(pin < 8); > > + hvm_dpci_eoi(current->domain, > > + hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); > > + __clear_bit(pin, &pending); > > + } > > + goto unmask; > > A similar consideration applies here (albeit just as an > observation, as being orthogonal to your change): A PIC > becomes ready for handling interrupts only at the end of the > ICWx sequence. Hence it would seem to me that invoking > pt_may_unmask_irq() Right, it might be best to force unmask = 1 when init_state gets set to 0. A spurious call to pt_may_unmask_irq won't be harmful anyway, even if no pins have been actually unmasked. > (maybe also vpic_update_int_output()) at > ICW1 is premature. To me this seems particularly relevant > considering that ICW1 clears IMR, and hence an interrupt > becoming pending between ICW1 and ICW2 wouldn't know which > vector to use. > > Or maybe on that side of things, vpic_update_int_output() > should really do > > if ( vpic->int_output == (!vpic->init_state && irq >= 0) ) > return; > > /* INT line transition L->H or H->L. */ > vpic->int_output = !vpic->init_state && !vpic->int_output; > > ? So to force int_output = false when in init state. Seems reasonable, I can implement this either as a pre-patch or a followup, but again I'm not sure I have means to properly test it. Thanks, Roger.
On 22.01.2021 10:53, Roger Pau Monné wrote: > On Fri, Jan 22, 2021 at 10:02:15AM +0100, Jan Beulich wrote: >> On 15.01.2021 15:28, Roger Pau Monne wrote: >>> When pins are cleared from either ISR or IRR as part of the >>> initialization sequence forward the clearing of those pins to the dpci >>> EOI handler, as it is equivalent to an EOI. Not doing so can bring the >>> interrupt controller state out of sync with the dpci handling logic, >>> that expects a notification when a pin has been EOI'ed. >> >> The question though is what this clearing of ISR and some of >> IRR during ICW1 is based upon: Going through various manuals >> (up-to-date from Nov 2020, intermediate, and all the way >> through to an old hard copy from 1993) I can't find a single >> mention of ICW1 having any effect on ISR or IRR, despite both >> soft copy ones being detailed about other state getting >> changed. >> >> There is "Following initialization, an interrupt request (IRQ) >> input must make a low-to-high transition to generate an >> interrupt", but I'm afraid this can be read to mean different >> things. And if it was meant to describe an effect on ISR >> and/or IRR, it would imo be in conflict with us keeping IRR >> bits of level triggered interrupts. > > I have an old copy of the 8259A spec, and it does mention the same > quote that you have above. I also wondered while I was adjusting this > code whether what we do is fine. I have to admit I haven't considered > changing this logic much because I don't have an effective way to test > it. > > I've also taken a look at what others do, QEMU for example will do > exactly the same logic as Xen during ICW1, bhyve OTOH will fully zero > IRR and leave ISR as is. > > Doing a bit of archaeology in QEMU I've found the following change: > > commit aa24822bdc7c4e74afbc6fa1324b01cf067da7cb > Author: Jan Kiszka <jan.kiszka@siemens.com> > Date: Tue Jan 24 16:29:29 2012 +0100 > > i8259: Do not clear level-triggered lines in IRR on init > > When an input line is handled as level-triggered, it will immediately > raise an IRQ on the output of a PIC again that goes through an init > reset. So only clear the edge-triggered inputs from IRR in that > scenario. > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > > Which seems to point out there's a reasoning behind what it's > currently implemented. This seems to be against the spec as there's no > low-to-high transition? Would seem so to me, yes. Knowing how hard it was back at the time to find any doc on the 8259A at all, and knowing that the doc that initially became available wasn't very complete, I could very well imagine the present doc still being incomplete. Let me see if I can get something out of Intel. >>> @@ -217,6 +219,24 @@ static void vpic_ioport_write( >>> } >>> >>> vpic->init_state = ((val & 3) << 2) | 1; >>> + vpic_update_int_output(vpic); >>> + vpic_unlock(vpic); >>> + >>> + /* >>> + * Forward the EOI of any pending or in service interrupt that has >>> + * been cleared from IRR or ISR, or else the dpci logic will get >>> + * out of sync with the state of the interrupt controller. >>> + */ >>> + while ( pending ) >>> + { >>> + unsigned int pin = __scanbit(pending, 8); >>> + >>> + ASSERT(pin < 8); >>> + hvm_dpci_eoi(current->domain, >>> + hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); >>> + __clear_bit(pin, &pending); >>> + } >>> + goto unmask; >> >> A similar consideration applies here (albeit just as an >> observation, as being orthogonal to your change): A PIC >> becomes ready for handling interrupts only at the end of the >> ICWx sequence. Hence it would seem to me that invoking >> pt_may_unmask_irq() > > Right, it might be best to force unmask = 1 when init_state gets set > to 0. A spurious call to pt_may_unmask_irq won't be harmful anyway, > even if no pins have been actually unmasked. Right, that's what I was considering too. (Really I was thinking to propagate the "unmasked" value from ICW1 processing to when init_state gets cleared, but you're right about a stray call to pt_may_unmask_irq() not doing any harm.) >> (maybe also vpic_update_int_output()) at >> ICW1 is premature. To me this seems particularly relevant >> considering that ICW1 clears IMR, and hence an interrupt >> becoming pending between ICW1 and ICW2 wouldn't know which >> vector to use. >> >> Or maybe on that side of things, vpic_update_int_output() >> should really do >> >> if ( vpic->int_output == (!vpic->init_state && irq >= 0) ) >> return; >> >> /* INT line transition L->H or H->L. */ >> vpic->int_output = !vpic->init_state && !vpic->int_output; >> >> ? > > So to force int_output = false when in init state. Seems reasonable, I > can implement this either as a pre-patch or a followup, but again I'm > not sure I have means to properly test it. That testing consideration goes as well for the correctness of what we have right now. It functions for the very limited set of modes typical OSes use, and likely nobody really knows whether it would also function correctly when used in "exotic" ways. As to pre-patch or follow-up: For this latter aspect it probably doesn't matter much. If any adjustment is made affecting the code further up, it may want to be a pre-patch though, as it may change how you want to exit (e.g. I'd expect the "goto" to possibly go away). Jan
diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c index c1c1de7fd0..522cacdc4b 100644 --- a/xen/arch/x86/hvm/vpic.c +++ b/xen/arch/x86/hvm/vpic.c @@ -193,6 +193,8 @@ static void vpic_ioport_write( { if ( val & 0x10 ) { + unsigned int pending = vpic->isr | (vpic->irr & ~vpic->elcr); + /* ICW1 */ /* Clear edge-sensing logic. */ vpic->irr &= vpic->elcr; @@ -217,6 +219,24 @@ static void vpic_ioport_write( } vpic->init_state = ((val & 3) << 2) | 1; + vpic_update_int_output(vpic); + vpic_unlock(vpic); + + /* + * Forward the EOI of any pending or in service interrupt that has + * been cleared from IRR or ISR, or else the dpci logic will get + * out of sync with the state of the interrupt controller. + */ + while ( pending ) + { + unsigned int pin = __scanbit(pending, 8); + + ASSERT(pin < 8); + hvm_dpci_eoi(current->domain, + hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin)); + __clear_bit(pin, &pending); + } + goto unmask; } else if ( val & 0x08 ) { @@ -306,6 +326,7 @@ static void vpic_ioport_write( vpic_unlock(vpic); + unmask: if ( unmasked ) pt_may_unmask_irq(vpic_domain(vpic), NULL); }
When pins are cleared from either ISR or IRR as part of the initialization sequence forward the clearing of those pins to the dpci EOI handler, as it is equivalent to an EOI. Not doing so can bring the interrupt controller state out of sync with the dpci handling logic, that expects a notification when a pin has been EOI'ed. Fixes: 7b3cb5e5416 ('IRQ injection changes for HVM PCI passthru.') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/hvm/vpic.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)