Message ID | 20211207094427.3473-1-damien.hedde@greensocs.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,for,6.2?] gicv3: fix ICH_MISR's LRENP computation | expand |
On 12/7/21 10:44, Damien Hedde wrote: > According to the "Arm Generic Interrupt Controller Architecture > Specification GIC architecture version 3 and 4" (version G: page 345 > for aarch64 or 509 for aarch32): > LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and > ICH_HCR.EOIcount is non-zero. > > When only LRENPIE was set (and EOI count was zero), the LRENP bit was > wrongly set and MISR value was wrong. > > As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, > the maintenance interrupt was constantly fired. It happens since patch > 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") > which fixed another bug about maintenance interrupt (most significant > bits of misr, including this one, were ignored in the interrupt trigger). > > Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers") This commit predates 6.1 release, so technically this is not a regression for 6.2. > Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> > --- > The gic doc is available here: > https://developer.arm.com/documentation/ihi0069/g > > v2: identical resend because subject screw-up (sorry) > > Thanks, > Damien > --- > hw/intc/arm_gicv3_cpuif.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c > index 7fba931450..85fc369e55 100644 > --- a/hw/intc/arm_gicv3_cpuif.c > +++ b/hw/intc/arm_gicv3_cpuif.c > @@ -351,7 +351,8 @@ static uint32_t maintenance_interrupt_state(GICv3CPUState *cs) > /* Scan list registers and fill in the U, NP and EOI bits */ > eoi_maintenance_interrupt_state(cs, &value); > > - if (cs->ich_hcr_el2 & (ICH_HCR_EL2_LRENPIE | ICH_HCR_EL2_EOICOUNT_MASK)) { > + if ((cs->ich_hcr_el2 & ICH_HCR_EL2_LRENPIE) && > + (cs->ich_hcr_el2 & ICH_HCR_EL2_EOICOUNT_MASK)) { > value |= ICH_MISR_EL2_LRENP; > } > >
On 12/7/21 13:45, Philippe Mathieu-Daudé wrote: > On 12/7/21 10:44, Damien Hedde wrote: >> According to the "Arm Generic Interrupt Controller Architecture >> Specification GIC architecture version 3 and 4" (version G: page 345 >> for aarch64 or 509 for aarch32): >> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and >> ICH_HCR.EOIcount is non-zero. >> >> When only LRENPIE was set (and EOI count was zero), the LRENP bit was >> wrongly set and MISR value was wrong. >> >> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, >> the maintenance interrupt was constantly fired. It happens since patch >> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") >> which fixed another bug about maintenance interrupt (most significant >> bits of misr, including this one, were ignored in the interrupt trigger). >> >> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers") > > This commit predates 6.1 release, so technically this is not > a regression for 6.2. Do you mean "Fixes:" is meant only for regression or simply that this patch should not go for 6.2 ? 9cee1efe92 was introduced after 6.1, and changed the interrupt behavior. Thought I'm not sure if we can consider this as a fix for 9cee1efe92: it only makes the previous error more visible. Damien
On Tue, 7 Dec 2021 at 13:05, Damien Hedde <damien.hedde@greensocs.com> wrote: > On 12/7/21 13:45, Philippe Mathieu-Daudé wrote: > > On 12/7/21 10:44, Damien Hedde wrote: > >> According to the "Arm Generic Interrupt Controller Architecture > >> Specification GIC architecture version 3 and 4" (version G: page 345 > >> for aarch64 or 509 for aarch32): > >> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and > >> ICH_HCR.EOIcount is non-zero. > >> > >> When only LRENPIE was set (and EOI count was zero), the LRENP bit was > >> wrongly set and MISR value was wrong. > >> > >> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, > >> the maintenance interrupt was constantly fired. It happens since patch > >> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") > >> which fixed another bug about maintenance interrupt (most significant > >> bits of misr, including this one, were ignored in the interrupt trigger). > >> > >> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers") > > > > This commit predates 6.1 release, so technically this is not > > a regression for 6.2. > > Do you mean "Fixes:" is meant only for regression or simply that this > patch should not go for 6.2 ? Fixes: is fine in all situations where the commit is fixing a bug that was introduced in the commit hash it mentions. Separately, given where we are in the release cycle, a patch has to hit a very high bar to go into 6.2: at least "this breaks a real world use case that worked fine in 6.1", and probably also "a use case that we expect a fair number of users to be using". -- PMM
On Tue, 7 Dec 2021 at 09:44, Damien Hedde <damien.hedde@greensocs.com> wrote: > > According to the "Arm Generic Interrupt Controller Architecture > Specification GIC architecture version 3 and 4" (version G: page 345 > for aarch64 or 509 for aarch32): > LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and > ICH_HCR.EOIcount is non-zero. > > When only LRENPIE was set (and EOI count was zero), the LRENP bit was > wrongly set and MISR value was wrong. > > As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, > the maintenance interrupt was constantly fired. It happens since patch > 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") > which fixed another bug about maintenance interrupt (most significant > bits of misr, including this one, were ignored in the interrupt trigger). > > Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers") > Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> > --- > The gic doc is available here: > https://developer.arm.com/documentation/ihi0069/g > > v2: identical resend because subject screw-up (sorry) Reviewed-by: Peter Maydell <peter.maydell@linaro.org> I won't try to put this into 6.2 unless you have a common guest that runs into this bug. thanks -- PMM
> -----Original Message----- > From: Qemu-devel <qemu-devel-bounces+bcain=quicinc.com@nongnu.org> > On Behalf Of Peter Maydell ... > On Tue, 7 Dec 2021 at 09:44, Damien Hedde <damien.hedde@greensocs.com> > wrote: > > > > According to the "Arm Generic Interrupt Controller Architecture > > Specification GIC architecture version 3 and 4" (version G: page 345 > > for aarch64 or 509 for aarch32): > > LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and > > ICH_HCR.EOIcount is non-zero. > > > > When only LRENPIE was set (and EOI count was zero), the LRENP bit was > > wrongly set and MISR value was wrong. > > > > As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, > > the maintenance interrupt was constantly fired. It happens since patch > > 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") > > which fixed another bug about maintenance interrupt (most significant > > bits of misr, including this one, were ignored in the interrupt trigger). > > > > Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system > registers") > > Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> > > --- > > The gic doc is available here: > > https://developer.arm.com/documentation/ihi0069/g > > > > v2: identical resend because subject screw-up (sorry) > > Reviewed-by: Peter Maydell <peter.maydell@linaro.org> > > I won't try to put this into 6.2 unless you have a common guest > that runs into this bug. Peter, I know that Qualcomm encounters this issue with its hypervisor (https://github.com/quic/gunyah-hypervisor). Apologies for not being familiar -- "common guest" means multiple guest systems/OSs that encounter the issue? Does that mean that it would not suffice to demonstrate the issue for the one known case? -Brian
On 12/7/21 15:21, Peter Maydell wrote: > On Tue, 7 Dec 2021 at 09:44, Damien Hedde <damien.hedde@greensocs.com> wrote: >> >> According to the "Arm Generic Interrupt Controller Architecture >> Specification GIC architecture version 3 and 4" (version G: page 345 >> for aarch64 or 509 for aarch32): >> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and >> ICH_HCR.EOIcount is non-zero. >> >> When only LRENPIE was set (and EOI count was zero), the LRENP bit was >> wrongly set and MISR value was wrong. >> >> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, >> the maintenance interrupt was constantly fired. It happens since patch >> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") >> which fixed another bug about maintenance interrupt (most significant >> bits of misr, including this one, were ignored in the interrupt trigger). >> >> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers") >> Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> >> --- >> The gic doc is available here: >> https://developer.arm.com/documentation/ihi0069/g >> >> v2: identical resend because subject screw-up (sorry) > > Reviewed-by: Peter Maydell <peter.maydell@linaro.org> > > I won't try to put this into 6.2 unless you have a common guest > that runs into this bug. > > thanks > -- PMM > I don't know if this fit into "common guest" but my use case is: > ./build/qemu-system-aarch64 \ > -machine virt,virtualization=on,gic-version=3,highmem=off \ > -cpu max -m size=4G -smp cpus=8 -nographic \ > -kernel hypvm.elf \ > -device loader,file=Image,addr=0x41080000 \ > -device loader,file=virt_512M.dtb,addr=0x44200000 where Image is a buildroot compiled kernel and hypvm.elf is an hypervisor from qualcomm (https://github.com/quic/gunyah-hypervisor). It boots fine on v6.0 or v6.1 but hangs on master. It's the same hypervisor Brian is talking about. Thanks, Damien
On Tue, 7 Dec 2021 at 15:18, Brian Cain <bcain@quicinc.com> wrote: > Peter Maydell wrote: > > I won't try to put this into 6.2 unless you have a common guest > > that runs into this bug. > I know that Qualcomm encounters this issue with its hypervisor (https://github.com/quic/gunyah-hypervisor). Apologies for not being familiar -- "common guest" means multiple guest systems/OSs that encounter the issue? Does that mean that it would not suffice to demonstrate the issue for the one known case? It means "if you see this with a Linux, BSD etc guest that's more important than if you see this with some oddball thing nobody else is using and whose users aren't as likely to be using release versions of QEMU rather than mainline". The bug is a bug in any case and we'll fix it, it's just a question of whether it meets the bar to go into 6.2, which is hopefully going to have its final RC tagged today. If this patch had arrived a week ago then the bar would have been lower and it would definitely have gone in. As it is I have to weigh up the chances of this change causing a regression for eg KVM running on emulated QEMU. thanks -- PMM
> -----Original Message----- > From: Peter Maydell <peter.maydell@linaro.org> ... > On Tue, 7 Dec 2021 at 15:18, Brian Cain <bcain@quicinc.com> wrote: > > Peter Maydell wrote: > > > I won't try to put this into 6.2 unless you have a common guest > > > that runs into this bug. > > > I know that Qualcomm encounters this issue with its hypervisor > (https://github.com/quic/gunyah-hypervisor). Apologies for not being familiar > -- "common guest" means multiple guest systems/OSs that encounter the > issue? Does that mean that it would not suffice to demonstrate the issue for > the one known case? > > It means "if you see this with a Linux, BSD etc guest that's > more important than if you see this with some oddball thing > nobody else is using and whose users aren't as likely to be > using release versions of QEMU rather than mainline". Ok, gotcha, thanks for the clarification :) > The bug is a bug in any case and we'll fix it, it's just a > question of whether it meets the bar to go into 6.2, which is > hopefully going to have its final RC tagged today. If this > patch had arrived a week ago then the bar would have been > lower and it would definitely have gone in. As it is I have > to weigh up the chances of this change causing a regression > for eg KVM running on emulated QEMU. I understand, and it sounds like the right call. Thanks! -Brian
On Tue, 7 Dec 2021 at 15:24, Peter Maydell <peter.maydell@linaro.org> wrote: > The bug is a bug in any case and we'll fix it, it's just a > question of whether it meets the bar to go into 6.2, which is > hopefully going to have its final RC tagged today. If this > patch had arrived a week ago then the bar would have been > lower and it would definitely have gone in. As it is I have > to weigh up the chances of this change causing a regression > for eg KVM running on emulated QEMU. Looking at the KVM source it doesn't ever set the LRENPIE bit (it doesn't even have a #define for it), which both explains why we didn't notice this bug before and also means we can be pretty certain we're not going to cause a regression for KVM at least if we fix it... -- PMM
On 12/7/21 16:45, Peter Maydell wrote: > On Tue, 7 Dec 2021 at 15:24, Peter Maydell <peter.maydell@linaro.org> wrote: >> The bug is a bug in any case and we'll fix it, it's just a >> question of whether it meets the bar to go into 6.2, which is >> hopefully going to have its final RC tagged today. If this >> patch had arrived a week ago then the bar would have been >> lower and it would definitely have gone in. As it is I have >> to weigh up the chances of this change causing a regression >> for eg KVM running on emulated QEMU. > > Looking at the KVM source it doesn't ever set the LRENPIE > bit (it doesn't even have a #define for it), which both > explains why we didn't notice this bug before and also > means we can be pretty certain we're not going to cause a > regression for KVM at least if we fix it... > > -- PMM > We are perfectly fine with this not going into 6.2. -- Damien
On Tue, 7 Dec 2021 at 15:49, Damien Hedde <damien.hedde@greensocs.com> wrote: > > > > On 12/7/21 16:45, Peter Maydell wrote: > > On Tue, 7 Dec 2021 at 15:24, Peter Maydell <peter.maydell@linaro.org> wrote: > >> The bug is a bug in any case and we'll fix it, it's just a > >> question of whether it meets the bar to go into 6.2, which is > >> hopefully going to have its final RC tagged today. If this > >> patch had arrived a week ago then the bar would have been > >> lower and it would definitely have gone in. As it is I have > >> to weigh up the chances of this change causing a regression > >> for eg KVM running on emulated QEMU. > > > > Looking at the KVM source it doesn't ever set the LRENPIE > > bit (it doesn't even have a #define for it), which both > > explains why we didn't notice this bug before and also > > means we can be pretty certain we're not going to cause a > > regression for KVM at least if we fix it... > We are perfectly fine with this not going into 6.2. I thought about it a bit more, and realized that we could end up giving KVM spurious maintenance interrupts even though it doesn't set the LRENPIE bit, because the incorrect OR meant we'd send a maint irq whenever the EOIcount was nonzero. So we've put this fix in for 6.2. Thanks for the patch and the discussion. -- PMM
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c index 7fba931450..85fc369e55 100644 --- a/hw/intc/arm_gicv3_cpuif.c +++ b/hw/intc/arm_gicv3_cpuif.c @@ -351,7 +351,8 @@ static uint32_t maintenance_interrupt_state(GICv3CPUState *cs) /* Scan list registers and fill in the U, NP and EOI bits */ eoi_maintenance_interrupt_state(cs, &value); - if (cs->ich_hcr_el2 & (ICH_HCR_EL2_LRENPIE | ICH_HCR_EL2_EOICOUNT_MASK)) { + if ((cs->ich_hcr_el2 & ICH_HCR_EL2_LRENPIE) && + (cs->ich_hcr_el2 & ICH_HCR_EL2_EOICOUNT_MASK)) { value |= ICH_MISR_EL2_LRENP; }
According to the "Arm Generic Interrupt Controller Architecture Specification GIC architecture version 3 and 4" (version G: page 345 for aarch64 or 509 for aarch32): LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and ICH_HCR.EOIcount is non-zero. When only LRENPIE was set (and EOI count was zero), the LRENP bit was wrongly set and MISR value was wrong. As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE, the maintenance interrupt was constantly fired. It happens since patch 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1") which fixed another bug about maintenance interrupt (most significant bits of misr, including this one, were ignored in the interrupt trigger). Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers") Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> --- The gic doc is available here: https://developer.arm.com/documentation/ihi0069/g v2: identical resend because subject screw-up (sorry) Thanks, Damien --- hw/intc/arm_gicv3_cpuif.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)