Message ID | 160554086275.1325084.12110142252189044646.stgit@bahia.lan (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-5.2] Revert series "spapr/xive: Allocate vCPU IPIs from the vCPU contexts" | expand |
On 11/16/20 4:34 PM, Greg Kurz wrote: > This series was largely built on the assumption that IPI numbers are > numerically equal to vCPU ids. Even if this happens to be the case > in practice with the default machine settings, this ceases to be true > if VSMT is set to a different value than the number of vCPUs per core. > This causes bogus IPI numbers to be created in KVM from a guest stand > point. This leads to unknow results in the guest, including crashes > or missing vCPUs (see BugLink) and even non-fatal oopses in current > KVM that lacks a check before accessing misconfigured HW (see [1]). > > A tentative patch was sent (see [2]) but it seems too complex to be > merged in an RC. Since the original changes are essentially an > optimization, it seems safer to revert them for now. The damage is > done by commit acbdb9956fe9 ("spapr/xive: Allocate IPIs independently > from the other sources") but the previous patches in the series are > really preparatory patches. So this reverts the whole series: > > eab0a2d06e97 ("spapr/xive: Allocate vCPU IPIs from the vCPU contexts") > acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") These are introducing the optimisation to allocate the vCPU IPI from the running task, and, at the same time, the issue for guests using vSMT. > fa94447a2cd6 ("spapr/xive: Use kvmppc_xive_source_reset() in post_load") > 235d3b116213 ("spapr/xive: Modify kvm_cpu_is_enabled() interface") IMO, these two last patches are fine. C. > [1] https://marc.info/?l=kvm-ppc&m=160458409722959&w=4 > [2] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03626.html > > Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> > Fixes: acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") > BugLink: https://bugs.launchpad.net/qemu/+bug/1900241 > Signed-off-by: Greg Kurz <groug@kaod.org> > --- > > Peter, > > I'm Cc'ing you because we really want to fix this regression in 5.2. > Reverting the culprit optimization seems a lot safer than the changes > proposed in my other patch. David is on PTO right now and I'm not sure > if he can post a PR anytime soon. Just in case: would it be acceptable > to you if I send a PR after it got some positive feedback from the > people on the Cc: list ? The better the sooner or do we wait for David > to get a chance to step in ? > --- > hw/intc/spapr_xive_kvm.c | 102 ++++++++-------------------------------------- > 1 file changed, 18 insertions(+), 84 deletions(-) > > diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c > index 66bf4c06fe55..e8667ce5f621 100644 > --- a/hw/intc/spapr_xive_kvm.c > +++ b/hw/intc/spapr_xive_kvm.c > @@ -36,9 +36,10 @@ typedef struct KVMEnabledCPU { > static QLIST_HEAD(, KVMEnabledCPU) > kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus); > > -static bool kvm_cpu_is_enabled(unsigned long vcpu_id) > +static bool kvm_cpu_is_enabled(CPUState *cs) > { > KVMEnabledCPU *enabled_cpu; > + unsigned long vcpu_id = kvm_arch_vcpu_id(cs); > > QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) { > if (enabled_cpu->vcpu_id == vcpu_id) { > @@ -146,45 +147,6 @@ int kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp) > return s.ret; > } > > -/* > - * Allocate the vCPU IPIs from the vCPU context. This will allocate > - * the XIVE IPI interrupt on the chip on which the vCPU is running. > - * This gives a better distribution of IPIs when the guest has a lot > - * of vCPUs. When the vCPUs are pinned, this will make the IPI local > - * to the chip of the vCPU. It will reduce rerouting between interrupt > - * controllers and gives better performance. > - */ > -typedef struct { > - SpaprXive *xive; > - Error *err; > - int rc; > -} XiveInitIPI; > - > -static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg) > -{ > - unsigned long ipi = kvm_arch_vcpu_id(cs); > - XiveInitIPI *s = arg.host_ptr; > - uint64_t state = 0; > - > - s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, ipi, > - &state, true, &s->err); > -} > - > -static int kvmppc_xive_reset_ipi(SpaprXive *xive, CPUState *cs, Error **errp) > -{ > - XiveInitIPI s = { > - .xive = xive, > - .err = NULL, > - .rc = 0, > - }; > - > - run_on_cpu(cs, kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s)); > - if (s.err) { > - error_propagate(errp, s.err); > - } > - return s.rc; > -} > - > int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) > { > ERRP_GUARD(); > @@ -195,7 +157,7 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) > assert(xive->fd != -1); > > /* Check if CPU was hot unplugged and replugged. */ > - if (kvm_cpu_is_enabled(kvm_arch_vcpu_id(tctx->cs))) { > + if (kvm_cpu_is_enabled(tctx->cs)) { > return 0; > } > > @@ -214,12 +176,6 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) > return ret; > } > > - /* Create/reset the vCPU IPI */ > - ret = kvmppc_xive_reset_ipi(xive, tctx->cs, errp); > - if (ret < 0) { > - return ret; > - } > - > kvm_cpu_enable(tctx->cs); > return 0; > } > @@ -279,12 +235,6 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) > > assert(xive->fd != -1); > > - /* > - * The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() > - * and not with all sources in kvmppc_xive_source_reset() > - */ > - assert(srcno >= SPAPR_XIRQ_BASE); > - > if (xive_source_irq_is_lsi(xsrc, srcno)) { > state |= KVM_XIVE_LEVEL_SENSITIVE; > if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) { > @@ -296,28 +246,12 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) > true, errp); > } > > -/* > - * To be valid, a source must have been claimed by the machine (valid > - * entry in the EAS table) and if it is a vCPU IPI, the vCPU should > - * have been enabled, which means the IPI has been allocated in > - * kvmppc_xive_cpu_connect(). > - */ > -static bool xive_source_is_valid(SpaprXive *xive, int i) > -{ > - return xive_eas_is_valid(&xive->eat[i]) && > - (i >= SPAPR_XIRQ_BASE || kvm_cpu_is_enabled(i)); > -} > - > static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp) > { > SpaprXive *xive = SPAPR_XIVE(xsrc->xive); > int i; > > - /* > - * Skip the vCPU IPIs. These are created/reset when the vCPUs are > - * connected in kvmppc_xive_cpu_connect() > - */ > - for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) { > + for (i = 0; i < xsrc->nr_irqs; i++) { > int ret; > > if (!xive_eas_is_valid(&xive->eat[i])) { > @@ -399,7 +333,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc) > for (i = 0; i < xsrc->nr_irqs; i++) { > uint8_t pq; > > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > @@ -582,7 +516,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, > uint8_t pq; > uint8_t old_pq; > > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > @@ -610,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, > for (i = 0; i < xsrc->nr_irqs; i++) { > uint8_t pq; > > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > @@ -713,22 +647,22 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id) > } > } > > - /* > - * We can only restore the source config if the source has been > - * previously set in KVM. Since we don't do that at reset time > - * when restoring a VM, let's do it now. > - */ > - ret = kvmppc_xive_source_reset(&xive->source, &local_err); > - if (ret < 0) { > - goto fail; > - } > - > /* Restore the EAT */ > for (i = 0; i < xive->nr_irqs; i++) { > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > + /* > + * We can only restore the source config if the source has been > + * previously set in KVM. Since we don't do that for all interrupts > + * at reset time anymore, let's do it now. > + */ > + ret = kvmppc_xive_source_reset_one(&xive->source, i, &local_err); > + if (ret < 0) { > + goto fail; > + } > + > ret = kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err); > if (ret < 0) { > goto fail; > >
On Mon, 16 Nov 2020 16:54:32 +0100 Cédric Le Goater <clg@kaod.org> wrote: > On 11/16/20 4:34 PM, Greg Kurz wrote: > > This series was largely built on the assumption that IPI numbers are > > numerically equal to vCPU ids. Even if this happens to be the case > > in practice with the default machine settings, this ceases to be true > > if VSMT is set to a different value than the number of vCPUs per core. > > This causes bogus IPI numbers to be created in KVM from a guest stand > > point. This leads to unknow results in the guest, including crashes > > or missing vCPUs (see BugLink) and even non-fatal oopses in current > > KVM that lacks a check before accessing misconfigured HW (see [1]). > > > > A tentative patch was sent (see [2]) but it seems too complex to be > > merged in an RC. Since the original changes are essentially an > > optimization, it seems safer to revert them for now. The damage is > > done by commit acbdb9956fe9 ("spapr/xive: Allocate IPIs independently > > from the other sources") but the previous patches in the series are > > really preparatory patches. So this reverts the whole series: > > > > eab0a2d06e97 ("spapr/xive: Allocate vCPU IPIs from the vCPU contexts") > > acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") > > These are introducing the optimisation to allocate the vCPU IPI from the > running task, and, at the same time, the issue for guests using vSMT. > > > fa94447a2cd6 ("spapr/xive: Use kvmppc_xive_source_reset() in post_load") > > 235d3b116213 ("spapr/xive: Modify kvm_cpu_is_enabled() interface") > > IMO, these two last patches are fine. > 235d3b116213 is useless if you no longer want to feed kvm_cpu_is_enabled() with IPI numbers ;-) , so it seems safer to keep it taking a CPU state pointer. Keeping fa94447a2cd6 without 235d3b116213 and fa94447a2cd6 wouldn't make a lot of sense since the next try at implementing the optimization will likely result in a different set of changes. It would certainly be more beneficial to get the feature with a brand new series IMHO. Cheers, -- Greg > C. > >
On Mon, Nov 16, 2020 at 04:34:22PM +0100, Greg Kurz wrote: > This series was largely built on the assumption that IPI numbers are > numerically equal to vCPU ids. Even if this happens to be the case > in practice with the default machine settings, this ceases to be true > if VSMT is set to a different value than the number of vCPUs per core. > This causes bogus IPI numbers to be created in KVM from a guest stand > point. This leads to unknow results in the guest, including crashes > or missing vCPUs (see BugLink) and even non-fatal oopses in current > KVM that lacks a check before accessing misconfigured HW (see [1]). > > A tentative patch was sent (see [2]) but it seems too complex to be > merged in an RC. Since the original changes are essentially an > optimization, it seems safer to revert them for now. The damage is > done by commit acbdb9956fe9 ("spapr/xive: Allocate IPIs independently > from the other sources") but the previous patches in the series are > really preparatory patches. So this reverts the whole series: > > eab0a2d06e97 ("spapr/xive: Allocate vCPU IPIs from the vCPU contexts") > acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") > fa94447a2cd6 ("spapr/xive: Use kvmppc_xive_source_reset() in post_load") > 235d3b116213 ("spapr/xive: Modify kvm_cpu_is_enabled() interface") > > [1] https://marc.info/?l=kvm-ppc&m=160458409722959&w=4 > [2] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03626.html > > Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> > Fixes: acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") > BugLink: https://bugs.launchpad.net/qemu/+bug/1900241 > Signed-off-by: Greg Kurz <groug@kaod.org> > --- > > Peter, > > I'm Cc'ing you because we really want to fix this regression in 5.2. > Reverting the culprit optimization seems a lot safer than the changes > proposed in my other patch. David is on PTO right now and I'm not sure > if he can post a PR anytime soon. Just in case: would it be acceptable > to you if I send a PR after it got some positive feedback from the > people on the Cc: list ? The better the sooner or do we wait for David > to get a chance to step in ? I am indeed on holiday, and I'm not going to review this until next week. I trust Greg's judgement, though, so I'm happy for this to be applied directly. > --- > hw/intc/spapr_xive_kvm.c | 102 ++++++++-------------------------------------- > 1 file changed, 18 insertions(+), 84 deletions(-) > > diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c > index 66bf4c06fe55..e8667ce5f621 100644 > --- a/hw/intc/spapr_xive_kvm.c > +++ b/hw/intc/spapr_xive_kvm.c > @@ -36,9 +36,10 @@ typedef struct KVMEnabledCPU { > static QLIST_HEAD(, KVMEnabledCPU) > kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus); > > -static bool kvm_cpu_is_enabled(unsigned long vcpu_id) > +static bool kvm_cpu_is_enabled(CPUState *cs) > { > KVMEnabledCPU *enabled_cpu; > + unsigned long vcpu_id = kvm_arch_vcpu_id(cs); > > QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) { > if (enabled_cpu->vcpu_id == vcpu_id) { > @@ -146,45 +147,6 @@ int kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp) > return s.ret; > } > > -/* > - * Allocate the vCPU IPIs from the vCPU context. This will allocate > - * the XIVE IPI interrupt on the chip on which the vCPU is running. > - * This gives a better distribution of IPIs when the guest has a lot > - * of vCPUs. When the vCPUs are pinned, this will make the IPI local > - * to the chip of the vCPU. It will reduce rerouting between interrupt > - * controllers and gives better performance. > - */ > -typedef struct { > - SpaprXive *xive; > - Error *err; > - int rc; > -} XiveInitIPI; > - > -static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg) > -{ > - unsigned long ipi = kvm_arch_vcpu_id(cs); > - XiveInitIPI *s = arg.host_ptr; > - uint64_t state = 0; > - > - s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, ipi, > - &state, true, &s->err); > -} > - > -static int kvmppc_xive_reset_ipi(SpaprXive *xive, CPUState *cs, Error **errp) > -{ > - XiveInitIPI s = { > - .xive = xive, > - .err = NULL, > - .rc = 0, > - }; > - > - run_on_cpu(cs, kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s)); > - if (s.err) { > - error_propagate(errp, s.err); > - } > - return s.rc; > -} > - > int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) > { > ERRP_GUARD(); > @@ -195,7 +157,7 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) > assert(xive->fd != -1); > > /* Check if CPU was hot unplugged and replugged. */ > - if (kvm_cpu_is_enabled(kvm_arch_vcpu_id(tctx->cs))) { > + if (kvm_cpu_is_enabled(tctx->cs)) { > return 0; > } > > @@ -214,12 +176,6 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) > return ret; > } > > - /* Create/reset the vCPU IPI */ > - ret = kvmppc_xive_reset_ipi(xive, tctx->cs, errp); > - if (ret < 0) { > - return ret; > - } > - > kvm_cpu_enable(tctx->cs); > return 0; > } > @@ -279,12 +235,6 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) > > assert(xive->fd != -1); > > - /* > - * The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() > - * and not with all sources in kvmppc_xive_source_reset() > - */ > - assert(srcno >= SPAPR_XIRQ_BASE); > - > if (xive_source_irq_is_lsi(xsrc, srcno)) { > state |= KVM_XIVE_LEVEL_SENSITIVE; > if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) { > @@ -296,28 +246,12 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) > true, errp); > } > > -/* > - * To be valid, a source must have been claimed by the machine (valid > - * entry in the EAS table) and if it is a vCPU IPI, the vCPU should > - * have been enabled, which means the IPI has been allocated in > - * kvmppc_xive_cpu_connect(). > - */ > -static bool xive_source_is_valid(SpaprXive *xive, int i) > -{ > - return xive_eas_is_valid(&xive->eat[i]) && > - (i >= SPAPR_XIRQ_BASE || kvm_cpu_is_enabled(i)); > -} > - > static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp) > { > SpaprXive *xive = SPAPR_XIVE(xsrc->xive); > int i; > > - /* > - * Skip the vCPU IPIs. These are created/reset when the vCPUs are > - * connected in kvmppc_xive_cpu_connect() > - */ > - for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) { > + for (i = 0; i < xsrc->nr_irqs; i++) { > int ret; > > if (!xive_eas_is_valid(&xive->eat[i])) { > @@ -399,7 +333,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc) > for (i = 0; i < xsrc->nr_irqs; i++) { > uint8_t pq; > > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > @@ -582,7 +516,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, > uint8_t pq; > uint8_t old_pq; > > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > @@ -610,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, > for (i = 0; i < xsrc->nr_irqs; i++) { > uint8_t pq; > > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > @@ -713,22 +647,22 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id) > } > } > > - /* > - * We can only restore the source config if the source has been > - * previously set in KVM. Since we don't do that at reset time > - * when restoring a VM, let's do it now. > - */ > - ret = kvmppc_xive_source_reset(&xive->source, &local_err); > - if (ret < 0) { > - goto fail; > - } > - > /* Restore the EAT */ > for (i = 0; i < xive->nr_irqs; i++) { > - if (!xive_source_is_valid(xive, i)) { > + if (!xive_eas_is_valid(&xive->eat[i])) { > continue; > } > > + /* > + * We can only restore the source config if the source has been > + * previously set in KVM. Since we don't do that for all interrupts > + * at reset time anymore, let's do it now. > + */ > + ret = kvmppc_xive_source_reset_one(&xive->source, i, &local_err); > + if (ret < 0) { > + goto fail; > + } > + > ret = kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err); > if (ret < 0) { > goto fail; > >
On 11/18/20 6:24 AM, David Gibson wrote: > On Mon, Nov 16, 2020 at 04:34:22PM +0100, Greg Kurz wrote: >> This series was largely built on the assumption that IPI numbers are >> numerically equal to vCPU ids. Even if this happens to be the case >> in practice with the default machine settings, this ceases to be true >> if VSMT is set to a different value than the number of vCPUs per core. >> This causes bogus IPI numbers to be created in KVM from a guest stand >> point. This leads to unknow results in the guest, including crashes >> or missing vCPUs (see BugLink) and even non-fatal oopses in current >> KVM that lacks a check before accessing misconfigured HW (see [1]). >> >> A tentative patch was sent (see [2]) but it seems too complex to be >> merged in an RC. Since the original changes are essentially an >> optimization, it seems safer to revert them for now. The damage is >> done by commit acbdb9956fe9 ("spapr/xive: Allocate IPIs independently >> from the other sources") but the previous patches in the series are >> really preparatory patches. So this reverts the whole series: >> >> eab0a2d06e97 ("spapr/xive: Allocate vCPU IPIs from the vCPU contexts") >> acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") >> fa94447a2cd6 ("spapr/xive: Use kvmppc_xive_source_reset() in post_load") >> 235d3b116213 ("spapr/xive: Modify kvm_cpu_is_enabled() interface") >> >> [1] https://marc.info/?l=kvm-ppc&m=160458409722959&w=4 >> [2] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03626.html >> >> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> >> Fixes: acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") >> BugLink: https://bugs.launchpad.net/qemu/+bug/1900241 >> Signed-off-by: Greg Kurz <groug@kaod.org> >> --- >> >> Peter, >> >> I'm Cc'ing you because we really want to fix this regression in 5.2. >> Reverting the culprit optimization seems a lot safer than the changes >> proposed in my other patch. David is on PTO right now and I'm not sure >> if he can post a PR anytime soon. Just in case: would it be acceptable >> to you if I send a PR after it got some positive feedback from the >> people on the Cc: list ? The better the sooner or do we wait for David >> to get a chance to step in ? > > I am indeed on holiday, and I'm not going to review this until next > week. I trust Greg's judgement, though, so I'm happy for this to be > applied directly. Acked-by: Cédric Le Goater <clg@kaod.org> C. > > >> --- >> hw/intc/spapr_xive_kvm.c | 102 ++++++++-------------------------------------- >> 1 file changed, 18 insertions(+), 84 deletions(-) >> >> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c >> index 66bf4c06fe55..e8667ce5f621 100644 >> --- a/hw/intc/spapr_xive_kvm.c >> +++ b/hw/intc/spapr_xive_kvm.c >> @@ -36,9 +36,10 @@ typedef struct KVMEnabledCPU { >> static QLIST_HEAD(, KVMEnabledCPU) >> kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus); >> >> -static bool kvm_cpu_is_enabled(unsigned long vcpu_id) >> +static bool kvm_cpu_is_enabled(CPUState *cs) >> { >> KVMEnabledCPU *enabled_cpu; >> + unsigned long vcpu_id = kvm_arch_vcpu_id(cs); >> >> QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) { >> if (enabled_cpu->vcpu_id == vcpu_id) { >> @@ -146,45 +147,6 @@ int kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp) >> return s.ret; >> } >> >> -/* >> - * Allocate the vCPU IPIs from the vCPU context. This will allocate >> - * the XIVE IPI interrupt on the chip on which the vCPU is running. >> - * This gives a better distribution of IPIs when the guest has a lot >> - * of vCPUs. When the vCPUs are pinned, this will make the IPI local >> - * to the chip of the vCPU. It will reduce rerouting between interrupt >> - * controllers and gives better performance. >> - */ >> -typedef struct { >> - SpaprXive *xive; >> - Error *err; >> - int rc; >> -} XiveInitIPI; >> - >> -static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg) >> -{ >> - unsigned long ipi = kvm_arch_vcpu_id(cs); >> - XiveInitIPI *s = arg.host_ptr; >> - uint64_t state = 0; >> - >> - s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, ipi, >> - &state, true, &s->err); >> -} >> - >> -static int kvmppc_xive_reset_ipi(SpaprXive *xive, CPUState *cs, Error **errp) >> -{ >> - XiveInitIPI s = { >> - .xive = xive, >> - .err = NULL, >> - .rc = 0, >> - }; >> - >> - run_on_cpu(cs, kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s)); >> - if (s.err) { >> - error_propagate(errp, s.err); >> - } >> - return s.rc; >> -} >> - >> int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) >> { >> ERRP_GUARD(); >> @@ -195,7 +157,7 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) >> assert(xive->fd != -1); >> >> /* Check if CPU was hot unplugged and replugged. */ >> - if (kvm_cpu_is_enabled(kvm_arch_vcpu_id(tctx->cs))) { >> + if (kvm_cpu_is_enabled(tctx->cs)) { >> return 0; >> } >> >> @@ -214,12 +176,6 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) >> return ret; >> } >> >> - /* Create/reset the vCPU IPI */ >> - ret = kvmppc_xive_reset_ipi(xive, tctx->cs, errp); >> - if (ret < 0) { >> - return ret; >> - } >> - >> kvm_cpu_enable(tctx->cs); >> return 0; >> } >> @@ -279,12 +235,6 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) >> >> assert(xive->fd != -1); >> >> - /* >> - * The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() >> - * and not with all sources in kvmppc_xive_source_reset() >> - */ >> - assert(srcno >= SPAPR_XIRQ_BASE); >> - >> if (xive_source_irq_is_lsi(xsrc, srcno)) { >> state |= KVM_XIVE_LEVEL_SENSITIVE; >> if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) { >> @@ -296,28 +246,12 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) >> true, errp); >> } >> >> -/* >> - * To be valid, a source must have been claimed by the machine (valid >> - * entry in the EAS table) and if it is a vCPU IPI, the vCPU should >> - * have been enabled, which means the IPI has been allocated in >> - * kvmppc_xive_cpu_connect(). >> - */ >> -static bool xive_source_is_valid(SpaprXive *xive, int i) >> -{ >> - return xive_eas_is_valid(&xive->eat[i]) && >> - (i >= SPAPR_XIRQ_BASE || kvm_cpu_is_enabled(i)); >> -} >> - >> static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp) >> { >> SpaprXive *xive = SPAPR_XIVE(xsrc->xive); >> int i; >> >> - /* >> - * Skip the vCPU IPIs. These are created/reset when the vCPUs are >> - * connected in kvmppc_xive_cpu_connect() >> - */ >> - for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) { >> + for (i = 0; i < xsrc->nr_irqs; i++) { >> int ret; >> >> if (!xive_eas_is_valid(&xive->eat[i])) { >> @@ -399,7 +333,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc) >> for (i = 0; i < xsrc->nr_irqs; i++) { >> uint8_t pq; >> >> - if (!xive_source_is_valid(xive, i)) { >> + if (!xive_eas_is_valid(&xive->eat[i])) { >> continue; >> } >> >> @@ -582,7 +516,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, >> uint8_t pq; >> uint8_t old_pq; >> >> - if (!xive_source_is_valid(xive, i)) { >> + if (!xive_eas_is_valid(&xive->eat[i])) { >> continue; >> } >> >> @@ -610,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, >> for (i = 0; i < xsrc->nr_irqs; i++) { >> uint8_t pq; >> >> - if (!xive_source_is_valid(xive, i)) { >> + if (!xive_eas_is_valid(&xive->eat[i])) { >> continue; >> } >> >> @@ -713,22 +647,22 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id) >> } >> } >> >> - /* >> - * We can only restore the source config if the source has been >> - * previously set in KVM. Since we don't do that at reset time >> - * when restoring a VM, let's do it now. >> - */ >> - ret = kvmppc_xive_source_reset(&xive->source, &local_err); >> - if (ret < 0) { >> - goto fail; >> - } >> - >> /* Restore the EAT */ >> for (i = 0; i < xive->nr_irqs; i++) { >> - if (!xive_source_is_valid(xive, i)) { >> + if (!xive_eas_is_valid(&xive->eat[i])) { >> continue; >> } >> >> + /* >> + * We can only restore the source config if the source has been >> + * previously set in KVM. Since we don't do that for all interrupts >> + * at reset time anymore, let's do it now. >> + */ >> + ret = kvmppc_xive_source_reset_one(&xive->source, i, &local_err); >> + if (ret < 0) { >> + goto fail; >> + } >> + >> ret = kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err); >> if (ret < 0) { >> goto fail; >> >> >
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c index 66bf4c06fe55..e8667ce5f621 100644 --- a/hw/intc/spapr_xive_kvm.c +++ b/hw/intc/spapr_xive_kvm.c @@ -36,9 +36,10 @@ typedef struct KVMEnabledCPU { static QLIST_HEAD(, KVMEnabledCPU) kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus); -static bool kvm_cpu_is_enabled(unsigned long vcpu_id) +static bool kvm_cpu_is_enabled(CPUState *cs) { KVMEnabledCPU *enabled_cpu; + unsigned long vcpu_id = kvm_arch_vcpu_id(cs); QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) { if (enabled_cpu->vcpu_id == vcpu_id) { @@ -146,45 +147,6 @@ int kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp) return s.ret; } -/* - * Allocate the vCPU IPIs from the vCPU context. This will allocate - * the XIVE IPI interrupt on the chip on which the vCPU is running. - * This gives a better distribution of IPIs when the guest has a lot - * of vCPUs. When the vCPUs are pinned, this will make the IPI local - * to the chip of the vCPU. It will reduce rerouting between interrupt - * controllers and gives better performance. - */ -typedef struct { - SpaprXive *xive; - Error *err; - int rc; -} XiveInitIPI; - -static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg) -{ - unsigned long ipi = kvm_arch_vcpu_id(cs); - XiveInitIPI *s = arg.host_ptr; - uint64_t state = 0; - - s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, ipi, - &state, true, &s->err); -} - -static int kvmppc_xive_reset_ipi(SpaprXive *xive, CPUState *cs, Error **errp) -{ - XiveInitIPI s = { - .xive = xive, - .err = NULL, - .rc = 0, - }; - - run_on_cpu(cs, kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s)); - if (s.err) { - error_propagate(errp, s.err); - } - return s.rc; -} - int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) { ERRP_GUARD(); @@ -195,7 +157,7 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) assert(xive->fd != -1); /* Check if CPU was hot unplugged and replugged. */ - if (kvm_cpu_is_enabled(kvm_arch_vcpu_id(tctx->cs))) { + if (kvm_cpu_is_enabled(tctx->cs)) { return 0; } @@ -214,12 +176,6 @@ int kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp) return ret; } - /* Create/reset the vCPU IPI */ - ret = kvmppc_xive_reset_ipi(xive, tctx->cs, errp); - if (ret < 0) { - return ret; - } - kvm_cpu_enable(tctx->cs); return 0; } @@ -279,12 +235,6 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) assert(xive->fd != -1); - /* - * The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() - * and not with all sources in kvmppc_xive_source_reset() - */ - assert(srcno >= SPAPR_XIRQ_BASE); - if (xive_source_irq_is_lsi(xsrc, srcno)) { state |= KVM_XIVE_LEVEL_SENSITIVE; if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) { @@ -296,28 +246,12 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp) true, errp); } -/* - * To be valid, a source must have been claimed by the machine (valid - * entry in the EAS table) and if it is a vCPU IPI, the vCPU should - * have been enabled, which means the IPI has been allocated in - * kvmppc_xive_cpu_connect(). - */ -static bool xive_source_is_valid(SpaprXive *xive, int i) -{ - return xive_eas_is_valid(&xive->eat[i]) && - (i >= SPAPR_XIRQ_BASE || kvm_cpu_is_enabled(i)); -} - static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp) { SpaprXive *xive = SPAPR_XIVE(xsrc->xive); int i; - /* - * Skip the vCPU IPIs. These are created/reset when the vCPUs are - * connected in kvmppc_xive_cpu_connect() - */ - for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) { + for (i = 0; i < xsrc->nr_irqs; i++) { int ret; if (!xive_eas_is_valid(&xive->eat[i])) { @@ -399,7 +333,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc) for (i = 0; i < xsrc->nr_irqs; i++) { uint8_t pq; - if (!xive_source_is_valid(xive, i)) { + if (!xive_eas_is_valid(&xive->eat[i])) { continue; } @@ -582,7 +516,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, uint8_t pq; uint8_t old_pq; - if (!xive_source_is_valid(xive, i)) { + if (!xive_eas_is_valid(&xive->eat[i])) { continue; } @@ -610,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running, for (i = 0; i < xsrc->nr_irqs; i++) { uint8_t pq; - if (!xive_source_is_valid(xive, i)) { + if (!xive_eas_is_valid(&xive->eat[i])) { continue; } @@ -713,22 +647,22 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id) } } - /* - * We can only restore the source config if the source has been - * previously set in KVM. Since we don't do that at reset time - * when restoring a VM, let's do it now. - */ - ret = kvmppc_xive_source_reset(&xive->source, &local_err); - if (ret < 0) { - goto fail; - } - /* Restore the EAT */ for (i = 0; i < xive->nr_irqs; i++) { - if (!xive_source_is_valid(xive, i)) { + if (!xive_eas_is_valid(&xive->eat[i])) { continue; } + /* + * We can only restore the source config if the source has been + * previously set in KVM. Since we don't do that for all interrupts + * at reset time anymore, let's do it now. + */ + ret = kvmppc_xive_source_reset_one(&xive->source, i, &local_err); + if (ret < 0) { + goto fail; + } + ret = kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err); if (ret < 0) { goto fail;
This series was largely built on the assumption that IPI numbers are numerically equal to vCPU ids. Even if this happens to be the case in practice with the default machine settings, this ceases to be true if VSMT is set to a different value than the number of vCPUs per core. This causes bogus IPI numbers to be created in KVM from a guest stand point. This leads to unknow results in the guest, including crashes or missing vCPUs (see BugLink) and even non-fatal oopses in current KVM that lacks a check before accessing misconfigured HW (see [1]). A tentative patch was sent (see [2]) but it seems too complex to be merged in an RC. Since the original changes are essentially an optimization, it seems safer to revert them for now. The damage is done by commit acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") but the previous patches in the series are really preparatory patches. So this reverts the whole series: eab0a2d06e97 ("spapr/xive: Allocate vCPU IPIs from the vCPU contexts") acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") fa94447a2cd6 ("spapr/xive: Use kvmppc_xive_source_reset() in post_load") 235d3b116213 ("spapr/xive: Modify kvm_cpu_is_enabled() interface") [1] https://marc.info/?l=kvm-ppc&m=160458409722959&w=4 [2] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03626.html Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Fixes: acbdb9956fe9 ("spapr/xive: Allocate IPIs independently from the other sources") BugLink: https://bugs.launchpad.net/qemu/+bug/1900241 Signed-off-by: Greg Kurz <groug@kaod.org> --- Peter, I'm Cc'ing you because we really want to fix this regression in 5.2. Reverting the culprit optimization seems a lot safer than the changes proposed in my other patch. David is on PTO right now and I'm not sure if he can post a PR anytime soon. Just in case: would it be acceptable to you if I send a PR after it got some positive feedback from the people on the Cc: list ? The better the sooner or do we wait for David to get a chance to step in ? --- hw/intc/spapr_xive_kvm.c | 102 ++++++++-------------------------------------- 1 file changed, 18 insertions(+), 84 deletions(-)