diff mbox series

[RFC,5/5] spapr: Work around spurious warnings from vfio INTx initialization

Message ID 20191017054218.8876-6-david@gibson.dropbear.id.au (mailing list archive)
State New, archived
Headers show
Series Handle PAPR irq chip changes for VFIO devices | expand

Commit Message

David Gibson Oct. 17, 2019, 5:42 a.m. UTC
Traditional PCI INTx for vfio devices can only perform well if using
an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
if an in kernel irqchip is not available.

We usually do have an in-kernel irqchip available for pseries machines
on POWER hosts.  However, because the platform allows feature
negotiation of what interrupt controller model to use, we don't
currently initialize it until machine reset.  vfio_intx_update() is
called (first) from vfio_realize() before that, so it can issue a
spurious warning, even if we will have an in kernel irqchip by the
time we need it.

To workaround this, make a call to spapr_irq_update_active_intc() from
spapr_irq_init() which is called at machine realize time, before the
vfio realize.  This call will be pretty much obsoleted by the later
call at reset time, but it serves to suppress the spurious warning
from VFIO.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_irq.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Cédric Le Goater Oct. 17, 2019, 8:43 a.m. UTC | #1
On 17/10/2019 07:42, David Gibson wrote:
> Traditional PCI INTx for vfio devices can only perform well if using
> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
> if an in kernel irqchip is not available.
> 
> We usually do have an in-kernel irqchip available for pseries machines
> on POWER hosts.  However, because the platform allows feature
> negotiation of what interrupt controller model to use, we don't
> currently initialize it until machine reset.  vfio_intx_update() is
> called (first) from vfio_realize() before that, so it can issue a
> spurious warning, even if we will have an in kernel irqchip by the
> time we need it.
> 
> To workaround this, make a call to spapr_irq_update_active_intc() from
> spapr_irq_init() which is called at machine realize time, before the
> vfio realize.  This call will be pretty much obsoleted by the later
> call at reset time, but it serves to suppress the spurious warning
> from VFIO.
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_irq.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 45544b8976..bb91c61fa0 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>  
>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
> +
> +    /*
> +     * Mostly we don't actually need this until reset, except that not
> +     * having this set up can cause VFIO devices to issue a
> +     * false-positive warning during realize(), because they don't yet
> +     * have an in-kernel irq chip.
> +     */
> +    spapr_irq_update_active_intc(spapr);

This will call the de/activate hooks of the irq chip very early 
before reset and CAS, and won't call them at reset if the intc
pointers are the same (xive only for instance). It breaks very 
obviously the emulated XIVE device which won't reset the OS CAM 
line with the CPU id values and CPU notification will be broken.

We have to find something else.


>  }
>  
>  int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
> @@ -500,7 +508,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr)
>           * this.
>           */
>          new_intc = SPAPR_INTC(spapr->xive);
> -    } else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +    } else if (spapr->ov5_cas
> +               && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {

This change can go in another patch.

C. 

>          new_intc = SPAPR_INTC(spapr->xive);
>      } else {
>          new_intc = SPAPR_INTC(spapr->ics);
>
Greg Kurz Oct. 17, 2019, 9 a.m. UTC | #2
On Thu, 17 Oct 2019 10:43:11 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 17/10/2019 07:42, David Gibson wrote:
> > Traditional PCI INTx for vfio devices can only perform well if using
> > an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
> > if an in kernel irqchip is not available.
> > 
> > We usually do have an in-kernel irqchip available for pseries machines
> > on POWER hosts.  However, because the platform allows feature
> > negotiation of what interrupt controller model to use, we don't
> > currently initialize it until machine reset.  vfio_intx_update() is
> > called (first) from vfio_realize() before that, so it can issue a
> > spurious warning, even if we will have an in kernel irqchip by the
> > time we need it.
> > 
> > To workaround this, make a call to spapr_irq_update_active_intc() from
> > spapr_irq_init() which is called at machine realize time, before the
> > vfio realize.  This call will be pretty much obsoleted by the later
> > call at reset time, but it serves to suppress the spurious warning
> > from VFIO.
> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr_irq.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > index 45544b8976..bb91c61fa0 100644
> > --- a/hw/ppc/spapr_irq.c
> > +++ b/hw/ppc/spapr_irq.c
> > @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
> >  
> >      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
> >                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
> > +
> > +    /*
> > +     * Mostly we don't actually need this until reset, except that not
> > +     * having this set up can cause VFIO devices to issue a
> > +     * false-positive warning during realize(), because they don't yet
> > +     * have an in-kernel irq chip.
> > +     */
> > +    spapr_irq_update_active_intc(spapr);
> 
> This will call the de/activate hooks of the irq chip very early 
> before reset and CAS, and won't call them at reset if the intc
> pointers are the same (xive only for instance). It breaks very 
> obviously the emulated XIVE device which won't reset the OS CAM 
> line with the CPU id values and CPU notification will be broken.
> 

Yes. The problem is that we now have a new path:

spapr_irq_init()
 spapr_irq_update_active_intc()
  set_active_intc()
   spapr_xive_activate()

and we go there before the CPUs are realized:

(gdb) p cpus
$6 = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, 
    tql_prev = 0x100f203a8 <cpus>}}

spapr_xive_activate() thus doesn't reset the OS CAM line.

When the initial machine reset happens later, CPUs are
present but we don't go down the activate path since XIVE
is already active.

Commenting away the following check in set_active_intc() is enough
to fix:

    if (new_intc == spapr->active_intc) {
        /* Nothing to do */
        return;
    }

but this also reveals that handling the OS CAM line reset in
spapr_xive_activate() only may be a bit fragile since it obviously
requires CPUs to be present... Maybe this should also be done by
the CPUs on their realize path ?

> We have to find something else.
> 
> 
> >  }
> >  
> >  int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
> > @@ -500,7 +508,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr)
> >           * this.
> >           */
> >          new_intc = SPAPR_INTC(spapr->xive);
> > -    } else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> > +    } else if (spapr->ov5_cas
> > +               && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> 
> This change can go in another patch.
> 
> C. 
> 
> >          new_intc = SPAPR_INTC(spapr->xive);
> >      } else {
> >          new_intc = SPAPR_INTC(spapr->ics);
> > 
>
Cédric Le Goater Oct. 17, 2019, 1:42 p.m. UTC | #3
On 17/10/2019 10:43, Cédric Le Goater wrote:
> On 17/10/2019 07:42, David Gibson wrote:
>> Traditional PCI INTx for vfio devices can only perform well if using
>> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
>> if an in kernel irqchip is not available.
>>
>> We usually do have an in-kernel irqchip available for pseries machines
>> on POWER hosts.  However, because the platform allows feature
>> negotiation of what interrupt controller model to use, we don't
>> currently initialize it until machine reset.  vfio_intx_update() is
>> called (first) from vfio_realize() before that, so it can issue a
>> spurious warning, even if we will have an in kernel irqchip by the
>> time we need it.
>>
>> To workaround this, make a call to spapr_irq_update_active_intc() from
>> spapr_irq_init() which is called at machine realize time, before the
>> vfio realize.  This call will be pretty much obsoleted by the later
>> call at reset time, but it serves to suppress the spurious warning
>> from VFIO.
>>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> ---
>>  hw/ppc/spapr_irq.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 45544b8976..bb91c61fa0 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>>  
>>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
>>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
>> +
>> +    /*
>> +     * Mostly we don't actually need this until reset, except that not
>> +     * having this set up can cause VFIO devices to issue a
>> +     * false-positive warning during realize(), because they don't yet
>> +     * have an in-kernel irq chip.
>> +     */
>> +    spapr_irq_update_active_intc(spapr);
> 
> This will call the de/activate hooks of the irq chip very early 
> before reset and CAS, and won't call them at reset if the intc
> pointers are the same (xive only for instance). It breaks very 
> obviously the emulated XIVE device which won't reset the OS CAM 
> line with the CPU id values and CPU notification will be broken.
> 
> We have to find something else.

Here is a possible fix for the (re)setting of the OS CAM line. 

Removing spapr_xive_set_tctx_os_cam() is a good thing but this solution
shows some issues in our modeling of hot-plugged CPUS with a reset() 
being called at realize().


Thanks,

C.


From eb6086645b1db13a8e021e8a5b8a2c6faba875b3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org>
Date: Thu, 17 Oct 2019 15:35:36 +0200
Subject: [PATCH] spapr/xive: Set the OS CAM line at reset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When a Virtual Processor is scheduled to run on a HW thread, the
hypervisor pushes its identifier in the OS CAM line. When running in
TCG or kernel_irqchip=off, QEMU needs to emulate the same behavior.

Introduce a 'os-cam' property which will be used to set the OS CAM
line at reset.

Complexity arise with hotplug CPUs which need an extra reset of the
intc presenter of the interrupt controller selected by CAS. The sPAPR
IRQ backend is extended with new handler cpu_intc_reset()

spapr_realize_vcpu() is also modified to call the CPU reset after the
creation of the intc presenter.

This is a combo patch which needs to be splitted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h  |  4 ++++
 include/hw/ppc/spapr_xive.h |  1 -
 include/hw/ppc/xics.h       |  1 +
 include/hw/ppc/xive.h       |  5 ++++-
 hw/intc/spapr_xive.c        | 37 ++++++++++++-------------------------
 hw/intc/xics.c              |  5 +++++
 hw/intc/xics_spapr.c        |  8 ++++++++
 hw/intc/xive.c              | 33 +++++++++++++++++++++++++++++----
 hw/ppc/pnv.c                |  3 ++-
 hw/ppc/spapr_cpu_core.c     |  8 ++++++--
 hw/ppc/spapr_irq.c          | 21 +++++++++++++++++++++
 11 files changed, 92 insertions(+), 34 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 5e150a667902..78327496c102 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -52,6 +52,8 @@ typedef struct SpaprInterruptControllerClass {
      */
     int (*cpu_intc_create)(SpaprInterruptController *intc,
                             PowerPCCPU *cpu, Error **errp);
+    int (*cpu_intc_reset)(SpaprInterruptController *intc, PowerPCCPU *cpu,
+                          Error **errp);
     int (*claim_irq)(SpaprInterruptController *intc, int irq, bool lsi,
                      Error **errp);
     void (*free_irq)(SpaprInterruptController *intc, int irq);
@@ -68,6 +70,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr);
 
 int spapr_irq_cpu_intc_create(SpaprMachineState *spapr,
                               PowerPCCPU *cpu, Error **errp);
+int spapr_irq_cpu_intc_reset(SpaprMachineState *spapr,
+                             PowerPCCPU *cpu, Error **errp);
 void spapr_irq_print_info(SpaprMachineState *spapr, Monitor *mon);
 void spapr_irq_dt(SpaprMachineState *spapr, uint32_t nr_servers,
                   void *fdt, uint32_t phandle);
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index d84bd5c229f0..742b7e834f2a 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -57,7 +57,6 @@ typedef struct SpaprXive {
 void spapr_xive_pic_print_info(SpaprXive *xive, Monitor *mon);
 
 void spapr_xive_hcall_init(SpaprMachineState *spapr);
-void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
 void spapr_xive_mmio_set_enabled(SpaprXive *xive, bool enable);
 void spapr_xive_map_mmio(SpaprXive *xive);
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 1e6a9300eb2b..602173c12250 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -161,6 +161,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
 uint32_t icp_accept(ICPState *ss);
 uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
 void icp_eoi(ICPState *icp, uint32_t xirr);
+void icp_reset(ICPState *icp);
 
 void ics_write_xive(ICSState *ics, int nr, int server,
                     uint8_t priority, uint8_t saved_priority);
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index fd3319bd3202..e273069c25a9 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -319,6 +319,7 @@ typedef struct XiveTCTX {
     qemu_irq    os_output;
 
     uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
+    uint32_t    os_cam;
 } XiveTCTX;
 
 /*
@@ -414,7 +415,9 @@ void xive_tctx_tm_write(XiveTCTX *tctx, hwaddr offset, uint64_t value,
 uint64_t xive_tctx_tm_read(XiveTCTX *tctx, hwaddr offset, unsigned size);
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
-Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, uint32_t os_cam,
+                         Error **errp);
+void xive_tctx_reset(XiveTCTX *tctx);
 
 static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
 {
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index ba32d2cc5b0f..71f138512a1c 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -205,21 +205,13 @@ void spapr_xive_mmio_set_enabled(SpaprXive *xive, bool enable)
     memory_region_set_enabled(&xive->end_source.esb_mmio, false);
 }
 
-/*
- * When a Virtual Processor is scheduled to run on a HW thread, the
- * hypervisor pushes its identifier in the OS CAM line. Emulate the
- * same behavior under QEMU.
- */
-void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx)
+static uint32_t spapr_xive_get_os_cam(PowerPCCPU *cpu)
 {
     uint8_t  nvt_blk;
     uint32_t nvt_idx;
-    uint32_t nvt_cam;
 
-    spapr_xive_cpu_to_nvt(POWERPC_CPU(tctx->cs), &nvt_blk, &nvt_idx);
-
-    nvt_cam = cpu_to_be32(TM_QW1W2_VO | xive_nvt_cam_line(nvt_blk, nvt_idx));
-    memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
+    spapr_xive_cpu_to_nvt(cpu, &nvt_blk, &nvt_idx);
+    return xive_nvt_cam_line(nvt_blk, nvt_idx);
 }
 
 static void spapr_xive_end_reset(XiveEND *end)
@@ -537,19 +529,21 @@ static int spapr_xive_cpu_intc_create(SpaprInterruptController *intc,
     SpaprXive *xive = SPAPR_XIVE(intc);
     Object *obj;
     SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+    uint32_t os_cam = spapr_xive_get_os_cam(cpu);
 
-    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(xive), errp);
+    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(xive), os_cam, errp);
     if (!obj) {
         return -1;
     }
 
     spapr_cpu->tctx = XIVE_TCTX(obj);
+    return 0;
+}
 
-    /*
-     * (TCG) Early setting the OS CAM line for hotplugged CPUs as they
-     * don't beneficiate from the reset of the XIVE IRQ backend
-     */
-    spapr_xive_set_tctx_os_cam(spapr_cpu->tctx);
+static int spapr_xive_cpu_intc_reset(SpaprInterruptController *intc,
+                                     PowerPCCPU *cpu, Error **errp)
+{
+    xive_tctx_reset(spapr_cpu_state(cpu)->tctx);
     return 0;
 }
 
@@ -643,14 +637,6 @@ static void spapr_xive_dt(SpaprInterruptController *intc, uint32_t nr_servers,
 static int spapr_xive_activate(SpaprInterruptController *intc, Error **errp)
 {
     SpaprXive *xive = SPAPR_XIVE(intc);
-    CPUState *cs;
-
-    CPU_FOREACH(cs) {
-        PowerPCCPU *cpu = POWERPC_CPU(cs);
-
-        /* (TCG) Set the OS CAM line of the thread interrupt context. */
-        spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
-    }
 
     if (kvm_enabled()) {
         int rc = spapr_irq_init_kvm(kvmppc_xive_connect, intc, errp);
@@ -697,6 +683,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
     sicc->activate = spapr_xive_activate;
     sicc->deactivate = spapr_xive_deactivate;
     sicc->cpu_intc_create = spapr_xive_cpu_intc_create;
+    sicc->cpu_intc_reset = spapr_xive_cpu_intc_reset;
     sicc->claim_irq = spapr_xive_claim_irq;
     sicc->free_irq = spapr_xive_free_irq;
     sicc->set_irq = spapr_xive_set_irq;
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index b5ac408f7b74..652771d6a5a5 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -295,6 +295,11 @@ static void icp_reset_handler(void *dev)
     }
 }
 
+void icp_reset(ICPState *icp)
+{
+    icp_reset_handler(icp);
+}
+
 static void icp_realize(DeviceState *dev, Error **errp)
 {
     ICPState *icp = ICP(dev);
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 4f64b9a9fc66..c0b2a576effe 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -346,6 +346,13 @@ static int xics_spapr_cpu_intc_create(SpaprInterruptController *intc,
     return 0;
 }
 
+static int xics_spapr_cpu_intc_reset(SpaprInterruptController *intc,
+                                     PowerPCCPU *cpu, Error **errp)
+{
+    icp_reset(spapr_cpu_state(cpu)->icp);
+    return 0;
+}
+
 static int xics_spapr_claim_irq(SpaprInterruptController *intc, int irq,
                                 bool lsi, Error **errp)
 {
@@ -433,6 +440,7 @@ static void ics_spapr_class_init(ObjectClass *klass, void *data)
     sicc->activate = xics_spapr_activate;
     sicc->deactivate = xics_spapr_deactivate;
     sicc->cpu_intc_create = xics_spapr_cpu_intc_create;
+    sicc->cpu_intc_reset = xics_spapr_cpu_intc_reset;
     sicc->claim_irq = xics_spapr_claim_irq;
     sicc->free_irq = xics_spapr_free_irq;
     sicc->set_irq = xics_spapr_set_irq;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index d420c6571e14..be4f2c974178 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -547,7 +547,7 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
     }
 }
 
-static void xive_tctx_reset(void *dev)
+static void xive_tctx_reset_handler(void *dev)
 {
     XiveTCTX *tctx = XIVE_TCTX(dev);
 
@@ -566,6 +566,23 @@ static void xive_tctx_reset(void *dev)
         ipb_to_pipr(tctx->regs[TM_QW1_OS + TM_IPB]);
     tctx->regs[TM_QW3_HV_PHYS + TM_PIPR] =
         ipb_to_pipr(tctx->regs[TM_QW3_HV_PHYS + TM_IPB]);
+
+    /*
+     * (TCG) Set the OS CAM line of the thread interrupt context.
+     *
+     * When a Virtual Processor is scheduled to run on a HW thread,
+     * the hypervisor pushes its identifier in the OS CAM line.
+     * Emulate the same behavior under QEMU.
+     */
+    if (tctx->os_cam) {
+        uint32_t qw1w2 = cpu_to_be32(TM_QW1W2_VO | tctx->os_cam);
+        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &qw1w2, 4);
+    }
+}
+
+void xive_tctx_reset(XiveTCTX *tctx)
+{
+    xive_tctx_reset_handler(tctx);
 }
 
 static void xive_tctx_realize(DeviceState *dev, Error **errp)
@@ -608,12 +625,12 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    qemu_register_reset(xive_tctx_reset, dev);
+    qemu_register_reset(xive_tctx_reset_handler, dev);
 }
 
 static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
 {
-    qemu_unregister_reset(xive_tctx_reset, dev);
+    qemu_unregister_reset(xive_tctx_reset_handler, dev);
 }
 
 static int vmstate_xive_tctx_pre_save(void *opaque)
@@ -662,11 +679,17 @@ static const VMStateDescription vmstate_xive_tctx = {
     },
 };
 
+static Property  xive_tctx_properties[] = {
+    DEFINE_PROP_UINT32("os-cam", XiveTCTX, os_cam, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void xive_tctx_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->desc = "XIVE Interrupt Thread Context";
+    dc->props = xive_tctx_properties;
     dc->realize = xive_tctx_realize;
     dc->unrealize = xive_tctx_unrealize;
     dc->vmsd = &vmstate_xive_tctx;
@@ -684,7 +707,8 @@ static const TypeInfo xive_tctx_info = {
     .class_init    = xive_tctx_class_init,
 };
 
-Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, uint32_t os_cam,
+                         Error **errp)
 {
     Error *local_err = NULL;
     Object *obj;
@@ -693,6 +717,7 @@ Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
     object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
     object_unref(obj);
     object_property_add_const_link(obj, "cpu", cpu, &error_abort);
+    object_property_set_int(obj, os_cam, "os-cam", &local_err);
     object_property_set_bool(obj, true, "realized", &local_err);
     if (local_err) {
         goto error;
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 7cf64b6d2533..99c06842573e 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -806,7 +806,8 @@ static void pnv_chip_power9_intc_create(PnvChip *chip, PowerPCCPU *cpu,
      * controller object is initialized afterwards. Hopefully, it's
      * only used at runtime.
      */
-    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(&chip9->xive), &local_err);
+    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(&chip9->xive), 0,
+                           &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 3e4302c7d596..416aa75e5fba 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -33,6 +33,7 @@ static void spapr_cpu_reset(void *opaque)
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
     SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
     target_ulong lpcr;
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
     cpu_reset(cs);
 
@@ -77,9 +78,11 @@ static void spapr_cpu_reset(void *opaque)
     spapr_cpu->dtl_addr = 0;
     spapr_cpu->dtl_size = 0;
 
-    spapr_caps_cpu_apply(SPAPR_MACHINE(qdev_get_machine()), cpu);
+    spapr_caps_cpu_apply(spapr, cpu);
 
     kvm_check_mmu(cpu, &error_fatal);
+
+    spapr_irq_cpu_intc_reset(spapr, cpu, &error_fatal);
 }
 
 void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r3)
@@ -235,12 +238,13 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
     kvmppc_set_papr(cpu);
 
     qemu_register_reset(spapr_cpu_reset, cpu);
-    spapr_cpu_reset(cpu);
 
     if (spapr_irq_cpu_intc_create(spapr, cpu, &local_err) < 0) {
         goto error_unregister;
     }
 
+    spapr_cpu_reset(cpu);
+
     if (!sc->pre_3_0_migration) {
         vmstate_register(NULL, cs->cpu_index, &vmstate_spapr_cpu_state,
                          cpu->machine_data);
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index bb91c61fa000..5d2b64029cd5 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -220,6 +220,27 @@ int spapr_irq_cpu_intc_create(SpaprMachineState *spapr,
     return 0;
 }
 
+int spapr_irq_cpu_intc_reset(SpaprMachineState *spapr,
+                             PowerPCCPU *cpu, Error **errp)
+{
+    SpaprInterruptController *intcs[] = ALL_INTCS(spapr);
+    int i;
+    int rc;
+
+    for (i = 0; i < ARRAY_SIZE(intcs); i++) {
+        SpaprInterruptController *intc = intcs[i];
+        if (intc) {
+            SpaprInterruptControllerClass *sicc = SPAPR_INTC_GET_CLASS(intc);
+            rc = sicc->cpu_intc_reset(intc, cpu, errp);
+            if (rc < 0) {
+                return rc;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static void spapr_set_irq(void *opaque, int irq, int level)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(opaque);
David Gibson Nov. 20, 2019, 4:17 a.m. UTC | #4
On Thu, Oct 17, 2019 at 03:42:06PM +0200, Cédric Le Goater wrote:
> On 17/10/2019 10:43, Cédric Le Goater wrote:
> > On 17/10/2019 07:42, David Gibson wrote:
> >> Traditional PCI INTx for vfio devices can only perform well if using
> >> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
> >> if an in kernel irqchip is not available.
> >>
> >> We usually do have an in-kernel irqchip available for pseries machines
> >> on POWER hosts.  However, because the platform allows feature
> >> negotiation of what interrupt controller model to use, we don't
> >> currently initialize it until machine reset.  vfio_intx_update() is
> >> called (first) from vfio_realize() before that, so it can issue a
> >> spurious warning, even if we will have an in kernel irqchip by the
> >> time we need it.
> >>
> >> To workaround this, make a call to spapr_irq_update_active_intc() from
> >> spapr_irq_init() which is called at machine realize time, before the
> >> vfio realize.  This call will be pretty much obsoleted by the later
> >> call at reset time, but it serves to suppress the spurious warning
> >> from VFIO.
> >>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>
> >> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> ---
> >>  hw/ppc/spapr_irq.c | 11 ++++++++++-
> >>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> >> index 45544b8976..bb91c61fa0 100644
> >> --- a/hw/ppc/spapr_irq.c
> >> +++ b/hw/ppc/spapr_irq.c
> >> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
> >>  
> >>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
> >>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
> >> +
> >> +    /*
> >> +     * Mostly we don't actually need this until reset, except that not
> >> +     * having this set up can cause VFIO devices to issue a
> >> +     * false-positive warning during realize(), because they don't yet
> >> +     * have an in-kernel irq chip.
> >> +     */
> >> +    spapr_irq_update_active_intc(spapr);
> > 
> > This will call the de/activate hooks of the irq chip very early 
> > before reset and CAS, and won't call them at reset if the intc
> > pointers are the same (xive only for instance). It breaks very 
> > obviously the emulated XIVE device which won't reset the OS CAM 
> > line with the CPU id values and CPU notification will be broken.
> > 
> > We have to find something else.
> 
> Here is a possible fix for the (re)setting of the OS CAM line. 
> 
> Removing spapr_xive_set_tctx_os_cam() is a good thing but this solution
> shows some issues in our modeling of hot-plugged CPUS with a reset() 
> being called at realize().

Ok, I've applied the patch below now.  Does that mean that my 5/5
patch should be good now?
Cédric Le Goater Nov. 20, 2019, 6:50 a.m. UTC | #5
On 20/11/2019 05:17, David Gibson wrote:
> On Thu, Oct 17, 2019 at 03:42:06PM +0200, Cédric Le Goater wrote:
>> On 17/10/2019 10:43, Cédric Le Goater wrote:
>>> On 17/10/2019 07:42, David Gibson wrote:
>>>> Traditional PCI INTx for vfio devices can only perform well if using
>>>> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
>>>> if an in kernel irqchip is not available.
>>>>
>>>> We usually do have an in-kernel irqchip available for pseries machines
>>>> on POWER hosts.  However, because the platform allows feature
>>>> negotiation of what interrupt controller model to use, we don't
>>>> currently initialize it until machine reset.  vfio_intx_update() is
>>>> called (first) from vfio_realize() before that, so it can issue a
>>>> spurious warning, even if we will have an in kernel irqchip by the
>>>> time we need it.
>>>>
>>>> To workaround this, make a call to spapr_irq_update_active_intc() from
>>>> spapr_irq_init() which is called at machine realize time, before the
>>>> vfio realize.  This call will be pretty much obsoleted by the later
>>>> call at reset time, but it serves to suppress the spurious warning
>>>> from VFIO.
>>>>
>>>> Cc: Alex Williamson <alex.williamson@redhat.com>
>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>
>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>> ---
>>>>  hw/ppc/spapr_irq.c | 11 ++++++++++-
>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>>>> index 45544b8976..bb91c61fa0 100644
>>>> --- a/hw/ppc/spapr_irq.c
>>>> +++ b/hw/ppc/spapr_irq.c
>>>> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>>>>  
>>>>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
>>>>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
>>>> +
>>>> +    /*
>>>> +     * Mostly we don't actually need this until reset, except that not
>>>> +     * having this set up can cause VFIO devices to issue a
>>>> +     * false-positive warning during realize(), because they don't yet
>>>> +     * have an in-kernel irq chip.
>>>> +     */
>>>> +    spapr_irq_update_active_intc(spapr);
>>>
>>> This will call the de/activate hooks of the irq chip very early 
>>> before reset and CAS, and won't call them at reset if the intc
>>> pointers are the same (xive only for instance). It breaks very 
>>> obviously the emulated XIVE device which won't reset the OS CAM 
>>> line with the CPU id values and CPU notification will be broken.
>>>
>>> We have to find something else.
>>
>> Here is a possible fix for the (re)setting of the OS CAM line. 
>>
>> Removing spapr_xive_set_tctx_os_cam() is a good thing but this solution
>> shows some issues in our modeling of hot-plugged CPUS with a reset() 
>> being called at realize().
> 
> Ok, I've applied the patch below now. 

Yes. this was replaced by :

  http://patchwork.ozlabs.org/cover/1181522/

> Does that mean that my 5/5 patch should be good now?

We should.

C.
diff mbox series

Patch

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 45544b8976..bb91c61fa0 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -345,6 +345,14 @@  void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 
     spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
                                       smc->nr_xirqs + SPAPR_XIRQ_BASE);
+
+    /*
+     * Mostly we don't actually need this until reset, except that not
+     * having this set up can cause VFIO devices to issue a
+     * false-positive warning during realize(), because they don't yet
+     * have an in-kernel irq chip.
+     */
+    spapr_irq_update_active_intc(spapr);
 }
 
 int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
@@ -500,7 +508,8 @@  void spapr_irq_update_active_intc(SpaprMachineState *spapr)
          * this.
          */
         new_intc = SPAPR_INTC(spapr->xive);
-    } else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+    } else if (spapr->ov5_cas
+               && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
         new_intc = SPAPR_INTC(spapr->xive);
     } else {
         new_intc = SPAPR_INTC(spapr->ics);