[v5,08/36] ppc/xive: introduce a simplified XIVE presenter

Message ID	20181116105729.23240-9-clg@kaod.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> sender: clg@kaod.org) by player734.ha.ovh.net (Postfix) with ESMTPSA id D0C4D2800F0; Fri, 16 Nov 2018 11:58:19 +0100 (CET) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= <clg@kaod.org> To: David Gibson <david@gibson.dropbear.id.au> Date: Fri, 16 Nov 2018 11:57:01 +0100 Message-Id: <20181116105729.23240-9-clg@kaod.org> In-Reply-To: <20181116105729.23240-1-clg@kaod.org> References: <20181116105729.23240-1-clg@kaod.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: [Qemu-devel] [PATCH v5 08/36] ppc/xive: introduce a simplified XIVE presenter Precedence: list Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, =?utf-8?q?C=C3=A9dric_Le_Goa?= =?utf-8?q?ter?= <clg@kaod.org> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	ppc: support for the XIVE interrupt controller (POWER9) \| expand [v5,00/36] ppc: support for the XIVE interrupt controller (POWER9) [v5,01/36] ppc/xive: introduce a XIVE interrupt source model [v5,02/36] ppc/xive: add support for the LSI interrupt sources [v5,03/36] ppc/xive: introduce the XiveFabric interface [v5,04/36] ppc/xive: introduce the XiveRouter model [v5,05/36] ppc/xive: introduce the XIVE Event Notification Descriptors [v5,06/36] ppc/xive: add support for the END Event State buffers [v5,07/36] ppc/xive: introduce the XIVE interrupt thread context [v5,08/36] ppc/xive: introduce a simplified XIVE presenter [v5,09/36] ppc/xive: notify the CPU when the interrupt priority is more privileged [v5,10/36] spapr/xive: introduce a XIVE interrupt controller [v5,11/36] spapr/xive: use the VCPU id as a NVT identifier [v5,12/36] spapr: initialize VSMT before initializing the IRQ backend [v5,13/36] spapr: introduce a spapr_irq_init() routine [v5,14/36] spapr: modify the irq backend 'init' method [v5,15/36] spapr: introdude a new machine IRQ backend for XIVE [v5,16/36] spapr: add hcalls support for the XIVE exploitation interrupt mode [v5,17/36] spapr: add device tree support for the XIVE exploitation mode [v5,18/36] spapr: allocate the interrupt thread context under the CPU core [v5,19/36] spapr: add a 'pseries-3.1-xive' machine type [v5,20/36] spapr: add classes for the XIVE models [v5,21/36] spapr: extend the sPAPR IRQ backend for XICS migration [v5,22/36] spapr/xive: add models for KVM support [v5,23/36] spapr/xive: add migration support for KVM [v5,24/36] spapr: add a 'reset' method to the sPAPR IRQ backend [v5,25/36] spapr: set the interrupt presenter at reset [v5,26/36] spapr: add a 'pseries-3.1-dual' machine type [v5,27/36] sysbus: add a sysbus_mmio_unmap() helper [v5,28/36] ppc/xics: introduce a icp_kvm_init() routine [v5,29/36] ppc/xics: remove abort() in icp_kvm_init() [v5,30/36] spapr: check for KVM IRQ device activation [v5,31/36] spapr/xive: export the spapr_xive_kvm_init() routine [v5,32/36] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers [v5,33/36] spapr: introduce routines to delete the KVM IRQ device [v5,34/36] spapr: add KVM support to the 'dual' machine [v5,35/36] ppc: externalize ppc_get_vcpu_by_pir() [v5,36/36] ppc/pnv: add XIVE support

Cédric Le Goater Nov. 16, 2018, 10:57 a.m. UTC

The last sub-engine of the XIVE architecture is the Interrupt
Virtualization Presentation Engine (IVPE). On HW, they share elements,
the Power Bus interface (CQ), the routing table descriptors, and they
can be combined in the same HW logic. We do the same in QEMU and
combine both engines in the XiveRouter for simplicity.

When the IVRE has completed its job of matching an event source with a
Notification Virtual Target (NVT) to notify, it forwards the event
notification to the IVPE sub-engine. The IVPE scans the thread
interrupt contexts of the Notification Virtual Targets (NVT)
dispatched on the HW processor threads and if a match is found, it
signals the thread. If not, the IVPE escalates the notification to
some other targets and records the notification in a backlog queue.

The IVPE maintains the thread interrupt context state for each of its
NVTs not dispatched on HW processor threads in the Notification
Virtual Target table (NVTT).

The model currently only supports single NVT notifications.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/xive.h      |  13 +++
 include/hw/ppc/xive_regs.h |  22 ++++
 hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
 3 files changed, 258 insertions(+)

David Gibson Nov. 27, 2018, 11:49 p.m. UTC | #1

On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
> The last sub-engine of the XIVE architecture is the Interrupt
> Virtualization Presentation Engine (IVPE). On HW, they share elements,
> the Power Bus interface (CQ), the routing table descriptors, and they
> can be combined in the same HW logic. We do the same in QEMU and
> combine both engines in the XiveRouter for simplicity.

Ok, I'm not entirely convinced combining the IVPE and IVRE into a
single object is a good idea, but we can probably discuss that once
I've read further.

> When the IVRE has completed its job of matching an event source with a
> Notification Virtual Target (NVT) to notify, it forwards the event
> notification to the IVPE sub-engine. The IVPE scans the thread
> interrupt contexts of the Notification Virtual Targets (NVT)
> dispatched on the HW processor threads and if a match is found, it
> signals the thread. If not, the IVPE escalates the notification to
> some other targets and records the notification in a backlog queue.
> 
> The IVPE maintains the thread interrupt context state for each of its
> NVTs not dispatched on HW processor threads in the Notification
> Virtual Target table (NVTT).
> 
> The model currently only supports single NVT notifications.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/xive.h      |  13 +++
>  include/hw/ppc/xive_regs.h |  22 ++++
>  hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 258 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 5987f26ddb98..e715a6c6923d 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -197,6 +197,10 @@ typedef struct XiveRouterClass {
>                     XiveEND *end);
>      int (*set_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                     XiveEND *end);
> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                   XiveNVT *nvt);
> +    int (*set_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                   XiveNVT *nvt);

As with the ENDs, I don't think get/set is a good interface for a
bigger-than-word-size object.

>  } XiveRouterClass;
>  
>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> @@ -207,6 +211,10 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                          XiveEND *end);
>  int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>                          XiveEND *end);
> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt);
> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt);
>  
>  /*
>   * XIVE END ESBs
> @@ -274,4 +282,9 @@ extern const MemoryRegionOps xive_tm_ops;
>  
>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>  
> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> +{
> +    return (nvt_blk << 19) | nvt_idx;

I'm guessing this formula is the standard way of combining the NVT
block and index into a single word?  If so, I think we should
standardize on passing a single word "nvt_id" around and only
splitting it when we need to use the block separately.  Same goes for
the end_id, assuming there's a standard way of putting that into a
single word.  That will address the point I raised earlier about lisn
being passed around as a single word, but these later stage ids being
split.

We'll probably want some inlines or macros to build an
nvt/end/lisn/whatever id from block and index as well.

> +}
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> index 2e3d6cb507da..05cb992d2815 100644
> --- a/include/hw/ppc/xive_regs.h
> +++ b/include/hw/ppc/xive_regs.h
> @@ -158,4 +158,26 @@ typedef struct XiveEND {
>  #define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>  } XiveEND;
>  
> +/* Notification Virtual Target (NVT) */
> +typedef struct XiveNVT {
> +        uint32_t        w0;
> +#define NVT_W0_VALID             PPC_BIT32(0)
> +        uint32_t        w1;
> +        uint32_t        w2;
> +        uint32_t        w3;
> +        uint32_t        w4;
> +        uint32_t        w5;
> +        uint32_t        w6;
> +        uint32_t        w7;
> +        uint32_t        w8;
> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
> +        uint32_t        w9;
> +        uint32_t        wa;
> +        uint32_t        wb;
> +        uint32_t        wc;
> +        uint32_t        wd;
> +        uint32_t        we;
> +        uint32_t        wf;
> +} XiveNVT;
> +
>  #endif /* PPC_XIVE_REGS_H */
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 4c6cb5d52975..5ba3b06e6e25 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -373,6 +373,32 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>      }
>  }
>  
> +/* The HW CAM (23bits) is hardwired to :
> + *
> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
> + *
> + * and when the block grouping extension is enabled :
> + *
> + *   4Bit chip number||0x001||7Bit Thread number.
> + */
> +static uint32_t tctx_hw_cam_line(bool block_group, uint8_t chip_id, uint8_t tid)
> +{
> +    if (block_group) {
> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
> +    } else {
> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
> +    }
> +}
> +
> +static uint32_t xive_tctx_hw_cam_line(XiveTCTX *tctx, bool block_group)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(tctx->cs);
> +    CPUPPCState *env = &cpu->env;
> +    uint32_t pir = env->spr_cb[SPR_PIR].default_value;

I don't much like reaching into the cpu state itself.  I think a
better idea would be to have the TCTX have its HW CAM id set during
initialization (via a property) and then use that.  This will mean
less mucking about if future cpu revisions don't split the PIR into
chip and tid ids in the same way.

> +    return tctx_hw_cam_line(block_group, (pir >> 8) & 0xf, pir & 0x7f);
> +}
> +
>  static void xive_tctx_reset(void *dev)
>  {
>      XiveTCTX *tctx = XIVE_TCTX(dev);
> @@ -1013,6 +1039,195 @@ int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>     return xrc->set_end(xrtr, end_blk, end_idx, end);
>  }
>  
> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> +}
> +
> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> +                        XiveNVT *nvt)
> +{
> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> +
> +   return xrc->set_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> +}
> +
> +static bool xive_tctx_ring_match(XiveTCTX *tctx, uint8_t ring,
> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> +                                 bool cam_ignore, uint32_t logic_serv)
> +{
> +    uint8_t *regs = &tctx->regs[ring];
> +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> +    uint32_t cam = xive_tctx_cam_line(nvt_blk, nvt_idx);
> +    bool block_group = false; /* TODO (PowerNV) */
> +
> +    /* TODO (PowerNV): ignore low order bits of nvt id */
> +
> +    switch (ring) {
> +    case TM_QW3_HV_PHYS:
> +        return (w2 & TM_QW3W2_VT) && xive_tctx_hw_cam_line(tctx, block_group) ==
> +            tctx_hw_cam_line(block_group, nvt_blk, nvt_idx);

The difference between "xive_tctx_hw_cam_line" and "tctx_hw_cam_line"
here is far from obvious.  Remember that namespacing prefixes aren't
necessary for static functions, which can let you give more
descriptive names without getting excessively long.

> +    case TM_QW2_HV_POOL:
> +        return (w2 & TM_QW2W2_VP) && (cam == GETFIELD(TM_QW2W2_POOL_CAM, w2));
> +
> +    case TM_QW1_OS:
> +        return (w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2));
> +
> +    case TM_QW0_USER:
> +        return ((w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2)) &&
> +                (w2 & TM_QW0W2_VU) &&
> +                (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, w2)));
> +
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> +                                     bool cam_ignore, uint32_t logic_serv)
> +{
> +    if (format == 0) {
> +        /* F=0 & i=1: Logical server notification */
> +        if (cam_ignore == true) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no support for LS "
> +                          "NVT %x/%x\n", nvt_blk, nvt_idx);
> +             return -1;
> +        }
> +
> +        /* F=0 & i=0: Specific NVT notification */
> +        if (xive_tctx_ring_match(tctx, TM_QW3_HV_PHYS,
> +                                nvt_blk, nvt_idx, false, 0)) {
> +            return TM_QW3_HV_PHYS;
> +        }
> +        if (xive_tctx_ring_match(tctx, TM_QW2_HV_POOL,
> +                                nvt_blk, nvt_idx, false, 0)) {
> +            return TM_QW2_HV_POOL;
> +        }
> +        if (xive_tctx_ring_match(tctx, TM_QW1_OS,
> +                                nvt_blk, nvt_idx, false, 0)) {
> +            return TM_QW1_OS;
> +        }

Hm.  It's a bit pointless to iterate through each ring calling a
common function, when that "common" function consists entirely of a
switch which makes it not really common at all.

So I think you want separate helper functions for each ring's match,
or even just fold the previous function into this one.

> +    } else {
> +        /* F=1 : User level Event-Based Branch (EBB) notification */
> +        if (xive_tctx_ring_match(tctx, TM_QW0_USER,
> +                                nvt_blk, nvt_idx, false, logic_serv)) {
> +            return TM_QW0_USER;
> +        }
> +    }
> +    return -1;
> +}
> +
> +typedef struct XiveTCTXMatch {
> +    XiveTCTX *tctx;
> +    uint8_t ring;
> +} XiveTCTXMatch;
> +
> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> +                                 bool cam_ignore, uint8_t priority,
> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
> +{
> +    CPUState *cs;
> +
> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> +     * hardwired CAM compares */
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> +        int ring;
> +
> +        /*
> +         * HW checks that the CPU is enabled in the Physical Thread
> +         * Enable Register (PTER).
> +         */
> +
> +        /*
> +         * Check the thread context CAM lines and record matches. We
> +         * will handle CPU exception delivery later
> +         */
> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> +                                         cam_ignore, logic_serv);
> +        /*
> +         * Save the context and follow on to catch duplicates, that we
> +         * don't support yet.
> +         */
> +        if (ring != -1) {
> +            if (match->tctx) {
> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> +                return false;
> +            }
> +
> +            match->ring = ring;
> +            match->tctx = tctx;
> +        }
> +    }
> +
> +    if (!match->tctx) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> +                      nvt_blk, nvt_idx);
> +        return false;

Hmm.. this isn't actually an error isn't it?  At least not for powernv
- that just means the NVT isn't currently dispatched, so we'll need to
trigger the escalation interrupt.  Does this get changed later in the
series?

> +    }
> +
> +    return true;
> +}
> +
> +/*
> + * This is our simple Xive Presenter Engine model. It is merged in the
> + * Router as it does not require an extra object.
> + *
> + * It receives notification requests sent by the IVRE to find one
> + * matching NVT (or more) dispatched on the processor threads. In case
> + * of a single NVT notification, the process is abreviated and the
> + * thread is signaled if a match is found. In case of a logical server
> + * notification (bits ignored at the end of the NVT identifier), the
> + * IVPE and IVRE select a winning thread using different filters. This
> + * involves 2 or 3 exchanges on the PowerBus that the model does not
> + * support.
> + *
> + * The parameters represent what is sent on the PowerBus
> + */
> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
> +                                  bool cam_ignore, uint8_t priority,
> +                                  uint32_t logic_serv)
> +{
> +    XiveNVT nvt;
> +    XiveTCTXMatch match = { 0 };
> +    bool found;
> +
> +    /* NVT cache lookup */
> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
> +                      nvt_blk, nvt_idx);
> +        return;
> +    }
> +
> +    if (!(nvt.w0 & NVT_W0_VALID)) {
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
> +                      nvt_blk, nvt_idx);
> +        return;
> +    }
> +
> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> +                                 priority, logic_serv, &match);
> +    if (found) {
> +        return;
> +    }
> +
> +    /* If no matching NVT is dispatched on a HW thread :
> +     * - update the NVT structure if backlog is activated
> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
> +     *   activated
> +     */
> +}
> +
>  /*
>   * An END trigger can come from an event trigger (IPI or HW) or from
>   * another chip. We don't model the PowerBus but the END trigger
> @@ -1081,6 +1296,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>      /*
>       * Follows IVPE notification
>       */
> +    xive_presenter_notify(xrtr, format,
> +                          GETFIELD(END_W6_NVT_BLOCK, end.w6),
> +                          GETFIELD(END_W6_NVT_INDEX, end.w6),
> +                          GETFIELD(END_W7_F0_IGNORE, end.w7),
> +                          priority,
> +                          GETFIELD(END_W7_F1_LOG_SERVER_ID, end.w7));
> +
> +    /* TODO: Auto EOI. */
>  }
>  
>  static void xive_router_notify(XiveFabric *xf, uint32_t lisn)

Benjamin Herrenschmidt Nov. 28, 2018, 2:34 a.m. UTC | #2

On Wed, 2018-11-28 at 10:49 +1100, David Gibson wrote:
> On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
> > The last sub-engine of the XIVE architecture is the Interrupt
> > Virtualization Presentation Engine (IVPE). On HW, they share elements,
> > the Power Bus interface (CQ), the routing table descriptors, and they
> > can be combined in the same HW logic. We do the same in QEMU and
> > combine both engines in the XiveRouter for simplicity.
> 
> Ok, I'm not entirely convinced combining the IVPE and IVRE into a
> single object is a good idea, but we can probably discuss that once
> I've read further.

Keep in mind that the communication between the two is a bit more hairy
than simple notifications, though. Especially once we start
implementing escalation interrupts or worse, groups...

> > When the IVRE has completed its job of matching an event source with a
> > Notification Virtual Target (NVT) to notify, it forwards the event
> > notification to the IVPE sub-engine. The IVPE scans the thread
> > interrupt contexts of the Notification Virtual Targets (NVT)
> > dispatched on the HW processor threads and if a match is found, it
> > signals the thread. If not, the IVPE escalates the notification to
> > some other targets and records the notification in a backlog queue.
> > 
> > The IVPE maintains the thread interrupt context state for each of its
> > NVTs not dispatched on HW processor threads in the Notification
> > Virtual Target table (NVTT).
> > 
> > The model currently only supports single NVT notifications.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  include/hw/ppc/xive.h      |  13 +++
> >  include/hw/ppc/xive_regs.h |  22 ++++
> >  hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
> >  3 files changed, 258 insertions(+)
> > 
> > diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> > index 5987f26ddb98..e715a6c6923d 100644
> > --- a/include/hw/ppc/xive.h
> > +++ b/include/hw/ppc/xive.h
> > @@ -197,6 +197,10 @@ typedef struct XiveRouterClass {
> >                     XiveEND *end);
> >      int (*set_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >                     XiveEND *end);
> > +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> > +                   XiveNVT *nvt);
> > +    int (*set_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> > +                   XiveNVT *nvt);
> 
> As with the ENDs, I don't think get/set is a good interface for a
> bigger-than-word-size object.
> 
> >  } XiveRouterClass;
> >  
> >  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> > @@ -207,6 +211,10 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >                          XiveEND *end);
> >  int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >                          XiveEND *end);
> > +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> > +                        XiveNVT *nvt);
> > +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> > +                        XiveNVT *nvt);
> >  
> >  /*
> >   * XIVE END ESBs
> > @@ -274,4 +282,9 @@ extern const MemoryRegionOps xive_tm_ops;
> >  
> >  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
> >  
> > +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> > +{
> > +    return (nvt_blk << 19) | nvt_idx;
> 
> I'm guessing this formula is the standard way of combining the NVT
> block and index into a single word?  If so, I think we should
> standardize on passing a single word "nvt_id" around and only
> splitting it when we need to use the block separately.  Same goes for
> the end_id, assuming there's a standard way of putting that into a
> single word.  That will address the point I raised earlier about lisn
> being passed around as a single word, but these later stage ids being
> split.
> 
> We'll probably want some inlines or macros to build an
> nvt/end/lisn/whatever id from block and index as well.
> 
> > +}
> > +
> >  #endif /* PPC_XIVE_H */
> > diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> > index 2e3d6cb507da..05cb992d2815 100644
> > --- a/include/hw/ppc/xive_regs.h
> > +++ b/include/hw/ppc/xive_regs.h
> > @@ -158,4 +158,26 @@ typedef struct XiveEND {
> >  #define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> >  } XiveEND;
> >  
> > +/* Notification Virtual Target (NVT) */
> > +typedef struct XiveNVT {
> > +        uint32_t        w0;
> > +#define NVT_W0_VALID             PPC_BIT32(0)
> > +        uint32_t        w1;
> > +        uint32_t        w2;
> > +        uint32_t        w3;
> > +        uint32_t        w4;
> > +        uint32_t        w5;
> > +        uint32_t        w6;
> > +        uint32_t        w7;
> > +        uint32_t        w8;
> > +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
> > +        uint32_t        w9;
> > +        uint32_t        wa;
> > +        uint32_t        wb;
> > +        uint32_t        wc;
> > +        uint32_t        wd;
> > +        uint32_t        we;
> > +        uint32_t        wf;
> > +} XiveNVT;
> > +
> >  #endif /* PPC_XIVE_REGS_H */
> > diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> > index 4c6cb5d52975..5ba3b06e6e25 100644
> > --- a/hw/intc/xive.c
> > +++ b/hw/intc/xive.c
> > @@ -373,6 +373,32 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
> >      }
> >  }
> >  
> > +/* The HW CAM (23bits) is hardwired to :
> > + *
> > + *   0x000||0b1||4Bit chip number||7Bit Thread number.
> > + *
> > + * and when the block grouping extension is enabled :
> > + *
> > + *   4Bit chip number||0x001||7Bit Thread number.
> > + */
> > +static uint32_t tctx_hw_cam_line(bool block_group, uint8_t chip_id, uint8_t tid)
> > +{
> > +    if (block_group) {
> > +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
> > +    } else {
> > +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
> > +    }
> > +}
> > +
> > +static uint32_t xive_tctx_hw_cam_line(XiveTCTX *tctx, bool block_group)
> > +{
> > +    PowerPCCPU *cpu = POWERPC_CPU(tctx->cs);
> > +    CPUPPCState *env = &cpu->env;
> > +    uint32_t pir = env->spr_cb[SPR_PIR].default_value;
> 
> I don't much like reaching into the cpu state itself.  I think a
> better idea would be to have the TCTX have its HW CAM id set during
> initialization (via a property) and then use that.  This will mean
> less mucking about if future cpu revisions don't split the PIR into
> chip and tid ids in the same way.
> 
> > +    return tctx_hw_cam_line(block_group, (pir >> 8) & 0xf, pir & 0x7f);
> > +}
> > +
> >  static void xive_tctx_reset(void *dev)
> >  {
> >      XiveTCTX *tctx = XIVE_TCTX(dev);
> > @@ -1013,6 +1039,195 @@ int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >     return xrc->set_end(xrtr, end_blk, end_idx, end);
> >  }
> >  
> > +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> > +                        XiveNVT *nvt)
> > +{
> > +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> > +
> > +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> > +}
> > +
> > +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> > +                        XiveNVT *nvt)
> > +{
> > +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> > +
> > +   return xrc->set_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> > +}
> > +
> > +static bool xive_tctx_ring_match(XiveTCTX *tctx, uint8_t ring,
> > +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> > +                                 bool cam_ignore, uint32_t logic_serv)
> > +{
> > +    uint8_t *regs = &tctx->regs[ring];
> > +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> > +    uint32_t cam = xive_tctx_cam_line(nvt_blk, nvt_idx);
> > +    bool block_group = false; /* TODO (PowerNV) */
> > +
> > +    /* TODO (PowerNV): ignore low order bits of nvt id */
> > +
> > +    switch (ring) {
> > +    case TM_QW3_HV_PHYS:
> > +        return (w2 & TM_QW3W2_VT) && xive_tctx_hw_cam_line(tctx, block_group) ==
> > +            tctx_hw_cam_line(block_group, nvt_blk, nvt_idx);
> 
> The difference between "xive_tctx_hw_cam_line" and "tctx_hw_cam_line"
> here is far from obvious.  Remember that namespacing prefixes aren't
> necessary for static functions, which can let you give more
> descriptive names without getting excessively long.
> 
> > +    case TM_QW2_HV_POOL:
> > +        return (w2 & TM_QW2W2_VP) && (cam == GETFIELD(TM_QW2W2_POOL_CAM, w2));
> > +
> > +    case TM_QW1_OS:
> > +        return (w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2));
> > +
> > +    case TM_QW0_USER:
> > +        return ((w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2)) &&
> > +                (w2 & TM_QW0W2_VU) &&
> > +                (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, w2)));
> > +
> > +    default:
> > +        g_assert_not_reached();
> > +    }
> > +}
> > +
> > +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> > +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> > +                                     bool cam_ignore, uint32_t logic_serv)
> > +{
> > +    if (format == 0) {
> > +        /* F=0 & i=1: Logical server notification */
> > +        if (cam_ignore == true) {
> > +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no support for LS "
> > +                          "NVT %x/%x\n", nvt_blk, nvt_idx);
> > +             return -1;
> > +        }
> > +
> > +        /* F=0 & i=0: Specific NVT notification */
> > +        if (xive_tctx_ring_match(tctx, TM_QW3_HV_PHYS,
> > +                                nvt_blk, nvt_idx, false, 0)) {
> > +            return TM_QW3_HV_PHYS;
> > +        }
> > +        if (xive_tctx_ring_match(tctx, TM_QW2_HV_POOL,
> > +                                nvt_blk, nvt_idx, false, 0)) {
> > +            return TM_QW2_HV_POOL;
> > +        }
> > +        if (xive_tctx_ring_match(tctx, TM_QW1_OS,
> > +                                nvt_blk, nvt_idx, false, 0)) {
> > +            return TM_QW1_OS;
> > +        }
> 
> Hm.  It's a bit pointless to iterate through each ring calling a
> common function, when that "common" function consists entirely of a
> switch which makes it not really common at all.
> 
> So I think you want separate helper functions for each ring's match,
> or even just fold the previous function into this one.
> 
> > +    } else {
> > +        /* F=1 : User level Event-Based Branch (EBB) notification */
> > +        if (xive_tctx_ring_match(tctx, TM_QW0_USER,
> > +                                nvt_blk, nvt_idx, false, logic_serv)) {
> > +            return TM_QW0_USER;
> > +        }
> > +    }
> > +    return -1;
> > +}
> > +
> > +typedef struct XiveTCTXMatch {
> > +    XiveTCTX *tctx;
> > +    uint8_t ring;
> > +} XiveTCTXMatch;
> > +
> > +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> > +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> > +                                 bool cam_ignore, uint8_t priority,
> > +                                 uint32_t logic_serv, XiveTCTXMatch *match)
> > +{
> > +    CPUState *cs;
> > +
> > +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> > +     * hardwired CAM compares */
> > +
> > +    CPU_FOREACH(cs) {
> > +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> > +        int ring;
> > +
> > +        /*
> > +         * HW checks that the CPU is enabled in the Physical Thread
> > +         * Enable Register (PTER).
> > +         */
> > +
> > +        /*
> > +         * Check the thread context CAM lines and record matches. We
> > +         * will handle CPU exception delivery later
> > +         */
> > +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> > +                                         cam_ignore, logic_serv);
> > +        /*
> > +         * Save the context and follow on to catch duplicates, that we
> > +         * don't support yet.
> > +         */
> > +        if (ring != -1) {
> > +            if (match->tctx) {
> > +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
> > +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> > +                return false;
> > +            }
> > +
> > +            match->ring = ring;
> > +            match->tctx = tctx;
> > +        }
> > +    }
> > +
> > +    if (!match->tctx) {
> > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> > +                      nvt_blk, nvt_idx);
> > +        return false;
> 
> Hmm.. this isn't actually an error isn't it?  At least not for powernv
> - that just means the NVT isn't currently dispatched, so we'll need to
> trigger the escalation interrupt.  Does this get changed later in the
> series?
> 
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +/*
> > + * This is our simple Xive Presenter Engine model. It is merged in the
> > + * Router as it does not require an extra object.
> > + *
> > + * It receives notification requests sent by the IVRE to find one
> > + * matching NVT (or more) dispatched on the processor threads. In case
> > + * of a single NVT notification, the process is abreviated and the
> > + * thread is signaled if a match is found. In case of a logical server
> > + * notification (bits ignored at the end of the NVT identifier), the
> > + * IVPE and IVRE select a winning thread using different filters. This
> > + * involves 2 or 3 exchanges on the PowerBus that the model does not
> > + * support.
> > + *
> > + * The parameters represent what is sent on the PowerBus
> > + */
> > +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> > +                                  uint8_t nvt_blk, uint32_t nvt_idx,
> > +                                  bool cam_ignore, uint8_t priority,
> > +                                  uint32_t logic_serv)
> > +{
> > +    XiveNVT nvt;
> > +    XiveTCTXMatch match = { 0 };
> > +    bool found;
> > +
> > +    /* NVT cache lookup */
> > +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
> > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
> > +                      nvt_blk, nvt_idx);
> > +        return;
> > +    }
> > +
> > +    if (!(nvt.w0 & NVT_W0_VALID)) {
> > +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
> > +                      nvt_blk, nvt_idx);
> > +        return;
> > +    }
> > +
> > +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> > +                                 priority, logic_serv, &match);
> > +    if (found) {
> > +        return;
> > +    }
> > +
> > +    /* If no matching NVT is dispatched on a HW thread :
> > +     * - update the NVT structure if backlog is activated
> > +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
> > +     *   activated
> > +     */
> > +}
> > +
> >  /*
> >   * An END trigger can come from an event trigger (IPI or HW) or from
> >   * another chip. We don't model the PowerBus but the END trigger
> > @@ -1081,6 +1296,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
> >      /*
> >       * Follows IVPE notification
> >       */
> > +    xive_presenter_notify(xrtr, format,
> > +                          GETFIELD(END_W6_NVT_BLOCK, end.w6),
> > +                          GETFIELD(END_W6_NVT_INDEX, end.w6),
> > +                          GETFIELD(END_W7_F0_IGNORE, end.w7),
> > +                          priority,
> > +                          GETFIELD(END_W7_F1_LOG_SERVER_ID, end.w7));
> > +
> > +    /* TODO: Auto EOI. */
> >  }
> >  
> >  static void xive_router_notify(XiveFabric *xf, uint32_t lisn)

Cédric Le Goater Nov. 28, 2018, 10:59 a.m. UTC | #3

On 11/28/18 12:49 AM, David Gibson wrote:
> On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
>> The last sub-engine of the XIVE architecture is the Interrupt
>> Virtualization Presentation Engine (IVPE). On HW, they share elements,
>> the Power Bus interface (CQ), the routing table descriptors, and they
>> can be combined in the same HW logic. We do the same in QEMU and
>> combine both engines in the XiveRouter for simplicity.
> 
> Ok, I'm not entirely convinced combining the IVPE and IVRE into a
> single object is a good idea, but we can probably discuss that once
> I've read further.

We could introduce a simplified presenter for sPAPR but I am not even
sure of that as it will get more complex if we support the EBB one day. 
 
>> When the IVRE has completed its job of matching an event source with a
>> Notification Virtual Target (NVT) to notify, it forwards the event
>> notification to the IVPE sub-engine. The IVPE scans the thread
>> interrupt contexts of the Notification Virtual Targets (NVT)
>> dispatched on the HW processor threads and if a match is found, it
>> signals the thread. If not, the IVPE escalates the notification to
>> some other targets and records the notification in a backlog queue.
>>
>> The IVPE maintains the thread interrupt context state for each of its
>> NVTs not dispatched on HW processor threads in the Notification
>> Virtual Target table (NVTT).
>>
>> The model currently only supports single NVT notifications.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/xive.h      |  13 +++
>>  include/hw/ppc/xive_regs.h |  22 ++++
>>  hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
>>  3 files changed, 258 insertions(+)
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 5987f26ddb98..e715a6c6923d 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -197,6 +197,10 @@ typedef struct XiveRouterClass {
>>                     XiveEND *end);
>>      int (*set_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                     XiveEND *end);
>> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                   XiveNVT *nvt);
>> +    int (*set_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                   XiveNVT *nvt);
> 
> As with the ENDs, I don't think get/set is a good interface for a
> bigger-than-word-size object.

We need to agree on this interface before I respin. So you would like 
to add a extra argument specifying the word being accessed ? 

> 
>>  } XiveRouterClass;
>>  
>>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
>> @@ -207,6 +211,10 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                          XiveEND *end);
>>  int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>                          XiveEND *end);
>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt);
>> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt);
>>  
>>  /*
>>   * XIVE END ESBs
>> @@ -274,4 +282,9 @@ extern const MemoryRegionOps xive_tm_ops;
>>  
>>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>>  
>> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>> +{
>> +    return (nvt_blk << 19) | nvt_idx;
> 
> I'm guessing this formula is the standard way of combining the NVT
> block and index into a single word?  

That number is the VP/NVT identifier which is written in the CAM value. 
The index is on 19 bits because of the NVT  definition in the END 
structure. It is being increased to 24 bits on Power10 

> If so, I think we should
> standardize on passing a single word "nvt_id" around and only
> splitting it when we need to use the block separately.  

This is really the only place where we concatenate the two NVT values,
block and index. 

> Same goes for
> the end_id, assuming there's a standard way of putting that into a
> single word.  That will address the point I raised earlier about lisn
> being passed around as a single word, but these later stage ids being
> split.

Hmm, I am not sure this is a good option. It is not how the PowerNV 
model would use it, skiboot is very much aware of these blocks and 
indexes and for remote accesses chips are identified using the block. 
I will take a look at it but I am not found of it. I can add helpers 
in some places though.    

I agree we have some kind of issue linking the HW model with the sPAPR 
machine. The guest interface is only  about IRQ numbers, priorities and
cpu numbers. We really don't care about XIVE blocks and indexes in that 
case. we can clarify the code by bypassing the XiveRouter interfaces
to the table and directly use the sPAPR interrupt controller. That 
should help a bit for the hcalls but we would still have to fill in 
the EAT and the END with some index values if we want to use the router
algorithm.   



> We'll probably want some inlines or macros to build an
> nvt/end/lisn/whatever id from block and index as well.
> 
>> +}
>> +
>>  #endif /* PPC_XIVE_H */
>> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
>> index 2e3d6cb507da..05cb992d2815 100644
>> --- a/include/hw/ppc/xive_regs.h
>> +++ b/include/hw/ppc/xive_regs.h
>> @@ -158,4 +158,26 @@ typedef struct XiveEND {
>>  #define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>>  } XiveEND;
>>  
>> +/* Notification Virtual Target (NVT) */
>> +typedef struct XiveNVT {
>> +        uint32_t        w0;
>> +#define NVT_W0_VALID             PPC_BIT32(0)
>> +        uint32_t        w1;
>> +        uint32_t        w2;
>> +        uint32_t        w3;
>> +        uint32_t        w4;
>> +        uint32_t        w5;
>> +        uint32_t        w6;
>> +        uint32_t        w7;
>> +        uint32_t        w8;
>> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
>> +        uint32_t        w9;
>> +        uint32_t        wa;
>> +        uint32_t        wb;
>> +        uint32_t        wc;
>> +        uint32_t        wd;
>> +        uint32_t        we;
>> +        uint32_t        wf;
>> +} XiveNVT;
>> +
>>  #endif /* PPC_XIVE_REGS_H */
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 4c6cb5d52975..5ba3b06e6e25 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -373,6 +373,32 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>>      }
>>  }
>>  
>> +/* The HW CAM (23bits) is hardwired to :
>> + *
>> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
>> + *
>> + * and when the block grouping extension is enabled :
>> + *
>> + *   4Bit chip number||0x001||7Bit Thread number.
>> + */
>> +static uint32_t tctx_hw_cam_line(bool block_group, uint8_t chip_id, uint8_t tid)
>> +{
>> +    if (block_group) {
>> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
>> +    } else {
>> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
>> +    }
>> +}
>> +
>> +static uint32_t xive_tctx_hw_cam_line(XiveTCTX *tctx, bool block_group)
>> +{
>> +    PowerPCCPU *cpu = POWERPC_CPU(tctx->cs);
>> +    CPUPPCState *env = &cpu->env;
>> +    uint32_t pir = env->spr_cb[SPR_PIR].default_value;
> 
> I don't much like reaching into the cpu state itself.  I think a
> better idea would be to have the TCTX have its HW CAM id set during
> initialization (via a property) and then use that.  This will mean
> less mucking about if future cpu revisions don't split the PIR into
> chip and tid ids in the same way.

yes good idea. I will see how to handle the block_group boolean. may be we
can leave it out of the model for now as it is not used. 

> 
>> +    return tctx_hw_cam_line(block_group, (pir >> 8) & 0xf, pir & 0x7f);
>> +}
>> +
>>  static void xive_tctx_reset(void *dev)
>>  {
>>      XiveTCTX *tctx = XIVE_TCTX(dev);
>> @@ -1013,6 +1039,195 @@ int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>     return xrc->set_end(xrtr, end_blk, end_idx, end);
>>  }
>>  
>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt)
>> +{
>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
>> +}
>> +
>> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>> +                        XiveNVT *nvt)
>> +{
>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>> +
>> +   return xrc->set_nvt(xrtr, nvt_blk, nvt_idx, nvt);
>> +}
>> +
>> +static bool xive_tctx_ring_match(XiveTCTX *tctx, uint8_t ring,
>> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                 bool cam_ignore, uint32_t logic_serv)
>> +{
>> +    uint8_t *regs = &tctx->regs[ring];
>> +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
>> +    uint32_t cam = xive_tctx_cam_line(nvt_blk, nvt_idx);
>> +    bool block_group = false; /* TODO (PowerNV) */
>> +
>> +    /* TODO (PowerNV): ignore low order bits of nvt id */
>> +
>> +    switch (ring) {
>> +    case TM_QW3_HV_PHYS:
>> +        return (w2 & TM_QW3W2_VT) && xive_tctx_hw_cam_line(tctx, block_group) ==
>> +            tctx_hw_cam_line(block_group, nvt_blk, nvt_idx);
> 
> The difference between "xive_tctx_hw_cam_line" and "tctx_hw_cam_line"
> here is far from obvious.  

yes. I lacked inspiration ...

> Remember that namespacing prefixes aren't
> necessary for static functions, which can let you give more
> descriptive names without getting excessively long.

OK.
 
>> +    case TM_QW2_HV_POOL:
>> +        return (w2 & TM_QW2W2_VP) && (cam == GETFIELD(TM_QW2W2_POOL_CAM, w2));
>> +
>> +    case TM_QW1_OS:
>> +        return (w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2));
>> +
>> +    case TM_QW0_USER:
>> +        return ((w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2)) &&
>> +                (w2 & TM_QW0W2_VU) &&
>> +                (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, w2)));
>> +
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>> +}
>> +
>> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
>> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                     bool cam_ignore, uint32_t logic_serv)
>> +{
>> +    if (format == 0) {
>> +        /* F=0 & i=1: Logical server notification */
>> +        if (cam_ignore == true) {
>> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no support for LS "
>> +                          "NVT %x/%x\n", nvt_blk, nvt_idx);
>> +             return -1;
>> +        }
>> +
>> +        /* F=0 & i=0: Specific NVT notification */
>> +        if (xive_tctx_ring_match(tctx, TM_QW3_HV_PHYS,
>> +                                nvt_blk, nvt_idx, false, 0)) {
>> +            return TM_QW3_HV_PHYS;
>> +        }
>> +        if (xive_tctx_ring_match(tctx, TM_QW2_HV_POOL,
>> +                                nvt_blk, nvt_idx, false, 0)) {
>> +            return TM_QW2_HV_POOL;
>> +        }
>> +        if (xive_tctx_ring_match(tctx, TM_QW1_OS,
>> +                                nvt_blk, nvt_idx, false, 0)) {
>> +            return TM_QW1_OS;
>> +        }
> 
> Hm.  It's a bit pointless to iterate through each ring calling a
> common function, when that "common" function consists entirely of a
> switch which makes it not really common at all.
> 
> So I think you want separate helper functions for each ring's match,
> or even just fold the previous function into this one.

yes. It can be improved. I did try different layouts. I might just fold 
both routine in one as you propose.  

>> +    } else {
>> +        /* F=1 : User level Event-Based Branch (EBB) notification */
>> +        if (xive_tctx_ring_match(tctx, TM_QW0_USER,
>> +                                nvt_blk, nvt_idx, false, logic_serv)) {
>> +            return TM_QW0_USER;
>> +        }
>> +    }
>> +    return -1;
>> +}
>> +
>> +typedef struct XiveTCTXMatch {
>> +    XiveTCTX *tctx;
>> +    uint8_t ring;
>> +} XiveTCTXMatch;
>> +
>> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                 bool cam_ignore, uint8_t priority,
>> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
>> +{
>> +    CPUState *cs;
>> +
>> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
>> +     * hardwired CAM compares */
>> +
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
>> +        int ring;
>> +
>> +        /*
>> +         * HW checks that the CPU is enabled in the Physical Thread
>> +         * Enable Register (PTER).
>> +         */
>> +
>> +        /*
>> +         * Check the thread context CAM lines and record matches. We
>> +         * will handle CPU exception delivery later
>> +         */
>> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
>> +                                         cam_ignore, logic_serv);
>> +        /*
>> +         * Save the context and follow on to catch duplicates, that we
>> +         * don't support yet.
>> +         */
>> +        if (ring != -1) {
>> +            if (match->tctx) {
>> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
>> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
>> +                return false;
>> +            }
>> +
>> +            match->ring = ring;
>> +            match->tctx = tctx;
>> +        }
>> +    }
>> +
>> +    if (!match->tctx) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
>> +                      nvt_blk, nvt_idx);
>> +        return false;
> 
> Hmm.. this isn't actually an error isn't it? At least not for powernv

It is on sPAPR, it would mean the END was configured with an unknow CPU. 

It is not error on PowerNV, when we support escalations.

> - that just means the NVT isn't currently dispatched, so we'll need to
> trigger the escalation interrupt.  

Yes.

> Does this get changed later in the series?

No.

Thanks,

C.

>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +/*
>> + * This is our simple Xive Presenter Engine model. It is merged in the
>> + * Router as it does not require an extra object.
>> + *
>> + * It receives notification requests sent by the IVRE to find one
>> + * matching NVT (or more) dispatched on the processor threads. In case
>> + * of a single NVT notification, the process is abreviated and the
>> + * thread is signaled if a match is found. In case of a logical server
>> + * notification (bits ignored at the end of the NVT identifier), the
>> + * IVPE and IVRE select a winning thread using different filters. This
>> + * involves 2 or 3 exchanges on the PowerBus that the model does not
>> + * support.
>> + *
>> + * The parameters represent what is sent on the PowerBus
>> + */
>> +static void xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
>> +                                  uint8_t nvt_blk, uint32_t nvt_idx,
>> +                                  bool cam_ignore, uint8_t priority,
>> +                                  uint32_t logic_serv)
>> +{
>> +    XiveNVT nvt;
>> +    XiveTCTXMatch match = { 0 };
>> +    bool found;
>> +
>> +    /* NVT cache lookup */
>> +    if (xive_router_get_nvt(xrtr, nvt_blk, nvt_idx, &nvt)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no NVT %x/%x\n",
>> +                      nvt_blk, nvt_idx);
>> +        return;
>> +    }
>> +
>> +    if (!(nvt.w0 & NVT_W0_VALID)) {
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is invalid\n",
>> +                      nvt_blk, nvt_idx);
>> +        return;
>> +    }
>> +
>> +    found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
>> +                                 priority, logic_serv, &match);
>> +    if (found) {
>> +        return;
>> +    }
>> +
>> +    /* If no matching NVT is dispatched on a HW thread :
>> +     * - update the NVT structure if backlog is activated
>> +     * - escalate (ESe PQ bits and EAS in w4-5) if escalation is
>> +     *   activated
>> +     */
>> +}
>> +
>>  /*
>>   * An END trigger can come from an event trigger (IPI or HW) or from
>>   * another chip. We don't model the PowerBus but the END trigger
>> @@ -1081,6 +1296,14 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>>      /*
>>       * Follows IVPE notification
>>       */
>> +    xive_presenter_notify(xrtr, format,
>> +                          GETFIELD(END_W6_NVT_BLOCK, end.w6),
>> +                          GETFIELD(END_W6_NVT_INDEX, end.w6),
>> +                          GETFIELD(END_W7_F0_IGNORE, end.w7),
>> +                          priority,
>> +                          GETFIELD(END_W7_F1_LOG_SERVER_ID, end.w7));
>> +
>> +    /* TODO: Auto EOI. */
>>  }
>>  
>>  static void xive_router_notify(XiveFabric *xf, uint32_t lisn)
>

David Gibson Nov. 29, 2018, 12:47 a.m. UTC | #4

On Wed, Nov 28, 2018 at 11:59:58AM +0100, Cédric Le Goater wrote:
> On 11/28/18 12:49 AM, David Gibson wrote:
> > On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
> >> The last sub-engine of the XIVE architecture is the Interrupt
> >> Virtualization Presentation Engine (IVPE). On HW, they share elements,
> >> the Power Bus interface (CQ), the routing table descriptors, and they
> >> can be combined in the same HW logic. We do the same in QEMU and
> >> combine both engines in the XiveRouter for simplicity.
> > 
> > Ok, I'm not entirely convinced combining the IVPE and IVRE into a
> > single object is a good idea, but we can probably discuss that once
> > I've read further.
> 
> We could introduce a simplified presenter for sPAPR but I am not even
> sure of that as it will get more complex if we support the EBB one day. 

I wasn't really thinking about PAPR for this comment.

> >> When the IVRE has completed its job of matching an event source with a
> >> Notification Virtual Target (NVT) to notify, it forwards the event
> >> notification to the IVPE sub-engine. The IVPE scans the thread
> >> interrupt contexts of the Notification Virtual Targets (NVT)
> >> dispatched on the HW processor threads and if a match is found, it
> >> signals the thread. If not, the IVPE escalates the notification to
> >> some other targets and records the notification in a backlog queue.
> >>
> >> The IVPE maintains the thread interrupt context state for each of its
> >> NVTs not dispatched on HW processor threads in the Notification
> >> Virtual Target table (NVTT).
> >>
> >> The model currently only supports single NVT notifications.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  include/hw/ppc/xive.h      |  13 +++
> >>  include/hw/ppc/xive_regs.h |  22 ++++
> >>  hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 258 insertions(+)
> >>
> >> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> >> index 5987f26ddb98..e715a6c6923d 100644
> >> --- a/include/hw/ppc/xive.h
> >> +++ b/include/hw/ppc/xive.h
> >> @@ -197,6 +197,10 @@ typedef struct XiveRouterClass {
> >>                     XiveEND *end);
> >>      int (*set_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                     XiveEND *end);
> >> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                   XiveNVT *nvt);
> >> +    int (*set_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                   XiveNVT *nvt);
> > 
> > As with the ENDs, I don't think get/set is a good interface for a
> > bigger-than-word-size object.
> 
> We need to agree on this interface before I respin. So you would like 
> to add a extra argument specifying the word being accessed ?

Yes.  Ok, 3 options I can see at this point:

1) read/write accessors which take a word number

2) A "get" accessor which copies the whole structure, but "write"
accessor which takes a word number.  The asymmetry is a bit ugly, but
it's the non-atomic writeback of the whole structure which I'm most
uncomfortable with.

3) A map/unmap interface which gives you / releases a pointer to the
"live" structure.  For powernv that would become
address_space_map()/unmap().  For PAPR it would just be reutn pointer
/ no-op.

> 
> > 
> >>  } XiveRouterClass;
> >>  
> >>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
> >> @@ -207,6 +211,10 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                          XiveEND *end);
> >>  int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>                          XiveEND *end);
> >> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                        XiveNVT *nvt);
> >> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                        XiveNVT *nvt);
> >>  
> >>  /*
> >>   * XIVE END ESBs
> >> @@ -274,4 +282,9 @@ extern const MemoryRegionOps xive_tm_ops;
> >>  
> >>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
> >>  
> >> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> >> +{
> >> +    return (nvt_blk << 19) | nvt_idx;
> > 
> > I'm guessing this formula is the standard way of combining the NVT
> > block and index into a single word?  
> 
> That number is the VP/NVT identifier which is written in the CAM value. 
> The index is on 19 bits because of the NVT  definition in the END 
> structure. It is being increased to 24 bits on Power10 
> 
> > If so, I think we should
> > standardize on passing a single word "nvt_id" around and only
> > splitting it when we need to use the block separately.  
> 
> This is really the only place where we concatenate the two NVT values,
> block and index. 

Hm, ok.  I know we don't model them (yet, maybe ever) but could
combined values appear in the PowerBUS messages that handle remote
notifications?

> > Same goes for
> > the end_id, assuming there's a standard way of putting that into a
> > single word.  That will address the point I raised earlier about lisn
> > being passed around as a single word, but these later stage ids being
> > split.
> 
> Hmm, I am not sure this is a good option. It is not how the PowerNV 
> model would use it, skiboot is very much aware of these blocks and 
> indexes and for remote accesses chips are identified using the block. 
> I will take a look at it but I am not found of it. I can add helpers 
> in some places though.    

Hm, ok.  Do the block and index appear as an (effectively) single
field in the EAS?

> I agree we have some kind of issue linking the HW model with the sPAPR 
> machine. The guest interface is only  about IRQ numbers, priorities and
> cpu numbers. We really don't care about XIVE blocks and indexes in that 
> case. we can clarify the code by bypassing the XiveRouter interfaces
> to the table and directly use the sPAPR interrupt controller. That 
> should help a bit for the hcalls but we would still have to fill in 
> the EAT and the END with some index values if we want to use the router
> algorithm.

I don't think this is too much of a problem.  These are essentially
machine internal details so we can choose an allocation to suit us.
The obvious one is to put everything in a single block, at least as
long as that won't limit our numbers too much.

> > We'll probably want some inlines or macros to build an
> > nvt/end/lisn/whatever id from block and index as well.
> > 
> >> +}
> >> +
> >>  #endif /* PPC_XIVE_H */
> >> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
> >> index 2e3d6cb507da..05cb992d2815 100644
> >> --- a/include/hw/ppc/xive_regs.h
> >> +++ b/include/hw/ppc/xive_regs.h
> >> @@ -158,4 +158,26 @@ typedef struct XiveEND {
> >>  #define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
> >>  } XiveEND;
> >>  
> >> +/* Notification Virtual Target (NVT) */
> >> +typedef struct XiveNVT {
> >> +        uint32_t        w0;
> >> +#define NVT_W0_VALID             PPC_BIT32(0)
> >> +        uint32_t        w1;
> >> +        uint32_t        w2;
> >> +        uint32_t        w3;
> >> +        uint32_t        w4;
> >> +        uint32_t        w5;
> >> +        uint32_t        w6;
> >> +        uint32_t        w7;
> >> +        uint32_t        w8;
> >> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
> >> +        uint32_t        w9;
> >> +        uint32_t        wa;
> >> +        uint32_t        wb;
> >> +        uint32_t        wc;
> >> +        uint32_t        wd;
> >> +        uint32_t        we;
> >> +        uint32_t        wf;
> >> +} XiveNVT;
> >> +
> >>  #endif /* PPC_XIVE_REGS_H */
> >> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >> index 4c6cb5d52975..5ba3b06e6e25 100644
> >> --- a/hw/intc/xive.c
> >> +++ b/hw/intc/xive.c
> >> @@ -373,6 +373,32 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
> >>      }
> >>  }
> >>  
> >> +/* The HW CAM (23bits) is hardwired to :
> >> + *
> >> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
> >> + *
> >> + * and when the block grouping extension is enabled :
> >> + *
> >> + *   4Bit chip number||0x001||7Bit Thread number.
> >> + */
> >> +static uint32_t tctx_hw_cam_line(bool block_group, uint8_t chip_id, uint8_t tid)
> >> +{
> >> +    if (block_group) {
> >> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
> >> +    } else {
> >> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
> >> +    }
> >> +}
> >> +
> >> +static uint32_t xive_tctx_hw_cam_line(XiveTCTX *tctx, bool block_group)
> >> +{
> >> +    PowerPCCPU *cpu = POWERPC_CPU(tctx->cs);
> >> +    CPUPPCState *env = &cpu->env;
> >> +    uint32_t pir = env->spr_cb[SPR_PIR].default_value;
> > 
> > I don't much like reaching into the cpu state itself.  I think a
> > better idea would be to have the TCTX have its HW CAM id set during
> > initialization (via a property) and then use that.  This will mean
> > less mucking about if future cpu revisions don't split the PIR into
> > chip and tid ids in the same way.
> 
> yes good idea. I will see how to handle the block_group boolean. may be we
> can leave it out of the model for now as it is not used.

Yes, it would be nice to leave the block_group stuff as a later
extensions when/if we need it.  If we put it in as a stub and nothing
is using/testing it, it's likely it will be broken if we ever do
actually try to use it.

> 
> > 
> >> +    return tctx_hw_cam_line(block_group, (pir >> 8) & 0xf, pir & 0x7f);
> >> +}
> >> +
> >>  static void xive_tctx_reset(void *dev)
> >>  {
> >>      XiveTCTX *tctx = XIVE_TCTX(dev);
> >> @@ -1013,6 +1039,195 @@ int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
> >>     return xrc->set_end(xrtr, end_blk, end_idx, end);
> >>  }
> >>  
> >> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                        XiveNVT *nvt)
> >> +{
> >> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> >> +
> >> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> >> +}
> >> +
> >> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                        XiveNVT *nvt)
> >> +{
> >> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
> >> +
> >> +   return xrc->set_nvt(xrtr, nvt_blk, nvt_idx, nvt);
> >> +}
> >> +
> >> +static bool xive_tctx_ring_match(XiveTCTX *tctx, uint8_t ring,
> >> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                 bool cam_ignore, uint32_t logic_serv)
> >> +{
> >> +    uint8_t *regs = &tctx->regs[ring];
> >> +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
> >> +    uint32_t cam = xive_tctx_cam_line(nvt_blk, nvt_idx);
> >> +    bool block_group = false; /* TODO (PowerNV) */
> >> +
> >> +    /* TODO (PowerNV): ignore low order bits of nvt id */
> >> +
> >> +    switch (ring) {
> >> +    case TM_QW3_HV_PHYS:
> >> +        return (w2 & TM_QW3W2_VT) && xive_tctx_hw_cam_line(tctx, block_group) ==
> >> +            tctx_hw_cam_line(block_group, nvt_blk, nvt_idx);
> > 
> > The difference between "xive_tctx_hw_cam_line" and "tctx_hw_cam_line"
> > here is far from obvious.  
> 
> yes. I lacked inspiration ...

I'd suggest that the one which takes the tctx as a parameter be
tctx_hw_cam_line() and the other be nvt_hw_cam_line() or similar.  The
crucial difference here is that one is what the thread is looking for,
the other is what the NVT is advertising.

> > Remember that namespacing prefixes aren't
> > necessary for static functions, which can let you give more
> > descriptive names without getting excessively long.
> 
> OK.
>  
> >> +    case TM_QW2_HV_POOL:
> >> +        return (w2 & TM_QW2W2_VP) && (cam == GETFIELD(TM_QW2W2_POOL_CAM, w2));
> >> +
> >> +    case TM_QW1_OS:
> >> +        return (w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2));
> >> +
> >> +    case TM_QW0_USER:
> >> +        return ((w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2)) &&
> >> +                (w2 & TM_QW0W2_VU) &&
> >> +                (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, w2)));
> >> +
> >> +    default:
> >> +        g_assert_not_reached();
> >> +    }
> >> +}
> >> +
> >> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
> >> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                     bool cam_ignore, uint32_t logic_serv)
> >> +{
> >> +    if (format == 0) {
> >> +        /* F=0 & i=1: Logical server notification */
> >> +        if (cam_ignore == true) {
> >> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no support for LS "
> >> +                          "NVT %x/%x\n", nvt_blk, nvt_idx);
> >> +             return -1;
> >> +        }
> >> +
> >> +        /* F=0 & i=0: Specific NVT notification */
> >> +        if (xive_tctx_ring_match(tctx, TM_QW3_HV_PHYS,
> >> +                                nvt_blk, nvt_idx, false, 0)) {
> >> +            return TM_QW3_HV_PHYS;
> >> +        }
> >> +        if (xive_tctx_ring_match(tctx, TM_QW2_HV_POOL,
> >> +                                nvt_blk, nvt_idx, false, 0)) {
> >> +            return TM_QW2_HV_POOL;
> >> +        }
> >> +        if (xive_tctx_ring_match(tctx, TM_QW1_OS,
> >> +                                nvt_blk, nvt_idx, false, 0)) {
> >> +            return TM_QW1_OS;
> >> +        }
> > 
> > Hm.  It's a bit pointless to iterate through each ring calling a
> > common function, when that "common" function consists entirely of a
> > switch which makes it not really common at all.
> > 
> > So I think you want separate helper functions for each ring's match,
> > or even just fold the previous function into this one.
> 
> yes. It can be improved. I did try different layouts. I might just fold 
> both routine in one as you propose.  
> 
> >> +    } else {
> >> +        /* F=1 : User level Event-Based Branch (EBB) notification */
> >> +        if (xive_tctx_ring_match(tctx, TM_QW0_USER,
> >> +                                nvt_blk, nvt_idx, false, logic_serv)) {
> >> +            return TM_QW0_USER;
> >> +        }
> >> +    }
> >> +    return -1;
> >> +}
> >> +
> >> +typedef struct XiveTCTXMatch {
> >> +    XiveTCTX *tctx;
> >> +    uint8_t ring;
> >> +} XiveTCTXMatch;
> >> +
> >> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> >> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
> >> +                                 bool cam_ignore, uint8_t priority,
> >> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
> >> +{
> >> +    CPUState *cs;
> >> +
> >> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
> >> +     * hardwired CAM compares */
> >> +
> >> +    CPU_FOREACH(cs) {
> >> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> >> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
> >> +        int ring;
> >> +
> >> +        /*
> >> +         * HW checks that the CPU is enabled in the Physical Thread
> >> +         * Enable Register (PTER).
> >> +         */
> >> +
> >> +        /*
> >> +         * Check the thread context CAM lines and record matches. We
> >> +         * will handle CPU exception delivery later
> >> +         */
> >> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
> >> +                                         cam_ignore, logic_serv);
> >> +        /*
> >> +         * Save the context and follow on to catch duplicates, that we
> >> +         * don't support yet.
> >> +         */
> >> +        if (ring != -1) {
> >> +            if (match->tctx) {
> >> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
> >> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
> >> +                return false;
> >> +            }
> >> +
> >> +            match->ring = ring;
> >> +            match->tctx = tctx;
> >> +        }
> >> +    }
> >> +
> >> +    if (!match->tctx) {
> >> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> >> +                      nvt_blk, nvt_idx);
> >> +        return false;
> > 
> > Hmm.. this isn't actually an error isn't it? At least not for powernv
> 
> It is on sPAPR, it would mean the END was configured with an unknow CPU. 

Right.

> It is not error on PowerNV, when we support escalations.
> 
> > - that just means the NVT isn't currently dispatched, so we'll need to
> > trigger the escalation interrupt.  
> 
> Yes.
> 
> > Does this get changed later in the series?
> 
> No.

But this code is common to PAPR and powernv, yes, so it will need to?

Benjamin Herrenschmidt Nov. 29, 2018, 3:39 a.m. UTC | #5

On Thu, 2018-11-29 at 11:47 +1100, David Gibson wrote:
> 
> 1) read/write accessors which take a word number
> 
> 2) A "get" accessor which copies the whole structure, but "write"
> accessor which takes a word number.  The asymmetry is a bit ugly, but
> it's the non-atomic writeback of the whole structure which I'm most
> uncomfortable with.

It shouldn't be a big deal though, there are HW facilities to access
the structures "atomically" anyway, due to the caching done by the
XIVE.

> 3) A map/unmap interface which gives you / releases a pointer to the
> "live" structure.  For powernv that would become
> address_space_map()/unmap().  For PAPR it would just be reutn pointer
> / no-op.

Cédric Le Goater Nov. 29, 2018, 5:51 p.m. UTC | #6

On 11/29/18 4:39 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2018-11-29 at 11:47 +1100, David Gibson wrote:
>>
>> 1) read/write accessors which take a word number

ok for single word updates of the structures.

>> 2) A "get" accessor which copies the whole structure, 

ok

>> but "write"
>> accessor which takes a word number.  The asymmetry is a bit ugly, but
>> it's the non-atomic writeback of the whole structure which I'm most
>> uncomfortable with.

And, how would you make the update of the whole structure in RAM look 
"atomic" under QEMU ? 

> It shouldn't be a big deal though, there are HW facilities to access
> the structures "atomically" anyway, due to the caching done by the
> XIVE.

Are you suggesting that the PowerNV model should update the VPC, EQC, 
IVC in the VST accessors before updating the VSTs in RAM ?
>> 3) A map/unmap interface which gives you / releases a pointer to the
>> "live" structure.  For powernv that would become
>> address_space_map()/unmap().  

yes.

>> For PAPR it would just be reutn pointer  / no-op.

ok

I think I will introduce these handlers progressively in the patchset.

Thanks,

C.

David Gibson Nov. 30, 2018, 1:09 a.m. UTC | #7

On Thu, Nov 29, 2018 at 06:51:53PM +0100, Cédric Le Goater wrote:
> On 11/29/18 4:39 AM, Benjamin Herrenschmidt wrote:
> > On Thu, 2018-11-29 at 11:47 +1100, David Gibson wrote:
> >>
> >> 1) read/write accessors which take a word number
> 
> ok for single word updates of the structures.
> 
> >> 2) A "get" accessor which copies the whole structure, 
> 
> ok
> 
> >> but "write"
> >> accessor which takes a word number.  The asymmetry is a bit ugly, but
> >> it's the non-atomic writeback of the whole structure which I'm most
> >> uncomfortable with.
> 
> And, how would you make the update of the whole structure in RAM look 
> "atomic" under QEMU ? 

So, the BQL means it actually is atomic now (at least for PAPR where
the guest doesn't have access to it), but I don't want to rely on that
always being the case - there are moves to put less stuff under the
BQL, and with KVM we might be mapping some of these things such that
real hardware can touch it.

But the real point is that we don't *need* it to be atomic.  Perhaps
the individual field updates need to be atomic, but not writes to the
END as a whole.  Writing back the whole thing is also a whole heap of
unnecessary stores.

> > It shouldn't be a big deal though, there are HW facilities to access
> > the structures "atomically" anyway, due to the caching done by the
> > XIVE.
> 
> Are you suggesting that the PowerNV model should update the VPC, EQC, 
> IVC in the VST accessors before updating the VSTs in RAM ?
> >> 3) A map/unmap interface which gives you / releases a pointer to the
> >> "live" structure.  For powernv that would become
> >> address_space_map()/unmap().  
> 
> yes.
> 
> >> For PAPR it would just be reutn pointer  / no-op.
> 
> ok
> 
> I think I will introduce these handlers progressively in the patchset.
> 
> Thanks,
> 
> C. 
> 
>

Cédric Le Goater Dec. 3, 2018, 5:05 p.m. UTC | #8

I forgot to reply to this one.

On 11/29/18 1:47 AM, David Gibson wrote:
> On Wed, Nov 28, 2018 at 11:59:58AM +0100, Cédric Le Goater wrote:
>> On 11/28/18 12:49 AM, David Gibson wrote:
>>> On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
>>>> The last sub-engine of the XIVE architecture is the Interrupt
>>>> Virtualization Presentation Engine (IVPE). On HW, they share elements,
>>>> the Power Bus interface (CQ), the routing table descriptors, and they
>>>> can be combined in the same HW logic. We do the same in QEMU and
>>>> combine both engines in the XiveRouter for simplicity.
>>>
>>> Ok, I'm not entirely convinced combining the IVPE and IVRE into a
>>> single object is a good idea, but we can probably discuss that once
>>> I've read further.
>>
>> We could introduce a simplified presenter for sPAPR but I am not even
>> sure of that as it will get more complex if we support the EBB one day. 
> 
> I wasn't really thinking about PAPR for this comment.
> 
>>>> When the IVRE has completed its job of matching an event source with a
>>>> Notification Virtual Target (NVT) to notify, it forwards the event
>>>> notification to the IVPE sub-engine. The IVPE scans the thread
>>>> interrupt contexts of the Notification Virtual Targets (NVT)
>>>> dispatched on the HW processor threads and if a match is found, it
>>>> signals the thread. If not, the IVPE escalates the notification to
>>>> some other targets and records the notification in a backlog queue.
>>>>
>>>> The IVPE maintains the thread interrupt context state for each of its
>>>> NVTs not dispatched on HW processor threads in the Notification
>>>> Virtual Target table (NVTT).
>>>>
>>>> The model currently only supports single NVT notifications.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>> ---
>>>>  include/hw/ppc/xive.h      |  13 +++
>>>>  include/hw/ppc/xive_regs.h |  22 ++++
>>>>  hw/intc/xive.c             | 223 +++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 258 insertions(+)
>>>>
>>>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>>>> index 5987f26ddb98..e715a6c6923d 100644
>>>> --- a/include/hw/ppc/xive.h
>>>> +++ b/include/hw/ppc/xive.h
>>>> @@ -197,6 +197,10 @@ typedef struct XiveRouterClass {
>>>>                     XiveEND *end);
>>>>      int (*set_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>>>                     XiveEND *end);
>>>> +    int (*get_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                   XiveNVT *nvt);
>>>> +    int (*set_nvt)(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                   XiveNVT *nvt);
>>>
>>> As with the ENDs, I don't think get/set is a good interface for a
>>> bigger-than-word-size object.
>>
>> We need to agree on this interface before I respin. So you would like 
>> to add a extra argument specifying the word being accessed ?
> 
> Yes.  Ok, 3 options I can see at this point:
> 
> 1) read/write accessors which take a word number
> 
> 2) A "get" accessor which copies the whole structure, but "write"
> accessor which takes a word number.  The asymmetry is a bit ugly, but
> it's the non-atomic writeback of the whole structure which I'm most
> uncomfortable with.
> 
> 3) A map/unmap interface which gives you / releases a pointer to the
> "live" structure.  For powernv that would become
> address_space_map()/unmap().  For PAPR it would just be reutn pointer
> / no-op.

This discussion is in progress in another subthread.

>>
>>>
>>>>  } XiveRouterClass;
>>>>  
>>>>  void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon);
>>>> @@ -207,6 +211,10 @@ int xive_router_get_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>>>                          XiveEND *end);
>>>>  int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>>>                          XiveEND *end);
>>>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                        XiveNVT *nvt);
>>>> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                        XiveNVT *nvt);
>>>>  
>>>>  /*
>>>>   * XIVE END ESBs
>>>> @@ -274,4 +282,9 @@ extern const MemoryRegionOps xive_tm_ops;
>>>>  
>>>>  void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>>>>  
>>>> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>>>> +{
>>>> +    return (nvt_blk << 19) | nvt_idx;
>>>
>>> I'm guessing this formula is the standard way of combining the NVT
>>> block and index into a single word?  
>>
>> That number is the VP/NVT identifier which is written in the CAM value. 
>> The index is on 19 bits because of the NVT  definition in the END 
>> structure. It is being increased to 24 bits on Power10 
>>
>>> If so, I think we should
>>> standardize on passing a single word "nvt_id" around and only
>>> splitting it when we need to use the block separately.  
>>
>> This is really the only place where we concatenate the two NVT values,
>> block and index. 
> 
> Hm, ok.  I know we don't model them (yet, maybe ever) but could
> combined values appear in the PowerBUS messages that handle remote
> notifications?

They do. 
 
>>> Same goes for
>>> the end_id, assuming there's a standard way of putting that into a
>>> single word.  That will address the point I raised earlier about lisn
>>> being passed around as a single word, but these later stage ids being
>>> split.
>>
>> Hmm, I am not sure this is a good option. It is not how the PowerNV 
>> model would use it, skiboot is very much aware of these blocks and 
>> indexes and for remote accesses chips are identified using the block. 
>> I will take a look at it but I am not found of it. I can add helpers 
>> in some places though.    
> 
> Hm, ok.  Do the block and index appear as an (effectively) single
> field in the EAS?

no. In all XIVE structures, block and index are always distinct.

>> I agree we have some kind of issue linking the HW model with the sPAPR 
>> machine. The guest interface is only  about IRQ numbers, priorities and
>> cpu numbers. We really don't care about XIVE blocks and indexes in that 
>> case. we can clarify the code by bypassing the XiveRouter interfaces
>> to the table and directly use the sPAPR interrupt controller. That 
>> should help a bit for the hcalls but we would still have to fill in 
>> the EAT and the END with some index values if we want to use the router
>> algorithm.
> 
> I don't think this is too much of a problem.  These are essentially
> machine internal details so we can choose an allocation to suit us.
> The obvious one is to put everything in a single block, at least as
> long as that won't limit our numbers too much.
> 
>>> We'll probably want some inlines or macros to build an
>>> nvt/end/lisn/whatever id from block and index as well.
>>>
>>>> +}
>>>> +
>>>>  #endif /* PPC_XIVE_H */
>>>> diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
>>>> index 2e3d6cb507da..05cb992d2815 100644
>>>> --- a/include/hw/ppc/xive_regs.h
>>>> +++ b/include/hw/ppc/xive_regs.h
>>>> @@ -158,4 +158,26 @@ typedef struct XiveEND {
>>>>  #define END_W7_F1_LOG_SERVER_ID  PPC_BITMASK32(1, 31)
>>>>  } XiveEND;
>>>>  
>>>> +/* Notification Virtual Target (NVT) */
>>>> +typedef struct XiveNVT {
>>>> +        uint32_t        w0;
>>>> +#define NVT_W0_VALID             PPC_BIT32(0)
>>>> +        uint32_t        w1;
>>>> +        uint32_t        w2;
>>>> +        uint32_t        w3;
>>>> +        uint32_t        w4;
>>>> +        uint32_t        w5;
>>>> +        uint32_t        w6;
>>>> +        uint32_t        w7;
>>>> +        uint32_t        w8;
>>>> +#define NVT_W8_GRP_VALID         PPC_BIT32(0)
>>>> +        uint32_t        w9;
>>>> +        uint32_t        wa;
>>>> +        uint32_t        wb;
>>>> +        uint32_t        wc;
>>>> +        uint32_t        wd;
>>>> +        uint32_t        we;
>>>> +        uint32_t        wf;
>>>> +} XiveNVT;
>>>> +
>>>>  #endif /* PPC_XIVE_REGS_H */
>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>>> index 4c6cb5d52975..5ba3b06e6e25 100644
>>>> --- a/hw/intc/xive.c
>>>> +++ b/hw/intc/xive.c
>>>> @@ -373,6 +373,32 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>>>>      }
>>>>  }
>>>>  
>>>> +/* The HW CAM (23bits) is hardwired to :
>>>> + *
>>>> + *   0x000||0b1||4Bit chip number||7Bit Thread number.
>>>> + *
>>>> + * and when the block grouping extension is enabled :
>>>> + *
>>>> + *   4Bit chip number||0x001||7Bit Thread number.
>>>> + */
>>>> +static uint32_t tctx_hw_cam_line(bool block_group, uint8_t chip_id, uint8_t tid)
>>>> +{
>>>> +    if (block_group) {
>>>> +        return 1 << 11 | (chip_id & 0xf) << 7 | (tid & 0x7f);
>>>> +    } else {
>>>> +        return (chip_id & 0xf) << 11 | 1 << 7 | (tid & 0x7f);
>>>> +    }
>>>> +}
>>>> +
>>>> +static uint32_t xive_tctx_hw_cam_line(XiveTCTX *tctx, bool block_group)
>>>> +{
>>>> +    PowerPCCPU *cpu = POWERPC_CPU(tctx->cs);
>>>> +    CPUPPCState *env = &cpu->env;
>>>> +    uint32_t pir = env->spr_cb[SPR_PIR].default_value;
>>>
>>> I don't much like reaching into the cpu state itself.  I think a
>>> better idea would be to have the TCTX have its HW CAM id set during
>>> initialization (via a property) and then use that.  This will mean
>>> less mucking about if future cpu revisions don't split the PIR into
>>> chip and tid ids in the same way.
>>
>> yes good idea. I will see how to handle the block_group boolean. may be we
>> can leave it out of the model for now as it is not used.
> 
> Yes, it would be nice to leave the block_group stuff as a later
> extensions when/if we need it.  If we put it in as a stub and nothing
> is using/testing it, it's likely it will be broken if we ever do
> actually try to use it.

> 
>>
>>>
>>>> +    return tctx_hw_cam_line(block_group, (pir >> 8) & 0xf, pir & 0x7f);
>>>> +}
>>>> +
>>>>  static void xive_tctx_reset(void *dev)
>>>>  {
>>>>      XiveTCTX *tctx = XIVE_TCTX(dev);
>>>> @@ -1013,6 +1039,195 @@ int xive_router_set_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
>>>>     return xrc->set_end(xrtr, end_blk, end_idx, end);
>>>>  }
>>>>  
>>>> +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                        XiveNVT *nvt)
>>>> +{
>>>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>>>> +
>>>> +   return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt);
>>>> +}
>>>> +
>>>> +int xive_router_set_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                        XiveNVT *nvt)
>>>> +{
>>>> +   XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
>>>> +
>>>> +   return xrc->set_nvt(xrtr, nvt_blk, nvt_idx, nvt);
>>>> +}
>>>> +
>>>> +static bool xive_tctx_ring_match(XiveTCTX *tctx, uint8_t ring,
>>>> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                                 bool cam_ignore, uint32_t logic_serv)
>>>> +{
>>>> +    uint8_t *regs = &tctx->regs[ring];
>>>> +    uint32_t w2 = be32_to_cpu(*((uint32_t *) &regs[TM_WORD2]));
>>>> +    uint32_t cam = xive_tctx_cam_line(nvt_blk, nvt_idx);
>>>> +    bool block_group = false; /* TODO (PowerNV) */
>>>> +
>>>> +    /* TODO (PowerNV): ignore low order bits of nvt id */
>>>> +
>>>> +    switch (ring) {
>>>> +    case TM_QW3_HV_PHYS:
>>>> +        return (w2 & TM_QW3W2_VT) && xive_tctx_hw_cam_line(tctx, block_group) ==
>>>> +            tctx_hw_cam_line(block_group, nvt_blk, nvt_idx);
>>>
>>> The difference between "xive_tctx_hw_cam_line" and "tctx_hw_cam_line"
>>> here is far from obvious.  
>>
>> yes. I lacked inspiration ...
> 
> I'd suggest that the one which takes the tctx as a parameter be
> tctx_hw_cam_line() and the other be nvt_hw_cam_line() or similar.  The
> crucial difference here is that one is what the thread is looking for,
> the other is what the NVT is advertising.
> 
>>> Remember that namespacing prefixes aren't
>>> necessary for static functions, which can let you give more
>>> descriptive names without getting excessively long.
>>
>> OK.
>>  
>>>> +    case TM_QW2_HV_POOL:
>>>> +        return (w2 & TM_QW2W2_VP) && (cam == GETFIELD(TM_QW2W2_POOL_CAM, w2));
>>>> +
>>>> +    case TM_QW1_OS:
>>>> +        return (w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2));
>>>> +
>>>> +    case TM_QW0_USER:
>>>> +        return ((w2 & TM_QW1W2_VO) && (cam == GETFIELD(TM_QW1W2_OS_CAM, w2)) &&
>>>> +                (w2 & TM_QW0W2_VU) &&
>>>> +                (logic_serv == GETFIELD(TM_QW0W2_LOGIC_SERV, w2)));
>>>> +
>>>> +    default:
>>>> +        g_assert_not_reached();
>>>> +    }
>>>> +}
>>>> +
>>>> +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format,
>>>> +                                     uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                                     bool cam_ignore, uint32_t logic_serv)
>>>> +{
>>>> +    if (format == 0) {
>>>> +        /* F=0 & i=1: Logical server notification */
>>>> +        if (cam_ignore == true) {
>>>> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: no support for LS "
>>>> +                          "NVT %x/%x\n", nvt_blk, nvt_idx);
>>>> +             return -1;
>>>> +        }
>>>> +
>>>> +        /* F=0 & i=0: Specific NVT notification */
>>>> +        if (xive_tctx_ring_match(tctx, TM_QW3_HV_PHYS,
>>>> +                                nvt_blk, nvt_idx, false, 0)) {
>>>> +            return TM_QW3_HV_PHYS;
>>>> +        }
>>>> +        if (xive_tctx_ring_match(tctx, TM_QW2_HV_POOL,
>>>> +                                nvt_blk, nvt_idx, false, 0)) {
>>>> +            return TM_QW2_HV_POOL;
>>>> +        }
>>>> +        if (xive_tctx_ring_match(tctx, TM_QW1_OS,
>>>> +                                nvt_blk, nvt_idx, false, 0)) {
>>>> +            return TM_QW1_OS;
>>>> +        }
>>>
>>> Hm.  It's a bit pointless to iterate through each ring calling a
>>> common function, when that "common" function consists entirely of a
>>> switch which makes it not really common at all.
>>>
>>> So I think you want separate helper functions for each ring's match,
>>> or even just fold the previous function into this one.
>>
>> yes. It can be improved. I did try different layouts. I might just fold 
>> both routine in one as you propose.  
>>
>>>> +    } else {
>>>> +        /* F=1 : User level Event-Based Branch (EBB) notification */
>>>> +        if (xive_tctx_ring_match(tctx, TM_QW0_USER,
>>>> +                                nvt_blk, nvt_idx, false, logic_serv)) {
>>>> +            return TM_QW0_USER;
>>>> +        }
>>>> +    }
>>>> +    return -1;
>>>> +}
>>>> +
>>>> +typedef struct XiveTCTXMatch {
>>>> +    XiveTCTX *tctx;
>>>> +    uint8_t ring;
>>>> +} XiveTCTXMatch;
>>>> +
>>>> +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>>>> +                                 uint8_t nvt_blk, uint32_t nvt_idx,
>>>> +                                 bool cam_ignore, uint8_t priority,
>>>> +                                 uint32_t logic_serv, XiveTCTXMatch *match)
>>>> +{
>>>> +    CPUState *cs;
>>>> +
>>>> +    /* TODO (PowerNV): handle chip_id overwrite of block field for
>>>> +     * hardwired CAM compares */
>>>> +
>>>> +    CPU_FOREACH(cs) {
>>>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>>>> +        XiveTCTX *tctx = XIVE_TCTX(cpu->intc);
>>>> +        int ring;
>>>> +
>>>> +        /*
>>>> +         * HW checks that the CPU is enabled in the Physical Thread
>>>> +         * Enable Register (PTER).
>>>> +         */
>>>> +
>>>> +        /*
>>>> +         * Check the thread context CAM lines and record matches. We
>>>> +         * will handle CPU exception delivery later
>>>> +         */
>>>> +        ring = xive_presenter_tctx_match(tctx, format, nvt_blk, nvt_idx,
>>>> +                                         cam_ignore, logic_serv);
>>>> +        /*
>>>> +         * Save the context and follow on to catch duplicates, that we
>>>> +         * don't support yet.
>>>> +         */
>>>> +        if (ring != -1) {
>>>> +            if (match->tctx) {
>>>> +                qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
>>>> +                              "context NVT %x/%x\n", nvt_blk, nvt_idx);
>>>> +                return false;
>>>> +            }
>>>> +
>>>> +            match->ring = ring;
>>>> +            match->tctx = tctx;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    if (!match->tctx) {
>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
>>>> +                      nvt_blk, nvt_idx);
>>>> +        return false;
>>>
>>> Hmm.. this isn't actually an error isn't it? At least not for powernv
>>
>> It is on sPAPR, it would mean the END was configured with an unknow CPU. 
> 
> Right.
> 
>> It is not error on PowerNV, when we support escalations.
>>
>>> - that just means the NVT isn't currently dispatched, so we'll need to
>>> trigger the escalation interrupt.  
>>
>> Yes.
>>
>>> Does this get changed later in the series?
>>
>> No.
> 
> But this code is common to PAPR and powernv, yes, so it will need to?

When we add support for escalations, yes, it will change. Would you rather
use an error_report() until then ? 

C.

David Gibson Dec. 4, 2018, 1:54 a.m. UTC | #9

On Mon, Dec 03, 2018 at 06:05:12PM +0100, Cédric Le Goater wrote:
> I forgot to reply to this one.
> 
> On 11/29/18 1:47 AM, David Gibson wrote:
> > On Wed, Nov 28, 2018 at 11:59:58AM +0100, Cédric Le Goater wrote:
> >> On 11/28/18 12:49 AM, David Gibson wrote:
> >>> On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
> >>>> The last sub-engine of the XIVE architecture is the Interrupt
> >>>> Virtualization Presentation Engine (IVPE). On HW, they share elements,
> >>>> the Power Bus interface (CQ), the routing table descriptors, and they
> >>>> can be combined in the same HW logic. We do the same in QEMU and
> >>>> combine both engines in the XiveRouter for simplicity.
> >>>
> >>> Ok, I'm not entirely convinced combining the IVPE and IVRE into a
> >>> single object is a good idea, but we can probably discuss that once
> >>> I've read further.
> >>
> >> We could introduce a simplified presenter for sPAPR but I am not even
> >> sure of that as it will get more complex if we support the EBB
> >> one day.

[snip]
> >>>> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> >>>> +{
> >>>> +    return (nvt_blk << 19) | nvt_idx;
> >>>
> >>> I'm guessing this formula is the standard way of combining the NVT
> >>> block and index into a single word?  
> >>
> >> That number is the VP/NVT identifier which is written in the CAM value. 
> >> The index is on 19 bits because of the NVT  definition in the END 
> >> structure. It is being increased to 24 bits on Power10 
> >>
> >>> If so, I think we should
> >>> standardize on passing a single word "nvt_id" around and only
> >>> splitting it when we need to use the block separately.  
> >>
> >> This is really the only place where we concatenate the two NVT values,
> >> block and index. 
> > 
> > Hm, ok.  I know we don't model them (yet, maybe ever) but could
> > combined values appear in the PowerBUS messages that handle remote
> > notifications?
> 
> They do. 
>  
> >>> Same goes for
> >>> the end_id, assuming there's a standard way of putting that into a
> >>> single word.  That will address the point I raised earlier about lisn
> >>> being passed around as a single word, but these later stage ids being
> >>> split.
> >>
> >> Hmm, I am not sure this is a good option. It is not how the PowerNV 
> >> model would use it, skiboot is very much aware of these blocks and 
> >> indexes and for remote accesses chips are identified using the block. 
> >> I will take a look at it but I am not found of it. I can add helpers 
> >> in some places though.    
> > 
> > Hm, ok.  Do the block and index appear as an (effectively) single
> > field in the EAS?
> 
> no. In all XIVE structures, block and index are always distinct.

Hm.  Distinct in what sense?  I get that the fields are labelled
separately in the documentation, but if the fields are adjacent, you
could equally well treat them as one.

> >>>> +    if (!match->tctx) {
> >>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> >>>> +                      nvt_blk, nvt_idx);
> >>>> +        return false;
> >>>
> >>> Hmm.. this isn't actually an error isn't it? At least not for powernv
> >>
> >> It is on sPAPR, it would mean the END was configured with an unknow CPU. 
> > 
> > Right.
> > 
> >> It is not error on PowerNV, when we support escalations.
> >>
> >>> - that just means the NVT isn't currently dispatched, so we'll need to
> >>> trigger the escalation interrupt.  
> >>
> >> Yes.
> >>
> >>> Does this get changed later in the series?
> >>
> >> No.
> > 
> > But this code is common to PAPR and powernv, yes, so it will need to?
> 
> When we add support for escalations, yes, it will change. Would you rather
> use an error_report() until then ?

Ah, I guess leaving an error until we implement escalation makes
sense.  It shouldn't be LOG_GUEST_ERROR, though, the guest didn't do
anything wrong, and error_report() doesn't really make sense for the
same reason.

LOG_UNIMP, I guess?

Cédric Le Goater Dec. 4, 2018, 5:04 p.m. UTC | #10

On 12/4/18 2:54 AM, David Gibson wrote:
> On Mon, Dec 03, 2018 at 06:05:12PM +0100, Cédric Le Goater wrote:
>> I forgot to reply to this one.
>>
>> On 11/29/18 1:47 AM, David Gibson wrote:
>>> On Wed, Nov 28, 2018 at 11:59:58AM +0100, Cédric Le Goater wrote:
>>>> On 11/28/18 12:49 AM, David Gibson wrote:
>>>>> On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
>>>>>> The last sub-engine of the XIVE architecture is the Interrupt
>>>>>> Virtualization Presentation Engine (IVPE). On HW, they share elements,
>>>>>> the Power Bus interface (CQ), the routing table descriptors, and they
>>>>>> can be combined in the same HW logic. We do the same in QEMU and
>>>>>> combine both engines in the XiveRouter for simplicity.
>>>>>
>>>>> Ok, I'm not entirely convinced combining the IVPE and IVRE into a
>>>>> single object is a good idea, but we can probably discuss that once
>>>>> I've read further.
>>>>
>>>> We could introduce a simplified presenter for sPAPR but I am not even
>>>> sure of that as it will get more complex if we support the EBB
>>>> one day.
> 
> [snip]
>>>>>> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>>>>>> +{
>>>>>> +    return (nvt_blk << 19) | nvt_idx;
>>>>>
>>>>> I'm guessing this formula is the standard way of combining the NVT
>>>>> block and index into a single word?  
>>>>
>>>> That number is the VP/NVT identifier which is written in the CAM value. 
>>>> The index is on 19 bits because of the NVT  definition in the END 
>>>> structure. It is being increased to 24 bits on Power10 
>>>>
>>>>> If so, I think we should
>>>>> standardize on passing a single word "nvt_id" around and only
>>>>> splitting it when we need to use the block separately.  
>>>>
>>>> This is really the only place where we concatenate the two NVT values,
>>>> block and index. 
>>>
>>> Hm, ok.  I know we don't model them (yet, maybe ever) but could
>>> combined values appear in the PowerBUS messages that handle remote
>>> notifications?
>>
>> They do. 
>>  
>>>>> Same goes for
>>>>> the end_id, assuming there's a standard way of putting that into a
>>>>> single word.  That will address the point I raised earlier about lisn
>>>>> being passed around as a single word, but these later stage ids being
>>>>> split.
>>>>
>>>> Hmm, I am not sure this is a good option. It is not how the PowerNV 
>>>> model would use it, skiboot is very much aware of these blocks and 
>>>> indexes and for remote accesses chips are identified using the block. 
>>>> I will take a look at it but I am not found of it. I can add helpers 
>>>> in some places though.    
>>>
>>> Hm, ok.  Do the block and index appear as an (effectively) single
>>> field in the EAS?
>>
>> no. In all XIVE structures, block and index are always distinct.
> 
> Hm.  Distinct in what sense?  I get that the fields are labelled
> separately in the documentation, but if the fields are adjacent, you
> could equally well treat them as one.

yes. Indeed. They are adjacent. The size of the index is subject to 
change in P10. 

I am not sure that treating them as one will be of any help because 
we need to extract them from their XIVE structure with the *_INDEX 
and *_BLOCK masks first. I will take a look. May be not in v6.


>>>>>> +    if (!match->tctx) {
>>>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
>>>>>> +                      nvt_blk, nvt_idx);
>>>>>> +        return false;
>>>>>
>>>>> Hmm.. this isn't actually an error isn't it? At least not for powernv
>>>>
>>>> It is on sPAPR, it would mean the END was configured with an unknow CPU. 
>>>
>>> Right.
>>>
>>>> It is not error on PowerNV, when we support escalations.
>>>>
>>>>> - that just means the NVT isn't currently dispatched, so we'll need to
>>>>> trigger the escalation interrupt.  
>>>>
>>>> Yes.
>>>>
>>>>> Does this get changed later in the series?
>>>>
>>>> No.
>>>
>>> But this code is common to PAPR and powernv, yes, so it will need to?
>>
>> When we add support for escalations, yes, it will change. Would you rather
>> use an error_report() until then ?
> 
> Ah, I guess leaving an error until we implement escalation makes
> sense.  It shouldn't be LOG_GUEST_ERROR, though, the guest didn't do
> anything wrong, and error_report() doesn't really make sense for the
> same reason.
> 
> LOG_UNIMP, I guess?

OK. will do.

Thanks,

C.

David Gibson Dec. 5, 2018, 1:40 a.m. UTC | #11

On Tue, Dec 04, 2018 at 06:04:13PM +0100, Cédric Le Goater wrote:
> On 12/4/18 2:54 AM, David Gibson wrote:
> > On Mon, Dec 03, 2018 at 06:05:12PM +0100, Cédric Le Goater wrote:
> >> I forgot to reply to this one.
> >>
> >> On 11/29/18 1:47 AM, David Gibson wrote:
> >>> On Wed, Nov 28, 2018 at 11:59:58AM +0100, Cédric Le Goater wrote:
> >>>> On 11/28/18 12:49 AM, David Gibson wrote:
> >>>>> On Fri, Nov 16, 2018 at 11:57:01AM +0100, Cédric Le Goater wrote:
> >>>>>> The last sub-engine of the XIVE architecture is the Interrupt
> >>>>>> Virtualization Presentation Engine (IVPE). On HW, they share elements,
> >>>>>> the Power Bus interface (CQ), the routing table descriptors, and they
> >>>>>> can be combined in the same HW logic. We do the same in QEMU and
> >>>>>> combine both engines in the XiveRouter for simplicity.
> >>>>>
> >>>>> Ok, I'm not entirely convinced combining the IVPE and IVRE into a
> >>>>> single object is a good idea, but we can probably discuss that once
> >>>>> I've read further.
> >>>>
> >>>> We could introduce a simplified presenter for sPAPR but I am not even
> >>>> sure of that as it will get more complex if we support the EBB
> >>>> one day.
> > 
> > [snip]
> >>>>>> +static inline uint32_t xive_tctx_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
> >>>>>> +{
> >>>>>> +    return (nvt_blk << 19) | nvt_idx;
> >>>>>
> >>>>> I'm guessing this formula is the standard way of combining the NVT
> >>>>> block and index into a single word?  
> >>>>
> >>>> That number is the VP/NVT identifier which is written in the CAM value. 
> >>>> The index is on 19 bits because of the NVT  definition in the END 
> >>>> structure. It is being increased to 24 bits on Power10 
> >>>>
> >>>>> If so, I think we should
> >>>>> standardize on passing a single word "nvt_id" around and only
> >>>>> splitting it when we need to use the block separately.  
> >>>>
> >>>> This is really the only place where we concatenate the two NVT values,
> >>>> block and index. 
> >>>
> >>> Hm, ok.  I know we don't model them (yet, maybe ever) but could
> >>> combined values appear in the PowerBUS messages that handle remote
> >>> notifications?
> >>
> >> They do. 
> >>  
> >>>>> Same goes for
> >>>>> the end_id, assuming there's a standard way of putting that into a
> >>>>> single word.  That will address the point I raised earlier about lisn
> >>>>> being passed around as a single word, but these later stage ids being
> >>>>> split.
> >>>>
> >>>> Hmm, I am not sure this is a good option. It is not how the PowerNV 
> >>>> model would use it, skiboot is very much aware of these blocks and 
> >>>> indexes and for remote accesses chips are identified using the block. 
> >>>> I will take a look at it but I am not found of it. I can add helpers 
> >>>> in some places though.    
> >>>
> >>> Hm, ok.  Do the block and index appear as an (effectively) single
> >>> field in the EAS?
> >>
> >> no. In all XIVE structures, block and index are always distinct.
> > 
> > Hm.  Distinct in what sense?  I get that the fields are labelled
> > separately in the documentation, but if the fields are adjacent, you
> > could equally well treat them as one.
> 
> yes. Indeed. They are adjacent. The size of the index is subject to 
> change in P10. 

Ah, ok.  If the boundary might change in P10 that is indeed a reason
to keep them separate.

> I am not sure that treating them as one will be of any help because 
> we need to extract them from their XIVE structure with the *_INDEX 
> and *_BLOCK masks first. I will take a look. May be not in v6.

Ok.

> >>>>>> +    if (!match->tctx) {
> >>>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> >>>>>> +                      nvt_blk, nvt_idx);
> >>>>>> +        return false;
> >>>>>
> >>>>> Hmm.. this isn't actually an error isn't it? At least not for powernv
> >>>>
> >>>> It is on sPAPR, it would mean the END was configured with an unknow CPU. 
> >>>
> >>> Right.
> >>>
> >>>> It is not error on PowerNV, when we support escalations.
> >>>>
> >>>>> - that just means the NVT isn't currently dispatched, so we'll need to
> >>>>> trigger the escalation interrupt.  
> >>>>
> >>>> Yes.
> >>>>
> >>>>> Does this get changed later in the series?
> >>>>
> >>>> No.
> >>>
> >>> But this code is common to PAPR and powernv, yes, so it will need to?
> >>
> >> When we add support for escalations, yes, it will change. Would you rather
> >> use an error_report() until then ?
> > 
> > Ah, I guess leaving an error until we implement escalation makes
> > sense.  It shouldn't be LOG_GUEST_ERROR, though, the guest didn't do
> > anything wrong, and error_report() doesn't really make sense for the
> > same reason.
> > 
> > LOG_UNIMP, I guess?
> 
> OK. will do.
> 
> Thanks,
> 
> C.
>

[v5,08/36] ppc/xive: introduce a simplified XIVE presenter

Commit Message

Comments

Patch